Blackhole & TT-Metalium: The Standalone AI Computer and its Programming Model
August 2024
Jasmina Vasiljevic, Senior Fellow
Davor Capalija, Senior Fellow

Agenda
- Architecture
- Micro-architecture
- Scale-out
- Software

AI Silicon Roadmap

              Grayskull           Wormhole                 Blackhole
Positioning   AI Processor        Networked AI Processor   Standalone AI Computer
Year          2021                2022                     2023
Generation    GEN 1               GEN 1                    GEN 2
Focus         High Perf AI ASIC   Scalability              Heterogeny
Cores         120 Tensix Cores    80 Tensix+ Cores         140 Tensix+ Cores
Process       12nm                12nm                     6nm
Compute       276 TOPS (FP8)      328 TOPS (FP8)           745 TOPS (FP8)
Memory        100 GB/s LPDDR4     336 GB/s GDDR6           512 GB/s GDDR6
PCIe          Gen4 x16            Gen4 x16                 Gen5 x16
Ethernet      -                   16 x 100 Gbps            10 x 400 Gbps
CPU           -                   -                        16 RISC-V CPU cores

Blackhole - A Standalone AI Computer

[Figure: chip floorplan. Legend: T = Tensix cores, D = DRAM cores, E = ETH cores, P = PCIe core, A = ARC core, C = RISC-V CPUs]

Blackhole - A Standalone AI Computer

Feature        Spec
Tensix         745 TFLOPs (8-bit), 372 TFLOPs (16-bit)
SRAM           241 MBs
Ethernet       10 x 400 Gbps
DRAM           512 GB/s BW, 32 GBs capacity
Baby RISC-Vs   752
Big RISC-Vs    16
PCIe           Gen5 x16, 64 GB/s
NoC            2 NOCs, 2D Torus, 256 B per core

Big RISC-V & Baby RISC-V

Big RISC-V (C = RISC-V CPUs)

Feature        Spec
RISC-V CPUs    x16 (4 clusters of 4)
Compute        64-bit, dual-issue, in-order
L3 cache       2 MB/CPU
L2 cache       128 KB/CPU
L1 I-cache     32 KB/CPU (2 way associative)
L1 D-cache     32 KB/CPU (4 way associative)
- Runs Linux
- On-device host for the AI accelerator

Baby RISC-V

Feature              Spec
Total Baby RISC-Vs   752
Compute              32-bit
                     Int multiplier/divider
                     Floating point (FP32/BFLOAT16)
                     128-bit vector (1 per Tensix)
I-cache              4 KB
D-scratch            8 KB

[Figure: chip floorplan. Legend: T = Tensix cores, D = DRAM cores, E = ETH cores, Big RISC-V. Totals: 752 Baby RISC-Vs, 16 Big RISC-Vs]

Micro-Architecture: All RISC-V Programmable

Baby RISC-Vs

[Figure: block diagram. Labels: Tile Math Engine, Vector Math Engine, Router, DRAM Bank controller, ETH controller; groups: Compute, Data Movement, Storage; each RISC-V runs a user kernel]

Feature              Spec
Total Baby RISC-Vs   752
Comput