WebNN: The Future of On-Device Inference on the Web

Ningxin Hu (胡宁馨), Min Zhang (张敏)
Intel SATG Web Platform Engineering
December 2023

Advantages of WebML client-side inference

Privacy: data from sensors such as the camera and microphone stays on the device.
Offline: once the initial resources are cached, the app no longer depends on the network.
Latency: no cloud network issues; inference runs in real time in the browser.
Cost: no cloud compute is required.
Zero install: runs in the browser with nothing extra to install, and is easy to share.
Cross-platform: AI applications run on almost every platform.

WebML client-side inference

Diverse client AI scenarios, and multiple compute units to meet their needs.

Bursty workloads: latency-sensitive.
Sustained workloads: power-sensitive.
Periodic workloads: throughput-sensitive.

CPU: ubiquitous; low latency for single inference tasks.
GPU: high parallelism and high batch size; integrates with 3D, rendering, and media pipelines.
NPU: dedicated low-power AI accelerator; high performance per watt, which improves power efficiency.

What web developers are asking for

“The web needs its own neural networks specification to leverage Apple Silicon, Tensor Cores, and others.”

“Delighted to find the working drafts of WebNN. Incredible new power unlocked for the free, open and competitive Web!”

“Native Tensor support! Would be amazing to have Tensor objects and ops built into Chrome, and available as an ‘ML API’.”

“Although some scientific computing libraries exist for JS/TS, having built-in support would be far more desirable!”

“If go through the code of utils, maths, audio, tensor in JS, it is annoying that I had to implement these ops myself in JS.”

“llama2-7b in the browser using WebNN is going to be on-device, local ML cc