2/16/25

Efficient Deployment of Large Language Models on Resource-Constrained Edge Computing Platforms
Yiyu Shi, Ph.D.
Professor, Dept. of Computer Science and Engineering
Site Director, NSF I/UCRC on Alternative and Sustainable Intelligent Computing
University of Notre Dame
yshi4@nd.edu

The Success of Large Language Models
Chemistry, Medicine, Math, Business Analytics (hosted on cluster)
"As models scale, they approach or surpass task-specific baselines, showing promise as universal systems for natural language understanding" (Scaling Law, OpenAI)

LLM is powerful, but...
Vision: an LLM hosted on a cluster can accomplish many tasks, but its use is compromised by certain concerns:
- Offline: the Internet is unavailable or unstable, yet real-time reaction is required (suicide detection, autonomous driving)
- Data privacy: medical history, personal information
- AI centralization (fairness): only large
corporations can own the models, data, and computational resources (clusters)
- Customization: the LLM needs to adapt to users with distinct situations

Edge-based LLM can be a solution
"Data in local" / "Model weights in local" / "Free from Internet" / "Customize the LLM via local data"
An LLM deployed on the edge device can avoid these concerns. Microsoft's Phi model has successfully demonstrated the power of edge-friendly LLMs.

Gap Between LLM and Edge Devices
GAP: LLMs are growing much faster than edge devices are being upgraded.
Challenges:
- Computation complexity
- Memory capacity
- Energy efficiency

A Successful Edge LLM should be able to achieve:
- Tradeoff: use resources wisely between model weights and user data during training/inference
- Personalization: generate user-preferred/related responses
- Robustness: continuously improve performance with experience; handle out-of-distribution scenarios

Build Up Efficient LLM on Edge Devices
Edge LLM
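The memory-capacity gap above can be made concrete with a back-of-envelope sketch (an illustration, not from the slides): storing the weights alone scales linearly with parameter count, so a 7B-parameter model in FP16 already exceeds the RAM of a typical edge board, and low-bit quantization is often what makes deployment feasible. The model sizes and byte widths below are illustrative assumptions.

```python
def weights_memory_gb(n_params_billion: float, bytes_per_param: float) -> float:
    """Memory (GB) needed to hold the model weights alone.

    Ignores the KV cache, activations, and runtime overhead, all of
    which add further pressure on an edge device's memory budget.
    """
    return n_params_billion * 1e9 * bytes_per_param / 1e9


# Illustrative comparison: FP16 (2 bytes/weight) vs. 4-bit quantized
# (0.5 bytes/weight) for a few common model scales.
for billions in (7, 13, 70):
    fp16 = weights_memory_gb(billions, 2.0)
    int4 = weights_memory_gb(billions, 0.5)
    print(f"{billions}B params: {fp16:.1f} GB in FP16, {int4:.1f} GB in INT4")
```

Under these assumptions a 7B model needs about 14 GB in FP16, above the roughly 8 GB of RAM common on edge hardware, but only about 3.5 GB after 4-bit quantization; this is the arithmetic behind the "memory capacity" challenge listed above.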