Ericsson's Large Language Model Practices in Real-World Applications (PDF, 12 pages)
Measuring the Quality of GenAI Systems
Liang Yu
GenAI: Generative Artificial Intelligence
A developer at Ericsson Mobile Financial Services (EMFS)
A researcher at BTH (50%)

Our Journey
Dec 2022 to present
New use cases are emerging. How do we measure those emerging use cases?
[Figure: AI adoption by year, 2022 to 2024. Labels: NLP model, PoCing, Scenario, Experiments, Try out, LLMs, APIs, and a growing set of GenAI "Use Case" boxes.]
NLP: Natural Language Processing; LLM: Large Language Model

"You can't manage what you can't measure."
Peter Drucker

Our research
Mapping metrics to ISO quality characteristics.

Research finding (1/2)
Metrics can be used to measure ISO quality characteristics.
Metric: a quantifiable measure to assess how well a system performs.

Research finding (2/2)
Baseline: the foundation for evaluating LLM outputs before implementing any adaptation or enhancements.

Process
Process: Identify metrics and metric data.
Stages: Baseline, Integration, AI Systems, Monitoring
AI System Integration: Extend the baseline to enterprise datasets with industrial context.
Monitoring: Dashboard with quantifiable metric scores to ensure the AI systems perform well.
[Figure: GenAI quality measurement. Labels: Industry-Data; Measurable elements; Infrastructure; LLMs; Benchmarks; APIs; Skills; Quality; Data (Text, Files, Code, Images); Inference; Ingest; Train; Run; Tune; Standalone LLM; Pre-trained data; Model parameters.]

Industry-use cases
Collected use cases:
Customer service chatbot
Product information retrieval
Business communication
Personalized marketing
Product design-rules
Code review and improvements
Knowledge base for onboarding & training
Personalized assistants
Measurable elements
Measurement areas: Infrastructure, Productivity, Industrial contexts, LLMs, Benchmarks, APIs, Skills, Quality, Data, Text, Fi