Measure What Matters: Quality-Focused Monitoring for Production AI Agents
Eric Peter, Niall Turbitt
June 11, 2025

Agenda
- Applying software development best practices
- Walk through the lifecycle of a production agent
- MLflow 3 overview
- Next steps / try it yourself

Challenges: business value with GenAI

Challenge #1: Is my agent delivering accurate answers?
Low quality answers create risk
Example: support agent for a major food delivery service
- Bad customer experience, higher churn
- More escalations to humans, increased cost
Customer: "Where is my food?"
AI Agent: "Your food is delayed, but"

Challenge #2: How do I improve my agent's accuracy?

How do I get my software app to work reliably? (10 years ago)

Software has a well-oiled formula
- Write & run code locally
- Unit tests
- QA testing
- Production telemetry

but GenAI isn't a deterministic system
- User inputs evolve without warning
- Domain expertise required to assess output quality
- Must trade off between quality & cost/latency

Vibe checks aren't enough
Classical software:
- Write & run code locally
- Unit tests
- QA testing
- Production telemetry
GenAI:
- Prompt engineer & vibe check (vibe checks)
- Production telemetry w/ evals (evals)

Announcing MLflow 3
- Redesigned for the GenAI era
- Integrated with Agent Evaluation
- Production grade scalability

Demo: Customer Support Agent
- customer satisfaction score
- % first call resolution

2025 Databricks Inc. All rights reserved

Solution design (Multi-agent, Human Agent)
- Account Agent: Pulls customer account information
- Billing Agent: Pulls billing & usage data for accounts
- Product Agent: Pulls plans, devices, and promos
- Tech Support Agent: Q&A for technical support & outages
- Supervisor: Routes between agents

GenAI App Lifecycle (Multi-agent, Human Agent)
- Account Agent: Pulls customer