IBM TechXchange 2025
Orlando, FL | October 6-9
Session code (4147)

One agent, one job, better AI
David Jones-Gilardi
Langflow, Developer Relations Engineer

One agent
One job
Better AI

I'm David, GenAI nerd
I love learning, coding, and helping others do the same

IBM TechXchange | 2025 IBM Corporation

Agenda
01 Problem space
02 What are evals?
03 What can I use them for?
04 Demos + Code
05 Wrap up + Resources

Problem Space

"Kitchen sink" & overly complex prompts

Multi-Agent
- Break tasks down into digestible chunks
- Each agent is specialized to a task

Single Agent: one agent, ALL instructions
Multi-Agent: multiple agents, each specialized

What are evals?

Evals are structured tests that measure the quality, reliability, and effectiveness of agentic applications (by assessing how well they perform their intended tasks).

What can I use them for?

Evals are good for
- Tracking regressions & errors
- Testing agent effectiveness
- "Judging" agent responses (LLM-as-Judge, Human-in-the-Loop)
- Model-to-model comparisons
- Generating consistent and measurable results

LangSmith
Arize
LLM-as-a-Judge

Demos + Code
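
The slides define evals as structured tests and list uses such as tracking regressions and agent-to-agent comparisons. The following is a minimal sketch of that idea, not the code shown in the session. Every name in it (`EvalCase`, `keyword_judge`, `run_evals`, `pass_rate`, and the two toy agents) is hypothetical, and the keyword-based judge is only a stand-in for an LLM-as-Judge or a human reviewer. It does not use the LangSmith or Arize APIs.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    """One structured test: an input and the keywords a good answer must contain."""
    name: str
    prompt: str
    must_contain: list[str]


def _sentences(text: str) -> list[str]:
    """Split on periods and drop empty fragments."""
    return [s.strip() for s in text.split(".") if s.strip()]


# Hypothetical single-purpose "agents"; in practice these would call a model.
def first_sentence_agent(text: str) -> str:
    return _sentences(text)[0] + "."


def last_sentence_agent(text: str) -> str:
    return _sentences(text)[-1] + "."


def keyword_judge(response: str, case: EvalCase) -> float:
    """Stand-in judge: fraction of required keywords found (case-insensitive)."""
    lowered = response.lower()
    hits = sum(1 for kw in case.must_contain if kw.lower() in lowered)
    return hits / len(case.must_contain)


def run_evals(
    agent: Callable[[str], str],
    cases: list[EvalCase],
    judge: Callable[[str, EvalCase], float],
    threshold: float = 1.0,
) -> dict[str, dict]:
    """Run every case through the agent and record the score and pass/fail."""
    results = {}
    for case in cases:
        score = judge(agent(case.prompt), case)
        results[case.name] = {"score": score, "passed": score >= threshold}
    return results


def pass_rate(results: dict[str, dict]) -> float:
    """Share of cases that met the threshold; a drop between runs flags a regression."""
    return sum(r["passed"] for r in results.values()) / len(results)


CASES = [
    EvalCase("keeps_topic", "Evals are structured tests. They run often.", ["evals", "tests"]),
    EvalCase("keeps_scope", "Agents do one job. Prompts stay small.", ["agents", "one job"]),
]

if __name__ == "__main__":
    # Same cases, two agents: a small agent-to-agent comparison.
    for agent in (first_sentence_agent, last_sentence_agent):
        rate = pass_rate(run_evals(agent, CASES, keyword_judge))
        print(f"{agent.__name__}: pass rate {rate:.0%}")
```

Because the cases and the judge are fixed, rerunning the same suite after a prompt or model change gives consistent, comparable numbers. Swapping `keyword_judge` for a function that asks a model to grade the response would turn this into an LLM-as-Judge setup under the same interface.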