4147 - One Agent, One Job, Better AI.pdf

Uploaded by: 竿***

ID: 982609

2025-11-29

PDF, 24 pages, 2.17 MB


Orlando, FL | October 6-9
IBM TechXchange 2025
Session code (4147)
One agent, one job, better AI
David Jones-Gilardi
Langflow, Developer Relations Engineer
IBM TechXchange | 2025 IBM Corporation

One agent. One job. Better AI.
I'm David, GenAI nerd.
I love learning, coding, and helping others do the same.

Agenda
01 Problem space
02 What are evals?
03 What can I use them for?
04 Demos + Code
05 Wrap up + Resources

Problem Space
"Kitchen sink" & overly complex prompts
Break tasks down into digestible chunks
Each agent is specialized to a task (Multi-Agent)

Single Agent: one agent, ALL instructions
Multi-Agent: multiple agents, each specialized

What are evals?
Evals are structured tests that measure the quality, reliability, and effectiveness of agentic applications (by assessing how well they perform their intended tasks).

What can I use them for?
Evals are good for:
Tracking regressions & errors
Testing agent effectiveness
"Judging" agent responses (LLM-as-Judge, Human-in-the-Loop)
Model-to-model comparisons
Generating consistent and measurable results

LangSmith, Arize, LLM-as-a-Judge

Demos + Code
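The slides describe evals as structured tests used to track regressions, judge agent responses, and compare models, but the preview does not include the demo code. The following is a minimal sketch of that idea, not the presenter's implementation. Every name in it (EvalCase, keyword_judge, run_eval, has_regressed, summarizer_v1, summarizer_v2) is hypothetical, the two agents are stubs that return canned text instead of calling a model, and the keyword check only stands in for an LLM-as-Judge call so that results stay consistent and measurable.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class EvalCase:
    """One structured test: a prompt plus the terms a good answer must contain."""
    prompt: str
    required_terms: List[str]


def keyword_judge(response: str, case: EvalCase) -> float:
    """Score a response in [0, 1] as the fraction of required terms present.

    Assumption: a deterministic stand-in for an LLM-as-Judge call.
    """
    if not case.required_terms:
        return 1.0
    text = response.lower()
    hits = sum(1 for term in case.required_terms if term.lower() in text)
    return hits / len(case.required_terms)


def run_eval(agent: Callable[[str], str], cases: List[EvalCase]) -> float:
    """Return the mean judge score of one agent over all cases."""
    scores = [keyword_judge(agent(case.prompt), case) for case in cases]
    return sum(scores) / len(scores)


def has_regressed(baseline: float, candidate: float, tolerance: float = 0.05) -> bool:
    """Flag a regression when the candidate falls below baseline minus tolerance."""
    return candidate < baseline - tolerance


# Hypothetical single-task agents: stubs in place of real model calls.
def summarizer_v1(prompt: str) -> str:
    return "Summary: evals measure quality and reliability of agents."


def summarizer_v2(prompt: str) -> str:
    return "Summary: evals measure quality."


CASES = [
    EvalCase("Summarize what evals measure.", ["quality", "reliability"]),
    EvalCase("Summarize what evals apply to.", ["agents"]),
]

if __name__ == "__main__":
    baseline = run_eval(summarizer_v1, CASES)
    candidate = run_eval(summarizer_v2, CASES)
    print(f"v1={baseline:.2f} v2={candidate:.2f} "
          f"regressed={has_regressed(baseline, candidate)}")
```

Because each agent here has a single job, the eval cases stay small and specific. The same run_eval call works for comparing two versions of one agent or two different models behind the same agent.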
