10-Keeping it small - Agentic workflows with SLMs on k8s-Frank Fan.pdf


2025-03-31

PDF, 24 pages, 1.57 MB


Keeping it Small: Agentic Workflows with SLMs on K8S
Frank Fan, Senior Container Solution Architect, AWS

Agenda
  01  Address challenges of agentic workload
  02  Multi-agent workflows
  03  Implementation on k8s
  04  Key takeaways

Part 01: Address challenges of agentic workload

What is a Gen AI Agent
  Access to enterprise data
  Ability to use tools
  Intelligent, autonomous systems
  Plan, reason, and act

...it leads to challenges
  Coding gets complicated
    Complex prompts to limit hallucinations
    Fragile, hard to maintain
  Agent gets confused
    Calling wrong tools
    Passing wrong arguments
    Inconsistent responses
  Agent gets slower and more expensive
    Frontier models needed
    Prompt sizes grow
    Agents retry steps
  [Diagram labels: Cost, Accuracy, Complexity]

Hints to improve cost, performance and accuracy
  Shorter prompts
  Smaller LLMs
  Control over workflow
  Concise prompts/context
  Cost effective chipsets
  Fast chipsets
  Task Decomposition Technique
  AI Chips Choice

Part 02: Multi-agent workflows

Agent & workflows
  Static Workflow
  Dynamic Workflow
  Meta-Agent
  "I am an 'agent' specializing in a task or just to coordinate (again specialized!)"
  [Diagram labels: User, User]

Why Small Language Model?
  Customize
  Resource
  Speed

Frugal Architecture Design Patterns
  Task Decomposition
  Cascaded LLM
  Teacher-Student LLM
  RLAIF with Large LLMs
  [Diagram labels: One Large LLM, Total Task; Small LLM, Task-A; Small LLM, Task-B; Small LLM, Task-C; when needed; when needed; Small LLM; Users; Large LLM; feedback; Fine-tuned/Specialized; Large LLM; Small LLM]

Pre-post processing for Task Simplification
  [Diagram labels: Context, JSON, Natural Language, Small LLM, Classical ML, Small LLM, Supervised Learning, Reinforcement Learning]

Using Small and Large LLMs Frugally

Workflow: Routing

Example: Routing

Part 03: Implementation on K8S

Why self-host Language Model on K8S
  Data privacy and security
  Connecting to data sources
  Accessing multiple models and newer versions
  Customizing
  [Diagram labels: NVIDIA GPUs, Inferentia, Trainium, Neuron Runtime]

Running Agentic workloa
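The slides name "Cascaded LLM" and "Routing" as patterns but the preview does not show how they are implemented. The following is a minimal sketch of one way the two ideas could combine, not the speaker's implementation: a cheap classifier routes each prompt to a small specialised model, and a large model is called only when no specialist exists or the specialist reports low confidence. Every name here (Router, keyword_classifier, billing_slm, support_slm, large_llm, the 0.7 threshold) is hypothetical, and the "models" are stub functions standing in for real inference endpoints.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# Hypothetical model interface: prompt in, (answer, confidence in [0, 1]) out.
Model = Callable[[str], Tuple[str, float]]


@dataclass
class Router:
    specialists: Dict[str, Model]    # task label -> small specialised model
    fallback: Model                  # large model, used only when needed
    classify: Callable[[str], str]   # cheap step that picks a task label
    threshold: float = 0.7           # assumed confidence cut-off

    def run(self, prompt: str) -> Tuple[str, str]:
        """Return (answer, which model tier produced it)."""
        label = self.classify(prompt)
        small = self.specialists.get(label)
        if small is not None:
            answer, confidence = small(prompt)
            if confidence >= self.threshold:
                return answer, f"small:{label}"
        # No specialist, or the specialist was unsure: escalate.
        answer, _ = self.fallback(prompt)
        return answer, "large"


# Stub components standing in for real models (assumptions for illustration).
def keyword_classifier(prompt: str) -> str:
    text = prompt.lower()
    if "refund" in text:
        return "billing"
    if "error" in text:
        return "support"
    return "unknown"


def billing_slm(prompt: str) -> Tuple[str, float]:
    return "billing answer", 0.9


def support_slm(prompt: str) -> Tuple[str, float]:
    return "support answer", 0.4   # deliberately below the threshold


def large_llm(prompt: str) -> Tuple[str, float]:
    return "large answer", 1.0


router = Router(
    specialists={"billing": billing_slm, "support": support_slm},
    fallback=large_llm,
    classify=keyword_classifier,
)

if __name__ == "__main__":
    for p in ["I need a refund", "I see an error", "hello"]:
        print(p, "->", router.run(p))
```

In this sketch the billing prompt is answered by the small model, while the other two prompts escalate to the large one. In a real deployment the confidence signal would have to come from somewhere concrete (for example a verifier model or log-probabilities), which the slides do not specify.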
