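AppAgentX: An Evolvable GUI Agent Framework
Zhang Chi | AGI Lab, Westlake University

Dr. Zhang Chi, Assistant Professor, has been named to Stanford University's World's Top 2% Scientists list for two consecutive years. He received his PhD from the School of Computer Science and Engineering at Nanyang Technological University, Singapore, in 2021, and then joined Tencent as a research scientist. In fall 2024 he joined Westlake University as a doctoral supervisor and founded the AGI Lab. His doctoral research focused on efficient, low-cost machine learning and computer vision algorithms. His current research covers multimodal models and AIGC models in the era of large models. He has published multiple papers at top international conferences and in top journals, including CVPR, ICCV, NeurIPS, and TPAMI.

CONTENTS
I. Background: Large Language Models and Agents
II. GUI-Based Agents: AppAgent
III. Evolvable GUI Agents: AppAgentX

Large Language Models as Agents
Prompt: Imagine that you are a home robot, and are asked to go to the kitchen to fetch something from the fridge. The image below shows your current position. Please plan your next action.
GPT-4V: My next action would be to turn right and move forward towards the hallway,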
as I can see that the kitchen is likely located in that direction. I would then continue to navigate through the hallway until I reach the kitchen and locate the fridge.

Large Language Models as Agents
Prompt: Now you are in the kitchen. This is what you can see from there. Can you see the fridge from here? What is your next action?
GPT-4V: From this position, I can see the fridge on the right side of the kitchen. My next action would be to move forward and turn slightly to the right to approach the fridge.

Large Language Models as Agents
Prompt: Imagine that you are a robot operating a computer. You are asked to find a detailed recipe for Mapo Tofu. Below is what you see on the computer screen. Predict your next move.
GPT-4V: Next move: I would move the mouse to the Google Chrome icon located on the left side of the screen, approximately 1/3 of the way down from the top. I would then click on the icon to open the web browser.

A Simple Agent Prototype
Decision | Perception | Execution

Challenges
Text-only input limits current agent designs
Control is a major discipline in its own right
LLMs lack sufficient knowledge, and the learning cost is high

Significance of the Interface
Agents act on the user interface (UI)
Advantages
Actions have clear semantic meaning, providing a user-friendly interaction method
Large language …
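The perception, decision and execution loop of the simple agent prototype above, acting on a UI whose actions carry clear semantic meaning, can be sketched as follows. This is a minimal sketch under stated assumptions, not the AppAgent or AppAgentX implementation. All names (`UIElement`, `Action`, `ToyScreen`, `perceive`, `decide`, `execute`, `run_agent`) are hypothetical. The "screen" is a toy list of labelled elements rather than a real device, and `decide` is a keyword rule standing in for the LLM call.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class UIElement:
    element_id: int
    label: str  # visible text or accessibility label of the element


@dataclass
class Action:
    # Hypothetical semantic action space: "tap", "type" or "finish".
    kind: str
    target: Optional[int] = None
    text: str = ""


@dataclass
class ToyScreen:
    """Hypothetical stand-in for a real device: labelled elements plus an action log."""
    elements: List[UIElement]
    log: List[str] = field(default_factory=list)


def perceive(screen: ToyScreen) -> List[UIElement]:
    # Perception: a real agent would parse a screenshot or accessibility tree here.
    return list(screen.elements)


def decide(goal: str, observation: List[UIElement], history: List[Action]) -> Action:
    # Decision: placeholder for an LLM call. Taps the first element whose label
    # appears in the goal and has not been tapped yet; otherwise finishes.
    tapped = {a.target for a in history if a.kind == "tap"}
    for el in observation:
        if el.label.lower() in goal.lower() and el.element_id not in tapped:
            return Action("tap", target=el.element_id)
    return Action("finish")


def execute(screen: ToyScreen, action: Action) -> None:
    # Execution: map the semantic action onto the (toy) interface.
    if action.kind == "tap":
        label = next(e.label for e in screen.elements if e.element_id == action.target)
        screen.log.append(f"tap {label}")
    elif action.kind == "type":
        screen.log.append(f"type {action.text!r}")


def run_agent(goal: str, screen: ToyScreen, max_steps: int = 10) -> List[Action]:
    # Loop perception -> decision -> execution until "finish" or the step budget runs out.
    history: List[Action] = []
    for _ in range(max_steps):
        observation = perceive(screen)
        action = decide(goal, observation, history)
        if action.kind == "finish":
            break
        execute(screen, action)
        history.append(action)
    return history


if __name__ == "__main__":
    screen = ToyScreen([UIElement(0, "Chrome"), UIElement(1, "Settings")])
    run_agent("open Chrome and search for a Mapo Tofu recipe", screen)
    print(screen.log)
```

With the example goal, the loop taps "Chrome" once and then stops, because no other label matches the goal. Keeping actions as named, semantic operations rather than raw pixel coordinates is what lets an LLM-backed `decide` reason about them directly.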