In-Context Learning in Multimodal Large Language Models
Xu Yang | Southeast University

Xu Yang is an associate professor and doctoral supervisor at the School of Computer Science and Engineering, Southeast University. He received his Ph.D. in Computer Science and Engineering from Nanyang Technological University in June 2021, advised by Prof. Jianfei Cai and Prof. Hanwang Zhang. He is currently an associate professor at the School of Computer Science and Engineering, the School of Software, and the School of Artificial Intelligence at Southeast University, deputy director of the Key Laboratory of New Generation Artificial Intelligence Technology and Its Applications (Ministry of Education), and a Jiangsu Province "Shuangchuang" (Innovation and Entrepreneurship) Doctor. His main research interests are multimodal vision-language tasks and in-context learning with multimodal large language models. Over the past three years, he has published, as first author, multiple papers at top AI conferences and journals, including TPAMI, CVPR, ICCV, and NeurIPS.

CONTENTS
I. Background
II. Diverse Configuration Strategies
III. Shift Vector-based ICL Approximation
IV. Multi-Modal Reasoning Enhancement

PART 01: Background

The Development of GPT
GPT (2018): 117M parameters. Adapted to tasks via pre-training on data followed by fine-tuning.
GPT-2 (2019): 1.5B parameters. Prompt engineering: tasks are specified through textual prompts.
GPT-3 (2020): 175B parameters. In-context learning: tasks are specified through in-context examples.
GPT-4 (2023): ~1.76T parameters. Multimodal: prompts and in-context examples can include image, text, and video.
GPT-2's Capability of Prompt Engineering
GPT-2 exhibits a distinctive feature known as "prompt engineering". This can be compared to the architecture of modern computers, where both data and commands exist in the same form of 0/1 encodings; in a language model, the task instruction (the prompt) and the input data are likewise expressed in the same form, as text.

GPT-3's Capability of Analogy: In-Context Learning
GPT-3 possesses a unique capability known as "in-context learning" (ICL): it learns the representation of a task from the provided in-context examples.

In-Context Learning vs. Prompt Engineering
Both aim to yield precise responses and to unlock the potential of LLMs. In-context learning is a specialized form of prompt engineering: it adapts the model to a task using only a few examples (few-shot).
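The few-shot adaptation described above can be sketched as a simple prompt-construction routine. This is a minimal illustration, not a format from the talk: the "Review/Sentiment" template, the helper name, and the second demonstration sentence are illustrative choices.

```python
# Minimal sketch of few-shot in-context learning: the task is specified
# purely through demonstration pairs prepended to the query, with no
# weight updates. The template and names here are illustrative.

def build_icl_prompt(examples, query):
    """Concatenate (input, label) demonstrations, then append the query."""
    blocks = [f"Review: {text}\nSentiment: {label}" for text, label in examples]
    blocks.append(f"Review: {query}\nSentiment:")
    return "\n\n".join(blocks)

demos = [
    ("Best movie ever.", "Positive"),
    ("A total waste of time.", "Negative"),
]
prompt = build_icl_prompt(demos, "I like it.")
print(prompt)
```

Feeding `prompt` to a sufficiently capable LLM would typically complete the final line with a label such as "Positive"; the model call itself is omitted here.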
Why In-Context Learning?
ICL provides an outside-in methodology for unravelling the inner properties of LLMs.

Pros of ICL:

Q: "How many meters does a 1-kilogram object fall in 1 second?" A: 4.9 m. ("Objects fall with a constant acceleration due to gravity, regardless of their mass.") Q: "What about a 10-kilogram object?" A: 4.9 m. The answer is mass-independent because the fall distance is d = gt²/2 = 0.5 × 9.8 m/s² × (1 s)² = 4.9 m.

Providing incorrect examples does not necessarily affect the LLM's ability to make correct judgments:
Correct label: "Best movie ever." Sentiment: Positive. "I like it." Sentiment: ? → Positive
Flipped label: "Best movie ever." Sentiment: Negative. "I like it." Sentiment: ? → Positive

Flexible controllability. Encapsulate more information.
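The flipped-label observation above can be turned into a small probe: render the same query once with the original demonstration labels and once with every label inverted, then compare the model's answers to the two prompts. The helper names below are hypothetical, and the model call is omitted; only prompt construction is shown.

```python
# Sketch of a flipped-label probe: comparing a model's answers to the
# two prompts shows how strongly its ICL prediction depends on the
# correctness of the demonstration labels. Names are illustrative.

FLIP = {"Positive": "Negative", "Negative": "Positive"}

def make_prompt(demos, query):
    body = "\n".join(f'"{text}" Sentiment: {label}.' for text, label in demos)
    return f'{body}\n"{query}" Sentiment:'

demos = [("Best movie ever.", "Positive")]
correct = make_prompt(demos, "I like it.")
flipped = make_prompt([(t, FLIP[l]) for t, l in demos], "I like it.")
print(correct)
print(flipped)
```

If a model answers "Positive" to both prompts, as in the example above, its prediction is driven more by the demonstration format and its prior knowledge than by the specific labels.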