Large Model Agents: Capability Alignment and Beyond
Tao Gui
Fudan University
2024/10/15
Fudan NLP Lab

What is An Agent?
"If they find a parrot who could answer to everything, I would claim it to be an intelligent being without hesitation." Denis Diderot, 1875
Agent in Philosophy
Agency-individuality-asymmetry-normativity
Generally Speaking: Entities with the capacity to act.
Narrowly Speaking: Entities possessing desires, beliefs, intentions, and the ability to take actions.

OpenAI's Mission & Goal
https:/
thus building a living metric which measures how well an agent can achieve its user's intended goal in a wide range of environments.

What is an AI Agent?
Agents: Artificial entities that are capable of perceiving their surroundings using sensors, making decisions, and then taking actions in response using actuators.
Perceiving surroundings
Making decisions
Taking actions
[1] Russell, S. J. Artificial intelligence: a modern approach. Pearson Education, Inc., 2010.
[2] Wooldridge, M. J., N. R. Jennings. Intelligent agents: theory and practice. Knowl. Eng. Rev., 10(2):115-152, 1995.
Whom does it serve?

What is Alignment?
Training language models to follow instructions with human feedback
Helpful
  Follow instructions
  Ask relevant follow-up questions and obtain necessary details
  Re-direct ill-informed requests
Honest
  Know who it is, and what it can/cannot do/know
Harmless
  Refuse inappropriate requests

Two Steps of RLHF Alignment
Training language models to follow instructions with human feedback
Alignment Training
Preferences Modeling

Challenges in Alignment Training
Language Environment; Reward Design; Optimization Algorithm

PPO-max for Stable Training
1. Evaluation Metrics for Monitoring the Training Process
2. Implementation Details in PPO
3. PPO-max Setup
Technical report: Secrets of RLHF in Large Language Models Par