
Capability Alignment for Large Model Agents (大模型智能体能力对齐.pdf)

Uploaded by: 哆哆 | ID: 186292 | 2024-11-01 | PDF | 20 pages | 7.07 MB

Part of the collection: Guest speaker slides from the 21st Young Scholars Symposium on Natural Language Processing, 2024 (YSSNLP 2024)


Slide 1
Aligning Agents to Follow Human Values
Tao Gui
Fudan University
June 2024
YSSNLP2024

Slide 2 (Fudan NLP Lab)
Where Is the Attention?
Zhao, Wayne Xin, et al. A survey of large language models. arXiv preprint arXiv:2303.18223 (2023).

Slide 3
Where Is the Attention?
Achiam, Josh, et al. GPT-4 technical report. arXiv preprint arXiv:2303.08774 (2023).
Ganguli, Deep, et al. Predictability and surprise in large generative models. Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 2022.

Slide 4
Exciting and Perilous Journey toward AGI
Self-preservation
Specific versus General Principles for Constitutional AI (Anthropic)
The exciting, perilous journey toward AGI (Ilya Sutskever)
Self-identity

Slide 5
Responsible Scaling Policy (RSP), Anthropic
No meaningful catastrophic risk
Early signs of dangerous capabilities
Substantially increase the risk of catastrophic misuse
Not yet defined as it is too far from present systems

Slide 6
What is Alignment?
Training language models to follow instructions with human feedback
Helpful: Follow instructions. Ask relevant follow-up questions and obtain necessary details. Re-direct ill-informed requests.
Honest: Know who it is, and what it can/cannot do/know.
Harmless: Refuse inappropriate requests.


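This document surveys research progress on getting large language models to follow human values. It touches on several key techniques and concepts, including a survey of large language models, the attention mechanism, predictability and surprise, and the principles of Constitutional AI. It also discusses how reinforcement learning can be used to train language models to follow instructions, and presents related research data and analysis. In addition, it covers bias in tool learning for large language models, and how process supervision and compiler feedback can improve mathematical reasoning and code generation. Finally, it introduces AgentGym, a platform for online interactive training and evaluation of large language model agents, and shows cases of training and evaluating with the platform in different environments.

"How can large language models be aligned with human values?" "What progress have large language models made in self-preservation and self-identity?" "How can reverse curriculum reinforcement learning be used to train large language models to reason?"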
