OpenAI: 2025 ChatGPT Agent Technical Report (English version, 42 pages)
ChatGPT Agent System Card
OpenAI
July 17, 2025

Contents

1 Introduction
2 Standard Model Safety Evaluations
  2.1 Disallowed Content
  2.2 Jailbreaks
  2.3 Hallucinations
  2.4 Image Input
  2.5 Multilingual Performance
  2.6 Fairness and Bias
    2.6.1 BBQ Evaluation
    2.6.2 First-person fairness evaluation
  2.7 Jailbreaks through User Messages
3 Product-Specific Risk Mitigations
  3.1 Prompt injections
    3.1.1 Risk Description
    3.1.2 Mitigations and Evaluations
      3.1.2.1 Safety training
      3.1.2.2 Automated monitors and filters
      3.1.2.3 User confirmations
      3.1.2.4 “Watch mode” for ChatGPT agent using the visual browser tool in sensitive contexts
      3.1.2.5 Terminal network restrictions
      3.1.2.6 ChatGPT’s memory is disabled
  3.2 Agent makes a mistake
    3.2.1 Risk Description
    3.2.2 Mitigations
      3.2.2.1 User confirmations
      3.2.2.2 “Watch mode” for ChatGPT agent using the visual browser tool in sensitive contexts
    3.2.3 Evaluations
  3.3 User asks agent to do a harmful or disallowed task
    3.3.1 Risk Description
    3.3.2 Mitigations
      3.3.2.1 Safety training
      3.3.2.2 Watch mode
      3.3.2.3 Usage Policy Enforcement
4 Red Teaming
5 Preparedness Framework
  5.1 Capabilities Assessment
    5.1.1 Biological and Chemical
      5.1.1.1 Long-form Biological Risk Questions
      5.1.1.2 Multimodal Troubleshooting Virology
      5.1.1.3 ProtocolQA Open-Ended
      5.1.1.4 Tacit Knowledge and Troubleshooting
      5.1.1.5 Structured expert probing campaign novel design
      5.1.1.6 SecureBio External Assessment
        5.1.1.6.1 Static Evaluations
        5.1.1.6.2 Agent Evaluations
        5.1.1.6.3 Manual Red Teaming
      5.1.1.7 Expert Deep Dives
    5.1.2 Cybersecurity
      5.1.2.1 Capture the Flag (CTF) Challenges
      5.1.2.2 Cyber range
    5.1.3 AI Self-Improvement
      5.1.3.1 OpenAI Research Engineer Interviews (Multiple Choice & Coding questions)
      5.1.3.2 SWE-bench Verified
      5.1.3.3 OpenAI PRs