Sonar: 2025 Research Report on the Coding Personalities of Leading Large Language Models (LLMs) (English edition, 21 pages)
A State of Code Report
The Coding Personalities of Leading LLMs
Expanded edition, including GPT-5!
October 2025

Table of Contents
Introduction
Our approach
Foundation of shared strengths and flaws
  Shared strengths
  Shared flaws
Coding personalities
  Personality traits
  Coding archetypes
    The baseline performer
    The senior architect
    The balanced predecessor
    The efficient generalist
    The unfulfilled promise
    The rapid prototyper
Why “more capable” can be riskier
C…

Introduction: Beyond the performance benchmark

AI has embedded itself in the software development lifecycle (SDLC) at extraordinary speed. Tools such as Claude Code, Cursor, and GitHub Copilot are becoming standard, even essential, parts of the software developer's toolkit. Underlying all of these tools are Large Language Models (LLMs). Some are general-purpose models from companies like OpenAI, Anthropic, Meta, and Google, and some are built specifically for coding use cases. Understanding the true capabilities of these models is critically important as the industry develops. However, the typical methods for evaluating these capabilities do not give a complete, high-resolution picture.

The primary evaluation approach assesses LLM performance against benchmarks that test their ability to solve difficult coding challenges. We consider this an important but narrow test. This relentless focus on performance benchmarks leads to what experts describe as “super spiky capability distributions.” As we will show in this report, the result is LLMs that can solve difficult coding challenges but do not necessarily write good code: code that is reliable, secure, and maintainable. It is