From Code Completion to Autonomous Software Engineering Agents.pdf


2025-06-22

PDF, 27 pages, 4.17 MB


From Code Completion to Autonomous Software Engineering Agents

Outline: 1. Benchmarks  2. Agent Basics  3. SWE-agent  4. Training LMs & what's next

Sections: Benchmarks | Basics | SWE-agent | Training

Slide 2. Benchmarks: HumanEval-style
Perfect benchmark for evaluating code generation & autocomplete!
But: saturated, little context, and most developer time is not spent writing code!

Slide 3. Benchmarks: Software engineering
Bug -> Reproduce -> Find/read code -> Edit -> Test -> Done

Slide 4. Benchmarks: Software engineering
[Figure: the same workflow (Bug, Reproduce, Find/read code, Edit, Test, Done), with "other Bugs" shown three times]

Slide 5. Benchmarks: SWE-bench
Solve real-world GitHub issues.
Very challenging: extreme context, reasoning-heavy, enormous action space.

Slide 6. Benchmarks: SWE-bench multimodal
Far from being saturated: 25% SotA.

Slide 7. Benchmarks: SWE-bench
Taking results for best comparability, not best performance.
Generally same scaffolding between '24 and '25, but different tools and prompts.

Slide 8. Basics
Put the LM into the terminal.
bash commands are hard.
Humans use tools like VSCode or vim.
Let's build & optimize an agent-computer interface (ACI)! (Yang et al. 2023)

Slide 9. Basics: Tools
[Figure: values 10% and 18%, labeled Apr 24]
Many things have changed since Apr 24:
Most editing tools are now search & replace based.
Open source models might still benefit from linting, but it is less needed for SoTA models.
BUT: edit is still the most important part of the ACI to optimize.
LMs got much better at using many tools.

Slide 10. Basics: Prompts
Initial prompts:
Explain mission, strategy & give tips.
Can be very short with Claude 3.5+.
Do not need to explain tools if using function c…
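
The Tools slide says that most editing tools are now search & replace based, and that linting may still help open source models. As an illustration only, here is a minimal sketch of what such an edit tool could look like. The function name search_replace_edit, its parameters, and the choice of Python's ast.parse as the lint step are all assumptions made for this example; the slides do not show SWE-agent's actual implementation.

```python
import ast


def search_replace_edit(source: str, search: str, replace: str, lint: bool = True) -> str:
    """Hypothetical edit tool: replace exactly one occurrence of `search` in `source`.

    Raises ValueError (instead of editing) when the search text is empty,
    missing, ambiguous, or when the edited text fails a syntax check.
    """
    if not search:
        raise ValueError("search text must not be empty")

    count = source.count(search)
    if count == 0:
        raise ValueError("search text not found; no edit applied")
    if count > 1:
        raise ValueError(f"search text matches {count} places; add more context")

    edited = source.replace(search, replace, 1)

    if lint:
        # Assumed lint step: only checks that the result still parses as Python.
        try:
            ast.parse(edited)
        except SyntaxError as exc:
            raise ValueError(f"edit rejected by lint: {exc.msg}") from exc

    return edited


# Usage: fix a one-line bug in a small snippet.
original = "def add(a, b):\n    return a - b\n"
fixed = search_replace_edit(original, "return a - b", "return a + b")
print(fixed)
```

Requiring a unique match and returning an error message rather than silently editing is one plausible design choice: the error text gives the LM something concrete to react to on its next turn.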
