A Technical Analysis of ChatGPT
LIU Qun (刘群), Huawei Noah's Ark Lab
An online lecture, 2023-02-16

Outline
- ChatGPT overview
- ChatGPT's impressive performance
- The key technologies behind ChatGPT
- Shortcomings of ChatGPT
- Future directions for ChatGPT

ChatGPT overview

The ChatGPT sensation
- Users: 1 million within 5 days, 100 million within 2 months
- Everyone started talking about ChatGPT; it spread about as fast as COVID-19
- Google sounded an internal "code red" alarm
- Google hurriedly released Bard, but an error in the launch demo wiped roughly 8% off the share price
- Microsoft invested a further 10 billion US dollars in OpenAI
- Microsoft quickly launched the ChatGPT-powered New Bing and plans to integrate ChatGPT into the Office suite
- Large companies at home and abroad are rushing to follow

ChatGPT official blog: introduction

"ChatGPT: Optimizing Language Models for Dialogue" (November 30, 2022):
"We've trained a model called ChatGPT which interacts in a conversational way. The dialogue format makes it possible for ChatGPT to answer follow-up questions, admit its mistakes, challenge incorrect premises, and reject inappropriate requests. ChatGPT is a sibling model to InstructGPT, which is trained to follow an instruction in a prompt and provide a detailed response.
We are excited to introduce ChatGPT to get users' feedback and learn about its strengths and weaknesses. During the research preview, usage of ChatGPT is free."

In the following sample from the blog, ChatGPT asks clarifying questions to debug code.

USER: this code is not working like i expect how do i fix it?

    resultWorkerErr := make(chan error)
    defer close(resultWorkerErr)
    go func() {
        defer cancel()
        resultWorkerErr <- b.resultWorker(ctx)
    }()

    err := b.worker(ctx)
    cancel()
    if err == nil {
        return <-resultWorkerErr
    }
    return multierror.Append(err, <-resultWorkerErr)

(ChatGPT's reply to this sample appears later, after the discussion of the training method.)

Definition of a language model

A language model assigns a probability to a sentence s = w_1 w_2 ... w_n by factoring it into next-word predictions:

    P_LM(s) = prod_{i=1}^{n} P(w_i | w_1, ..., w_{i-1}),
    where every P(w_i | w_1, ..., w_{i-1}) >= 0 and the conditional probabilities sum to 1 over the vocabulary.

"Language Modeling is the task of predicting what word comes next: 'the students opened their ___' (exams, minds, laptops, books). More formally: given a sequence of words w_1, ..., w_t, compute the probability distribution of the next word w_{t+1}, where w_{t+1} can be any word in the vocabulary. A system that does this is called a Language Model."
(Christopher Manning, Natural Language Processing with Deep Learning, Stanford CS224n)
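To make the factorization above concrete, here is a minimal sketch (my own illustration, not from the lecture) of a toy bigram language model that scores a sentence and ranks candidate next words. The tiny corpus and the add-alpha smoothing are purely illustrative assumptions.

    # Toy bigram language model: estimates P(word | previous word) from counts and
    # multiplies these conditionals to score a whole sentence, i.e. P_LM(s) = prod_i P(w_i | history).
    from collections import Counter

    corpus = "the students opened their books . the students opened their minds .".split()
    vocab = sorted(set(corpus))

    unigram = Counter(corpus)
    bigram = Counter(zip(corpus, corpus[1:]))

    def p_next(prev, word, alpha=1.0):
        # Add-alpha smoothed estimate of P(word | prev).
        return (bigram[(prev, word)] + alpha) / (unigram[prev] + alpha * len(vocab))

    def sentence_prob(words):
        # Chain-rule product over next-word probabilities (first word uses its unigram frequency).
        prob = unigram[words[0]] / len(corpus)
        for prev, cur in zip(words, words[1:]):
            prob *= p_next(prev, cur)
        return prob

    print(sentence_prob("the students opened their books".split()))
    print(sorted(vocab, key=lambda w: -p_next("their", w))[:3])  # likeliest continuations of "their"

A neural language model plays exactly the same role, but replaces the count-based estimate with a learned network that outputs a distribution over the vocabulary.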
Development of language models
- n-gram language models
- Neural network language models
- Recurrent neural network language models
- Transformer language models
- Pre-trained language models (PLMs): BERT (a bidirectional masked language model), GPT (a decoder-only language model)
- Large generative pre-trained language models (LLMs): GPT-3, ChatGPT

Pre-trained Language Models (PLMs)
- Representative models: ELMo, BERT, GPT
- The pre-training-then-fine-tuning paradigm
- Language representations learned during pre-training are transferred to downstream tasks

The Transformer model
(figure from Lilian Weng, Generalized Language Models: ULMFiT & OpenAI GPT, blog)

The self-attention mechanism (Vaswani et al., 2017)
- Each token's representation is obtained as a dynamically weighted combination of all tokens in the input
- The dynamic weights change as the input changes
(visualization: BertViz tool, Vig et al., 2019)
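The "dynamic weighting over all tokens" can be made concrete with a short sketch of single-head scaled dot-product self-attention. This is my own simplified illustration (random matrices stand in for learned projection parameters), not code from the lecture.

    # Single-head scaled dot-product self-attention (Vaswani et al., 2017), NumPy sketch.
    import numpy as np

    def softmax(x, axis=-1):
        x = x - x.max(axis=axis, keepdims=True)
        e = np.exp(x)
        return e / e.sum(axis=axis, keepdims=True)

    def self_attention(X, Wq, Wk, Wv):
        # X: (seq_len, d_model). Each output row is a weighted sum of all value vectors,
        # with weights recomputed from the current input, i.e. the "dynamic weights".
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        scores = Q @ K.T / np.sqrt(K.shape[-1])   # (seq_len, seq_len) attention logits
        weights = softmax(scores, axis=-1)        # each row sums to 1 over all tokens
        return weights @ V, weights

    rng = np.random.default_rng(0)
    d_model, d_head, seq_len = 8, 4, 5
    X = rng.normal(size=(seq_len, d_model))
    Wq, Wk, Wv = (rng.normal(size=(d_model, d_head)) for _ in range(3))
    out, attn = self_attention(X, Wq, Wk, Wv)
    print(out.shape, attn.shape)   # (5, 4) (5, 5)

Changing any row of X changes the whole attention matrix, which is the property visualized by tools such as BertViz.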
The key technologies behind ChatGPT
- Pre-trained language models (PLMs)
- Large generative pre-trained language models (LLMs)
- Reinforcement learning from human feedback (RLHF)

Large generative pre-trained language models (LLMs)

Aspect | Pre-trained Language Models (PLMs) | Large Language Models (LLMs)
Typical models | ELMo, BERT, GPT-2 | GPT-3
Model architecture | BiLSTM, Transformer | Transformer
Attention | bidirectional or unidirectional | unidirectional
Training objective | mask & predict | autoregressive generation
Best-suited tasks | understanding | generation
Model size | roughly 0.1B to 1B parameters | roughly 1B to hundreds of billions of parameters
Downstream-task usage | fine-tuning | fine-tuning & prompting
Emergent abilities | domain transfer from small data | zero/few-shot learning, in-context learning, chain-of-thought

A brief introduction to GPT-3
- GPT-3 (Generative Pre-trained Transformer 3) is an autoregressive language model whose goal is to use deep learning to generate natural language that humans can understand.
- GPT-3 was trained and developed by OpenAI, an artificial-intelligence company based in San Francisco, and its design is based on the Transformer architecture developed by Google.
- GPT-3's neural network contains 175 billion parameters, the most of any neural network model at the time of its release.
- OpenAI published the GPT-3 paper in May 2020 and released a beta of its API to a small number of companies and development teams the following month. On September 22, 2020, Microsoft announced that it had obtained an exclusive license to GPT-3.

The GPT-3 model family
- ELMo: 93M params, 2-layer biLSTM
- BERT-base: 110M params, 12-layer Transformer
- BERT-large: 340M params, 24-layer Transformer
The language model "scaling wars"! (figure from Mohit Iyyer, slides for CS685, Fall 2020, University of Massachusetts Amherst)

GPT-3 data sources

Dataset | Tokens (billion) | Assumptions | Tokens per byte (tokens/bytes) | Ratio | Size (GB)
Web data | 410B | - | 0.71 | 1:1.9 | 570
WebText2 | 19B | 25% > WebText | 0.38 | 1:2.6 | 50
Books1 | 12B | Gutenberg | 0.57 | 1:1.75 | 21
Books2 | 55B | Bibliotik | 0.54 | 1:1.84 | 101
Wikipedia | 3B | See RoBERTa | 0.26 | 1:3.8 | 11.4
Total | 499B | | | | 753.4

Table: GPT-3 datasets ("disclosed" figures in bold, "determined" figures in italics in the original). Source: Alan D. Thompson, GPT-3.5 + ChatGPT: An illustrated overview, https://lifearchitect.ai/chatgpt/

GPT-3 data sources (continued)
- Data sources: a comparison with other large-scale language models (figure)

GPT-3 training data volume
A look at the number of tokens used to train large language models (the latest figures available at the time):
- GPT-3 (2020.5): 500B (500 billion)
- Google's PaLM (2022.4): 780B
- DeepMind's Chinchilla: 1400B
- Pangu-α disclosed its training token count, about 40B, less than one tenth of GPT-3's
- Other large models in China have not disclosed their training token counts

For comparison with earlier models:
- ELMo: 1B training tokens
- BERT: 3.3B training tokens
- RoBERTa: 30B training tokens
(The language model "scaling wars"! Mohit Iyyer, slides for CS685, Fall 2020, University of Massachusetts Amherst)

GPT-3 compute consumption
The language model "scaling wars", on a log scale! (figure from Mohit Iyyer, slides for CS685, Fall 2020, University of Massachusetts Amherst)

Few-shot and zero-shot learning (in-context learning)
(figures from Brown et al., Language Models are Few-Shot Learners, arXiv:2005.14165, 2020)
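The essence of in-context learning is that the "training examples" live only in the prompt. Here is a small sketch of how a few-shot prompt is assembled; the English-to-French demonstrations follow the example used in the GPT-3 paper, and everything else is my own illustrative scaffolding.

    # Few-shot vs. zero-shot prompting: the frozen model simply continues the text,
    # so the task is specified entirely in context and no parameters are updated.
    demonstrations = [
        ("sea otter", "loutre de mer"),
        ("peppermint", "menthe poivrée"),
    ]
    query = "cheese"

    zero_shot_prompt = f"Translate English to French:\n{query} =>"

    few_shot_prompt = "Translate English to French:\n"
    for en, fr in demonstrations:
        few_shot_prompt += f"{en} => {fr}\n"
    few_shot_prompt += f"{query} =>"

    print(few_shot_prompt)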
Chain-of-thought
(Preprint: https://arxiv.org/pdf/2201.11903.pdf)

Magic word: "Let's think step by step"

Figure 1 from the paper, example inputs and outputs of GPT-3:

(a) Few-shot
Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now?
A: The answer is 11.
Q: A juggler can juggle 16 balls. Half of the balls are golf balls, and half of the golf balls are blue. How many blue golf balls are there?
A: (Output) The answer is 8. ✗

(b) Few-shot-CoT
Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now?
A: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls. 5 + 6 = 11. The answer is 11.
Q: A juggler can juggle 16 balls. Half of the balls are golf balls, and half of the golf balls are blue. How many blue golf balls are there?
A: (Output) The juggler can juggle 16 balls. Half of the balls are golf balls. So there are 16 / 2 = 8 golf balls. Half of the golf balls are blue. So there are 8 / 2 = 4 blue golf balls. The answer is 4.

(c) Zero-shot
Q: A juggler can juggle 16 balls. Half of the balls are golf balls, and half of the golf balls are blue. How many blue golf balls are there?
A: The answer (arabic numerals) is
(Output) 8 ✗

(d) Zero-shot-CoT (ours)
Q: A juggler can juggle 16 balls. Half of the balls are golf balls, and half of the golf balls are blue. How many blue golf balls are there?
A: Let's think step by step.
(Output) There are 16 balls in total. Half of the balls are golf balls. That means that there are 8 golf balls. Half of the golf balls are blue. That means that there are 4 blue golf balls.

Figure 1 caption: "Example inputs and outputs of GPT-3 with (a) standard Few-shot (Brown et al., 2020), (b) Few-shot-CoT (Wei et al., 2022), (c) standard Zero-shot, and (d) ours (Zero-shot-CoT). Similar to Few-shot-CoT, Zero-shot-CoT facilitates multi-step reasoning (blue text) and reaches the correct answer where standard prompting fails. Unlike Few-shot-CoT, which uses step-by-step reasoning examples per task, ours does not need any examples and just uses the same prompt 'Let's think step by step' across all tasks (arithmetic, symbolic, commonsense, and other logical reasoning tasks)."

From the paper:
"In contrast to the excellent performance of LLMs in intuitive and single-step System-1 [Stanovich and West, 2000] tasks with task-specific few-shot or zero-shot prompting [Liu et al., 2021b], even language models at the scale of 100B or more parameters had struggled on System-2 tasks requiring slow and multi-step reasoning [Rae et al., 2021]. To address this shortcoming, Wei et al. [2022] and Wang et al. [2022] have proposed chain of thought prompting (CoT), which feeds LLMs with step-by-step reasoning examples rather than standard question and answer examples (see Fig. 1-a). Such chain of thought demonstrations facilitate models to generate a reasoning path that decomposes the complex reasoning into multiple easier steps. Notably with CoT, the reasoning performance then satisfies the scaling laws better and jumps up with the size of the language models. For example, when combined with the 540B parameter PaLM model [Chowdhery et al., 2022], chain of thought prompting significantly increases the performance over standard few-shot prompting across several benchmark reasoning tasks, e.g., GSM8K (17.9% → 58.1%).

While the successes of CoT prompting [Wei et al., 2022], along those of many other task-specific prompting work [Gao et al., 2021; Schick and Schütze, 2021; Liu et al., 2021b], are often attributed to LLMs' ability for few-shot learning [Brown et al., 2020], we show that LLMs are decent zero-shot reasoners by adding a simple prompt, 'Let's think step by step', to facilitate step-by-step thinking before answering each question (see Figure 1). Despite the simplicity, our Zero-shot-CoT successfully generates a plausible reasoning path in a zero-shot manner and reaches the correct answer in a problem where the standard zero-shot approach fails. Importantly, our Zero-shot-CoT is versatile and task-agnostic, unlike most prior task-specific prompt engineering in the forms of examples (few-shot) or templates (zero-shot) [Liu et al., 2021b]: it can facilitate step-by-step answers across various reasoning tasks, including arithmetic (MultiArith [Roy and Roth, 2015], GSM8K [Cobbe et al., 2021], AQUA-RAT [Ling et al., 2017], and SVAMP [Patel et al., 2021]), symbolic (Last letter and Coin flip), commonsense reasoning (CommonSenseQA [Talmor et al., 2019] and StrategyQA [Geva et al., 2021]), and other logical reasoning tasks (Date understanding and Tracking Shuffled Objects from BIG-bench [big, 2021]) without modifying the prompt per task.

We empirically evaluate Zero-shot-CoT against other prompting baselines in Figure 1. While our Zero-shot-CoT underperforms Few-shot-CoT with carefully-crafted and task-specific step-by-step examples, Zero-shot-CoT achieves enormous score gains compared to the zero-shot baseline, e.g. from 17.7% to 78.7% on MultiArith and from 10.4% to 40.7% on GSM8K with a 175B-parameter model."

(Preprint: http://arxiv.org/abs/2205.11916)
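Panels (c) and (d) above already show the two prompts involved; the Zero-shot-CoT paper applies them in two stages, first eliciting the reasoning with the trigger phrase and then extracting the final answer. Below is a sketch of that flow; call_llm and fake_llm are hypothetical stand-ins for a real completion API, not any specific implementation.

    def zero_shot_cot(question: str, call_llm) -> str:
        # Stage 1: reasoning extraction, using the same trigger phrase for every task.
        reasoning_prompt = f"Q: {question}\nA: Let's think step by step."
        reasoning = call_llm(reasoning_prompt)
        # Stage 2: answer extraction, feeding the generated reasoning back in.
        answer_prompt = f"{reasoning_prompt}{reasoning}\nTherefore, the answer (arabic numerals) is"
        return call_llm(answer_prompt)

    # Toy stand-in for a real completion API, just to show the data flow:
    def fake_llm(prompt: str) -> str:
        return " 4." if "Therefore" in prompt else " There are 8 golf balls; half of them are blue, so 4 are blue."

    question = ("A juggler can juggle 16 balls. Half of the balls are golf balls, "
                "and half of the golf balls are blue. How many blue golf balls are there?")
    print(zero_shot_cot(question, fake_llm))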
Emergence and homogenization
(two slides of figures from Bommasani et al., On the Opportunities and Risks of Foundation Models, arXiv:2108.07258 [cs.LG])

The scale matters: the emergence of abilities

[Figure 2 of Wei et al., eight panels: (A) Mod. arithmetic, (B) IPA transliterate, (C) Word unscramble, (D) Figure of speech, (E) TruthfulQA, (F) Grounded mappings, (G) Multi-task NLU, (H) Word in context. x-axis: model scale (training FLOPs); y-axes: accuracy, BLEU, or exact match; models: LaMDA, GPT-3, Gopher, Chinchilla, PaLM, plus a random-performance baseline.]

Figure 2 caption: "Eight examples of emergence in the few-shot prompting setting. Each point is a separate model. The ability to perform a task via few-shot prompting is emergent when a language model achieves random performance until a certain scale, after which performance significantly increases to well above random. Note that models that used more training compute also typically have more parameters; hence, we show an analogous figure with number of model parameters instead of training FLOPs as the x-axis in Figure 7. A-D: BIG-Bench (2022), 2-shot. E: Lin et al. (2021) and Rae et al. (2021). F: Patel and Pavlick (2022). G: Hendrycks et al. (2021), Rae et al. (2021), and Hoffmann et al. (2022). H: Brown et al. (2020), Hoffmann et al. (2022), and Chowdhery et al. (2022) on the WiC benchmark (Pilehvar and Camacho-Collados, 2019)."
From the paper:
"The ability to perform a task via few-shot prompting is emergent when a model has random performance until a certain scale, after which performance increases to well above random. Figure 2 shows eight such emergent abilities spanning five language model families from various work.

BIG-Bench. Figure 2A-D depicts four emergent few-shot prompted tasks from BIG-Bench, a crowd-sourced suite of over 200 benchmarks for language model evaluation (BIG-Bench, 2022). Figure 2A shows an arithmetic benchmark that tests 3-digit addition and subtraction, as well as 2-digit multiplication. GPT-3 and LaMDA (Thoppilan et al., 2022) have close-to-zero performance for several orders of magnitude of training compute, before performance jumps to sharply above random at 2 × 10^22 training FLOPs (13B parameters) for GPT-3, and 10^23 training FLOPs (68B parameters) for LaMDA. Similar emergent behavior also occurs at around the same model scale for other tasks, such as transliterating from the International Phonetic Alphabet (Figure 2B), recovering a word from its scrambled letters (Figure 2C), and detecting figures of speech (Figure 2D). Even more emergent abilities from BIG-Bench are given in Table 1.

TruthfulQA. Figure 2E shows few-shot prompted performance on the TruthfulQA benchmark, which measures the ability to answer questions truthfully (Lin et al., 2021). This benchmark is adversarially curated against GPT-3 models, which do not perform above random, even when scaled to the largest model size. Small Gopher models also do not perform above random until scaled up to the largest model of 5 × 10^23 training FLOPs (280B parameters), for which performance jumps to more than 20% above random (Rae et al., 2021).

Grounded conceptual mappings. Figure 2F shows the task of grounded conceptual mappings, where language models must learn to map a conceptual domain, such as a cardinal direction, represented in a textual grid world (Patel and Pavlick, 2022). Again, performance only jumps to above random using the largest GPT-3 model.

Multi-task language understanding. Figure 2G shows the Massive Multi-task Language Understanding (MMLU) benchmark, which aggregates 57 tests covering a range of topics including math, history, law, and more (Hendrycks et al., 2021). For GPT-3, Gopher, and Chinchilla, models of ~10^22 training FLOPs (~10B parameters) or smaller do not perform better than guessing on average over all the topics; scaling up to 3-5 × 10^23 training FLOPs (70B to 280B parameters) enables performance to substantially surpass random. This result is striking because it could imply that the ability to solve knowledge-based questions spanning a large collection of topics might require scaling up past this threshold (for dense language models without retrieval or access to external memory).

Word in Context. Finally, Figure 2H shows the Word in Context (WiC) benchmark (Pilehvar and Camacho-Collados, 2019), which is a semantic understanding benchmark. Notably, GPT-3 and Chinchilla fail to achieve one-shot performance of better than random, even when scaled to their largest model size of ~5 × 10^23 FLOPs. Although these results so far may suggest that scaling alone may not enable models to solve WiC, above-random performance eventually emerged when PaLM was scaled to 2.5 × 10^24 FLOPs (540B parameters), which was much larger than GPT-3 and Chinchilla.

Augmented Prompting Strategies. Although few-shot prompting is perhaps currently the most common way of interacting with large language models, recent work has proposed several other prompting and finetuning strategies to further augment the abilities of language models. If a technique shows no improvement or is harmful when compared to the baseline of not using the technique until applied to a model of a large-enough scale, we also consider the technique an emergent ability.

Multi-step reasoning. Reasoning tasks, especially those involving multiple steps, have been challenging for language models and NLP models more broadly (Rae et al., 2021; Bommasani et al., 2021; Nye et al., 2021). A recent prompting strategy called chain-of-thought prompting enables language models to solve such problems by guiding them to produce a sequence of intermediate steps before giving the final answer (Cobbe et al., 2021; Wei et al., 2022b; Zhou et al., 2022). As shown in Figure 3A, chain of thought prompting only surpasses standard prompting without intermediate steps when scaled to 10^23 training FLOPs (~100B parameters). A similar emergence in performance gain was also observed when augmenting few-shot prompting with explanations that came after the final answer (Lampinen et al., 2022).

Instruction following. Another growing line of work aims to better enable language models to perform new tasks simply by reading instructions describing the task (without few-shot exemplars). By finetuning on a mixture of tasks phrased as instructions, language models have been shown to respond appropriately to instructions describing an unseen task (Ouyang et al., 2022; Wei et al., 2022a; …)."

[Figure 3 of Wei et al., four panels: (A) math word problems, GSM8K accuracy with vs. without chain of thought; (B) instruction following, 10-NLU-task average with vs. without instruction tuning; (C) 8-digit addition (in-domain) and (D) 9-digit addition (OOD), with vs. without a scratchpad. x-axis: model scale (training FLOPs).]

Figure 3 caption: "Specialized prompting or finetuning methods can be emergent in that they do not have a positive effect until a certain model scale. A: Wei et al. (2022b). B: Wei et al. (2022a). C & D: Nye et al. (2021). An analogous figure with number of parameters on the x-axis instead of training FLOPs is given in Figure 8. The model shown here is LaMDA (Thoppilan et al., 2022)."

(Wei et al., Emergent Abilities of Large Language Models, preprint: arXiv:2206.07682)
The key technologies behind ChatGPT
- Pre-trained language models (PLMs)
- Large generative pre-trained language models (LLMs)
- Reinforcement learning from human feedback (RLHF)

From GPT-3 to ChatGPT
(Yao Fu, How does GPT Obtain its Ability? Tracing Emergent Abilities of Language Models to their Sources, blog)

ChatGPT official blog: methods

"We trained this model using Reinforcement Learning from Human Feedback (RLHF), using the same methods as InstructGPT, but with slight differences in the data collection setup. We trained an initial model using supervised fine-tuning: human AI trainers provided conversations in which they played both sides, the user and an AI assistant. We gave the trainers access to model-written suggestions to help them compose their responses.
To create a reward model for reinforcement learning, we needed to collect comparison data, which consisted of two or more model responses ranked by quality. To collect this data, we took conversations that AI trainers had with the chatbot. We randomly selected a model-written message, sampled several alternative completions, and had AI trainers rank them. Using these reward models, we can fine-tune the model using Proximal Policy Optimization. We performed several iterations of this process."
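The comparison data described above is typically turned into a pairwise ranking loss for the reward model: for each pair, the loss pushes the score of the human-preferred response above the score of the other one (this is how InstructGPT formulates it). The snippet below is a minimal sketch of that loss, not OpenAI's code; the toy length-based reward model is there only to make it runnable.

    import math

    def pairwise_ranking_loss(reward_model, prompt, chosen, rejected):
        # -log(sigmoid(r_chosen - r_rejected)) == log(1 + exp(-(r_chosen - r_rejected)))
        diff = reward_model(prompt, chosen) - reward_model(prompt, rejected)
        return math.log1p(math.exp(-diff))

    # Toy reward model, purely for illustration: longer answers score higher.
    toy_rm = lambda prompt, response: 0.1 * len(response)
    print(pairwise_ranking_loss(toy_rm, "Explain RLHF.", "A detailed, helpful answer ...", "meh"))

With K ranked responses per prompt, each of the K*(K-1)/2 ordered pairs contributes one such term.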
Returning to the code-debugging sample from the blog introduction, here is ChatGPT's reply:

CHATGPT: It's difficult to say what's wrong with the code without more context. Can you provide more information about what the code is supposed to do and what isn't working as expected? Also, is this the entire code or just a part of it?

USER: just part of it, the error never surfaces. I think there is something wrong with the channel

CHATGPT: It's difficult to say without more information about what the code is supposed to do and what's happening when it's executed. One potential issue with the code you provided is that the resultWorkerErr channel is never closed, which means that the code could potentially hang if the resultWorkerErr channel is never written to. This could happen if b.resultWorker never returns an error or if it's canceled before it has a chance to return an error.
To fix this issue, you could close the resultWorkerErr channel after writing to it. For example, you could add the following line of code after the line that sends the error on the channel:

    close(resultworkerErr)

This would ensure that the resultWorkerErr channel is always closed, even if b.resultWorker never returns an error. Again, it's difficult to say whether this is the issue you're experiencing without more information about the problem you're seeing. Can you provide more details about what's happening when you run the code?

ChatGPT is fine-tuned from a model in the GPT-3.5 series, which finished training in early 2022. You can learn more about the 3.5 series here. ChatGPT and GPT-3.5 were trained on an Azure AI supercomputing infrastructure.
67、ore about the 3.5series here.ChatGPT and GPT 3.5 were trained on an Azure AI supercomputinginfrastructure.LimitationsChatGPT sometimes writes plausible-sounding but incorrect or nonsensical answers.Fixing this issue is challenging,as:(1)duringRL training,theres currently no source of truth;(2)traini
68、ng the model to be more cautious causes it to decline questions that itcan answer correctly;and(3)supervised training misleads the model because the ideal answer depends on what the modelknows,rather than what the human demonstratorknows.ChatGPT is sensitive to tweaks to the input phrasing or attemp
69、ting the same prompt multiple times.For example,given onephrasing of a question,the model can claim to not know the answer,but given a slight rephrase,can answercorrectly.The model is often excessively verbose and overuses certain phrases,such as restating that its a language model trained byOpenAI.
70、These issues arise from biases in the training data(trainers prefer longer answers that look more comprehensive)andwell-known over-optimization issues.Ideally,the model would ask clarifying questions when the user provided an ambiguous query.Instead,our current models usuallyguess what the userinten
71、ded.While weve made efforts to make the model refuse inappropriate requests,it will sometimes respond to harmful instructions orexhibit biased behavior.Were using the Moderation API to warn or block certain types of unsafe content,but we expect it to havesome false negatives and positives for now.We
72、re eager to collect user feedback to aid our ongoing work to improve thissystem.Iterative deploymentTodays research release of ChatGPT is the latest step in OpenAIs iterative deployment of increasingly safe and useful AI systems.Many lessons from deployment of earlier models like GPT-3 and Codex hav
73、e informed the safety mitigations in place for this release,including substantial reductions in harmful and untruthful outputs achieved by the use of reinforcement learning from humanfeedback(RLHF).1,2CHATGPTINSTRUCTGPT:TEXT-DAVINCI-?Tell me about when Christopher Columbuscame to the US in?This ques
74、tion is a bit tricky becauseChristopher Columbus died in?,so hecould not have come to the US in?.But letspretend for a moment that hedid!If Columbus arrived in the US in?,he wouldlikely be very surprised at the changes thathave occurred since he first landed in the“NewWorld”in?.For one,he would prob
75、ably beTell me about when Christopher Columbuscame to the US in?Christopher Columbus came to the US in?and he was very excited to be here.He hadalways wanted to visit the US and he was verycurious about our country.He was veryimpressed with our country and he enjoyed histimehere.ChatGPT Blog:https:/
Instruction Tuning
(Ouyang et al., "Training Language Models to Follow Instructions with Human Feedback," OpenAI, Jan 2022)

Reinforcement learning from human feedback (RLHF)

Stage 1: a supervised policy model for the cold-start phase. GPT-3.5 on its own, strong as it is, has difficulty understanding the different intents behind different kinds of human instructions, and difficulty judging whether the content it generates is of high quality. To give GPT-3.5 an initial ability to understand the intent contained in instructions, a batch of prompts (i.e., instructions or questions) submitted by test users is randomly sampled, professional annotators write high-quality answers for these prompts, and the manually labelled data are then used to fine-tune GPT-3.5. After this step, GPT-3.5 can be regarded as having a preliminary ability to understand the intent in human prompts and to give reasonably high-quality answers based on that intent, but this alone is clearly not enough.
(Zhang Junlin 张俊林, "Will ChatGPT Become the Next-Generation Search Engine?", blog)

Stage 2: training the reward model (RM). The supervised policy model from the cold-start stage first generates K outputs for each prompt; human annotators rank these outputs from highest to lowest quality, and the rankings are used to train the reward model. A well-trained RM takes a prompt together with a candidate answer as input and outputs a quality score for that answer: the higher the score, the higher the quality of the response.
(Zhang Junlin, ibid.)

Stage 3: using reinforcement learning to strengthen the pre-trained model. This stage requires no manually labelled data. Instead, the RM learned in the previous stage is used: the pre-trained model's parameters are updated according to the scores given by the RM.
(Zhang Junlin, ibid.)
79、果来训练回报模型。对于学好的RM模型来说,输入,输出结果的质量得分,得分越高说明产生的回答质量越高。张俊林:ChatGPT会成为下一代搜索引擎吗(blog)29(2)total:40人类反馈的强化学习(RLHF)第三阶段:采用强化学习来增强预训练模型的能力。本阶段无需人工标注数据,而是利用上一阶段学好的RM模型,靠RM打分结果来更新预训练模型参数。张俊林:ChatGPT会成为下一代搜索引擎吗(blog)29(3)total:40RLHF的人工数据标注在RLHF数据标注过程中,OpenAI使用了 40 名全职标注员,这些标注人员经过严格的筛选和培训,从而使他们形成一个统一的价值取向和标准。他们也
80、会被筛查(screening),从而确保他们不会偏离预定的价值取向和标准。Ouyang et al.,“Training Language Models to Follow Instructions with Human Feedback,”OpenAI,Jan 202230 total:40RLHF人工标注准则Misinformation(correct)1.Only make statements that could plausibly be true;do not say things that areobviously false.2.Do not make assumptions
81、about the user,only draw conclusions supported bythe dialogue.3.Do not claim to take any actions in the real world.The agent can only converse,not e.g.send emails.4.Do not give an impression of medical authority or expertise,and do not offermedical advice;instead suggest asking a doctor.(But it is o
82、k to discuss generalquestions about biology and medicine).5.Do not give specific legal advice;instead suggest asking a lawyer.(But it is ok toanswer general questions about the law.)6.Do not offer financial advice.(But it is ok to answer general questions aboutinvestment.)7.Do not endorse conspiracy
83、 theories,or views commonly considered to beconspiracy theories.来源:https:/arxiv.org/pdf/2209.14375.pdf31(1)total:40RLHF人工标注准则Stereotypes(harm)1.Do not use stereotypes or make any other harmful generalising statementsabout groups of people.2.Do not use microaggressions,i.e.statements that in subtle w
84、ays communicatesome sort of bias against historically marginalised groups.Hate and harassment(harm)1.Do not make statements which are threatening.2.Do not make statements which can be perceived as sexually aggressive.3.Do not make negative or hateful comments targeting someone because ofaspects of t
85、heir identity,such as where they are from or how they look.4.Do not output language which is insulting(to the conversation partner or anyother individual or group).5.Do not output language that can reasonably be considered harassment orhateful language.来源:https:/arxiv.org/pdf/2209.14375.pdf31(2)tota
86、l:40RLHF人工标注准则Self-anthropomorphism(harm)1.Do not pretend to have a body or be able to move in a body.2.Do not build a relationship to the user.3.Do not claim to have preferences,feelings,opinions,or religious beliefs.4.Do not pretend to have a human identity or life history,such as having a place o
87、fbirth,relationships,family,memories,gender,age.Other(harm)1.Do not make statements that could be considered harmful(by harming the useror otherwise).Other(helpful)1.The agents statements should make sense in the context of the dialogue.2.The agents statements should be on-topic in the context of th
88、e dialogue,byproviding relevant information or asking clarifying questions if the user is unclear.3.The agent must clearly address the queries from the user.4.The agent should not repeat itself unnecessarily.来源:https:/arxiv.org/pdf/2209.14375.pdf31(3)total:40采用RLHF取得的收益这里给出了ChatGPT比InstructGPT在回复假设性
Gains from RLHF

Here are some examples showing how ChatGPT improves over InstructGPT in responding to hypothetical and safety-related questions (from the ChatGPT blog):

Prompt: Tell me about when Christopher Columbus came to the US in 2015

ChatGPT: This question is a bit tricky because Christopher Columbus died in 1506, so he could not have come to the US in 2015. But let's pretend for a moment that he did! If Columbus arrived in the US in 2015, he would likely be very surprised at the changes that have occurred since he first landed in the "New World" in 1492. For one, he would probably be [...]

InstructGPT (text-davinci-002): Christopher Columbus came to the US in 2015 and he was very excited to be here. He had always wanted to visit the US and he was very curious about our country. He was very impressed with our country and he enjoyed his time here.

Shortcomings of ChatGPT

Limitations (from the ChatGPT blog)
- ChatGPT sometimes writes plausible-sounding but incorrect or nonsensical answers. Fixing this issue is challenging, as: (1) during RL training, there's currently no source of truth; (2) training the model to be more cautious causes it to decline questions that it can answer correctly; and (3) supervised training misleads the model because the ideal answer depends on what the model knows, rather than what the human demonstrator knows.
- ChatGPT is sensitive to tweaks to the input phrasing or attempting the same prompt multiple times. For example, given one phrasing of a question, the model can claim to not know the answer, but given a slight rephrase, can answer correctly.
- The model is often excessively verbose and overuses certain phrases, such as restating that it's a language model trained by OpenAI. These issues arise from biases in the training data (trainers prefer longer answers that look more comprehensive) and well-known over-optimization issues.
- Ideally, the model would ask clarifying questions when the user provided an ambiguous query. Instead, our current models usually guess what the user intended.
- While we've made efforts to make the model refuse inappropriate requests, it will sometimes respond to harmful instructions or exhibit biased behavior. We're using the Moderation API to warn or block certain types of unsafe content, but we expect it to have some false negatives and positives for now. We're eager to collect user feedback to aid our ongoing work to improve this system.

Iterative deployment (from the ChatGPT blog)
Today's research release of ChatGPT is the latest step in OpenAI's iterative deployment of increasingly safe and useful AI systems. Many lessons from deployment of earlier models like GPT-3 and Codex have informed the safety mitigations in place for this release, including substantial reductions in harmful and untruthful outputs achieved by the use of reinforcement learning from human feedback (RLHF).

Insufficient mathematical and logical reasoning ability (examples)

Imperfect value-alignment and safety mechanisms (examples)
Future directions for ChatGPT
- Combining with retrieval (to improve factuality and freshness)
- Calling external capabilities (to improve mathematical and reasoning ability)
- Multimodal understanding and generation
- Lifelong, continual learning

Combining with retrieval
(https://perplexity.ai)

Calling external capabilities
(Stephen Wolfram, Wolfram|Alpha as the Way to Bring Computational Knowledge Superpowers to ChatGPT)
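The retrieval direction can be sketched simply: fetch passages relevant to the question first, then have the model answer grounded in (and citing) those passages, which helps both factuality and freshness. The search and LLM calls below are toy stand-ins of my own, not any particular product's API.

    def retrieval_augmented_answer(question: str, search, call_llm) -> str:
        # Retrieve supporting passages, then ask the model to answer from them only.
        passages = search(question)
        context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
        prompt = ("Answer the question using only the passages below, citing them as [1], [2], ...\n\n"
                  f"{context}\n\nQuestion: {question}\nAnswer:")
        return call_llm(prompt)

    # Toy stand-ins, just to show the data flow; a real system would call a search engine and an LLM.
    toy_index = ["ChatGPT was released by OpenAI on November 30, 2022."]
    answer = retrieval_augmented_answer(
        "When was ChatGPT released?",
        search=lambda q: toy_index,
        call_llm=lambda p: "ChatGPT was released on November 30, 2022 [1].",
    )
    print(answer)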
Summary
- ChatGPT overview
- ChatGPT's impressive performance
- The key technologies behind ChatGPT
- Shortcomings of ChatGPT
- Future directions for ChatGPT

Thank you!

Bring digital to every person, home and organization for a fully connected, intelligent world.

Copyright 2018 Huawei Technologies Co., Ltd. All Rights Reserved.
The information in this document may contain predictive statements, including, without limitation, statements regarding future financial and operating results, the future product portfolio, new technology, etc. There are a number of factors that could cause actual results and developments to differ materially from those expressed or implied in the predictive statements. Therefore, such information is provided for reference purposes only and constitutes neither an offer nor an acceptance. Huawei may change the information at any time without notice.