Do Pretraining Language Models Really Understand Language?
Huang, Minlie (黄民烈), CoAI (Conversational AI), Tsinghua University, http:/

Outline: 1. Meaning, understanding, ...

Pretraining corpora of language models:
- GPT-3: Common Crawl (410B), WebText2 (19B), Books1 (12B), Books2 (55B), Wikipedia (3B)
- Blender: Reddit (1.5B comments, 56.8B BPE tokens, 88.8B context tokens)
- Meena: 40B words, social media
- XLNet: 130GB; Books, ClueWeb, Common Crawl
- GPT-2: WebText (8M documents, 40GB text)
- BERT: 3.3B words; BookCorpus, English Wikipedia

What is learned / not learned in pretraining language models?

BERT
Linguistic Knowledge
Pre-trained language models do well on many linguistic probing tasks: part-of-speech tagging, grammatical error detection, named entity recognition. Sentence structures are encoded in the model parameters; BERT can discover sentence structure.
[1] Liu et al. Linguistic Knowledge and Transferability of Contextual Representations. In Proceedings of NAACL-HLT 2019.
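To make the probing idea concrete, here is a minimal sketch (not from the slides) of how such a probe is typically set up: a small linear classifier is trained on frozen BERT token representations to predict part-of-speech tags. The checkpoint name, toy tag set, and toy sentence are illustrative assumptions.

```python
# Minimal probing sketch (illustrative, not from the slides): train a linear
# classifier ("probe") on frozen BERT token representations to predict POS tags.
# The checkpoint name, toy tag set, and toy example below are assumptions.
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")
encoder.eval()  # the pretrained encoder stays frozen; only the probe is trained

TAGS = ["NOUN", "VERB", "DET", "ADJ", "OTHER"]             # toy tag inventory
probe = nn.Linear(encoder.config.hidden_size, len(TAGS))   # the linear probe
optimizer = torch.optim.Adam(probe.parameters(), lr=1e-3)

# One toy training example: a pre-split sentence with a gold tag per word.
words = ["the", "cat", "chased", "a", "red", "ball"]
gold = torch.tensor([2, 0, 1, 2, 3, 0])                    # indices into TAGS

enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")
with torch.no_grad():                                      # no gradients into BERT
    hidden = encoder(**enc).last_hidden_state[0]

# Use the first sub-token of each word as that word's representation.
word_ids = enc.word_ids(0)
features = hidden[[word_ids.index(i) for i in range(len(words))]]

logits = probe(features)
loss = nn.functional.cross_entropy(logits, gold)
loss.backward()
optimizer.step()
print(f"probe loss on the toy example: {loss.item():.3f}")
```

If such a probe reaches high accuracy while the encoder stays frozen, the linguistic information must already be present in the pretrained representations, which is the reasoning behind the probing results cited above.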