Information Extraction from Structure and Long Input with Its Applications
Qifan Wang / Meta AI / Senior Staff Research Scientist

Outline
- Introduction
- Background
- Structure and Long Input Data Representation
  - Extended Transformer Construction (ETC) / BigBird
- Information Extraction Applications
  - Attribute Value Extraction
  - Structure Information Extraction from Web Pages
- Conclusion

What is Information Extraction?
Information extraction refers to the task of automatically extracting structured information from unstructured and/or semi-structured documents. In most cases, this activity concerns processing textual information by means of natural language processing.
Useful for:
- Search and ranking
- Document understanding
- Recommendation systems
- Question answering
- etc.

Example (figure labels: brand, storage, color):
- Attribute: brand, Value: Lenovo
- Attribute: storage, Value: 512 GB
- Attribute: color, Value: mineral gray

Key Challenges in Information Extraction
Effectiveness: how to effectively and efficiently represent/model structure and long input data (e.g., web documents)
- Multiple information sources (title, description, image, metadata, etc.)
- Long input sequences
- Correlation between sources
Scalability: how to design a unified model that works across different types of information extraction
- E.g., there are millions of attributes/fields of interest that need to be extracted
Generalizability: how to make the model work with new/unseen data
- E.g., a new color of a product that does not appear in the training data

Background
Early work includes rule-based approaches
- Utilize domain-specific regular expressions
- Template-based extraction: wrapper induction
Named entity recognition (NER) methods
- Treat each field as an entity
- Build deep extraction models to identify the entities/values from the input text
Recent advances: NLP sequential modeling techniques
- RNN/LSTM/Attention based sequence