ONLINE JOB ADVERTISEMENTS CLASSIFICATION USING ENCODER-LIKE LARGE LANGUAGE MODEL
MIKOAJ TYM, JAKUB EREBECKI
WEB INTELLIGENCE CHALLENGE
EUROPEAN STATISTICS AWARDS

Web intelligence classification challenge
- The challenge was announced by the European Statistics Awards
- Each team could submit 10 submissions, each containing occupation classifications for online job advertisements
- Predicted classes were evaluated with the Lowest Common Ancestor (LCA) metric
- Competitors had to provide fully documented scripts in R or Python
- Approaches were evaluated not only for accuracy but also for reusability, so they had to be scalable and open

The International Standard Classification of Occupations
- Four-level classification of occupation groups (ISCO) managed by the International Labour Organisation
- There are 436 occupation classes
- Although some classes are strongly semantically related, they occur in different branches of the ISCO tree
  - Accountants: Professionals
  - Accounting and bookkeeping clerks: Clerical support workers
- The LCA metric heavily penalizes such mistakes

Dataset
- The competition dataset contains 26,000 multilingual online job advertisements
- They were retrieved from around 400 websites active in the European Union
- These advertisements were scraped from the web, so they contain a lot of irrelevant content
  - GDPR clause
  - HTML tags
  - Job benefits
  - Company policies

DUTY CLASSIFIER
DATA PREPROCESSING STEP TO CLEAN JOB ADVERTISEMENTS

Job offer example
- An advertisement contains many sections
- Not all of them are relevant for classification
- The employer overview is misleading
- The key part of a job offer is the requirements, but the text describing them is often shorter than the other sections
1. Employer overview
2. Requirements
3. Benefits
4. Equal employment opportunity statement

Filtering non-meaningful information
- We have trained a m