Assessing the Quality of Enterprise Characteristics and Online Job Advertisements derived from Web Data

Ville Auno, Statistics Finland
Johannes Gussenbauer, Statistics Austria
WIN Conference, Gdansk 06/02/2025
Trusted Smart Statistics Web Intelligence Network

Introduction
- Assessing the quality and usability of web scraped data for official statistics production was one of the tasks carried out in the Web Intelligence Network (WIN) project
- Focus on two different types of data:
  - Online Job Advertisements (OJA)
  - Online-Based Enterprise Characteristics (OBEC)
- Findings provide insights into the challenges and strengths of web scraped data

Quality Assessment of OJA Data
- Quality of OJA data was assessed in two different ways:
  - Use of pre-defined quality indicators for source evaluation
  - Manual annotation exercises for evaluating classification accuracy
- Quality indicators:
  - Number of relevant (500 OJAs) and very relevant (5000 OJAs) sources over time
  - Ranking of the relevant sources over time
  - Time series plots for number of OJAs for all very relevant sources
  - Stability of data over different versions of data

OJA: Quality indicators
- Relevant sources
  - Fairly stable
  - Some fluctuation, in Portugal for example
- Very relevant sources
  - Similar to relevant sources
  - Larger fluctuations in relative terms in smaller countries

Year  AT  BG  DE  FI  FR  IT  NL  PL  PT  RO
2018  19   7  39   6  38  27  23  12   5  14
2019  18  16  44  10  44  32  29  14  10  24
2020  21  16  40   5  41  31  26  12  25  16
2021  27  21  54   9  47  34  34  14  44  21
2022  20  21  61   7  56  31  36  15  51  19
2023  15  16  43   5  42  28  26  15  44  14

Year  AT  BG  DE  FI  FR  IT  NL  PL  PT  RO
2018   7   1  27   2  24  16   9   7   2   2
2019  13   5  31   6  25  23  15  11   4  11
2020   9   4  28   4  22  17  12   8  10   7
2021   8   4  28   4  25  21  15   8  14   8
2022   7   5  26   3  31  19  14  11  16   9
2023   4   4  21   3  26  17  13  11  12   6

OJA: Quality indicators
- Stability of the relevant and very relevant sources was analyzed further:
  - Very relevant sources do not remain the same over the years
  - Relative significance of the sources varies from year to year
source