www.statistik.at
Independent statistics for fact-based decisions

Exploiting the Web Presence of Enterprises to Improve NACE Code Classification
WIN 2025 CONFERENCE, Gdańsk, 05.02.2025
Johannes Gussenbauer, Johannes.Gussenbauer@statistik.gv.at
Alexander Kowarik, Alexander.Kowarik@statistik.gv.at

Slide 2: Outline
- Aim of classification task
- Data acquisition and processing
- Modelling and performance evaluation
- Hierarchical performance measures

Slide 3: Aim of classification task

Slide 4: Aim of classification task
- NACE editing is a labour-intensive task, and a NACE revision is coming in 2025
- Is it possible to predict the NACE code of an enterprise using text from the enterprise website?
- Test NACE prediction during the ESSNet Web Intelligence Network
- Main focus: developing a model used in a recommendation system for the editing task, to reduce editing time

Slide 5: Data Acquisition and pre-processing

Slide 6: Data Acquisition
- Collect web data during ICT-survey cycles
- Data collected from 2019 to 2023 (results limited up to 2021)
- Google Custom API: search with name and address of enterprise
- Selenium + R: scrape text from website; especially search for the "imprint"
- Link websites and address: process text and deterministically link via VAT or CRN found in the "imprint"

Slide 8: Text data processing
- Process collected text from website
- Transform each word with the German morphological lexicon available on https://www.openthesaurus.de/about/download
- Lemmatization and stemming did not improve classification performance
- Remove all digits and punctuation
- Remove characters not part of the German dictionary
- Remove German stop words

Slide 9: Modelling & Results

Slide 10: NACE Classification
- Make NACE level 2 prediction using text as features = Pre-pro