2025/02/04

State of play of the Data Acquisition Service (DAS) of the Web Intelligence Hub (WIH)
Mészáros Mátyás
Web Intelligence Network Conference - From Web to Data
Gdansk, 4-5 February 2025

WIH platform components
- Online Job Advertisements Data Production System (OJA-DPS)
- Datalab
- WIH Data Acquisition Service (DAS)

DAS development principles
- Scalable
- Build on open-source tools
- Try to use the state of the art
- Can handle static and dynamic content
- No coding, only configuration by the user
- Universal, can be used for several use cases (OJA, MNE, price, etc.)
- Separation of use cases with possible collaboration in the same use case

The beginning of the DAS
Version 1 was released in November 2021
- Generic data acquisition service with API access for static and dynamic web pages
- Using StormCrawler and Selenium
- EUi frontend (Dashboard) with user authentication (EU Login and AWS Cognito)
- Deployment using Infrastructure as Code (IaC)
Version 2 was released in April 2022
- Adding the playground for data acquisition to test filters and dynamic web pages
Version 3 was released in September 2022
- Additional Selenium filters based on the needs of tourism websites

Dashboard: EUi frontend to manage the APIs
DAS: Spring Boot API with StormCrawler and Selenium in the background
Playground: Spring Boot with Selenium in the background
First testing by the WIN

Based on the feedback
Version 4 was released in December 2022
- Multitenancy was introduced
- Moving authentication from AWS Cognito to Keycloak
Version 5 was released in February 2023
- 1st security testing and update
- Adding new functionalities like advanced search, copy configuration and acquisition action history
Version 6 was released in March 2023
- Introduction of user roles (guest, developer and admin)
- Possibility to use sitemap discove