2025/02/04

State of play of the Data Acquisition Service (DAS) of the Web Intelligence Hub (WIH)
Mészáros Mátyás
Web Intelligence Network Conference - From Web to Data
Gdansk, 4-5 February 2025

WIH platform components
- Online Job Advertisements Data Production System (OJA-DPS)
- Datalab
- WIH Data Acquisition Service (DAS)

DAS development principles
- Scalable
- Build on open-source tools
- Try to use the state of the art
- Can handle static and dynamic content
- No coding, only configuration by the user
- Universal, can be used for several use cases (OJA, MNE, price, etc.)
- Separation of use cases with possible collaboration in the same use case

The beginning of the DAS
Version 1 was released in November 2021
- Generic data acquisition service with API access for static and dynamic web pages
- Using StormCrawler and Selenium
- EUi frontend (Dashboard) with user authentication (EU Login and AWS Cognito)
- Deployment using Infrastructure as Code (IaC)
Version 2 was released in April 2022
- Adding the playground for data acquisition to test filters and dynamic web pages
Version 3 was released in September 2022
- Additional Selenium filters based on the needs of tourism websites

- Dashboard: EUi frontend to manage the APIs
- DAS: Spring Boot API with StormCrawler and Selenium in the background
- Playground: Spring Boot with Selenium in the background

First testing by the WIN

Based on the feedback
Version 4 was released in December 2022
- Multitenancy was introduced
- Moving authentication from AWS Cognito to Keycloak
Version 5 was released in February 2023
- 1st security testing and update
- Adding new functionalities like advanced search, copy configuration and acquisition action history
Version 6 was released in March 2023
- Introduction of user roles (guest, developer and admin)
- Possibility to use sitemap discove