Measuring construction activities using advertisements from real estate portals
ESSnet WIN Work Package 3, Use Case 2
WIN conference, Gdansk, February 4th-5th, 2025
Tobias Gramlich (Hesse State Statistical Office, Germany)

Overall goal
- From ads appearing on online real estate portals: can we produce a meaningful early estimate of the number of newly constructed buildings that have become available in a specific reference year?
- Implicit limitation from the way the data are collected: "become available" means available on the (online) market in a specific year.

We?
- SE-SCB, DE-AfS, DE-HSL

Can we?
- Yes, we can! In principle. Well, maybe we couldn't in every detail.

What do we need?
- Data
  - Ads from "all"/several relevant portals, with the relevant info from each ad
  - Over a longer period of time (three years)
  - Either collect it or get it by means of an agreement
  - Both typically mean you need to spend some money
- Staff to program and maintain scrapers
  - R and Python
- Some infrastructure to develop, test and run scrapers, and to store data
  - In our case: no production level, so no "production-level" infrastructure
- (Gain) experience (to) make decisions

Things to decide, experience to gain
- Which portals to choose?
  - Relevance, accessibility, availability, reliability, stability, coverage, overlap
- Collect or buy?
- How to define or identify "newly constructed objects"?
  - Specification error, completeness, validity
- When to start scraping? How often to scrape?
  - "Prospective" scraping, scraping frequency (weekly, daily? Even shorter intervals?)
  - Undercoverage
- How to identify duplicates within and between data sources?
  - Completeness, validity, stability

Examples: available characteristics

Example II: characteristics

Examples: projects/deduplication

Example: "prospective" scraping?

Results for DE-HE: 2023
- Comparison of scraped ads vs. official statistics (completed buildings, 2023)
- Overall coverage (ads to buildings): 210%
Ov