Dariya Ordanovich, Esri
Diego Ramiro-Fariñas, Consejo Superior de Investigaciones Científicas (CSIC)
Francesco Billari, Bocconi University
Antonia Tugores, Instituto de Física Interdisciplinar y Sistemas Complejos IFISC (CSIC-UIB)
Francisco Viciana, Institute of Statistics and Cartography of Andalusia
José J. Ramasco, Instituto de Física Interdisciplinar y Sistemas Complejos IFISC (CSIC-UIB)
This paper aims to show the potential of Twitter as a rich and timely data source for monitoring (‘nowcasting’) and anticipating demographic trends, shifts and spatial distribution, adding significant value to official statistical production. The study is built upon an extensive sample of over 100 million tweets with geographic coordinates or location tags, retrieved using the real-time streaming API for the period from 2015 to 2018 in continental Spain. As reference data, we use official national and regional statistical datasets provided as aggregated time series and in the form of a spatial grid with fertility estimates at a minimal spatial resolution of 250 sq. m in Andalusia. To identify the focus group of Twitter users and to analyze the sentiment scores and latent semantic structures of their timelines, a wide range of machine learning techniques is applied.
Presented in Session 3. Population, Development, & the Environment; Data & Methods; Applied Demography