The Generalizability of Twitter Data for Population Research

Guangqing Chi, Pennsylvania State University
Junjun Yin, Pennsylvania State University
Jennifer Van Hook, Pennsylvania State University
Eric Plutzer, Pennsylvania State University
Heng Xu, American University

Social media data, such as data from Twitter, have been used in many fields. Demography, the most quantitative of the social science disciplines, has been slow to take advantage of the abundance of Twitter data. The biggest concern is how well Twitter users represent the general population. This study evaluates the extent to which Twitter users (mis)represent the population across different demographic groups. We conduct the research at the county level in the U.S. from 2014 to 2017 using 96% geotagged tweets. The specific aims are to: extend and refine previously developed methods for imputing the gender, age, race/ethnicity, and county of residence of each Twitter user; use these imputed values to assess the (mis)representativeness of Twitter samples at the county level; and explain the determinants of the resulting biases. If successful, this research will open the door for demographers to take advantage of rich Twitter data.
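As an illustration of the second aim, one simple way to quantify (mis)representation is a representation ratio: a group's share among Twitter users in a county divided by that group's share of the county's population. The sketch below is not the authors' method; the function name `representation_ratios`, the county and age-group labels, and all counts are hypothetical and serve only to show the arithmetic. A ratio above 1 would indicate over-representation on Twitter, and a ratio below 1 under-representation.

```python
from collections import defaultdict


def representation_ratios(twitter_counts, census_counts):
    """Return {(county, group): Twitter share / population share}.

    Both inputs map (county, group) -> count. This is an illustrative
    measure, not the procedure used in the study.
    """
    def shares(counts):
        # Sum counts per county, then convert each cell to a within-county share.
        totals = defaultdict(int)
        for (county, _group), n in counts.items():
            totals[county] += n
        return {
            key: n / totals[key[0]]
            for key, n in counts.items()
            if totals[key[0]] > 0
        }

    twitter_shares = shares(twitter_counts)
    population_shares = shares(census_counts)

    ratios = {}
    for key, pop_share in population_shares.items():
        if pop_share > 0:
            # Groups with no imputed Twitter users get a ratio of 0.
            ratios[key] = twitter_shares.get(key, 0.0) / pop_share
    return ratios


# Hypothetical toy counts for a single county (not real data).
twitter_counts = {("County A", "18-29"): 60, ("County A", "30+"): 40}
census_counts = {("County A", "18-29"): 2000, ("County A", "30+"): 8000}

ratios = representation_ratios(twitter_counts, census_counts)
for (county, group), ratio in sorted(ratios.items()):
    print(f"{county} {group}: {ratio:.2f}")
```

In this toy case the younger group makes up 60% of the imputed Twitter users but 20% of the population, so its ratio is about 3, while the older group's ratio is about 0.5.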

Presented in Session 208: Computational Demography