Using Twitter for Demographic and Social Science Research: Tools for Data Collection
Hedwig Lee, University of Washington
Nina Cesare, University of Washington
Ali Shojaie, University of Washington
Despite widespread success in using Twitter data to explain what people are doing or talking about, little attention has been paid to developing systematic ways of gathering demographic information from this source. This paper develops a scalable, sustainable toolkit for social science researchers interested in using Twitter data to examine behaviors and attitudes and to understand the populations expressing them. We begin by describing how to collect Twitter data on a particular population – in this case, individuals who did not plan to vote in the 2012 U.S. presidential election. We then describe a method for processing these data to retrieve demographic information that users report in non-textual form (e.g., details of images), and we assess the reliability of this method. We end by discussing the challenges of this data collection approach and how large-scale social media data may benefit demographic researchers.