Publications

Recent publications

April 26, 2021

Sarah Shugars, Adina Gitomer, Stefan McCabe, Ryan J. Gallagher, Kenneth Joseph, Nir Grinberg, Larissa Doroshenko, Brooke Foucault Welles, David Lazer

Journal of Quantitative Description: Digital Media

Abstract

As an integral component of public discourse, Twitter is among the main data sources for scholarship in this area. However, there is much that scholars do not know about the basic mechanisms of public discourse on Twitter, including the prevalence of various modes of communication, the types of posts users make, the engagement those posts receive, or how these things vary with user demographics and across different topical events. This paper broadens our understanding of these aspects of public discourse. We focus on the first nine months of 2020, studying that period as a whole and giving particular attention to two monumentally important topics of that time: the Black Lives Matter movement and the COVID-19 pandemic. Leveraging a panel of 1.6 million Twitter accounts matched to U.S. voting records, we examine the demographics, activity, and engagement of 800,000 American adults who collectively posted nearly 300 million tweets during this time span. We find notable variation in user activity and engagement, in terms of modality (e.g., retweets vs. replies), demographic subgroup, and topical context. We further find that while Twitter can best be understood as a collection of interconnected publics, neither topical nor demographic variation perfectly encapsulates the "Twitter public." Rather, Twitter publics are fluid, contextual communities which form around salient topics and are informed by demographic identities. Together, this paper presents a disaggregated, multifaceted description of the demographics, activity, and engagement of American Twitter users in 2020.

January 13, 2021

Jon Green, Matthew Baum, James Druckman, David Lazer, Katherine Ognyanova, Matthew Simonson, Roy Perlis, Mauricio Santillana

Abstract

An individual’s issue preferences are non-separable when they depend on other issue outcomes (Lacy 2001a), presenting measurement challenges for traditional survey research. We extend this logic to the broader case of conditional preferences, in which policy preferences depend on the status of conditions that have inherent levels of uncertainty and are not necessarily policies themselves. We demonstrate new approaches for measuring conditional preferences in two large-scale survey experiments regarding the conditions under which citizens would support reopening schools in their communities during the COVID-19 pandemic. By drawing on recently developed methods at the intersection of machine learning and causal inference, we identify which citizens are most likely to have school reopening preferences that depend on additional considerations. The results highlight the advantages of using such approaches to measure conditional preferences, which represent an underappreciated and general phenomenon in public opinion.

January 8, 2021

Stefan Wojcik, Avleen S. Bijral, Richard Johnston, Juan M. Lavista Ferres, Gary King, Ryan Kennedy, Alessandro Vespignani & David Lazer

Nature Communications

Abstract

While digital trace data from sources like search engines hold enormous potential for tracking and understanding human behavior, these streams of data lack information about the actual experiences of those individuals generating the data. Moreover, most current methods ignore or under-utilize human processing capabilities that allow humans to solve problems not yet solvable by computers (human computation). We demonstrate how behavioral research, linking digital and real-world behavior, along with human computation, can be utilized to improve the performance of studies using digital data streams. This study looks at the use of search data to track prevalence of Influenza-Like Illness (ILI). We build a behavioral model of flu search based on survey data linked to users’ online browsing data. We then utilize human computation for classifying search strings. Leveraging these resources, we construct a tracking model of ILI prevalence that outperforms strong historical benchmarks using only a limited stream of search data and lends itself to tracking ILI in smaller geographic units. While this paper only addresses searches related to ILI, the method we describe has potential for tracking a broad set of phenomena in near real-time.

September 15, 2020

Ryan J. Gallagher, Larissa Doroshenko, Sarah Shugars, David Lazer, Brooke Foucault Welles

Social Media + Society

Abstract

The ongoing, fluid nature of the COVID-19 pandemic requires individuals to regularly seek information about best health practices, local community spreading, and public health guidelines. In the absence of a unified response to the pandemic in the United States and clear, consistent directives from federal and local officials, people have used social media to collectively crowdsource COVID-19 elites, a small set of trusted COVID-19 information sources. We take a census of COVID-19 crowdsourced elites in the United States who have received sustained attention on Twitter during the pandemic. Using a mixed methods approach with a panel of Twitter users linked to public U.S. voter registration records, we find that journalists, media outlets, and political accounts have been consistently amplified around COVID-19, while epidemiologists, public health officials, and medical professionals make up only a small portion of all COVID-19 elites on Twitter. We show that COVID-19 elites vary considerably across demographic groups, and that there are notable racial, geographic, and political similarities and disparities between various groups and the demographics of their elites. With this variation in mind, we discuss the potential for using the disproportionate online voice of crowdsourced COVID-19 elites to equitably promote timely public health information and mitigate rampant misinformation.

August 28, 2020

David M. J. Lazer, Alex Pentland, Duncan J. Watts, Sinan Aral, Susan Athey, Noshir Contractor, Deen Freelon, Sandra Gonzalez-Bailon, Gary King, Helen Margetts, Alondra Nelson, Matthew J. Salganik, Markus Strohmaier, Alessandro Vespignani, Claudia Wagner

Abstract

The field of computational social science (CSS) has exploded in prominence over the past decade, with thousands of papers published using observational data, experimental designs, and large-scale simulations that were once unfeasible or unavailable to researchers. These studies have greatly improved our understanding of important phenomena, ranging from social inequality to the spread of infectious diseases. The institutions supporting CSS in the academy have also grown substantially, as evidenced by the proliferation of conferences, workshops, and summer schools across the globe, across disciplines, and across sources of data. But the field has also fallen short in important ways. Many institutional structures around the field—including research ethics, pedagogy, and data infrastructure—are still nascent. We suggest opportunities to address these issues, especially in improving the alignment between the organization of the 20th-century university and the intellectual requirements of the field.

May 19, 2020

Jason Radford and Kenneth Joseph

Frontiers in Big Data

Abstract

Research at the intersection of machine learning and the social sciences has provided critical new insights into social behavior. At the same time, a variety of issues have been identified with the machine learning models used to analyze social data. These issues range from technical problems with the data used and features constructed, to problematic modeling assumptions, to limited interpretability, to the models' contributions to bias and inequality. Computational researchers have sought out technical solutions to these problems. The primary contribution of the present work is to argue that there is a limit to these technical solutions. At this limit, we must instead turn to social theory. We show how social theory can be used to answer basic methodological and interpretive questions that technical solutions cannot when building machine learning models, and when assessing, comparing, and using those models. In both cases, we draw on related existing critiques, provide examples of how social theory has already been used constructively in existing work, and discuss where other existing work may have benefited from the use of specific social theories. We believe this paper can act as a guide for computer and social scientists alike to navigate the substantive questions involved in applying the tools of machine learning to social data.

May 14, 2020

Briony Swire-Thompson, Joseph DeGutis, David Lazer

Journal of Applied Memory and Cognition

Abstract

One of the most concerning notions for science communicators, fact-checkers, and advocates of truth, is the backfire effect; this is when a correction leads to an individual increasing their belief in the very misconception the correction is aiming to rectify. There is currently a debate in the literature as to whether backfire effects exist at all, as recent studies have failed to find the phenomenon, even under theoretically favorable conditions. In this review, we summarize the current state of the worldview and familiarity backfire effect literatures. We subsequently examine barriers to measuring the backfire phenomenon, discuss approaches to improving measurement and design, and conclude with recommendations for fact-checkers. We suggest that backfire effects are not a robust empirical phenomenon, and more reliable measures, powerful designs, and stronger links between experimental design and theory, could greatly help move the field ahead.

May 11, 2020

Brennan Klein, Timothy LaRock, Stefan McCabe, Leo Torres, Lisa Friedland, Filippo Privitera, Brennan Lake, Moritz U. G. Kraemer, John S. Brownstein, David Lazer, Tina Eliassi-Rad, Samuel V. Scarpino, Alessandro Vespignani, and Matteo Chinazzi

Abstract

In March 2020, many state and local governments in the United States enacted stay-at-home policies banning mass gatherings, closing schools, and promoting remote working. By analyzing anonymized location data from millions of mobile devices, we quantify how much people have reduced their daily mobility and physical contacts in accordance with these guidelines. At the regional level, we measure declines in daily commute volume as well as transit between major urban areas. At the individual level, we measure changes in the average user's daily range of mobility, number of unique contacts, and number of co-location events. According to these five measures, we estimate that the average person in the United States had reduced their daily mobility by 45-55% as of late April 2020, and had reduced their daily contacts by 65-75%. The United States' physical distancing guidelines expired on April 30, 2020, and are not set to be renewed; as of early May 2020, we report increases in mobility and contact patterns across most states (up to 10-14%, compared to the last week of April), though we do not observe a commensurate increase in commute volume. The response to the COVID-19 pandemic has amounted to one of the largest disruptions of economic, social, and mobility behavior in history, and quantifying these disruptions is vital for forecasting the further spread of this pandemic and crafting our collective response.

March 31, 2020

Brennan Klein, Timothy LaRock, Stefan McCabe, Leo Torres, Filippo Privitera, Brennan Lake, Moritz U. G. Kraemer, John S. Brownstein, David Lazer, Tina Eliassi-Rad, Samuel V. Scarpino, Matteo Chinazzi, and Alessandro Vespignani

Abstract

On March 16, 2020, the United States government issued new guidelines promoting public health social distancing interventions to reduce the spread of the COVID-19 epidemic in the country [1]. In addition, many state and local governments in the United States have enacted stay-at-home policies banning mass gatherings, enforcing school closures, and promoting smart working. So far, however, the extent to which these policies have reduced people's mobility has not been quantified. By analyzing data from millions of (anonymized, aggregated, privacy-enhanced) devices, we estimate that by March 23 the policies had generally reduced overall mobility by half in several major U.S. cities. To gauge the observed results against known events, we note that the commuting volume on Monday, March 16, approached that of a typical snow day or analogous day when public schools are partially closed (e.g., January 2). By Friday, March 20, we observe commuting numbers that resemble those measured on federal holidays (e.g., Martin Luther King Jr. Day in January or Presidents' Day in February). Currently, we are unable to quantify the extent to which this reduced commuting volume is driven by people working from home or simply by an increase in unemployment, though it is surely a mixture of both. Whether this reduction in mobility is enough to change the course of this pandemic is not yet known, but it does provide guidance for further measures that can be implemented at a national scale in the United States.

July 6, 2019

Kenneth Joseph, Briony Swire-Thompson, Hannah Masuga, Matthew A. Baum, David Lazer

Proceedings of the International AAAI Conference on Web and Social Media

Abstract

Using both survey- and platform-based measures of support, we study how polarization manifests for 4,313 of President Donald Trump's tweets since he was inaugurated in 2017. We find high levels of polarization in response to Trump's tweets. However, after controlling for mean differences, we surprisingly find a high degree of agreement across partisan lines across both survey and platform-based measures. This suggests that Republicans and Democrats, while disagreeing on an absolute level, tend to agree on the relative quality of Trump's tweets. We assess potential reasons for this, for example, by studying how support changes in response to tweets containing positive versus negative language. We also explore how Democrats and Republicans respond to tweets containing insults of individuals with particular socio-demographics, finding that Republican support decreases when Republicans, relative to Democrats, are insulted, and Democrats respond negatively to insults of women and members of the media.