July 12, 2021
The backfire effect occurs when a correction increases belief in the very misconception it is attempting to correct, and it is often used as a reason not to correct misinformation. The current study aimed to test whether correcting misinformation increases belief more than a no-correction control. Furthermore, we aimed to examine whether item-level differences in backfire rates were associated with test-retest reliability or theoretically meaningful factors. These factors included worldview-related attributes, namely perceived importance and strength of pre-correction belief, and familiarity-related attributes, namely perceived novelty and the illusory truth effect. In two nearly identical experiments, we conducted a longitudinal pre/post design with N = 388 and 532 participants. Participants rated 21 misinformation items and were assigned to a correction condition or test-retest control. We found that no items backfired more in the correction condition compared to test-retest control or initial belief ratings. Item backfire rates were strongly negatively correlated with item reliability (ρ = -.61 / -.73) and did not correlate with worldview-related attributes. Familiarity-related attributes were significantly correlated with backfire rate, though they did not consistently account for unique variance beyond reliability. While there have been previous papers highlighting the non-replicable nature of backfire effects, the current findings provide a potential mechanism for this poor replicability. It is crucial for future research into backfire effects to use reliable measures, report the reliability of their measures, and take reliability into account in analyses. Furthermore, fact-checkers and communicators should not avoid giving corrective information due to backfire concerns.
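To make the item-level analysis concrete, here is a minimal sketch of how backfire rates could be computed and correlated with test-retest reliability. The file name, column layout, and the per-participant backfire definition are illustrative assumptions, not the authors' analysis code.

```python
import pandas as pd
from scipy.stats import spearmanr

# Hypothetical layout: one row per participant x item, with pre- and
# post-manipulation belief ratings.
ratings = pd.read_csv("belief_ratings.csv")  # columns: item, condition, pre, post

# An item "backfires" for a participant when belief increases after correction.
corrected = ratings[ratings["condition"] == "correction"]
backfire_rate = (corrected["post"] > corrected["pre"]).groupby(corrected["item"]).mean()

# Test-retest reliability: pre/post rating correlation in the no-correction control.
control = ratings[ratings["condition"] == "control"]
reliability = control.groupby("item").apply(lambda g: spearmanr(g["pre"], g["post"])[0])

# Item-level association between backfire rate and reliability.
rho, p = spearmanr(backfire_rate, reliability)
print(f"backfire rate vs. item reliability: rho = {rho:.2f} (p = {p:.3f})")
```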
July 1, 2021
Topic model evaluation, like evaluation of other unsupervised methods, can be contentious. However, the field has coalesced around automated estimates of topic coherence, which rely on the frequency of word co-occurrences in a reference corpus. Contemporary neural topic models surpass classical ones according to these metrics. At the same time, topic model evaluation suffers from a validation gap: automated coherence, developed for classical models, has not been validated using human experimentation for neural models. In addition, a meta-analysis of topic modeling literature reveals a substantial standardization gap in automated topic modeling benchmarks. To address the validation gap, we compare automated coherence with the two most widely accepted human judgment tasks: topic rating and word intrusion. To address the standardization gap, we systematically evaluate a dominant classical model and two state-of-the-art neural models on two commonly used datasets. Automated evaluations declare a winning model when corresponding human evaluations do not, calling into question the validity of fully automatic evaluations independent of human judgments.
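For readers unfamiliar with automated coherence, a minimal sketch of the NPMI-based family of metrics the abstract refers to, using simple document-level co-occurrence in a reference corpus; published variants differ in windowing and smoothing details.

```python
import math
from itertools import combinations

def npmi_coherence(top_words, reference_docs, eps=1e-12):
    """Mean normalized PMI over all pairs of a topic's top words."""
    docs = [set(doc) for doc in reference_docs]  # each doc: a list of tokens
    n = len(docs)
    # Probability that all given words co-occur in a document.
    p = lambda *ws: sum(all(w in d for w in ws) for d in docs) / n
    scores = []
    for wi, wj in combinations(top_words, 2):
        p_ij = p(wi, wj)
        pmi = math.log((p_ij + eps) / (p(wi) * p(wj) + eps))
        scores.append(pmi / -math.log(p_ij + eps))  # normalize to [-1, 1]
    return sum(scores) / len(scores)

# Toy reference corpus: perfectly co-occurring words score close to 1.
docs = [["topic", "model", "coherence"], ["neural", "topic", "model"],
        ["human", "judgment", "evaluation"]]
print(npmi_coherence(["topic", "model"], docs))
```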
June 30, 2021
Science rarely proceeds beyond what scientists can observe and measure, and sometimes what can be observed proceeds far ahead of scientific understanding. The twenty-first century offers such a moment in the study of human societies. A vastly larger share of behaviours is observed today than would have been imaginable at the close of the twentieth century. Our interpersonal communication, our movements and many of our everyday actions, are all potentially accessible for scientific research; sometimes through purposive instrumentation for scientific objectives (for example, satellite imagery), but far more often these objectives are, literally, an afterthought (for example, Twitter data streams). Here we evaluate the potential of this massive instrumentation—the creation of techniques for the structured representation and quantification—of human behaviour through the lens of scientific measurement and its principles. In particular, we focus on the question of how we extract scientific meaning from data that often were not created for such purposes. These data present conceptual, computational and ethical challenges that require a rejuvenation of our scientific theories to keep up with the rapidly changing social realities and our capacities to capture them. We require, in other words, new approaches to manage, use and analyse data.
April 26, 2021
As an integral component of public discourse, Twitter is among the main data sources for scholarship on public discourse. However, there is much that scholars do not know about the basic mechanisms of public discourse on Twitter, including the prevalence of various modes of communication, the types of posts users make, the engagement those posts receive, or how these vary with user demographics and across different topical events. This paper broadens our understanding of these aspects of public discourse. We focus on the first nine months of 2020, studying that period as a whole and giving particular attention to two monumentally important topics of that time: the Black Lives Matter movement and the COVID-19 pandemic. Leveraging a panel of 1.6 million Twitter accounts matched to U.S. voting records, we examine the demographics, activity, and engagement of 800,000 American adults who collectively posted nearly 300 million tweets during this time span. We find notable variation in user activity and engagement, in terms of modality (e.g., retweets vs. replies), demographic subgroup, and topical context. We further find that while Twitter can best be understood as a collection of interconnected publics, neither topical nor demographic variation perfectly encapsulates the "Twitter public." Rather, Twitter publics are fluid, contextual communities which form around salient topics and are informed by demographic identities. Altogether, this paper presents a disaggregated, multifaceted description of the demographics, activity, and engagement of American Twitter users in 2020.
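A minimal sketch of the kind of disaggregation described above: activity and engagement broken out by topic, tweet modality, and demographic subgroup. The file names and column schema are assumptions for illustration.

```python
import pandas as pd

tweets = pd.read_parquet("panel_tweets.parquet")
# assumed columns: user_id, modality ('original'|'retweet'|'reply'|'quote'),
#                  topic ('blm'|'covid'|'other'), likes, retweets
panel = pd.read_csv("panel_demographics.csv")  # user_id, age_group, race, party

df = tweets.merge(panel, on="user_id")
summary = (
    df.groupby(["topic", "modality", "race"])
      .agg(n_tweets=("user_id", "size"),      # activity: volume of posts
           n_users=("user_id", "nunique"),    # breadth: distinct posters
           mean_likes=("likes", "mean"))      # engagement received
      .reset_index()
)
print(summary.head())
```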
January 13, 2021
An individual’s issue preferences are non-separable when they depend on other issue outcomes (Lacy 2001a), presenting measurement challenges for traditional survey research. We extend this logic to the broader case of conditional preferences, in which policy preferences depend on the status of conditions that carry inherent uncertainty and are not necessarily policies themselves. We demonstrate new approaches for measuring conditional preferences in two large-scale survey experiments regarding the conditions under which citizens would support reopening schools in their communities during the COVID-19 pandemic. By drawing on recently developed methods at the intersection of machine learning and causal inference, we identify which citizens are most likely to have school reopening preferences that depend on additional considerations. The results highlight the advantages of using such approaches to measure conditional preferences, which represent an underappreciated and general phenomenon in public opinion.
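One simple instance of the machine-learning/causal-inference approach mentioned above is a T-learner for heterogeneous effects of a randomized condition; the paper's actual estimator may differ, and all variable names here are hypothetical.

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

svy = pd.read_csv("reopening_survey.csv")
X = svy[["age", "parent", "party_id", "covid_risk"]]  # respondent covariates
t = svy["condition"]  # 1 = scenario with the condition met, 0 = baseline scenario
y = svy["support"]    # stated support for reopening schools

# Fit separate outcome models within each randomized scenario.
m1 = GradientBoostingRegressor().fit(X[t == 1], y[t == 1])
m0 = GradientBoostingRegressor().fit(X[t == 0], y[t == 0])

# Respondent-level estimate of how much the condition shifts support.
svy["tau_hat"] = m1.predict(X) - m0.predict(X)

# Large |tau_hat| flags respondents whose preferences are strongly conditional.
most_conditional = svy.loc[svy["tau_hat"].abs().nlargest(10).index]
print(most_conditional[["party_id", "covid_risk", "tau_hat"]])
```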
January 8, 2021
While digital trace data from sources like search engines hold enormous potential for tracking and understanding human behavior, these streams of data lack information about the actual experiences of those individuals generating the data. Moreover, most current methods ignore or under-utilize human processing capabilities that allow humans to solve problems not yet solvable by computers (human computation). We demonstrate how behavioral research, linking digital and real-world behavior, along with human computation, can be utilized to improve the performance of studies using digital data streams. This study looks at the use of search data to track prevalence of Influenza-Like Illness (ILI). We build a behavioral model of flu search based on survey data linked to users’ online browsing data. We then utilize human computation for classifying search strings. Leveraging these resources, we construct a tracking model of ILI prevalence that outperforms strong historical benchmarks using only a limited stream of search data and lends itself to tracking ILI in smaller geographic units. While this paper only addresses searches related to ILI, the method we describe has potential for tracking a broad set of phenomena in near real-time.
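The tracking-model idea can be sketched as a regression of official ILI prevalence on a flu-related search signal plus an autoregressive term. The authors' model and features are richer than this; the file name and columns are assumptions.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

df = pd.read_csv("weekly_signals.csv")  # columns: week (ISO date), ili_rate, flu_searches
# flu_searches: volume of queries the human-computation step labeled flu-related.
df["ili_lag1"] = df["ili_rate"].shift(1)  # most recent already-published ILI rate
df = df.dropna()

features = ["flu_searches", "ili_lag1"]
train = df[df["week"] < "2018-01-01"]   # hypothetical train/test split
test = df[df["week"] >= "2018-01-01"]

model = LinearRegression().fit(train[features], train["ili_rate"])
print("held-out R^2:", model.score(test[features], test["ili_rate"]))
```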
September 15, 2020
The ongoing, fluid nature of the COVID-19 pandemic requires individuals to regularly seek information about best health practices, local community spreading, and public health guidelines. In the absence of a unified response to the pandemic in the United States and clear, consistent directives from federal and local officials, people have used social media to collectively crowdsource COVID-19 elites, a small set of trusted COVID-19 information sources. We take a census of COVID-19 crowdsourced elites in the United States who have received sustained attention on Twitter during the pandemic. Using a mixed methods approach with a panel of Twitter users linked to public U.S. voter registration records, we find that journalists, media outlets, and political accounts have been consistently amplified around COVID-19, while epidemiologists, public health officials, and medical professionals make up only a small portion of all COVID-19 elites on Twitter. We show that COVID-19 elites vary considerably across demographic groups, and that there are notable racial, geographic, and political similarities and disparities between various groups and the demographics of their elites. With this variation in mind, we discuss the potential for using the disproportionate online voice of crowdsourced COVID-19 elites to equitably promote timely public health information and mitigate rampant misinformation.
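A minimal sketch of what a "sustained attention" census might look like: accounts amplified above a threshold in many weeks. The thresholds, file name, and schema are illustrative assumptions, not the paper's operationalization.

```python
import pandas as pd

amp = pd.read_csv("covid_amplification.csv")  # columns: week, account, mentions

# Weekly attention matrix: accounts x weeks.
weekly = amp.pivot_table(index="account", columns="week",
                         values="mentions", fill_value=0)

# Top-100 most-amplified accounts in each week.
top_per_week = weekly.rank(ascending=False) <= 100

# "Sustained attention": in the weekly top 100 for at least half of all weeks.
elites = weekly.index[top_per_week.sum(axis=1) >= weekly.shape[1] / 2]
print(len(elites), "crowdsourced COVID-19 elites")
```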
August 28, 2020
The field of computational social science (CSS) has exploded in prominence over the past decade, with thousands of papers published using observational data, experimental designs, and large-scale simulations that were once infeasible or unavailable to researchers. These studies have greatly improved our understanding of important phenomena, ranging from social inequality to the spread of infectious diseases. The institutions supporting CSS in the academy have also grown substantially, as evidenced by the proliferation of conferences, workshops, and summer schools across the globe, across disciplines, and across sources of data. But the field has also fallen short in important ways. Many institutional structures around the field—including research ethics, pedagogy, and data infrastructure—are still nascent. We suggest opportunities to address these issues, especially in improving the alignment between the organization of the 20th-century university and the intellectual requirements of the field.
June 9, 2020
Socially impactful phenomena often occur across time and space – examples include pandemics, climate change, mass protests, and political campaigns. Survey data can play an important role in understanding these situations. Yet there is a tension: obtaining longitudinal, spatially expansive probability samples can be logistically and financially untenable. We offer a strategy that relies on (accessible) opt-in non-probability samples to obtain temporal and spatial granularity. It requires the use of multiple guardrail benchmarks (e.g., probability samples, administrative data) to validate a contemporaneous estimate, and acute attention to sampling representativeness and response validity. We demonstrate the utility of this approach with data on vaccination and infection rates during the COVID-19 pandemic in the U.S. We show that the opt-in non-probability data not only offer accurate estimates but also outperform problematic administrative benchmarks during certain time periods. We encourage further discussion about how to establish infrastructure to address emergent topics where probability samples are not readily available.
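One standard tool for the representativeness concern raised above is raking (iterative proportional fitting) of respondent weights to known population margins; a minimal sketch follows, with margins, file name, and columns as hypothetical placeholders.

```python
import pandas as pd

def rake(df, margins, n_iter=50):
    """Iterative proportional fitting. margins: {column: {category: population share}}.
    Assumes every target category appears in the data."""
    w = pd.Series(1.0, index=df.index)
    for _ in range(n_iter):
        for col, target in margins.items():
            current = w.groupby(df[col]).sum() / w.sum()   # weighted sample shares
            w *= df[col].map({k: target[k] / current[k] for k in target})
    return w / w.mean()

svy = pd.read_csv("optin_sample.csv")  # columns include age_group, region, vaccinated (0/1)
weights = rake(svy, {
    "age_group": {"18-44": 0.45, "45-64": 0.33, "65+": 0.22},
    "region": {"Northeast": 0.17, "Midwest": 0.21, "South": 0.38, "West": 0.24},
})
est = (svy["vaccinated"] * weights).sum() / weights.sum()
print(f"weighted vaccination estimate: {est:.1%}")
```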
May 19, 2020
Research at the intersection of machine learning and the social sciences has provided critical new insights into social behavior. At the same time, a variety of issues have been identified with the machine learning models used to analyze social data. These issues range from technical problems with the data used and features constructed, to problematic modeling assumptions, to limited interpretability, to the models' contributions to bias and inequality. Computational researchers have sought out technical solutions to these problems. The primary contribution of the present work is to argue that there is a limit to these technical solutions. At this limit, we must instead turn to social theory. We show how social theory can be used to answer basic methodological and interpretive questions that technical solutions cannot when building machine learning models, and when assessing, comparing, and using those models. In both cases, we draw on related existing critiques, provide examples of how social theory has already been used constructively in existing work, and discuss where other existing work may have benefited from the use of specific social theories. We believe this paper can act as a guide for computer and social scientists alike to navigate the substantive questions involved in applying the tools of machine learning to social data.