We study causal interaction in factorial experiments, in which several factors, each with multiple levels, are randomized to form a large number of possible treatment combinations. Examples of such experiments include conjoint analysis, which is often used by social scientists to analyze multidimensional preferences in a population. To characterize the structure of causal interaction in factorial experiments, we propose a new causal interaction effect, called the average marginal interaction effect (AMIE). Unlike the conventional interaction effect, the relative magnitude of the AMIE does not depend on the choice of baseline conditions, making its interpretation intuitive even for higher-order interactions. We show that the AMIE can be nonparametrically estimated using ANOVA regression with weighted zero-sum constraints. Because the AMIEs are invariant to the choice of baseline conditions, we directly regularize them by collapsing levels and selecting factors within a penalized ANOVA framework. This regularized estimation procedure reduces the false discovery rate and further facilitates interpretation. Finally, we apply the proposed methodology to the conjoint analysis of ethnic voting behavior in Africa and find clear patterns of causal interaction between politicians’ ethnicity and their prior records. The proposed methodology is implemented in an open-source software package.
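The baseline-invariance claim can be illustrated with a minimal sketch. It assumes a hypothetical 3-by-3 table of mean potential outcomes, uniform randomization of both factors, and a two-way AMIE defined as the average combination effect minus the two average marginal effects. That definition, the table `Y`, and the function names are assumptions made for illustration and are not taken from the abstract or from the software package it mentions.

```python
# Hypothetical mean potential outcomes: rows are levels of factor A,
# columns are levels of factor B. These numbers are illustrative only.
Y = [
    [1.0, 2.0, 4.0],
    [2.0, 5.0, 3.0],
    [6.0, 2.0, 2.0],
]
n_a, n_b = len(Y), len(Y[0])

# Marginal means under uniform randomization of the other factor.
row_mean = [sum(row) / n_b for row in Y]
col_mean = [sum(Y[a][b] for a in range(n_a)) / n_a for b in range(n_b)]


def conventional(a, b, a0, b0):
    """Conventional interaction effect of (a, b) relative to baseline (a0, b0)."""
    return Y[a][b] - Y[a0][b] - Y[a][b0] + Y[a0][b0]


def amie(a, b, a0, b0):
    """Assumed AMIE: combination effect minus the two average marginal effects."""
    combination = Y[a][b] - Y[a0][b0]
    ame_a = row_mean[a] - row_mean[a0]
    ame_b = col_mean[b] - col_mean[b0]
    return combination - ame_a - ame_b


def gap(effect, cell_1, cell_2, baseline):
    """Difference between the effects of two cells under the same baseline."""
    return effect(*cell_1, *baseline) - effect(*cell_2, *baseline)


# Compare the same two cells under two different baselines.
for baseline in [(0, 0), (2, 2)]:
    print(
        baseline,
        "AMIE gap:", gap(amie, (1, 1), (1, 2), baseline),
        "conventional gap:", gap(conventional, (1, 1), (1, 2), baseline),
    )
```

In this toy table the gap between the two AMIEs is the same under both baselines, whereas the gap between the two conventional interaction effects changes with the baseline.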
Although social scientists have long been interested in the process through which ideas and behavior diffuse, the identification of causal diffusion effects, also known as peer effects, remains challenging. Many scholars consider the commonly used assumption of no omitted confounders to be untenable due to contextual confounding and homophily bias. To address this long-standing identification problem, I introduce a class of stationary causal directed acyclic graphs (DAGs), which represent the time-invariant nonparametric causal structure. I first show that this stationary causal DAG implies a new statistical test that can detect a wide range of biases, including the two types mentioned above. The proposed test allows researchers to empirically assess the contentious assumption of no omitted confounders. In addition, I develop a difference-in-differences style estimator that can directly correct biases under an additional parametric assumption. Leveraging the proposed methods, I study the spatial diffusion of hate crimes in Germany. After correcting for the large upward bias in existing studies, I find that hate crimes diffuse only to areas that have a high proportion of school dropouts. To highlight the general applicability of the proposed approach, I also analyze the network diffusion of human rights norms. The proposed methodology is implemented in a forthcoming open-source software package.
How stable is support for radical right parties? In one view, radical right voters are antisystem voters, beyond capture by established parties. In another, they form frustrated issue publics, gravitating towards parties that represent their preferences. We evaluate these hypotheses in Germany, where the Alternative für Deutschland (AfD) is presently the largest opposition party. Using an original panel survey, we show that AfD voters resemble stable partisans with entrenched anti-establishment views. Yet this consistency does not simply reflect antisystem voting, but is also rooted in unchanging party-issue positioning: our experimental evidence reveals that many AfD voters change allegiances when established parties accommodate their preferences. Gridlocked party positioning thus gives rise to the “illusion” of radical right partisan stability. We further demonstrate that, while mainstream parties can attract radical right voters via restrictive immigration policies, they alienate their own voters in doing so, suggesting that the status quo is an equilibrium.
Scientists are interested in generalizing causal effects estimated in an experiment to a target population. However, analysts are often constrained by available covariate information, which has limited the applicability of existing approaches that assume rich covariate data from both experimental and population samples. As a concrete context, we focus on a large-scale development program, called the Youth Opportunities Program (YOP), in Uganda. Although more than 40 pre-treatment covariates are available in the experiment, only 8 of them were also measured in a target population. To tackle this common issue of data constraints, we propose a method to estimate a separating set – a set of variables affecting both the sampling mechanism and treatment effect heterogeneity – and show that the population average treatment effect (PATE) can be identified by adjusting for estimated separating sets. Our approach has two advantages. First, our algorithm only requires a rich set of covariates in the experimental data, not in the target population. Second, the algorithm can estimate separating sets under researcher-specific constraints on what variables are measured in the population. Using the YOP experiment, we find that the proposed algorithm can allow for estimation of the PATE in situations where conventional methods fail due to data requirements.
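What "adjusting for a separating set" can look like is sketched below under strong simplifying assumptions: the separating set is taken as already known and reduced to a single discrete stratum variable, and the adjustment is plain post-stratification. The function `poststratified_pate`, the strata, and the data are hypothetical. The abstract's algorithm for estimating the separating set is not shown here.

```python
from collections import defaultdict


def poststratified_pate(experiment, population_shares):
    """Reweight stratum-level experimental effects to the target population.

    experiment: iterable of (stratum, treated, outcome) tuples.
    population_shares: dict mapping stratum -> share in the target population.
    """
    # For each stratum and arm, accumulate [sum of outcomes, count].
    cells = defaultdict(lambda: {True: [0.0, 0], False: [0.0, 0]})
    for stratum, treated, outcome in experiment:
        cell = cells[stratum][bool(treated)]
        cell[0] += outcome
        cell[1] += 1

    pate = 0.0
    for stratum, share in population_shares.items():
        treated_sum, treated_n = cells[stratum][True]
        control_sum, control_n = cells[stratum][False]
        if treated_n == 0 or control_n == 0:
            raise ValueError(f"no treated or control units in stratum {stratum!r}")
        # Difference in means within the stratum, weighted by the population share.
        stratum_effect = treated_sum / treated_n - control_sum / control_n
        pate += share * stratum_effect
    return pate


# Hypothetical experiment: the effect is larger in the "urban" stratum.
experiment = [
    ("urban", True, 3.0), ("urban", True, 5.0),
    ("urban", False, 1.0), ("urban", False, 1.0),
    ("rural", True, 2.0), ("rural", True, 2.0),
    ("rural", False, 1.0), ("rural", False, 1.0),
]
# Hypothetical target population that is mostly rural.
population_shares = {"urban": 0.2, "rural": 0.8}

print(poststratified_pate(experiment, population_shares))
```

Because the hypothetical population is mostly rural, the reweighted estimate (roughly 1.4) falls below the unweighted experimental difference in means (2.0), which is the kind of discrepancy that generalization methods are meant to correct.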
New text-as-data techniques offer a great promise: the ability to inductively discover measures that are useful for testing social science theories of interest from large collections of text. We introduce a conceptual framework for making causal inferences with discovered measures as a treatment or outcome. Our framework enables researchers to discover high-dimensional textual interventions and estimate the ways that observed treatments affect text-based outcomes. We argue that nearly all text-based causal inferences depend upon a latent representation of the text, and we provide a framework to learn the latent representation. But estimating this latent representation, we show, creates new risks: we may introduce an identification problem or overfit. To address these risks, we describe a split-sample framework and apply it to estimate causal effects from an experiment on immigration attitudes and a study on bureaucratic response. Our work provides a rigorous foundation for text-based causal inferences.
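The split-sample idea can be sketched in a few lines, assuming a toy setting in which the text is the outcome, the "latent representation" is simply the presence of a single keyword, and discovery means picking the keyword whose usage differs most between arms. All names (`split_sample`, `discover_keyword`, `estimate_effect`) and the documents are hypothetical. The framework described in the abstract presumably uses much richer text models.

```python
import random


def split_sample(treated, seed=0):
    """Randomly split document indices into halves, separately within each arm."""
    rng = random.Random(seed)
    train, test = [], []
    for arm in (True, False):
        arm_idx = [i for i, t in enumerate(treated) if t == arm]
        rng.shuffle(arm_idx)
        half = len(arm_idx) // 2
        train.extend(arm_idx[:half])
        test.extend(arm_idx[half:])
    return train, test


def usage_rate(docs, idx, keyword):
    """Share of the documents in idx that contain the keyword."""
    if not idx:
        raise ValueError("empty set of documents")
    return sum(keyword in docs[i].split() for i in idx) / len(idx)


def discover_keyword(docs, treated, train_idx):
    """Toy discovery step: the word whose usage differs most between arms."""
    t_idx = [i for i in train_idx if treated[i]]
    c_idx = [i for i in train_idx if not treated[i]]
    vocab = sorted({w for i in train_idx for w in docs[i].split()})
    return max(
        vocab,
        key=lambda w: abs(usage_rate(docs, t_idx, w) - usage_rate(docs, c_idx, w)),
    )


def estimate_effect(docs, treated, test_idx, keyword):
    """Difference in keyword usage between arms, on held-out documents only."""
    t_idx = [i for i in test_idx if treated[i]]
    c_idx = [i for i in test_idx if not treated[i]]
    return usage_rate(docs, t_idx, keyword) - usage_rate(docs, c_idx, keyword)


# Hypothetical responses from a randomized experiment.
treated = [True, False] * 4
docs = ["border policy debate" if t else "economy policy debate" for t in treated]

train_idx, test_idx = split_sample(treated, seed=1)
keyword = discover_keyword(docs, treated, train_idx)          # discovery half
effect = estimate_effect(docs, treated, test_idx, keyword)    # estimation half
print(keyword, effect)
```

The measure is fixed using only the discovery half, so the estimate on the held-out half is not contaminated by the search over candidate measures, which is the overfitting risk the abstract points to.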
A landmark in election forensics was the 2012 PNAS paper by Klimek, Yegorov, Hanel and Thurner, which presents the first positive empirical model of election fraud: positive in the sense that the model describes what a fraudulent election looks like and then estimates the amount of fraud (of which there are two types) occurring in a particular election. The model also has the remarkable property of giving pretty much the same estimates regardless of the level of aggregation at which vote counts are observed. Being inspired by complex systems ideas, the Klimek model falls short from a statistical perspective. We modify the Klimek model to address these shortcomings, introducing chi-squared and finite mixture variants. The resulting models do not appear to be as invariant over levels of aggregation as the original Klimek model. We show that the election fraud probability estimates from the Klimek model (using the chi-squared variant) relate meaningfully to postelection complaints in the 2009 German election and to nullification petitions in the 2006 Mexican election. We also assess how well the fraud parameter estimates predict the complaints and petitions. The complaints and petitions are likely prompted in part by election fraud, so the estimated fraud parameters should relate meaningfully and predictably to them. We also show that the model is sensitive to the strategies voters are using, at least in Germany. So what the “fraud statistics” measure may be ambiguous.