The intersection of causal inference and machine learning is a rapidly advancing field. We propose a new approach, the method of direct estimation, that draws on both traditions in order to obtain nonparametric estimates of treatment effects. The approach focuses on estimating the effect of fluctuations in a treatment variable on an outcome. A tensor-spline implementation enables rich interactions between functional bases, allowing the approach to capture treatment/covariate interactions. We show how recent innovations in Bayesian sparse modeling readily handle the proposed framework, and then document its performance in simulation and applied examples. Furthermore, we show how the method of direct estimation extends easily to structural estimators commonly used in a variety of disciplines, such as instrumental variables, mediation analysis, and sequential g-estimation.
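The tensor-spline idea can be illustrated with a stylized sketch. Below, a simple power basis stands in for the spline basis, and all data and names (`basis`, `design`) are illustrative rather than from the paper: the row-wise tensor product of the treatment and covariate bases gives a linear model enough flexibility to capture a treatment/covariate interaction.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 300
t = rng.uniform(-1, 1, size=n)                  # treatment variable
x = rng.uniform(-1, 1, size=n)                  # covariate

def basis(v, degree=3):
    # power basis as a simple stand-in for a spline basis
    return np.vander(v, degree + 1, increasing=True)

Bt, Bx = basis(t), basis(x)
# row-wise tensor product: every treatment basis function interacted
# with every covariate basis function (16 columns here)
design = np.einsum('ij,ik->ijk', Bt, Bx).reshape(n, -1)

# outcome whose treatment effect varies with the covariate
y = np.sin(2 * t) * (1 + x) + rng.normal(scale=0.1, size=n)
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
fitted = design @ coef
r2 = 1 - ((y - fitted) ** 2).sum() / ((y - y.mean()) ** 2).sum()
```

A full tensor product grows quickly with the number of basis functions, which is why sparse priors over the coefficients become attractive in this setting.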
We introduce a method for scaling two data sets from different sources. The proposed method estimates a latent component common to both data sets as well as an idiosyncratic component unique to each. The scaled locations can be modeled as a function of covariates, and an efficient implementation allows for inference through resampling methods. A simulation study shows that our proposed method outperforms existing alternatives in recovering latent dimensions. We employ the method to recover the latent policy positions of Federal Open Market Committee (FOMC) members around the financial crisis of 2008, using both their speeches and the policy recommendations they made in monetary policy meetings.
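The common/idiosyncratic decomposition can be illustrated with a stylized stand-in (not the paper's estimator): simulate two sources driven by one shared latent dimension, then recover that dimension from the leading left singular vector of the stacked, column-standardized matrices.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p1, p2 = 100, 40, 30
theta = rng.normal(size=n)                      # latent positions shared by both sources
Y1 = np.outer(theta, rng.normal(size=p1)) + rng.normal(size=(n, p1))
Y2 = np.outer(theta, rng.normal(size=p2)) + rng.normal(size=(n, p2))

# stack the column-standardized sources; the leading left singular
# vector estimates the dimension common to both
Y = np.hstack([(Y1 - Y1.mean(0)) / Y1.std(0),
               (Y2 - Y2.mean(0)) / Y2.std(0)])
U, s, Vt = np.linalg.svd(Y, full_matrices=False)
common = U[:, 0]

r = np.corrcoef(common, theta)[0, 1]            # sign is arbitrary
```

The residual variation in each source, after removing the common component, plays the role of the idiosyncratic components in the model.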
We introduce a unified formal and statistical framework for combining vote data and text data. Formally, we model vote choice and word choice in terms of a common set of underlying preference parameters. Statistically, we implement a method for recovering these preference and location parameters. The method estimates the number of underlying ideological dimensions, models zero inflation, and is robust to extreme outliers. We apply the method to roll-call votes and floor speeches from recent US Senates. We find two stable dimensions, one ideological and the other capturing leadership. We then show how the method can leverage common speech in order to impute missing data, to estimate rank-and-file ideal points using only their words and the vote history of party leaders, and even to scale newspaper editorials.
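The imputation claim admits a hedged sketch: if words and votes share the same underlying preferences, a map from word usage to the known positions of a few "leaders" can impute positions for everyone else. The ridge regression and simulated data below are illustrative stand-ins, not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(7)
n, V = 80, 200
ideal = rng.normal(size=n)                      # true preference parameters
loadings = rng.normal(size=V)
# word usage driven by the same underlying preferences as votes
words = np.outer(ideal, loadings) + rng.normal(size=(n, V))

# positions are "known" (say, from vote histories) only for ten leaders;
# learn a word-to-position map on them, then impute everyone else
leaders = np.arange(10)
W = words - words.mean(0)
lam = 10.0                                      # illustrative ridge penalty
A = W[leaders]
coef = np.linalg.solve(A.T @ A + lam * np.eye(V), A.T @ ideal[leaders])
imputed = W @ coef

r = np.corrcoef(imputed, ideal)[0, 1]
```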
We introduce a Bayesian method, LASSOplus, that unifies recent contributions in the sparse modeling literatures, while substantially extending pre-existing estimators in terms of both performance and flexibility. Unlike existing Bayesian variable selection methods, LASSOplus both selects and estimates effects while returning estimated confidence intervals for discovered effects. Furthermore, we show how LASSOplus easily extends to modeling repeated observations and permits a simple Bonferroni correction to control coverage on confidence intervals among discovered effects. We situate LASSOplus in the literature on how to estimate subgroup effects, a topic that often leads to a proliferation of estimation parameters. We also offer a simple preprocessing step that draws on recent theoretical work to estimate higher-order effects that can be interpreted independently of their lower-order terms. A simulation study illustrates the method’s performance relative to several existing variable selection methods. In addition, we apply LASSOplus to an existing study on public support for climate treaties to illustrate the method’s ability to discover substantive and relevant effects. Software implementing the method is publicly available in the R package sparsereg.
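The Bonferroni step can be sketched directly: among the m discovered effects, re-form each interval at level 1 − α/m. The "posterior draws" below are simulated stand-ins, not output from LASSOplus or sparsereg.

```python
import numpy as np

rng = np.random.default_rng(6)
# simulated "posterior draws" for five candidate effects (two truly nonzero)
draws = rng.normal(loc=[1.5, -2.0, 0.0, 0.0, 0.0], scale=0.4, size=(4000, 5))

alpha = 0.05
# discovery rule: the ordinary 95% interval excludes zero
lo, hi = np.percentile(draws, [100 * alpha / 2, 100 * (1 - alpha / 2)], axis=0)
discovered = (lo > 0) | (hi < 0)
m = int(discovered.sum())

# Bonferroni correction: re-form intervals at level 1 - alpha/m
# among the m discovered effects
lo_b, hi_b = np.percentile(draws[:, discovered],
                           [100 * alpha / (2 * m), 100 * (1 - alpha / (2 * m))],
                           axis=0)
```

The corrected intervals are wider than the uncorrected ones, which is the price of controlling coverage among discoveries.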
Marginal structural models (MSMs) are becoming increasingly popular as a tool for causal inference from longitudinal data. Unlike standard regression models, MSMs can adjust for time-dependent observed confounders while avoiding the bias due to the direct adjustment for covariates affected by the treatment. Despite their theoretical appeal, a main practical difficulty of MSMs is the required estimation of inverse probability weights. Previous studies have found that MSMs can be highly sensitive to misspecification of the treatment assignment model even when the number of time periods is moderate. To address this problem, we generalize the covariate balancing propensity score (CBPS) methodology of Imai and Ratkovic to longitudinal analysis settings. The CBPS estimates the inverse probability weights such that the resulting covariate balance is improved. Unlike the standard approach, the proposed methodology incorporates all covariate balancing conditions across multiple time periods. Since the number of these conditions grows exponentially as the number of time periods increases, we also propose a low-rank approximation to ease the computational burden. Our simulation and empirical studies suggest that the CBPS significantly improves the empirical performance of MSMs by making the treatment assignment model more robust to misspecification. Open-source software is available for implementing the proposed methods.
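For intuition, here is a minimal sketch of the standard (non-CBPS) inverse probability weights for an MSM, on simulated data: fit a treatment model at each period, take the inverse probability of the treatment actually received, multiply across periods, and put marginal treatment shares in the numerator for stabilization. The helper `fit_logit` is an illustrative Newton-Raphson routine, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(5)
n, periods = 1000, 3

def fit_logit(X, y, iters=25):
    # Newton-Raphson logistic regression (illustrative helper)
    b = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-X @ b))
        H = X.T @ ((p * (1 - p))[:, None] * X) + 1e-8 * np.eye(X.shape[1])
        b += np.linalg.solve(H, X.T @ (y - p))
    return b

L = rng.normal(size=(n, periods))               # time-varying confounder
A = np.zeros((n, periods))                      # treatment history
w = np.ones(n)
for t in range(periods):
    if t > 0:
        L[:, t] += 0.5 * A[:, t - 1]            # confounder affected by past treatment
    A[:, t] = rng.binomial(1, 1 / (1 + np.exp(-0.3 * L[:, t])))
    X = np.column_stack([np.ones(n), L[:, t]])
    p = 1 / (1 + np.exp(-X @ fit_logit(X, A[:, t])))
    # denominator: modeled probability of the treatment actually received
    w *= np.where(A[:, t] == 1, 1 / p, 1 / (1 - p))

# stabilized weights: marginal treatment shares in the numerator
num = np.where(A == 1, A.mean(0), 1 - A.mean(0)).prod(axis=1)
w_stab = num * w
```

With only three periods this is manageable; the combinatorial growth of balancing conditions the abstract describes is what motivates the low-rank approximation.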
The propensity score plays a central role in a variety of causal inference settings. In particular, matching and weighting methods based on the estimated propensity score have become increasingly common in the analysis of observational data. Despite their popularity and theoretical appeal, the main practical difficulty of these methods is that the propensity score must be estimated. Researchers have found that slight misspecification of the propensity score model can result in substantial bias of estimated treatment effects. We introduce the covariate balancing propensity score (CBPS) methodology, which models treatment assignment while optimizing the covariate balance. The CBPS exploits the dual characteristics of the propensity score as a covariate balancing score and the conditional probability of treatment assignment. The estimation of the CBPS is done within the generalized method-of-moments or empirical likelihood framework. We find that the CBPS dramatically improves the poor empirical performance of propensity score matching and weighting methods reported in the literature. We also show that the CBPS can be extended to other important settings, including the estimation of the generalized propensity score for non-binary treatments and the generalization of experimental estimates to a target population. Open-source software is available for implementing the proposed methods.
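A minimal sketch of the just-identified case, assuming a logistic propensity model and simulated data: choose the coefficients so that the inverse-probability-weighted covariate means balance exactly. Here the moment conditions are solved with `scipy.optimize.root` rather than the paper's GMM/empirical likelihood machinery.

```python
import numpy as np
from scipy.optimize import root

rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 3))
X1 = np.column_stack([np.ones(n), X])           # covariates with intercept
beta_true = np.array([0.2, 0.8, -0.5, 0.3])
T = rng.binomial(1, 1 / (1 + np.exp(-X1 @ beta_true)))

def balance_moments(beta):
    # inverse-probability-weighted covariate balance conditions
    pi = np.clip(1 / (1 + np.exp(-X1 @ beta)), 1e-6, 1 - 1e-6)
    w = T / pi - (1 - T) / (1 - pi)
    return (w[:, None] * X1).mean(axis=0)

# just-identified CBPS: pick beta so the weighted covariate means balance
sol = root(balance_moments, np.zeros(4))
imbalance_before = np.linalg.norm(balance_moments(np.zeros(4)))
imbalance_after = np.linalg.norm(balance_moments(sol.x))
```

The key design choice is that balance is imposed as an estimating equation rather than checked after the fact, which is what makes the weights robust to mild misspecification.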
When evaluating the efficacy of social programs and medical treatments using randomized experiments, the estimated overall average causal effect alone is often of limited value, and researchers must investigate when the treatments do and do not work. Indeed, the estimation of treatment effect heterogeneity plays an essential role in (1) selecting the most effective treatment from a large number of available treatments, (2) ascertaining subpopulations for which a treatment is effective or harmful, (3) designing individualized optimal treatment regimes, (4) testing for the existence or lack of heterogeneous treatment effects, and (5) generalizing causal effect estimates obtained from an experimental sample to a target population. In this paper, we formulate the estimation of heterogeneous treatment effects as a variable selection problem. We propose a method that adapts the Support Vector Machine classifier by placing separate sparsity constraints over the pre-treatment parameters and causal heterogeneity parameters of interest. The proposed method is motivated by and applied to two well-known randomized evaluation studies in the social sciences. Our method selects the most effective voter mobilization strategies from a large number of alternative strategies, and it also identifies the characteristics of workers who greatly benefit from (or are negatively affected by) a job training program. In our simulation studies, we find that the proposed method often outperforms some commonly used alternatives.
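The separate-sparsity idea can be sketched with a doubly penalized lasso solved by proximal gradient descent, standing in for the paper's Support Vector Machine formulation: one penalty level for the pre-treatment block, another for the treatment-heterogeneity block. All data and penalty values below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 500, 5
X = rng.normal(size=(n, p))
T = rng.binomial(1, 0.5, size=n)
# one prognostic covariate; the treatment effect varies with X[:, 0]
y = X[:, 2] + T * (0.5 + 1.5 * X[:, 0]) + rng.normal(scale=0.5, size=n)

# design: pre-treatment block [1, X]; heterogeneity block [T, T*X]
Z = np.hstack([np.ones((n, 1)), X, T[:, None], T[:, None] * X])
lam = np.concatenate([np.zeros(1),              # unpenalized intercept
                      np.full(p, 0.05),         # pre-treatment penalty
                      np.full(p + 1, 0.02)])    # separate heterogeneity penalty

# proximal gradient (ISTA) for the doubly penalized lasso
L = np.linalg.eigvalsh(Z.T @ Z / n).max()       # Lipschitz constant of the gradient
beta = np.zeros(Z.shape[1])
for _ in range(3000):
    b = beta - (Z.T @ (Z @ beta - y) / n) / L
    beta = np.sign(b) * np.maximum(np.abs(b) - lam / L, 0.0)

het = beta[p + 1:]                              # coefficients on [T, T*x1, ..., T*x5]
```

Separating the penalties matters because the pre-treatment terms are nuisance structure: they can be regularized aggressively without shrinking away the causal heterogeneity parameters of interest.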
Many social processes are stable and smooth in general, punctuated by discrete jumps. We develop a sequential segmentation spline method that can identify both the number and location of discontinuities in a series of observations with a time component, while fitting a smooth spline between jumps, using a modified Bayesian Information Criterion statistic as a stopping rule. We explore the method in a large-n, unbalanced panel setting with George W. Bush's approval data; in a small-n time series with median DW-NOMINATE scores for each Congress over time; and in a series of simulations. We compare the method to several extant smoothers, and it performs favorably in terms of visual inspection, residual properties, and event detection. Finally, we discuss extensions of the method.
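A hedged sketch of the sequential logic, with piecewise-constant means standing in for the smooth splines and a plain BIC standing in for the modified statistic: greedily add the breakpoint that most reduces squared error, and stop when BIC no longer improves.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 120
# level series with one discrete jump at t = 60
y = np.where(np.arange(n) < 60, 0.0, 2.0) + rng.normal(scale=0.4, size=n)

def fit_cost(breaks):
    # total squared error of a piecewise-constant fit between breakpoints
    edges = [0] + list(breaks) + [n]
    return sum(((y[a:b] - y[a:b].mean()) ** 2).sum()
               for a, b in zip(edges, edges[1:]))

def bic(breaks):
    k = len(breaks)
    # each break adds a mean parameter and a location parameter
    return n * np.log(fit_cost(breaks) / n) + (2 * k + 1) * np.log(n)

# sequential segmentation: greedily add the best break until BIC stops improving
breaks = []
while True:
    cost, t_star = min((fit_cost(sorted(breaks + [t])), t)
                       for t in range(2, n - 1) if t not in breaks)
    candidate = sorted(breaks + [t_star])
    if bic(candidate) >= bic(breaks):
        break
    breaks = candidate
```

The BIC penalty is what keeps the greedy search from chasing noise: each additional break must buy enough fit improvement to pay for its two extra parameters.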