I am currently the Henry Putnam University Professor of Sociology here at Princeton University. I am also an Adjunct Professor of Community and Preventative Medicine at the Mount Sinai School of Medicine; a Research Associate at the National Bureau of Economic Research; and I serve in a pro bono capacity as the Dean of Arts and Sciences at the University of the People.
My sociology doctoral thesis--written eons ago--explored the role of parental net worth in perpetuating racial inequality among the post-Civil Rights cohorts of black and white Americans. It was not until 1984 that multiple nationally representative U.S. social science surveys, such as the Panel Study of Income Dynamics (PSID) and the Survey of Income and Program Participation (SIPP), started collecting decent data on households' assets and debts. This allowed for a more complete picture of total family economic resources than just measuring income, occupation and education. In fact, when thrown all together in a model, parental income and occupation did not seem to matter to children's life chances. Only parental education and net worth retained predictive power. This observation was especially relevant to race dynamics in the U.S., since data showed that the median black family had an order of magnitude less wealth than the typical white family--and that income differences explained only about half this gap. Indeed, when comparing young adults who came from families with the same parental education and wealth levels, there were no observable racial gaps in educational attainment, welfare usage, labor market attachment and so on. The effects of family structure (such as single parenthood), and not just those of race, also seemed to be proxies for wealth effects.
As I was turning my thesis into a book (Being Black, Living in the Red), I came across What Money Can’t Buy: Family Income and Children’s Life Chances by Susan Mayer of the University of Chicago Harris School. Her book challenged my assumptions and altered my research trajectory forever. In this clever volume, Mayer deploys a number of counterfactuals and natural experiments to show that the traditional estimates of the effect of income on children’s life chances have been grossly overstated. For example, she showed that a dollar from a transfer payment had little to no effect on children while a dollar from earnings had a much bigger effect—suggesting that it was the underlying attributes of the parents that led them to earn money that were having the positive effect, not the dollars per se. Further, controlling for parental income after the fact—i.e. when the offspring was already an adult—wiped out the effect of parental income that temporally preceded the child measure in question. She also showed that additional income did not usually result in the purchase of goods or services that we would expect to improve the human capital or life chances of children. While there are certainly limitations to her work and some questionable assumptions in her models, she upended the world of poverty research as far as I was concerned, lifting a methodological veil to reveal biased parameter estimates plaguing the literature.
While I went on to publish my book with the appropriate warnings against interpretation of parental wealth “effects” as causal, the Mayer work sent me off in search of a correctly specified way to assess the impact of parental resources and family conditions on children’s outcomes. This journey led me first to econometrics and labor economics, which I viewed as well ahead of sociology in confronting the issues of endogeneity and selection bias. Though I found difference-in-differences, instrumental variable, and regression discontinuity approaches helpful in generating more consistent estimates, such approaches all suffered from the limitation that the researcher had to take what she could get. In other words, the research questions that one could answer were driven by the availability of a natural experiment. There is, as far as I know, no good instrumental variable for parental wealth, for example. There is no regression discontinuity for race. Even if we considered randomized controlled trials (RCTs), there remained well-documented issues such as limited external validity, as pointed out by Angus Deaton and others. What's more, reliance on RCTs or natural experiments puts narrow boundaries on the sorts of factors that are manipulable and therefore amenable to being studied in a causal, counterfactual framework. To quote Penn sociologist Herb Smith, “Nobody denies that the moon causes the tides even though we can’t perform an experiment on it.”
This frustration, in turn, led me to study genetics. (I received a Ph.D. in biology [genomics] from NYU in 2014.) The recent addition of genetic markers (single nucleotide polymorphisms or SNPs) to large datasets such as the Health and Retirement Study, the National Longitudinal Study of Adolescent to Adult Health and the Wisconsin Longitudinal Study has opened up a new frontier for the social sciences. We now enjoy the possibility of directly confronting, measuring and controlling for one of the two main "lurking" variables that bias traditional models of socioeconomic attainment. That lurking variable is, of course, genetic endowment. (The other is the influence of cultural practices that are also transmitted across generations.) In my current research, I attempt to model directly what has until now been a latent variable in models of socioeconomic attainment. By constructing and including polygenic scores (PGSs) for outcomes, we can obtain better-specified, less biased parameter estimates for the variables (such as parental education) that typically interest social scientists. Further, we can then interact genetic propensity with exogenous environmental variables to go from the adage "a gene for aggression lands you in the boardroom if you are to the manor born but in prison if you're from the ghetto" to a robust research agenda on GxE effects. I believe this is one of the next frontiers in stratification research: integrating the big data of genomics with the established social scientific models of mobility.
The collection of these data and advances in econometric methods represent a major potential shift for the social sciences as they broaden and deepen the study of the transmission of social behaviors. To date, modeling genetic influences on social outcomes among humans has been mainly the province of behavioral geneticists, who have relied on adoption studies and MZ v. DZ twin comparisons in order to quantify the degree of (unmeasured) genetic influence on behavioral phenotypes. These studies are controversial and the assumptions underlying them have been questioned (e.g., Goldberger 1979). The shift to the study of specific genetic markers offers hope for those interested in an explicit research program aimed at specifying and measuring gene-specific effects for complex traits such as behavioral phenotypes (Manski 2011).
The first payoff of actual, measured genotypes for mobility research is in the specification of proper models for traditional variables. While replicable “hits” for individual markers at the genome-wide significance level (5×10⁻⁸) for behavioral outcomes have been few and far between thanks to power problems, researchers have had more success generating overall polygenic scores, which progressively add the coefficients associated with genetic markers into a predictive index that performs fairly well in terms of R². By controlling for PGSs, we can observe what happens to the “traditional” variables in mobility research such as parental education, income, occupation or wealth. (Ideally, with some datasets we can control for both parents’ genotypes in addition to the offspring genotype.)
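The logic of this paragraph can be illustrated with a minimal sketch on simulated data. Everything here is an assumption made for illustration: the function names `polygenic_score` and `ols`, the marker weights, the allele frequency, and the effect sizes are hypothetical and are not drawn from any of the surveys named above. The score is computed as a coefficient-weighted sum of allele counts, and the coefficient on a "traditional" variable (parental education) is compared with and without the PGS as a control.

```python
import numpy as np


def polygenic_score(dosages, weights):
    """Weighted sum of allele counts (0/1/2) per person, standardized to mean 0, SD 1."""
    dosages = np.asarray(dosages, dtype=float)
    weights = np.asarray(weights, dtype=float)
    raw = dosages @ weights
    return (raw - raw.mean()) / raw.std()


def ols(y, *columns):
    """Ordinary least squares with an intercept; returns [intercept, slope_1, ...]."""
    X = np.column_stack([np.ones(len(y)), *columns])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


rng = np.random.default_rng(0)
n_people, n_markers = 5000, 200

# Hypothetical per-marker coefficients, standing in for published association results.
weights = rng.normal(0.0, 0.05, n_markers)
# Simulated genotypes: number of effect alleles at each marker.
genotypes = rng.binomial(2, 0.3, size=(n_people, n_markers))
pgs = polygenic_score(genotypes, weights)

# Assumed data-generating process: parental education is correlated with the
# child's PGS, and both affect the outcome (true education effect = 0.2).
parent_edu = 0.5 * pgs + rng.normal(size=n_people)
outcome = 0.2 * parent_edu + 0.4 * pgs + rng.normal(size=n_people)

naive = ols(outcome, parent_edu)          # PGS omitted
adjusted = ols(outcome, parent_edu, pgs)  # PGS included as a control

print("education coefficient, PGS omitted: ", round(naive[1], 3))
print("education coefficient, PGS included:", round(adjusted[1], 3))
```

In this simulation the unadjusted education coefficient is inflated because the omitted score is correlated with both education and the outcome. With real data the score is itself a noisy measure, so adjustment would be expected to remove only part of any such bias.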
In addition to generating genetically less biased effects of social variables that were previously resistant to econometric techniques such as IV regression, the move to studying SNPs and other genetic polymorphisms has opened up a particularly promising research program on gene-by-(social) environment (GxE) interactions in human populations. The estimation of such interaction effects has long been a goal of social scientists fond of expressing the dependence of genetic expression on social structure. Caspi et al. (2002, 2003) suggested an important genetic source of heterogeneity in responses to adverse early-life events, attempting to partially answer the question of why some individuals are resilient to stressors while others suffer deleterious psychological sequelae. While these studies created substantial interest in potential gene-by-environment interactions, they also required replication and extension by other researchers using alternative data. Indeed, there are now competing meta-analyses suggesting that the original results linking differential response to stress by genotype either are reasonably robust (Karg et al. 2011) or lack consistent supporting replication (Risch et al. 2009).
The discussion generated by this line of research in the biological and social science communities has been productive because it has led to a greater appreciation of the shortcoming of Caspi et al.’s research design, namely that the alleles and the proposed environmental modifiers may not be randomly assigned in the population and may therefore correlate with unobserved causal factors. For example, it may be the case that an observed interaction between a genetic variant and environmental exposure actually reflects differential risk of exposure (e.g., “genes selecting environments”) rather than the genetic moderation of exogenous environmental exposures. This is known as gene-environment correlation (rGE). In this way, measured environments may be correlated with unmeasured genetic variation and thus could be acting as a proxy for a gene-by-gene interaction rather than a GxE interaction. Conversely, early studies of candidate genes left open the possibility that the measured genotypes were themselves serving as proxies for unmeasured environments, leading to ExE effects disguised as GxE.
While there is active work enrolling individuals in RCTs and examining genetic heterogeneity of causal effects to help solve these inference problems, this is only a small area of research (typically in pharmacology or toxicology) and likely does not have the capacity to answer many important GxE questions of broader relevance to social science. Many social interventions occur on a large scale, such as state soda taxation, federal alcohol access policies (e.g. the Minimum Legal Drinking Age of 21), and federal guidelines for clinical care. Consequently, only large epidemiological and social science data and methods, combined with genetic and biomarker measures, can be deployed to examine broad questions of major policy import, such as why some interventions (like sin taxes) work for certain individuals and sub-populations and not for others, or why certain pedagogical styles (whole word v. phonics language instruction, for example) are more or less effective with given student populations.
Thus, my current work applies econometric methods for causal inference--namely, a natural experiment framework--to genome-wide data available in social surveys to model gene-by-environment interaction effects. Examples in this vein include deploying the Vietnam draft lottery, twin differences in birth weight, exogenous job loss (such as plant closure), and sibling differences in genotype (polygenic scores) to questions of health, development and socioeconomic attainment across the life course. I am also interested in mapping the genetic architecture of phenotypic plasticity, interrogating the assumptions underlying models for heritability, and characterizing social and genetic sorting (e.g., assortative mating and differential fertility) as distinct processes.
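To make the natural experiment logic concrete, here is a minimal sketch on simulated data; it is an illustration under stated assumptions, not the specification used in any of the studies mentioned. The function name `simulate_gxe`, the sample size, and all effect sizes are hypothetical. The key assumption is that the binary exposure is assigned independently of genotype (as in a lottery), so the coefficient on the product term can be read as genetic moderation of the exposure rather than as gene-environment correlation.

```python
import numpy as np


def simulate_gxe(n=20000, seed=1):
    """Simulate an outcome with a PGS main effect, an exposure main effect,
    and a PGS-by-exposure interaction, then estimate all three by OLS."""
    rng = np.random.default_rng(seed)

    pgs = rng.normal(size=n)              # standardized polygenic score (simulated)
    exposed = rng.integers(0, 2, size=n)  # exogenous 0/1 exposure, independent of pgs

    # Assumed true effects (hypothetical): 0.30 for PGS, -0.20 for exposure,
    # 0.15 for the interaction.
    y = 0.30 * pgs - 0.20 * exposed + 0.15 * pgs * exposed + rng.normal(size=n)

    # Design matrix: intercept, PGS, exposure, and their product.
    X = np.column_stack([np.ones(n), pgs, exposed, pgs * exposed])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)

    # Because exposure was randomized, it should be nearly uncorrelated with the PGS.
    balance = np.corrcoef(pgs, exposed)[0, 1]
    return beta, balance


beta, balance = simulate_gxe()
print("PGS main effect:      ", round(beta[1], 3))
print("exposure main effect: ", round(beta[2], 3))
print("GxE interaction:      ", round(beta[3], 3))
print("corr(PGS, exposure):  ", round(balance, 3))
```

The balance check at the end is the part that an observational exposure would typically fail: if genotype predicted exposure, the interaction coefficient could no longer be interpreted as moderation of an exogenous environment.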
 I should note that subsequent experimental research, such as the American Dream Demonstration project and the Oklahoma SEED study, has shown very minor to modest effects of wealth, highlighting the bias inherent in bread-and-butter regressions such as those I ran at the beginning of my career.
 Similar efforts are also underway in Europe, for example with the Biobank Project in the United Kingdom (Ollier et al., 2005) and large-scale genotyping of subjects at several European twin registries.