Recent advances in research tools for the systematic analysis oftextual data are enabling exciting new research throughout the socialsciences. For comparative politics scholars who are often interestedin non-English and possibly multilingual textual datasets, theseadvances may be difficult to access. This paper discusses practicalissues that arise in the the processing, management, translation andanalysis of textual data with a particular focus on how proceduresdiffer across languages. These procedures are combined in two appliedexamples of automated text analysis using the recently introducedStructural Topic Model. We also show how the model can be used toanalyze data that has been translated into a single language viamachine translation tools. All the methods we describe here are implemented in open-source software packages available from the authors.
Content analysis, a widely-applied social science research method, is increasingly being supplemented by topic modeling. However, while the discourse on content analysis centers heavily on reproducibility, computer scientists often focus more on scalability and less on coding reliability, leading to growing skepticism on the usefulness of topic models for automated content analysis. In response, we introduce TopicCheck, an interactive tool for assessing topic model stability. Our contributions are threefold. First, from established guidelines on reproducible content analysis, we distill a set of design requirements on how to computationally assess the stability of an automated coding process. Second, we devise an interactive alignment algorithm for matching latent topics from multiple models, and enable sensitivity evaluation across a large number of models. Finally, we demonstrate that our tool enables social scientists to gain novel insights into three active research questions.
Dealing with the vast quantities of text that students generate in a Massive Open Online Course (MOOC) is a daunting challenge. Computational tools are needed to help instructional teams uncover themes and patterns as MOOC students write in forums, assignments, and surveys. This paper introduces to the learning analytics community the Structural Topic Model, an approach to language processing that can (1) find syntactic patterns with semantic meaning in unstructured text, (2) identify variation in those patterns across covariates, and (3) uncover archetypal texts that exemplify the documents within a topical pattern. We show examples of computationally- aided discovery and reading in three MOOC settings: mapping students’ self-reported motivations, identifying themes in discussion forums, and uncovering patterns of feedback in course evaluations.