101 Days of Evaluation: August 25, 2020

Another resurrected post from the past!

Lately, I’ve been focused on better ways to synthesize the academic literature. Even when I was a graduate student (and the job description was pretty much 50% reading), I couldn’t keep up with new articles that were coming out seemingly daily. For people who don’t have the luxury of dedicated reading time, like those who actually practice evaluation, it’s nearly impossible to read the latest articles. It certainly doesn’t help that academic articles are generally jargon-filled and written in indecipherable prose.

To get around this problem, I’ve started gathering and synthesizing the evaluation literature using techniques from Natural Language Processing (NLP). In future posts, I’ll actually detail the mechanics of this process, but for now, let’s dive into the results.

I pulled all the recently published articles from Evaluation, Evaluation and the Health Professions, Evaluation and Program Planning, Evaluation Review, Research Evaluation, The Canadian Journal of Program Evaluation, The American Journal of Evaluation, The Journal of Multidisciplinary Evaluation, and New Directions for Evaluation. Obviously, I’ve left some things out, like a lot of the education journals, but this seemed like a good place to start. In the past 101 days, there have been 144 (!) new articles. That would be a lot to read. That would be a lot even to skim!
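I’ll detail the full mechanics in a future post, but for the curious, here’s a rough sketch of one way the gathering step could work, assuming each journal exposes an RSS feed. The feed URLs below are placeholders, not my actual sources:

```python
# Rough sketch: pull recent articles from journal RSS feeds.
# The feed URLs are placeholders; swap in real ones.
from datetime import datetime, timedelta
import feedparser  # pip install feedparser

FEED_URLS = [
    "https://journals.example.com/evaluation/rss",  # placeholder URL
    "https://journals.example.com/ajpe/rss",        # placeholder URL
]
cutoff = datetime.now() - timedelta(days=101)  # the last 101 days

articles = []
for url in FEED_URLS:
    feed = feedparser.parse(url)
    for entry in feed.entries:
        published = datetime(*entry.published_parsed[:6])
        if published >= cutoff:
            articles.append({"title": entry.title, "abstract": entry.summary})

print(f"{len(articles)} new articles in the last 101 days")
```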

Network diagram of word pairs

This first figure is a network diagram of the 30 most frequently occurring word pairs. These are the phrases that occur most often across all the articles. The thicker the line, the more frequently a pair occurs. I did some pre-processing first, so many words are reduced to their “stem” form (e.g., “mixed” becomes “mix”). So the expected relationships show up, like “mix[ed] method” and “evidence base.” There are some other interesting concepts, though, like “dissemination readiness,” “situational awareness,” and “cultural narratives.”
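If you’re curious how a figure like this can be built, here’s a minimal sketch using NLTK for stemming and networkx for the graph. The stop-word filtering and layout choices are illustrative assumptions, not necessarily the exact preprocessing behind the figure:

```python
# Sketch: count stemmed word pairs (bigrams) and draw the top 30 as a network.
# Assumes `articles` from the sketch above; requires nltk data:
#   nltk.download("punkt"); nltk.download("stopwords")
from collections import Counter
import matplotlib.pyplot as plt
import networkx as nx
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize
from nltk.util import bigrams

stemmer = PorterStemmer()
stops = set(stopwords.words("english"))

pair_counts = Counter()
for article in articles:
    tokens = [
        stemmer.stem(t)
        for t in word_tokenize(article["abstract"].lower())
        if t.isalpha() and t not in stops
    ]
    pair_counts.update(bigrams(tokens))

G = nx.Graph()
for (w1, w2), count in pair_counts.most_common(30):
    G.add_edge(w1, w2, weight=count)  # edge weight = pair frequency

pos = nx.spring_layout(G, seed=42)
widths = [G[u][v]["weight"] for u, v in G.edges()]  # thicker = more frequent
nx.draw(G, pos, with_labels=True, width=[5 * w / max(widths) for w in widths])
plt.show()
```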

LDA results

To take a deeper dive, I then used Latent Dirichlet Allocation (LDA) to model the topics across the articles. In topic modeling, topics are clusters of words that tend to occur together, and documents are modeled as mixtures of topics. For example, if the words “home run,” “out,” and “LOL Mets” occur together, then the topic is likely to be “baseball.”

*Disclosure: I am a Phillies fan, and yes, it is tough this year*

So, this figure shows ten topics and the words most likely to occur in each. These are empirically generated, so it’s up to the reader to try to make sense of them. Computers can’t replace us yet. Topic 1 seems to be about community-based processes, while Topic 8 seems to be more about measuring outcomes following the implementation of an intervention (something summative, maybe?).
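If you want to reproduce something like this, here’s a minimal sketch using scikit-learn. The ten-topic count matches the figure; the vectorizer settings are guesses for illustration:

```python
# Sketch: fit a 10-topic LDA model and print the top five words per topic.
# Assumes `articles` from the earlier sketch.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

texts = [a["abstract"] for a in articles]
vectorizer = CountVectorizer(stop_words="english", max_df=0.9, min_df=2)
dtm = vectorizer.fit_transform(texts)  # document-term matrix of word counts

lda = LatentDirichletAllocation(n_components=10, random_state=42)
doc_topics = lda.fit_transform(dtm)  # rows: articles, columns: topic shares

vocab = vectorizer.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    top = [vocab[i] for i in weights.argsort()[-5:][::-1]]
    print(f"Topic {k + 1}: {', '.join(top)}")
```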

t-SNE results

I then looked at how the articles cluster as a whole. This figure was generated using t-distributed Stochastic Neighbor Embedding (t-SNE), which takes high-dimensional data and reduces it to just a few dimensions (think of it as a souped-up Principal Components Analysis). Here, we can see how closely the individual articles are related to one another in 2-dimensional space. I’ve highlighted one article from Topic 6.
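In code, that step looks roughly like this, assuming we embed the document-topic matrix from the LDA sketch above (the original figure may have embedded a different representation):

```python
# Sketch: project the document-topic matrix into 2-D with t-SNE and plot it.
# Assumes `doc_topics` from the LDA sketch; perplexity is an untuned guess.
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

coords = TSNE(n_components=2, perplexity=30, random_state=42).fit_transform(doc_topics)

dominant = doc_topics.argmax(axis=1)  # color each article by its main topic
plt.scatter(coords[:, 0], coords[:, 1], c=dominant, cmap="tab10")
plt.title("Articles in 2-D space, colored by dominant topic")
plt.show()
```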

So finally, which articles should I read if I want to stay on top of things? I sorted the articles to find the one that is most representative of each topic. For Topic 10, which had the keywords research, system, gender, article, and support, we end up with this: Geographically-related outcomes of U.S. funding for small business research and development: Results of the research grant programs of a component of the National Institutes of Health. And hey, this would explain the SBIR-STTR word pair from above. Pretty cool!
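That sorting step is simple once you have the document-topic matrix: for each topic, grab the article with the highest share of that topic. A quick sketch, reusing the objects from the earlier snippets:

```python
# Sketch: for each topic, pick the article whose topic share is highest.
# Assumes `doc_topics` and `articles` from the earlier sketches.
for k in range(doc_topics.shape[1]):
    best = doc_topics[:, k].argmax()
    print(f"Topic {k + 1}: {articles[best]['title']}")
```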


This was a quick tour of the last few months of evaluation research. Soon, Dawn Chorus is going to launch PubTrawlr, which will help people stay on top of the literature using these methods. Stay tuned!