Note: This story was originally published in July 2020. We had some updated versions since then, but we’re pulling this back up in case you missed it the first time around.
COVID has been incredibly disruptive, not just to our lifestyles but to our moods as well. Passions have been inflamed by differences in how (or whether) we should respond. The Institute for Healthcare Improvement (IHI) has been collecting stories of COVID on its website. This seemed like a good opportunity to understand the underlying mood using sentiment analysis.
Sentiment analysis is a subfield of Natural Language Processing (NLP) that identifies the emotional valence of words (Silge & Robinson, 2020). I haven’t really seen sentiment analysis used in community-based projects, but it has the potential to be a useful tool for identifying trends and concepts associated with emotionally reactive topics. With that in mind, I asked two questions: 1) are IHI’s COVID stories positive or negative, and 2) how have the emotions changed over time? I looked at this in R using the phenomenal tidytext package.
I pulled and collected all the stories and associated metadata (like author and date published) off of IHI’s website. I then used two natural language processing techniques to analyze the stories. To obtain descriptive data, I used a Bag-of-Words (BOW) approach. BOW treats the words within a text as meaningful in and of themselves. The more frequently that a word occurs, the more relevant it is in descriptive analysis.
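To make the Bag-of-Words idea concrete, here is a minimal sketch in Python. It is not the R code used for this analysis; the function name and the sample sentence are made up for illustration.

```python
import re
from collections import Counter

def bag_of_words(text):
    """Count how often each lowercase word appears, ignoring word order."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)

# Hypothetical sample text, not taken from the IHI stories.
story = "The nurses stayed late. The nurses were tired, but the patients were grateful."
counts = bag_of_words(story)

# The most frequent words are treated as the most descriptive.
print(counts.most_common(3))
```

Note that the most frequent word here is "the," which says nothing about the story. That is the motivation for the preprocessing described next.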
I used two preprocessing steps to prepare the text. I first removed stopwords: words that add little informative value to the overall text, such as “and,” “the,” and “with.” I then lemmatized the remaining words. Lemmatization reduces a word to its base form. In some cases, this involves making plural words singular. In others, it consists of normalizing the tense of the word (which is especially relevant when dealing with irregular verbs in English, such as “be”).
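As a rough illustration of these two steps, the Python sketch below uses a tiny hand-made stopword set and lemma table. Both are assumptions for the example; real stopword lists and lemma dictionaries are far larger, and this is not the code behind the analysis.

```python
import re

# Tiny illustrative lists (hypothetical, far from complete).
STOPWORDS = {"and", "the", "with", "a", "of", "to"}
LEMMAS = {
    "is": "be", "was": "be", "were": "be", "are": "be",  # irregular verb forms
    "nurses": "nurse", "patients": "patient",              # plural to singular
    "helped": "help",                                      # past to base tense
}

def preprocess(text):
    """Lowercase and tokenize, drop stopwords, then map words to base forms."""
    tokens = re.findall(r"[a-z']+", text.lower())
    kept = [t for t in tokens if t not in STOPWORDS]
    return [LEMMAS.get(t, t) for t in kept]

cleaned = preprocess("The nurses were with the patients and helped")
print(cleaned)  # ['nurse', 'be', 'patient', 'help']
```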
I used the bing lexicon (or group of words) for the sentiment analysis, which assigns words to a positive or negative sentiment (Liu, 2015).
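A lexicon lookup of this kind can be sketched in a few lines of Python. The six-word lexicon below is a hand-made stand-in, not the actual bing lexicon, and the sample sentence is invented.

```python
import re
from collections import Counter

# Toy stand-in for a positive/negative lexicon (hypothetical entries).
TOY_LEXICON = {
    "great": "positive", "grateful": "positive", "hope": "positive",
    "fear": "negative", "tired": "negative", "loss": "negative",
}

def sentiment_counts(text):
    """Count how many words fall into each sentiment category."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(TOY_LEXICON[t] for t in tokens if t in TOY_LEXICON)

tally = sentiment_counts("We were tired, but grateful for the hope we found.")
print(tally["positive"], tally["negative"])  # 2 1
```

Words that are not in the lexicon simply do not contribute, so the result depends heavily on which lexicon is chosen.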
As of July 17, 2020, 55 stories were posted on the website. The histogram below shows the distribution of word counts across all stories (Figure 1).
One general challenge in NLP is negation: a word or phrase signaling that what follows should be interpreted in the opposite direction. Negation is commonly marked by words like “not,” “won’t,” and so on. For example, “this is not great” has a negative valence, even though “great” on its own is positive. I therefore had to correct the stories in which sentiment words were preceded by negation. For the bing analysis, this involved switching the direction of the word (from positive to negative and vice versa).
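The flip can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the lexicon and negator list are hand-made, and only the word immediately before a sentiment word is checked. It is not the procedure used in the R analysis.

```python
import re

# Toy lexicon: +1 for positive words, -1 for negative words (hypothetical).
TOY_LEXICON = {"great": 1, "good": 1, "bad": -1, "afraid": -1}
NEGATORS = {"not", "no", "never", "won't", "isn't", "wasn't"}

def score_with_negation(text):
    """Sum word scores, reversing a word's sign when a negator precedes it."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, token in enumerate(tokens):
        value = TOY_LEXICON.get(token, 0)
        if value and i > 0 and tokens[i - 1] in NEGATORS:
            value = -value  # "not great" counts as negative
        score += value
    return score

print(score_with_negation("this is great"))      # 1
print(score_with_negation("this is not great"))  # -1
```

A one-word window misses cases like "not very good," so a fuller treatment would need a wider window or bigram-based matching.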
I then looked at the overall sentiment of each story: whether it was mostly positive or mostly negative. Figure 5 shows the stories with the most positive or negative net sentiment (greater than 10 or less than -10). The longer the bar, the more positive or negative the story is.
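Assuming net sentiment means the number of positive words minus the number of negative words, the filtering step could look like this Python sketch. The story titles and counts are invented placeholders, not results from the IHI data.

```python
# Hypothetical (positive, negative) word counts per story.
story_counts = {
    "Story A": (25, 8),
    "Story B": (6, 19),
    "Story C": (12, 10),
}

def extreme_stories(counts, threshold=10):
    """Return (title, net) pairs whose net sentiment exceeds the threshold
    in either direction, ordered from most positive to most negative."""
    nets = {title: pos - neg for title, (pos, neg) in counts.items()}
    extremes = [(t, n) for t, n in nets.items() if abs(n) > threshold]
    return sorted(extremes, key=lambda pair: pair[1], reverse=True)

top = extreme_stories(story_counts)
print(top)  # [('Story A', 17), ('Story B', -13)]
```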
I then looked at trends in sentiment over time, using the dates of when the stories were posted on the website. This figure shows wide swings in net sentiment over time before leveling out into a more neutral state.
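One simple way to summarize such a trend is to group stories by the month they were posted and look at the range of net sentiment in each month. The Python sketch below uses invented dates and scores purely for illustration; it does not reproduce the figure or the actual data.

```python
from collections import defaultdict
from datetime import date

# Hypothetical (date posted, net sentiment) pairs.
records = [
    (date(2020, 4, 2), 15),
    (date(2020, 4, 20), -12),
    (date(2020, 5, 11), 6),
    (date(2020, 6, 3), 1),
    (date(2020, 6, 25), -1),
]

def monthly_summary(rows):
    """Map each (year, month) to (lowest, highest, mean) net sentiment."""
    buckets = defaultdict(list)
    for posted, net in rows:
        buckets[(posted.year, posted.month)].append(net)
    return {
        month: (min(nets), max(nets), sum(nets) / len(nets))
        for month, nets in sorted(buckets.items())
    }

summary = monthly_summary(records)
for month, (low, high, mean) in summary.items():
    print(month, low, high, mean)
```

A narrowing gap between the lowest and highest values from month to month would correspond to the leveling out described above.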
The body of stories gathered by IHI displays a wide range of emotions, which seems, on its face, consistent with the experiences of many people over time. The number of stories has declined over time, and so has the intensity of emotional reactions.
Thinking more broadly, this brief analysis shows how sentiment analysis can be used to look at changes in community context on an emotional level over time. Community practitioners could use similar methods to look at specific issues of concern (e.g., police-community relations, housing policies).
As a methods note, I could have made the code cleaner, but ¯\_(ツ)_/¯
References
Ignatow, G., & Mihalcea, R. (2018). An Introduction to Text Mining. Thousand Oaks, CA: Sage.
Liu, B. (2015). Sentiment Analysis: Mining Sentiments, Opinions, and Emotions. Cambridge University Press.
Mohammad, S. M., Kiritchenko, S., & Zhu, X. (2013). NRC-Canada: Building the state-of-the-art in sentiment analysis of tweets. arXiv preprint arXiv:1308.6242.
Silge, J., & Robinson, D. (2020). Text Mining with R: A Tidy Approach.