Sentiment Analysis of Stories of COVID


COVID has been incredibly disruptive, not just to our lifestyles but to our moods as well. Differences over how (or whether) we should respond have inflamed passions. The Institute for Healthcare Improvement (IHI) has been collecting stories of COVID at this link. This seemed like a good opportunity to gauge the underlying mood using sentiment analysis.

Sentiment analysis is a subfield of Natural Language Processing (NLP) that identifies the emotional valence of words (Silge & Robinson, 2020). I haven’t seen sentiment analysis used much in community-based projects, but it could be a useful tool for identifying trends and concepts associated with emotionally charged topics. With that in mind, I asked two questions: 1) are IHI’s COVID stories positive or negative, and 2) how has their sentiment changed over time? I looked at this in R using the phenomenal tidytext package.


I collected all the stories and their associated metadata (such as author and publication date) from IHI’s website. I then used two natural language processing techniques to analyze the stories. To obtain descriptive data, I used a Bag-of-Words (BOW) approach, which treats the individual words in a text as meaningful in and of themselves, regardless of their order. The more frequently a word occurs, the more relevant it is considered for descriptive analysis.
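As a rough illustration of the Bag-of-Words idea, the snippet below lowercases a text, splits it into word tokens with a simple regular expression, and counts them. The analysis itself was done in R with tidytext, so this is a Python sketch rather than the original code. The tokenizer pattern and the sample sentence are my own assumptions. Note how “the” ties for the most frequent word, which is why stopwords get removed.

```python
import re
from collections import Counter

def bag_of_words(text):
    """Lowercase the text, split it into word tokens, and count each token (word order is discarded)."""
    # Assumed tokenizer: runs of letters and apostrophes count as words.
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)

story = "The nurses were tired. The nurses were brave."
counts = bag_of_words(story)
print(counts.most_common(3))
```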

I used two preprocessing steps to prepare the text. I first removed stopwords: words that add little informative value to the overall text, such as “and,” “the,” and “with.” I then lemmatized the remaining words. Lemmatization reduces a word to its base (dictionary) form. In some cases, this means converting plural nouns to their singular form (“stories” becomes “story”). In others, it means normalizing the tense of a verb, which is especially relevant for irregular English verbs such as “be” (“was” and “were” both become “be”).
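Here is a minimal Python sketch of both preprocessing steps. The STOPWORDS set and the LEMMAS lookup table are tiny hypothetical stand-ins: real stopword lists (such as tidytext’s stop_words) are much longer, and real lemmatizers rely on a full dictionary and part-of-speech information rather than a hand-written table.

```python
import re

# Hypothetical, deliberately tiny stopword list for illustration only.
STOPWORDS = {"and", "the", "with", "a", "of", "to"}

# Hypothetical lemma table: plural nouns -> singular, irregular verb forms -> "be".
LEMMAS = {"nurses": "nurse", "masks": "mask", "gowns": "gown",
          "were": "be", "was": "be", "is": "be"}

def preprocess(text):
    """Tokenize, drop stopwords, then map each remaining token to its lemma."""
    tokens = re.findall(r"[a-z']+", text.lower())
    kept = [t for t in tokens if t not in STOPWORDS]
    # Tokens missing from the table are left unchanged.
    return [LEMMAS.get(t, t) for t in kept]

print(preprocess("The nurses were short of masks and gowns"))
```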

For the sentiment analysis, I used the bing lexicon (a list of words paired with sentiment labels), which classifies each word as either positive or negative (Liu, 2015).
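To make the scoring concrete, the Python sketch below looks up each token in a lexicon and computes net sentiment as the number of positive words minus the number of negative words. This is the convention used in Silge and Robinson’s book, and I am assuming the post used the same definition. LEXICON holds a few illustrative entries chosen for this example, not the actual contents of the bing lexicon.

```python
# Illustrative stand-in for the bing lexicon: each word maps to "positive" or "negative".
LEXICON = {"brave": "positive", "grateful": "positive", "hope": "positive",
           "tired": "negative", "afraid": "negative", "loss": "negative"}

def net_sentiment(tokens, lexicon):
    """Net sentiment = count of positive tokens minus count of negative tokens."""
    pos = sum(1 for t in tokens if lexicon.get(t) == "positive")
    neg = sum(1 for t in tokens if lexicon.get(t) == "negative")
    return pos - neg

tokens = ["nurse", "tired", "afraid", "brave", "grateful", "hope"]
print(net_sentiment(tokens, LEXICON))  # 3 positive - 2 negative = 1
```

Words missing from the lexicon (such as “nurse”) simply contribute nothing to the score.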


As of July 17, 2020, there were 55 stories posted on the website. The histogram below shows the distribution of word counts across all stories (Figure 1).
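The word counts behind a histogram like this can be computed by tokenizing each story and binning the totals. This Python sketch uses two made-up stories and a hypothetical bin width of 5 words. The post does not say which bin width Figure 1 used.

```python
import re
from collections import Counter

def word_count(text):
    """Count word tokens (runs of letters and apostrophes) in one story."""
    return len(re.findall(r"[A-Za-z']+", text))

def histogram(counts, bin_width):
    """Map each bin's lower edge to the number of stories whose word count falls in that bin."""
    return Counter((c // bin_width) * bin_width for c in counts)

# Hypothetical stories, not taken from the IHI collection.
stories = {"story_a": "We stayed home.",
           "story_b": "My shifts at the hospital got longer every week."}
counts = [word_count(t) for t in stories.values()]
print(counts)  # [3, 9]
print(sorted(histogram(counts, 5).items()))  # [(0, 1), (5, 1)]
```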

One common challenge in NLP is negation: a word or phrase signaling that what follows should be interpreted in the opposite direction. Negation is usually marked by words like “not,” “won’t,” and so on. For example, “this is not great” has a negative valence even though “great” is positive. I therefore corrected for instances in the stories where a sentiment word was preceded by a negation. For the bing analysis, this involved flipping the direction of the word (from positive to negative and vice versa).
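One simple way to implement this correction is to look one token back from each sentiment word and flip its sign if that token is a negation word. The Python sketch below does this. It is an assumption about the general approach, not the author’s code, and the NEGATIONS and LEXICON sets are small hypothetical examples. A one-word look-behind will miss longer constructions such as “not very great.”

```python
# Hypothetical negation words and lexicon entries, for illustration only.
NEGATIONS = {"not", "no", "never", "won't", "don't", "isn't", "wasn't", "can't"}
LEXICON = {"great": "positive", "safe": "positive", "worried": "negative"}

def score_with_negation(tokens, lexicon, negations=NEGATIONS):
    """Sum +1 per positive word and -1 per negative word, flipping the sign
    when the immediately preceding token is a negation word."""
    score = 0
    for i, tok in enumerate(tokens):
        label = lexicon.get(tok)
        if label is None:
            continue  # not a sentiment word
        value = 1 if label == "positive" else -1
        if i > 0 and tokens[i - 1] in negations:
            value = -value  # "not great" counts as negative
        score += value
    return score

print(score_with_negation(["this", "is", "not", "great"], LEXICON))   # -1
print(score_with_negation(["i", "was", "never", "worried"], LEXICON))  # 1
```

Because many common stopword lists include “not” and similar words, a check like this has to run before stopword removal, or with a stopword list that has the negation words taken out.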

I then looked at the overall sentiment of each story: whether it was mostly positive or negative. Figure 5 shows the most positive and most negative stories (net sentiment greater than 10 or less than -10). The longer the bar, the more positive or negative the story.
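Selecting the stories for a chart like Figure 5 amounts to filtering per-story net sentiment by a threshold and sorting. In this Python sketch the story names and scores are hypothetical. A story scoring exactly 10 is excluded, matching the strict “greater than 10 or less than -10” cutoff.

```python
def extreme_stories(net_scores, threshold=10):
    """Keep stories whose net sentiment is above +threshold or below -threshold,
    sorted from most positive to most negative."""
    selected = {s: v for s, v in net_scores.items() if v > threshold or v < -threshold}
    return sorted(selected.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical per-story net sentiment scores.
scores = {"story_a": 14, "story_b": -3, "story_c": -12, "story_d": 10, "story_e": 22}
print(extreme_stories(scores))  # [('story_e', 22), ('story_a', 14), ('story_c', -12)]
```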

Next, I looked at trends in sentiment over time, using the dates on which the stories were posted on the website. The resulting figure shows wide swings in net sentiment early on before it levels out into a more neutral state.
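A trend like this can be built by grouping stories by posting date and aggregating their net sentiment. This Python sketch averages the scores of stories posted on the same day. The post does not state whether the original analysis averaged, summed, or used some other window, so treat that choice as an assumption. The dates and scores are made up.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

def sentiment_by_date(posts):
    """posts: iterable of (posting_date, net_sentiment) pairs.
    Returns a list of (date, mean net sentiment) in chronological order."""
    by_date = defaultdict(list)
    for posted, score in posts:
        by_date[posted].append(score)
    return [(d, mean(scores)) for d, scores in sorted(by_date.items())]

# Hypothetical posts: two on the same day get averaged together.
posts = [(date(2020, 4, 2), 15), (date(2020, 4, 2), -9),
         (date(2020, 3, 30), -20), (date(2020, 6, 1), 2)]
for d, avg in sentiment_by_date(posts):
    print(d.isoformat(), avg)
```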

The body of stories gathered by IHI displays a wide range of emotions, which on its face seems consistent with the experiences of many people over this period. The number of stories has declined over time, and so has the intensity of the emotional reactions.

Thinking more broadly, this brief analysis shows how sentiment analysis can be used to track changes in community context on an emotional level over time. Community practitioners could use similar methods to examine specific issues of concern (e.g., police-community relations, housing policies).

On a methods note, I could have made the code cleaner, but ¯\_(ツ)_/¯


References

Ignatow, G., & Mihalcea, R. (2018). An introduction to text mining. Thousand Oaks, CA: Sage.

Liu, B. (2015). Sentiment analysis: Mining sentiments, opinions, and emotions. Cambridge, UK: Cambridge University Press.

Mohammad, S. M., Kiritchenko, S., & Zhu, X. (2013). NRC-Canada: Building the state-of-the-art in sentiment analysis of tweets. arXiv preprint arXiv:1308.6242.

Silge, J., & Robinson, D. (2020). Text mining with R: A tidy approach. Available online.
