Understanding the Emotions Hidden in Words with Sentiment Analysis

Abstract

Sentiment analysis helps us understand the emotions behind text, such as whether people feel positive, negative, or neutral about a topic. It is useful for analyzing opinions on social media, reviews, or other text data. In this project, you will gather text data on a topic of your choice and analyze it with a sentiment analysis tool called VADER (Valence Aware Dictionary and sEntiment Reasoner).

Summary

Areas of Science

Artificial Intelligence

Difficulty

Method

Scientific Method

Time Required

Very Short (≤ 1 day)

Prerequisites

None

Material Availability

Readily available

Cost

Very Low (under $20)

Safety

No issues

Credits

Tracey Ngo, Science Buddies

https://www.youtube.com/watch?v=y9rrd6zhXv8

Objective

Collect text data from social media posts, news articles, books, etc., and explore how well the VADER sentiment analysis tool can identify positive and negative sentiments in human text.

Introduction

Language is a powerful tool we use to express our thoughts and feelings. But how do we know if someone is happy or upset? Is it in the way they talk or the words they choose? Sometimes it is clear – someone might say, “I’m so happy!” with excitement, or “I’m mad” while storming off. Other times, it is less obvious. For example, someone might say, “I’m fine,” but their tone and body language suggest they’re not.

Therapists and behavior experts spend years learning to understand the hidden emotions in people's words and actions. But they can only focus on one person at a time. What if we need to understand how millions of people feel about something?

For instance, how can companies analyze thousands of reviews to see if customers like their product? How can celebrities track public opinion across social media comments? How can politicians see whether news coverage about them is mostly positive or negative?

This is where sentiment analysis comes into play. Sentiment analysis is a technique in Natural Language Processing (NLP), a branch of Artificial Intelligence (AI) that teaches computers to understand human language. With NLP, computers can process large amounts of text to figure out if the overall mood is positive, negative, or neutral.

Watch this video for a more detailed explanation of NLP:

https://www.youtube.com/watch?v=fLvJ8VdHLA0

One tool for sentiment analysis is VADER (Valence Aware Dictionary and sEntiment Reasoner). In this project, you will collect text data from sources like social media, news articles, reviews, or books. Then, you will use VADER to analyze the sentiments in these texts and compare the results. This will show how sentiment analysis can help us understand emotions and opinions on a larger scale.

Terms and Concepts

Natural Language Processing (NLP)
Artificial Intelligence (AI)
Sentiment analysis
VADER (Valence Aware Dictionary and sEntiment Reasoner)
Tokenization
Part-of-Speech (POS) tagging

Questions

Why might it sometimes be difficult to tell how someone feels based on their words alone?
What is Natural Language Processing (NLP), and why is it important?
What is sentiment analysis, and how can it be useful?
How might a business use sentiment analysis to improve its products or services?
What challenges do you think might arise when analyzing text from different cultures or languages?

Bibliography

To learn more about NLP:

CrashCourse. (2019, September). Natural Language Processing: Crash Course AI #7. YouTube. Retrieved December 12, 2024.
IBM Technology. (2021, August). What is NLP (Natural Language Processing)? YouTube. Retrieved December 12, 2024.
Simplilearn. (2021, May). Natural Language Processing In 5 Minutes | What is NLP and How Does It Work? Simplilearn. YouTube. Retrieved December 12, 2024.

VADER (Valence Aware Dictionary and sEntiment Reasoner) source code:

cjhutto. (2022, April). VADER-Sentiment-Analysis. GitHub. Retrieved December 12, 2024.

RoBERTa source code:

FacebookAI. (n.d.). roberta-large-mnli. Hugging Face. Retrieved December 12, 2024.

Hugging Face Transformers NLP source code:

huggingface. (n.d.). Transformers. GitHub. Retrieved December 12, 2024.

Materials and Equipment

Computer with Internet access

Experimental Procedure

This project follows the Scientific Method. Review the steps before you begin.

Overview

This project introduces you to sentiment analysis using VADER (Valence Aware Dictionary and sEntiment Reasoner) via the NLTK (Natural Language ToolKit) library. You will gather and analyze text data to understand patterns in sentiment across various categories.

1. Setting Up the Project Environment

You will need a Google account. If you do not have one, make one when prompted.
Navigate to Google Drive, click on ‘MyDrive,’ then create a new folder and rename it “sentiment_analysis.”
Download the sentiment_analysis.ipynb file from Science Buddies. This is the code you will need to process your data. Upload this file to your “sentiment_analysis” folder in MyDrive.
Double-click on the sentiment_analysis.ipynb file. This should automatically open in Google Colab.
1. Read the Troubleshooting Tips and How to Use This Notebook sections. Follow the instructions you find in those sections.
2. Run the blocks under Importing Libraries to ensure you have access to all the functions we will use for this project. We are importing quite a few libraries for this project, so do not worry if this takes a bit of time to run.

2. Gathering Our Data

In this section, you will collect data from a source of your choice, giving you the flexibility to design your project as you see fit. You have a wide range of options, such as social media posts, news articles, reviews, and more.

For example, you could compare one set of the following:

Social media posts from your friends
News articles from different websites
Movie or product reviews

Pick at least two different sources for whichever type of text you choose (e.g., social media posts from two friends, news articles from two different websites, or reviews for two different movies).

Within the “sentiment_analysis” folder on your MyDrive, create subfolders for each category you are analyzing.
1. For example, if you are comparing tweets from your friends, create a separate folder for each friend and name it accordingly (e.g., one folder named “Andy,” another named “Wendy,” and so on).
Using a text editor, copy and paste the text you want to analyze, then save it as a plain text file (.txt). Finally, upload the .txt file to the corresponding subfolder in your “sentiment_analysis” folder.
1. You might find that it is helpful to name the .txt files after the folder as well (e.g., Andy01.txt, Andy02.txt, and so on).
Aim to have at least 10 different .txt files for each subfolder.
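
Before moving on, it can help to double-check that every subfolder really holds enough files. The following is a minimal sketch in plain Python, not part of the Science Buddies notebook. The function name count_txt_files is hypothetical, and the demo builds a throwaway folder with made-up file names so it runs anywhere. In Google Colab you would instead pass the path to your own “sentiment_analysis” folder (typically something like /content/drive/MyDrive/sentiment_analysis once Drive is mounted, but that path is an assumption).

```python
from pathlib import Path
import tempfile


def count_txt_files(root):
    """Return {subfolder name: number of .txt files} for each subfolder of root."""
    counts = {}
    for folder in sorted(Path(root).iterdir()):
        if folder.is_dir():
            counts[folder.name] = len(list(folder.glob("*.txt")))
    return counts


# Demo on a temporary folder (hypothetical data, created only for this sketch).
with tempfile.TemporaryDirectory() as tmp:
    for name, n_files in [("Andy", 10), ("Wendy", 3)]:
        sub = Path(tmp) / name
        sub.mkdir()
        for i in range(1, n_files + 1):
            # Names follow the Andy01.txt, Andy02.txt, ... pattern suggested above.
            (sub / f"{name}{i:02d}.txt").write_text("placeholder text", encoding="utf-8")
    demo_counts = count_txt_files(tmp)

# Flag any subfolder that has fewer than the suggested 10 files.
for label, n in demo_counts.items():
    status = "ok" if n >= 10 else "needs more files"
    print(f"{label}: {n} files ({status})")
```

Counting files first means a missing or misnamed upload shows up here rather than as a puzzling gap in your results later.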

3. Natural Language ToolKit (NLTK) Basics

Before diving into the project, we will first review the basics of the Natural Language ToolKit (NLTK) library, which includes the VADER sentiment analysis tool.

(Code Block 3A) Run this code block to tokenize a sentence of your choice. Tokenization is the process of breaking down a string of text into smaller components, such as words or punctuation marks.
1. To try it out, replace the current sentence inside the quotation marks with any sentence you like and observe how the NLTK library tokenizes it!
(Code Block 3B) This code block uses the NLTK library to perform Part-of-Speech (POS) tagging on a list of tokens. To explore the available POS tags, refer to the text block above this code block.
1. This code block processes the same sentence you used in Code Block 3A. If you would like to see the POS tagging for a different sentence, simply update the sentence in Code Block 3A and rerun both code blocks.
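
To get a feel for what tokenization does, here is a simplified sketch that uses only Python's built-in re module. It is an illustration, not the tokenizer the notebook uses: NLTK's own tokenizer handles many more cases. The function name simple_tokenize is made up for this example.

```python
import re


def simple_tokenize(text):
    """Split text into word tokens and single punctuation-mark tokens."""
    # \w+(?:'\w+)?  matches a word, keeping contractions like "don't" together
    # [^\w\s]       matches one punctuation mark
    return re.findall(r"\w+(?:'\w+)?|[^\w\s]", text)


tokens = simple_tokenize("I love science, don't you?")
print(tokens)
```

Notice that the comma and the question mark become tokens of their own. Punctuation can carry emotion (think of "Wow!!!"), so sentiment tools usually keep it rather than throwing it away.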

4. Test VADER Sentiment Scoring

Before applying VADER to the entire dataset, let’s first explore how it analyzes a few individual sentences.

(Code Block 4A) This code block creates a SentimentIntensityAnalyzer, which is part of the VADER sentiment analysis tool. Run this code block.
(Code Block 4B) This code block uses the SentimentIntensityAnalyzer to analyze the sentiment of the given sentence. Experiment by inputting different positive sentences to see how the analysis responds. Here is what each key means:
1. neg: The proportion of the text that conveys negative sentiment (e.g., neg: 0.0 means the text does not contain any negative sentiment).
2. neu: The proportion of the text that is neutral (e.g., neu: 0.4 means about 40% of the text is considered neutral).
3. pos: The proportion of the text that conveys positive sentiment (e.g., pos: 0.6 means about 60% of the text is considered positive).
4. compound: An overall sentiment score normalized to a range from -1 (most negative) to +1 (most positive). For example, compound: 0.8395 means that the overall sentiment is strongly positive. VADER computes the compound score by adding up the sentiment ratings of the individual words in the text, adjusting for things like intensity and negation, and then normalizing the total to fit between -1 and +1.
(Code Block 4C) This code block also uses the SentimentIntensityAnalyzer to analyze the sentiment of the given sentence. Experiment by inputting different negative sentences to see how the analysis responds.
(Code Block 4D) This code block works the same way as Code Blocks 4B and 4C. This time, try to challenge the SentimentIntensityAnalyzer by crafting sentences that could confuse it, such as a negative sentence that seems positive or vice versa (e.g., using sarcasm).
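
To see where the -1 to +1 range of the compound score comes from, here is a minimal sketch of the normalization step as we understand it from the open-source VADER code listed in the Bibliography. The sentiment ratings of the individual words are added up, and the total x is squashed with x / sqrt(x² + 15). The input totals below are made-up numbers chosen for illustration, not ratings of real sentences.

```python
import math


def normalize(valence_sum, alpha=15):
    """Squash an unbounded sum of word ratings into the range -1 to +1."""
    # alpha=15 is the constant we believe the reference VADER code uses.
    return valence_sum / math.sqrt(valence_sum ** 2 + alpha)


# Made-up totals: larger totals move toward -1 or +1 but never pass them.
for total in [-8.0, -3.0, 0.0, 3.0, 8.0]:
    print(f"word-rating total {total:+.1f} -> compound {normalize(total):+.4f}")
```

Because of this squashing, adding more emotional words to a sentence that is already strongly positive changes the compound score only a little.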

5. Using VADER on your Dataset

(Code Block 5A) Run this code block to create a DataFrame, which is like a table that will be used to load and manipulate the data in the notebook. You will see the data from your MyDrive populate in a table below the code block.
(Code Block 5B) This code block calculates sentiment scores for each .txt file you uploaded earlier and stores the results. Run this code block.
(Code Block 5C) This code block takes the results from Code Block 5B and adds them to the original DataFrame. Run this code block.
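
The three steps above (load the files, score each one, attach the scores) can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the notebook's code: it uses a list of dictionaries instead of a DataFrame, the names load_records, add_scores, and toy_score are hypothetical, and toy_score is a deliberately crude stand-in so the sketch runs without installing anything.

```python
from pathlib import Path
import tempfile


def load_records(root):
    """Read every .txt file in root's subfolders into a list of dicts."""
    records = []
    for path in sorted(Path(root).glob("*/*.txt")):
        records.append({
            "label": path.parent.name,  # subfolder name, e.g. "Andy"
            "file": path.name,
            "text": path.read_text(encoding="utf-8"),
        })
    return records


def add_scores(records, score_fn):
    """Return new rows that combine each record with the dict score_fn(text) returns."""
    return [{**rec, **score_fn(rec["text"])} for rec in records]


def toy_score(text):
    """Stand-in scorer for this demo only. This is NOT how VADER works."""
    return {"compound": 0.5 if "great" in text.lower() else -0.5}


# Demo on a temporary folder with two made-up files.
with tempfile.TemporaryDirectory() as tmp:
    for label, text in [("Andy", "What a great day"), ("Wendy", "This is awful")]:
        (Path(tmp) / label).mkdir()
        (Path(tmp) / label / f"{label}01.txt").write_text(text, encoding="utf-8")
    rows = add_scores(load_records(tmp), toy_score)

for row in rows:
    print(row["label"], row["file"], row["compound"])
```

With NLTK installed and the VADER lexicon downloaded, you could pass the polarity_scores method of a SentimentIntensityAnalyzer as score_fn in place of toy_score. Each row would then gain the neg, neu, pos, and compound keys described in Section 4.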

6. Visualize the Data

(Code Block 6A) This code block creates a bar plot to visualize the average compound sentiment score for each label in the DataFrame. Run this code block.
1. Which label had the higher average compound score? Which had the lower? Why do you think that is?
(Code Block 6B) This code block creates a series of bar plots to compare the sentiment scores (positive, neutral, negative) across different labels in the dataset. Run this code block.
1. What trends or patterns do you notice in the positive, neutral, and negative sentiment scores for each label?
2. Which label has the highest/lowest positive sentiment? How about neutral and negative sentiment?
3. Are there any labels with similar sentiment distributions? What does this suggest?
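
The number behind each bar in the first plot is simply the average compound score of all the files that share a label. Here is a minimal sketch of that calculation in plain Python. The function name mean_by_label is hypothetical and the scores are made-up numbers, not real VADER output.

```python
from collections import defaultdict
from statistics import mean


def mean_by_label(rows, key="compound"):
    """Average the chosen score for each label."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["label"]].append(row[key])
    return {label: mean(values) for label, values in grouped.items()}


# Made-up scores for two labels, two files each.
sample_rows = [
    {"label": "Andy", "compound": 0.5},
    {"label": "Andy", "compound": 0.25},
    {"label": "Wendy", "compound": -0.5},
    {"label": "Wendy", "compound": 0.0},
]

averages = mean_by_label(sample_rows)
for label, value in averages.items():
    print(f"{label}: average compound = {value:+.3f}")
```

Passing key="pos", key="neu", or key="neg" would give the averages behind the second set of plots, provided your rows include those scores.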

7. Review Individual Examples

(Code Block 7A) You can review examples one at a time in this code block. To view different examples, change the row_number to any number between 0 and one less than the total number of .txt files you have (e.g., if you have 30 .txt files, you can set this number to anything between 0 and 29).
1. Can you find any examples that you thought were positive but VADER gave them a more negative score, or vice versa?
2. Why do you think VADER may have a hard time interpreting sarcasm?
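
If you write down your own rating for a few rows, a short script can list the rows where you and VADER disagree. This is a sketch, not part of the notebook. The function names and the scores are made up for illustration, and the cutoffs of +0.05 and -0.05 are the ones the VADER documentation suggests for calling a text positive or negative.

```python
def vader_label(compound, threshold=0.05):
    """Turn a compound score into 'positive', 'negative', or 'neutral'."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def find_disagreements(rows, my_labels):
    """Return (row_number, my_label, vader_label, compound) for every mismatch."""
    mismatches = []
    for row_number, my_label in sorted(my_labels.items()):
        compound = rows[row_number]["compound"]
        predicted = vader_label(compound)
        if predicted != my_label:
            mismatches.append((row_number, my_label, predicted, compound))
    return mismatches


# Made-up compound scores for three rows, plus your own rating of each row.
rows = [{"compound": 0.62}, {"compound": -0.48}, {"compound": 0.02}]
my_labels = {0: "negative", 1: "negative", 2: "neutral"}

mismatches = find_disagreements(rows, my_labels)
for row_number, mine, vader, compound in mismatches:
    print(f"row {row_number}: you said {mine}, VADER said {vader} ({compound:+.2f})")
```

Rows that land on this list are good candidates to reread closely. Sarcastic texts often end up here because their individual words look positive even when the message is not.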

Variations

Compare the results of the VADER sentiment analysis tool with the RoBERTa model. The roberta_sentiment_analysis.ipynb notebook is available for download from Science Buddies. Additional details about the RoBERTa model are available in the Bibliography. Will RoBERTa outperform VADER?
For a unique challenge, focus exclusively on analyzing sarcastic text. How well does the VADER model handle sarcasm?

Careers

If you like this project, you might enjoy exploring these related careers:

Data Scientist

Many aspects of people's daily lives can be summarized using data, from what is the most popular new video game to where people like to go for a summer vacation. Data scientists (sometimes called data analysts) are experts at organizing and analyzing large sets of data (often called "big data"). By doing this, data scientists make conclusions that help other people or companies. For example, data scientists could help a video game company make a more profitable video game based on players'…

Sociologist

Any time there is more than one person in a room, there is potential for a social interaction to occur or for a group to form. Sociologists study these interactions—how and why groups and societies form, and how outside events like health issues, technology, and crime affect both the societies and the individuals. If you already like to think about how people interact as individuals and in groups, then you're thinking like a sociologist!

Cite This Page

General citation information is provided here. Be sure to check the formatting, including capitalization, for the method you are using and update your citation, as needed.

MLA Style

Ngo, Tracey. "Understanding the Emotions Hidden in Words with Sentiment Analysis." Science Buddies, 7 July 2025, https://www.sciencebuddies.org/science-fair-projects/project-ideas/ArtificialIntelligence_p023/artificial-intelligence/sentiment_analysis. Accessed 31 July 2026.

APA Style

Ngo, T. (2025, July 7). Understanding the Emotions Hidden in Words with Sentiment Analysis. Retrieved from https://www.sciencebuddies.org/science-fair-projects/project-ideas/ArtificialIntelligence_p023/artificial-intelligence/sentiment_analysis

Last edit date: 2025-07-07
