Classify Animals with Machine Learning


Abstract

How do you tell the difference between a bird and a fish? Birds tend to have feathers and can fly, while fish have no legs and can breathe underwater. In this project, you will explore how to create a decision tree using machine learning that can classify different animals based on multiple characteristics. This project is designed for beginners and requires little to no coding experience. Ready to give it a shot?

Summary

Areas of Science
Difficulty
Method
Time Required
Very Short (≤ 1 day)
Prerequisites

None

Material Availability

Readily available

Cost
Very Low (under $20)
Safety

No issues

Credits
Science Buddies is committed to creating content authored by scientists and educators. Learn more about our process and how we use AI.

Objective

Collect animal data to improve a decision tree model and explore which characteristics contribute most to determining animal groups. 

Introduction

Imagine you are surrounded by bears, dolphins, gorillas, and more at the zoo. We typically divide animals into groups like mammals, birds, reptiles, fish, amphibians, insects, and other invertebrates. This practice of grouping animals reflects a basic application of taxonomy. Taxonomy is the branch of science related to classifying all living organisms into systematically arranged groups based on shared characteristics and genetic relationships. These categories help us understand not only the visible differences, but also the evolutionary connections between different species. By applying this systematic classification, we can more easily communicate and study the vast diversity of life forms. Whether in a natural setting or a curated one like a zoo, taxonomy provides a structured way to comprehend and discuss the relationships and distinctions among the various species encountered. But have you ever mixed them up? How do you tell them apart?

In your head, you might create a mental decision tree to classify animals based on their characteristics. For example, you might think, “Does it lay eggs?” If yes, it could be a bird, reptile, or amphibian. If not, you might ask yourself, “Does it produce milk?” If yes, it’s probably a mammal. This process of asking questions and making decisions based on the answers is similar to how a decision tree works. You are essentially creating a mental flowchart to categorize animals based on their distinguishing characteristics.

Figure 1. Example of a decision tree. This diagram demonstrates how to classify an animal based on its characteristics by navigating through the tree. Please note that this tree is illustrative and not entirely accurate.
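
The mental flowchart described above can be written out as a few yes/no questions in code. This is only a minimal sketch: the function name `classify_by_questions` and the 'fins' question are assumptions made for illustration, and like Figure 1, the tree is illustrative and not entirely accurate.

```python
def classify_by_questions(animal):
    """Walk a tiny hand-written decision tree.

    `animal` is a dict of answers, where 1 means yes and 0 means no.
    """
    if animal["milk"] == 1:        # Does it produce milk?
        return "Mammal"
    if animal["feathers"] == 1:    # Does it have feathers?
        return "Bird"
    if animal["fins"] == 1:        # Does it have fins? (assumed extra question)
        return "Fish"
    return "Unknown (needs more questions)"


dolphin = {"milk": 1, "feathers": 0, "fins": 1}
eagle = {"milk": 0, "feathers": 1, "fins": 0}

print(classify_by_questions(dolphin))  # Mammal
print(classify_by_questions(eagle))    # Bird
```

Notice that the order of the questions matters: the dolphin has fins, but because the milk question comes first, it is still classified as a mammal. A machine learning model chooses the questions and their order from the data instead of having them written by hand.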

As you continue mentally sorting through the animals, you might find that this process becomes increasingly complex, especially as you encounter more diverse species. It can be challenging to keep track of all the characteristics and make accurate classifications solely in your head. Even putting pen to paper to work out these distinctions can be daunting. That is where machine learning and decision trees come in.

Artificial Intelligence (AI) is a branch of computer science focused on the creation of tools that can solve problems and analyze information. Machine learning is a subdivision of AI. Its goal is to create tools that can learn and improve over time using data. In this project, we will dive into decision trees, one type of machine learning algorithm. Decision trees mimic the human decision-making process by breaking down a problem into a series of sequential questions or decisions. Decision trees often serve as powerful tools for classifying data and solving various problems, including animal identification.

Watch this video to learn more about decision trees. We recommend watching the video from 0:18 to 9:15:

In this project, your task will be to gather more data to improve our decision tree model. Then, you will analyze this data to understand which features are most important in determining each animal's group.  

Terms and Concepts

Questions

Bibliography

To learn more about decision trees, you can watch these videos:

The dataset we will be using is a modified version of this: 

  • Forsyth, Richard. (1990). Zoo. UCI Machine Learning Repository. Zoo. Retrieved May 7, 2024. 

These are the sites we recommend for researching animals:

To learn more about why we split data into train and test:

Materials and Equipment

Experimental Procedure

This project follows the Engineering Design Process. Confirm with your teacher if this is acceptable for your project, and review the steps before you begin.

In this project, we will give you a short list (dataset) of animals, their characteristics, and the groups they belong to. You will put this dataset into a decision tree machine learning model and see how well it can classify the animals. Then, you’ll keep adding more and more animals to the list and see how that improves the model's ability to classify them. You will also figure out which characteristics are most important for classifying animals into these groups.

1. Explore the Starting Dataset

The dataset we give you to start has only two animals from each group: mammals, birds, reptiles, fish, amphibians, insects, and other invertebrates (note that insects are invertebrates, but for this project, we put them in their own group because they are so common). Since we have a limited amount of data, it might be hard for the model to identify the animals correctly. You will do your first test with this small dataset. Then, you will gradually add more animals to the dataset (4, 6, 8, and 10 in each group) to help the model learn and see if and how it improves the model's accuracy.

  1. First, download the zoo2.xlsx file.
  2. Choose your preferred option to open the file:
    1. Google Sheets: Navigate to Google Sheets, click on the 'Open file picker' in the middle of the page (the icon is a folder next to the A-Z icon), then select 'Upload'>'Browse' to locate the zoo2.xlsx file. Once uploaded, you will be redirected to the spreadsheet with the data.
    2. Microsoft Excel: Open the Microsoft Excel app on your computer, click on 'File' in the upper left corner, then select 'Open' and navigate to the zoo2.xlsx file and open it. After opening, you will automatically see the data in spreadsheet format. 
  3. Take a minute to look over and make sure you understand the dataset. 
    1. All the animals are named in the first column, which is Column A. 
    2. The characteristics of the animals, like whether they have hair, feathers, or produce milk, are in columns B through Q. If an animal has one of these characteristics, you will see a '1' in that column. If it does not, you will see a '0'. The only exception is for legs, where the column shows how many legs the animal has instead of '1' or '0'.
    3. The group the animal belongs to, like mammal or bird, is in Column R. 'group_number' will be one of the following:
      • 1 for Mammal
      • 2 for Bird
      • 3 for Reptile
      • 4 for Fish
      • 5 for Amphibian
      • 6 for Insect
      • 7 for Other Invertebrate
  4. When you are done, save the file as a CSV file by clicking 'File'>'Save as' (in Google Sheets, it is 'File'>'Download'), then choose the 'CSV (Comma delimited) (*.csv)' option. Make sure you are on the sheet called "zoo2.csv." You may get a popup that says you cannot save multiple sheets as a CSV. Click OK to save only the active sheet. Make sure to keep the file name the same.

2. Loading the Data into a Pandas DataFrame

  1. Download the animal_classification.ipynb file from Science Buddies. This is the code you will need to process your data.
  2. Within your Google Drive, click on 'MyDrive,' then create a new folder and label it 'Animal Classification.' Inside this folder, upload both your animal_classification.ipynb file and your zoo2.csv file. 
  3. Double-click the animal_classification.ipynb file. This should automatically open it in Google Colab.
    1. Read the Troubleshooting Tips and How to Use This Notebook sections. Follow the instructions you find in those sections.
    2. Run the block under Importing Libraries to ensure you have access to all the functions we will use for this project.
    3. (Code Block 2A) Run this code block to make the files on your Google Drive available to use in the notebook. 
    4. (Code Block 2B) Make sure the file name variable (file_name) is correct. If you kept the file name "zoo2.csv", you do not need to change anything. However, if you changed the name of the file to something else, make sure to change the file_name variable to match that before you run the code blocks in this section. You will update this again later with the file names for your larger datasets after you make them (see Step 9).
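
For readers curious about what loading the data looks like, here is a minimal sketch using pandas. It is not the notebook's actual code: the three rows and the reduced set of columns below are a made-up stand-in for zoo2.csv, and the data is read from a text string so the example runs on its own.

```python
import io

import pandas as pd

# Hypothetical three-row stand-in for zoo2.csv; the real file has more
# characteristics (columns) and more animals (rows).
csv_text = """animal_name,hair,feathers,eggs,milk,legs,group_number,group_name
lion,1,0,0,1,4,1,Mammal
eagle,0,1,1,0,2,2,Bird
frog,0,0,1,0,4,5,Amphibian
"""

# In a notebook you would pass the path stored in file_name instead of a
# StringIO object, e.g. pd.read_csv(file_name).
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)   # (number of rows, number of columns)
print(df.head())  # first few rows, to check the columns loaded as expected
```

Checking the shape and the first few rows right after loading is a quick way to catch a wrong file name or a sheet that was saved incorrectly.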

3. Preprocessing the Dataset

Preprocessing a dataset is an important step in machine learning. It involves getting the data ready before using it in a machine-learning model. Preprocessing has different tasks, and we will explain and show each one separately. Run each code block.

  1. (Code Block 3A) Dropping Features: First, we will remove features that we believe will not be useful for modeling. In this instance, we will remove 'animal_name' and 'group_name.' 
    1. The feature 'animal_name' lists the names of each animal, like "lion," "eagle," or "snake." These names help us tell the animals apart, but they do not help us figure out what group an animal is part of (i.e. whether it is a mammal, bird, reptile, etc.). So, we need to remove this so the model doesn’t try to use it. 
    2. The feature 'group_name' is the list of the groups each animal belongs to. However, machine learning algorithms, like decision trees, only understand numbers and not words, which is why we have 'group_number,' which shows each group as a number instead of by its name. 
  2. (Code Block 3B) Separating the Dataset into Inputs and Target: The second block of code separates the dataset into two parts:
    1. 'inputs' contains the characteristics of the animals (feathers, milk, legs, etc.)
    2. 'target' contains the numbers representing each animal group that the model is trying to predict (1 for mammal, 2 for bird, etc.).
  3. (Code Block 3C) Splitting the Training and Testing Data: Splitting data into training and testing sets is important in machine learning. It helps to see how well your model works on new data. Watch this video to learn more about why we split datasets. We have provided the code to split the dataset into training and testing parts. Pay attention to how X and y look after this step, and the sizes of X_train, X_test, y_train, and y_test. The numbers inside the parentheses represent the number of rows and the number of characteristics in each set. For example, if you see the X_train shape as (10, 16), that means there are 10 animals in the training data, each with 16 characteristics (e.g., hair, feathers, eggs, etc.).
Coding Tip:
Following the standard coding conventions, X is commonly written in uppercase, while y is usually in lowercase.
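
The three preprocessing tasks above can be sketched in a few lines. This is an illustration under stated assumptions, not the notebook's code: the eight animals and three characteristics are made up, and the `test_size=0.25` and `random_state=0` settings are assumed values chosen so the example is repeatable.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Made-up miniature dataset with only three characteristics.
df = pd.DataFrame({
    "animal_name": ["lion", "bat", "eagle", "duck", "shark", "trout", "frog", "newt"],
    "feathers": [0, 0, 1, 1, 0, 0, 0, 0],
    "milk": [1, 1, 0, 0, 0, 0, 0, 0],
    "legs": [4, 2, 2, 2, 0, 0, 4, 4],
    "group_number": [1, 1, 2, 2, 4, 4, 5, 5],
    "group_name": ["Mammal", "Mammal", "Bird", "Bird",
                   "Fish", "Fish", "Amphibian", "Amphibian"],
})

# Dropping features: remove the columns the model should not use.
df = df.drop(columns=["animal_name", "group_name"])

# Separating inputs and target.
inputs = df.drop(columns=["group_number"])  # the characteristics
target = df["group_number"]                 # the group to predict

# Splitting: hold back a quarter of the animals for testing (assumed ratio).
X_train, X_test, y_train, y_test = train_test_split(
    inputs, target, test_size=0.25, random_state=0
)

print(X_train.shape, X_test.shape)  # (6, 3) (2, 3)
```

Here the shape (6, 3) means 6 animals in the training data, each with 3 characteristics, which mirrors how to read the shapes printed in the notebook.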

4. Training the Model

  1. (Code Block 4A) We have provided the code to make a Decision Tree classifier. Run this code.
  2. (Code Block 4B) This code trains the classifier using the training data you gave it. Run this code (this is like pressing play to let the computer do its job and learn from the examples we've given it).
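
As a rough picture of what creating and training a classifier involves, here is a minimal sketch that assumes scikit-learn's `DecisionTreeClassifier` (the notebook's own code may differ). The six training rows are made up, and `random_state=0` is an assumed setting that keeps the result repeatable.

```python
from sklearn.tree import DecisionTreeClassifier

# Made-up training data. Each row is [feathers, milk, legs].
X_train = [
    [0, 1, 4],  # mammal
    [0, 1, 2],  # mammal
    [1, 0, 2],  # bird
    [1, 0, 2],  # bird
    [0, 0, 0],  # fish
    [0, 0, 0],  # fish
]
y_train = [1, 1, 2, 2, 4, 4]  # group numbers: 1 = Mammal, 2 = Bird, 4 = Fish

# Create the classifier, then train it on the examples.
model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)

# Ask the trained model about an animal with feathers, no milk, and 2 legs.
print(model.predict([[1, 0, 2]]))  # [2]
```

Creating the classifier and training it are separate steps: the first only sets up an empty model, and the second is where it learns its questions from the data.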

5. Evaluating the Model

Now that our model is trained, we can see how it classifies animals into groups based on their characteristics. 

  1. (Code Block 5A) In the first code block, we will use a Model Accuracy Score (called 'model.score(X_test, y_test)' in the code) to figure out how accurate the model was.
    1. The Model Accuracy Score will be a number from 0 to 1, which shows the fraction of animals in the testing data that were put into the correct group.
    2. For example, a score of 0.6 means there was a 60% accuracy rate. In other words, 6 out of 10 animals were classified into the correct group. 
    3. You will want to create a data table, like the one shown below, and write down the Model Accuracy Score for this first dataset. You will fill out the rest of the columns later.
# of Animals in Each Group | Model Accuracy Score | Groups that Were Misclassified (e.g. Reptile was classified as an Amphibian) | Misclassified Animals | Top 3 Most Important Characteristics for Classification
2 (zoo2.csv) | | | |
4 (zoo4.csv) | | | |
6 (zoo6.csv) | | | |
8 (zoo8.csv) | | | |
10 (zoo10.csv) | | | |
  2. (Code Block 5B) In the second code block, we can see the model's predictions by looking at something called 'y_hat.' When we print 'y_hat' (i.e. display it on the screen), we can see the model's predictions for each animal in the testing dataset.
  3. (Code Block 5C and Code Block 5D) In the third and fourth code blocks, we compare the actual animal groups with the model's predicted animal groups. The third block shows the number corresponding to each animal group, and the fourth block shows the actual name of the animal group. Comparing them side by side, can you see where the model misclassified an animal?
    1. For example, if your y_test was (1, 1, 2) and your y_hat was (1, 1, 3), we can see that the third animal in the list (the last one) was identified incorrectly. The animal was actually a 2, but the model thought it was a 3.
    2. The same data is shown again with the group names for ease of reading. The y_test would be (Mammal, Mammal, Bird) while the y_hat would be (Mammal, Mammal, Reptile). We can see that the animal was actually a bird, but the model thought it was a reptile.
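
The accuracy score and the side-by-side comparison in the example above can be worked out by hand. The sketch below uses plain Python and the same y_test and y_hat values; the dictionary name `group_names` is an assumption for illustration.

```python
# Group numbers and names as listed in Step 1.
group_names = {
    1: "Mammal", 2: "Bird", 3: "Reptile", 4: "Fish",
    5: "Amphibian", 6: "Insect", 7: "Other Invertebrate",
}

y_test = [1, 1, 2]  # actual groups
y_hat = [1, 1, 3]   # groups predicted by the model

# Accuracy = number of correct predictions / number of animals tested.
correct = sum(actual == predicted for actual, predicted in zip(y_test, y_hat))
accuracy = correct / len(y_test)
print(round(accuracy, 2))  # 0.67

# Side-by-side comparison using group names.
for actual, predicted in zip(y_test, y_hat):
    print(group_names[actual], "->", group_names[predicted])
```

Two of the three animals were classified correctly, so the score is 2/3, or about 67%.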

6. Visualize the Model

  1. (Code Block 6A) When you run this code block, a graph showing the model's classifications (which group each animal was put in) will be created.
    1. The graph shows the actual animal groups on the x-axis and the animal groups predicted by the model on the y-axis. So, if a point is at mammals on the x-axis and mammals on the y-axis, it was classified correctly. If a point is at mammals on the x-axis and reptiles on the y-axis, it is classified incorrectly.
    2. In a perfect scenario where the model is 100% accurate, the dots on the graph would create a straight line from the bottom left to the top right.
    3. The size of each dot on the graph shows how many points are in that position. Bigger dots mean there are more points in that spot. If a dot is big, that means that a lot of animals were classified that way.
    4. Did your model classify all the animals correctly? If not, what animal groups were mixed up? Document this in the third column of your data table called ‘Groups that Were Misclassified.’
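
The dot sizes on the graph come from counting how many animals share the same pair of actual and predicted groups. Here is a minimal sketch of that counting step in plain Python, using made-up y_test and y_hat values; it does not draw the graph itself.

```python
from collections import Counter

y_test = [1, 1, 2, 3, 3]  # actual groups (x-axis), made-up values
y_hat = [1, 1, 2, 5, 5]   # predicted groups (y-axis), made-up values

# Count how many animals land on each (actual, predicted) position.
pair_counts = Counter(zip(y_test, y_hat))

for (actual, predicted), count in sorted(pair_counts.items()):
    status = "correct" if actual == predicted else "misclassified"
    print(f"actual={actual} predicted={predicted} count={count} ({status})")
```

Positions where the actual and predicted numbers match lie on the diagonal line. Any other position, such as actual 3 and predicted 5 here, marks a pair of groups that the model mixed up.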

7. Identify the Animals that were Misclassified

Now that our model is trained and tested with our testing data, we can identify which animals were misclassified (put in the wrong group) by the model. 

  1. (Code Block 7A) To help you with this, we have provided helper functions, which are pieces of code that help you perform common tasks quickly. Run the first code block to make these functions available. 
  2. (Code Block 7B) The second code block finds where the computer made mistakes in sorting the animals and shows them to you in a list of numbers (called indices), with each number representing the position of the animal in the list. For example, if y_test was [2, 3, 3] and y_hat was [2, 5, 5], then the block would return [1, 2], because it misclassified the second and third animals in that list (list indices start at 0, so the second and third animals are at positions 1 and 2).
  3. (Code Block 7C) The third code block contains the 'classified()' function, which tells us which groups the misclassified animals were put into. For example, it would tell us that the lizard was misclassified as part of the amphibian group. The function returns the animal at a given index (place in the list) and tells us what group it was classified as. 
  4. (Code Block 7D) Run the last block to print (show on the screen) a list of animals with their actual group and what group they were misclassified as. If the model were 100% accurate, then this list would be empty. 
    1. Add these misclassified animals to the fourth column on your data table called ‘Misclassified Animals.’
    2. Consider why these animals might have been misclassified. Are they similar to the other animals it was misclassified as?
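
To make the idea of indices concrete, here is a minimal sketch in plain Python that reproduces the [2, 3, 3] and [2, 5, 5] example. The function name `find_misclassified` and the animal names are assumptions for illustration; the notebook's own helper functions may be written differently.

```python
GROUP_NAMES = {
    1: "Mammal", 2: "Bird", 3: "Reptile", 4: "Fish",
    5: "Amphibian", 6: "Insect", 7: "Other Invertebrate",
}


def find_misclassified(y_test, y_hat):
    """Return the positions where the prediction differs from the actual group."""
    return [i for i, (actual, predicted) in enumerate(zip(y_test, y_hat))
            if actual != predicted]


test_animals = ["eagle", "lizard", "snake"]  # made-up test animals
y_test = [2, 3, 3]
y_hat = [2, 5, 5]

indices = find_misclassified(y_test, y_hat)
print(indices)  # [1, 2]

for i in indices:
    print(f"{test_animals[i]}: actually {GROUP_NAMES[y_test[i]]}, "
          f"classified as {GROUP_NAMES[y_hat[i]]}")
```

If every prediction were correct, `indices` would be an empty list and the loop would print nothing.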

8. Decision Tree Visualization

  1. (Code Block 8A) When you run this code block, it will create a picture or visualization of the decision tree. The characteristics are arranged from most important for determining an animal’s group at the top to least important for determining an animal’s group at the bottom. At the top of each box in the decision tree, you will see a characteristic and a number, such as "milk <= 0.5." This example means that if an animal does not produce milk (milk is 0), it will go to the left side of the tree. If it does (milk is 1), it will go to the right side of the decision tree.
    1. Document the top three most important characteristics for determining an animal’s group in the last column of your data table, called ‘Top 3 Most Important Characteristics for Classification.’
    2. Advanced Analysis Option: The Gini value, also known as the Gini impurity or Gini index, shows how mixed up a group is in a decision tree. It tells us how likely it is that a randomly chosen item in the group would be labeled incorrectly if it were labeled at random according to the mix of labels in that group.
      1. 0 Gini Value: This means that the group is completely pure, and all the training data in that node belongs to one group.
      2. Higher Gini Value: When the Gini value is higher, it means there is more mixedness or impurity in the node. This suggests that the node contains a mix of different groups in the dataset.
      3. If you want to, add the Gini values to the final column of your data table and consider this as part of your analysis.
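
For the Advanced Analysis Option, the Gini value of a node can be calculated by hand as 1 minus the sum of the squared fractions of each group in that node. The sketch below shows that calculation in plain Python; the function name `gini_impurity` and the example labels are assumptions for illustration.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity: 1 minus the sum of squared group fractions."""
    total = len(labels)
    counts = Counter(labels)
    return 1.0 - sum((count / total) ** 2 for count in counts.values())


print(gini_impurity([1, 1, 1, 1]))  # 0.0 -> pure node, all mammals
print(gini_impurity([1, 1, 2, 2]))  # 0.5 -> half mammals, half birds
```

A node holding only one group scores 0, and the score rises as the groups in the node become more evenly mixed.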

9. Add Data to Your Spreadsheet and Retest

Next, you'll follow the steps below to add more animals to your spreadsheet to see how it affects how well the model works. You'll do this in steps: first with 4 animals in each group (4 mammals, 4 reptiles, 4 amphibians, etc.), then with 6, then 8, and finally with 10. Make sure to give each new spreadsheet a different name like 'zoo4.csv', 'zoo6.csv', and so on. This way, you can tell them apart easily. Get ready to add your favorite animals to your dataset!

  1. Research and add your favorite animals. We recommend using Animalia and A-Z Animals for your research, but any search engine works as well:
    1. Scroll to the bottom of the spreadsheet and add the animal's name, its characteristics, and its group.
    2. For all characteristics except 'Legs,' use 0 for no and 1 for yes. For 'Legs,' insert the number of legs the animal has. The characteristic 'catsize' refers to whether the animal is larger than a house cat.
    3. 'group_number' will be one of the following:
      • 1 for Mammal
      • 2 for Bird
      • 3 for Reptile
      • 4 for Fish
      • 5 for Amphibian
      • 6 for Insect
      • 7 for Other Invertebrate
    4. Add an equal number of animals to each group until you get to the number you are testing (4, 6, 8, or 10).
      1. To check your progress on how many of each animal you have, check the second sheet called "count."
      2. When you are done, save your spreadsheet as a new CSV file using a descriptive file name (zoo4.csv, zoo6.csv, etc.) and upload the file to your ‘Animal Classification’ folder on Google Drive.
    5. Update the file_name variable in Code Block 2B to match the new file, then repeat Step 3 (Preprocessing the Dataset) through Step 7 (Identify the Animals that Were Misclassified) for each dataset you create and track your findings on your data table.
    6. What patterns do you notice in your data? How does the Model Accuracy Score change as you add more animals? How does the number of animals that are misclassified change? Which groups become more accurate with more data? What are the most important characteristics for determining the group? Why do you think this is?
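
Besides the "count" sheet, a few lines of code can confirm that every group has the same number of animals before you upload a new file. This is a minimal sketch in plain Python: the four-row CSV text is made up, and `target_per_group` is an assumed variable name for the number you are testing.

```python
import csv
import io
from collections import Counter

# Made-up, incomplete dataset used only to show the check.
csv_text = """animal_name,legs,group_number
lion,4,1
bat,2,1
eagle,2,2
shark,0,4
"""

rows = list(csv.DictReader(io.StringIO(csv_text)))
counts = Counter(int(row["group_number"]) for row in rows)

target_per_group = 2  # the number of animals per group you are testing

# How many more animals each group (1 through 7) still needs.
still_needed = {
    group: max(target_per_group - counts.get(group, 0), 0)
    for group in range(1, 8)
}

for group, missing in still_needed.items():
    print(f"group {group}: have {counts.get(group, 0)}, need {missing} more")
```

When every group shows "need 0 more," the dataset is balanced and ready to save as a CSV.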

Ask an Expert

Do you have specific questions about your science project? Our team of volunteer scientists can help. Our Experts won't do the work for you, but they will make suggestions, offer guidance, and help you troubleshoot.

Global Goals

The United Nations Sustainable Development Goals (UNSDGs) are a blueprint to achieve a better and more sustainable future for all.

This project explores topics key to Life Below Water: Conserve and sustainably use the oceans, seas and marine resources.
This project explores topics key to Life on Land: Sustainably manage forests, combat desertification, halt and reverse land degradation, halt biodiversity loss.

Variations

  • Try this project using Random Forests, a collection of decision trees. You can use the RandomForestClassifier from the sklearn library.
  • Try training the model on an unbalanced dataset (think 100 mammals and 2 reptiles). Do you find that the model has a more challenging time classifying the animals it did not have a lot of exposure to?
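
For the Random Forest variation, swapping in the classifier might look like the sketch below. It uses scikit-learn's `RandomForestClassifier` on six made-up training rows; the `n_estimators=50` and `random_state=0` settings are assumed values, not recommendations from this project.

```python
from sklearn.ensemble import RandomForestClassifier

# Made-up training data. Each row is [feathers, milk, legs].
X_train = [
    [0, 1, 4],  # mammal
    [0, 1, 2],  # mammal
    [1, 0, 2],  # bird
    [1, 0, 2],  # bird
    [0, 0, 0],  # fish
    [0, 0, 0],  # fish
]
y_train = [1, 1, 2, 2, 4, 4]  # group numbers: 1 = Mammal, 2 = Bird, 4 = Fish

# A forest of 50 decision trees, each trained on a random sample of the rows.
forest = RandomForestClassifier(n_estimators=50, random_state=0)
forest.fit(X_train, y_train)

# The forest combines the votes of its trees into one prediction.
prediction = forest.predict([[1, 0, 2]])
print(prediction)
```

Because the forest is created, trained, and scored in the same way as a single decision tree, the rest of the evaluation steps can stay largely the same.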

Careers

If you like this project, you might enjoy exploring these related careers:

Career Profile
Many aspects of people's daily lives can be summarized using data, from what is the most popular new video game to where people like to go for a summer vacation. Data scientists (sometimes called data analysts) are experts at organizing and analyzing large sets of data (often called "big data"). By doing this, data scientists make conclusions that help other people or companies. For example, data scientists could help a video game company make a more profitable video game based on players'… Read more
Career Profile
Ever wondered what wild animals do all day, where a certain species lives, or how to make sure a species doesn't go extinct? Zoologists and wildlife biologists tackle all these questions. They study the behaviors and habitats of wild animals, while also working to maintain healthy populations, both in the wild and in captivity. Read more
Career Profile
Life is all around you in beauty, abundance, and complexity. Biologists are the scientists who study life in all its forms and try to understand fundamental life processes, and how life relates to its environment. They answer basic questions, like how do fireflies create light? Why do grunion fish lay their eggs based on the moon and tides? What genes control deafness? Why don't cancer cells die? How do plants respond to ultraviolet light? Beyond basic research, biologists might also apply… Read more

Cite This Page

General citation information is provided here. Be sure to check the formatting, including capitalization, for the method you are using and update your citation, as needed.

MLA Style

Ngo, Tracey. "Classify Animals with Machine Learning." Science Buddies, 7 July 2025, https://www.sciencebuddies.org/science-fair-projects/project-ideas/ArtificialIntelligence_p018/artificial-intelligence/animal_classification?from=Blog. Accessed 7 June 2026.

APA Style

Ngo, T. (2025, July 7). Classify Animals with Machine Learning. Retrieved from https://www.sciencebuddies.org/science-fair-projects/project-ideas/ArtificialIntelligence_p018/artificial-intelligence/animal_classification?from=Blog


Last edit date: 2025-07-07