Use Machine Learning to Identify Lyme Disease-Transmitting Ticks

Abstract

Ticks are a growing public health concern in North America, as their populations and the diseases they carry, like Lyme disease, are on the rise. But how can we help people identify the ticks they encounter to reduce their risk of infection? With artificial intelligence! In this project, you will gather image data of three different tick species and use a convolutional neural network (CNN) to classify them. You will also apply image augmentation techniques to expand and enhance the dataset, making the model more accurate.

Summary

Areas of Science
Difficulty
Method
Time Required: Short (2-5 days)
Prerequisites: None
Material Availability: Readily available
Cost: Very Low (under $20)
Safety: No issues

Objective

Collect tick image data from the iNaturalist website to train a convolutional neural network. Then, adjust the image augmentation parameters to improve the model's accuracy in correctly identifying ticks.

Introduction

Ticks and the diseases they carry pose a growing public health threat in North America. As tick populations grow and their geographic range expands, the incidence of tick-borne diseases, such as Lyme disease, has also increased. Two species of black-legged ticks are primarily responsible for transmitting Lyme disease in the United States. In this project, we will focus on one of them, I. scapularis. Ticks can look different depending on their sex and their stage in the life cycle (larva, nymph, or adult), so proper identification of tick species can play a crucial role in mitigating the risks associated with these diseases.

Figure 1. An image of each of the three species of ticks (A. americanum, D. variabilis, and I. scapularis) at their different stages of life. Image Credit: CDC

However, accurately identifying tick species is challenging for the general public because ticks are small and look alike. In a study of an image-based passive surveillance program, Justen et al. (2021) from the University of Wisconsin found that members of the general public identified A. americanum, D. variabilis, and I. scapularis with only 13.2%, 19.3%, and 22.9% accuracy, respectively. This lack of expertise can hinder efforts to prevent tick-borne illnesses. Artificial intelligence and computer vision offer one way to identify ticks without consulting an expert.

Figure 2. An image of each type of tick side by side. From left to right: A. americanum, D. variabilis, and I. scapularis. Image Credit: iNaturalist

Artificial Intelligence (AI) is a branch of computer science focused on the creation of tools that can solve problems and analyze information. Machine learning is a subdivision of AI. Its goal is to create tools that can learn and improve over time using data. Computer vision allows machines to interpret and analyze visual information from the world, such as images of ticks. By using a type of AI model called a convolutional neural network (CNN), we can create a tool to automatically identify the type of tick someone encounters. CNNs are particularly well-suited for image recognition tasks, as they are designed to learn visual patterns and features from images.
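
To make "learning visual patterns" a little more concrete, here is a minimal sketch of the sliding-filter operation at the heart of a CNN. It is written in plain Python and is not the code used in this project's notebook. The tiny 4x4 "image" and the hand-picked filter values are assumptions chosen for illustration; in a real CNN, the filter values are learned from the training images. Like most deep learning libraries, the sketch slides the filter without flipping it.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    feature_map = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            # Multiply each filter value by the pixel underneath it and add them up.
            total = 0
            for i in range(k_rows):
                for j in range(k_cols):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        feature_map.append(row)
    return feature_map


# Illustrative 4x4 grayscale image: dark (0) on the left, bright (9) on the right.
image = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]

# Illustrative 2x2 filter that responds where brightness increases from left to right.
kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = convolve2d(image, kernel)
for row in feature_map:
    print(row)
```

The feature map is largest in the middle column, exactly where the dark-to-bright edge sits. A CNN stacks many such filters so that it can respond to edges, textures, and eventually shapes like a tick's body or legs.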

In this project, we will gather images of three types of ticks (A. americanum, D. variabilis, and I. scapularis) and use image augmentation to improve the dataset. Image augmentation involves modifying existing images, such as by rotating or flipping them, to create more variety. This helps the CNN model learn better by exposing it to a wider range of scenarios, ultimately making it more accurate at identifying ticks.
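
As a rough illustration of what flipping and rotating mean at the pixel level, the sketch below treats an image as a small grid of numbers and applies three simple transformations. This is a plain-Python sketch with a made-up 2x3 "image", not the project's notebook code, which uses TensorFlow layers for augmentation.

```python
def flip_horizontal(image):
    """Mirror the image left to right."""
    return [list(reversed(row)) for row in image]


def flip_vertical(image):
    """Mirror the image top to bottom."""
    return [list(row) for row in reversed(image)]


def rotate_90_clockwise(image):
    """Rotate the image a quarter turn clockwise."""
    return [list(row) for row in zip(*image[::-1])]


# Made-up 2x3 grayscale "image" used only for illustration.
image = [
    [1, 2, 3],
    [4, 5, 6],
]

augmented = {
    "flip_horizontal": flip_horizontal(image),
    "flip_vertical": flip_vertical(image),
    "rotate_90_clockwise": rotate_90_clockwise(image),
}

for name, result in augmented.items():
    print(name, result)
```

Each result contains the same pixel values in a new arrangement, so one photo of a tick can stand in for several photos taken from different angles.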

Our goal is to develop a tool that can help identify ticks and reduce the spread of tick-borne diseases by applying AI, computer vision, and CNNs.

Terms and Concepts

Questions

Bibliography

Study this project is based on:

Image data is from the iNaturalist website:

  • iNaturalist. (n.d.) iNaturalist. Retrieved September 17, 2024. 

To learn more about ticks and Lyme disease:

To learn more about CNNs:

To learn more about image augmentation:

To learn more about loss in machine learning:

Materials and Equipment

Experimental Procedure

This project follows the Engineering Design Process. Confirm with your teacher if this is acceptable for your project, and review the steps before you begin.

Overview

In this project, you will collect image data on three different types of ticks (A. americanum, D. variabilis, and I. scapularis). You will perform data augmentation, which creates even more data for the convolutional neural network model to train on. You will also adjust the image augmentation parameters and explore how different parameters affect the model's accuracy.

Setting Up the Google Colab Environment

  1. You will need a Google account. If you do not have one, make one when prompted. 

  2. Download the tick_identification.zip file from Science Buddies. Once you have downloaded the zip file, unzip it by right-clicking it in your Downloads and clicking ‘Extract all.’ If you are on a Mac, you can also do this by double-clicking on the file. This will create a new folder alongside the compressed zip file. 

  3. Upload the folder called tick_identification inside the newly extracted folder to Google Drive. You will need to sign in to your Google account at this point or make an account. Upload the file by clicking on the ‘New’ button on the upper left corner of the main Google Drive screen, then selecting the ‘Folder upload’ option. Be sure to select the file folder, not the compressed zip file. 

  4. There are two ways you can access the Google Colab notebook:

    1. Within your Google Drive, double-click the folder you just uploaded. Then, double-click on the file called ‘tick_identification.ipynb.’ This should automatically open the notebook in Google Colaboratory.

    2. Another option is to go directly to Google Colaboratory. On the pop-up menu, select ‘Google Drive’ then select the file called ‘tick_identification.ipynb.’ 

  5. Change the runtime type by clicking on ‘Runtime,’ then ‘Change runtime type.’ Next, click on the ‘T4 GPU’ option (if it isn't already selected) and save. By switching the runtime type, you are switching from a CPU (the default) to a GPU, which can significantly speed up the execution of machine learning models or other compute-intensive tasks.

  6. Read the Troubleshooting Tips and How to Use This Notebook sections. Follow the instructions you find there. 

  7. Run the blocks under Importing Libraries to ensure you have access to all the functions we will use for this project. The first block will provide us with functions to create the CNN model and make it easier to augment the image data. The second block will mount the Google Drive to use the data we uploaded there in our code. 

1. Gathering the Data

We will be using user-generated data from iNaturalist for this project. Follow the steps below to gather images for three tick species. 

  1. Navigate to the iNaturalist website and search for images for each of the tick species listed below. We highly encourage you to pick the images that are marked as ‘Research Grade’ as those have been verified as the correct species. We will be collecting data for three different types of ticks:

    1. Amblyomma americanum (Lone Star Tick)
    2. Dermacentor variabilis (American Dog Tick)
    3. Ixodes scapularis (Eastern Black-legged Tick)
  2. In the 'tick_identification' folder you uploaded earlier, there is a ‘data’ folder inside. Inside of the 'data' folder you will find three subfolders named A. americanum, D. variabilis, and I. scapularis. Download at least 50 images for each tick species from iNaturalist by right-clicking on the image and selecting ‘Save image as’ and upload them to the corresponding folder in your Google Drive.

    1. Tips for Selecting Images:
      1. Select images with ticks from a similar life stage (this will help the machine learning model learn more easily!).
      2. The tick should occupy most of the image.
      3. Avoid images where the tick is not the main focus of the image.
      4. Select images with minimal clutter or distractions in the background.
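
If you would like to double-check that each species folder has enough images before moving on, a small script like the following could help. It is a minimal sketch and not part of the provided notebook. The Google Drive path in `DATA_DIR` and the list of accepted file extensions are assumptions; adjust them to match where you uploaded the folder and the image types you downloaded.

```python
from pathlib import Path

# Folder names come from the project's 'data' folder.
SPECIES_FOLDERS = ["A. americanum", "D. variabilis", "I. scapularis"]
# Assumed set of image file types; extend it if your downloads use others.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp"}
MIN_IMAGES = 50


def count_images(data_dir):
    """Return {species: number of image files} for each species subfolder."""
    counts = {}
    for species in SPECIES_FOLDERS:
        folder = Path(data_dir) / species
        if folder.is_dir():
            counts[species] = sum(
                1
                for f in folder.iterdir()
                if f.is_file() and f.suffix.lower() in IMAGE_EXTENSIONS
            )
        else:
            counts[species] = 0
    return counts


def report(counts, minimum=MIN_IMAGES):
    """Print a status line per species and return the species that need more images."""
    for species, n in counts.items():
        status = "ok" if n >= minimum else f"needs {minimum - n} more"
        print(f"{species}: {n} images ({status})")
    return [species for species, n in counts.items() if n < minimum]


# Hypothetical location of the uploaded folder once Google Drive is mounted in Colab.
DATA_DIR = "/content/drive/MyDrive/tick_identification/data"
if Path(DATA_DIR).is_dir():
    report(count_images(DATA_DIR))
```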

2. View Images

First, we will view the images within our code to make sure the images are readily available for use in our machine learning model. 

  1. (Code Block 2A) Run this code block to display five images of A. americanum ticks at a time. You can rerun the block as many times as needed to view more images. 

  2. (Code Block 2B) Next, run this code block to display five images of D. variabilis ticks. As with the previous step, you can rerun the block to load additional images. 

  3. (Code Block 2C) Finally, run this block to view images of I. scapularis ticks. This block also displays five images at a time, and you can run it multiple times to see more images. 

  4. (Code Block 2D) Once you have confirmed that all images load correctly, run this code block to convert the image files in each folder to JPG format. This ensures consistency in the data format for our machine learning model. 

3. Split to Train, Validation, and Test

We will split our dataset into training, validation, and test sets so that we can detect overfitting and ensure the model is evaluated on data it has never seen before.

  1. (Code Block 3A) We have provided the code to split the dataset into training, validation, and testing parts.

  2. (Code Block 3B) This code block prints out the number of images in the train, validation, and test data sets. There should be about 70% of images in train, 15% in validation, and 15% in test.
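
To see how a 70/15/15 split can work, here is a minimal sketch using only Python's standard library. It is not the code from Code Block 3A, whose details may differ. The file names, the fixed random seed, and the choice to give any leftover images to the test set are assumptions made for illustration.

```python
import random


def split_dataset(filenames, train_pct=70, val_pct=15, seed=42):
    """Shuffle the file names and split them into train, validation, and test lists."""
    shuffled = list(filenames)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is repeatable
    n_train = len(shuffled) * train_pct // 100
    n_val = len(shuffled) * val_pct // 100
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever is left goes to the test set
    return train, val, test


# Hypothetical file names standing in for 50 downloaded images of one species.
filenames = [f"tick_{i:03d}.jpg" for i in range(50)]
train, val, test = split_dataset(filenames)
print(len(train), len(val), len(test))
```

Because 15% of 50 is not a whole number, the three parts are only approximately 70%, 15%, and 15%. This is why the counts printed by the notebook may be "about" those percentages rather than exact.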

4. Test Augmentation on a Single Image

We will now test augmentation on a single image before using it on the rest of the dataset. Remember that data augmentation creates more data for our model to train on by flipping an image, rotating an image, etc. 

  1. (Code Block 4A) In this code block, we create a data augmentation pipeline (a sequence of steps) using tf.keras.Sequential, a class from the TensorFlow library that chains layers together. Here it chains several augmentation layers; the same class can also be used to build CNN models. You can explore how changing the augmentation values impacts your image.

    1. Rotation: Try changing the value in RandomRotation(). The value is a fraction of a full turn, so increasing or decreasing it changes how far images can be rotated.
    2. Translation: Modify the horizontal and vertical translation values in RandomTranslation(). What happens if you increase the range of translation?
    3. Zoom: Experiment with RandomZoom() by increasing or decreasing the zoom factor. How does zoom impact the image?
    4. Flip: You can flip images horizontally (already in the pipeline) or also vertically by adding RandomFlip("vertical"). How does vertical flipping impact model performance on your data?
    5. Contrast & Brightness: Change the factors in RandomContrast() and RandomBrightness() to see how more extreme contrast or brightness variations affect the images.
  2. (Code Block 4B) In this code block, we define a helper function to perform image augmentation given the image and its label. Helper functions are small, reusable pieces of code that perform specific tasks. Run this code block. 

  3. (Code Block 4C) In this code block, we will randomly select an image to perform image augmentation on. Run this code block.

  4. (Code Block 4D) In this code block, we can visualize the randomly selected image. Run this code block.

  5. (Code Block 4E) In this code block, we can visualize the augmented images from the randomly selected image. Note that these are not all the possible augmentations, and you may run it again to generate more. Adjust the values in Code Block 4A as you see fit. When creating augmented images, make sure the changes you apply make sense for your task. Do not make the images too different or too hard to understand. Try to create variety, but avoid repeating the same small changes over and over. Remember to rerun all the blocks in this section (4A-4E) whenever you make any changes.

    1. Note: The number of augmented images depends on how many times you run the code. The images will be randomly generated every time. We currently set the code to produce 10 augmented images for each image.
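
To get a feel for what the augmentation values control, the sketch below samples random rotation angles and shifts the way the Keras documentation describes the rotation and translation factors: a factor of 0.1 in RandomRotation() allows rotations of up to 10% of a full turn in either direction, and a factor of 0.2 in RandomTranslation() allows shifts of up to 20% of the image size. This is a plain-Python sketch of the sampling idea only, not the notebook's code. The factor values and the 224-pixel image size are assumptions chosen for illustration.

```python
import random


def sample_rotation_degrees(factor, rng):
    """Pick a random angle within +/- `factor` of a full 360-degree turn."""
    return rng.uniform(-factor * 360, factor * 360)


def sample_shift_pixels(factor, size_px, rng):
    """Pick a random shift within +/- `factor` of the image height or width."""
    return rng.uniform(-factor * size_px, factor * size_px)


ROTATION_FACTOR = 0.1      # assumed example value
TRANSLATION_FACTOR = 0.2   # assumed example value
IMAGE_SIZE_PX = 224        # hypothetical image size
NUM_VARIANTS = 10          # the project generates 10 augmented images per image

rng = random.Random(0)
variants = []
for _ in range(NUM_VARIANTS):
    variants.append(
        {
            "rotation_deg": sample_rotation_degrees(ROTATION_FACTOR, rng),
            "shift_x_px": sample_shift_pixels(TRANSLATION_FACTOR, IMAGE_SIZE_PX, rng),
            "shift_y_px": sample_shift_pixels(TRANSLATION_FACTOR, IMAGE_SIZE_PX, rng),
        }
    )

for i, v in enumerate(variants, start=1):
    print(
        f"variant {i}: rotate {v['rotation_deg']:+.1f} deg, "
        f"shift x {v['shift_x_px']:+.1f} px, shift y {v['shift_y_px']:+.1f} px"
    )
```

Larger factors widen the range that each random value is drawn from, which creates more variety but can also produce images that no longer look like realistic tick photos.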

5. Augment all the train data

  1. (Code Block 5A) This code block will perform image augmentation on all images in the training dataset and save the augmented images. Run this code block.
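
As a rough worked example of how much data augmentation adds, the arithmetic below assumes the minimum of 50 images per species, a 70% training split, and 10 augmented images per training image. Whether the original training images are kept alongside the augmented copies is not stated here, so the sketch reports both totals.

```python
IMAGES_PER_SPECIES = 50    # minimum suggested in the data-gathering step
NUM_SPECIES = 3
TRAIN_PERCENT = 70         # approximate share of images in the training set
AUGMENTED_PER_IMAGE = 10   # augmented images produced for each training image

total_images = IMAGES_PER_SPECIES * NUM_SPECIES
train_images = total_images * TRAIN_PERCENT // 100
augmented_images = train_images * AUGMENTED_PER_IMAGE

print(f"Downloaded images: {total_images}")
print(f"Training images before augmentation: {train_images}")
print(f"Augmented images created: {augmented_images}")
print(f"If originals are also kept: {train_images + augmented_images}")
```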

6. Train the Model

  1. (Code Block 6A) This code block defines some helper functions that the model will be using. Run this code block. This may take some time to run. 

  2. (Code Block 6B) This code block defines the model and trains it on the training data set prepared earlier. It will also adjust hyperparameters based on the validation data set and save the trained model. Run this code block. This may take some time to run. Feel free to grab a snack during this time!

7. Test the Model

  1. (Code Block 7A) This code block will reload the saved model. Run this code block. 

  2. (Code Block 7B) This code block will prepare the data generator on the test data to test our model. Run this code block. 

  3. (Code Block 7C) This code block will evaluate and print out the accuracy of our model.

    1. Accuracy represents the percentage of correct predictions made by the model on the test data. In classification tasks, it's the ratio of correctly predicted labels to the total number of predictions. Remember that if your accuracy is 0.20, that means your model correctly identified 20% of ticks from the test data. 

    2. Loss is a measure of how well the model's predictions match the actual labels for the test data. It quantifies the error between the predicted output and the true output.
      1. Lower loss means the model’s predictions are close to the actual labels.
      2. Higher loss means the predictions are further away from the actual labels.
  4. (Code Block 7D) This code block will display the classification matrix for this model. Compare the accuracy for each tick species. Is it higher than the general public’s identification accuracy: 13.2% for A. americanum, 19.3% for D. variabilis, and 22.9% for I. scapularis?

    1. Accuracy is a measure used to evaluate the performance of a machine learning model. It represents the proportion of correctly predicted outcomes or labels compared to the total number of instances in the dataset. In simpler terms, accuracy tells you how often the model's predictions are correct.

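      Mathematically, accuracy is calculated as:

      \[ \text{Accuracy} = \frac{\text{Number of correct predictions}}{\text{Total number of predictions}} \]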

    2. Macro Average is a measure used to evaluate the performance of a machine learning model, particularly in multi-class classification problems. It computes the metric (e.g., precision, recall, or F1-score) separately for each class and then calculates the unweighted mean across all classes.

      In simpler terms, macro average treats all classes equally, regardless of their representation in the dataset, providing an overall sense of how the model performs across each class individually.

      Mathematically, macro average is calculated as:

      \[ \text{Macro Average} = \frac{1}{N} \sum_{i=1}^{N} M_i \]

      Where \(N\) is the number of classes and \(M_i\) is the value of the metric for class \(i\).

    3. Precision is another measurement used to evaluate the performance of a machine learning model, and it focuses on the accuracy of positive predictions by the model. It answers the question: Of the instances the model predicted as positive, how many are actually positive?

      Mathematically, precision is calculated as:

      \[ \text{Precision} = \frac{TP}{TP + FP} \]

      Where \(TP\) is the number of true positives (instances correctly predicted as positive) and \(FP\) is the number of false positives (instances incorrectly predicted as positive).

    4. Recall is yet another measurement used to evaluate the performance of a machine learning model. Recall measures the ability of the model to correctly identify all positive instances. It answers the question: Of all the actual positive instances, how many did the model predict correctly?

      Mathematically, recall is calculated as:

      \[ \text{Recall} = \frac{TP}{TP + FN} \]

      Where \(FN\) is the number of false negatives (positive instances the model missed).

    5. F1 Score is the harmonic mean of precision and recall used to evaluate the performance of a machine learning model. It provides a balance between precision and recall, especially when the dataset is imbalanced. The F1 score answers the question: How well does the model perform, considering both the proportion of correct positive predictions (precision) and the ability to identify all positive instances (recall)?

      Mathematically, the F1 score is calculated as:

      \[ F_1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \]
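
The measurements described above can be worked through by hand on a small example. The sketch below computes accuracy, per-class precision, recall, and F1 score, their macro averages, and a loss value in plain Python. It is not the notebook's code. The ten true and predicted labels are made up for illustration, and the loss shown is categorical cross-entropy, a common choice for multi-class classification that is assumed here because the notebook's loss function is not named.

```python
import math

CLASSES = ["A. americanum", "D. variabilis", "I. scapularis"]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def precision_recall_f1(y_true, y_pred, positive):
    """Treat `positive` as the positive class and every other class as negative."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1


def macro_average(values):
    """Unweighted mean of a metric across classes."""
    return sum(values) / len(values)


def cross_entropy(probabilities, true_index):
    """Loss for one image: minus the log of the probability given to the true class."""
    return -math.log(probabilities[true_index])


# Made-up labels for ten test images (A, D, I are shorthand for the three species).
A, D, I = CLASSES
y_true = [A, A, A, D, D, D, I, I, I, I]
y_pred = [A, A, D, D, D, I, I, I, I, A]

overall_accuracy = accuracy(y_true, y_pred)
per_class = {c: precision_recall_f1(y_true, y_pred, c) for c in CLASSES}
macro_precision = macro_average([m[0] for m in per_class.values()])
macro_recall = macro_average([m[1] for m in per_class.values()])
macro_f1 = macro_average([m[2] for m in per_class.values()])

print(f"accuracy: {overall_accuracy:.2f}")
for c, (p, r, f) in per_class.items():
    print(f"{c}: precision {p:.2f}, recall {r:.2f}, F1 {f:.2f}")
print(f"macro avg: precision {macro_precision:.2f}, recall {macro_recall:.2f}, F1 {macro_f1:.2f}")

# Made-up predicted probabilities for one I. scapularis image (true class index 2).
confident_loss = cross_entropy([0.1, 0.2, 0.7], true_index=2)
wrong_loss = cross_entropy([0.7, 0.2, 0.1], true_index=2)
print(f"loss when confident and right: {confident_loss:.3f}")
print(f"loss when confident and wrong: {wrong_loss:.3f}")
```

Notice that the loss is small when the model gives the true species a high probability and large when it gives the true species a low probability, which matches the "lower loss is better" rule described above.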


     

8. Refine Your Model

Now, it is time to experiment and improve the model by creating different “prototypes.” By adjusting various settings, like the data you use and the augmentation parameters, you can find what works best to boost accuracy. 

  1. Consider downloading additional data to enhance model performance. Aim for at least 100 images for each tick species. Remember to delete the train, validation, and test data so that the new dataset can be split randomly from scratch. To do this, select the folders called 'train_dataset,' 'val_dataset,' and 'test_dataset' and delete all of them from Google Drive. After adding the new data to the appropriate subfolders within the 'data' folder, make sure to run all of the code again by clicking on 'Runtime' and then 'Run all.'
  2. Experiment with adjusting the augmentation parameters. Observe whether these changes improve accuracy. Remember that we adjust the augmentation parameters in Code Block 4A. Before running the code again, remember to delete the train, validation, and test data so that the augmented data is generated again with the new parameters. To do this, select the folders called 'train_dataset,' 'val_dataset,' and 'test_dataset' and delete all of them from Google Drive. After you finish adjusting the parameters in Code Block 4A, make sure to run all of the code again by clicking on 'Runtime' and then 'Run all.'
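
If you prefer to delete the three generated folders from code instead of through the Google Drive interface, a helper like the one below is one way to do it. This is a minimal sketch and not part of the provided notebook. The path in `PROJECT_DIR`, and the assumption that the three folders sit directly inside the project folder, are hypothetical; check where the folders actually appear in your Drive before running it, because deleted folders are removed with everything inside them.

```python
import shutil
from pathlib import Path

# Folder names taken from the instructions above.
SPLIT_FOLDERS = ["train_dataset", "val_dataset", "test_dataset"]


def remove_split_folders(project_dir):
    """Delete the generated split folders and return the names that were removed."""
    removed = []
    for name in SPLIT_FOLDERS:
        folder = Path(project_dir) / name
        if folder.is_dir():
            shutil.rmtree(folder)  # removes the folder and everything inside it
            removed.append(name)
    return removed


# Hypothetical location of the project folder once Google Drive is mounted in Colab.
PROJECT_DIR = "/content/drive/MyDrive/tick_identification"
if Path(PROJECT_DIR).is_dir():
    print("Removed:", remove_split_folders(PROJECT_DIR))
```

The 'data' folder with your downloaded images is left untouched, so the notebook can rebuild the three sets from it on the next run.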

Global Goals

The United Nations Sustainable Development Goals (UNSDGs) are a blueprint to achieve a better and more sustainable future for all.

This project explores topics key to Good Health and Well-Being: Ensure healthy lives and promote well-being for all at all ages.
This project explores topics key to Life on Land: Sustainably manage forests, combat desertification, halt and reverse land degradation, halt biodiversity loss.

Variations

  • Now, challenge yourself by adding images of ticks at every life stage! Can your model still accurately identify each one?
  • For advanced learners, try building your own convolutional neural network (CNN) using TensorFlow's Sequential class. See how well your custom model performs!
  • Do some more research and see if this CNN can perform well on different animals as well! Try seeing if this same CNN can classify different insects, or maybe even completely different kinds of animals!

Careers

If you like this project, you might enjoy exploring these related careers:

Career Profile
Many aspects of people's daily lives can be summarized using data, from what is the most popular new video game to where people like to go for a summer vacation. Data scientists (sometimes called data analysts) are experts at organizing and analyzing large sets of data (often called "big data"). By doing this, data scientists make conclusions that help other people or companies. For example, data scientists could help a video game company make a more profitable video game based on players'…
Career Profile
Ever wondered what wild animals do all day, where a certain species lives, or how to make sure a species doesn't go extinct? Zoologists and wildlife biologists tackle all these questions. They study the behaviors and habitats of wild animals, while also working to maintain healthy populations, both in the wild and in captivity.
Career Profile
Park rangers are the law enforcement officials of our state and national parks. They protect and preserve parklands, keeping park resources safe from people who might try to damage them, deliberately or through neglect, and keeping people safe from dangers within the park. To achieve this goal, park rangers work in a wide variety of positions, including education and interpretation for park visitors, emergency dispatch, firefighting, maintenance, law enforcement, search and rescue, and…

Cite This Page

General citation information is provided here. Be sure to check the formatting, including capitalization, for the method you are using and update your citation, as needed.

MLA Style

Ngo, Tracey. "Use Machine Learning to Identify Lyme Disease-Transmitting Ticks." Science Buddies, 18 Oct. 2024, https://www.sciencebuddies.org/science-fair-projects/project-ideas/ArtificialIntelligence_p021/artificial-intelligence/tick-identification?from=Blog. Accessed 6 June 2026.

APA Style

Ngo, T. (2024, October 18). Use Machine Learning to Identify Lyme Disease-Transmitting Ticks. Retrieved from https://www.sciencebuddies.org/science-fair-projects/project-ideas/ArtificialIntelligence_p021/artificial-intelligence/tick-identification?from=Blog


Last edit date: 2024-10-18