Train a Reinforcement Learning AI to Play Minesweeper

Abstract

Have you ever played Minesweeper? Did you start clicking around randomly until you hit a mine? What did you think the numbers meant? Do you think a computer program can play Minesweeper better than a human? In this project, you will explore how to train an AI agent to play Minesweeper.

Summary

Areas of Science

Artificial Intelligence

Difficulty

Method

Engineering Design Process

Time Required

Short (2-5 days)

Prerequisites

None

Material Availability

Readily available

Cost

Very Low (under $20)

Safety

No issues

Credits

Tracey Ngo, Science Buddies

Science Buddies is committed to creating content authored by scientists and educators. Learn more about our process and how we use AI.

https://www.youtube.com/watch?v=EGp2LHOEWTI

Objective

To experiment with changing model hyperparameters when training an AI to play Minesweeper.

Introduction

Minesweeper is a classic puzzle game where your goal is to uncover every safe square on a grid without clicking on a mine. Today, many people already know how it works, but when Minesweeper first became popular, new players often clicked randomly because they were unfamiliar with the rules. Over time, people learned the strategy: the numbers on the board aren’t random; they’re clues. Each number tells you how many mines are touching that square (including diagonally), and you use logic to decide which squares are safe and which ones might be dangerous. Clicking on even a single mine ends the game.

Image Credit: Science Buddies

Figure 1. Example Minesweeper game state with a partially revealed board: opened tiles display the number of adjacent mines while unopened tiles remain hidden.
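The rule behind the numbers can be sketched in a few lines of Python. This is only an illustration, assuming a board stored as a grid of True/False values (True = mine); the function name `neighbor_counts` is hypothetical and is not taken from the project's code.

```python
def neighbor_counts(mines):
    """Return a grid where each cell holds how many of its (up to 8) neighbors are mines.

    `mines` is a hypothetical representation: a list of rows, True where a mine is.
    Mine cells also get a count here; a real game never displays it.
    """
    height, width = len(mines), len(mines[0])
    counts = [[0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            # Look at every neighbor, including diagonals, that lies inside the board.
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if dr == 0 and dc == 0:
                        continue  # skip the cell itself
                    nr, nc = r + dr, c + dc
                    if 0 <= nr < height and 0 <= nc < width and mines[nr][nc]:
                        counts[r][c] += 1
    return counts


# A 4x4 board with mines at row 1, column 1 and row 3, column 3.
mines = [
    [False, False, False, False],
    [False, True,  False, False],
    [False, False, False, False],
    [False, False, False, True],
]
for row in neighbor_counts(mines):
    print(row)
```

The square at row 2, column 2 touches both mines, so it gets a 2; squares far from either mine get a 0.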

We’ll explore how Artificial Intelligence (AI) can learn to play Minesweeper. AI refers to computer systems designed to perform tasks that normally require human intelligence, such as pattern recognition, learning, and decision-making. Unlike games where you can always see where all the pieces are (like checkers or chess), Minesweeper is partially observable. You start with a blank grid and do not know which squares contain mines. You (or the AI) only discover information as you click and reveal tiles. Instead of solving the game with complete information, the AI must learn to make informed decisions from limited clues, just like humans do.

To do this, we’ll use a machine learning approach called reinforcement learning. Reinforcement learning is a way for an AI to learn by trying actions and learning from the results, similar to how someone might learn a new game through practice. The AI (called the agent) interacts with an environment (the Minesweeper board). Each time it makes a move, it receives feedback in the form of a reward. Over many attempts, the agent learns which choices tend to lead to better rewards, so it avoids mines longer and wins more often.
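The agent-environment loop described above can be sketched in Python. This is a toy, assuming a hypothetical `TinyMinesweeper` class (not the project's actual environment) whose cells are simply numbered, with made-up reward values: +0.1 for a safe click, +1.0 for the winning click, and -1.0 for a mine. The agent here clicks randomly; a trained agent would use what it has learned to pick moves instead.

```python
import random


class TinyMinesweeper:
    """Hypothetical toy environment: cells are numbered 0..num_cells-1, some hide mines.
    It omits the number clues to keep the loop easy to read."""

    def __init__(self, num_cells=9, num_mines=2, seed=None):
        self.num_cells, self.num_mines = num_cells, num_mines
        self.rng = random.Random(seed)

    def reset(self):
        """Start a new episode with freshly placed mines; return what the agent can see."""
        self.mines = set(self.rng.sample(range(self.num_cells), self.num_mines))
        self.revealed = set()
        return frozenset(self.revealed)

    def step(self, action):
        """Reveal one cell and return (observation, reward, done)."""
        if action in self.mines:
            return frozenset(self.revealed), -1.0, True  # hit a mine: episode over
        self.revealed.add(action)
        won = len(self.revealed) == self.num_cells - self.num_mines
        reward = 1.0 if won else 0.1  # assumed reward values, for illustration only
        return frozenset(self.revealed), reward, won


def run_episode(env, rng):
    """Play one full episode with a random agent and return its total reward."""
    observation = env.reset()
    total_reward, done = 0.0, False
    while not done:
        hidden = [c for c in range(env.num_cells) if c not in observation]
        action = rng.choice(hidden)  # random choice; a trained agent would choose smarter
        observation, reward, done = env.step(action)
        total_reward += reward
    return total_reward


env = TinyMinesweeper(num_cells=9, num_mines=2, seed=0)
agent_rng = random.Random(1)
for episode in range(3):
    print(f"episode {episode}: total reward = {run_episode(env, agent_rng):.1f}")
```

Each pass through the `while` loop is one move, and each call to `run_episode` is one episode that ends with a win or a mine.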

Watch this video to learn more about reinforcement learning:

https://www.youtube.com/watch?v=nIgIv4IfJ6s

To train our agent, we’ll use a specific reinforcement learning method called a Deep Q-Network (DQN). A DQN learns to estimate how good each action is in each situation. Instead of memorizing every single board configuration, the DQN uses a model to generalize from experience.
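One way to see what "estimate how good each action is" means in practice is the training target a DQN learns from. The sketch below shows the standard form: the target for a move is the reward it earned plus a discounted estimate of the best move on the next board, and the network is trained to shrink the squared gap between its estimate and that target. The discount factor `gamma` and all the numbers are assumptions for illustration; this document does not say what the notebook names this setting or what value it uses.

```python
def dqn_target(reward, next_q_values, done, gamma=0.99):
    """Target for one move: the reward now plus the discounted best Q-value of the next board."""
    if done:  # the game ended (win or mine), so there is no future to add
        return reward
    return reward + gamma * max(next_q_values)


def squared_error(predicted_q, target):
    """How far the network's current estimate is from the target (smaller is better)."""
    return (predicted_q - target) ** 2


# Hypothetical numbers: a safe click earned 0.1, and the network's estimates for the
# three moves available on the next board are 0.5, -1.0, and 0.8.
target = dqn_target(reward=0.1, next_q_values=[0.5, -1.0, 0.8], done=False, gamma=0.9)
print(round(target, 2))                      # 0.82
print(round(squared_error(0.6, target), 4))  # 0.0484
print(dqn_target(reward=-1.0, next_q_values=[0.5], done=True))  # -1.0
```

If the network had predicted 0.6 for that move, training would nudge its prediction toward 0.82.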

Watch this video to learn more about DQNs:

https://www.youtube.com/watch?v=x83WmvbRa2I

The “deep” in Deep Q-Network refers to the fact that the agent uses a neural network. A neural network is a type of machine learning model inspired by how the brain processes information. You can think of a neural network as a pattern-finder: it learns to recognize board situations and predict which actions are likely to be safe or useful.
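Here is a pure-Python sketch of what such a network does with a board: it turns the board into a list of numbers, passes them through a hidden layer, and outputs one score (Q-value) per square. The board encoding (-1 for a hidden tile, otherwise the revealed number), the layer sizes, and the random weights are all assumptions; real projects typically use a deep learning library such as PyTorch, and training would adjust the weights rather than leave them random.

```python
import random


def relu(x):
    """A common activation function: keep positive values, replace negatives with 0."""
    return max(0.0, x)


def dense(inputs, weights, biases):
    """Fully connected layer: each output is a weighted sum of all inputs plus a bias."""
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, biases)]


def init_layer(n_in, n_out, rng):
    """Random starting weights; training would gradually improve these."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def q_network(board, params):
    """Map a flattened board to one Q-value per square."""
    hidden = [relu(h) for h in dense(board, *params["hidden"])]
    return dense(hidden, *params["out"])


rng = random.Random(0)
n_cells = 16  # a 4x4 board, flattened into one list
params = {"hidden": init_layer(n_cells, 32, rng), "out": init_layer(32, n_cells, rng)}

board = [-1.0] * n_cells  # hypothetical encoding: -1 means "still hidden"
board[0] = 1.0            # the top-left tile has been revealed and shows a 1
q_values = q_network(board, params)

# Only hidden tiles can be clicked, so pick the hidden tile with the highest Q-value.
best = max((i for i in range(n_cells) if board[i] == -1.0), key=lambda i: q_values[i])
print(len(q_values), best)
```

Because the same weights are applied to every board, the network can give sensible scores to boards it has never seen, which is what lets a DQN generalize instead of memorizing.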

Watch this video to learn more about neural networks:

https://www.youtube.com/watch?v=2wrcv28ODNg

In this project, your task is to help the agent learn better by adjusting hyperparameters and training time. Hyperparameters are settings you choose before training starts–they control how learning happens. For example, hyperparameters can affect how fast the AI updates its knowledge, how much it explores random moves, and how strongly it relies on past experiences. You’ll also change the number of episodes, which are full games (or attempts) the agent plays during training. One episode ends when the agent wins or hits a mine. In general, more episodes provide the agent with more practice; however, too many or too few episodes, or suboptimal hyperparameter choices, can affect how well the AI learns.

Terms and Concepts

Artificial Intelligence (AI)
Partially observable
Reinforcement learning
Agent
Deep Q-Network (DQN)
Neural network
Hyperparameter
Episode

Questions

What is the goal of Minesweeper?
Why can’t the player see the whole board at the start of the game?
What are hyperparameters, and how are they different from something the AI learns on its own?

Bibliography

GitHub pages:

Science Buddies. (n.d.). reinforcement_learning_minesweeper. GitHub. Retrieved on January 20, 2026.
markov-labs. (n.d.). RL-Minesweeper. GitHub. Retrieved on January 20, 2026.

Play Minesweeper:

247 Minesweeper. (n.d.). 247 Minesweeper. Retrieved on January 20, 2026.

To learn more about reinforcement learning:

CodeEmporium. (Nov 28, 2023). Deep Q-Networks Explained! YouTube. Retrieved on January 20, 2026.
CrashCourse. (Oct 11, 2019). Reinforcement Learning: Crash Course AI #9. YouTube. Retrieved on January 20, 2026.
Dr. et al. (Sep 2, 2024). Reinforcement Learning: Agent Interaction, Rewards, and Balancing Exploration vs. Exploitation. YouTube. Retrieved on January 20, 2026.

To learn more about neural networks:

Science Buddies. (May 7, 2024). Simple Explanation of Neural Networks. YouTube. Retrieved on January 20, 2026.

To learn more about hyperparameters:

deeplizard. (Nov 22, 2017). Learning Rate in a Neural Network explained. YouTube. Retrieved on January 20, 2026.

Materials and Equipment

Computer with Internet access

Experimental Procedure


This project follows the Engineering Design Process. Confirm with your teacher if this is acceptable for your project, and review the steps before you begin.

Setting Up the Google Colab Environment

1. You will need a Google account. If you do not have one, create one when prompted.
2. Download the rl_minesweeper.ipynb file from Science Buddies. This notebook contains the code you will use for this project.
3. In your Google Drive, click on ‘My Drive,’ select the ‘+ New’ button, and choose ‘File upload.’ Select the rl_minesweeper.ipynb file you just downloaded.
4. Double-click on the rl_minesweeper.ipynb file. It should automatically open in Google Colab.
5. Read the Troubleshooting Tips and How to Use This Notebook sections, and follow the instructions you find there.
6. Run the block under Importing Libraries to ensure you have access to all the functions we will use for this project.
  1. After running all the blocks in this section, return to your Google Drive. You should see a new folder called rl_minesweeper, which contains all of the files needed to run our Minesweeper environment. You can read more about the code on the project's GitHub page (listed in the Bibliography).

Editing the Hyperparameters

In this section, you will improve your Minesweeper AI the same way engineers improve a design: pick a goal (like “win more games”), set limits (how long training takes), then change one hyperparameter at a time, train, and test to see what works best. You may have to make a tradeoff: training longer can improve the win rate, but it can also take much more time. This “try → test → improve” cycle matches the steps of the Engineering Design Process.

In this code block, the #TODO comment marks where you can edit the model's hyperparameters. The list is long, but these are the ones to focus on:
1. Within board, there are height, width, and num_mines.
  1. Here, you can adjust the height and width of the Minesweeper board, as well as the number of mines on the board. We recommend starting with a small board (4x4, 5x5, or 6x6) and a low number of mines (1-3).
2. epsilon_decay controls how fast epsilon decreases.
  1. ε (epsilon) is the chance of taking a random action; it determines the balance between exploration (trying new moves) and exploitation (using the best known move). If ε decays too fast, the agent may stop exploring too early; if it decays too slowly, learning can be inefficient. Watch this video to learn more about the exploration/exploitation trade-off.
  2. Our agent starts with ε = 1, meaning it explores 100% of the time at the beginning, choosing random actions instead of following what it currently thinks is best. As ε decreases over time (at a rate set by epsilon_decay), the agent explores less and exploits more:
    1. With ε = 0.5, it acts randomly about 50% of the time.
    2. With ε = 0.1, it acts randomly about 10% of the time.
    3. With ε = 0.05, it acts randomly about 5% of the time. Since 0.05 is the minimum ε, once the agent reaches this value, it keeps exploring at a steady 5% rate for the rest of training. You can change this minimum if you want. The lowest possible value is 0, which would mean the agent eventually stops taking random actions completely and always uses what it thinks is best.
3. learning_rate_a is how big of a “step” the model takes when it tries to fix a mistake. Watch this video to learn more about the learning rate.
  1. A large learning rate means taking big steps: the model can learn faster, but it might “overshoot” (overcorrect) and get worse or become unstable.
  2. A small learning rate means taking small steps, which is steadier and more cautious, but the model learns more slowly.
4. For all other hyperparameters, we recommend changing them only if you are more familiar with machine learning.
5. Run this code block after you finish making your changes.
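The exploration rule and the epsilon schedule can be sketched as follows. This is a minimal sketch, assuming ε is multiplied by epsilon_decay once per episode and clipped at the 0.05 minimum; the notebook might instead decay ε after every move, in which case it shrinks much faster. Under the per-episode assumption, epsilon_decay = 0.99999 needs roughly 300,000 episodes to fall from 1 to 0.05, while 0.9999 needs roughly 30,000. The function names are hypothetical.

```python
import math
import random


def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon explore (random move); otherwise exploit (highest Q-value)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def epsilon_schedule(start, decay, minimum, episodes):
    """Epsilon at the start of each episode, assuming one multiplicative decay per episode."""
    eps, values = start, []
    for _ in range(episodes):
        values.append(eps)
        eps = max(minimum, eps * decay)  # never drop below the minimum
    return values


def episodes_to_reach(minimum, decay, start=1.0):
    """Solve start * decay**n = minimum for n (rounded up)."""
    return math.ceil(math.log(minimum / start) / math.log(decay))


for decay in (0.9999, 0.99999):
    print(f"epsilon_decay={decay}: about {episodes_to_reach(0.05, decay):,} episodes to reach 0.05")
```

Comparing this number with your planned number of episodes is a quick way to check whether the agent will still be mostly exploring when training ends.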

Training the Agent

This code block will train the agent for a set number of episodes. By default, this is set to 100, but to see meaningful results, we recommend changing it to somewhere between 300,000 and 1,000,000.
1. If you have a smaller board (6x6 or smaller), try starting with 300,000 episodes and see how your agent performs in the later sections.
2. If you have a larger board (larger than 6x6), try starting with 500,000 episodes and increasing up to 1,000,000.
3. Run this code block once you have changed the number of episodes.
4. IMPORTANT: Your progress is stored in the rl_minesweeper folder, so running this code block again, even after your runtime disconnects, continues training the same model instead of starting over. For example, if you finished training for 300,000 episodes, running this code block again will train the agent for another 300,000 episodes, for a total of 600,000. If you want to start a new model from scratch, rename your folder from rl_minesweeper to something like rl_minesweeper_v1, then download a new notebook and start again from the section "Setting Up the Google Colab Environment."
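Why rerunning the block keeps adding episodes can be illustrated with a checkpoint-and-resume sketch. The notebook's real saving mechanism is not shown in this document, so the `train` function, the `checkpoint.json` file name, and the 0.99999 decay used inside it are all hypothetical; the point is only the pattern: load saved progress if it exists, otherwise start fresh.

```python
import json
import os
import tempfile


def train(folder, num_episodes):
    """Hypothetical training driver: resumes from a checkpoint in `folder` if one exists,
    otherwise starts a brand-new agent."""
    path = os.path.join(folder, "checkpoint.json")
    if os.path.exists(path):
        with open(path) as f:
            state = json.load(f)  # continue where the previous run stopped
    else:
        state = {"episodes_trained": 0, "epsilon": 1.0}  # new agent from scratch
    for _ in range(num_episodes):
        # ... one full game of play and learning would happen here ...
        state["episodes_trained"] += 1
        state["epsilon"] = max(0.05, state["epsilon"] * 0.99999)  # assumed decay
    os.makedirs(folder, exist_ok=True)
    with open(path, "w") as f:
        json.dump(state, f)
    return state


with tempfile.TemporaryDirectory() as drive:
    folder = os.path.join(drive, "rl_minesweeper")
    print(train(folder, 3)["episodes_trained"])  # 3
    print(train(folder, 3)["episodes_trained"])  # 6: training continued
    os.rename(folder, os.path.join(drive, "rl_minesweeper_v1"))
    print(train(folder, 3)["episodes_trained"])  # 3: a fresh agent
```

Renaming the folder hides the old checkpoint from the training code, which is why it starts a new agent while your old results stay safe under the new name.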

Visualizing the Data

Run this code block to display the training data. Four graphs will appear below it.
1. The “Episode Length per Episode” graph shows the number of steps the agent takes in each episode before the game ends, either because the agent wins or because it hits a mine. Ideally, the agent should be able to take more steps per episode over time.
2. The “Epsilon Decay Over Time” graph displays the value of epsilon over the course of training. Here, you can see how quickly epsilon decays and decide whether you want it to decay more slowly or more quickly.
3. The “Average Loss per Episode” graph shows how much the agent’s predictions differ from the training target while it learns. In general, you want the loss to decrease over time, which suggests the agent is learning. Some ups and downs are normal, but if the loss remains very high or is highly unstable, you may need to adjust hyperparameters, such as the learning rate.
4. The “Total Reward per Episode” graph shows how many reward points the agent earned in each episode. This is one of the best overall signs of improvement: if the agent is learning, the total reward should usually increase over time (though it may be noisy). If the reward stays flat or gets worse, it may mean the agent isn’t learning effectively, or that the reward system needs to be adjusted.
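To see why an unstable loss often points to the learning rate, here is a toy example with a single weight and the loss (w - 3)², minimized by plain gradient descent. It is not the DQN's actual loss, and the learning-rate values are made up; it only shows the small-step, good-step, and overshooting behaviors described earlier.

```python
def train_one_weight(learning_rate, steps=20, target=3.0):
    """Minimize the toy loss (w - target)**2 by gradient descent; return the loss after each step."""
    w, losses = 0.0, []
    for _ in range(steps):
        gradient = 2 * (w - target)   # slope of the loss at the current w
        w -= learning_rate * gradient  # step size = learning rate x gradient
        losses.append((w - target) ** 2)
    return losses


for lr in (0.01, 0.1, 1.1):
    losses = train_one_weight(lr)
    print(f"lr={lr}: loss after 1 step={losses[0]:.3f}, after 20 steps={losses[-1]:.3f}")
```

With 0.01 the loss falls slowly, with 0.1 it falls quickly toward 0, and with 1.1 each step overshoots the answer by more than the last, so the loss grows instead of shrinking.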

Evaluating the Data

Run this code block to have your agent play 100 different Minesweeper games and calculate its win rate. How well did your agent perform?
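A win rate is simply the percentage of evaluation games won. The sketch below computes it for a hypothetical random-clicking player, which makes a useful baseline: on a 4x4 board with 2 mines, a random clicker only wins when both mines happen to be the last two squares left, a 1-in-120 chance (under 1%). The function names are assumptions, not the notebook's code.

```python
import random


def play_random_game(num_cells, num_mines, rng):
    """Hypothetical stand-in for one evaluation game: click hidden cells in random order
    until a mine is hit (loss) or every safe cell is revealed (win)."""
    mines = set(rng.sample(range(num_cells), num_mines))
    order = list(range(num_cells))
    rng.shuffle(order)
    safe_revealed = 0
    for cell in order:
        if cell in mines:
            return False
        safe_revealed += 1
        if safe_revealed == num_cells - num_mines:
            return True
    return True  # only reached for an empty board


def win_rate(play_game, num_games=100):
    """Play num_games games and return the percentage that were won."""
    wins = sum(1 for _ in range(num_games) if play_game())
    return 100.0 * wins / num_games


rng = random.Random(42)
baseline = win_rate(lambda: play_random_game(num_cells=16, num_mines=2, rng=rng))
print(f"Random-clicking baseline on a 4x4 board with 2 mines: {baseline:.0f}% wins")
```

A trained agent that wins far more often than this baseline has clearly learned something from the number clues.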

Changing the Agent

Iteration is a required part of the Engineering Design Process, so you should keep testing and improving your agent. To do this, make a data table to track each version of your model and decide whether your changes are worth it based on your criteria and constraints (for example: higher win rate vs. longer training time). After each run, record your settings and results, then choose your next change and try again. To improve your agent, you have two main options:

Keep training the same agent: Go back to Training the Agent and rerun that section (and the sections after it) to train for more episodes. Remember that continuing training means that the agent keeps learning on top of what it has already learned.

Start a new version of the agent (recommended for comparing changes to the hyperparameters): Rename your current folder from rl_minesweeper to rl_minesweeper_v1, then download a fresh copy of the notebook and repeat the Setting Up the Google Colab Environment steps. This lets you create a new agent with different settings while keeping your old results.

**Table 1.** Example data table. You can add more rows for additional changes.
Version #	Episodes Trained For	Board Size	# of Mines	Other Hyperparameters Changed	Win Rate (%)
V1	300,000	6x6	3	None	-
V1	600,000	6x6	3	None	-
V2	500,000	5x5	3	Changed epsilon_decay to 0.99999	-
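If you prefer to keep your data table as a file, a few lines using Python's built-in csv module can append one row per run. The column names below are copied from Table 1; the file name mentioned in the comment is only a suggestion, and the "-" entries are placeholders for win rates you measure yourself.

```python
import csv
import io

# Column names copied from Table 1; add more columns if you track more settings.
FIELDS = ["Version #", "Episodes Trained For", "Board Size", "# of Mines",
          "Other Hyperparameters Changed", "Win Rate (%)"]


def log_run(file_obj, row, write_header=False):
    """Append one run's settings and result as a CSV row."""
    writer = csv.DictWriter(file_obj, fieldnames=FIELDS)
    if write_header:
        writer.writeheader()
    writer.writerow(row)


# In Colab you could use a real file instead, e.g. open("experiments.csv", "a", newline="").
log = io.StringIO()
log_run(log, {"Version #": "V1", "Episodes Trained For": 300000, "Board Size": "6x6",
              "# of Mines": 3, "Other Hyperparameters Changed": "None",
              "Win Rate (%)": "-"}, write_header=True)
log_run(log, {"Version #": "V2", "Episodes Trained For": 500000, "Board Size": "5x5",
              "# of Mines": 3, "Other Hyperparameters Changed": "epsilon_decay=0.99999",
              "Win Rate (%)": "-"})
print(log.getvalue())
```

Write the header only for the first run; later runs just append rows.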

Ask an Expert

Do you have specific questions about your science project? Our team of volunteer scientists can help. Our Experts won't do the work for you, but they will make suggestions, offer guidance, and help you troubleshoot.

Variations

Check out the GitHub page for this project and create additional models, such as Double DQN and Dueling DQN, among others.
Adjust the reward values within the /environment/minesweeper_env.py file.
Compare the model’s win rate to a human player.
Right now, the model is not designed to train on a small board and “carry over” to a bigger one. Modify the project so it can reuse what it learned (transfer learning) and see if that helps it learn faster on harder boards.
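As a starting point for the Double DQN variation, the sketch below contrasts how the two methods build their training target, assuming the common setup with two networks: an "online" network being trained and a slower-changing "target" network. In standard DQN the target network both picks and scores the best next move; Double DQN lets the online network pick the move and the target network score it, which reduces the tendency to overestimate Q-values. The Q-values are made up, and the project's GitHub code may organize this differently.

```python
def dqn_target(reward, next_q_target, done, gamma=0.99):
    """Standard DQN target: the target network both picks and scores the best next move."""
    if done:
        return reward
    return reward + gamma * max(next_q_target)


def double_dqn_target(reward, next_q_online, next_q_target, done, gamma=0.99):
    """Double DQN target: the online network picks the next move, the target network scores it."""
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]


# Hypothetical Q-value estimates for three possible next moves.
online = [0.2, 0.9, 0.4]
target_net = [1.5, 0.6, 0.3]
print(round(dqn_target(0.0, target_net, False, gamma=0.9), 2))                 # 1.35
print(round(double_dqn_target(0.0, online, target_net, False, gamma=0.9), 2))  # 0.54
```

Here the target network's 1.5 looks like an overestimate, since the online network does not rate that move highly; Double DQN avoids building the target on it.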

Careers

If you like this project, you might enjoy exploring these related careers:

Data Scientist

Career Profile

Many aspects of people's daily lives can be summarized using data, from what is the most popular new video game to where people like to go for a summer vacation. Data scientists (sometimes called data analysts) are experts at organizing and analyzing large sets of data (often called "big data"). By doing this, data scientists make conclusions that help other people or companies. For example, data scientists could help a video game company make a more profitable video game based on players'…

Computer Programmer

Career Profile

Computers are essential tools in the modern world, handling everything from traffic control, car welding, movie animation, shipping, aircraft design, and social networking to book publishing, business management, music mixing, health care, agriculture, and online shopping. Computer programmers are the people who write the instructions that tell computers what to do.

Computer Software Engineer

Career Profile

Are you interested in developing cool video game software for computers? Would you like to learn how to make software run faster and more reliably on different kinds of computers and operating systems? Do you like to apply your computer science skills to solve problems? If so, then you might be interested in the career of a computer software engineer.

Cite This Page

General citation information is provided here. Be sure to check the formatting, including capitalization, for the method you are using and update your citation, as needed.

MLA Style

Ngo, Tracey. "Train a Reinforcement Learning AI to Play Minesweeper." Science Buddies, 27 Feb. 2026, https://www.sciencebuddies.org/science-fair-projects/project-ideas/ArtificialIntelligence_p034/artificial-intelligence/rl_minesweeper?from=Blog. Accessed 9 June 2026.

APA Style

Ngo, T. (2026, February 27). Train a Reinforcement Learning AI to Play Minesweeper. Retrieved from https://www.sciencebuddies.org/science-fair-projects/project-ideas/ArtificialIntelligence_p034/artificial-intelligence/rl_minesweeper?from=Blog

Last edit date: 2026-02-27
