Train a Reinforcement Learning AI to Play Minesweeper
Abstract
Have you ever played Minesweeper? Did you start clicking around randomly until you hit a mine? What did you think the numbers meant? Do you think a computer program can play Minesweeper better than a human? In this project, you will explore how to train an AI agent to play Minesweeper.
Summary
None
Readily available
No issues
Objective
To experiment with changing model hyperparameters in training an AI to play Minesweeper.
Introduction
Minesweeper is a classic puzzle game where your goal is to uncover every safe square on a grid without clicking on a mine. Today, many people already know how it works, but when Minesweeper first became popular, new players often clicked randomly because they were unfamiliar with the rules. Over time, people learned the strategy: the numbers on the board aren’t random; they’re clues. Each number tells you how many mines are touching that square, counting all of its neighbors, including the diagonal ones. You use logic to decide which squares are safe and which ones might be dangerous. If you click on a single mine, the game ends.
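To make the idea of a clue number concrete, here is a minimal sketch of how such a number could be computed. The function name `count_adjacent_mines` and the tiny 3x3 board are made up for illustration; they are not taken from the project's code.

```python
def count_adjacent_mines(mines, row, col):
    """Return how many of the up-to-eight neighbors of (row, col) hold a mine."""
    height, width = len(mines), len(mines[0])
    count = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue  # skip the square itself
            r, c = row + dr, col + dc
            # Only count neighbors that are actually on the board.
            if 0 <= r < height and 0 <= c < width and mines[r][c]:
                count += 1
    return count


# Hypothetical 3x3 board: True marks a mine.
board = [
    [True,  False, False],
    [False, False, False],
    [False, False, True],
]
print(count_adjacent_mines(board, 1, 1))  # the center square touches both mines
print(count_adjacent_mines(board, 0, 2))  # this corner touches no mines
```

Squares on an edge or in a corner have fewer than eight neighbors, which is why the sketch checks that each neighbor is on the board before counting it.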

We’ll explore how Artificial Intelligence (AI) can learn to play Minesweeper. AI refers to computer systems designed to perform tasks that normally require human intelligence, such as pattern recognition, learning, and decision-making. Unlike games where you can always see where all the pieces are (like checkers or chess), Minesweeper is partially observable. You start with a blank grid and do not know which squares contain mines. You (or the AI) only discover information as you click and reveal tiles. Instead of solving the game with complete information, the AI must learn how to make informed decisions using limited clues, just like humans do.
To do this, we’ll use a machine learning approach called reinforcement learning. Reinforcement learning is a way for an AI to learn by trying actions and learning from the results, similar to how someone might learn a new game through practice. The AI (called the agent) interacts with an environment (the Minesweeper board). Each time it makes a move, it receives feedback. Over many attempts, the agent learns which choices tend to lead to better feedback, so it avoids mines longer and wins more often.
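The agent-and-environment loop described above can be sketched in a few lines. Everything here is an assumption made for illustration: `ToyMinesweeper` is a made-up one-row stand-in (not the project's environment), the reward values of +1 and -1 are invented, and the "agent" simply picks random squares without learning anything yet.

```python
import random


class ToyMinesweeper:
    """Hypothetical stand-in environment: a single row of squares with hidden mines."""

    def __init__(self, size=5, num_mines=1, seed=0):
        self.size = size
        self.num_mines = num_mines
        self.rng = random.Random(seed)

    def reset(self):
        """Start a new episode and return what the agent can see (nothing yet)."""
        self.mines = set(self.rng.sample(range(self.size), self.num_mines))
        self.revealed = set()
        return tuple(sorted(self.revealed))

    def step(self, action):
        """Reveal one square; return (observation, reward, done)."""
        if action in self.mines:
            return tuple(sorted(self.revealed)), -1.0, True  # hit a mine: episode ends
        self.revealed.add(action)
        won = len(self.revealed) == self.size - self.num_mines
        return tuple(sorted(self.revealed)), 1.0, won  # safe square revealed


env = ToyMinesweeper()
agent_rng = random.Random(1)
for episode in range(3):
    observation = env.reset()
    total_reward, done = 0.0, False
    while not done:
        hidden = [i for i in range(env.size) if i not in observation]
        action = agent_rng.choice(hidden)  # a random agent: no learning yet
        observation, reward, done = env.step(action)
        total_reward += reward
    print(f"episode {episode}: total reward {total_reward}")
```

A learning agent would replace the random choice with a decision based on the feedback it has collected so far.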
Watch this video to learn more about reinforcement learning:
To train our agent, we’ll use a specific reinforcement learning method called a Deep Q-Network (DQN). A DQN learns to estimate how good each action is in each situation. Instead of memorizing every single board configuration, the DQN uses a model to generalize from experience.
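For readers who want the math behind "how good each action is," the standard DQN training target can be written as follows. The symbols are assumptions introduced here, not names from the project: s is the current board state, a the chosen action, r the reward received, s' the next state, γ a discount factor between 0 and 1, θ the network's weights, and θ⁻ the weights of a separate, slowly updated target network used in the original DQN formulation (the notebook may or may not use one).

```latex
y = r + \gamma \max_{a'} Q(s', a'; \theta^{-}), \qquad
L(\theta) = \bigl( y - Q(s, a; \theta) \bigr)^{2}
```

When the episode ends on this move (a win or a mine), there is no next state, so the target reduces to y = r.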
Watch this video to learn more about DQNs:
The “deep” in Deep Q-Network refers to the fact that the agent utilizes a neural network. A neural network is a type of machine learning model inspired by how the brain processes information. You can think of a neural network as a pattern-finder: it learns to recognize board situations and predict which actions are likely to be safe or useful.
Watch this video to learn more about neural networks:
In this project, your task is to help the agent learn better by adjusting hyperparameters and training time. Hyperparameters are settings you choose before training starts; they control how learning happens. For example, hyperparameters can affect how fast the AI updates its knowledge, how much it explores random moves, and how strongly it relies on past experiences. You’ll also change the number of episodes, which are full games (or attempts) the agent plays during training. One episode ends when the agent wins or hits a mine. In general, more episodes provide the agent with more practice; however, too many or too few episodes, or suboptimal hyperparameter choices, can affect how well the AI learns.
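To see why a setting like "how fast the AI updates its knowledge" matters, here is a toy sketch, assuming plain gradient descent on the simple function f(x) = x². It is not the project's neural network; it only illustrates how the size of each update step changes the outcome.

```python
def gradient_descent(learning_rate, steps=20, start=1.0):
    """Minimize f(x) = x**2 and return the loss after each step."""
    x = start
    losses = []
    for _ in range(steps):
        gradient = 2 * x  # derivative of x**2
        x = x - learning_rate * gradient  # step against the gradient
        losses.append(x ** 2)
    return losses


for lr in (0.01, 0.1, 1.1):
    final_loss = gradient_descent(lr)[-1]
    print(f"learning_rate={lr}: final loss {final_loss:.4g}")
```

In this toy problem, the smallest step size improves slowly, the middle one gets close to the minimum, and the largest one overshoots on every step, so its loss grows instead of shrinking.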
Terms and Concepts
- Artificial Intelligence (AI)
- Partially observable
- Reinforcement learning
- Agent
- Deep Q-Network (DQN)
- Neural network
- Hyperparameter
- Episode
Questions
- What is the goal of Minesweeper?
- Why can’t the player see the whole board at the start of the game?
- What are hyperparameters, and how are they different from something the AI learns on its own?
Bibliography
GitHub pages:
- Science Buddies. (n.d.). reinforcement_learning_minesweeper. GitHub. Retrieved on January 20, 2026.
- markov-labs. (n.d.). RL-Minesweeper. GitHub. Retrieved on January 20, 2026.
Play Minesweeper:
- 247 Minesweeper. (n.d.). 247 Minesweeper. Retrieved on January 20, 2026.
To learn more about reinforcement learning:
- CodeEmporium. (Nov 28, 2023). Deep Q-Networks Explained! YouTube. Retrieved on January 20, 2026.
- CrashCourse. (Oct 11, 2019). Reinforcement Learning: Crash Course AI #9. YouTube. Retrieved on January 20, 2026.
- Dr. et al. (Sep 2, 2024). Reinforcement Learning: Agent Interaction, Rewards, and Balancing Exploration vs. Exploitation. YouTube. Retrieved on January 20, 2026.
To learn more about neural networks:
- Science Buddies. (May 7, 2024). Simple Explanation of Neural Networks. YouTube. Retrieved on January 20, 2026.
To learn more about hyperparameters:
- deeplizard. (Nov 22, 2017). Learning Rate in a Neural Network explained. YouTube. Retrieved on January 20, 2026.
Materials and Equipment
- Computer with Internet access
Experimental Procedure

Setting Up the Google Colab Environment
- You will need a Google account. If you do not have one, make one when prompted.
- Download the rl_minesweeper.ipynb file from Science Buddies. This is the code you will need to process your data.
- Within your Google Drive, click on ‘MyDrive,’ select the ‘+ New’ button and ‘File upload.’ Select the rl_minesweeper.ipynb file you just downloaded.
- Double-click on the rl_minesweeper.ipynb file. This should automatically open in Google Colab.
- Read the Troubleshooting Tips and How to Use This Notebook sections. Follow the instructions you find in those sections.
- Run the block under Importing Libraries to ensure you have access to all the functions we will use for this project.
  - After running all the blocks in this section, return to your Google Drive, and you should see a new folder called rl_minesweeper. This folder contains all of the files necessary to run our Minesweeper environment. You can read more about the code on the project's GitHub page (see the Bibliography).
Editing the Hyperparameters
In this section, you will improve your Minesweeper AI the same way engineers improve a design: pick a goal (like “win more games”), set limits (how long training takes), then change one hyperparameter at a time, train, and test to see what works best. You may have to make a tradeoff: training longer can improve the win rate, but it can also take a lot more time. This cycle of “try, test, improve” matches the steps of the Engineering Design Process.
- In this code block, the #TODO comment marks where you can edit the model's hyperparameters. The list is long, but these are the ones to focus on:
  - Within board, there are height, width, and num_mines.
    - Here, you can adjust the height and width of the Minesweeper board, as well as the number of mines on the board. It is recommended that you start with a small board (4x4, 5x5, 6x6) and a low number of mines (1-3).
  - epsilon_decay controls how fast epsilon decreases.
    - ε (epsilon) is the chance of taking a random action; it determines the balance between exploration (trying new moves) and exploitation (using the best known move). If ε decays too fast, the agent may stop exploring too early; if it decays too slowly, learning can be inefficient. Watch this video to learn more about the exploration/exploitation trade-off.
    - Our agent starts with an ε of 1, meaning it is 100% exploring at the beginning and choosing random actions instead of following what it currently thinks is best. As ε decreases over time (epsilon_decay), the agent explores less and exploits more:
      - With ε = 0.5, it acts randomly about 50% of the time.
      - With ε = 0.1, it acts randomly about 10% of the time.
      - With ε = 0.05, it acts randomly about 5% of the time. Since 0.05 is the minimum ε, once the agent reaches this value, it continues exploring at a steady 5% rate for the rest of training. If you want, you can change this minimum; the lowest you can set it to is 0, which would mean the agent eventually stops taking random actions completely and always uses what it thinks is best.
  - learning_rate_a is how big of a “step” the model takes when it tries to fix a mistake. Watch this video to learn more about the learning rate.
    - A big learning rate means taking big steps, so the model can learn faster but might “overshoot” or overcorrect and get worse or act unstable.
    - A small learning rate means taking small steps, which is steadier and more cautious, but it learns more slowly.
  - For all other hyperparameters, we recommend only changing them if you are more familiar with machine learning.
- Run this code block after you finish making your changes.
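The epsilon behavior described in this section can be sketched as follows. This is a minimal sketch under stated assumptions: it assumes epsilon is multiplied by epsilon_decay once per update and floored at the minimum, which is a common scheme but may not match the notebook exactly. The function names and the example Q-values are hypothetical.

```python
import math
import random


def decayed_epsilon(updates, decay, start=1.0, minimum=0.05):
    """Epsilon after a number of multiplicative decay updates, floored at the minimum."""
    return max(minimum, start * decay ** updates)


def updates_until_minimum(decay, start=1.0, minimum=0.05):
    """Smallest number of updates after which epsilon has reached the minimum."""
    return math.ceil(math.log(minimum / start) / math.log(decay))


def choose_action(q_values, epsilon, rng=random):
    """Epsilon-greedy: explore with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore: pick a random square
    return max(range(len(q_values)), key=lambda i: q_values[i])  # exploit: best-known square


for decay in (0.999, 0.9999, 0.99999):
    print(decay, updates_until_minimum(decay))
```

Under these assumptions, a decay of 0.99999 takes a little under 300,000 updates to bring epsilon from 1 down to 0.05, so the decay value and the number of training episodes are worth choosing together.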
Training the Agent
- This code block will train the agent for a number of episodes. By default, it is set to 100, but to see meaningful results, it is recommended to change it to between 300,000 and 1,000,000.
- If you have a smaller board (6x6 and smaller), you should try starting with 300,000 and see how your agent performs in the later sections.
- If you have a larger board (larger than 6x6), try starting with 500,000 and increasing up to 1,000,000.
- Run this code block once you have changed the number of episodes.
- IMPORTANT: Running this code block again continues training your existing model, even if your runtime has disconnected in the meantime. For example, if you finished training for 300,000 episodes, running this code block again will train the agent for another 300,000 episodes, for a total of 600,000. If you want to start a new model from scratch, it is best to rename your folder from rl_minesweeper to something like rl_minesweeper_v1, then download a new notebook and start from the section "Setting Up the Google Colab Environment" again.
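The "continue where you left off" behavior can be pictured with a small sketch. This is only an illustration of the general save-and-resume pattern: the `train` function, the checkpoint.json file name, and the JSON format are all hypothetical, and the notebook's real saving mechanism may look quite different.

```python
import json
import os
import tempfile


def train(checkpoint_path, episodes):
    """Resume from the checkpoint if one exists, train, then save progress."""
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            state = json.load(f)  # continue the existing agent
    else:
        state = {"episodes_trained": 0}  # no checkpoint: start from scratch
    state["episodes_trained"] += episodes  # placeholder for the real training loop
    with open(checkpoint_path, "w") as f:
        json.dump(state, f)
    return state["episodes_trained"]


folder = tempfile.mkdtemp()  # stands in for the rl_minesweeper folder
path = os.path.join(folder, "checkpoint.json")
first_total = train(path, 300_000)
second_total = train(path, 300_000)  # running again adds to the saved total
print(first_total, second_total)
```

In this sketch, pointing `train` at a fresh folder starts the count from zero again, which mirrors why renaming the old folder gives you a brand-new agent.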
Visualizing the Data
- Run this code block to display training data. You will see four graphs displayed below.
- The “Episode Length per Episode” graph shows the number of steps the agent can take on the board before the game ends, either when the agent finishes the game or when it hits a mine. Ideally, the agent would be able to take more steps per episode over time.
- The “Epsilon Decay Over Time” graph displays the epsilon value over the course of training. Here, you can see how quickly epsilon decays and decide whether you want your epsilon to decay more slowly or faster.
- The “Average Loss per Episode” graph shows how much the agent’s predictions differ from the training target while it learns. In general, you want the loss to decrease over time, which suggests the agent is learning. Some ups and downs are normal, but if the loss remains very high or is highly unstable, you may need to adjust hyperparameters, such as the learning rate.
- The “Total Reward per Episode” graph shows how many reward points the agent earned in each episode. This is one of the best overall signs of improvement: if the agent is learning, the total reward should usually increase over time (though it may be noisy). If the reward stays flat or gets worse, it may mean the agent isn’t learning effectively–or that the reward system needs to be adjusted.
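Because per-episode curves like these can be noisy, it sometimes helps to smooth them before judging the trend. Here is a minimal sketch of a moving average; the function name and the reward values are made up for illustration and are not part of the notebook.

```python
def moving_average(values, window):
    """Average each run of `window` consecutive values to smooth a noisy curve."""
    if window <= 0 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    running = sum(values[:window])
    averages = [running / window]
    for i in range(window, len(values)):
        running += values[i] - values[i - window]  # slide the window by one episode
        averages.append(running / window)
    return averages


rewards = [-1, 2, -1, 3, 0, 4, 1, 5]  # made-up per-episode rewards
print(moving_average(rewards, window=4))
```

A larger window gives a smoother line but reacts more slowly to real changes, so it is worth trying a few window sizes.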
Evaluating the Data
- Run this code block to have your agent run 100 different Minesweeper games and calculate its win rate. How well did your agent perform?
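When you compare win rates between versions, keep in mind that 100 games is a fairly small sample. The sketch below, which assumes a simple normal approximation, estimates how much a win rate could vary by chance; the function name and the example result of 60 wins are hypothetical.

```python
import math


def win_rate_summary(wins, games):
    """Return the win rate (%) and an approximate 95% margin of error (%)."""
    rate = wins / games
    standard_error = math.sqrt(rate * (1 - rate) / games)
    return 100 * rate, 100 * 1.96 * standard_error


rate, margin = win_rate_summary(wins=60, games=100)  # made-up result
print(f"win rate: {rate:.0f}% +/- {margin:.1f}%")
```

If two versions of your agent differ by less than this margin, the difference may be due to luck rather than to your hyperparameter change.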
Changing the Agent
- Iteration is a required part of the Engineering Design Process, so you should keep testing and improving your agent. To do this, make a data table to track each version of your model and decide whether your changes are worth it based on your criteria and constraints (for example: higher win rate vs. longer training time). After each run, record your settings and results, then choose your next change and try again. To improve your agent, you have two main options:
- Keep training the same agent: Go back to Training the Agent and rerun that section (and the sections after it) to train for more episodes. Remember that continuing training means that the agent continues learning on top of what it has already learned.
- Start a new version of the agent (recommended for comparing changes to the hyperparameters): Rename your current folder from rl_minesweeper to rl_minesweeper_v1, then download a fresh copy of the notebook and repeat the Setting Up the Google Colab Environment steps. This lets you create a new agent with different settings while keeping your old results.
Table 1. Example data table. You can add more rows for additional changes.

| Version # | Episodes Trained For | Board Size | # of Mines | Other Hyperparameters Changed | Win Rate (%) |
|---|---|---|---|---|---|
| V1 | 300,000 | 6x6 | 3 | None | - |
| V1 | 600,000 | 6x6 | 3 | None | - |
| V2 | 500,000 | 5x5 | 3 | Changed epsilon_decay to 0.99999 | - |
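If you would rather keep your data table in a file than on paper, here is one possible sketch using Python's built-in csv module. The column names and the `append_run` helper are made up for illustration; the example writes to an in-memory text buffer so that it runs anywhere.

```python
import csv
import io

FIELDS = ["version", "episodes_trained", "board_size", "num_mines",
          "other_changes", "win_rate_percent"]


def append_run(log, **run):
    """Write one row describing a training run to an open CSV log."""
    csv.DictWriter(log, fieldnames=FIELDS).writerow(run)


log = io.StringIO()  # swap for open("runs.csv", "a", newline="") to write a real file
csv.DictWriter(log, fieldnames=FIELDS).writeheader()
append_run(log, version="V1", episodes_trained=300_000, board_size="6x6",
           num_mines=3, other_changes="None", win_rate_percent="")
print(log.getvalue())
```

Leave the win rate blank until you have run the evaluation step, then fill it in for that version.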
Variations
- Check out the GitHub page for this project and create additional models, such as Double DQN and Dueling DQN, among others.
- Adjust the reward values within the /environment/minesweeper_env.py file.
- Compare the model’s win rate to a human player.
- Right now, the model is not designed to train on a small board and “carry over” to a bigger one. Modify the project so it can reuse what it learned (transfer learning) and see if that helps it learn faster on harder boards.