Abstract
Have you ever wondered what causes wildfires and how resources are allocated to prevent their spread? In this science project, you will use machine learning to predict areas of wildfire susceptibility and their intensities.
Summary
None
Readily available
No issues
Objective
Use Artificial Intelligence (AI) to predict future wildfire locations and intensities.
Introduction
Did you know that a majority of wildfires in the United States are caused by humans? Although many of these wildfires are accidents, prevention could help reduce their impact on our environment. Although the overall frequency, or number of wildfires per year, has not drastically increased over time, the extent and damage caused by them have. These changes are likely due to changing temperatures, precipitation, and drier conditions mediated by climate change. These changes have even led to unpredictable shifts in wildfire season timing that have left particular regions more vulnerable to high-intensity wildfires.
Wildfires cause immense damage to the landscape, habitats, and homes in a region. Knowing when wildfires are more common is the first step in planning for wildfire prevention. Many regions across the world have warmer seasons, when plants dry out and become fuel for fires. All that's needed then is an ignition source, like lightning, sparks, or human activity, to start a wildfire. There are multiple things we can do to reduce the risk of wildfires in our communities. One of the most well-known is Smokey Bear's reminder: ‘Only you can prevent forest fires.’ What this means is that people can greatly reduce the number of wildfires by being careful with fire and making sure campfires or outdoor burns are fully extinguished. To support this, many U.S. states place restrictions on when outdoor fires are allowed, especially during dry seasons when forests and grasslands are most vulnerable to fire spread. These rules often apply to state forests, public lands, and campsites.
How we prevent and reduce the burden of forest fires has evolved with scientific discovery and better data collection by the federal government. More recent scientific findings have shown that wildfire suppression and performing controlled burns in fire-prone habitats are important measures to reduce overall wildfire intensity and frequency. However, common wildfire patterns have become more unpredictable year to year, making it more challenging to predict where to allocate resources for fire prevention, management, suppression, and control. Could Artificial Intelligence (AI) — computer systems designed to perform tasks that normally require human intelligence, such as pattern recognition, learning, and decision-making — be used to help us better predict where to focus our efforts to help control wildfires?
One way AI works is through machine learning, a branch of AI where computer algorithms learn from data to make predictions or decisions without being explicitly programmed. In the case of wildfires, we could use machine learning to analyze past wildfire data to better predict where and when wildfires are most likely to occur. These predictions could then guide how resources are allocated to prevent and suppress fires. The wildfire data you will use comes from NASA's Fire Information for Resource Management System (FIRMS), which identifies active fires using satellite imagery by detecting unusually bright pixels in thermal infrared channels that correspond to fire activity. This allows scientists to track wildfire locations and intensities worldwide in near real time.
In this science project, you will use a machine learning algorithm called the Prophet model to predict wildfires based on multiple years of collected data on wildfire locations and intensities. Then, you will compare your model's predictions to real yearly reported data to see how accurately your algorithm can identify regions that are especially susceptible to wildfires.
To learn more about the Prophet model, watch the video by Aric LaBarr listed in the Bibliography.
Terms and Concepts
- Wildfire
- Frequency
- Extent
- Damage
- Intensity
- Ignition
- Suppression
- Controlled burns
- Artificial Intelligence (AI)
- Machine learning
- Prophet
- Fire Radiative Power (FRP)
- Rolling average
- Mean Absolute Error (MAE)
- Mean Absolute Percentage Error (MAPE)
Questions
- What is the most common cause of wildfires in the United States?
- Why have the extent and damage of wildfires worsened?
- What is an ignition, and how can we prevent forest fires?
- How has wildfire prevention shifted to control measures?
- Why is predicting the location and intensity of wildfires so challenging?
- Why is the Prophet model suited for this type of predictive analysis?
Bibliography
More on wildfires:
- Environmental Protection Agency. (2025, May 9). Climate Change Indicators: Wildfires. Retrieved August 12, 2025.
We are using data from NASA:
- Fire Information for Resource Management System. (n.d.) Country Yearly Summary [.csv]. NASA. Retrieved August 12, 2025.
- Fire Information for Resource Management System. (n.d.) NASA FIRMS active fire data attributes documentation. NASA. Retrieved August 12, 2025.
More on the Prophet model:
- LaBarr, Aric. (2022, Feb 16). What is the Prophet Model. YouTube. Retrieved August 12, 2025.
- Prophet. (n.d.). Prophet Quick Start. Facebook. Retrieved August 12, 2025.
We are using this tool to find coordinates of a specific region:
- geojson.io. (n.d.). geojson.io. Retrieved August 12, 2025.
Materials and Equipment
- Computer with Internet access
Experimental Procedure

Setting Up the Google Colab Environment
- You will need a Google account. If you do not have one, make one when prompted.
- Download the wildfire_prediction.ipynb file from Science Buddies.
- Within your Google Drive, click on ‘My Drive,’ then create a new folder and rename it wildfire_prediction. Inside the folder, upload the wildfire_prediction.ipynb file.
- Double-click on the wildfire_prediction.ipynb file. This should automatically open in Google Colab.
- Read the Troubleshooting Tips and How to Use This Notebook sections. Follow the instructions you find in those sections.
- Run the block under Importing Libraries to ensure you have access to all the functions we will use for this project.
Collecting the Data
- Access FIRMS (Fire Information for Resource Management System) and choose the MODIS instrument.
- You can select other instruments, but MODIS typically has the largest dataset.
- Select your prediction year and country.
- Download data for your chosen country covering the five years before your prediction year, plus the prediction year itself. For example, if you want to predict for the United States in 2024, download data from the years 2019-2024 (six years total).
- Note: We generally recommend using more recent data, since tracking methods have improved over time, and the results will be more relevant to current trends. As a variation of this project, you could compare model performance by training on different time periods. For example, see whether a model trained on 2019-2023 predicts 2024 more accurately than a model trained on 2000-2004 predicts 2005.
- Download the data for each year:
- On the FIRMS page, click the desired year.
- Scroll down to find your selected country and download its CSV file.
- Repeat for all required years.
- Check the frequency of data collection, as a quality control measure, by opening and checking your CSV files:
- Caution: Some countries may have less frequent fire monitoring and therefore fewer data entries. To maximize the accuracy of your predictions, choose a country or area that has at least one data record (row) for each day in the CSV file. Note that a daily record does not always mean a fire occurred — sometimes the fire radiative power (FRP) value will be very low, indicating little or no fire activity.
- Upload to Google Drive:
- Upload all of your downloaded CSV files into your wildfire_prediction folder.
- You may rename the files if you would like to. We recommend choosing clear, descriptive names for each dataset, for example: wildfire_united_states_2024.csv.
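The quality-control check described above (at least one record for each day) can also be done in code rather than by scrolling through the CSV file. The following is a minimal sketch, not part of the project notebook: the helper name missing_days is hypothetical, the sample rows are made up, and it assumes the date column is named acq_date as in the FIRMS files.

```python
import pandas as pd

def missing_days(df, date_col="acq_date"):
    """Return the calendar days within the data's date range that have no rows."""
    dates = pd.to_datetime(df[date_col]).dt.normalize()
    all_days = pd.date_range(dates.min(), dates.max(), freq="D")
    return all_days.difference(pd.DatetimeIndex(dates.unique()))

# Tiny made-up sample standing in for one downloaded CSV file
sample = pd.DataFrame({
    "acq_date": ["2024-01-01", "2024-01-01", "2024-01-02", "2024-01-04"],
    "frp": [12.5, 40.0, 8.1, 95.3],
})

gaps = missing_days(sample)
print(f"{len(gaps)} day(s) without records:", list(gaps.strftime("%Y-%m-%d")))
```

With a real file, you would replace the sample DataFrame with pd.read_csv on your own file name. A long list of missing days suggests choosing a different country or a larger area.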
Loading the Data into a Pandas DataFrame
- (Code Block 1A) This code block loads all the wildfire data CSV files you uploaded and merges them into one DataFrame. Run this code block.
- (Code Block 1B) This code block will reload your data. This will be faster than running Code Block 1A because it will load the combined DataFrame directly instead of loading six different .csv files and combining them. You can run this code block instead of Code Block 1A if your runtime disconnects.
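To give a sense of what loading and merging several CSV files involves, here is a minimal sketch using pandas. It is an illustration under stated assumptions, not the notebook's actual Code Block 1A: the function name load_fire_csvs is hypothetical, and the demonstration writes two tiny made-up files to a temporary folder instead of reading from Google Drive.

```python
import glob
import os
import tempfile

import pandas as pd

def load_fire_csvs(folder):
    """Read every CSV file in a folder and stack the rows into one DataFrame."""
    paths = sorted(glob.glob(os.path.join(folder, "*.csv")))
    frames = [pd.read_csv(path, parse_dates=["acq_date"]) for path in paths]
    return pd.concat(frames, ignore_index=True)

# Demonstration with two tiny made-up files in a temporary folder
with tempfile.TemporaryDirectory() as folder:
    pd.DataFrame({"acq_date": ["2023-07-01"], "frp": [55.0]}).to_csv(
        os.path.join(folder, "wildfire_2023.csv"), index=False)
    pd.DataFrame({"acq_date": ["2024-07-01"], "frp": [70.0]}).to_csv(
        os.path.join(folder, "wildfire_2024.csv"), index=False)
    combined = load_fire_csvs(folder)

print(combined)
```

Saving the combined DataFrame once (for example with to_csv) and reading that single file back is one way a reload step can be faster than re-reading six separate files.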
Visualizing the Data
Before diving into the machine learning portion of this project, we will visualize our data. This step is crucial because visual exploration helps us gain insights that guide data preprocessing and improve the effectiveness of our predictive models.
- (Code Block 2A) This code block defines a function that will show us Fire Radiative Power (FRP), which is how fast a fire gives off energy. The function calculates the monthly average FRP values for a given year and shows them in a bar graph. Run this code block.
- (Code Block 2B) This code calls the function from Code Block 2A to display the FRP values for the year, defaulting to 2024. You can change the year under the #TODO comment to view data for any year you would like.
- (Code Block 2C) This code block defines a function that will create an interactive map of intense fires (FRP >= 100) for a given year (and optional month). Run this code block.
- (Code Block 2D) This code calls the function from Code Block 2C to display the FRP values for the defined year and month on a map, defaulting to the year 2024 and month 7 (July). You can change the year and month under the #TODO comment to view data for any year and month you want. You can also remove the month from the function to view values for the entire year.
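The two calculations behind these visualizations, monthly average FRP and the FRP >= 100 filter for intense fires, can be sketched in a few lines of pandas. This is an illustrative sketch only: the function name monthly_mean_frp is hypothetical, the sample values are made up, and the notebook's own functions may be organized differently.

```python
import pandas as pd

def monthly_mean_frp(df, year):
    """Average FRP for each month of the given year."""
    dates = pd.to_datetime(df["acq_date"])
    this_year = df.loc[dates.dt.year == year, "frp"]
    months = dates.loc[this_year.index].dt.month.rename("month")
    return this_year.groupby(months).mean()

# Made-up sample rows; real data would come from the combined DataFrame
sample = pd.DataFrame({
    "acq_date": ["2024-01-05", "2024-01-20", "2024-07-04", "2023-07-04"],
    "frp": [10.0, 30.0, 200.0, 999.0],
})

monthly = monthly_mean_frp(sample, 2024)
print(monthly)

# Keep only intense detections, using the same threshold as the map (FRP >= 100)
intense = sample[sample["frp"] >= 100]
print(intense)
```

Calling monthly.plot.bar() on the result would draw the bar graph, provided matplotlib is available, as it is in Google Colab.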
Preprocessing
You will now preprocess the data to prepare it for the machine learning model. First, you will remove unnecessary columns and filter the data to a smaller geographic area. Focusing on a smaller region improves prediction accuracy because factors like climate, vegetation, and fire behavior vary widely across geographical regions, so the Prophet model performs better when trained on region-specific patterns. Note that some small countries have few data points overall (fewer than 400 data points per year). If that is the case, skip the regional filtering and go to step 4 (Code Block 3D).
- (Code Block 3A) This code block simplifies the dataset by removing unnecessary columns (acq_time, satellite, instrument, version, daynight) from the DataFrame df. Run this code block.
- The acq_time variable records the exact time the satellite acquired the data. We recommend excluding this variable for the current analysis because the focus is on daily trends rather than specific times of day. However, as a variation of this project, you could explore using acq_time to see whether the time of day affects fire detection patterns.
- The satellite variable indicates which satellite collected the data (Terra or Aqua). Since both satellites carry the same MODIS sensor and capture similar data, this column doesn't add useful variation to your model.
- The instrument variable specifies the sensor used (MODIS). Since only one instrument is involved, this column provides no additional information to the model.
- The version variable shows the algorithm used for fire detection (e.g., 61.03). Since the dataset uses a consistent version throughout, this column is not informative for modeling.
- The daynight variable indicates whether the observation was made during day or night. We drop it because our current model doesn't differentiate based on time of day and focuses on daily aggregated data.
- (Code Block 3B) You will now filter the data by region. Double-check that the region you select on the map lies within the country whose data you downloaded.
- Open the mapping tool: Go to geojson.io and zoom in on the area you want to analyze. This could be a specific state or region.
- Draw your region: Click the Draw Rectangular Polygon (r) tool on the right side of the map. Draw a box around your chosen area.
- Locate the coordinates: On the right panel, a JSON file will appear. Under “coordinates”, you will see four coordinate pairs in the format (longitude, latitude) representing the rectangle's corners.
- Identify min/max values: The second and fourth coordinates contain the minimum and maximum latitude and longitude values for your box.
- Update the code: In the code block, under the #TODO comment, replace the default values with your coordinates, then run the code.
- Note: After running the code, the number of rows will be displayed below. Ensure there are at least 2,000 records remaining after filtering for your selected area. If not, we recommend choosing a larger region.
- (Code Block 3C) This code block displays your selected region on a map. Confirm that this is the area you want to use for training the model; if not, update the latitude and longitude values in the previous code block.
- (Code Block 3D) This code block selects the median of the data by day and removes any missing values. We aggregate by day and clean it because the Prophet model works best with regularly spaced time series data. Selecting the median of multiple entries per day ensures one value per day, creating consistent daily data points. Removing missing values helps the model train smoothly without gaps or errors, enabling forecast prediction.
- Note: Although we aggregate the data daily by median without considering exact latitude and longitude, you have already accounted for this by filtering the data to a specific region.
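The preprocessing steps above (dropping columns, filtering to a rectangle, and taking the daily median) can be sketched as follows. This is a minimal illustration, not the notebook's code: the function names bounding_box and preprocess are hypothetical, the corner coordinates and sample rows are made up, and the latitude and longitude column names are assumed to match the FIRMS attribute documentation.

```python
import pandas as pd

DROP_COLS = ["acq_time", "satellite", "instrument", "version", "daynight"]

def bounding_box(corners):
    """corners: (longitude, latitude) pairs copied from geojson.io."""
    lons = [lon for lon, lat in corners]
    lats = [lat for lon, lat in corners]
    return min(lons), max(lons), min(lats), max(lats)

def preprocess(df, corners):
    """Drop unused columns, keep rows inside the box, take the daily median FRP."""
    lon_min, lon_max, lat_min, lat_max = bounding_box(corners)
    df = df.drop(columns=DROP_COLS, errors="ignore")
    in_box = (df["longitude"].between(lon_min, lon_max)
              & df["latitude"].between(lat_min, lat_max))
    region = df[in_box].copy()
    region["acq_date"] = pd.to_datetime(region["acq_date"])
    # One value per day: the median of all detections on that date
    return region.groupby("acq_date")["frp"].median().dropna().to_frame()

# Made-up rectangle and sample detections
corners = [(-124.0, 36.0), (-119.0, 36.0), (-119.0, 40.0), (-124.0, 40.0)]
sample = pd.DataFrame({
    "latitude": [37.0, 38.0, 37.5, 45.0, 37.0],
    "longitude": [-120.0, -121.0, -120.5, -100.0, -120.0],
    "acq_date": ["2024-07-01", "2024-07-01", "2024-07-01", "2024-07-01", "2024-07-02"],
    "frp": [10.0, 30.0, 50.0, 500.0, 80.0],
    "daynight": ["D", "D", "N", "D", "D"],
})

daily = preprocess(sample, corners)
print(daily)
```

Taking the minimum and maximum over all corner pairs avoids having to work out which corner is which. The median is also less sensitive than the mean to a single very intense detection on a given day.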
Visualizing Region Data
- (Code Block 4A) This code block plots FRP values for your selected region across all available years. Do you notice any seasonal patterns?
- (Code Block 4B) This code block defines a function that adds new columns to the data for month, week of the year, and season based on the date. We add these time-based columns because they help the Prophet model better understand patterns in the data. Features like month, week of the year, and season capture important seasonal and yearly trends, which improve the model's ability to make accurate forecasts.
- (Code Block 4C) This code block graphs a boxplot showing how FRP values vary by season, with each season color-coded. What seasonal trends or patterns do you notice in the FRP values, and what might explain these differences?
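As an illustration of the time-based columns described in Code Block 4B, the sketch below derives month, week of the year, and season from a date index. The function names are hypothetical, the sample values are made up, and the season boundaries (meteorological seasons for the Northern Hemisphere) are an assumption; the notebook may define seasons differently.

```python
import pandas as pd

def month_to_season(month):
    """Map a month number to a season (Northern Hemisphere, an assumption)."""
    if month in (12, 1, 2):
        return "Winter"
    if month in (3, 4, 5):
        return "Spring"
    if month in (6, 7, 8):
        return "Summer"
    return "Fall"

def add_time_features(daily):
    """Return a copy of the daily data with month, week-of-year, and season columns."""
    out = daily.copy()
    dates = pd.to_datetime(out.index)
    out["month"] = dates.month
    out["weekofyear"] = dates.isocalendar().week.astype(int).to_numpy()
    out["season"] = [month_to_season(m) for m in dates.month]
    return out

# Made-up daily FRP values indexed by date
daily = pd.DataFrame(
    {"frp": [12.0, 150.0, 60.0]},
    index=pd.to_datetime(["2024-01-15", "2024-07-04", "2024-10-31"]),
)

features = add_time_features(daily)
print(features)
```

Grouping the result by the season column gives the per-season FRP distributions that a boxplot summarizes. For a country in the Southern Hemisphere, the season labels would need to be shifted by six months.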
Splitting Data into Train and Test
- (Code Block 5A) This code block splits the data into training and testing sets. We use five years of data to predict the following year. For example, if you use the years 2019-2023 to forecast 2024, the split is made on January 1, 2024. You can change the split date under the #TODO comment, but be sure to keep the same format for yearly predictions: “1-Jan-2024”.
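A date-based split like the one described above can be sketched as follows. This is a minimal illustration with made-up values; the function name split_by_date is hypothetical, and it assumes the daily data is indexed by date.

```python
import pandas as pd

def split_by_date(daily, split_date="1-Jan-2024"):
    """Rows before the split date are for training; the rest are for testing."""
    cutoff = pd.Timestamp(split_date)
    train = daily[daily.index < cutoff]
    test = daily[daily.index >= cutoff]
    return train, test

# Made-up daily values that straddle the split date
daily = pd.DataFrame(
    {"frp": [5.0, 6.0, 7.0, 8.0]},
    index=pd.date_range("2023-12-30", periods=4, freq="D"),
)

train, test = split_by_date(daily)
print(len(train), "training rows;", len(test), "testing rows")
```

Splitting by date rather than at random keeps all test data strictly in the future relative to the training data, which mirrors how a forecast would be used in practice.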
Training the Model
- (Code Block 6A) This code block resets the index and renames columns to match the Prophet model’s required format: ‘acq_date’ becomes ‘ds’ (date) and ‘frp’ becomes ‘y’ (target). Run this code block.
- (Code Block 6B) This code block creates the Prophet model and trains it on the training data. The %%time command measures how long the training takes; the result is printed under the code block.
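The renaming and training steps can be sketched as follows. This is an illustration under stated assumptions rather than the notebook's code: the function names to_prophet_frame and fit_prophet are hypothetical, the sample values are made up, and the model uses Prophet's default settings. The Prophet import sits inside fit_prophet so that the renaming step runs even where the prophet package is not installed.

```python
import pandas as pd

def to_prophet_frame(daily):
    """Move the date index into a column and use Prophet's expected names."""
    return daily.reset_index().rename(columns={"acq_date": "ds", "frp": "y"})

def fit_prophet(train_frame):
    """Create a Prophet model with default settings and fit it (needs the prophet package)."""
    from prophet import Prophet
    model = Prophet()
    model.fit(train_frame)
    return model

# Made-up daily values indexed by acquisition date
daily = pd.DataFrame(
    {"frp": [20.0, 35.0, 15.0]},
    index=pd.DatetimeIndex(["2023-07-01", "2023-07-02", "2023-07-03"], name="acq_date"),
)

train_frame = to_prophet_frame(daily)
print(train_frame)
```

With real training data, model = fit_prophet(train_frame) would train the model, and model.predict on a DataFrame containing a ds column of test dates would return the forecast.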
Evaluating the Model
- (Code Block 7A) This code block uses the trained Prophet model to generate predictions for the test data. Run this code block.
- (Code Block 7B) This code block creates a chart of the Prophet model's predictions. The black dots show the training data, the blue line shows the Prophet forecast, and the shaded blue area represents the model's confidence range.
Figure 1. Example of the Prophet forecast. The black dots represent historical data points, showing the FRP levels over time from 2019 through 2024 (x-axis). The blue shaded region represents the forecast for the future, with the darker blue line indicating the predicted trend and the lighter blue area showing the uncertainty range of the prediction. Image Credit: Science Buddies
- How well do you think the model captures seasonal trends or fluctuations?
- What does the shaded confidence interval tell us about the model’s certainty?
- (Code Block 7C) This code block plots the Prophet model’s predicted FRP values alongside the actual observed data for the test period. It also includes a 28-day rolling average, a technique that smooths the data by averaging values over a moving window of days, to highlight underlying trends and reduce short-term fluctuations.
- Are there periods where the Prophet forecast significantly differs from the actual values? What might explain these differences?
- What does the rolling average tell us about the overall trend compared to the daily fluctuations?
- (Code Block 7D) This code block calculates the Mean Absolute Error (MAE) to assess the accuracy of the model's predictions.
- The MAE shows how close predictions are to actual values by averaging the size of the errors. A lower MAE indicates better prediction accuracy. An MAE of 50 means that, on average, the model's predictions differ from the actual values by 50 units of FRP. For example, if the model predicts an FRP of 150, the actual value would typically be around 100 or 200. Because the MAE is an average, individual errors can be smaller or larger than 50.
- (Code Block 7E) This code block calculates the Mean Absolute Percentage Error (MAPE) to evaluate the accuracy of the model's predictions.
- The MAPE measures how far predictions are from actual values on average, expressed as a percentage of the actual values. A lower MAPE means the model's predictions are closer to the real values relative to their size. A MAPE of 50% means that, on average, the model's predictions are off by 50% of the actual values. For example, if the actual value is 100, a typical prediction would be around 50 or 150. Because the MAPE is an average, individual errors can be smaller or larger.
- (Code Block 7F) This code block defines a function that creates an interactive map displaying the Prophet model’s predicted fire intensities for a chosen year and optional month. Run this code block.
- (Code Block 7G) This code calls the function from Code Block 7F to display the predicted FRP values for the defined year and month on a map, defaulting to the year 2024 and month 7 (July). You can change the year and month under the #TODO comment to view data for any year and month you want. You can also remove the month from the function to view values for the entire year.
- (Code Block 7H) This code block calls the function from Code Block 2C again to display the actual FRP values for the defined year and month on a map. Again, you can change the year and month under the #TODO comment to view the data for the year and month you want.
- How does this map compare to the one from Code Block 7G? If they differ, what do you think might explain those differences?
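The rolling average from Code Block 7C and the error metrics from Code Blocks 7D and 7E can be written out directly, which may help when interpreting their values. The sketch below uses made-up numbers and hypothetical function names (mae, mape). It uses a 2-value window so the tiny sample shows the effect, whereas the notebook uses 28 days.

```python
import numpy as np
import pandas as pd

def mae(actual, predicted):
    """Mean Absolute Error: the average size of the errors, in FRP units."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(actual - predicted)))

def mape(actual, predicted):
    """Mean Absolute Percentage Error: average error relative to the actual values."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs((actual - predicted) / actual)) * 100)

# Made-up actual and predicted FRP values
actual = [100.0, 200.0, 50.0, 400.0]
predicted = [150.0, 150.0, 100.0, 350.0]

print("MAE:", mae(actual, predicted))
print("MAPE (%):", mape(actual, predicted))

# Rolling average: each value is the mean of the current and previous value
rolling = pd.Series(actual).rolling(window=2).mean()
print(rolling)
```

In this sample every prediction is off by exactly 50 units, so the MAE is 50, yet the percentage errors range from 12.5% to 100% because the same absolute error matters more when the actual value is small. Note also that MAPE is undefined whenever an actual value is zero.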
Final Reflection
- Reflect on your results. Look at how well your model predicted wildfires. Ask yourself:
- Does the model do a good job showing overall patterns or trends in wildfire activity?
- Do you think the model can predict the exact day, place, or size of a wildfire so firefighters would know exactly where to go?
- If not, how might the model still be useful for helping people decide where to put resources (like equipment or staff) to be better prepared?
Global Goals
The United Nations Sustainable Development Goals (UNSDGs) are a blueprint to achieve a better and more sustainable future for all.
Variations
- You can also use variables other than FRP as predictors in your analysis, such as brightness (brightness temperature in Kelvin, derived from the 4 µm channel), bright_t31 (brightness temperature in Kelvin, from the 11 µm channel), confidence (a quality flag indicating the likelihood that a detection is a true fire), and type (categorizes the source of the detection: e.g., 0 = vegetation fire, 1 = active volcano, 2 = other static land source, 3 = offshore). A full description of these variables can be found in the NASA FIRMS active fire data attributes documentation.
- Explore time-of-day trends by keeping the acq_time variable to see what time of day fires are most likely to occur. This can also suggest when more personnel might be needed for wildfire prevention and control.
- Repeat the analysis without aggregating the daily values (Code Block 3D) to see if it improves accuracy. You can also experiment with a different model in addition to Prophet.
- Compare wildfire patterns across different countries, but note that data availability may vary and affect results. To minimize bias, we recommend comparing data sets with similar sizes.
- Experiment with adjusting the Prophet model settings, like seasonality or other hyperparameters. Use criteria to remove outliers or apply scaling techniques to FRP and other features to enhance model performance. Which of these changes most drastically improves predictions in Prophet forecasting?
- Change the intensity filter for fires (e.g., only include fires with FRP ≥ 200) to focus on the most severe events. When you do this, expect fewer data points overall, but the visualization will highlight larger, more intense fires. Be aware that smaller fires—which can still be frequent or important—will be excluded.
- To evaluate how data volume impacts model accuracy, incorporate more years of historical data or extend predictions further into the future, keeping in mind that human-driven changes and interventions might impact predictability.
- You can visualize the model's components by running model.plot_components on your forecast (in Prophet, this method takes the forecast DataFrame as its argument). This will break down the Prophet forecast into its main elements, such as overall trend, weekly patterns, and yearly seasonality. For more details on these components, refer to the Prophet documentation.