Abstract
Woolly mammoths shook the ground of ice-age tundras for millennia, living next to saber tooth tigers and prehistoric man. Although they have been extinct for thousands of years, scientists continue to learn more and more about this mighty animal. Some of the most exciting new research is being produced by looking at DNA extracted from the hair and bones of woolly mammoths entombed in ice. In this genomics science fair project, you will use bioinformatics tools to determine the woolly mammoth's closest relative. You will also estimate the rate at which the woolly mammoth's DNA accumulated mutations.
Summary
David Whyte, PhD, Science Buddies
Edited: Svenja Lohner, PhD, Science Buddies

Objective
The objective of this genomics science fair project is to use the BLAST bioinformatics tool to identify the closest living relative of the extinct woolly mammoth. You will also estimate the mutation rate in the woolly mammoth's mitochondrial DNA.
Introduction
The application of molecular analysis techniques to samples retrieved from the hair and bone of extinct animals has generated excitement among paleobiologists and other scientists. Our understanding of the woolly mammoth, for example, which used to be based on fossils and comparison with modern elephants, has been enriched by analysis of its DNA.
Woolly mammoths (scientific name: Mammuthus primigenius) are extinct elephant-like animals that roamed Earth until about 10,000 years ago, when scientists believe they became extinct due to warming weather, human hunters, disease, or some combination of these factors.

Figure 1. The woolly mammoth at the Royal BC Museum, Victoria, British Columbia. (Wikipedia, 2008.)
There are believed to be millions of mammoths buried in Siberia's permafrost. Only about 100 specimens have been uncovered in the sparsely populated countryside, but some of them have been remarkably well preserved. In 1997, an entire mummified woolly mammoth was found in Siberian ice! It was transferred in October 1999 to a frigid, underground cave where it will be carefully studied. Scientists hope to clone this remarkable specimen.
Mammoths are classified in the order Proboscidea, which includes mammoths, modern Asian elephants (scientific name: Elephas maximus) and African elephants (scientific name: Loxodonta africana), and mastodons (scientific name: Mammut americanum). In the past few years, scientists have successfully isolated mitochondrial DNA from the hair and bones of mammoths and mastodons. A study by Nadin Rohland, from the Max Planck Institute for Evolutionary Anthropology in Germany, and colleagues demonstrates the power of genetic data to clarify interrelationships. The researchers used DNA sequence information from a mastodon tooth to estimate that mastodons diverged from the other proboscideans about 27 million years ago. African elephants diverged from Asian elephants and mammoths about 7.6 million years ago. The divergence between mammoths and Asian elephants was estimated to have taken place about 6.7 million years ago (Figure 2).

This simplified phylogenetic tree shows a relationship between four species of proboscideans. This tree shows that African and Asian elephants diverged from mammoths around 6-9 million years ago. It also shows mastodons and mammoths having diverged in the family tree about 24-28 million years ago.
Figure 2. Phylogeny and divergence times for the four proboscidean species. Phylogenetic trees are like family trees: individuals, or species in this case, are grouped together based on shared ancestry. The image on each of the "branches" is a representation of the animal's tooth. "MY" stands for million years. (Rohland, et al., 2007.)
The analysis of mitochondrial DNA is a valuable tool in evolutionary biology. It is easier to obtain good sequences from mitochondrial DNA than from chromosomal DNA because each cell contains many copies of it. Mitochondrial DNA is passed on to offspring by only one parent (the mother), which means that it does not undergo recombination. As a result, any variations in mitochondrial DNA are due only to mutations. The mutations that accumulate over time are useful in phylogenetic analysis.
You might ask, what is phylogenetic analysis and how are mutations used? Phylogenetic analysis simply means the study of evolutionary relationships among organisms. Mutations play a role in phylogenetic analysis because they are ideal "tags" for lines of descent. The presence of a particular mutation within a group of animals is evidence of a common ancestor. Most mutations are not passed down to subsequent generations, but some do become common within the population. A mutation that is present at a significant level (over 1%) within a population is often called a polymorphism. Based on the differences in mitochondrial DNA, one can create a phylogenetic tree, which is a tree-like diagram showing the evolutionary relationships among different biological species based upon similarities and differences in their genetic characteristics.
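The idea that accumulated mutations act as "tags" for lines of descent can be illustrated with a minimal Python sketch. It assumes the sequences are already aligned to the same length, and the sequences and names (`toy_query`, `relative_A`, and so on) are made up for illustration; they are not real mitochondrial DNA.

```python
def count_differences(seq_a, seq_b):
    """Count positions where two aligned, equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a.upper(), seq_b.upper()) if a != b)


def closest_relative(query, candidates):
    """Return the candidate name with the fewest differences from the query."""
    return min(candidates, key=lambda name: count_differences(query, candidates[name]))


# Hypothetical toy sequences, NOT real mitochondrial DNA.
toy_query = "ACGTACGTACGTACGTACGT"
toy_candidates = {
    "relative_A": "ACGTACGTACGTACGTACGA",  # 1 difference from the query
    "relative_B": "ACGTTCGTACCTACGTACGA",  # 3 differences from the query
    "relative_C": "TCGTTCGAACCTAGGTACGA",  # 6 differences from the query
}

for name, sequence in toy_candidates.items():
    print(name, count_differences(toy_query, sequence))
print("Closest:", closest_relative(toy_query, toy_candidates))
```

Fewer differences suggest a more recent common ancestor. Real analyses also have to handle gaps and unequal mutation rates, which this sketch ignores.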
In this genomics science fair project, you will compare mitochondrial DNA from a woolly mammoth to mitochondrial DNA from other Proboscidea species such as other extinct mammoths, a mastodon, an Asian elephant, and two different African elephant species. Based on the level of sequence similarity, you will be able to determine how closely the mammoth is related to mastodons and to modern elephants.
As outlined in the Procedure, you will first download the sequence for woolly mammoth mitochondrial DNA. You will then use BLAST, a software tool available at the National Center for Biotechnology Information (NCBI) website, to search a database of DNA sequences for the best matches to the mammoth DNA. You will be using BLAST to infer evolutionary relationships among the modern elephants and their extinct relatives.
For the basic procedure, you will BLAST a portion of the woolly mammoth mitochondrial DNA against a small set of related mastodon and elephant sequences. Once you are familiar with the tools and databases, you can expand your science fair project to include protein sequences. You can also expand the number of organisms you search for with the BLAST tool.
Terms and Concepts
To start your background research, look up the terms below. Also, the paper by Rohland, et al., cited in the Bibliography, has very useful data and background information.
- Paleobiologist
- Woolly mammoth (Mammuthus primigenius)
- Proboscidean
- Asian elephant (Elephas maximus)
- African elephant (Loxodonta africana)
- Mastodon (Mammut americanum)
- Mitochondrial DNA (mtDNA)
- Recombination
- Phylogenetic analysis
- Phylogenetic tree
- Mutation
- Polymorphism
- BLAST
- Genome
- Molecular clock
- Base pairs
- Query
Questions
- Based on the diagram in Figure 2, how long ago did the mastodons diverge from the elephants and mammoths?
- How is mitochondrial DNA used to establish phylogenetic trees?
- What is the relationship between the number of mutational differences in two mitochondrial genomes and the length of time since they diverged? In other words, how can you use mutations to establish a molecular clock?
Bibliography
- Rohland, N., et al. (2007, July 24). Proboscidean Mitogenomics: Chronology and Mode of Elephant Evolution Using Mastodon as Outgroup. PLOS Biology, Vol. 5, No. 8, e207. Retrieved August 14, 2008.
- Orr, I. (2008). Introduction to Phylogenetic Analysis. Retrieved August 15, 2008.
- Tree of Life Web Project. (2008). Explore the Tree of Life. Retrieved August 15, 2008.
- Cooper, A. (2006). The Year of the Mammoth. Retrieved August 14, 2008.
- NCBI. (2008). National Center for Biotechnology Information. Retrieved August 14, 2008.
- European Molecular Biology Laboratory's European Bioinformatics Institute. (2008). ClustalW2. Retrieved November 13, 2008.
- Rambaut, A. (2018). How to read a phylogenetic tree. Tutorial Phylogenetics. Retrieved August 27, 2021.
Materials and Equipment
- Computer with high-speed connection to the Internet
- Lab notebook
Experimental Procedure
First, obtain the sequence information for woolly mammoth mitochondrial DNA.
- To download the woolly mammoth mitochondrial DNA sequence, go to the National Center for Biotechnology Information's (NCBI) database of nucleotide sequences.
- In the "Search" box, make sure that "Nucleotide" is selected from the drop-down menu.
- In the "Search" box, type "Mammuthus primigenius mitochondrion, complete genome."
- You will retrieve a page with the search results. If there is more than one result, look for the one that has the accession number MF770243.1. This number is a unique identifier for this record.
- Click on the link to the record for the accession number MF770243.1.
- Scan up and down the page for MF770243.1. It has a wealth of information, including the names of the researchers who submitted the sequence, the date of submission, the sequence of the proteins coded for by the mitochondrial DNA, and the complete mitochondrial DNA sequence (at the bottom of the page).
Now that you have the DNA, you can use it to search for similar sequences in a database of nucleotide sequences.
- Open a new window or tab in your internet browser and go to the NCBI main page, and click on the BLAST link in the "Popular Resources" list on the right.
- This page has information about how BLAST works, as well as links to various BLAST search tools.
The page describes the BLAST search tools, as follows: "The Basic Local Alignment Search Tool (BLAST) finds regions of local similarity between sequences. The program compares nucleotide (or protein) sequences to sequence databases and calculates the statistical significance of matches. BLAST can be used to infer functional and evolutionary relationships between sequences, as well as help identify members of gene families."
- Click on the box that says "Nucleotide BLAST" under the "Web BLAST" heading.
- Fill out the nucleotide BLAST query form, which should look similar to Figure 3, below, when you are done.
- Enter the accession number MF770243.1 into the box labeled "Enter accession number, gi, or FASTA sequence."
- In the boxes that indicate "From" and "To," type in "1" for "From" and "3000" for "To." The first 3000 base pairs will be used as the query. BLASTing the entire 16,851 nucleotides slows the BLAST results and is not necessary for our purposes.
- In the "Job Title" box, type "Woolly Mammoth," or another name if you choose.
- For "Database," keep the default "Standard databases (nr etc.)" setting.
- Select "Nucleotide collection (nr/nt)" in the drop-down box. This database contains many nucleotide sequences including single copies (reference sequences) of the mitochondrial genomes for each species, which makes evolutionary analysis easier.
- For "Organism," type "Proboscidea (taxid:9779)." This will restrict your BLAST to the elephants and their extinct relatives. Be sure there is no space between the "taxid:" and "9779." Return to this point later if you want to BLAST against a wider array of sequences.
- In the "Entrez Query" field, type "complete". This will limit your search to complete genome sequences and exclude partial sequences.
- For "Program Selection, Optimize for," select "Highly similar sequences, megablast." (The alternatives are more appropriate when searching for distant similarities).
- Next to the BLAST button, check the box "Show results in a new window."
- Then click on the BLAST button to start the database search.
- Note: You can also get to the nucleotide BLAST page directly from the NCBI page for MF770243.1. On the right side under "Analyze this sequence," click on "Run BLAST." This will take you to the BLAST page where the accession number field is already filled out.
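The "From" and "To" boxes pick out a subrange of the query. A minimal Python sketch of the same idea, for a sequence you have saved in FASTA format, is shown here. It assumes the positions are 1-based and inclusive, and the helper names (`parse_fasta`, `query_subrange`) and the toy record are hypothetical; the toy record is not the MF770243.1 sequence.

```python
def parse_fasta(text):
    """Return (header, sequence) from a single-record FASTA string."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("expected a FASTA header line starting with '>'")
    header = lines[0][1:]
    sequence = "".join(lines[1:]).upper()
    return header, sequence


def query_subrange(sequence, start, end):
    """Return bases start..end, assuming 1-based, inclusive positions."""
    if start < 1 or end < start:
        raise ValueError("need 1 <= start <= end")
    # Python slices are 0-based and end-exclusive, hence start - 1.
    return sequence[start - 1:end]


# Hypothetical toy record, NOT the real woolly mammoth sequence.
toy_fasta = """>toy_record hypothetical sequence
ACGTACGTAC
GTACGTACGT
"""

header, seq = parse_fasta(toy_fasta)
first_eight = query_subrange(seq, 1, 8)
print(header, len(seq), first_eight)
```

With the real record, `query_subrange(seq, 1, 3000)` would give the same 3000-base query that the form settings above describe.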

Screenshot of the BLAST query search page on the ncbi.nlm.nih.gov website. At the top of the BLAST query search page there is a text box where users can fill in search terms. Other options are available under the search box that allow for different databases to be searched and to limit searches through keywords or IDs.
Figure 3. BLAST query input page (NCBI, 2021.)
- Be patient. The search algorithm will take about a minute or so to complete the database search. The results page will resemble the one below in Figure 4.

Screenshot of the BLAST results page on the ncbi.nlm.nih.gov website. The results page in the BLAST tool on the NCBI webpage shows a list of nucleotide sequences that match a search term. Results provide additional information such as the percentage match a result has to a specific query string.
Figure 4. BLAST results for woolly mammoth mitochondrial DNA. (NCBI, 2008.)
- On the top left of the BLAST results page you will find the summary section (blue in Figure 4), which provides information on different aspects of your search. On the top right there is a box that allows you to filter your results based on certain criteria (red in Figure 4). Below the top section, the BLAST results are shown (yellow in Figure 4). There are four different tabs called "Description," "Graphic Summary," "Alignments," and "Taxonomy." Each tab presents the search results in a different way.
- The "Description" tab contains a summary table of hits found by BLAST and is the default tab shown.
- The "Graphic Summary" tab shows a color key of the alignments. The color key shows the degree of similarity for the sequences.
- The "Alignments" tab contains the detailed pairwise alignments between query and database sequences.
- The "Taxonomy" tab provides details of the taxonomic distribution of matches BLAST found.
- Look at the data table in the "Description" tab. The "Query Coverage" notes how much of the query was aligned with the hit in the database; for this science fair project, this should be close to 100%. The "Per. Ident." column gives the % identity between the woolly mammoth DNA and the other sequences found in the nucleotide database that matched the BLAST criteria.
- Looking more closely at the listed results in the description table, you will notice that there are multiple results for each of the different species. Each of these sequences is usually derived from a different sample of the same species or a different species isolate. You can ignore the sequences from isolates for the purpose of this science project.
- Unselect all the sequences in your results table by unchecking the "select all" box in the top left corner of the results table. Then select one representative sequence for each of your species of interest (Mammuthus jeffersonii, Mammuthus columbi, Mammuthus primigenius, Elephas maximus, Loxodonta africana, Loxodonta cyclotis, and Mammut americanum) from your list and check its box. Make sure the chosen sequence result includes the name of the species and the words "mitochondrion" and "complete genome". For example, the sequence description could be "Elephas maximus mitochondrion, complete genome". If you don't find a sequence for each species, that is OK.
- Review and record the % identity values for each of your selected sequences. Based on this data, can you already tell which animal the woolly mammoth is most closely related to?
- With the chosen sequences still selected, click on the "Alignments" tab and look at the alignments of the query and database sequences.
- The alignment is the heart of the BLAST output. It pairs each base in the query with its counterpart in the database sequence.
- Look for mismatches and gaps in the alignments to get an idea of the type of changes that occur.
- At the top of each alignment, there is information on the % identity and the number of mismatches between the query and the database sequence.
- Make a data table in your lab notebook of the actual number of changes between the mammoth DNA and the DNA of the other animals. For example, the data could say: "Identities = 2941/3000 (98%), Gaps = 3/3000 (0%)". Thus, there are 59 changes (3000 - 2941).
- Using the approximate times that these animals diverged from each other (from the Introduction section), and the actual number of differences in the DNA, you can estimate the rate at which the mitochondrial DNA acquires mutations. See the example below.
Equation 1:
$$\text{Mutation rate} \approx \frac{\text{Number of changes from BLAST report}}{\text{Number of nucleotides} \times \text{Number of years}}$$
Equation 2:
$$\text{Mutation rate} \approx \frac{60}{3{,}000 \times 6{,}700{,}000}$$
- You can compare your result for the mutation rate with a published mutation rate in the paper by Nadin Rohland and colleagues, found in the Bibliography, above. You'll want to look for the link to table S3, further down on the webpage. The rate you calculate will not be identical to the published rate, due to differences in the methods and DNA regions used.
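The arithmetic in Equations 1 and 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`parse_identities`, `mutation_rate`) are hypothetical, the input string is the example text from the step above rather than real BLAST output, and the 6,700,000 years is the mammoth and Asian elephant divergence estimate quoted in the Introduction.

```python
import re


def parse_identities(summary):
    """Extract (matches, aligned_length) from text like 'Identities = 2941/3000 (98%)'."""
    match = re.search(r"Identities\s*=\s*(\d+)/(\d+)", summary)
    if match is None:
        raise ValueError("no 'Identities = x/y' field found")
    return int(match.group(1)), int(match.group(2))


def mutation_rate(changes, nucleotides, years):
    """Equation 1: changes per nucleotide per year."""
    return changes / (nucleotides * years)


# Example text copied from the procedure, not from a real BLAST run.
example = "Identities = 2941/3000 (98%), Gaps = 3/3000 (0%)"

matches, length = parse_identities(example)
changes = length - matches  # 3000 - 2941 = 59 changes
rate = mutation_rate(changes, length, 6_700_000)
print(f"{changes} changes, rate ~ {rate:.2e} per nucleotide per year")
```

Equation 2 rounds the 59 changes to 60, so its result is slightly higher than the value computed here. Either way, the estimate is on the order of a few changes per billion nucleotides per year.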
Next, use the BLAST output to generate a phylogenetic tree of your results. You will use your selected sequences to do that.
- Go back to the "Description" tab and make sure you still have your chosen sequences selected. Within the results table (on the top right), click on the link "Distance tree of results." This will create a distance tree for your selected results only. If you click on the "Distance tree of results" above the results table (on the left), the distance tree will include all results.
- Your distance tree should look similar to Figure 5. Keep the default values for the "Tree method" and "Max Seq Difference," but change the "Sequence Label" to "Taxonomic Name (Sequence ID)" using the drop-down list. If you like you can also try various formats.

Screenshot of a distance tree generated on the ncbi.nlm.nih.gov website. The BLAST tool on the NCBI webpage can generate distance trees that map relationship between proteins based on shared sequences. Arrows at the end of branches are color coded to match a key on the right side of the page that distinguish the species the protein originates from.
Figure 5. Tree of mammoth, mastodon, and modern elephant mitochondrial DNA. (NCBI, 2021.)
- Add times and mutations to the tree that you make. Discuss the degree of relatedness between the sequences, as shown in the tree.
- What does the phylogenetic tree tell you about how the different species within the Proboscidea group are related? If you need help reading the phylogenetic tree, you can view the Phylogenetics and Reading Phylogenetic Trees or the Phylogenetic analysis of pathogens video.
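To see how a tree can be built from pairwise distances, here is a minimal Python sketch of UPGMA (average-linkage clustering). This is a simpler, swapped-in technique and is not the method behind NCBI's "Distance tree of results." As an assumption for illustration, the distances are the divergence times (in millions of years) quoted in the Introduction, not distances computed from sequences, and the function name `upgma` and the underscore-separated labels are hypothetical.

```python
def upgma(distances):
    """Cluster taxa by average linkage (UPGMA) and return a Newick-style string.

    `distances` maps frozenset({a, b}) -> distance for every pair of taxa.
    """
    if not distances:
        raise ValueError("need at least one pair of taxa")
    # sizes[name] = number of original taxa inside each current cluster
    sizes = {taxon: 1 for pair in distances for taxon in pair}
    dist = dict(distances)

    while len(sizes) > 1:
        # Pick the closest pair of clusters (ties broken alphabetically).
        pair = min(dist, key=lambda p: (dist[p], sorted(p)))
        a, b = sorted(pair)
        merged = "(" + a + "," + b + ")"
        # Size-weighted average distance from the merged cluster to each other cluster.
        new_dist = {}
        for other in sizes:
            if other in pair:
                continue
            total = (sizes[a] * dist[frozenset({a, other})]
                     + sizes[b] * dist[frozenset({b, other})])
            new_dist[frozenset({merged, other})] = total / (sizes[a] + sizes[b])
        # Drop every distance that involved a or b, then add the new ones.
        dist = {p: d for p, d in dist.items() if not (p & pair)}
        dist.update(new_dist)
        sizes[merged] = sizes.pop(a) + sizes.pop(b)

    return next(iter(sizes)) + ";"


# Divergence times in millions of years, taken from the Introduction.
taxa_distances = {
    frozenset({"Woolly_mammoth", "Asian_elephant"}): 6.7,
    frozenset({"Woolly_mammoth", "African_elephant"}): 7.6,
    frozenset({"Asian_elephant", "African_elephant"}): 7.6,
    frozenset({"Mastodon", "Woolly_mammoth"}): 27.0,
    frozenset({"Mastodon", "Asian_elephant"}): 27.0,
    frozenset({"Mastodon", "African_elephant"}): 27.0,
}

tree = upgma(taxa_distances)
print(tree)
```

The nesting of the parentheses mirrors Figure 2: the mammoth and Asian elephant group together first, then the African elephant joins, and the mastodon is the outgroup.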

Variations
These variations suggest ways that you can extend your analysis of the woolly mammoth's mitochondrial genome.
- In the main experimental Procedure, you used a region of mitochondrial DNA to establish the family structure of the Proboscidea. Extend this analysis using mitochondrial proteins. Retrieve the record from NCBI for nucleotide sequence MF770243.1. Copy a protein sequence, for example for the protein "NADH dehydrogenase subunit 1," from the NCBI page for MF770243.1. Enter the sequence in the BLAST page, clicking this time on "Protein BLAST." Enter the other settings, as you did for the DNA BLAST. Compare the results of your protein BLAST vs. the DNA BLAST.
- In the paper by Rohland and colleagues, referenced in the Bibliography, the authors excluded the D-loop region of the mitochondrial DNA because it accumulates mutations at a faster rate than the rest of the genome. Repeat the analysis with the D-loop excluded (the NCBI page for MF770243.1 indicates the D-loop boundaries).
- Extend the number of organisms that you are comparing with the mammoth. You can do this by selecting a larger set of animals in the "Organism" box. (Mammoths are classed as follows: Eukaryota, Metazoa, Chordata, Craniata, Vertebrata, Euteleostomi, Mammalia, Eutheria, Afrotheria, Proboscidea).
- Download the DNA or protein sequences from the BLAST results and generate a tree using the tools at the European Bioinformatics Institute.
- Repeat this analysis with a set of primate genomes. As noted in the paper by Rohland, the rate at which mutations accumulate in the primate line is dramatically different than it is in the elephant line. The reason is unknown.
Careers
If you like this project, you might enjoy exploring these related careers: