
Use DNA Sequencing to Trace the Blue Whale's Evolutionary Tree



The first land animals took their tentative steps out of the ocean and onto solid ground around 365 million years ago. Over millions of years, these early ancestors developed into tetrapods, including amphibians, reptiles, dinosaurs, birds, and mammals. Then, around 50 million years ago, the reverse process occurred: the mammalian ancestor of today's whales returned to the ocean. In this genomics science fair project, you will use mitochondrial protein sequencing to trace the evolution of whales and identify their closest living relatives that still have four legs.


Time Required: Average (6-10 days)
Prerequisites: You should be familiar with high school level genetics (DNA, protein, mutations).
Material Availability: Readily available.
Cost: Very Low (under $20)
Safety: No issues

David Whyte, PhD, Science Buddies
Edited: Svenja Lohner, PhD, Science Buddies


In this genomics science fair project, you will trace the blue whale's family tree using genomic sequences in the GenBank database and the BLAST search tool.


The evolutionary story of the whales is dramatic. An ancient land mammal returned to the sea, about 50 million years ago, to become the forerunner of today's whales. In doing so, it lost its legs, and all of its internal organs became adapted to a marine existence. This ocean invasion by mammals is the reverse of what happened millions of years previously, when the first animals crawled out of the sea onto land.

Figure 1. The blue whale (Balaenoptera musculus) is a marine mammal belonging to the suborder of baleen whales (called Mysticeti). At up to 110 feet (about 34 meters) in length, it is believed to be the largest animal ever to have existed. (Wikipedia, 2008.)

What did the first whales look like, and what gave rise to them? For a long time, scientists could only speculate, for the oldest fossils anyone knew of had already assumed the basic appearance of whales. Charles Darwin speculated in the first edition of The Origin of Species that bears might be the precursors of whales: "In North America the black bear was seen ... swimming for hours with widely open mouth, thus catching, like a whale, insects in the water. Even in so extreme a case as this, if the supply of insects were constant, and if better adapted competitors did not already exist in the country, I can see no difficulty in a race of bears being rendered, by natural selection, more and more aquatic in their structure and habits, with larger and larger mouths, till a creature was produced as monstrous as a whale."

Today we have much better information about the evolution of whales than was available to Darwin. The new information comes primarily from two sources. The first source is an abundance of intermediate fossils that have been uncovered over the past two decades that allow paleontologists to trace the development of modern whales, step by step, back to their beginnings early in the Eocene epoch. This time period is often referred to as the dawn of the age of mammals, and lasted from about 55 million to 34 million years ago. The second source of new insights into whale evolution comes from analysis of DNA samples.

DNA acquires mutations over time. The longer two species have been separated, the greater the number of different mutations that will have accumulated in each species' genome. A phylogenetic tree can be built based on the accumulated changes between genes in different organisms: fewer genetic changes imply a relatively recent divergence. Based on pairwise comparisons of a set of genes from different organisms, it is possible to construct a tree that reflects the genetic distances between the organisms. In the past, this sort of tree-building was done solely by comparing the bones, anatomy, behavior, etc., of many animals. With genomics tools, it can be done online from your computer, and it can include animals that don't even have bones, provided there is DNA sequence data available. You can watch the two videos below to learn more about phylogenetics and reading phylogenetic trees.
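
To make the idea of a pairwise genetic distance concrete, here is a minimal Python sketch. It assumes the sequences are already aligned to the same length and uses the simplest possible measure, the fraction of positions that differ. The species names and sequences are made-up toy values, not real cox1 data, and the function names are only illustrative.

```python
from itertools import combinations


def p_distance(seq_a: str, seq_b: str) -> float:
    """Fraction of aligned positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    mismatches = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return mismatches / len(seq_a)


def distance_table(alignment: dict) -> dict:
    """Compute the distance for every unordered pair of names in the alignment."""
    return {
        (name_1, name_2): p_distance(alignment[name_1], alignment[name_2])
        for name_1, name_2 in combinations(alignment, 2)
    }


# Toy, made-up aligned protein fragments (NOT real data).
toy_alignment = {
    "species_A": "MFINRWLFST",
    "species_B": "MFINRWLFSA",
    "species_C": "MFTNRWMFSA",
    "species_D": "MLTNKWMYSA",
}

distances = distance_table(toy_alignment)

# List the pairs from most similar to least similar.
for (name_1, name_2), d in sorted(distances.items(), key=lambda item: item[1]):
    print(f"{name_1} vs {name_2}: {d:.2f}")
```

In this toy example, species_A and species_B differ at one position out of ten, so they would sit closest together on a tree, while species_A and species_D differ at six positions and would sit far apart.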

Phylogenetics and Reading Phylogenetic Trees
Phylogenetic analysis of pathogens (lecture - part 1)

The focus of this genomics science fair project is to explore the evolutionary tree of the whales using DNA sequences available in GenBank, a free database of sequences derived from hundreds of thousands of different organisms. Since new sequences are always being added, the tree you make will reflect the newest genomic information available.

Although this is a genomics science fair project, it is also about whales. To build the tree of the whale family, you will need to spend some time getting familiar with the scientific names and key features of about 25 whale and dolphin species. You will also learn about the blue whale's closest relatives that did not make the transition to a marine habitat.

The process you will use to generate an evolutionary tree for the whales is outlined as follows:

  1. Obtain the protein sequence for a whale mitochondrial protein.
    1. A mitochondrial sequence is very useful for studying evolution because it does not recombine and it accumulates mutations at a faster rate than chromosomal DNA.
    2. For the purposes of this science fair project, you will use the cytochrome c oxidase subunit 1 protein (cox1) from the blue whale. This will be the query sequence.
      • Other genes or other whales can be used as variations on this science fair project.
  2. Use the bioinformatics search tool, BLAST, to identify sequences from other organisms that are related to the query sequence. BLAST will find genes that are related to the query. The more related the genes are to each other, the closer the organisms they are derived from are to each other on the evolutionary tree.
  3. Use a simple tree-building tool to generate an evolutionary tree based on the BLAST output. There are more-sophisticated tools available, but the BLAST tool will suffice for this science fair project.
  4. The key issues the tree will address are:
    1. What are the nearest relatives of the blue whale, determined by DNA analysis?
    2. What is the identity of the closest blue whale relative that still has four legs? (It is not a bear!)
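
Step 3 of the outline can be illustrated with a small sketch of how a table of pairwise distances becomes a tree. The example below uses UPGMA (average-linkage clustering), chosen here only because it is short to write; it is not necessarily the method behind the BLAST tree view. The taxon names and distances are hypothetical placeholders.

```python
from itertools import combinations


def upgma(distances: dict) -> str:
    """Build a tree by UPGMA and return it as nested parentheses.

    distances: {(taxon_a, taxon_b): distance} for every unordered pair of taxa.
    """
    # Key by frozenset so the order of a pair does not matter.
    d = {frozenset(pair): value for pair, value in distances.items()}
    # Every taxon starts as its own cluster of size 1.
    sizes = {name: 1 for pair in distances for name in pair}

    while len(sizes) > 1:
        # Merge the two clusters with the smallest average distance.
        a, b = min(combinations(sorted(sizes), 2), key=lambda p: d[frozenset(p)])
        merged = f"({a},{b})"
        for other in sizes:
            if other in (a, b):
                continue
            # The size-weighted mean equals the average over all member pairs.
            d[frozenset((merged, other))] = (
                sizes[a] * d[frozenset((a, other))]
                + sizes[b] * d[frozenset((b, other))]
            ) / (sizes[a] + sizes[b])
        sizes[merged] = sizes.pop(a) + sizes.pop(b)

    return next(iter(sizes)) + ";"


# Hypothetical distances between four placeholder taxa (NOT real data).
toy_distances = {
    ("A", "B"): 0.02,
    ("A", "C"): 0.10,
    ("B", "C"): 0.12,
    ("A", "D"): 0.30,
    ("B", "D"): 0.32,
    ("C", "D"): 0.28,
}

print(upgma(toy_distances))
```

The printed string groups the most similar taxa in the innermost parentheses, which is the same information a drawn tree shows with its branching order.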

The tools involved in this science fair project are simple to use and very powerful. The evolutionary trees that you make from your BLAST analysis will be based on molecular data for hundreds of species derived from samples collected from all over the globe. In the short time it takes to build a BLAST tree, you can see evidence for relationships between animals that would be difficult or impossible to obtain by traditional methods that are based on fossils.

Terms and Concepts

To start this genomics science fair project, learn about whale evolution using the Internet and your local library. Also, go to the National Center for Biotechnology Information (NCBI) website and explore the databases and tools available. You will be using BLAST to search the reference protein database.



These websites discuss the current data and hypotheses about whale evolution:

These websites are useful resources for understanding how DNA can be used to build evolutionary trees and the bioinformatics tools used to do this:

Materials and Equipment

Experimental Procedure

Before you start with this project, it might be helpful to familiarize yourself with the bioinformatic tools and websites that you are going to use. You can watch the two videos below to learn more about the BLAST tool and the NCBI website and databases.

How to Use BLAST for Finding and Aligning DNA or Protein Sequences
How to Use the NCBI’s Bioinformatics Tools and Databases

To begin, obtain the query sequence, the cytochrome c oxidase subunit 1 protein, from the blue whale.

  1. Open the NCBI home page.
  2. In the "Search" box, make sure that "Nucleotide" is selected from the drop-down menu.
  3. Type in "blue whale mitochondrial complete genome" in the "Search for" box.
  4. The resulting page will list documents in the Nucleotide database that contain "blue whale" or its scientific name Balaenoptera musculus.
  5. If there is more than one result, look for the one that has the accession number NC_001601.1. This number is a unique GenBank identifier for this record.
  6. Click on the link to the record for the accession number NC_001601.1. Scan up and down the page for NC_001601.1. It has a wealth of information, including the names of the researchers who submitted the sequence, the date of submission, the sequence of the proteins coded for by the mitochondrial DNA, and the complete mitochondrial DNA sequence (at the bottom of the page).
  7. Note that the mitochondrial genome is 16,402 base pairs long and contains 13 protein-coding genes. You can find information for each gene in the "Features" section (labeled as "gene"). Right underneath the gene information, you will find information on the protein the gene encodes (labeled as "CDS"). You will use the protein sequence from one of these genes to establish the evolutionary tree.
  8. In the "Features" section of the page, look for the CDS that says /product="cytochrome c oxidase subunit I". This entry lists the protein sequence of the cytochrome c oxidase subunit 1. The protein sequence is the string of capitalized letters right after /translation=. Alternatively, you can click on the protein_id link "NP_007058.1" to see the page describing the protein. There you will also find the protein sequence at the very bottom of the page. The protein sequence is the query that you will use for your search.
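
If you save the record as text, the lookup in the last step can also be done with a short script. The sketch below assumes the usual GenBank flat-file layout, where each CDS feature carries /product and /translation qualifiers. The record text, product names, and sequences in the example are made up for illustration, and `extract_translation` is a hypothetical helper name.

```python
import re


def extract_translation(genbank_text: str, product: str) -> str:
    """Return the /translation of the first CDS whose /product matches."""
    # Feature keys such as "CDS" start after exactly five spaces.
    blocks = re.split(r"^ {5}CDS\s+", genbank_text, flags=re.MULTILINE)[1:]
    for block in blocks:
        found = re.search(r'/product="([^"]+)"', block)
        if found and found.group(1) == product:
            translation = re.search(r'/translation="([^"]+)"', block)
            if translation:
                # Long sequences wrap across lines, so strip all whitespace.
                return re.sub(r"\s+", "", translation.group(1))
    raise LookupError(f"no CDS with product {product!r} found")


# Made-up feature table in GenBank-like layout (NOT a real record).
TOY_RECORD = """\
     gene            1..60
                     /gene="GENE1"
     CDS             1..60
                     /gene="GENE1"
                     /product="hypothetical protein one"
                     /translation="MKTAYIAKQR"
     gene            61..150
                     /gene="GENE2"
     CDS             61..150
                     /gene="GENE2"
                     /product="hypothetical protein two"
                     /translation="MAGWSLKPETHD
                     NQIYVRCF"
"""

print(extract_translation(TOY_RECORD, "hypothetical protein two"))
```

With a real saved record, you would pass the file's text and the product name exactly as it appears in the "Features" section.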

Now that you have the query sequence, open the BLAST page, following the instructions below. BLAST is a program that compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches. BLAST can be used to infer functional and evolutionary relationships between sequences.

  1. Open a new window or tab in your internet browser and go to the NCBI main page, and click on the BLAST link in the "Popular Resources" list on the right.
  2. There are several versions of BLAST. Since you want to use a protein sequence to search a protein database, click on "Protein BLAST" under the "Web BLAST" heading.
  3. Fill out the protein BLAST query form.
    1. Copy and paste the cytochrome c oxidase subunit I protein sequence into the box labeled "Enter accession number, gi, or FASTA sequence". You can also paste the protein accession number, NP_007058, into the box.
    2. In the "Job title" box, type "Blue whale cox1 protein."
    3. In the "Database" box, choose "Reference proteins (refseq_protein)" from the drop-down menu. The reference proteins are non-redundant, so there is just one cox1 sequence for each species.
    4. Under "Algorithm," click on "blastp" (which is protein vs. protein).
    5. Keep all the other search parameters including the Algorithm parameters at their default settings.
    6. Next to the BLAST button at the bottom of the page, check the box "show results in a new window."
  4. Click on BLAST to start the search.
  5. Be patient. It will take a minute or two for the BLAST search to finish. The results page will resemble the one in Figure 2.

Figure 2. BLAST results for cytochrome c oxidase subunit 1 protein from the blue whale. (NCBI, 2021.)
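
The query box accepts a sequence in FASTA format: a header line starting with ">" followed by the sequence itself. As a minimal sketch, the helper below turns a copied protein sequence (which may contain spaces or line breaks) into a clean FASTA record. The identifier, description, and sequence are placeholders, and `to_fasta` is a hypothetical name, not part of any NCBI tool.

```python
def to_fasta(identifier: str, description: str, sequence: str, width: int = 60) -> str:
    """Format a sequence as a FASTA record with fixed-width sequence lines."""
    # Remove spaces and line breaks picked up while copying, and use capitals.
    cleaned = "".join(sequence.split()).upper()
    header = f">{identifier} {description}".rstrip()
    lines = [cleaned[i:i + width] for i in range(0, len(cleaned), width)]
    return "\n".join([header, *lines]) + "\n"


# Placeholder sequence (NOT a real protein).
toy_sequence = "acdefghikl " * 13
record = to_fasta("query1", "toy protein", toy_sequence)
print(record)
```
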
  1. On the top left of the BLAST results page you will find the summary section (blue in Figure 2), which provides information on different aspects of your search. On the top right there is a box that allows you to filter your results based on certain criteria (red in Figure 2). Below the top section, the BLAST results are shown (yellow in Figure 2). There are four different tabs called "Description," "Graphic Summary," "Alignments," and "Taxonomy." Each tab presents the search results in a different way.
    1. The "Description" tab contains a summary table of hits found by BLAST and is the default tab shown.
    2. The "Graphic Summary" tab shows a color key of the alignments. The color key shows the degree of similarity for the sequences.
    3. The "Alignments" tab contains the detailed pairwise alignments between query and database sequences.
    4. The "Taxonomy" tab provides details of the taxonomic distribution of matches BLAST found.
  2. Click on the "Alignments" tab and look at the result page, which should look similar to the one in Figure 3. The first result will be the query sequence against itself, with 100% identity.
    1. Note the "% identity" listed for each alignment. If the "% identity" between two species is 97%, then these two species differ by 3% in the protein sequence. Remember, the larger the % difference, the more distant they are in the family tree.
  3. Pick 30 BLAST results (10 from the top, 10 from the middle, and 10 from the bottom of the results list) and make a data table in your lab notebook. For each result, list the scientific classification of the species (order, sub-order, family, sub-family, genus, species), its common name (you will need to look this up), the % identity, and the actual number of protein sequence changes (for example, "Identities = 493/512" means there are 19 changes between the sequences).
    1. In the "Description" tab, you can uncheck all the results by clicking on the "select all" box in the top left corner of the BLAST results table. Then you can check the individual sequences you want to select for your analysis and look at their alignments in the "Alignments" tab or get more information on the taxonomy of the individual species in the "Taxonomy" tab. Go through each of the results tabs to find the information you need.
    2. What do you notice about your data as you go further down the BLAST result page?
Figure 3. A detailed view of two aligned sequences as shown in the "Alignment" tab of the BLAST output page. (NCBI, 2021.)
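
The arithmetic behind the data table can be sketched in a few lines of Python. The example below takes an "Identities = x/y" line like the one quoted above and derives the number of changes and the % identity. The function name and the dictionary keys are illustrative choices.

```python
import re


def parse_identities(line: str) -> dict:
    """Extract counts from a BLAST 'Identities = x/y' line."""
    match = re.search(r"Identities\s*=\s*(\d+)/(\d+)", line)
    if match is None:
        raise ValueError("no 'Identities = x/y' field found")
    identical, aligned = int(match.group(1)), int(match.group(2))
    return {
        "identical": identical,
        "aligned": aligned,
        "changes": aligned - identical,  # positions that differ
        "percent_identity": 100 * identical / aligned,
    }


summary = parse_identities("Identities = 493/512")
print(summary["changes"])                     # 19
print(round(summary["percent_identity"], 1))  # 96.3
```

Recording both numbers is useful because the % identity alone hides how long the aligned region was.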

Your BLAST results provide the data needed to determine which species are most related to the blue whale. The next step is to visualize the data as a tree.

  1. To generate a tree, click on the "Distance tree of results" link next to "Other reports" above the results table in the summary section of your BLAST report. This includes all of the hits from the BLAST search in the tree. To simplify the tree, you can also select a subset of sequences in the "Description" tab and then click on the link "Distance tree of results" within the results table. This tree will only include the hits from your selected sequences. The tree should look similar to the one in Figure 4.
    1. Use the drop-down list under "Sequence label" and select "taxonomic name" to simplify the names on the tree.
    2. Keep the other tree parameters at their default settings.
  2. The program has clustered some of the species (leaves) together, which is indicated by a green triangle. Hover over the tip of each triangle with your cursor and expand each cluster of leaves to identify the individual members. You might need to zoom into the tree using the slider in the top toolbar of the tree to be able to read the taxonomic name of each leaf.

Figure 4. Tree showing relationship of cox1 proteins from species related to the blue whale. (NCBI, 2021.)
  1. Create a simplified distance tree with just the 30 sequences you listed in your data table. In the "Description" tab, select your 30 sequences and then click on "Distance tree of results" within the BLAST results table. Your tree will only include the 30 selected sequences.
    1. Compare the tree with all the BLAST results and your simplified tree. How are they similar or different? Which branches of the tree are missing? Which species are within the missing branches?
  2. Make a figure showing your simplified tree, with the data in your data table (% differences of the sequences) added.
    1. You might want to re-draw the tree, using a computer graphics program or by hand, to enlarge the text and to add pictures of the animals.
  3. Looking at the tree that includes the full set of your BLAST results, what are the blue whale's nearest "DNA cousins"? How does your result compare to traditional trees made prior to DNA analysis? What are the blue whale's nearest relatives that are not whales or dolphins? If you need help reading the phylogenetic tree, you can view the Phylogenetics and Reading Phylogenetic Trees or the Phylogenetic analysis of pathogens video.
  4. Based on your data, you are in a position to estimate the time of divergence of the blue whale from the other species listed in your BLAST table. If you assume that the blue whale (Balaenoptera musculus) split from the fin whale (Balaenoptera physalus) 10.3 million years ago (see Whale Origins, also referenced in the Bibliography), how many mutations occurred in the cox1 protein per million years? Use this data to determine the time of divergence for the other species in the BLAST table, assuming the mutation rate is constant.
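
The divergence-time estimate in the last step is a simple proportion, sketched below in Python. The 10.3 million year calibration comes from the step above; the change counts are hypothetical placeholders that you would replace with the numbers from your own data table. The calculation assumes, as the step does, that the mutation rate is constant.

```python
def changes_per_million_years(reference_changes: int, reference_time_my: float) -> float:
    """Rate of protein sequence changes per million years from a calibration pair."""
    if reference_time_my <= 0:
        raise ValueError("calibration time must be positive")
    return reference_changes / reference_time_my


def divergence_time(changes: int, rate: float) -> float:
    """Estimated time since divergence (million years), assuming a constant rate."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return changes / rate


# Placeholder counts (NOT real results): substitute values from your data table.
fin_whale_changes = 5      # hypothetical changes between blue whale and fin whale
other_species_changes = 20  # hypothetical changes between blue whale and another species

rate = changes_per_million_years(fin_whale_changes, 10.3)
estimate = divergence_time(other_species_changes, rate)
print(f"rate: {rate:.3f} changes per million years")
print(f"estimated divergence: {estimate:.1f} million years ago")
```

With these placeholder counts, a species with four times as many changes as the fin whale comes out four times as old, which is exactly what a constant rate implies.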



  • Repeat the procedure with a different protein sequence from the blue whale mitochondrial genome.
    • Do you get the same tree? Is the apparent mutation rate the same for different proteins?
  • Repeat the procedure using the DNA sequence for the genes. This allows you to use genetic changes that occurred in the DNA, but that did not affect the protein sequence.
  • Increase the number of BLAST hits to include species that are less related (increase "Max target sequences" under "Algorithm parameters").
  • What is the % difference between blue whale and Homo sapiens' cox1 protein sequences?
  • Living cetaceans are subdivided into two highly distinct suborders, Odontoceti (the echolocating toothed whales) and Mysticeti (the filter-feeding baleen whales). Are there specific mutations that distinguish the Odontoceti from the Mysticeti in your BLASTs?
  • Download the protein sequences of the BLAST hits using the "Get selected sequences" button, then use tools at the European Bioinformatics Institute (EBI), such as ClustalW, to make an evolutionary tree. ClustalW compares every sequence to all of the others.
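
For the variation that asks about mutations distinguishing Odontoceti from Mysticeti, one possible approach is to look for alignment columns where every sequence in one group shares one amino acid and every sequence in the other group shares a different one. The sketch below assumes the sequences are already aligned to equal length; the sequences are made-up toy fragments, and `diagnostic_sites` is a hypothetical helper name.

```python
def diagnostic_sites(group_1: list, group_2: list) -> list:
    """Find columns where each group is uniform but the two groups differ.

    Returns (position, residue_in_group_1, residue_in_group_2) tuples,
    with positions counted from 1.
    """
    length = len(group_1[0])
    if any(len(seq) != length for seq in group_1 + group_2):
        raise ValueError("all sequences must be aligned to the same length")

    sites = []
    for i in range(length):
        residues_1 = {seq[i] for seq in group_1}
        residues_2 = {seq[i] for seq in group_2}
        if len(residues_1) == 1 and len(residues_2) == 1 and residues_1 != residues_2:
            sites.append((i + 1, residues_1.pop(), residues_2.pop()))
    return sites


# Toy aligned fragments (NOT real data); each list stands for one suborder.
toy_group_1 = ["MFINRW", "MFINRW", "MFVNRW"]
toy_group_2 = ["MLINKW", "MLINKW"]

sites = diagnostic_sites(toy_group_1, toy_group_2)
print(sites)  # [(2, 'F', 'L'), (5, 'R', 'K')]
```

Column 3 is not reported in the toy example because the first group is not uniform there, so that position cannot cleanly separate the two groups.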




Cite This Page

General citation information is provided here. Be sure to check the formatting, including capitalization, for the method you are using and update your citation, as needed.

MLA Style

Science Buddies Staff. "Use DNA Sequencing to Trace the Blue Whale's Evolutionary Tree." Science Buddies, 30 June 2023, https://www.sciencebuddies.org/science-fair-projects/project-ideas/Genom_p017/genetics-genomics/dna-sequencing-blue-whale-evolutionary-tree?from=Blog. Accessed 26 Sep. 2023.

APA Style

Science Buddies Staff. (2023, June 30). Use DNA Sequencing to Trace the Blue Whale's Evolutionary Tree. Retrieved from https://www.sciencebuddies.org/science-fair-projects/project-ideas/Genom_p017/genetics-genomics/dna-sequencing-blue-whale-evolutionary-tree?from=Blog

Last edit date: 2023-06-30