Use DNA Sequencing to Trace the Blue Whale's Evolutionary Tree
|Areas of Science||Genetics & Genomics|
|Time Required||Average (6-10 days)|
|Prerequisites||You should be familiar with high school level genetics (DNA, protein, mutations).|
|Material Availability||Readily available.|
|Cost||Very Low (under $20)|
Abstract
The first land animals took their tentative steps out of the ocean and onto solid ground around 365 million years ago. Over millions of years, these early ancestors developed into tetrapods, including amphibians, reptiles, dinosaurs, birds, and mammals. Then, around 50 million years ago, the reverse process occurred: the mammalian ancestor of today's whales returned to the ocean. In this genomics science fair project, you will use mitochondrial protein sequences to trace the evolution of whales and identify their closest living relatives that still have four legs.
In this genomics science fair project, you will trace the blue whale's family tree using genomic sequences in the GenBank database and the BLAST search tool.
David Whyte, PhD, Science Buddies
Last edit date: 2020-11-20
The evolutionary story of the whales is dramatic. An ancient land mammal returned to the sea, about 50 million years ago, to become the forerunner of today's whales. In doing so, it lost its legs, and all of its internal organs became adapted to a marine existence. This ocean invasion by mammals is the reverse of what happened millions of years previously, when the first animals crawled out of the sea onto land.
Figure 1. The blue whale (Balaenoptera musculus) is a marine mammal belonging to the suborder of baleen whales (called Mysticeti). At up to 110 feet in length, it is believed to be the largest animal ever to have existed. (Wikipedia, 2008.)
What did the first whales look like, and what gave rise to them? For a long time, scientists could only speculate, for the oldest fossils anyone knew of had already assumed the basic appearance of whales. Charles Darwin speculated in the first edition of The Origin of Species that bears might be the precursors of whales: "In North America the black bear was seen ... swimming for hours with widely open mouth, thus catching, like a whale, insects in the water. Even in so extreme a case as this, if the supply of insects were constant, and if better adapted competitors did not already exist in the country, I can see no difficulty in a race of bears being rendered, by natural selection, more and more aquatic in their structure and habits, with larger and larger mouths, till a creature was produced as monstrous as a whale."
Today we have much better information about the evolution of whales than was available to Darwin. The new information comes primarily from two sources. The first source is an abundance of intermediate fossils that have been uncovered over the past two decades that allow paleontologists to trace the development of modern whales, step by step, back to their beginnings early in the Eocene epoch. This time period is often referred to as the dawn of the age of mammals, and lasted from about 55 million to 34 million years ago. The second source of new insights into whale evolution comes from analysis of DNA samples.
DNA acquires mutations over time. The longer two species have been separated, the more mutations accumulate independently in each species' genome. A phylogenetic tree can be built from the accumulated changes between genes in different organisms: fewer genetic changes imply a relatively recent divergence. Based on pair-wise comparisons of a set of genes from different organisms, it is possible to construct a tree that reflects the genetic distances between the organisms. In the past, this sort of tree-building was done solely by comparing the bones, anatomy, behavior, etc., of many animals. With genomics tools, it can be done online from your computer, and it can include animals that don't even have bones, provided sequence data is available.
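The pair-wise comparison idea can be illustrated with a minimal Python sketch. It assumes the sequences are already aligned to the same length and simply counts the fraction of positions that differ. The three fragments and the names `species_A`, `species_B`, and `species_C` are made up for illustration and are not real cox1 data.

```python
from itertools import combinations


def p_distance(seq_a: str, seq_b: str) -> float:
    """Fraction of aligned positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    mismatches = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return mismatches / len(seq_a)


def pairwise_distances(alignment: dict) -> dict:
    """Compare every sequence with every other one and return the distances."""
    return {
        (x, y): p_distance(alignment[x], alignment[y])
        for x, y in combinations(sorted(alignment), 2)
    }


# Hypothetical aligned fragments, invented for this example (not real data).
toy_alignment = {
    "species_A": "MFINRWLFST",
    "species_B": "MFINRWLFSA",
    "species_C": "MFMNRWLYSA",
}

distances = pairwise_distances(toy_alignment)
# The pair with the smallest distance is the most recently diverged pair
# under the simple assumption that changes accumulate steadily.
closest_pair = min(distances, key=distances.get)
print(distances)
print("Closest pair:", closest_pair)
```

In this toy data, the pair that differs at the fewest positions is treated as the most closely related, which is the same reasoning the tree-building tools apply to real sequences.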
The focus of this genomics science fair project is to explore the evolutionary tree of the whales using DNA sequences available in GenBank, a free database of sequences derived from hundreds of different organisms. Since new sequences are always being added, the tree you make will reflect the newest genomic information available.
Although this is a genomics science fair project, it is also about whales. To build the tree of the whale family, you will need to spend some time getting familiar with the scientific names and key features of about 25 whale and dolphin species. You will also learn about the blue whale's closest relatives that did not make the transition to a marine habitat.
The process you will use to generate an evolutionary tree for the whales is outlined as follows:
Obtain the protein sequence for a whale mitochondrial protein.
- A mitochondrial sequence is very useful for studying evolution because it does not recombine and it accumulates mutations at a faster rate than chromosomal DNA.
For the purposes of this science fair project, you will use the cytochrome c oxidase 1 protein (cox1) from the blue whale. This will be the query sequence.
- Other genes or other whales can be used as variations on this science fair project.
- Use the bioinformatics search tool, BLAST, to identify sequences from other organisms that are related to the query sequence. BLAST will find genes that are related to the query. The more related the genes are to each other, the closer the organisms they are derived from are to each other on the evolutionary tree.
- Use a simple tree-building tool to generate an evolutionary tree based on the BLAST output. There are more sophisticated tools available, but the BLAST tool will suffice for this science fair project.
The key issues the tree will address are:
- What are the nearest relatives of the blue whale, determined by DNA analysis?
- What is the identity of the closest blue whale relative that still has four legs? (It is not a bear!)
The tools involved in this science fair project are simple to use and very powerful. The evolutionary trees that you make from your BLAST analysis will be based on molecular data for hundreds of species derived from samples collected from all over the globe. In the short time it takes to build a BLAST tree, you can see evidence for relationships between animals that would be difficult or impossible to obtain by traditional methods that are based on fossils.
Terms and Concepts
To start this genomics science fair project, learn about whale evolution using the Internet and your local library. Also, go to the National Center for Biotechnology Information (NCBI) website and explore the databases and tools available. You will be using BLAST to search the reference protein database.
- Eocene epoch
- Phylogenetic tree
- Pair-wise comparisons
- Evolutionary tree
- Cytochrome c oxidase 1 protein (cox1)
- Query sequence
- Accession number
- Base pair
- Mitochondrial DNA
- Mutation rate
Questions
- What are the key characteristics of mammals?
- What adaptations occurred in the whale's anatomy as it evolved from a land-based to a marine mammal?
- Make a timeline of mammalian evolution, focusing on cetaceans.
- Why use mitochondrial genomes for evolutionary analysis?
- What does the acronym BLAST stand for?
Bibliography
These websites discuss the current data and hypotheses about whale evolution:
- The Thewissen Lab. (2008). Whale Origins Research. Retrieved November 5, 2008.
- KQED. (2001). Whale evolution. Retrieved August 18, 2008.
- MarineBio.org. (2008). Blue Whale. Retrieved August 18, 2008.
- Wikipedia Contributors. (2008). Blue Whale. Retrieved August 18, 2008.
- National Geographic Society. (2001). Evolution of Whales. Retrieved August 18, 2008.
These websites are useful resources for understanding how DNA can be used to build evolutionary trees and the bioinformatics tools used to do this:
- National Center for Biotechnology Information. (2008). Basic Local Alignment Search Tool. Retrieved August 18, 2008.
- National Center for Biotechnology Information. (2008). Welcome to NCBI. Retrieved August 18, 2008.
- National Center for Biotechnology Information. (2004, April 1). Just the Facts: A Basic Introduction to the Science Underlying NCBI Resources. Retrieved November 6, 2008.
- European Molecular Biology Laboratory's European Bioinformatics Institute. (2008). ClustalW2. Retrieved November 13, 2008.
Materials and Equipment
- Computer with high-speed Internet connection
- Lab notebook
To begin, obtain the query sequence, the cytochrome c oxidase subunit 1 protein, from the blue whale.
- Open the NCBI home page at https://www.ncbi.nlm.nih.gov/.
- Select "Genome" in the drop-down menu for "Search."
- Type in "blue whale" in the "Search for" box.
- The resulting page will list documents in the Genomes database that contain "blue whale."
- Click on the link for NC_001601. NC_001601 is the accession number for the GenBank page for blue whale mitochondrial DNA.
- Look at the page for NC_001601. Note that the mitochondrial genome is 16,402 base pairs long and contains 13 protein-coding genes. You will use the protein sequence from one of these genes to establish the evolutionary tree.
- Click on the "13" after "Protein coding" in the second row, second column to retrieve records for the blue whale mitochondrial proteins.
- The resulting page lists the 13 proteins. Look for "cytochrome c oxidase subunit 1 (cox1)."
Note the gi number, 5834998, for the protein sequence of cytochrome c oxidase subunit 1 in your lab notebook. Click on the link for cox1 under "Product Name" to see the page describing this protein. This is the query that you will use.
- The gi number and the accession number NP_007058 can be used interchangeably to identify the protein sequence.
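As an alternative to clicking through the website, the same record can in principle be requested by accession number through NCBI's E-utilities `efetch` endpoint. The sketch below only builds the request URL and parses FASTA-formatted text. It does not contact NCBI, and the FASTA record in the example is a made-up placeholder, not the real cox1 sequence.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities efetch service.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accession: str) -> str:
    """Build a URL that asks efetch for a protein record in FASTA format."""
    params = {
        "db": "protein",
        "id": accession,
        "rettype": "fasta",
        "retmode": "text",
    }
    return f"{EFETCH_URL}?{urlencode(params)}"


def parse_fasta(text: str) -> tuple:
    """Return (header, sequence) for the first record in FASTA-formatted text."""
    header = None
    residues = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                break  # stop at the start of a second record
            header = line[1:]
        elif header is not None:
            residues.append(line)
    if header is None:
        raise ValueError("no FASTA header line found")
    return header, "".join(residues)


url = build_efetch_url("NP_007058")
print(url)

# Placeholder record invented for this example (not the real cox1 sequence).
example_text = ">NP_000000.0 hypothetical example record\nMFINRW\nLFSTNH\n"
header, sequence = parse_fasta(example_text)
print(header, len(sequence))
```

Pasting the printed URL into a browser should return the sequence as plain text, which can then be pasted into the BLAST query box.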
Now that you have the query sequence, open the BLAST page, following the instructions below. BLAST is a program that compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches. BLAST can be used to infer functional and evolutionary relationships between sequences.
On the main NCBI page, click on the link for BLAST.
- The main NCBI page is at https://www.ncbi.nlm.nih.gov/.
- There are several versions of BLAST. Since you want to use a protein sequence to search a protein database, click on "protein BLAST" under "Basic BLAST."
Enter the gi number for cox1 from the blue whale, 5834998, in the box that reads "Enter accession number, gi, or FASTA sequence."
- You can also paste the accession number, NP_007058, or paste the actual protein sequence into the box.
- In the "Job title" box, type "Blue whale cox1 protein."
- In the "Database" box, choose "Reference proteins." The reference proteins are non-redundant, so there is just one cox1 sequence for each species.
- Under "Algorithm," click on "blastp" (which is protein vs. protein).
Click the "Algorithm parameters" link under the blue BLAST button and select "50" for the "Max target sequences."
- This is the number of "hits" to list from the database.
- Use the default matrix BLOSUM62. See http://www.ebi.ac.uk/help/matrix.html for more about matrix choices.
- At the bottom of the page, click "show results in a new window."
- Click on BLAST to start the search. Be patient. It will take a minute or two for the BLAST search to finish.
- The BLAST results page has a color-coded alignment key. The red lines indicate a high level of similarity.
The BLAST results on the NCBI website show stacks of horizontal red bars, indicating a very high alignment score for many related protein sequences. A high alignment score means that the matching protein from another species is very similar to the query sequence.
Figure 2. Alignment key from BLAST search. (NCBI, 2008.)
- The BLAST output also has alignments of the protein sequences, such as the one below:
The BLAST alignment program on the NCBI website returns protein sequences that closely match the query sequence. Each alignment includes a score, the method of matching, the number of identities, the number of positives, and any gaps. Some amino acids in a protein sequence can be replaced by similar amino acids; these substitutions are marked by a plus sign in the alignments that are returned.
Figure 3. Protein alignment from BLAST search. (NCBI, 2008.)
- Look at the BLAST page. The first hit will be the query sequence against itself, with 100% identity.
- Note the "% identity" listed for each alignment. If the "% identity" between two species is 97%, then these two species differ by 3% in the protein sequence. Remember, the larger the % difference, the more distant they are in the family tree.
- The output has a nice feature that groups sets of proteins that have identical results. This simplifies the analysis.
- Make a data table listing the scientific classification of each species (order, sub-order, family, sub-family, genus, species), its common name (you will need to look this up), the % identity, and the actual number of protein sequence changes (for example, "Identities = 493/512" means there are 19 changes between the sequences) in your lab notebook.
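Filling in the data table involves the same small calculation for every hit. The following Python sketch shows one way to do it, assuming each alignment reports a line containing text of the form "Identities = 493/512". The function name `summarize_identities` is hypothetical and chosen only for this example.

```python
import re

# Matches text such as "Identities = 493/512" in a BLAST alignment summary.
IDENTITIES_PATTERN = re.compile(r"Identities\s*=\s*(\d+)\s*/\s*(\d+)")


def summarize_identities(line: str) -> dict:
    """Extract identity counts and derive the number of changes and % identity."""
    match = IDENTITIES_PATTERN.search(line)
    if match is None:
        raise ValueError("no 'Identities = x/y' text found")
    identical = int(match.group(1))
    aligned = int(match.group(2))
    return {
        "identical": identical,
        "aligned": aligned,
        "changes": aligned - identical,
        "percent_identity": 100 * identical / aligned,
    }


summary = summarize_identities("Identities = 493/512")
print(summary["changes"], "changes,",
      round(summary["percent_identity"], 1), "% identity")
```

For the example in the text, 493 identical positions out of 512 gives 19 changes and about 96.3% identity.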
Your BLAST results provide the data needed to determine which species are most related to the blue whale. The next step is to visualize the data as a tree.
To generate a tree, click on "Select all" (a button that appears after the "Sequence Producing Significant Alignments" list) and then "Distance tree of results." This includes all of the hits from the BLAST search in the tree. To simplify the tree, you can select individual species by clicking on the box next to the alignment.
For "Sequence label," select "taxonomic name" to simplify the names on the tree.
Click on the "Force" tab above the tree. Figure 4, below, shows a tree made with the "Force" option. This selection gives a curved tree.
- Try the rectangle, slanted, and radial tabs to view the tree in different formats.
- The three main branches have been circled to draw attention to the groupings (this is not part of the BLAST output).
Figure 4. Tree showing relationship of cox1 proteins from species related to the blue whale. (NCBI, 2008.)
- The program has clustered some of the species together (leaves). Expand each cluster of leaves to identify the individual members. See Figure 5, below.
Figure 5. Expansion of the whales and dolphin cluster with eight leaves. (NCBI, 2008.)
Make a figure showing the tree, with the % differences of the sequences added.
- You might want to re-draw the trees, using a computer graphics program or by hand, to enlarge the text and to add pictures of the animals.
- Repeat this for the sub-branches.
- What are the blue whale's nearest "DNA cousins"? How does your result compare to traditional trees made prior to DNA analysis? What are the blue whale's nearest relatives that are not whales or dolphins?
- Based on your data, you are in a position to estimate the time of divergence of the blue whale from the other species listed in your BLAST table. If you assume that the blue whale split from the fin whale 10.3 million years ago (see Whale Origins, also referenced in the Bibliography), how many mutations occurred in the cox1 protein per million years? Use this data to determine the time of divergence for the other species in the BLAST table, assuming the mutation rate is constant.
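The divergence-time estimate can be sketched in a few lines of Python. The calibration of 10.3 million years for the blue whale and fin whale split comes from the text. The two change counts are hypothetical placeholders to be replaced with the numbers from your own BLAST table, and the calculation assumes, as the text does, that the mutation rate is constant.

```python
# Calibration point from the text: blue whale / fin whale split, in millions of years.
CALIBRATION_MYA = 10.3


def changes_per_million_years(calibration_changes: int,
                              calibration_mya: float = CALIBRATION_MYA) -> float:
    """Rate of protein sequence change implied by the calibration species."""
    if calibration_mya <= 0:
        raise ValueError("calibration time must be positive")
    return calibration_changes / calibration_mya


def estimate_divergence_mya(changes: int, rate: float) -> float:
    """Estimated time since divergence, assuming a constant rate of change."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return changes / rate


# Hypothetical counts for illustration only; use the values from your data table.
hypothetical_fin_whale_changes = 5
hypothetical_other_species_changes = 20

rate = changes_per_million_years(hypothetical_fin_whale_changes)
estimate = estimate_divergence_mya(hypothetical_other_species_changes, rate)
print(f"Rate: {rate:.3f} changes per million years")
print(f"Estimated divergence: {estimate:.1f} million years ago")
```

With these placeholder counts, a species with four times as many changes as the fin whale is estimated to have diverged four times as long ago.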
Variations
Repeat the procedure with a different protein sequence from the blue whale mitochondrial genome.
- Do you get the same tree? Is the apparent mutation rate the same for different proteins?
- Repeat the procedure using the DNA sequence for the genes. This allows you to use genetic changes that occurred in the DNA, but that did not affect the protein sequence.
- Increase the number of BLAST hits to include species that are less related (increase "Max target sequences" under "Algorithm parameters").
- What is the % difference between blue whale and Homo sapiens' cox1 protein sequences?
- Living cetaceans are subdivided into two highly distinct suborders, Odontoceti (the echolocating toothed whales) and Mysticeti (the filter-feeding baleen whales). Are there specific mutations that distinguish the Odontoceti from the Mysticeti in your BLASTs?
- Download the protein sequences of the BLAST hits using the "Get selected sequences" button, then use tools at the European Bioinformatics Institute (EBI), such as ClustalW, to make an evolutionary tree. ClustalW compares every sequence to all of the others.
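To see how a tree can emerge from all-against-all comparisons, here is a minimal Python sketch of UPGMA (average-linkage clustering), one of the simplest distance-based tree methods. It is an illustrative stand-in, not necessarily the algorithm used by BLAST or ClustalW. The distances and the labels `A` to `D` are hypothetical.

```python
from itertools import combinations


def upgma(distances: dict, names: list) -> str:
    """Cluster by average linkage and return a nested-parentheses tree string."""
    sizes = {name: 1 for name in names}  # cluster label -> number of leaves
    dist = {frozenset(pair): d for pair, d in distances.items()}
    while len(sizes) > 1:
        # Find the two clusters separated by the smallest distance.
        a, b = min(combinations(sorted(sizes), 2),
                   key=lambda pair: dist[frozenset(pair)])
        merged = f"({a},{b})"
        # Distance from the merged cluster to each remaining cluster is the
        # size-weighted average of the two old distances.
        for other in sizes:
            if other in (a, b):
                continue
            dist[frozenset((merged, other))] = (
                sizes[a] * dist[frozenset((a, other))]
                + sizes[b] * dist[frozenset((b, other))]
            ) / (sizes[a] + sizes[b])
        sizes[merged] = sizes.pop(a) + sizes.pop(b)
    return next(iter(sizes)) + ";"


# Hypothetical pair-wise distances (fraction of differing positions).
toy_distances = {
    ("A", "B"): 0.1, ("A", "C"): 0.3, ("B", "C"): 0.2,
    ("A", "D"): 0.6, ("B", "D"): 0.6, ("C", "D"): 0.5,
}

tree = upgma(toy_distances, ["A", "B", "C", "D"])
print(tree)
```

The output groups the two most similar sequences first and attaches the most distant one last, which is the general shape to look for when comparing your ClustalW tree with the BLAST tree.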