Neanderthals, Orangutans, Lemurs, & You—It's a Primate Family Reunion!
|Areas of Science|Genetics & Genomics|
|Time Required|Average (6-10 days)|
|Prerequisites|High school genetics|
|Material Availability|Readily available|
|Cost|Low ($20 - $50)|
Abstract
You have probably seen figures showing how human beings are related to chimpanzees, gorillas, and other primates. In this genomics science fair project, you will use bioinformatics tools to generate your own primate family tree.
The objective of this genomics science fair project is to build a family tree for the primates, including Homo sapiens, based on the similarity of mitochondrial proteins. The method involves BLASTing a mitochondrial protein against a sequence database and analyzing the percent identity between hits.
David Whyte, PhD, Science Buddies
Last edit date: 2020-06-23
"Nothing in biology makes sense except in the light of evolution." This quote is from a 1973 essay by the famous geneticist Theodosius Dobzhansky. One of the key issues in biology is how we, as a species, are related to other life forms. Based on anatomy and physiology, biologists have argued for decades that human beings are closely related to chimpanzees, with close, but somewhat more distant, relatedness to other primates, such as gorillas and orangutans.
Despite the best efforts by generations of biologists, there are still many skeptics concerning our place on the primate family tree. In this genomics science fair project, you will make up your own mind by using raw protein sequence data and simple software tools to build a tree depicting Homo sapiens and our nearest non-human relatives.
How can you use a protein sequence to build a family tree? When two populations stop interbreeding, the genes in each population begin to accumulate mutations that are specific to each population. The longer the two populations are separated, the greater the number of genetic changes between the two populations. You can directly measure these genetic changes by comparing the sequences of particular proteins. If a protein mutates at a rate of 1 percent change per million years (say five amino acids change in a 500-amino-acid protein), then that protein in two species that split from a common ancestor 1 million years ago will be roughly 2 percent different. Why 2 percent? Because each species has accumulated 1 percent genetic change, independently of the other, resulting in 2 percent difference overall. Assuming a constant rate of mutation, a tree can be built showing the relationship of different species based on the observed genetic changes.
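The arithmetic in the paragraph above can be written as a tiny model. This is a minimal sketch that assumes exactly what the paragraph assumes: a constant rate, and two lineages that change independently after the split. The function names are made up for illustration.

```python
def expected_percent_difference(rate_per_lineage: float, split_mya: float) -> float:
    """Percent difference expected between two species that split `split_mya`
    million years ago, if each lineage changes by `rate_per_lineage` percent
    per million years at a constant rate."""
    # Each lineage accumulates its own changes, so the two species differ by
    # twice the change along a single lineage.
    return 2 * rate_per_lineage * split_mya


def changed_residues(protein_length: int, percent: float) -> float:
    """Number of amino acids that differ for a given percent difference."""
    return protein_length * percent / 100


if __name__ == "__main__":
    diff = expected_percent_difference(1.0, 1.0)
    print(diff)                          # 2.0 (percent)
    print(changed_residues(500, diff))   # 10.0 amino acids in a 500-residue protein
```

This straight-line model is the simplest possible one. Over very long times the same site can change more than once, so observed differences can grow more slowly than the model predicts.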
To start this science fair project, you will go to the website for the National Center for Biotechnology Information (NCBI), a part of the U.S. National Institutes of Health (NIH). At the NCBI website, you can look up the sequence of the protein you will use as a BLAST query. BLAST is a bioinformatics tool that allows the user to search a database of proteins for sequences that are similar to the query sequence. The query sequence is the sequence used to search the database.
You will also use your BLAST results to make a rough estimate of the rate of mutation by looking at the percent difference between a human protein and the same protein in different primates. Estimating the rate at which mutations accumulate is a way of looking at evolution in action.
You will use a protein sequence from the mitochondrial genome as a query. Mitochondrial protein sequences have the advantage that they are available for many more species (including Neanderthal) than chromosomal proteins are.
The BLAST output has three regions: a color diagram showing the alignments between the query and the database hits, a list of database hits, and the actual alignments of the query sequence with the database sequences. The figures below give more information about each of these outputs.
Once you have your BLAST results, you will make a table of the results, including the scientific names and common names of the species, the percent identity between the query sequence and the database hit, and the amount of time since Homo sapiens split from the other species. Based on this table, you can calculate a rough rate of mutation.
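The BLAST page also has a tool for drawing a tree of the BLAST results. You will use this tree to make a figure showing how human beings are related to other primates. The numbers on this tree are distances that reflect how much the sequences differ along each branch; if mutations accumulate at a constant rate, larger distances correspond to older splits, which you can express in millions of years ago (mya) using your estimated rate of mutation.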
In the graphic summary at the top of the BLAST results, each database hit is drawn as a horizontal bar colored by its alignment score. For NADH5, you will see a stack of red bars, the highest-scoring color, which means that many primate proteins align closely with the human query sequence.
Figure 1. Color key for the alignments. (NCBI, 2008.)
The BLAST program returns a list of the database proteins that best match the query protein. Each entry names the species the protein comes from and gives a score that ranks how similar that protein is to the query.
Figure 2. List of BLAST hits. These are the proteins in the database that are most related to the query protein. A high "Score" value and a low "E" value both indicate a high degree of similarity. (NCBI, 2008.)
For each hit, BLAST shows the query sequence lined up with the matching database sequence. Above each alignment are the score, the method used to compute it, the number of identities, the number of positives, and the number of gaps. Some amino acids in a protein can be replaced by chemically similar amino acids; these conservative substitutions are marked with a plus sign in the alignment.
Figure 3. An example of a BLAST alignment. BLAST aligns the query sequence with the database sequence. A blank indicates a mismatch. A plus sign, "+," indicates a conservative substitution (the amino acid is replaced by an amino acid with similar chemical characteristics). (NCBI, 2008.)
Terms, Concepts, and Questions
- Amino acid
- Mitochondrial genome
- Mitochondrial protein sequence
- Chromosomal protein
- Conservative substitution
- Accession number
- What is the definition of species?
- What are the key concepts in Darwin's theory of evolution?
- What causes mutations?
- Look up the one-letter code for protein sequences. In this code, each of the 20 common amino acids is identified by a unique letter. This is the code used in the BLAST output. What letters are not used in the code?
Bibliography
- Dobzhansky, T. (1973, March). Nothing in Biology Makes Sense Except in the Light of Evolution. The American Biology Teacher, 35, 125-129. Used by permission of National Association of Biology Teachers: http://www.nabt.org. Retrieved September 25, 2008, from http://www.pbs.org/wgbh/evolution/library/10/2/l_102_01.html
- The Tree of Life Web Project. (1999). Primates. Retrieved September 25, 2008.
- National Center for Biotechnology Information. (2008). BLAST Guide and Tutorial. Retrieved October 1, 2008.
- Dawkins, R. (2004). The Ancestor's Tale. New York: Houghton Mifflin Company.
- PBS. (2001). Evolving Ideas: Did Humans Evolve? Retrieved September 25, 2008, from http://www.pbs.org/wgbh/evolution/library/11/2/e_s_5.html
- European Molecular Biology Laboratory's European Bioinformatics Institute. (2008). ClustalW2. Retrieved September 25, 2008, from http://www.ebi.ac.uk/Tools/clustalw2/index.html
Materials and Equipment
- Computer with Internet access
Experimental Procedure
- Before you start this science fair project, you should be familiar with the terms and concepts listed in the "Terms, Concepts, and Questions" area, above.
Direct your browser to the NCBI website, at https://www.ncbi.nlm.nih.gov/.
- The NCBI website has links to all sorts of information, including a database of scientific articles related to medicine and biology (PubMed), and databases of protein and nucleic acid sequences.
- In the "search" box, select "Protein," near the top of the drop-down box.
In the "for" box, type or copy and paste NP_536853. This is the accession number for the human mitochondrial protein NADH dehydrogenase subunit 5 (NADH5). An accession number is a unique identifier for a particular sequence.
- Other proteins can be used as well.
Click on the link for NP_536853.
- The page for NP_536853 contains information about the human NADH5 protein, including the protein sequence.
- Return to the NCBI homepage at https://www.ncbi.nlm.nih.gov/.
- Click on the "BLAST" link, near the top of the page.
- Click on the "protein blast" link, under "Basic BLAST."
- In the box "Enter query sequence/ accession number" type or copy and paste NP_536853.
In the box "Choose search set:"
- For "Database," select "Reference Proteins."
- For "Organisms," type "primates (taxid:9443)." This restricts the database search to primate sequences, which simplifies the output.
- For "Entrez query," type or copy and paste "595:605 [slen]." This restricts the output to sequences 595 to 605 amino acids long, that is, full-length NADH5 proteins that can be compared with each other.
For "Algorithm," select "blastp."
- This is for a protein query vs. a protein database.
- Click on "Show results in a new window."
- Now click on "BLAST."
When the BLAST results page appears, look at the alignments of the proteins.
- Each alignment compares the human NADH5 protein against the NADH5 protein in another primate.
- How many full-length primate NADH5 proteins are in the database? This is the number of hits you got back in the BLAST search.
Make a data table of the BLAST results. Include the following: scientific names for the animals, common names, and BLAST percent identity.
- In the "Alignment" section, percent identity is given above each alignment; for example, "Identities = 563/564 (99%)."
- The first number is the count of amino acids that are identical in the two proteins; the second is the length of the alignment.
- Add a column for BLAST "% difference." Calculate this as 100% minus the percent identity (for example, 99% identity = 1% difference).
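If you copy the "Identities" lines into a text file, a few lines of Python can fill in the identity and difference columns. This is a minimal sketch; the regular expression assumes the "Identities = 563/564 (99%)" text format quoted above (it also accepts a colon), and the function names are hypothetical.

```python
import re

# Matches text such as "Identities = 563/564 (99%)" above each BLAST alignment.
IDENTITIES = re.compile(r"Identities\s*[=:]\s*(\d+)\s*/\s*(\d+)")


def parse_identities(line: str) -> tuple[int, int]:
    """Return (identical positions, alignment length) from an alignment header."""
    match = IDENTITIES.search(line)
    if match is None:
        raise ValueError(f"no 'Identities' field in: {line!r}")
    return int(match.group(1)), int(match.group(2))


def percent_identity(identical: int, aligned: int) -> float:
    return 100 * identical / aligned


def percent_difference(identity: float) -> float:
    """% difference = 100% - % identity."""
    return 100 - identity


if __name__ == "__main__":
    identical, aligned = parse_identities("Identities = 563/564 (99%)")
    identity = percent_identity(identical, aligned)
    print(f"identity {identity:.2f}%, difference {percent_difference(identity):.2f}%")
    # identity 99.82%, difference 0.18%
```

Working from the counts rather than the rounded percentage matters for very close relatives such as Neanderthal, where BLAST's whole-number "99%" hides most of the difference.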
Add a column giving the time, in millions of years ago (mya), at which each lineage split from the human lineage. Use these numbers from Richard Dawkins' book, The Ancestor's Tale:
- Neanderthal: 0.4 mya
- Chimpanzee: 6 mya
- Gorillas: 7 mya
- Orangutans: 14 mya
- Gibbons: 18 mya
- Colobus monkeys, macaques, etc.: 25 mya
- Lemurs, tarsiers, etc.: 63 mya
Calculate the rate of mutation for NADH5 in the primate line.
- Divide the percent difference by the time since the lineages split, in mya. This gives the percent difference per million years.
- Calculate the average mutation rate.
- Based on your results using the human NADH5 protein, list the primates from most to least closely related to humans.
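The rate and ranking steps above can be sketched in Python as follows. The split times come from the Dawkins list above; the percent differences in the example are placeholders invented for illustration, not real BLAST results, so replace them with your own table. Because both lineages change after a split (see the background section), the rate computed this way covers the two lineages together; halving it gives a rough per-lineage rate.

```python
# Split times in millions of years ago (mya), from the list above
# (Dawkins, The Ancestor's Tale). Keys must match the names in your table.
SPLIT_MYA = {
    "Neanderthal": 0.4,
    "Chimpanzee": 6,
    "Gorilla": 7,
    "Orangutan": 14,
    "Gibbon": 18,
    "Old World monkey": 25,  # colobus monkeys, macaques, etc.
    "Lemur": 63,             # lemurs, tarsiers, etc.
}


def pairwise_rate(percent_difference: float, split_mya: float) -> float:
    """% difference per million years since the split (both lineages together)."""
    return percent_difference / split_mya


def mutation_rates(percent_differences: dict[str, float]) -> dict[str, float]:
    """Rate for every species in your table (raises KeyError for unknown names)."""
    return {
        name: pairwise_rate(diff, SPLIT_MYA[name])
        for name, diff in percent_differences.items()
    }


def average(values) -> float:
    values = list(values)
    return sum(values) / len(values)


def rank_by_relatedness(percent_differences: dict[str, float]) -> list[str]:
    """Species ordered from most to least related (smallest difference first)."""
    return sorted(percent_differences, key=percent_differences.get)


if __name__ == "__main__":
    # PLACEHOLDER values for illustration only, not real BLAST results.
    my_table = {"Orangutan": 20.0, "Chimpanzee": 10.0, "Gorilla": 12.0}
    rates = mutation_rates(my_table)
    print(rates)
    print("average rate:", average(rates.values()))
    print("per-lineage rate:", average(rates.values()) / 2)
    print(rank_by_relatedness(my_table))  # ['Chimpanzee', 'Gorilla', 'Orangutan']
```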
- Next, generate a tree that shows how the different species are related to each other.
Click on the button "Distance tree of results."
- This will launch the "Distance tree widget."
- Click on the "slanted" tab to make a tree. Explore the other options for making trees.
To copy the tree for your notes, take a screenshot and paste it into a graphics program such as Adobe Photoshop or Microsoft PowerPoint.
- For a compelling figure, you might redraw the tree and add pictures, the times that particular lineages split, common animal names, etc.
To simplify the tree, go back to the page with the alignments and select which organisms to include.
- First, click on "Deselect all."
- Then click on just the organisms that you want to focus on. For example, you could choose one species to represent larger groups, such as Homo sapiens, Homo sapiens neanderthalensis, Pan troglodytes (chimpanzee), Gorilla gorilla gorilla, Hylobates lar (gibbon), Pongo (orangutan), etc.
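The "Distance tree of results" widget builds the tree for you. To get a feel for how a distance-based tree can be built, here is a minimal sketch of UPGMA, a simple clustering method that, like the background section, assumes a constant rate of mutation. It is not necessarily the method NCBI's widget uses. The input is a percent difference for every pair of species (for example, from the ClustalW alignment suggested further down the page); the species A to D and their distances in the example are toy values, not real data.

```python
from itertools import combinations


def upgma(names, distance):
    """Build a rooted tree with UPGMA and return it in Newick format.

    names: list of species labels.
    distance: function (name_a, name_b) -> distance, e.g. % difference.
    """
    # Active clusters: id -> (Newick text, number of species, node height)
    clusters = {i: (name, 1, 0.0) for i, name in enumerate(names)}
    # Distances between clusters, keyed by (smaller id, larger id)
    dist = {
        (i, j): distance(names[i], names[j])
        for i, j in combinations(range(len(names)), 2)
    }
    next_id = len(names)
    while len(clusters) > 1:
        # Merge the two closest clusters.
        i, j = min(dist, key=dist.get)
        d_ij = dist.pop((i, j))
        text_i, size_i, height_i = clusters.pop(i)
        text_j, size_j, height_j = clusters.pop(j)
        # Constant-rate assumption: the split sits halfway along the distance.
        height = d_ij / 2
        text = f"({text_i}:{height - height_i:g},{text_j}:{height - height_j:g})"
        # Distance from the merged cluster to each other cluster is the
        # size-weighted average of the two old distances.
        for k in clusters:
            d_ik = dist.pop((min(i, k), max(i, k)))
            d_jk = dist.pop((min(j, k), max(j, k)))
            dist[(k, next_id)] = (d_ik * size_i + d_jk * size_j) / (size_i + size_j)
        clusters[next_id] = (text, size_i + size_j, height)
        next_id += 1
    text, _, _ = next(iter(clusters.values()))
    return text + ";"


if __name__ == "__main__":
    # Toy distances for four hypothetical species A-D (not real data).
    toy = {
        frozenset("AB"): 2, frozenset("AC"): 4, frozenset("BC"): 4,
        frozenset("AD"): 8, frozenset("BD"): 8, frozenset("CD"): 8,
    }
    print(upgma(["A", "B", "C", "D"], lambda a, b: toy[frozenset((a, b))]))
    # (D:4,(C:2,(A:1,B:1):1):2);
```

The branch lengths in the Newick output are half the pairwise distances, so with a constant rate they are proportional to time since each split.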
Variations
- Click on the button "Taxonomy report" on the BLAST output page. Use this information in your report.
- Make a phylogenetic tree based on pair-wise comparisons. Click on the sequences to download, then click on "Get selected sequences." Go to the ClustalW2 page at EBI, and follow the instructions for aligning the sequences and generating a ClustalW-based tree.
- Analyze the mutation rate for different parts of the NADH5 protein. Are certain regions more prone to changes? Why? (Hint: Amino acids that are critical to the protein's structure or function will not vary as much as less-vital amino acids).
- Select a different protein for evolutionary genomics analysis. Go to the NCBI homepage, select "Genome" to search, then type in NC_001807. This is the page for the Homo sapiens mitochondrial genome. Open the NC_001807 page, and click on "protein coding: 13." These 13 proteins are coded for by the mitochondrial DNA. Pick one and repeat your analysis as above. Do you get similar results?
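The pairwise-tree and region-by-region variations above both come down to counting identical positions in aligned sequences. This is a minimal sketch that assumes you have already aligned the sequences (for example, with ClustalW2) so that they are equal-length strings with "-" for gaps; the function names and the default window size and step are arbitrary choices for illustration.

```python
from itertools import combinations


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of aligned columns where both residues are identical.

    Columns with a gap ('-') in either sequence are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment (equal length)")
    columns = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not columns:
        return float("nan")  # nothing to compare in this stretch
    identical = sum(a == b for a, b in columns)
    return 100 * identical / len(columns)


def pairwise_differences(aligned: dict[str, str]) -> dict[tuple[str, str], float]:
    """% difference for every pair of aligned sequences (input for a tree)."""
    return {
        (x, y): 100 - percent_identity(aligned[x], aligned[y])
        for x, y in combinations(aligned, 2)
    }


def window_identity(seq_a: str, seq_b: str, window: int = 50, step: int = 10):
    """(start column, % identity) for successive windows along the alignment.

    Start columns are 1-based alignment positions; they match residue numbers
    only if seq_a has no gaps before that column.
    """
    return [
        (start + 1, percent_identity(seq_a[start:start + window], seq_b[start:start + window]))
        for start in range(0, len(seq_a) - window + 1, step)
    ]


if __name__ == "__main__":
    print(percent_identity("MKV-L", "MRVAL"))  # 75.0 (3 of 4 ungapped columns match)
    print(window_identity("AAAAAAAAAA", "AAAAABBBBB", window=5, step=5))
    # [(1, 100.0), (6, 0.0)]
```

Windows with low identity are regions where the protein tolerates change; windows that stay near 100% across distant primates point to parts that are likely important for structure or function. Note that skipping gap columns is a design choice: BLAST's "Identities" count includes gap columns in the alignment length, so the two numbers can differ slightly.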