Neanderthals, Orangutans, Lemurs, & You—It's a Primate Family Reunion!
Abstract
You have probably seen figures showing how human beings are related to chimpanzees, gorillas, and other primates. In this genomics science fair project, you will use bioinformatics tools to generate your own primate family tree.
The objective of this genomics science fair project is to build a family tree for the primates, including Homo sapiens, based on the similarity of mitochondrial proteins. The method involves BLASTing a mitochondrial protein against a sequence database and analyzing the percent identity between hits.
"Nothing in biology makes sense except in the light of evolution." This quote is from a 1973 essay by the famous geneticist Theodosius Dobzhansky. One of the key issues in biology is how we, as a species, are related to other life forms. Based on anatomy and physiology, biologists have argued for decades that human beings are closely related to chimpanzees, with close, but somewhat more distant, relatedness to other primates, such as gorillas and orangutans (Figure 1).
Figure 1. The primate family includes chimpanzees (top left), gorillas (top right), orangutans (bottom left), and gibbons (bottom right).
Despite the best efforts by generations of biologists, there are still many skeptics concerning our place on the primate family tree. In this genomics science fair project, you will make up your own mind by using raw protein sequence data and simple software tools to build a tree depicting Homo sapiens and our nearest non-human relatives.
How can you use a protein sequence to build a family tree? When two populations stop interbreeding, they can eventually become two separate species. From then on, the genes of each species accumulate mutations independently of the other. The longer two species have been separated, the more mutations each species' genome will accumulate. A phylogenetic tree can be built based on the accumulated changes between genes or proteins in different organisms: fewer genetic changes imply a relatively recent divergence. If a protein mutates at a rate of 1 percent change per million years (say, five amino acids change in a 500-amino-acid protein), then that protein in two species that split from a common ancestor 1 million years ago will be roughly 2 percent different. Why 2 percent? Because each species has accumulated 1 percent genetic change, independently of the other, resulting in 2 percent difference overall. Based on pair-wise comparisons of a set of genes or proteins from different organisms, it is possible to construct a phylogenetic tree that reflects the genetic distances between the organisms. In the past, this sort of tree-building was done solely by comparing the bones, anatomy, behavior, etc., of many animals. With genomics tools, it can be done online from your computer, and it can include animals that don't even have bones, provided DNA sequence data are available.
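The arithmetic in the 1-percent example takes only a few lines of Python. This is a minimal sketch. Like the paragraph above, it assumes a constant rate of change on each lineage and differences small enough that two mutations rarely hit the same position. The function names are illustrative and are not part of any tool used in this project.

```python
def changed_residues(protein_length: int, rate_percent_per_my: float, time_my: float) -> float:
    """Amino acids expected to change along ONE lineage in `time_my` million years."""
    return protein_length * rate_percent_per_my * time_my / 100


def expected_percent_difference(rate_percent_per_my: float, split_time_my: float) -> float:
    """Percent difference expected between two species that split `split_time_my` mya."""
    # Each of the two lineages changes independently after the split,
    # so their differences add up: 2 x rate x time.
    return 2 * rate_percent_per_my * split_time_my


def estimated_split_time(percent_difference: float, rate_percent_per_my: float) -> float:
    """Inverse of the function above: time = difference / (2 x rate)."""
    return percent_difference / (2 * rate_percent_per_my)


print(changed_residues(500, 1.0, 1.0))        # 5.0
print(expected_percent_difference(1.0, 1.0))  # 2.0
print(estimated_split_time(2.0, 1.0))         # 1.0
```

The last function is the same reasoning run backwards. Given a measured difference and a known rate, it estimates how long ago two lineages split. That is the kind of estimate the tree later in this project can give you.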
To start this science fair project, you will go to the website for the National Center for Biotechnology Information (NCBI), a part of the U.S. National Institutes of Health (NIH). At the NCBI website, you can look up the sequence of the protein you will use as a BLAST query. BLAST is a bioinformatics tool that allows the user to search a database of proteins for sequences that are similar to the query sequence. The query sequence is the sequence used to search the database.
You will also use your BLAST results to make a rough estimate of the rate of mutation by looking at the percent difference between a human protein and the same protein in different primates. Estimating the rate at which mutations accumulate is a way of looking at evolution in action.
You will use a protein sequence from the mitochondrial genome as a query. Mitochondrial protein sequences have the advantage that they are available for many more species (including Neanderthals) than chromosomal proteins are. In addition, mitochondrial DNA is passed on from only one parent (the mother) to her children, so it does not undergo recombination. As a result, any variation in mitochondrial DNA is due only to mutation.
Once you have your BLAST results, you will make a table of the results, including the scientific names and common names of the species, the percent identity between the query sequence and the database hit, and the amount of time since Homo sapiens split from the other species. Based on this table, you can calculate a rough rate of mutation.
The BLAST page also has a tool for creating a tree of the BLAST results. You will use this tree to generate a figure showing how human beings are related to other primates. If you know the DNA or protein mutation rate, the tree will give you some indication of the time since the various lines split, in millions of years ago (mya).
Terms and Concepts
- Phylogenetic tree
- Amino acid
- Pair-wise comparison
- Mitochondrial genome
- Mitochondrial protein sequence
- Chromosomal protein
- Conservative substitution
- Accession number
Questions
- What is the definition of a species?
- What are the key concepts in Darwin's theory of evolution?
- What causes mutations?
- Look up the one-letter code for protein sequences. In this code, each of the 20 common amino acids is identified by a unique letter. This is the code used in the BLAST output. What letters are not used in the code?
Bibliography
- Dobzhansky, T. (1973, March). Nothing in Biology Makes Sense Except in the Light of Evolution. The American Biology Teacher, 35, 125-129. Used by permission of the National Association of Biology Teachers. Retrieved September 25, 2008.
- The Tree of Life Web Project. (1999). Primates. Retrieved September 25, 2008.
- National Center for Biotechnology Information. (2008). BLAST Guide and Tutorial. Retrieved October 1, 2008.
- Dawkins, R. (2004). The Ancestor's Tale. New York: Houghton Mifflin Company.
- PBS. (2001). Evolving Ideas: Did Humans Evolve? Retrieved September 25, 2008.
- European Molecular Biology Laboratory's European Bioinformatics Institute. (2008). ClustalW2. Retrieved September 25, 2008.
Materials and Equipment
- Computer with Internet access
- Lab notebook
Experimental Procedure
- Before you start this science fair project, you should be familiar with the terms, concepts, and questions listed above. It might also be helpful to familiarize yourself with the bioinformatics tools and websites that you are going to use, such as the BLAST tool and the NCBI website and databases.
Direct your browser to the NCBI website.
- The NCBI website has links to all sorts of information, including a database of scientific articles related to medicine and biology (PubMed), and databases of protein and nucleic acid sequences.
- In the "search" box, select "Protein," near the top of the drop-down box.
- In the "for" box, type or copy and paste YP_003024036. This is the accession number for the human mitochondrial protein NADH dehydrogenase subunit 5 (NADH5), which is encoded by the human mitochondrial genome. This number is a unique identifier for a particular sequence.
- Other proteins can be used as well.
- The page for YP_003024036 contains a wealth of information about the human NADH5 protein, including the names of the researchers who submitted the sequence, the date of submission, and the protein sequence.
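You can also retrieve the sequence with a script instead of the web page. NCBI's E-utilities service has an `efetch` endpoint that can return a record as FASTA text. The sketch below assumes that endpoint and its `db`, `id`, `rettype`, and `retmode` parameters. The helper names are hypothetical. NCBI publishes usage guidelines, including limits on request rates, that you should check before running the script repeatedly.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed E-utilities efetch endpoint.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession: str) -> str:
    """Build a URL asking for one protein record as plain-text FASTA."""
    params = {"db": "protein", "id": accession, "rettype": "fasta", "retmode": "text"}
    return EFETCH_URL + "?" + urlencode(params)


def fetch_fasta(accession: str) -> str:
    """Download the FASTA text for `accession` (requires Internet access)."""
    with urlopen(efetch_url(accession), timeout=30) as response:
        return response.read().decode("utf-8")


def parse_fasta(text: str) -> tuple[str, str]:
    """Return (header, sequence) for the first record in FASTA text."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("text does not look like FASTA")
    header = lines[0][1:]
    sequence_lines = []
    for line in lines[1:]:
        if line.startswith(">"):  # start of the next record: stop here
            break
        sequence_lines.append(line)
    return header, "".join(sequence_lines)
```

For example, `header, sequence = parse_fasta(fetch_fasta("YP_003024036"))` should give the record's description line and the NADH5 amino acid sequence. The network call sits in its own function so the parsing can be tried offline on any FASTA text.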
- Open a new window or tab in your internet browser, go to the NCBI main page, and click on the BLAST link in the "Popular Resources" list on the right.
- This page has information about how BLAST works, as well as links to various BLAST search tools. The page describes the BLAST search tools, as follows: "The Basic Local Alignment Search Tool (BLAST) finds regions of local similarity between sequences. The program compares nucleotide (or protein) sequences to sequence databases and calculates the statistical significance of matches. BLAST can be used to infer functional and evolutionary relationships between sequences, as well as help identify members of gene families."
- There are several versions of BLAST. Since you want to use a protein sequence to search a protein database, click on "Protein BLAST" under the "Web BLAST" heading.
- Fill out the protein BLAST query form, which should look similar to Figure 2, below, when you are done.
- In the box "Enter accession number(s), gi(s), or FASTA sequence(s)," type or copy and paste YP_003024036.
- For "Database," select "Reference proteins (refseq_protein)" from the drop-down list.
- For "Organisms," type "primates (taxid:9443)." This restricts the database search to primate sequences, which simplifies the output.
- Under "Algorithm," select blastp (protein-protein BLAST).
- Next to the BLAST button, check the box "show results in a new window."
- Click on "Algorithm Parameters" underneath the BLAST button. In the "General Parameters" section, set "Max target sequences" to 10. This limits your search to the 10 closest protein sequences and simplifies your phylogenetic tree in the following steps.
- You can expand your search to 50 or more target sequences later and also explore other BLAST options in the "Algorithm parameters" section.
- Then click on BLAST to start the search.
Figure 2. Protein BLAST (blastp) query input page (NCBI, 2021).
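The form settings above can also be written as parameters for NCBI's BLAST URL API (`Blast.cgi`). That interface accepts a `CMD=Put` request to submit a search and returns a request ID (RID) for retrieving the results later. This is a sketch under assumptions. The parameter names (`PROGRAM`, `DATABASE`, `QUERY`, `HITLIST_SIZE`, `ENTREZ_QUERY`) are taken to mirror the web form, and `txid9443[ORGN]` is assumed to be the syntax for the primate limit. Confirm both against NCBI's URL API documentation before relying on them. The function names are hypothetical.

```python
import re
from urllib.parse import urlencode

BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def blast_put_params(query: str, max_targets: int = 10) -> dict:
    """Parameters mirroring the web form: blastp, refseq_protein, primates only."""
    return {
        "CMD": "Put",
        "PROGRAM": "blastp",
        "DATABASE": "refseq_protein",
        "QUERY": query,
        "HITLIST_SIZE": str(max_targets),
        # Assumed Entrez syntax for restricting hits to primates (taxid 9443).
        "ENTREZ_QUERY": "txid9443[ORGN]",
    }


def blast_put_body(query: str, max_targets: int = 10) -> str:
    """URL-encoded body for an HTTP POST to BLAST_URL."""
    return urlencode(blast_put_params(query, max_targets))


def extract_rid(response_text: str):
    """Pull the request ID out of a Put response, or return None if absent."""
    match = re.search(r"RID = (\S+)", response_text)
    return match.group(1) if match else None
```

For this project the web form remains the simpler route. A script like this is mainly useful if you later repeat the search for many mitochondrial proteins. Submitting would mean posting `blast_put_body(...).encode()` to `BLAST_URL` with `urllib.request.urlopen` and then polling with the RID, within NCBI's polling guidelines.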
- Be patient. It may take a few minutes for the BLAST results to appear. See Figure 3, below, for a snapshot of what the results page should look like.
- On the top left of the BLAST results page you will find the summary section (blue in Figure 3), which provides information on different aspects of your search. On the top right there is a box that allows you to filter your results based on certain criteria (red in Figure 3). Below the top section, the BLAST results are shown (yellow in Figure 3). There are four tabs, called "Description," "Graphic Summary," "Alignments," and "Taxonomy." Each tab presents the search results in a different way.
- The "Description" tab contains a summary table of hits found by BLAST and is the default tab shown.
- The "Graphic Summary" tab shows each alignment as a colored bar along the length of the query sequence. The color key indicates the alignment score, so the colors show how similar each database sequence is to the query.
- The "Alignments" tab contains the detailed pairwise alignments between the query and database sequences.
- The "Taxonomy" tab summarizes the taxonomic distribution of the matches BLAST found.
- Click on the "Alignments" tab to look at the alignments of the proteins.
- Each alignment compares the human NADH5 protein against the NADH5 protein of a primate.
- Note the "Identities" value, which is the percent of amino acids that are the same in the query and the database sequence. "Positives" measures the percent of amino acids that remain the same or that were changed into similar amino acids. If the % identity between two species is 97%, then these two species differ by 3% in the protein sequence. Remember, the larger the % difference, the more distant they are in the family tree.
Figure 3. BLAST results for the human mitochondrial protein NADH dehydrogenase subunit 5 (NADH5) (NCBI, 2021).
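The sketch below shows exactly what the "Identities" numbers count. It compares two aligned sequences position by position, treats a gap (`-`) as a mismatch, and counts every alignment column toward the length. It illustrates the idea only and is not BLAST's own code. The short sequences in the example are made up.

```python
def count_identities(query_aln: str, subject_aln: str) -> tuple[int, int]:
    """Return (identical positions, alignment length) for two aligned sequences."""
    if len(query_aln) != len(subject_aln):
        raise ValueError("aligned sequences must have the same length")
    # A column counts as identical only if both residues match and neither is a gap.
    identical = sum(1 for q, s in zip(query_aln, subject_aln) if q == s and q != "-")
    return identical, len(query_aln)


def percent_identity(identical: int, length: int) -> float:
    return 100.0 * identical / length


def percent_difference(identical: int, length: int) -> float:
    # 100% minus the percent identity, computed from the raw counts.
    return 100.0 * (length - identical) / length


identical, length = count_identities("MTMHTTM", "MTMHITM")
print(identical, length)                                # 6 7
print(round(percent_difference(identical, length), 1))  # 14.3
```

"Positives" would additionally count similar (conservative) substitutions. Deciding which amino acids count as similar requires a substitution matrix, so it is left out of this sketch.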
- Make a data table of the BLAST results. Include the following: scientific names for the animals, common names, and BLAST percent identity. View each of the four result tabs to find the information you need.
- In the "Alignments" tab, percent identity is given above each alignment; for example, "Identities = 563/564 (99%)."
- The first number is the count of amino acids that are identical in the two proteins, and the second is the length of the alignment.
- Add a column for BLAST "% difference." You can calculate this as 100% minus the % identity (for example, 99% identity = 1% difference).
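BLAST rounds the percentage it displays, so a 563/564 match is shown as 99% even though the true identity is closer to 99.8%. For close relatives such as Neanderthals, that rounding can dominate the % difference, so you may prefer to compute the difference from the two counts. The sketch below assumes the line is copied in the "Identities = X/Y (Z%)" form shown above. The function name is hypothetical.

```python
import re


def percent_difference_from_line(line: str) -> float:
    """Compute % difference from a BLAST line like 'Identities = 563/564 (99%)'."""
    match = re.search(r"Identities\s*=\s*(\d+)/(\d+)", line)
    if match is None:
        raise ValueError(f"no Identities value found in: {line!r}")
    identical, length = int(match.group(1)), int(match.group(2))
    # Use the raw counts instead of BLAST's rounded percentage.
    return 100.0 * (length - identical) / length


print(round(percent_difference_from_line("Identities = 563/564 (99%)"), 2))  # 0.18
```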
Add a column for the time, in millions of years ago (mya), at which selected lineages split from the human lineage. Use these numbers from Richard Dawkins' book, The Ancestor's Tale:
- Neanderthal: 0.4 mya
- Chimpanzee: 6 mya
- Gorillas: 7 mya
- Orangutans: 14 mya
- Gibbons: 18 mya
- Colobus monkeys, macaques, etc.: 25 mya
- Lemurs, tarsiers, etc.: 63 mya
Calculate the rate of mutation for NADH5 in the primate line.
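- Divide the percent difference by the time, in millions of years, since the two lineages split. This gives the rate at which the two lineages have diverged. As explained in the introduction, the rate of change along each lineage is about half of this value.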
- Calculate the average mutation rate.
- Based on your results using the human NADH5 protein, list the primates from most- to least-related to humans.
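The calculation in these steps can be sketched in Python. The split times are the ones listed above. The % differences are whatever you measured, entered by hand as a dictionary. The sketch follows the simple linear model from the introduction: dividing % difference by split time gives the pairwise divergence rate, and halving that gives the per-lineage rate. Function names are illustrative.

```python
from statistics import mean

# Split times from the list above (Dawkins, The Ancestor's Tale), in mya.
SPLIT_TIMES_MYA = {
    "Neanderthal": 0.4,
    "Chimpanzee": 6,
    "Gorillas": 7,
    "Orangutans": 14,
    "Gibbons": 18,
    "Colobus monkeys, macaques, etc.": 25,
    "Lemurs, tarsiers, etc.": 63,
}


def divergence_rates(percent_difference: dict[str, float]) -> dict[str, float]:
    """% difference per million years, for each species with a known split time."""
    return {
        species: diff / SPLIT_TIMES_MYA[species]
        for species, diff in percent_difference.items()
        if species in SPLIT_TIMES_MYA
    }


def per_lineage_rate(divergence_rate: float) -> float:
    # Both lineages change after the split, so each contributes about half.
    return divergence_rate / 2


def average_rate(rates: dict[str, float]) -> float:
    return mean(rates.values())


def rank_by_relatedness(percent_difference: dict[str, float]) -> list[str]:
    """Species ordered from most to least related (smallest % difference first)."""
    return sorted(percent_difference, key=percent_difference.get)
```

For example, `rank_by_relatedness(my_differences)` lists your species from most to least related, and `average_rate(divergence_rates(my_differences))` gives the average rate, where `my_differences` is a dictionary of your own measured values. Expect the Neanderthal entry to be noisy. With a split time of only 0.4 million years, a single amino acid difference changes its rate a lot.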
- Next, generate a tree that shows how the different species are related to each other. To generate a tree, click on the "Distance tree of results" link next to "Other reports" above the results table in the summary section of your BLAST report. This includes all of the hits from the BLAST search in the tree.
- Use the drop-down list under "Sequence label" and select "Taxonomic Name" to simplify the names on the tree.
- Keep the other tree parameters at their default settings.
- If you like, you can explore the other options for making trees.
- To copy the tree for your notes, take a screenshot and paste it into a graphics program, like Adobe Photoshop, Microsoft PowerPoint, etc.
- For a compelling figure, you might redraw the tree and add pictures, the times that particular lineages split, common animal names, etc.
- What does the phylogenetic tree tell you about the closest living relatives of humans? If you need help reading the phylogenetic tree, you can watch the "Phylogenetics and Reading Phylogenetic Trees" video or the "Phylogenetic Analysis of Pathogens" video.
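The sketch below uses UPGMA, one of the simplest distance-based tree-building methods, to show how pairwise distances become a tree. It is not the method BLAST uses for its distance tree; the BLAST tree page offers methods such as Fast Minimum Evolution and Neighbor Joining. It does show the same basic idea: repeatedly join the two closest groups. UPGMA assumes a roughly constant rate of change on every lineage, the same assumption used in the introduction. Two choices here are made only for this illustration: the input format (distances keyed by pairs of names) and the output (a Newick-style string without branch lengths).

```python
from itertools import combinations


def from_pairs(pairs: dict[tuple[str, str], float]) -> dict[frozenset, float]:
    """Convert {(a, b): distance} into {frozenset({a, b}): distance}."""
    return {frozenset(pair): dist for pair, dist in pairs.items()}


def upgma(distances: dict[frozenset, float]) -> str:
    """Build a UPGMA tree topology from pairwise distances (e.g. % difference).

    Returns a Newick-style string without branch lengths.
    """
    d = dict(distances)  # copy so the caller's dictionary is not modified
    names = sorted({name for pair in d for name in pair})
    if len(names) < 2:
        raise ValueError("need distances between at least two species")
    size = {name: 1 for name in names}  # number of species in each cluster
    while len(size) > 1:
        # Find the closest pair of current clusters (ties: first in sorted order).
        a, b = min(combinations(sorted(size), 2), key=lambda p: d[frozenset(p)])
        merged = f"({a},{b})"
        for other in size:
            if other not in (a, b):
                # Distance to the new cluster = size-weighted average distance.
                d[frozenset((merged, other))] = (
                    d[frozenset((a, other))] * size[a] + d[frozenset((b, other))] * size[b]
                ) / (size[a] + size[b])
        size[merged] = size.pop(a) + size.pop(b)
    return next(iter(size)) + ";"


# Made-up distances for four hypothetical species A-D.
example = from_pairs({("A", "B"): 2, ("A", "C"): 6, ("B", "C"): 6,
                      ("A", "D"): 10, ("B", "D"): 10, ("C", "D"): 10})
print(upgma(example))  # (((A,B),C),D);
```

The Newick format uses commas and parentheses as separators. Use species labels without commas, such as "Colobus" rather than "Colobus monkeys, macaques, etc.", if you try this with your own table.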
Variations
- Click on the button "Taxonomy report" on the BLAST output page. Use this information in your report.
- Make a phylogenetic tree based on pair-wise comparisons. Select the sequences you want to download, then click on "Get selected sequences." Go to the ClustalW2 page at EBI, and follow the instructions for aligning the sequences and generating a ClustalW-based tree.
- Analyze the mutation rate for different parts of the NADH5 protein. Are certain regions more prone to changes? Why? (Hint: Amino acids that are critical to the protein's structure or function will not vary as much as less-vital amino acids).
- Select a different protein for evolutionary genomics analysis. Go to the NCBI homepage, select "Genome" to search, then type in NC_001807. This is the page for the Homo sapiens mitochondrial genome. Open the NC_001807 page, and click on "protein coding: 13." These 13 proteins are coded for by the mitochondrial DNA. Pick one and repeat your analysis as above. Do you get similar results?
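One simple way to find which parts of NADH5 change more often takes three steps. First, align the sequences (for example with ClustalW2, as described above). Next, mark each alignment column where not all species have the same amino acid. Finally, average those marks over a sliding window. This sketch assumes you already have the aligned sequences as equal-length strings. It treats a gap as a difference, and the default window size of 20 is an arbitrary choice you may want to vary. The function names are hypothetical.

```python
def variable_columns(aligned: list[str]) -> list[bool]:
    """True for each alignment column where the sequences do not all agree."""
    if not aligned or len({len(seq) for seq in aligned}) != 1:
        raise ValueError("need one or more aligned sequences of equal length")
    return [len(set(column)) > 1 for column in zip(*aligned)]


def window_variability(aligned: list[str], window: int = 20) -> list[float]:
    """Fraction of variable columns in each window, sliding one column at a time."""
    flags = variable_columns(aligned)
    return [sum(flags[i:i + window]) / window for i in range(len(flags) - window + 1)]


print(window_variability(["MTMHT", "MTIHT", "MTMHA"], window=2))  # [0.0, 0.5, 0.5, 0.5]
```

Windows with high values point to regions that tolerate change, and windows near zero point to conserved regions. Following the hint above, conserved regions are candidates for parts that matter most to the protein's structure or function.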