Learning Your A, G, C's (and T, too)
Abstract
This is a project about the "molecular alphabet" of DNA. With just four "letters," it manages to keep track of the plan for an entire person, and keep a complete copy in nearly every cell. This project will help you start learning this new alphabet.
Sponsor: Molecular Sciences Institute (MSI), Berkeley, California
Management & Editing: Ken Hess, The Kenneth Lafferty Hess Family Charitable Foundation
Objective
The goal of this project is to learn the basics about DNA sequences by examining some simple differences between groups of genes.
DNA is double stranded (a double helix) and made up of base PAIRS. Adenine on one strand (represented with an "A") always pairs with thymine (represented with a "T") on the other strand. These are called A~T pairs regardless of which strand has the "A" and which the "T." Similarly, cytosine on one strand (represented with a "C") always pairs with guanine (represented with a "G") on the other strand, creating G~C pairs.
Scientists often represent DNA strands with a string of letters like this:
This string of letters represents only one strand, or one half, of the DNA molecule. There is no need to write down the other strand because, as described above, the pairing rules determine it: a "G" in one strand means there is a "C" at the same position in the other strand, a "C" means a "G," an "A" means a "T," and a "T" means an "A."
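Because the pairing rules are fixed, the second strand can be worked out mechanically from the first. Here is a minimal Python sketch of that idea. The function name `complement` and the short example strand are made up for illustration, and the sketch ignores strand direction, which this project does not cover.

```python
# Pairing rules from the text: A pairs with T, and G pairs with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement(strand):
    """Return the bases on the opposite strand, position by position."""
    return "".join(PAIRS[base] for base in strand.upper())

print(complement("GATTACA"))  # CTAATGT
```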
Now think of the human genome and all of the genes in it as a VERY large set of blueprints. Each blueprint is an instruction set for assembling one part or piece of a cell. Almost every cell in your body carries the same set of blueprints -- so what makes a cell in your brain different from a cell in your stomach? A neuron and a stomach lining cell are very different in their morphologies (how they look) and their functions (what job they do). The different shapes and functions are a result of the fact that those two cells use different portions of the complete blueprint set to construct themselves. The neuron uses the blueprints for parts involved in brain signaling, while the stomach cell does not. The stomach cell makes parts for secreting stomach enzymes to help in food digestion, while the brain cell does not. These kinds of blueprints (or genes) are often called "tissue specific" because they are used in some body parts and not in others. However, there are also some blueprints that are used in every cell in the body because the parts they represent are needed in every cell (like pieces used during cell division or making energy). These kinds of blueprints (or genes) are often called "housekeeping" because they represent a basic need of every cell and they "keep up" the basic functions of the cell.
How does the cell know which blueprints to use? Each gene (or blueprint) has its own control panel that acts as a group of switches affecting when (during an organism's development), where (in the body) and how much a particular blueprint is used. Scientists are still working hard at being able to identify all of the pieces of a gene's control panel. One important part of the control panel that we know a lot about is the "promoter." The number and pattern of As, Ts, Gs and Cs in a promoter is important in determining whether the switch will act like a "housekeeping" switch or a "tissue specific" switch. As of today, scientists are just beginning to understand why this is true.
Terms and Concepts
- DNA, gene
- Nucleotide bases (adenine, thymine, guanine, cytosine)
- Genetic Science Learning Center: http://gslc.genetics.utah.edu/units/basics/
- Understanding Genetics: http://www.thetech.org/genetics/ See the "Zooming into DNA" link.
- NHGRI Education: http://www.genome.gov/Education/ See the book "From Blueprint to You."
- The online encyclopedia Wikipedia has a slightly more advanced discussion of DNA. http://en.wikipedia.org/wiki/Deoxyribonucleic_acid
Materials and Equipment
- Computer with Internet connection
- Lab notebook
In this experiment, you will compare housekeeping promoters to other genes by calculating the percentage of G~C content. This will make sense in a minute!
How to calculate the G~C pair content of a DNA sequence:
- Count the total number of G's and C's. Try it for this sample:
ATATTTGAAAGCTGTGTCTGTAAACTGATGGCTAACAAAACTAG
You should get a total count of 15 (9 G's and 6 C's).
- Now count the total number of letters (bases). You should get 44.
- %G~C pair content = (Count of G's and C's / Total count of all bases)*100
So, for our sample, the %G~C pair content = (15/44)*100 = 34.1%, or about 34%.
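If you would like to check your hand count, the same three steps can be written as a short Python sketch. The function name `gc_percent` is an illustrative choice, and the sketch assumes the sequence is written in capital letters with no spaces.

```python
def gc_percent(sequence):
    """Percentage of bases in the sequence that are G or C."""
    gc_count = sequence.count("G") + sequence.count("C")
    return gc_count / len(sequence) * 100

sample = "ATATTTGAAAGCTGTGTCTGTAAACTGATGGCTAACAAAACTAG"

# Number of G's, number of C's, and total number of bases
print(sample.count("G"), sample.count("C"), len(sample))  # 9 6 44

# %G~C content, rounded to the nearest whole percent
print(round(gc_percent(sample)))  # 34
```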
Now, let's perform the experiment:
Step 1: Formulate your hypothesis. Which do you think might be true?
- Housekeeping promoters will have lower G~C content than other genes,
- Housekeeping promoters will have higher G~C content than other genes, or
- Housekeeping promoters will have similar G~C content to other genes.
Step 2: Calculate the %G~C content for each sequence listed below on this page. We have provided partial DNA sequences for three housekeeping promoters, three tissue specific promoters, and for comparison, a number of additional genes of the sort that the promoters would regulate and control.
Time saver! Do the first couple sequences "by hand," but then you can use the %G~C Content Calculator. To use the calculator:
- Copy and paste the DNA sequence you want to analyze into the box.
- Press "Calculate" to count the bases and determine the %G~C content.
- Record your results.
- Press "Clear Form" to clear all the fields, preparing the calculator for its next count.
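If the calculator is not available, a small script can stand in for it. The sketch below is only a guess at what such a tool might do, not the calculator's actual code: it removes spaces and line breaks (the longer sequences on this page contain spaces), counts each base, and reports the %G~C content. The name `base_report` and the dictionary it returns are assumptions made for this example.

```python
from collections import Counter

def base_report(raw_text):
    """Count each base and compute %G~C, ignoring spaces and line breaks."""
    cleaned = "".join(raw_text.split()).upper()
    if not cleaned:
        raise ValueError("No sequence was provided")
    counts = Counter(cleaned)
    unexpected = set(counts) - set("ACGT")
    if unexpected:
        raise ValueError(f"Unexpected characters: {sorted(unexpected)}")
    total = len(cleaned)
    gc_count = counts["G"] + counts["C"]
    return {
        "A": counts["A"],
        "C": counts["C"],
        "G": counts["G"],
        "T": counts["T"],
        "total": total,
        "gc_percent": gc_count / total * 100,
    }

# The sample sequence from above, pasted with a space in the middle
report = base_report("ATATTTGAAAGCTGTGTCTG TAAACTGATGGCTAACAAAACTAG")
print(report["total"], round(report["gc_percent"]))  # 44 34
```

Rejecting unexpected characters is a deliberate choice here: a stray letter from a bad copy and paste would otherwise be counted as a base and quietly change the percentage.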
Step 3: Record your results in a table with the %G~C content in the first column, the name of the sequence in the second column, and the type of gene in the third column. Fill out the table as you do your calculations. It should look like this:
| %G~C Content | Sequence Name | Type of Gene |
| 48% | Bone Morphogenetic Protein 5 (BMP5) | Tissue specific promoter |
Step 4: Sort the rows of the table, so that the highest %G~C content is at the top, the next highest percentage appears second, the third highest percentage appears third, and so forth. When you are done, the very lowest %G~C content should be at the bottom.
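If you keep your table on a computer, the sorting can be done for you. The Python sketch below shows one way. The rows are placeholders ("Sequence A," "Type 1," and the percentages are made up), so replace them with your own results.

```python
# Each row: (%G~C content, sequence name, type of gene).
# Placeholder values for illustration only, not real results.
rows = [
    (48, "Sequence A", "Type 1"),
    (61, "Sequence B", "Type 2"),
    (35, "Sequence C", "Type 1"),
]

# Sort by the first column, highest percentage first
sorted_rows = sorted(rows, key=lambda row: row[0], reverse=True)

for percent, name, gene_type in sorted_rows:
    print(f"{percent}% | {name} | {gene_type}")
```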
If you need or want to do a graph for your science fair project (it's almost always a good idea to do so), you can also do a bar chart showing the %G~C content for each gene.
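A quick way to preview such a bar chart is to print it as text before drawing the final version. This is a minimal sketch, assuming your results are stored as (name, percentage) pairs. The function name `text_bar_chart` and the two demo entries are made up for illustration.

```python
def text_bar_chart(results, width=50):
    """Build one text bar per sequence; a full-width bar means 100% G~C."""
    label_width = max(len(name) for name, _ in results)
    lines = []
    for name, percent in results:
        bar = "#" * round(percent / 100 * width)
        lines.append(f"{name.ljust(label_width)} | {bar} {percent}%")
    return lines

# Placeholder results for illustration only
demo = [("Sequence A", 48), ("Sequence B", 60)]
for line in text_bar_chart(demo):
    print(line)
```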
Step 5: Draw your conclusion. What can you say about housekeeping promoters? How does their %G~C content compare to tissue specific promoters? How does the %G~C content of the housekeeping promoters compare to other genes of the sort that the promoters would regulate and control (bone morphogenetic protein 7, leptin, opsin, and cystic fibrosis genes)?
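One simple way to compare groups is to average the %G~C content within each type of gene. The sketch below assumes your table is stored as rows of (percentage, name, type). The function name `average_by_type` and the demo rows are placeholders, not real results. With only a few sequences per group, treat the averages as a rough guide rather than proof.

```python
from collections import defaultdict
from statistics import mean

def average_by_type(rows):
    """Average the %G~C content of the rows within each type of gene."""
    groups = defaultdict(list)
    for percent, _name, gene_type in rows:
        groups[gene_type].append(percent)
    return {gene_type: mean(values) for gene_type, values in groups.items()}

# Placeholder rows for illustration only
demo_rows = [
    (50, "Sequence A", "Group 1"),
    (60, "Sequence B", "Group 1"),
    (40, "Sequence C", "Group 2"),
]
averages = average_by_type(demo_rows)
for gene_type, average in averages.items():
    print(f"{gene_type}: {average}%")
```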
DNA Sequences for Your Experiment
Here are the sequences to use for your experiment. Note that these are partial sequences for the molecules; the full sequence is generally much longer.
Housekeeping Promoters:
- Heat Shock Protein 90 (HSP90): When proteins overheat, their folding and conformation are disrupted, which often affects their function. Heat Shock Proteins help refold these proteins into their working state.
- Glucose-6-phosphate Dehydrogenase (G6PD): This molecule is a member of a team that helps protect each cell from agents that damage important proteins.
- Beta-actin (ACTB): Actin proteins help the cell make an internal "skeleton" that maintains the cell's proper shape.
Tissue Specific Promoters:
- Bone Morphogenetic Protein 5 (BMP5): Bone morphogenetic proteins help induce the growth of new bone.
- Hemoglobin Beta (HBB): Part of hemoglobin, the iron-containing protein that carries oxygen in red blood cells.
- GABA Receptor A1 (GABRA1): An important receptor of chemical signals that travel only in the brain.
Bone Morphogenetic Protein 7 Genes:
Here is a partial DNA sequence from humans, pig, rabbit, and sheep for the Bone Morphogenetic Protein 7 gene (BMP7). You will notice that the sequences are not exactly the same. Bone Morphogenetic Proteins represent signals found in the body that help induce bone growth.
- Human BMP7
AGAACCGCTCCAAGACGCCCAAGAACCAGGAAGCCCTGCGGATGGCCAACGTGGCAGAG AACAGCAGCAGCGACCAGAGGCAGGCCTGTAAGAAGCACGAGCTGTATGTCAGCTTCCG AGACCTGGGCTGGCAGGACTGGATCATCGCGCCTGAAGGCTACGCCGCCTACTACTGTG AGGGGGAGTGTGCCTTCCC
- Pig BMP7
AGAACCGCTCCAAGACGCCCAAGAACCAGGAAGCCCTGCGGGTGGCCAACGTCGCAGAG AACAGCAGCAGTGACCAGCGGCAGGCCTGTAAGAAGCATGAGCTCTACGTCAGCTTCCG GGACCTGGGCTGGCAAGACTGGATCATCGCGCCCGAAGGCTATGCCGCCTACTACTGCG AGGGGGAGTGCGCCTTCCC
- Rabbit BMP7
AGAACCGCTCCAAGGCACCCAAGAACCAAGAGGCGCTGCGAGTGGCCAACGTGGCAGAA AACAGCAGCAGTGACCAGCGGCAGGCGTGCAAGAAACACGAACTGTACGTCAGCTTCCG CGACCTGGGCTGGCAGGATTGGATCATTGCCCCGGAAGGCTACGCCGCCTACTACTGCG AGGGAGAGTGCGCCTTCCC
- Sheep BMP7
AGAATCGCTCCAAGGCGCCCAAGAACCAAGAAGCCCTGCGGGTGGCCAACGTCGCAGAA AACAGCAGCAGTGACCAGAGGCAGGCATGTAAGAAGCACGAGCTATACGTCAGCTTCCG GGACCTGGGCTGGCAGGATTGGATCATCGCACCCGAAGGCTATGCCGCCTACTACTGCG AGGGGGAGTGCGCCTTCCC
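To see exactly where two species differ, you can compare their sequences position by position. The sketch below does this for the first 59 bases of the human and pig BMP7 sequences listed above. The function name `differences` is an illustrative choice, and the sketch assumes both sequences have the same length and are already lined up.

```python
def differences(seq_a, seq_b):
    """List (position, base_a, base_b) wherever two sequences differ.

    Positions are counted starting from 1.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("Sequences must be the same length to compare")
    return [
        (index + 1, a, b)
        for index, (a, b) in enumerate(zip(seq_a, seq_b))
        if a != b
    ]

# First 59 bases of the human and pig BMP7 sequences from this page
human = "AGAACCGCTCCAAGACGCCCAAGAACCAGGAAGCCCTGCGGATGGCCAACGTGGCAGAG"
pig = "AGAACCGCTCCAAGACGCCCAAGAACCAGGAAGCCCTGCGGGTGGCCAACGTCGCAGAG"

print(differences(human, pig))  # [(42, 'A', 'G'), (53, 'G', 'C')]
```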
Leptin Genes:
Here is a partial DNA sequence from humans, cow, dog, and horse for Leptin (LEP), a signal found in the body that tells your brain how much fat you have stored away. Leptin may help regulate how hungry you feel. You will notice that the sequences are not exactly the same.
- Human Leptin
TGTGGCTTTGGCCCTATCTTTTCTATGTCCAAGCTGTGCCCATCCAAAAAGTCCAAGAT GACACCAAAACCCTCATCAAGACAATTGTCACCAGGATCAATGACATTTCACACACGCA GTCAGTCTCCTCCAAACAGAAAGTCACCGGTTTGGACTTCATTCCTGGGCTCCACCCCA TCCTGACCTTATCCAAGATGGACCAGACACTGGCAGTCTACCAACAGATCCTCACCAGT ATGCCTTCCAGAAACGTGATCCAAATATCCAACGACCTGGAGAACCTCCGGGATCTTCT TCACGTGCTGGCCTTCTCTAAGAGCTGCCACTTGCCCTGGGCCAGTGGCCTGGAGACCT TGGACAGCCTGGGGGGTGTCCTGGAAGCTTCAGGCTACTCCACAGAGGTGGTGGCCCTG AGCAGGCTGCAGG
- Cow Leptin
TGTGGCTTTGGCCCTATCTGTCTTACGTGGAGGCTGTGCCCATCCGCAAGGTCCAGGAT GACACCAAAACCCTCATTAAGACAATTGTCACCAGGATCAATGACATCTCACACACGCA GTCCGTCTCCTCCAAACAGAGGGTCACTGGTTTGGACTTCATCCCTGGGCTCCACCCTC TCCTGAGTTTGTCCAAGATGGACCAGACATTGGCGATCTACCAACAGATCCTCACCAGT CTGCCTTCCAGAAATGTGGTCCAAATATCCAATGACCTGGAGAACCTCCGGGACCTTCT CCACCTGCTGGCCGCCTCCAAGAGCTGCCCCTTGCCGCAGGTCAGGGCCCTGGAGAGCT TGGAGAGCTTGGGCGTTGTCCTGGAAGCTTCCCTCTACTCCACCGAGGTGGTGGCCCTG AGCCGGCTGCAGG
- Dog Leptin
TGTGGCTCTGGCCCTATCTGTCCTGTGTTGAAGCTGTGCCAATCCGAAAAGTCCAGGAC GACACCAAACCCCTCATCAAGACGATTGTCGCCAGGATCAATGACATTTCACACACTCA GTCTGTCTCCTCCCAACAGAGGGTCGCTGGTCTGGACTTCATTCCTGGGCTCCAACCAG TCCTGAGTTTGTCCAGGATGGGCCAGACGTTGGCCATATACCAACAGATCCTCAACAGT CTGCATTCCAGAAATGTGGTCCAAATATCTAATGACCTGGAGAACCTCCGGGACCTTCT CCACCTGCTGGCCTCCTCCAAGAGCTGCCCCTTGCCCCGGGCCAGGGGCCTGGAGACCT TTGAGAGCGTGGGCGGCGTCCTGGAAGCCTCACTCTACTCCACAGAAGTGGTGGCTCTG AACAGACTGCAGG
- Horse Leptin
TGTGGCTTTGGCCCTATCTGTTCTTCATTGAAGCTGTGCCCATCCGAAAAGTCCAGGAT GACACCAAAACCCTCATCAAGACGATTGTCACCAGGATCAATGACATTTCACACACGCA GTCAGTCTCCTCCAAACAGAGGGTCACTGGTTTGGACTTCATTCCTGGGCTTCACCCTG TCCTGAGTTTGTCCAAGATGGACCAGACATTGGCAATCTACCAACAGATCCTTACAAGT CTGCCTTCCAGAAATGTGATCCAGATATCTAATGACCTGGAGAACCTCCGGGACCTTCT CCACCTGCTGGCCTCCTCCAAGAGTTGCCCCTTGCCCCAGGCCAGGGGTCTGGAGACCT TGGCGAGCCTGGGCGGTGTCCTGGAAGCTTCACTCTACTCCACAGAGGTGGTAGCCCTG AGCAGGCTGCAGG
- Here is a partial DNA sequence for the human Cystic Fibrosis gene (CFTR). In the body this gene's product is involved in making sure mucus doesn't build up in the lungs and that the pancreas secretes the right enzymes to help you digest your food. If this gene is damaged, a patient gets Cystic Fibrosis.
ATATTTGAAAGCTGTGTCTGTAAACTGATGGCTAACAAAACTAGGATTTTGGTCACTTC TAAAATGGAACATTTAAAGAAAGCTGACAAAATATTAATTTTGAATGAAGGTAGCAGCT ATTTTTATGGGACATTTTCAGAACTCCAAAATCTACAGCCAGACTTTAGCTCAAAACTC ATGGGATGTGATTCTTTCGACCAATTTAGTGCAGAAAGAAGAAATTCAATCCTAACTGA GACCTTACACCGTTTCTCATTAGAAGGAGATGCTCCTGTCTCCTGGACAGAAACCAATC TTTTAAACAGACTGGAGAGTTTGGGGAAAAAAGGAAGAATTCTATTCTCAATCCAATCA ACTCTATACGAAAATTTTCCATTGTGCAAAAGACTCCCTTACAAATGAATGGCATCGAA GAGGATTCTGATGAGCCTTTAGAGAGAAGGCTGTCCTTAGTACCAGATTCTGAGCAGGG AGAGGCGATACTGCCTCGCATCAGCGTGATCAGCACTGGCCCCACGCTTCAGGCACGAA GGAGGCAGTCTGTCCTGAACCTGATGACACACTCAGTTAACCAAGGTCAGAACATTCAC CGAAAGACAACAGCATCCACACGAAAAGTGTCACTGGCCCCTCAGGCAAACTTGACTGA ACTGGATATATATTCAAGAAGGTTATCTCAAGAAACTGGCTTGGAAATAAGTGAAGAAA TTAACGAAGAAGACTTAAAGG
- Here is a partial DNA sequence for human Opsin1 (OPS1MW). Opsins are involved in providing color vision in the eye. Changes in the function of an opsin protein can lead to color-blindness.
CCCTTCGAAGGCCCGAATTACCACATCGCTCCCAGATGGGTGTACCACCTCACCAGTGT CTGGATGATCTTTGTGGTCATTGCATCCGTTTTCACAAATGGGCTTGTGCTGGCGGCCA CCATGAAGTTCAAGAAGCTGCGCCACCCGCTGAACTGGATCCTGGTGAACCTGGCGGTC GCTGACCTGGCAGAGACCGTCATCGCCAGCACTATCAGCGTTGTGAACCAGGTCTATGG CTACTTCGTGCTGGGCCACCCTATGTGTGTCCTGGAGGGCTACACCGTCTCCCTGTGTG GGATCACAGGTCTCTGGTCTCTGGCCATCATTTCCTGGGAGAGATGGATGGTGGTCTGC AAGCCCTTTGGCAATGTGAGATTTGATGCCAAGCTGGCCATCGTGGGCATTGCCTTCTC CTGGATCTGGGCTGCTGTGTGGACAGCCCCGCCCATCTTTGGTTGGAGCAGGTACTGGC CCCACGGCCTGAAGACTTCATGCGGCCCAGACGTGTTCAGCGGCAGCTCGTACCCCGGG GTGCAGTCTTACATGATTGTCCTCATGGTCACCTGCTGCATCACCCCACTCAGCATCAT CGTGCTCTGCTACCTCCAAGTGTGGCTGGCCATCCGAGCGGTGGCAAAGCAGCAGAAAG AGTCTGAATCCACCCAGAAGGCAGAGAAGGAAGTGACGCGCATGGTGGTGGTGATGGTC CTGGCATTCTGCTTCTGCTGGGGACCATACGCCTTCTTCGCATGCTTTGCTGCTGCCAA CCCTGGCTA
For the leptin and bone morphogenetic protein 7 genes you have data for different species. How would you describe the % G~C content for the same gene, but in different animals?