Objective
The goal of this project is to learn the basics about DNA sequences by examining some simple differences between groups of genes.
Introduction
DNA is double stranded (a double helix) and made up of base PAIRS. Adenine on one strand (represented with an "A") always pairs with thymine (represented with a "T") on the other strand. These are called A~T pairs regardless of which strand has the "A" and which the "T." Similarly, cytosine on one strand (represented with a "C") always pairs with guanine (represented with a "G") on the other strand, creating G~C pairs.
Scientists often represent DNA strands with a string of letters like this:
ATATTTGAAAGCTGTGTCTGTAAACTGATGGCTAACAAAACTAG
This string of letters represents only one strand, or one half, of the DNA molecule. There is no need to write down the other strand because, as described above, an "A" in one strand always pairs with a "T" in the other, and a "G" in one strand always pairs with a "C" in the other.
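The pairing rules can be sketched in a few lines of Python. This is only an illustration: the function name `complement` and the lookup table `PAIRING` are made up for this example, and the sketch simply lists the pairing partner of each letter, position by position.

```python
# Map each base to its pairing partner: A pairs with T, and G pairs with C.
PAIRING = str.maketrans("ATGC", "TACG")


def complement(strand: str) -> str:
    """Return the base that pairs with each letter of `strand`, position by position."""
    return strand.upper().translate(PAIRING)


strand = "ATATTTGAAAGCTGTGTCTGTAAACTGATGGCTAACAAAACTAG"
partner = complement(strand)

print(strand)   # the strand that was written down
print(partner)  # the bases it pairs with on the other strand
```

Because every G pairs with a C, the number of G's plus C's is the same whichever strand you count, which is why one written strand is enough for this project.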
Now think of the human genome and all of the genes in it as a VERY large set of blueprints. Each blueprint is an instruction set for assembling one part or piece of a cell. Almost every cell in your body carries the same set of blueprints -- so what makes a cell in your brain different from a cell in your stomach? A neuron and a stomach lining cell are very different in their morphologies (how they look) and their functions (what job they do). The different shapes and functions are a result of the fact that those two cells use different portions of the complete blueprint set to construct themselves. The neuron uses the blueprints for parts involved in brain signaling, while the stomach cell does not. The stomach cell makes parts for secreting stomach enzymes to help in food digestion, while the brain cell does not. These kinds of blueprints (or genes) are often called "tissue specific" because they are used in some body parts and not in others. However, there are also some blueprints that are used in every cell in the body because the parts they represent are needed in every cell (like pieces used during cell division or making energy). These kinds of blueprints (or genes) are often called "housekeeping" because they represent a basic need of every cell and they "keep up" the basic functions of the cell.
How does the cell know which blueprints to use? Each gene (or blueprint) has its own control panel that acts as a group of switches affecting when (during an organism's development), where (in the body), and how much a particular blueprint is used. Scientists are still working hard to identify all of the pieces of a gene's control panel. One important part of the control panel that we know a lot about is the "promoter." The number and pattern of As, Ts, Gs and Cs in a promoter is important in determining whether the switch will act like a "housekeeping" switch or a "tissue specific" switch. As of today, scientists are just beginning to understand why this is true.
Experimental Procedure
In this experiment, you will compare housekeeping promoters to other genes by calculating the percentage of G~C content. This will make sense in a minute!
How to calculate the G~C pair content of a DNA sequence:
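Count the G's and C's in this sequence:
ATATTTGAAAGCTGTGTCTGTAAACTGATGGCTAACAAAACTAG
You should get a total count of 15 (9 G's and 6 C's). Dividing 15 by the 44 letters in the sequence gives a G~C content of about 34%.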
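For checking hand counts, the same arithmetic can be written as a short Python sketch: count the G's and C's, divide by the total number of letters, and multiply by 100. The function names `gc_count` and `gc_percent` are invented for this example and are not part of the project's calculator.

```python
def gc_count(sequence: str) -> int:
    """Count the G's and C's in a sequence, ignoring upper/lower case."""
    seq = sequence.upper()
    return seq.count("G") + seq.count("C")


def gc_percent(sequence: str) -> float:
    """Return the G~C content as a percentage of all letters in the sequence."""
    # Remove spaces and line breaks so wrapped sequences are counted correctly.
    seq = "".join(sequence.split())
    if not seq:
        raise ValueError("sequence is empty")
    return 100.0 * gc_count(seq) / len(seq)


example = "ATATTTGAAAGCTGTGTCTGTAAACTGATGGCTAACAAAACTAG"
print(gc_count(example))              # 15
print(round(gc_percent(example), 1))  # 34.1
```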
Now, let's perform the experiment:
Step 1: Formulate your hypothesis. Which do you think might be true?
Step 2: Calculate the %G~C content for each sequence listed below on this page. We have provided partial DNA sequences for three housekeeping promoters, three tissue specific promoters, and for comparison, a number of additional genes of the sort that the promoters would regulate and control.
Time saver! Do the first couple of sequences "by hand," but then you can use the %G~C Content Calculator.
Step 3: Record your results in a table with the %G~C content in the first column, the name of the sequence in the second column, and the type of gene in the third column. Fill the table out as you do your calculations. It should look like this:
| %G~C Content | Sequence Name | Type of Gene |
| 48% | Bone Morphogenetic Protein 5 (BMP5) | Tissue specific promoter |
| ... | ... | ... |
Step 4: Sort the rows of the table, so that the highest %G~C content is at the top, the next highest percentage appears second, the third highest percentage appears third, and so forth. When you are done, the very lowest %G~C content should be at the bottom.
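If you keep your results on a computer, Steps 3 and 4 can be sketched in Python as follows. This is one possible approach, not a required part of the procedure. The sequences are the six promoter sequences listed on this page; the names such as "Housekeeping promoter 1" are placeholders invented for this example, since you should use the real names from your own table.

```python
def gc_percent(sequence: str) -> float:
    """Return the percentage of letters in `sequence` that are G or C."""
    seq = "".join(sequence.split()).upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


# (name, type of gene, sequence). The names are placeholders, not real gene names.
SEQUENCES = [
    ("Housekeeping promoter 1", "Housekeeping promoter",
     "GCGCGCTGGGCGGGCCCGTGGCTATATAAGGCAGGCGCGGGGGTGGCGCG"),
    ("Housekeeping promoter 2", "Housekeeping promoter",
     "CAGGCGCCCGCCCCCGCCCCCGCCGATTAAATGGGCCGGCGGGGCTCAGC"),
    ("Housekeeping promoter 3", "Housekeeping promoter",
     "CGAGCGGCCGCGGCGGCGCCCTATAAAACCCAGCGGCGCGACGCGCCACC"),
    ("Tissue specific promoter 1", "Tissue specific promoter",
     "TCCCAGCAGGGTTGTGCTTACACTACTCTTTAGATCTCTCTTGAAGAGGG"),
    ("Tissue specific promoter 2", "Tissue specific promoter",
     "CAGGAGCCAGGGCTGGGCATAAAAGTCAGGGCAGAGCCATCTATTGCTTA"),
    ("Tissue specific promoter 3", "Tissue specific promoter",
     "TTTGCCCTGGGACGTATTACTACTGTCTTGGTAAAGAGAAATCTTTTGTT"),
]

# Step 3: one row per sequence, with the %G~C content in the first column.
rows = [(gc_percent(seq), name, kind) for name, kind, seq in SEQUENCES]

# Step 4: sort so the highest %G~C content is at the top.
rows.sort(key=lambda row: row[0], reverse=True)

print("| %G~C Content | Sequence Name | Type of Gene |")
for percent, name, kind in rows:
    print(f"| {percent:.0f}% | {name} | {kind} |")
```

Sorting by the first column with `reverse=True` puts the largest percentage first; rows with equal percentages keep the order in which they were entered.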
If you need or want to do a graph for your science fair project (it's almost always a good idea to do so), you can also do a bar chart showing the %G~C content for each gene.
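A quick way to preview such a bar chart before drawing it properly is to print one row of `#` characters per sequence. The sketch below is only a rough preview under that assumption; the helper names `gc_percent` and `bar` and the placeholder sequence names are invented for this example, and a spreadsheet or plotting program would give a nicer chart for a display board.

```python
def gc_percent(sequence: str) -> float:
    """Return the percentage of letters in `sequence` that are G or C."""
    seq = "".join(sequence.split()).upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def bar(percent: float, width: int = 50) -> str:
    """Draw a horizontal bar whose length is proportional to `percent` (0 to 100)."""
    return "#" * round(percent / 100 * width)


# Three sequences from this page; the names are placeholders for illustration.
SEQUENCES = {
    "Worked example": "ATATTTGAAAGCTGTGTCTGTAAACTGATGGCTAACAAAACTAG",
    "Housekeeping promoter 1": "GCGCGCTGGGCGGGCCCGTGGCTATATAAGGCAGGCGCGGGGGTGGCGCG",
    "Tissue specific promoter 1": "TCCCAGCAGGGTTGTGCTTACACTACTCTTTAGATCTCTCTTGAAGAGGG",
}

for name, seq in SEQUENCES.items():
    percent = gc_percent(seq)
    # Left-align the name, then the bar, then the percentage as a label.
    print(f"{name:<28}{bar(percent):<52}{percent:.0f}%")
```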
Step 5: Draw your conclusion. What can you say about housekeeping promoters? How does their %G~C content compare to tissue specific promoters? How does the %G~C content of the housekeeping promoters compare to other genes of the sort that the promoters would regulate and control (bone morphogenetic protein 7, leptin, opsin, and cystic fibrosis genes)?
DNA Sequences for Your Experiment
Here are the sequences to use for your experiment. Note that these are partial sequences for the molecules; the full sequence is generally much longer.
Housekeeping Promoters:
GCGCGCTGGGCGGGCCCGTGGCTATATAAGGCAGGCGCGGGGGTGGCGCG
CAGGCGCCCGCCCCCGCCCCCGCCGATTAAATGGGCCGGCGGGGCTCAGC
CGAGCGGCCGCGGCGGCGCCCTATAAAACCCAGCGGCGCGACGCGCCACC
Tissue Specific Promoters:
TCCCAGCAGGGTTGTGCTTACACTACTCTTTAGATCTCTCTTGAAGAGGG
CAGGAGCCAGGGCTGGGCATAAAAGTCAGGGCAGAGCCATCTATTGCTTA
TTTGCCCTGGGACGTATTACTACTGTCTTGGTAAAGAGAAATCTTTTGTT
Bone Morphogenetic Protein 7 Genes:
Here is a partial DNA sequence from humans, pig, rabbit, and sheep for the Bone Morphogenetic Protein 7 gene (BMP7). You will notice that the sequences are not exactly the same. Bone Morphogenetic Proteins represent signals found in the body that help induce bone growth.
AGAACCGCTCCAAGACGCCCAAGAACCAGGAAGCCCTGCGGATGGCCAACGTGGCAGAG AACAGCAGCAGCGACCAGAGGCAGGCCTGTAAGAAGCACGAGCTGTATGTCAGCTTCCG AGACCTGGGCTGGCAGGACTGGATCATCGCGCCTGAAGGCTACGCCGCCTACTACTGTG AGGGGGAGTGTGCCTTCCC
AGAACCGCTCCAAGACGCCCAAGAACCAGGAAGCCCTGCGGGTGGCCAACGTCGCAGAG AACAGCAGCAGTGACCAGCGGCAGGCCTGTAAGAAGCATGAGCTCTACGTCAGCTTCCG GGACCTGGGCTGGCAAGACTGGATCATCGCGCCCGAAGGCTATGCCGCCTACTACTGCG AGGGGGAGTGCGCCTTCCC
AGAACCGCTCCAAGGCACCCAAGAACCAAGAGGCGCTGCGAGTGGCCAACGTGGCAGAA AACAGCAGCAGTGACCAGCGGCAGGCGTGCAAGAAACACGAACTGTACGTCAGCTTCCG CGACCTGGGCTGGCAGGATTGGATCATTGCCCCGGAAGGCTACGCCGCCTACTACTGCG AGGGAGAGTGCGCCTTCCC
AGAATCGCTCCAAGGCGCCCAAGAACCAAGAAGCCCTGCGGGTGGCCAACGTCGCAGAA AACAGCAGCAGTGACCAGAGGCAGGCATGTAAGAAGCACGAGCTATACGTCAGCTTCCG GGACCTGGGCTGGCAGGATTGGATCATCGCACCCGAAGGCTATGCCGCCTACTACTGCG AGGGGGAGTGCGCCTTCCC
Leptin Genes:
Here is a partial DNA sequence from humans, cow, dog, and horse for Leptin (LEP), a signal found in the body that tells your brain how much fat you have stored away. Leptin may help regulate how hungry you feel. You will notice that the sequences are not exactly the same.
TGTGGCTTTGGCCCTATCTTTTCTATGTCCAAGCTGTGCCCATCCAAAAAGTCCAAGAT GACACCAAAACCCTCATCAAGACAATTGTCACCAGGATCAATGACATTTCACACACGCA GTCAGTCTCCTCCAAACAGAAAGTCACCGGTTTGGACTTCATTCCTGGGCTCCACCCCA TCCTGACCTTATCCAAGATGGACCAGACACTGGCAGTCTACCAACAGATCCTCACCAGT ATGCCTTCCAGAAACGTGATCCAAATATCCAACGACCTGGAGAACCTCCGGGATCTTCT TCACGTGCTGGCCTTCTCTAAGAGCTGCCACTTGCCCTGGGCCAGTGGCCTGGAGACCT TGGACAGCCTGGGGGGTGTCCTGGAAGCTTCAGGCTACTCCACAGAGGTGGTGGCCCTG AGCAGGCTGCAGG
TGTGGCTTTGGCCCTATCTGTCTTACGTGGAGGCTGTGCCCATCCGCAAGGTCCAGGAT GACACCAAAACCCTCATTAAGACAATTGTCACCAGGATCAATGACATCTCACACACGCA GTCCGTCTCCTCCAAACAGAGGGTCACTGGTTTGGACTTCATCCCTGGGCTCCACCCTC TCCTGAGTTTGTCCAAGATGGACCAGACATTGGCGATCTACCAACAGATCCTCACCAGT CTGCCTTCCAGAAATGTGGTCCAAATATCCAATGACCTGGAGAACCTCCGGGACCTTCT CCACCTGCTGGCCGCCTCCAAGAGCTGCCCCTTGCCGCAGGTCAGGGCCCTGGAGAGCT TGGAGAGCTTGGGCGTTGTCCTGGAAGCTTCCCTCTACTCCACCGAGGTGGTGGCCCTG AGCCGGCTGCAGG
TGTGGCTCTGGCCCTATCTGTCCTGTGTTGAAGCTGTGCCAATCCGAAAAGTCCAGGAC GACACCAAACCCCTCATCAAGACGATTGTCGCCAGGATCAATGACATTTCACACACTCA GTCTGTCTCCTCCCAACAGAGGGTCGCTGGTCTGGACTTCATTCCTGGGCTCCAACCAG TCCTGAGTTTGTCCAGGATGGGCCAGACGTTGGCCATATACCAACAGATCCTCAACAGT CTGCATTCCAGAAATGTGGTCCAAATATCTAATGACCTGGAGAACCTCCGGGACCTTCT CCACCTGCTGGCCTCCTCCAAGAGCTGCCCCTTGCCCCGGGCCAGGGGCCTGGAGACCT TTGAGAGCGTGGGCGGCGTCCTGGAAGCCTCACTCTACTCCACAGAAGTGGTGGCTCTG AACAGACTGCAGG
TGTGGCTTTGGCCCTATCTGTTCTTCATTGAAGCTGTGCCCATCCGAAAAGTCCAGGAT GACACCAAAACCCTCATCAAGACGATTGTCACCAGGATCAATGACATTTCACACACGCA GTCAGTCTCCTCCAAACAGAGGGTCACTGGTTTGGACTTCATTCCTGGGCTTCACCCTG TCCTGAGTTTGTCCAAGATGGACCAGACATTGGCAATCTACCAACAGATCCTTACAAGT CTGCCTTCCAGAAATGTGATCCAGATATCTAATGACCTGGAGAACCTCCGGGACCTTCT CCACCTGCTGGCCTCCTCCAAGAGTTGCCCCTTGCCCCAGGCCAGGGGTCTGGAGACCT TGGCGAGCCTGGGCGGTGTCCTGGAAGCTTCACTCTACTCCACAGAGGTGGTAGCCCTG AGCAGGCTGCAGG
Other genes:
ATATTTGAAAGCTGTGTCTGTAAACTGATGGCTAACAAAACTAGGATTTTGGTCACTTC TAAAATGGAACATTTAAAGAAAGCTGACAAAATATTAATTTTGAATGAAGGTAGCAGCT ATTTTTATGGGACATTTTCAGAACTCCAAAATCTACAGCCAGACTTTAGCTCAAAACTC ATGGGATGTGATTCTTTCGACCAATTTAGTGCAGAAAGAAGAAATTCAATCCTAACTGA GACCTTACACCGTTTCTCATTAGAAGGAGATGCTCCTGTCTCCTGGACAGAAACCAATC TTTTAAACAGACTGGAGAGTTTGGGGAAAAAAGGAAGAATTCTATTCTCAATCCAATCA ACTCTATACGAAAATTTTCCATTGTGCAAAAGACTCCCTTACAAATGAATGGCATCGAA GAGGATTCTGATGAGCCTTTAGAGAGAAGGCTGTCCTTAGTACCAGATTCTGAGCAGGG AGAGGCGATACTGCCTCGCATCAGCGTGATCAGCACTGGCCCCACGCTTCAGGCACGAA GGAGGCAGTCTGTCCTGAACCTGATGACACACTCAGTTAACCAAGGTCAGAACATTCAC CGAAAGACAACAGCATCCACACGAAAAGTGTCACTGGCCCCTCAGGCAAACTTGACTGA ACTGGATATATATTCAAGAAGGTTATCTCAAGAAACTGGCTTGGAAATAAGTGAAGAAA TTAACGAAGAAGACTTAAAGG
CCCTTCGAAGGCCCGAATTACCACATCGCTCCCAGATGGGTGTACCACCTCACCAGTGT CTGGATGATCTTTGTGGTCATTGCATCCGTTTTCACAAATGGGCTTGTGCTGGCGGCCA CCATGAAGTTCAAGAAGCTGCGCCACCCGCTGAACTGGATCCTGGTGAACCTGGCGGTC GCTGACCTGGCAGAGACCGTCATCGCCAGCACTATCAGCGTTGTGAACCAGGTCTATGG CTACTTCGTGCTGGGCCACCCTATGTGTGTCCTGGAGGGCTACACCGTCTCCCTGTGTG GGATCACAGGTCTCTGGTCTCTGGCCATCATTTCCTGGGAGAGATGGATGGTGGTCTGC AAGCCCTTTGGCAATGTGAGATTTGATGCCAAGCTGGCCATCGTGGGCATTGCCTTCTC CTGGATCTGGGCTGCTGTGTGGACAGCCCCGCCCATCTTTGGTTGGAGCAGGTACTGGC CCCACGGCCTGAAGACTTCATGCGGCCCAGACGTGTTCAGCGGCAGCTCGTACCCCGGG GTGCAGTCTTACATGATTGTCCTCATGGTCACCTGCTGCATCACCCCACTCAGCATCAT CGTGCTCTGCTACCTCCAAGTGTGGCTGGCCATCCGAGCGGTGGCAAAGCAGCAGAAAG AGTCTGAATCCACCCAGAAGGCAGAGAAGGAAGTGACGCGCATGGTGGTGGTGATGGTC CTGGCATTCTGCTTCTGCTGGGGACCATACGCCTTCTTCGCATGCTTTGCTGCTGCCAA CCCTGGCTA
Variations
For the leptin and bone morphogenetic protein 7 genes you have data for different species. How does the %G~C content of the same gene compare across different animals?
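One way to explore this variation is to compute the %G~C content per species and look at the spread between the highest and lowest value. The sketch below makes two assumptions that you should check: it uses only the opening stretch of each BMP7 sequence from this page (paste in the full sequences for your real experiment), and it assumes the sequences are listed in the same order as the species are named (human, pig, rabbit, sheep). The names `gc_percent` and `BMP7_EXCERPTS` are invented for this example.

```python
def gc_percent(sequence: str) -> float:
    """Return the percentage of letters in `sequence` that are G or C."""
    seq = "".join(sequence.split()).upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


# Opening stretch of each BMP7 sequence listed on this page (excerpts only).
# ASSUMPTION: the sequences appear in the same order as the species are named.
BMP7_EXCERPTS = {
    "human": "AGAACCGCTCCAAGACGCCCAAGAACCAGGAAGCCCTGCGGATGGCCAACGTGGCAGAG",
    "pig": "AGAACCGCTCCAAGACGCCCAAGAACCAGGAAGCCCTGCGGGTGGCCAACGTCGCAGAG",
    "rabbit": "AGAACCGCTCCAAGGCACCCAAGAACCAAGAGGCGCTGCGAGTGGCCAACGTGGCAGAA",
    "sheep": "AGAATCGCTCCAAGGCGCCCAAGAACCAAGAAGCCCTGCGGGTGGCCAACGTCGCAGAA",
}

by_species = {species: gc_percent(seq) for species, seq in BMP7_EXCERPTS.items()}

# The spread is the difference between the highest and lowest percentage.
spread = max(by_species.values()) - min(by_species.values())

for species, percent in by_species.items():
    print(f"{species:>7}: {percent:.1f}%")
print(f" spread: {spread:.1f} percentage points")
```

A small spread would suggest the %G~C content of this gene is similar across the animals compared, while a large spread would suggest it differs; repeating the comparison for leptin lets you see whether the pattern holds for a second gene.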
Credits
| Author: | Shelley Force Aldred, Department of Genetics, Stanford University |
| Sponsor: | Molecular Sciences Institute (MSI), Berkeley, California |
| Management & Editing: | Ken Hess, The Kenneth Lafferty Hess Family Charitable Foundation |
Last edit date: 2005-08-31 13:46:08
If you like this project, you might want to think about career opportunities in Genetics & Genomics.
Many decisions regarding a person's health depend on knowing the patient's genetic risk of having a disease. Genetic counselors help assess those risks, explain them to patients, and counsel individuals and families about their options. Learn more about this career: Genetic Counselor.