
Bioinformatics - The Perfect Marriage of Computer Science & Medicine

Time Required: Average (6-10 days)
Prerequisites: A good knowledge of basic concepts in genetics and good computer database searching skills
Material Availability: Readily available
Cost: Very Low (under $20)
Safety: No issues


Find out the real explanation for why your parents are so weird! Here is a project that lets you explore "the net" to find out why your "DNA blueprint" is so important to health and disease. In this project you will use methods that bioinformatics and biotech scientists use every day to decipher the human genome as they work to diagnose and treat genetic diseases.


In this project, you will use publicly available web-based bioinformatics resources to search for a disease of interest and the single-nucleotide polymorphisms (SNPs) associated with that disease.


Science Buddies would like to thank the following volunteers from Schering-Plough who contributed towards writing this project:

  • Beth Basham, Ph.D.
  • Melissa Bilardello, B.S.
  • Sarah Bodary, Ph.D.
  • Jamie Furneisen, M.S.
  • Jennifer Louten, Ph.D.
  • Sheela Mohan-Peterson, J.D.
  • Venkataraman Sriram, Ph.D.

Edited by Sara Agee, Ph.D., Science Buddies

Cite This Page

MLA Style

Science Buddies Staff. "Bioinformatics - The Perfect Marriage of Computer Science & Medicine" Science Buddies. Science Buddies, 16 Nov. 2013. Web. 15 Sep. 2014 <http://www.sciencebuddies.org/science-fair-projects/project_ideas/Genom_p008.shtml>

APA Style

Science Buddies Staff. (2013, November 16). Bioinformatics - The Perfect Marriage of Computer Science & Medicine. Retrieved September 15, 2014 from http://www.sciencebuddies.org/science-fair-projects/project_ideas/Genom_p008.shtml


Last edit date: 2013-11-16


Biomedical informatics is a broad discipline that encompasses bioinformatics and computational biology. Online bioinformatics resources, such as the database Online Mendelian Inheritance in Man (OMIM), allow bioscience researchers to search up-to-date information on human genes, genetic traits, and disorders. This project will take you step by step through researching a specific disease of interest and finding out how a single base change in a person's DNA could be associated with that disease. This project should take approximately one week to complete.

Scientists are on a constant quest to improve the quality and length of human life. DNA, the blueprint of life, holds hidden clues for this quest. Identifying these clues is like the familiar cliché of “finding a needle in a haystack.” The “haystack” for this project is the public bioinformatics databases, such as OMIM, which contain a multitude of genetic information, and the “needle” is the single-nucleotide polymorphism, or SNP (pronounced “snip”).

A single-nucleotide polymorphism, or SNP, is a small genetic change, or variation, that can occur within a person's DNA sequence. SNPs are the most common type of DNA variation in the human population, and they can be used to study and track inheritance in families. Although more than 99% of the human DNA sequence is the same across the population, small variations such as SNPs can have a major impact on how people respond to disease, environmental factors, and medicines. SNPs are also evolutionarily stable, meaning they change little from generation to generation, which makes them easy to follow in studies. For these reasons, SNPs are of great interest and value for biomedical research, and SNP data are influencing the development of pharmaceutical products and medical diagnostics.

This cartoon depiction of a SNP (Wikipedia contributors, n.d.) shows how DNA strand 1 differs from DNA strand 2 at a single base-pair location (a C/T polymorphism):

Here you can see a single nucleotide polymorphism, or SNP, that results in a small genetic change between sequence 1 and 2 (Wikipedia contributors, n.d.).
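Spotting a SNP like the one in the figure is, at heart, a position-by-position comparison of two sequences. Here is a minimal Python sketch, assuming the two sequences are already aligned to the same length (real tools must first align them). The function name `find_snps` and the example sequences are hypothetical and are not taken from the figure or any database.

```python
def find_snps(seq1: str, seq2: str) -> list[tuple[int, str, str]]:
    """Return (position, base_in_seq1, base_in_seq2) for every mismatch.

    Positions are 1-based, the convention most sequence databases use.
    Assumes the two sequences are already aligned and the same length.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i + 1, a, b)
        for i, (a, b) in enumerate(zip(seq1.upper(), seq2.upper()))
        if a != b
    ]


# Hypothetical example sequences that differ by a single C/T change.
strand1 = "AAGCCTA"
strand2 = "AAGCTTA"
print(find_snps(strand1, strand2))  # [(5, 'C', 'T')]
```

Comparing every position is only practical for short, pre-aligned stretches; databases such as dbSNP store the results of comparisons like this for millions of people.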

DNA, deoxyribonucleic acid, supplies a set of instructions for each living organism. Almost every cell in an organism contains a complete copy of its DNA. Genes are stretches of nucleotide sequence encoded and stored in DNA, and most genes encode a particular protein. DNA is transcribed into mRNA (messenger ribonucleic acid), which is then translated into protein. Proteins are defined by their amino acid sequences, and each amino acid is encoded by three nucleotides called a codon. There are 64 possible codons (three of which are stop signals) but only 20 amino acids, so multiple codons encode the same amino acid. This is known as the degeneracy of the genetic code. Because of this degeneracy, some SNPs do not change the protein sequence; this is called a synonymous change. A SNP that does change the protein sequence is called a non-synonymous change. Finding single nucleotide changes in the human genome may be like “finding a needle in a haystack,” but bioinformatics resources make it possible to do just that.

This codon table shows how the genetic code is converted into a sequence of amino acids that make up a protein (image courtesy of Schering-Plough).
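The codon table and the idea of synonymous versus non-synonymous changes can be checked with a few lines of code. Below is a minimal Python sketch, assuming the standard genetic code (NCBI translation table 1) and a coding sequence that begins at the start codon. The function `classify_snp` and the example sequence are hypothetical; they illustrate the logic and are not part of any database's tools.

```python
from itertools import product

# Standard genetic code, listed with bases in T, C, A, G order.
# "*" marks the three stop codons (TAA, TAG, TGA).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_snp(coding_seq: str, position: int, alt_base: str) -> tuple[str, str, str]:
    """Classify a single-base change in a protein-coding DNA sequence.

    position is 1-based and counted from the first base of the start codon.
    Returns (reference amino acid, alternate amino acid, label).
    """
    seq = coding_seq.upper()
    index = position - 1
    start = index - index % 3  # first base of the affected codon
    ref_codon = seq[start:start + 3]
    offset = index - start     # 0, 1, or 2 within the codon
    alt_codon = ref_codon[:offset] + alt_base.upper() + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    label = "synonymous" if ref_aa == alt_aa else "non-synonymous"
    return ref_aa, alt_aa, label


# Hypothetical coding sequence: ATG CTG GAG TAA  ->  M L E (stop)
cds = "ATGCTGGAGTAA"
print(classify_snp(cds, 6, "A"))  # CTG -> CTA: ('L', 'L', 'synonymous')
print(classify_snp(cds, 8, "T"))  # GAG -> GTG: ('E', 'V', 'non-synonymous')
```

A change to the third base of a codon is often synonymous, which is one reason degeneracy hides many SNPs from the protein. A non-synonymous change that produces `*` (a stop codon) is usually called nonsense rather than missense.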

Variations in human DNA sequences can affect how people develop diseases and respond to medicines. Although most SNPs do not cause disease directly, they can help determine the likelihood that someone will develop a particular disease. Computational biology, the actual process of analyzing and interpreting data, is combined with bioinformatics in a technique called database mining: generating hypotheses about the function or structure of a gene or protein of interest by identifying similar or dissimilar DNA sequences. Since the completion of the Human Genome Project in April 2003, vast amounts of genomic data have been available for database mining. The International HapMap Project is designed to provide researchers with the HapMap, a catalog of common genetic variants that occur in human beings, along with a description of the variants and where they are located in our DNA. This catalog provides information that researchers need to link genetic variants to the risk of specific illnesses.

How do scientists use computers to mine biological data to study genetics and disease association? In this project, you will use the World Wide Web to access free bioinformatics resources to search for a disease of interest, identify SNP(s) associated with that disease, and make a hypothesis about the effect of the SNP(s). These public databases provide a vault of information that can be searched in many ways. We have provided one example; however, you may use your own method. With millions of SNPs now available, scientists believe that exciting advances in medicine are in our near future. Now it is your turn to mine databases for SNPs and, based on your research, make a hypothesis about their effect on the human phenotype.

Terms and Concepts

To do this type of experiment you should know what the following terms mean. Have an adult help you search the internet, or take you to your local library to find out more!

  • Allele - Alleles are different forms of the same gene. Allelic variation in a gene arises through mutation of the DNA sequence defining the gene and may or may not be associated with trait variation (e.g., height, eye color).
  • Bioinformatics - The collection, classification, storage, and analysis of large volumes of biological information (e.g., genomic, metabolomic, proteomic) using computers.
  • Codon - Three bases in a DNA or RNA sequence which specify a single amino acid.
  • DNA (Deoxyribonucleic Acid) - DNA is the chemical that forms a basic molecular code for how a living being should operate. DNA is the biological heredity material passed down from parent to child. Four bases called adenine (A), guanine (G), cytosine (C), and thymine (T) constitute DNA. It is present in the nucleus of almost all cells in an organism.
  • Exons - Exons are the sequences of a gene that carry information for protein synthesis; they are transcribed into messenger RNA, which in turn is translated into at least a portion of a protein.
  • Gene - A DNA sequence that contains a code that can be translated into a particular protein. For example, the CFTR gene has the information that is necessary for a cell to make the CFTR protein.
  • Genome - The DNA sequence of all of an organism’s chromosomes (e.g., the human genome).
  • Genomics - The study of an organism's entire genome, such as the human genome. Genomics explores not only the actions of single genes, but also the interactions of multiple genes with each other and with the environment.
  • Genotype - People inherit one allele for a gene from each parent such that they have two copies of each gene. The pair of alleles defines a person’s genotype. For a gene that has two alleles in the population (e.g., an A allele and a G allele), there are three possible genotypes—AA, AG, and GG.
  • HapMap - A partnership of scientists and funding agencies from Canada, China, Japan, Nigeria, the United Kingdom, and the United States to develop a public resource that will help researchers find genes associated with human disease and response to pharmaceuticals. HapMap's goal is to ultimately develop a haplotype map of the human genome and identify haplotype blocks.
  • Homozygous - A genotype in which the two copies of the gene that determine a particular trait are the same.
  • Heterozygous - Possessing two different forms of a particular gene, one inherited from each parent.
  • Introns - Introns are segments of a gene, situated between exons, that are removed from messenger RNA before translation; they do not code for proteins or protein fragments.
  • Junk DNA/Non-Coding Region - A region of the genome where the DNA has no known function (i.e., it does not code for a protein, regulatory sequence, or other functional elements). These regions usually consist of repeating DNA sequences. The majority of the human genome has no known function; only 2 percent to 5 percent of the DNA sequence codes for genes.
  • Locus - The position of a gene on a chromosome. This term is a classical genetic concept used to understand gene order, gene distance, and gene function before gene and genomic DNA sequences were known.
  • mRNA - Messenger RNA, or a single-stranded molecule of ribonucleic acid that is transcribed from the DNA and then translated into protein.
  • Mutation - A mutation is a change in a DNA sequence. If the mutation occurs during the development of an egg or a sperm (i.e., gametes), it becomes a heritable mutation. If the mutation occurs in any other body cell (i.e., part of the soma), it is called a somatic mutation and is not heritable. Somatic mutations can cause cancer. Mutations can be of many different types: substitutions, deletions, or insertions. Mutations in the DNA can be synonymous, i.e., having no effect on the translated protein, or non-synonymous, causing amino acid changes in the translated protein.
  • RNA (Ribonucleic Acid) - A chemical found in the nucleus and cytoplasm of cells; it transcribes the protein-coding instructions of DNA into a code that the protein-building ribosomes of a cell can understand. The chemical structure of RNA is similar to DNA—RNA also contains adenine (A), guanine (G), and cytosine (C), but instead of thymine (T), RNA contains uracil (U).
  • SNPs (Single Nucleotide Polymorphisms) - There are currently estimated to be about 6 million positions in the human genome where a mutation occurred at a single nucleotide (A, T, C, or G) and both of its alleles now have a frequency greater than 1 percent in the population. These SNPs are important for studies of genetic or genomic associations with disease because the alleles are common in the population.
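The genotype and SNP definitions above come down to simple arithmetic. Here is a minimal Python sketch, assuming a two-allele (A/G) SNP and made-up genotype counts. It turns genotype counts into allele frequencies, checks the 1 percent threshold from the SNP definition, and computes expected heterozygosity as 1 minus the sum of the squared allele frequencies. That is one standard definition; the heterozygosity value a database such as dbSNP reports may be calculated differently. The function names are hypothetical.

```python
def allele_frequencies(n_aa: int, n_ag: int, n_gg: int) -> tuple[float, float]:
    """Return (frequency of A, frequency of G) from genotype counts.

    Each person carries two alleles: AA contributes two A alleles,
    AG one A and one G, and GG two G alleles.
    """
    total_alleles = 2 * (n_aa + n_ag + n_gg)
    freq_a = (2 * n_aa + n_ag) / total_alleles
    return freq_a, 1 - freq_a


def expected_heterozygosity(freqs: list[float]) -> float:
    """Chance that two randomly drawn alleles differ: 1 - sum(p_i ** 2)."""
    return 1 - sum(p * p for p in freqs)


# Hypothetical genotype counts for 100 people
p_a, p_g = allele_frequencies(n_aa=49, n_ag=42, n_gg=9)
minor = min(p_a, p_g)
print(f"A: {p_a:.2f}, G: {p_g:.2f}")                               # A: 0.70, G: 0.30
print("common SNP (minor allele > 1%):", minor > 0.01)              # True
print(f"heterozygosity: {expected_heterozygosity([p_a, p_g]):.2f}")  # 0.42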


This project is based on research that provides often inconclusive but strongly correlative evidence associating SNPs with disease risk. The idea is that, with information about the complete human genome available, we would be able to predict an individual's risk of developing a disease or identify individuals with specific qualitative traits (‘smart’ genes, ‘criminal’ genes, ‘intuition’ genes, etc.). One outcome of such advances would be personalized medicine, in which each individual could be treated with a custom-made drug or even receive preventive therapy. On the flip side, however, ethical concerns about individual human rights need to be addressed (the debate dramatized in the movie Minority Report).

Here are some questions that you will be thinking about while doing this project:

  • What is FASTA format?
  • If two genes are homologous, are they similar?
  • What are different types of mutations and how do they affect protein function?
  • What is the probability of a single base mutation affecting protein function?
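FASTA format, which you will use in step 13 of the procedure, is plain text: a header line that starts with `>` followed by one or more lines of sequence. Here is a minimal Python sketch of reading it, assuming a well-formed file. `parse_fasta` is a hypothetical helper written for illustration, and the example records are made up.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA-formatted text into {header: sequence}.

    Header lines start with '>'; every following line up to the next
    header is joined into that record's sequence. Blank lines are skipped.
    """
    records: dict[str, str] = {}
    header = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


# Hypothetical two-record example
example = """>seq1 example protein fragment
MKTAYIAKQR
QISFVKSHFS
>seq2 another fragment
MLE
"""
for name, seq in parse_fasta(example).items():
    print(name, len(seq), seq)
```

Because the format is this simple, the sequence you copy from NCBI in FASTA form can be pasted directly into tools such as SMART.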


  • Here are some websites that you will need to complete this project:
  • Here are some useful textbooks with background information for you to review:
    • Watson, J.D., et al. (eds), 1987. Molecular Biology of the Gene, 4th Ed., Menlo Park, CA: Benjamin/Cummings Publishing Co., Inc.
    • Wood, E.J., Smith, C.A., Pickering, W.R. (eds), 1997. Life Chemistry and Molecular Biology, Portland, OR: Portland Press.
    • Drlica, K., 1996. Understanding DNA and Gene Cloning: A Guide for the Curious, 3rd Ed., New York, NY: John Wiley & Sons.

Materials and Equipment

  • Computer with Internet access
  • Printer


Experimental Procedure

Searching for your disease

The OMIM database in NCBI is a catalog of human genes and genetic disorders authored and edited by Dr. Victor A. McKusick and his colleagues at Johns Hopkins and elsewhere, and developed for the World Wide Web by NCBI, the National Center for Biotechnology Information. The database contains textual information and references. It also contains copious links to MEDLINE and sequence records in the Entrez system, and links to additional related resources at NCBI and elsewhere.
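The same kind of OMIM search can also be run from a program through NCBI's E-utilities web service. Here is a minimal Python sketch, assuming the `esearch` endpoint with `db=omim` and its JSON output (`esearchresult` → `idlist`). The helper names are hypothetical, the disease term is only an example, and NCBI limits how many requests a user may send per second.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities "esearch" endpoint, a programmatic alternative to the web search form.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_search_url(database: str, term: str, max_results: int = 20) -> str:
    """Build an esearch URL that returns matching record IDs as JSON."""
    params = {"db": database, "term": term, "retmode": "json", "retmax": max_results}
    return f"{ESEARCH_URL}?{urlencode(params)}"


def search_ids(database: str, term: str, max_results: int = 20) -> list[str]:
    """Run the search over the network and return the list of record IDs."""
    with urlopen(build_search_url(database, term, max_results)) as response:
        data = json.load(response)
    return data["esearchresult"]["idlist"]


if __name__ == "__main__":
    # Example disease term; substitute your own disease of interest.
    print(build_search_url("omim", "cystic fibrosis"))
    # Requires Internet access:
    # print(search_ids("omim", "cystic fibrosis"))
```

Each returned ID is an OMIM entry number that you could look up in the web interface, so this is a way to automate step 1 below rather than replace it.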

  1. Search for the disease of your choice in OMIM at http://www.ncbi.nlm.nih.gov/sites/entrez?db=OMIM


  2. The results list the genes associated with your disease. Choose one to investigate further.


  3. Click Links and choose GeneView in dbSNP.


  4. This screen lists all of the SNPs associated with the gene you chose above. Its columns are:

    1. Region - Location on the gene where the SNP is found. (Make sure you pick an exon, since exons are the parts that code for protein.)
    2. Contig Position - Coordinates of the SNP on the contig, a stretch of assembled genomic sequence.
    3. mRNA Position - Coordinates for where the SNP is found in the mRNA.
    4. dbSNP rs# cluster id - Unique identifier in the database.
    5. Heterozygosity - Measure of genetic variation in a population.
    6. Validation - Other sources of information supporting this SNP.
    7. 3D - SNP mapped on a 3D structure.
    8. OMIM - Links you back to the OMIM page.
    9. Function - The effect of the SNP on the protein sequence.
    10. dbSNP allele - The alleles (bases) found at the SNP position.
    11. Protein residue - Amino acid change.
    12. Amino acid position - The coordinates within the protein sequence.

  5. Choose a SNP in an exon whose function is missense (a change that results in a different amino acid).



  6. Click on your SNP of interest. This leads you to a screen with information on the SNP such as sequence, location of the mutation and population diversity.




  7. See if your SNP has an impact on protein structure and function by going to www.snps3d.org




  8. If your SNP is shown in red in the model, it is predicted to have a damaging effect on the protein. If it is shown in yellow, it is not predicted to be damaging and is likely harmless.


  9. Now try to establish a sequence-structure-function relationship for your SNP. First, search for the gene in OMIM.


  10. Select your gene from the results.



  11. Under the Entrez Gene category in the left sidebar, follow the P (Protein) link to get to the protein sequence.


  12. Follow the link for the protein.


  13. Under the Display drop-down menu, change the display to FASTA.


  14. Copy the protein sequence. Go to the SMART site at http://smart.embl-heidelberg.de/smart/set_mode.cgi?NORMAL=1 and paste the protein sequence into the Sequence box, then click Sequence SMART.



  15. Hovering your mouse over a domain shows more information about that protein domain. Find the domain where your mutation is located. In this example, it is the first AAA domain.


  16. Read the description of the domain and the InterPro abstract. Assuming that the SNP results in a mutation in this domain, what could the biochemical effects of this mutation be? How might these effects relate to the disease?
  17. Search PubMed for articles on the effects of the mutation.
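Steps 4 and 15 connect two coordinate systems: a base position in the coding sequence and an amino acid position in the protein, which you then match against the domain boundaries shown by SMART. Here is a minimal Python sketch, assuming the base position is counted from the first base of the start codon (mRNA coordinates usually include an untranslated region before the start codon, which would have to be subtracted first). The function names and domain boundaries are hypothetical.

```python
from typing import Optional


def codon_number(cds_position: int) -> int:
    """Return the 1-based amino acid (codon) number for a 1-based coding position."""
    return (cds_position - 1) // 3 + 1


def find_domain(aa_position: int, domains: list[tuple[str, int, int]]) -> Optional[str]:
    """Return the name of the first domain whose [start, end] range contains aa_position."""
    for name, start, end in domains:
        if start <= aa_position <= end:
            return name
    return None


# Hypothetical domain boundaries, as you might note them from a SMART result
domains = [("AAA (first)", 150, 330), ("AAA (second)", 420, 600)]
aa = codon_number(700)               # coding position 700 falls in codon 234
print(aa, find_domain(aa, domains))  # 234 AAA (first)
```

If `find_domain` returns `None`, the SNP falls between annotated domains, which is itself useful when forming a hypothesis about its effect.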





  • Variation 1: Environmental Factor - Gene Interaction:
    Identify how certain environmental factors may affect genes and their association with diseases by using the Genetic Association Database (http://geneticassociationdb.nih.gov/cgi-bin/index.cgi). NOTE: This database is open-access and allows any user to input data. Use caution with the data and select only data that has been endorsed by a ‘Gene Expert’ or ‘Disease Expert’.
    1. Click on ‘Environmental Factor Gene Interaction’ link on the left menu of the website. On the top of the page, click on the link to see a complete list of environmental factors.
    2. Choose an environmental factor of interest (e.g., tobacco smoke) by clicking on it.
    3. You can see entries that describe gene association with specific diseases.
    4. Are you able to identify any SNPs in this category? Follow links to research more for each category.
  • Variation 2: Multi-Species Association / Conserved SNPs:
    Using the databases referenced in this project, try to identify gene mutations that are common to multiple species. If a mutation occurs in multiple species and can be matched with the same phenotype across those species, it lends support to your hypothesis. Highly conserved regions (across species) are more likely to be functionally important.
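One simple way to put a number on "highly conserved" is to score each position of a multi-species alignment by the fraction of species that share the most common residue there. Here is a minimal Python sketch, assuming the sequences have already been aligned to the same length. This majority score is only one of several conservation measures, and the function name, species labels, and sequences are hypothetical.

```python
from collections import Counter


def column_conservation(aligned: dict[str, str]) -> list[float]:
    """Fraction of sequences sharing the most common residue at each column.

    Assumes all sequences are already aligned to the same length.
    A score of 1.0 means the position is identical in every species.
    Gap characters such as '-' are counted like any other residue.
    """
    seqs = list(aligned.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    scores = []
    for i in range(length):
        counts = Counter(s[i] for s in seqs)
        scores.append(counts.most_common(1)[0][1] / len(seqs))
    return scores


# Hypothetical aligned protein fragments from three species
alignment = {
    "species_1": "MKELV",
    "species_2": "MKDLV",
    "species_3": "MKELI",
}
for pos, score in enumerate(column_conservation(alignment), start=1):
    print(pos, round(score, 2))
```

A SNP that changes a position scoring 1.0 across many species would, by the reasoning above, be a stronger candidate for a functional effect than one at a variable position.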
