Bioinformatics - The Perfect Marriage of Computer Science & Medicine

Abstract

Find out the real explanation for why your parents are so weird! Here is a science project that lets you explore the internet to find out why your "DNA blueprint" is so important to health and disease. In this science project, you will use methods that bioinformatics and biotech scientists perform on a daily basis to decipher the human genome in their efforts to diagnose and treat genetic diseases.

Summary

Areas of Science
Difficulty
 
Time Required
Average (6-10 days)
Prerequisites
A good knowledge of basic concepts in genetics and good computer database searching skills
Material Availability
Readily available
Cost
Very Low (under $20)
Safety
No issues
Credits

Science Buddies would like to thank the following volunteers from Schering-Plough who contributed towards writing this project:

  • Beth Basham Ph.D.
  • Melissa Bilardello, B.S.
  • Sarah Bodary, Ph.D.
  • Jamie Furneisen, M.S.
  • Jennifer Louten, Ph.D.
  • Sheela Mohan-Peterson, J.D.
  • Venkataraman Sriram, Ph.D.

Edited by Sara Agee, PhD, Teisha Rowland, PhD, Science Buddies, and Svenja Lohner, PhD, Science Buddies

Objective

Use publicly available web-based bioinformatics resources to search for a disease of interest and SNPs, single-nucleotide polymorphisms, associated with that disease.

Introduction

Biomedical Informatics is a broad discipline that encompasses bioinformatics and computational biology. Online bioinformatics resources, such as the database Online Mendelian Inheritance in Man, or OMIM, allow bioscience researchers to search up-to-date information on human genes, genetic traits and disorders. This project will take you through the step-by-step process of researching a specific disease of interest and how a single base change in one's DNA could be associated with that disease. This project should take approximately one week to complete.

Scientists are on a constant quest to improve the quality and length of human life. DNA, the blueprint of life, holds hidden clues for this quest. Identifying these clues is like "finding a needle in a haystack." The "haystack" for this project is the set of public bioinformatics databases, such as OMIM, which contain a multitude of genetic information, and the "needle" is the SNP (pronounced "snip"), or single-nucleotide polymorphism.

A single-nucleotide polymorphism, or SNP, is a small genetic change, or variation, that can occur within a person's DNA sequence. SNPs are the most frequent type of DNA variation found in the human population. These variations can be used to study and track inheritance in families. Although more than 99% of human DNA sequences are the same across the population, small variations in DNA sequence, such as SNPs, can have a major impact on how humans respond to disease, environmental factors, and medicines. SNPs are also evolutionarily stable, meaning they do not change much from generation to generation. For these reasons, SNPs are of great interest and value for biomedical research, and SNP data already influence the development of pharmaceutical products and medical diagnostics.

The cartoon depiction of a SNP in Figure 1, below, shows how DNA strand 1 differs from DNA strand 2 at a single base-pair location (a C/T polymorphism).

Diagram of Single Nucleotide Polymorphism occurring in a strand of DNA

Drawing of two strands of DNA appear identical except for a single base pair that has been changed due to single nucleotide polymorphism. The top DNA strand contains a C and G base pair that is altered to an A and T base pair in the bottom DNA strand.


Figure 1. Here you can see a single nucleotide polymorphism, or SNP, that results in a small genetic change between sequence 1 and 2 (Wikipedia contributors, n.d.).
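
To make the idea of a single-base difference concrete, here is a minimal Python sketch that compares two already-aligned DNA sequences and reports where they differ. The two sequences and the function name are made up for illustration; they are not taken from Figure 1 or from any database.

```python
def find_single_base_differences(seq1, seq2):
    """Return (1-based position, base in seq1, base in seq2) for each mismatch."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be the same length (already aligned)")
    return [
        (i + 1, a, b)
        for i, (a, b) in enumerate(zip(seq1.upper(), seq2.upper()))
        if a != b
    ]


# Two made-up sequences that differ at exactly one position.
strand_1 = "AAGCCTA"
strand_2 = "AAGCTTA"
print(find_single_base_differences(strand_1, strand_2))  # [(5, 'C', 'T')]
```

Real variant databases do far more than this (alignment, population frequencies, annotation), but the core observation is the same: one position, two possible bases.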

DNA, or deoxyribonucleic acid, supplies a set of instructions for each living organism. Nearly every cell in an organism contains a complete copy of that organism's DNA. Genes are sets of nucleotide sequences encoded and stored in DNA. Each protein-coding gene encodes a certain protein. DNA is transcribed into mRNA, or messenger ribonucleic acid, which is then translated into protein. Proteins are defined by their amino acid sequences. A single amino acid is encoded by three nucleotides, called a codon. With four possible nucleotides at each of the three positions, there are 4 × 4 × 4 = 64 possible codons but only 20 amino acids, as shown in Figure 2, below, so multiple codons encode the same amino acid. This is known as the degeneracy of the genetic code. Because of this degeneracy, some SNPs do not change the protein sequence; this is called a synonymous change. A SNP that does change the protein sequence is termed a non-synonymous change. Finding single nucleotide changes in the human genome may be like "finding a needle in a haystack," but bioinformatics resources make it possible to do just that.

A genome codon table

The genome codon table contains four rows and four columns (each labeled with the four nucleotides found in RNA). Each cell in the table contains four different combinations of three possible nucleotides and the resulting amino acid that would be produced.


Figure 2. This codon table shows how the genetic code is converted into a sequence of amino acids that make up a protein (image courtesy of Schering-Plough).
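
The degeneracy of the genetic code can be explored with a short Python sketch. It builds the standard codon table (written with DNA letters, so T appears where the RNA table in Figure 2 shows U) and labels a single-base change in a codon as synonymous, missense, or nonsense. The function name and the example codons are illustrative choices, not part of the original project.

```python
from itertools import product

# Standard genetic code, with codons listed in T, C, A, G order; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_base_change(codon, position, new_base):
    """Classify a change at a 0-based position (0, 1, or 2) within one codon."""
    codon = codon.upper().replace("U", "T")
    new_base = new_base.upper().replace("U", "T")
    mutated = codon[:position] + new_base + codon[position + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutated]
    if before == after:
        return "synonymous"
    if before == "*":
        return "stop lost"
    if after == "*":
        return "nonsense"
    return "missense"


print(len(CODON_TABLE))                      # 64 codons
print(len(set(AMINO_ACIDS) - {"*"}))         # 20 amino acids
print(classify_base_change("CTT", 2, "C"))   # Leu -> Leu: synonymous
print(classify_base_change("ATG", 0, "G"))   # Met -> Val: missense
print(classify_base_change("TGG", 2, "A"))   # Trp -> stop: nonsense
```

Missense and nonsense changes are both non-synonymous; the sketch separates them because the databases used later in this project list them as different molecular consequences.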

Variations in the DNA sequences of humans can affect how humans develop diseases and respond to medicines. Computational biology, the actual process of analyzing and interpreting data, is combined with bioinformatics in a technique called database mining. With the completion of the Human Genome Project in 2003, vast amounts of genomic data have been made available for database mining, the process of generating hypotheses regarding the function or structure of a gene or protein of interest by identifying similar or dissimilar sequences in DNA. The International HapMap Project was designed to provide researchers with the HapMap, a catalog of common genetic variants that occur in human beings, along with a description of the variants and where they are located in our DNA. This catalog provides information that researchers need to link genetic variants to the risk for specific illnesses.

How do scientists use computers to mine biological data and study the association between genetics and disease? In this science project, you will use the World Wide Web to access free bioinformatics resources to search for a disease of interest, identify SNP(s) associated with that disease, and make a hypothesis regarding the effect of the SNP(s). These public databases provide a vault of information that can be searched in many ways. We have provided one example; however, you may use your own method. With millions of SNPs now available, scientists believe that exciting advances in medicine are in our near future. It is now your turn to mine databases for SNPs and, based on your research, make a hypothesis about their effect on the human phenotype.

Terms and Concepts

To do this type of experiment you should know what the following terms mean. Have an adult help you search the internet, or take you to your local library to find out more!

Questions

This science project is based on research that provides often inconclusive but strongly correlative evidence associating SNPs with the risk of disease. The notion is that, with information about the complete human genome available, we would be able to predict an individual's risk of developing a disease or identify individuals with specific qualitative traits ('smart' genes, 'criminal' genes, 'intuition' genes, etc.). One outcome of such an advance would be personalized medicine, in which each individual could be treated with a custom-made drug or even receive preventive therapy. On the flip side, ethical concerns need to be addressed with respect to individual human rights (the debate raised by the movie Minority Report).

Here are some questions that you will be thinking about while doing this science project:

Bibliography

Here are some useful resources on SNPs and genetics that may help you complete this project:

  • University of Utah, Health Sciences. (n.d.). Making SNPs Make Sense. Learn.Genetics. Genetic Science Learning Center. Retrieved October 8, 2014.
  • Oak Ridge National Laboratory (ORNL). (February 2011). The Gene Gateway Workbook. U.S. Department of Energy Office of Biological and Environmental Research. Retrieved October 8, 2014.
  • Moult, Y., Melamud, Z. (2005). SNPs3D. University of Maryland. Retrieved October 8, 2014.

Here are some useful textbooks with background information for you to review:

  • Wood, E.J., Smith, C.A., Pickering, W.R. (eds), 1997. Life Chemistry and Molecular Biology, Portland, OR: Portland Press.
  • Drlica, K., 1996. Understanding DNA and Gene Cloning: A Guide for the Curious, 3rd Ed., New York, NY: John Wiley & Sons.

Materials and Equipment

Experimental Procedure

Before you start with this project, it might be helpful to familiarize yourself with the bioinformatic tools and websites that you are going to use. You can watch a video about the NCBI website and databases below.
How to Use the NCBI's Bioinformatics Tools and Databases

Searching for your disease

The OMIM database in NCBI is a catalog of human genes and genetic disorders authored and edited by Dr. Victor A. McKusick and his colleagues at Johns Hopkins and elsewhere, and developed for the World Wide Web by NCBI, the National Center for Biotechnology Information. The database contains textual information and references. It also contains copious links to MEDLINE and sequence records in the Entrez system, and links to additional related resources at NCBI and elsewhere.

  1. Search for the disease of your choice in the OMIM database, shown in Figure 3, below.
    1. For the purpose of simplifying directions, cystic fibrosis will be used as an example.
Screenshot of the homepage on the website ncbi.nlm.nih.gov/omim

Screenshot of the homepage of the ncbi.nlm.nih.gov/omim website. A search bar appears at the top and quick links to resources and gene tools are located at the bottom of the page.


Figure 3. The OMIM database has information on human genes and genetic disorders.
  2. The search will list clinical results associated with your disease, as shown in Figure 4, below. These results will include genes as well as descriptions of related medical conditions. Click on the different results to see what they are. Find a result that is a gene and continue on to step 3.
    1. Tip: Gene entries are marked with an asterisk (*) in the list. Results that are genes will list a "HGNC Approved Gene Symbol" near the top of the webpage, as shown in Figure 5, below.
Screenshot of search results on the website ncbi.nlm.nih.gov/omim

Screenshot of a genetic disease search result for cystic fibrosis on the website ncbi.nlm.nih.gov/omim. A list of genes as well as related medical conditions appear in the results list.


Figure 4. When you search for a disease in the OMIM database, you will get many clinical results associated with the disease.
  3. The gene webpage will have lots of information on the disease-related gene, as shown in Figure 5, below. In your lab notebook, write down the "HGNC Approved Gene Symbol" for your gene.
    1. For example, for the cystic fibrosis gene shown in Figure 5, below, this would be CFTR.
Screenshot of the gene page for cystic fibrosis on the OMIM website

Screenshot of the gene information page for cystic fibrosis on the website ncbi.nlm.nih.gov/omim. The specific gene appears at the top of the page with an abbreviation and full name written out. Near the top of the results page, under the gene heading, there is a smaller heading that states CFTR is a HGNC approved gene symbol. This symbol is specific to the cystic fibrosis gene.


Figure 5. Clicking on a gene result will bring you to a webpage with lots of information on that specific gene.
  4. On the right side of the gene webpage, under "External Links," click on the "Gene Info" heading, highlighted in orange in Figure 6, below. Then click on "NCBI Gene" from the dropdown menu. This will take you to the NCBI Gene database, which has additional information on the gene, as shown in Figure 7, below.
Screenshot of the gene page for cystic fibrosis on the OMIM website

Screenshot of the gene information page for cystic fibrosis on the website ncbi.nlm.nih.gov/omim. The specific gene appears at the top of the page with an abbreviation and full name written out. General information about the gene is displayed in the middle of the page (including gene-phenotype relationships). A sidebar with additional links and resources can be found on the right side of the page. A menu option labeled 'Gene Info' can be found in the sidebar and contains a link labeled 'NCBI Gene'.


Figure 6. Clicking on "Gene Info," highlighted in orange here, will show you other databases with additional information on the gene. Clicking on "NCBI Gene" in this dropdown menu will take you to the NCBI Gene database.


Screenshot of the gene page for CFTR on the ncbi.nlm.nih.gov website

Screenshot of the gene page for CFTR on the website ncbi.nlm.nih.gov/gene shows general information about the gene in a summary section located in the center of the page. A sidebar on the right side of the page contains a table of contents and links to related information.


Figure 7. The NCBI Gene database, shown here, will have additional information on the gene you found using the OMIM database.
  5. On the right side of the gene webpage, under "Related information," scroll down until you find "Variation Viewer" (as shown in Figure 8, below) and click on it. This will bring you to a webpage that lists all of the variants associated with the gene you chose above, as shown in Figure 9, below.
Screenshot of the gene page for CFTR on the ncbi.nlm.nih.gov website

Screenshot of the gene information page for CFTR on the website ncbi.nlm.nih.gov/gene shows more information about the CFTR gene. A link labeled 'Variation Viewer' can be found in the sidebar on the right of the page under the heading 'Related Information'.


Figure 8. Clicking on "Variation Viewer," highlighted in orange here, will take you to a webpage on the variants associated with the gene you chose.


Screenshot of an allele chart for the CFTR gene shown in the variation viewer on the ncbi.nlm.nih.gov website

Screenshot of an allele chart on the ncbi.nlm.nih.gov website. The variation viewer window displays an allele chart for a given gene at the top of the page. At the bottom-left of the page filters can be applied to the chart to find specific alleles, such as ones that could be potentially pathogenic. Directly underneath the chart is a list of variants of alleles that display the variation type and location.


Figure 9. Clicking on "Variation Viewer" takes you to a table listing different alleles, or alternative forms that occur through mutation of the DNA, for your gene. Each row is a different allele of the gene. You can filter these alleles by their "Most severe clinical significance" (circled in blue), sort by "Variant type" (circled in green), or find more information about them by clicking on their "Variant ID" (circled in red).
  6. Each row of data in the results table on this page, shown on the lower half of Figure 9, lists a different variant for the gene you just searched for. To navigate the large table of gene variants, here is a guide to the relevant columns in the variant table:
    1. Variant ID: Unique identifier for the gene variant.
    2. Location: Genomic location of the variant.
    3. Variant type: The type of variant, e.g. single nucleotide variant.
    4. Molecular consequence: Consequences of the variation that can be computed from genes and other features annotated on the genome, e.g. nonsense, missense.
    5. Publication: Number of publications associated with the variant. Click on the linked value to view the publications in PubMed.
  7. On the left side of the page you can choose different options to filter the data. As you are interested in a genetic disease, click on "Pathogenic" (circled in blue in Figure 9) to filter the variants by this criterion.
  8. You also want to filter the variants according to their variant type. Check the box that says "Single nucleotide variant" (which means this variant type has a single nucleotide change), circled in green in Figure 9.
  9. Once you have applied all your filter criteria (variant type, clinical significance, etc.), click on the arrow to the left of the variant ID (circled in yellow in Figure 9) to open a drop-down window that provides more information on this specific gene variant. Here you will find more allele information, such as the "Transcript change," which lists what the DNA mutation is, or the "Protein change" that results from the mutation.
  10. Find a SNP variant that has its "Molecular consequence" listed as "missense." Click on the "Variant ID" given for that SNP, as shown in the column circled in red in Figure 9, above.
  11. This will take you to a webpage with information on the SNP variant, such as the sequence it occurs in, the location of the mutation, and other resources, as shown in Figure 10, below. Scroll down and review each tab to view and investigate all of the available information.
Screenshot of the SNPs for the CFTR gene in the NCBI SNP database

Screenshot of a SNP cluster report page on the website ncbi.nlm.nih.gov/snp. A cluster ID is shown at the top of the page along with tables of information about the SNP below.


Figure 10. Clicking on a SNP Variant ID will take you to a webpage with a lot of information on the SNP. Explore the webpage to find out more about the SNP.
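
The filtering done by clicking in Variation Viewer (pathogenic, single nucleotide variant, missense) can also be written out as a small Python sketch. The rows below are invented placeholders, not real CFTR variants, and the field names are assumptions that simply mirror the column headings described above; this is not how NCBI exposes its data.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    variant_id: str              # placeholder IDs, not real rsIDs
    variant_type: str
    clinical_significance: str
    molecular_consequence: str


def select_candidates(variants):
    """Keep pathogenic single nucleotide variants whose consequence is missense."""
    return [
        v for v in variants
        if v.variant_type.lower() == "single nucleotide variant"
        and v.clinical_significance.lower() == "pathogenic"
        and "missense" in v.molecular_consequence.lower()
    ]


# Made-up rows standing in for the variant table.
table = [
    Variant("example-1", "single nucleotide variant", "pathogenic", "missense variant"),
    Variant("example-2", "deletion", "pathogenic", "frameshift variant"),
    Variant("example-3", "single nucleotide variant", "benign", "synonymous variant"),
    Variant("example-4", "single nucleotide variant", "pathogenic", "nonsense"),
]

for variant in select_candidates(table):
    print(variant.variant_id)  # prints only "example-1"
```

Writing the criteria down this way makes it easier to record in your lab notebook exactly which filters you applied.
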
  12. In your lab notebook, write down the SNP rsID. This should be at the top of the webpage, as shown circled in green in Figure 10, above.
    1. For example, in Figure 10, above, the rsID is rs397508328.
  13. Click on the "Variant Details" tab to review the information provided there. In your lab notebook, write down the name that appears in the "Change" column of the "Gene" section that starts with "NP_000483," as shown circled in red in Figure 10, above.
    1. The right part of this name should include the amino acid change and the location of the change, for example, "Met1Val."
  14. To see if your SNP has an impact on protein structure and function, go to the SNPs3D website and search for the gene associated with your disease of interest, as shown in Figure 11, below.
Screenshot of the homepage on the website snps3d.org

Screenshot of the homepage of the website snps3d.org includes three search boxes allowing users to search the SNP database by gene, SNP ID, or disease.


Figure 11. The SNPs3D website has information on the impact of a SNP on protein structure and function, as well as other SNP-related information.
  15. You should see a gene webpage similar to Figure 12, below.
Screenshot of the SNPs3D gene page for CFTR

Screenshot of the results page on the website snps3d.org shows a rendered model of the CFTR gene on the left side of the page. The right side of the page contains information about the gene and SNP.


Figure 12. This is the SNPs3D webpage for a gene (CFTR, associated with cystic fibrosis).
  16. On the gene webpage, go to the "SNP information" section and search for your SNP of interest.
    1. You can find your SNP of interest by entering the amino acid change of your SNP into the search box (circled in red in Figure 12). For example, enter "Met1Val," "Leu6Arg," or another amino acid change.
Screenshot of the SNPs results page for the CFTR gene on the website snps3d.org

Screenshot of the SNP results page on the website snps3d.org returns a list of related SNPs each with a unique ID, location and svm profile. A small penguin shaped icon will appear on the list of SNPs if that particular SNP has an associated model that can be viewed.


Figure 13. On the webpage for a gene on SNPs3D, you can search for specific SNPs. Each SNP will have numerical values that show how likely the SNP is to damage the protein. Negative red numbers indicate a likely damaging SNP, whereas blue numbers indicate a SNP that is unlikely to be damaging.
  17. After finding your SNP of interest, see whether it is predicted to affect the structure and function of the protein, as shown in Figure 13, above. If a SNP has negative red numbers, it is predicted to have a damaging effect on the protein; if it has blue numbers, it is not predicted to be damaging. In other words, the higher the number, the less likely the mutation is to be damaging.
    1. Click on red or blue numbers (under "svm profile" or the "svm structure") to learn more about what these values mean.
  18. Now try to establish a sequence-structure-function relationship for your SNP. First, find your gene in OMIM again by repeating steps 1–3, above.
  19. On the OMIM gene page for your gene of interest, under "External Links," click on "Protein" (immediately above "Gene Info," shown in Figures 5 and 6, above). Then click on "UniProt" from the dropdown menu. This will take you to the UniProtKB database, which has additional information on the protein that your gene of interest encodes, as shown in Figure 14, below.
Screenshot of the CFTR gene information page on the website uniprot.org

Screenshot of the results page for the CFTR gene on the website uniprot.org shows additional information on the CFTR gene. The function of the gene is located near the top of the page and a sidebar filled with links to additional tools and resources can be found on the left side of the page.


Figure 14. The UniProtKB database has additional information on a protein of interest.
  20. Click on the "Sequences" link on the left side of the UniProtKB webpage, highlighted in green in Figure 15, below. This will take you to the amino acid sequence for your protein of interest, also shown in Figure 15.
Screenshot of the amino acid sequence for the CFTR gene displayed on the website uniprot.org

Screenshot shows a link labeled 'Sequences' within the left sidebar on the website uniprot.org. The sequences page has a table at the center of the page that displays the amino acid sequence of the CFTR protein. Information about the amino acid sequence can be found at the top-right of the page.


Figure 15. Clicking on "Sequences" on the left side of the UniProtKB webpage for a protein of interest will take you to the amino acid sequence of that protein, as shown here.
  21. Click on "FASTA" at the top of the "Sequences" section. Select the FASTA protein sequence using your mouse, as shown in Figure 16, below, and copy it.
Screenshot of the FASTA sequence for the CFTR gene generated on the website uniprot.org
Figure 16. The FASTA form of the protein sequence is a convenient form to use when comparing the sequence to other sequences.
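
A FASTA record is plain text: one header line starting with ">" followed by one or more lines of sequence. As a minimal sketch, the Python below parses a single record and looks up the amino acid at a given position, which is a quick way to check that the reference amino acid in your SNP name (for example, the "Met" in "Met1Val") matches the sequence you copied. The record shown is a short made-up example, not the real CFTR sequence, and the function names are illustrative.

```python
def parse_fasta(text):
    """Parse a single-record FASTA string into (header, sequence)."""
    header = None
    chunks = []
    for line in text.strip().splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                raise ValueError("expected a single FASTA record")
            header = line[1:]
        else:
            chunks.append(line)
    if header is None:
        raise ValueError("missing '>' header line")
    return header, "".join(chunks)


def residue_at(sequence, position):
    """Return the one-letter amino acid at a 1-based position."""
    if not 1 <= position <= len(sequence):
        raise IndexError("position is outside the sequence")
    return sequence[position - 1]


# Made-up toy record; paste your own copied FASTA text here instead.
toy_record = """>toy_protein hypothetical example
MKTAY
IAKQR
"""

header, sequence = parse_fasta(toy_record)
print(header)                   # toy_protein hypothetical example
print(len(sequence))            # 10
print(residue_at(sequence, 1))  # M (methionine, written "Met" in three-letter code)
```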
  22. Go to the SMART site as shown in Figure 17. Once there, paste the protein sequence into the "Protein sequence" box and click on "Sequence SMART."
Screenshot of the sequence search page on the website smart.embl-heidelberg.de

Screenshot of the sequence search page on the website smart.embl-heidelberg.de. A sequence analysis section on the left side of the page has two search boxes that can be used to search by a sequence ID or a protein sequence. Options for architecture analysis can be found on the right side of the webpage. Under architecture analysis there are three fields labeled domain selection, GO terms query and taxonomic selection.


Figure 17. The SMART website has information on the protein domains of different proteins.
  23. You should see a webpage that shows the domains of your protein of interest, as shown in Figure 18. Rolling your mouse over a domain will show you more information about the protein domain. Find the domain where your mutation is located. Hint: The name you wrote down in step 13, above, includes the sequence location of your SNP of interest. For example, if it said "Met1Val" then the SNP is in amino acid 1.
    1. For help searching the literature, read the guide to Resources for Finding and Accessing Scientific Papers. You may also find you need some help from someone experienced with genetics or bioinformatics to read and understand the papers.
Screenshot of the protein sequence results page on the website smart.embl-heidelberg.de

Screenshot of the protein sequence results page on the website smart.embl-heidelberg.de. A chart at the top of the page shows the protein domains found in the CFTR protein. The page includes a link to the InterPro abstract in the table under AAA domain.


Figure 18. This SMART webpage shows the protein domains of the protein sequence that was searched.
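
Locating the mutation within a domain comes down to reading the position out of the amino acid change and checking which domain's start and end positions surround it. The Python sketch below does both. The domain names and coordinates are hypothetical placeholders; substitute the start and end positions that SMART reports for your protein. The optional "p." prefix is accepted because protein changes are often written that way.

```python
import re

# Three-letter reference amino acid, position, three-letter new amino acid.
CHANGE_PATTERN = re.compile(r"^(?:p\.)?([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})$")


def parse_protein_change(change):
    """Split a name such as 'Met1Val' into ('Met', 1, 'Val')."""
    match = CHANGE_PATTERN.match(change.strip())
    if match is None:
        raise ValueError(f"unrecognized protein change: {change!r}")
    reference, position, alternate = match.groups()
    return reference, int(position), alternate


def find_domain(position, domains):
    """Return the name of the first domain containing the position, else None."""
    for name, start, end in domains:
        if start <= position <= end:
            return name
    return None


# Hypothetical domains as (name, start, end); copy the real values from SMART.
example_domains = [("domain_A", 10, 120), ("domain_B", 200, 350)]

print(parse_protein_change("Leu6Arg"))    # ('Leu', 6, 'Arg')
print(find_domain(6, example_domains))    # None: position 6 is outside both domains
print(find_domain(250, example_domains))  # domain_B
```

A result of None is still informative: a mutation outside every annotated domain may affect the protein in a different way than one inside a functional domain.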
  24. Click on the domain of interest and read the description of the domain and the InterPro abstract, as shown in Figure 18, above, circled in red. Assuming that the SNP results in a mutation in this domain, what could be the biochemical effects of this mutation? How might these effects relate to the disease?
  25. You can search PubMed for articles on effects of the mutation, as shown in Figure 19. Tip: Try searching for the name of the gene and the SNP, such as "CFTR, Met1Val."
Screenshot of the homepage on the website ncbi.nlm.nih.gov/pubmed

Screenshot of the PubMed home page at ncbi.nlm.nih.gov/pubmed. The page includes a search bar at the top of the page, and links to tools and resources at the bottom of the page.


Figure 19. PubMed is a searchable database of scientific publications.
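
If you would rather script the PubMed search than type it into the search bar, NCBI offers a programmatic interface called E-utilities. The Python sketch below only builds the search URL for its ESearch service and does not contact the network. The helper name and the choice to join the gene and the amino acid change with AND are assumptions made for illustration; check NCBI's usage guidelines before sending automated requests.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities ESearch service.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_search_url(gene, protein_change, max_results=20):
    """Build an ESearch URL that looks for a gene and a protein change in PubMed."""
    params = {
        "db": "pubmed",
        "term": f"{gene} AND {protein_change}",
        "retmode": "json",
        "retmax": max_results,
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


print(build_pubmed_search_url("CFTR", "Met1Val"))
```

Opening the printed URL in a browser should return a list of PubMed IDs in JSON form, which you can then look up on the PubMed website.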

Ask an Expert

Do you have specific questions about your science project? Our team of volunteer scientists can help. Our Experts won't do the work for you, but they will make suggestions, offer guidance, and help you troubleshoot.

Variations

  • Variation 1: Environmental Factor - Gene Interaction:

    Identify how certain environmental factors may affect genes and their association to diseases by using the Genetic Association Database. NOTE: This database is open-access and allows any user to input data. Use caution while using the data and only select data that has been endorsed by 'Gene Expert' or 'Disease Expert'.

    1. Click on the 'Environmental Factor Gene Interaction' link on the left menu of the website. At the top of the page, click on the link to see a complete list of environmental factors.
    2. Choose an environmental factor of interest (for example, tobacco smoke) by clicking on it.
    3. You will see entries that describe gene associations with specific diseases.
    4. Are you able to identify any SNPs in this category? Follow the links to research each category further.

  • Variation 2: Multi-Species Association / Conserved SNPs:

    Using the databases referenced in this project, try to identify gene mutations that are common to multiple species. If a mutation occurs in multiple species and can be matched with the same phenotype across those species, this lends support to your hypothesis. Highly conserved regions (across species) have an increased likelihood of being functionally important.

  • For similar Science Buddies science project ideas that use SNPs and genetics, check out Drugs & Genetics: Why Do Some People Respond to Drugs Differently than Others?, A Prescription for Success: Drugs & Your Genetics, and Trace Your Ancient Ancestry Through DNA.

Careers

If you like this project, you might enjoy exploring these related careers:

Career Profile
The human body can be viewed as a machine made up of complex processes. Scientists are working on figuring out how these processes work and on sequencing and correlating the sections of the genome that correspond to the individual processes. (The genome is an organism's complete set of genetic material.) In the course of doing so, they generate large amounts of data. So large, in fact, that to make sense of it, the data must be organized into databases and labeled. This is where bioinformatics… Read more
Career Profile
Many decisions regarding a person's health depend on knowing the patient's genetic risk of having a disease. Genetic counselors help assess those risks, explain them to patients, and counsel individuals and families about their options. Read more
Career Profile
Growing, aging, digesting—all of these are examples of chemical processes performed by living organisms. Biochemists study how these types of chemical actions happen in cells and tissues, and monitor what effects new substances, like food additives and medicines, have on living organisms. Read more
Career Profile
Physicians work to ease physical and mental suffering due to injury and disease. They diagnose medical conditions and then prescribe or administer appropriate treatments. Physicians also seek to prevent medical problems in their patients by advising preventative care. Ultimately, physicians try to help people live and feel better at every age. Read more


Cite This Page

General citation information is provided here. Be sure to check the formatting, including capitalization, for the method you are using and update your citation, as needed.

MLA Style

Science Buddies Staff. "Bioinformatics - The Perfect Marriage of Computer Science & Medicine." Science Buddies, 4 Dec. 2021, https://www.sciencebuddies.org/science-fair-projects/project-ideas/Genom_p008/genetics-genomics/bioinformatics-computer-science-medicine?from=Home. Accessed 29 May 2022.

APA Style

Science Buddies Staff. (2021, December 4). Bioinformatics - The Perfect Marriage of Computer Science & Medicine. Retrieved from https://www.sciencebuddies.org/science-fair-projects/project-ideas/Genom_p008/genetics-genomics/bioinformatics-computer-science-medicine?from=Home


Last edit date: 2021-12-04