
NCBI Gene & SNP Tutorial

The National Center for Biotechnology Information (NCBI) Gene database (http://www.ncbi.nlm.nih.gov/gene) is an online resource to learn about gene sequences, gene alleles and mutations, genomes, and much more. It was created for the scientific community, but with a little effort and this guide, anyone with a basic understanding of genetics can learn to use it (see Table 3 below for a list of resources for brushing up on genetics). Below are instructions, tips, and advice on how to get started using this resource.

What can I use the NCBI Gene database for?

The NCBI Gene database has information on gene sequences, gene alleles and mutations, genomes, amino acid sequences for proteins, and much more genetic data on humans as well as many other animal species. You can explore many resources on the NCBI Gene database. In this tutorial, you will use the database to look up a gene of interest and learn what specific mutations in that gene may cause certain genetic diseases. The end of this tutorial covers additional resources and the NCBI's own tutorials for learning more about other NCBI Gene functions and tools.

How can I look up a gene and find out more information on it?

Here we will show you how to look up a gene of interest to learn more about it. For the purpose of simplifying the directions, we will use cystic fibrosis as the example in this tutorial.

  1. Go to the NCBI Gene database website, shown in Figure 1 below: http://www.ncbi.nlm.nih.gov/gene. (Note: This link will open a new window so you may more easily follow the steps.)
  2. At the top, enter the name of your gene of interest and click "Go."
    1. For example, the gene that is mutated in cystic fibrosis is CFTR. (Note: If you are interested in a disease but do not know the related gene(s), you can look that up using another Science Buddies resource, the Genetics Home Reference Tutorial.) To look up this gene, enter: CFTR
Figure 1. The NCBI Gene database has information on gene sequences, gene alleles and mutations, genomes, and much more genetic data on humans and other animal species.
  3. The resulting page, shown in Figure 2 below, may have a long list of related results. The top results are usually the most relevant ones. You are looking for the first entry that both starts with your gene name and includes the species name for humans (Homo sapiens). In our CFTR example this is the first result; click on it to proceed to the gene page.
Figure 2. When you enter a gene name, you will get many results on the NCBI Gene database. The gene name is given at the top, followed by its symbol and unabbreviated name on the row below this. The species name is given in brackets on the far right of the second row. Additional gene information, including the chromosome location, appears on the rows below this. Pick the top gene result, circled in red above, for this tutorial.
  4. The gene page, shown in Figure 3 below, contains a large amount of information on a given gene.
Figure 3. The NCBI gene database contains a large amount of information for any given gene. This tutorial explores the links in the sections titled "Table of contents," circled in yellow above, and "Links," circled in red above, both on the right side of the page.
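
The same search can also be expressed as a web address instead of typing into the search box. The sketch below is not part of the original tutorial: it assumes NCBI's E-utilities ESearch service, and the field tags `[Gene Name]` and `[Organism]` are an assumed query syntax that should be checked against the E-utilities documentation. The function name `build_gene_search_url` is made up for this illustration. The code only builds the address; it does not contact NCBI.

```python
from urllib.parse import urlencode

# Base address of NCBI's E-utilities ESearch service (an assumption of this
# sketch; the tutorial itself only uses the Gene web page).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_gene_search_url(symbol, organism="Homo sapiens"):
    """Return an ESearch address that looks up `symbol` in the Gene database.

    Hypothetical helper for illustration. The field tags in the query are
    assumed syntax meant to restrict the gene name and the species.
    """
    term = f"{symbol}[Gene Name] AND {organism}[Organism]"
    params = {"db": "gene", "term": term, "retmode": "json"}
    # urlencode escapes spaces and brackets so the address is valid.
    return f"{ESEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_gene_search_url("CFTR"))
```

Restricting the query to Homo sapiens mirrors the advice in step 3 to pick the result whose species name is Homo sapiens.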

Use the table of contents, circled in yellow in Figure 3 above, to navigate to different information on the gene page. Table 1 below gives an overview of the different types of information provided.

Link Name What Information It Provides
Summary Summary of the gene name and its known functions.
Genomic context A graphical representation of where the gene is located on the chromosome.
Genomic regions, transcripts, and products A graphical representation of different areas of the gene, including where known mutations are located.
Bibliography Scientific articles related to this gene.
Phenotypes Diseases and conditions related to mutations in this gene.
Interactions Proteins known to interact with the protein made by this gene.
General gene info General information on the gene, including:
  • Other animals that have this gene (under "Homology")
  • Pathways that this gene is involved in (under "Pathways from BioSystems")
  • The different functions the protein made from this gene has (under "Gene Ontology")
General protein info Names of the protein made from this gene.
Reference sequences Links to where you can find the entire DNA sequence of this gene.
Related sequences Sequences closely related to this gene.
Additional links Links to more information on this gene and other genetic tools.
Table 1. On the right side of the NCBI Gene page for a given gene, there is a list of links in the "Table of Contents," circled in yellow in Figure 3 above. This table shows what information these links will provide.

Use the links section, circled in red in Figure 3 above, to navigate to additional NCBI pages with information on the gene and its role in human biology. Table 2 below highlights some of the links that are particularly relevant to learning more about the gene's normal and disease functions.

Link Name What Information It Provides
BioProjects Chromosome and sequencing studies that have involved the gene.
BioSystems Bodily functions the gene may be involved in.
Conserved Domains Functional domains, which are regions of the gene that code for distinct protein structures affecting the overall function of the protein. Functional domains are shared, or "conserved," among different members of the same gene family.
Full text in PMC Scientific articles, with free access to full text, published on the gene.
GEO Profiles How actively the gene is expressed in different tissues and in scientific studies, referred to as the gene's expression profile.
HomoloGene A list of potential homologs of the gene (evolutionarily related genes in different animals)
Nucleotide Links to where you can find the DNA sequence of the gene.
OMIM Information on the gene on the OMIM database. The links here discuss the history and discovery of the gene, its function, how the disease manifests, and more.
Protein Links to where you can find the amino acid sequence of the protein the gene codes for.
PubMed Scientific articles published on the gene. Note: Some articles cannot be freely accessed.
RefSeq Proteins Amino acid sequence of the protein the gene codes for and additional gene information.
RefSeq RNAs mRNA and amino acid sequences that the gene (DNA) codes for.
RefSeqGene The genomic DNA sequence of the gene (includes introns and exons) and other information on the gene.
SNP Links to where you can find short genetic variations of the gene.
SNP: GeneView A list of short genetic variations of the gene and the functional amino acid changes they cause.
SNP: Genotype A list of which short genetic variations of the gene are found in people based on ethnicity and other factors.
SNP: VarView A list of the short genetic variations of the gene with lots of information on the variations, including what the DNA mutations are and which variations are pathogenic.
Table 2. On the right side of the NCBI Gene page for a given gene, there is a list of links in the "Links" section, circled in red in Figure 3 above. This table shows what resources some of these links will provide.

I want to look up a gene involved in a genetic disease and find out how it is mutated in that disease. How can I do this?

This section builds on the tutorial section "How can I look up a gene and find out more information on it?" above and shows how to find mutated versions of a gene that cause a genetic disease. To keep the directions simple, we again use cystic fibrosis as the example.

  1. Once you have located the NCBI Gene page for your gene of interest (step 4 above), scroll down through the "Links" section on the right (circled in red in Figure 3 above) until you see the "SNP: VarView" link, circled in red in Figure 4 below. Click on this link.
Figure 4. Scroll down through the "Links" section on the right side of your gene page until you see "SNP: VarView," circled in red above. Click on this link to learn about the different variations of this gene.
  2. A gene can have many different alleles, or alternative forms that occur through mutation of the DNA. Each row of data on this page, shown in Figure 5 below, lists a different allele for the gene you just searched for.
    1. Click on the tab "Clinical interpretation," circled in yellow in Figure 5 below, to sort the alleles. Here are the different clinical interpretations for alleles:
      1. "Probable-pathogenic": Alleles that are thought to be likely to cause disease, but are not proven.
      2. "Pathogenic": Alleles that have been proven to cause disease.
      3. Alleles for which the "clinical interpretation" column is blank. There is "no data" for these alleles. These still could be pathogenic.
    2. The other columns on this page can tell you other information on each allele, including what DNA and amino acids have been mutated. See Figure 5 below for details.
Figure 5. Clicking on "SNP: VarView," circled in red in Figure 4, takes you to a table listing different alleles, or alternative forms that occur through mutation of the DNA, for your gene. Each row is a different allele of the gene. You can sort these alleles by their "Clinical interpretation," circled in yellow above, find their amino acid mutation listed under "Protein," circled in green above, or find more information on them by clicking on their "rs id," circled in red above.
  3. For each allele, click on its "rs id" link, circled in red in Figure 5 above, to go to a new page with information on that specific allele. This information is part of the SNP Database (http://www.ncbi.nlm.nih.gov/projects/SNP/).
    1. For each allele page, scroll down to the section titled "Gene View," shown in Figure 6 below.
    2. Look where "Residue change" is listed, circled in yellow in Figure 6 below, and there should be an amino acid mutation that matches the "Protein" information that was listed with this allele on the previous page, which is circled in green in Figure 5 above.
      1. For example, the CFTR allele listed in Figure 5 above had a protein mutation of "p.Arg347Pro." This means that the 347th amino acid in the protein has been changed from Arginine (abbreviated Arg or R) to Proline (abbreviated Pro or P). This matches the "Residue change," which is listed as "R [Arg] → P [Pro]".
    3. What other interesting information is available on the gene?
Figure 6. The SNP Database gives information on the different alleles for a given gene, including the amino acid differences between alleles, under "Residue change," circled in yellow above.
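
The way a "Protein" entry such as "p.Arg347Pro" is read in the example above can be sketched in a few lines of code. This is only an illustration, not a tool from NCBI: the function names are made up, and the sketch handles only simple one-amino-acid substitutions among the 20 standard amino acids (it does not cover deletions, insertions, or stop codons).

```python
import re

# Three-letter to one-letter codes for the 20 standard amino acids.
AMINO_ACIDS = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

# "p." + original amino acid + position + replacement amino acid.
SUBSTITUTION = re.compile(r"p\.([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})")


def parse_protein_change(label):
    """Split a label like 'p.Arg347Pro' into ('Arg', 347, 'Pro')."""
    match = SUBSTITUTION.fullmatch(label)
    if match is None:
        raise ValueError(f"not a simple substitution: {label!r}")
    original, position, replacement = match.groups()
    if original not in AMINO_ACIDS or replacement not in AMINO_ACIDS:
        raise ValueError(f"unknown amino acid in {label!r}")
    return original, int(position), replacement


def residue_change(label):
    """Describe the change with one-letter and three-letter codes."""
    original, _, replacement = parse_protein_change(label)
    return (f"{AMINO_ACIDS[original]} [{original}] -> "
            f"{AMINO_ACIDS[replacement]} [{replacement}]")


if __name__ == "__main__":
    print(parse_protein_change("p.Arg347Pro"))
    print(residue_change("p.Arg347Pro"))
```

For "p.Arg347Pro", the first function returns the original amino acid (Arg), the position (347), and the replacement (Pro), which are the same three pieces of information to compare against the "Residue change" entry on the allele page.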

Where can I find additional help on using the NCBI Gene database?

  1. To find basic genetic tutorials, go to the main NCBI Gene page (http://www.ncbi.nlm.nih.gov/gene) and click on "How To" in the top left corner. The tutorials include topics like:
    1. "Genes & Expression"
    2. "Genetics & Medicine"
    3. "Genomes & Maps"
  2. To find tips on navigating the NCBI Gene database, go to the main NCBI Gene page (http://www.ncbi.nlm.nih.gov/gene) and under "Using Gene," click on "Gene Quick Start."
    1. Explore the other links under "Using Gene" for additional tips, advice, and tools for finding data in the database.
  3. To follow a useful tutorial on using the NCBI Gene database, go to the main NCBI Gene page (http://www.ncbi.nlm.nih.gov/gene) and under the section titled "Getting Started" at the bottom of the page, click on "Training & Tutorials."
    1. From here, click on "NCBI Education Page."
    2. Under "Documentation," click on "Fact Sheets."
    3. Here click on "Gene" for the NCBI Gene database tutorial.

I don't understand some of the terms or concepts used in the NCBI Gene database. Where can I look up more information?

  1. For a glossary of terms used on the NCBI databases, from the main NCBI Gene page (http://www.ncbi.nlm.nih.gov/gene) go to the section titled "Getting Started" and click on "Training & Tutorials."
    1. Click on the "NCBI Glossary."
  2. For information about nomenclature, on the main NCBI Gene page (http://www.ncbi.nlm.nih.gov/gene), look under "Using Gene" for a link to go to the "FAQ" page.
  3. To learn more about biology/genetics in general, see Table 3 below.
Resource Area | Resource Name | Website | What You Will Learn
General Genetics | Genetics Home Reference (National Institutes of Health) | http://ghr.nlm.nih.gov/ | Terms and concepts related to genetics and what genes cause different genetic conditions.
General Genetics | Human Genetics and Medical Research: A Revolution in Progress (National Institutes of Health) | http://history.nih.gov/exhibits/genetics/index.htm | General genetics concepts, including what genes are, information on the Human Genome Project, and how gene therapy works. Includes a cartoon guide for kids.
General Genetics | Human Genome Project Information (Oak Ridge National Laboratory) | http://www.ornl.gov/sci/techresources/Human_Genome/project/info.shtml | How the Human Genome Project was done and what it can tell us about our genetics.
General Genetics | Learn.Genetics, Genetic Science Learning Center (The University of Utah) | http://learn.genetics.utah.edu/ | Terms and concepts related to genetics, including how DNA turns into protein and heredity. Includes an animated "tour" and a game to build a DNA molecule.
General Genetics | DNA from the Beginning (Cold Spring Harbor Laboratory) | http://www.dnaftb.org/ | Terms and concepts related to general genetics and information on historic genetics experiments.
General Genetics | Gene Screen app (Cold Spring Harbor Laboratory: Dolan DNA Learning Center, Harlem DNA Lab & DNA Learning Center West) | http://www.dnalc.org/resources/gene_screen_app.html | Interactive explanations of general genetics concepts, including inheritance. Interactive iPhone/iPod Touch app.
Genetics & Diseases | Genes and Disease (National Center for Biotechnology Information) | http://www.ncbi.nlm.nih.gov/books/NBK22183/ | Genes and the genetic disorders and diseases that they cause.
Genetics & Diseases | Your Genes, Your Health (Cold Spring Harbor Laboratory: Dolan DNA Learning Center) | http://www.ygyh.org/ | Information on genetic diseases, including their incidence, testing, symptoms, causes, treatments, and more.
Gene Testing | Understanding Cancer: Gene Testing (National Cancer Institute at the National Institutes of Health) | http://www.cancer.gov/cancertopics/understandingcancer/genetesting/ | What genes are and how to have gene testing done.
Table 3. There are many resources available online to help provide a basic understanding of genetics concepts and terms.