
NCBI Gene & SNP Tutorial

The National Center for Biotechnology Information (NCBI) Gene database (http://www.ncbi.nlm.nih.gov/gene) is an online resource to learn about gene sequences, gene alleles and mutations, genomes, and much more. It was created for the scientific community, but with a little effort and this guide, anyone with a basic understanding of genetics can learn to use it (see Table 3 for a list of resources to brush up on genetics). Following are instructions, tips, and advice on how to get started using this resource.

What can I use the NCBI Gene database for?

The NCBI Gene database has information on gene sequences, gene alleles and mutations, genomes, amino acid sequences for proteins, and much more genetic data on humans, as well as many other animal species. In this tutorial, you will use the database to look up a gene of interest and learn which specific mutations in that gene may cause certain genetic diseases. The end of this tutorial covers additional resources, including the NCBI's own tutorials, for learning more about other NCBI Gene functions and tools.

How can I look up a gene and find out more information about it?

Here we will show you how to look up a gene of interest to learn more about it. For the purpose of simplifying the directions, we will use cystic fibrosis as the example in this tutorial.

  1. Go to the NCBI Gene database website, shown in Figure 1: http://www.ncbi.nlm.nih.gov/gene. (Note: This link will open a new window so you can more easily follow the steps.)
  2. At the top, enter the name of your gene of interest and click "Search."
    1. For example, the gene that is mutated in cystic fibrosis is CFTR. (Note: If you were interested in a disease, but did not know the related gene(s), you could look that up using another Science Buddies resource, the Genetics Home Reference Tutorial.) To look up this gene, enter: CFTR
Figure 1. The NCBI Gene database has information on gene sequences, gene alleles and mutations, genomes, and much more genetic data on humans and other animal species.

  3. The resulting page, shown in Figure 2, may have a long list of related results. The top results are usually the most relevant ones. You are looking for the first entry that both starts with your gene name and includes the species name for humans (Homo sapiens). In our CFTR example, this is the first result; click on it to proceed to the gene page.
Figure 2. When you enter a gene name, you will get many results on the NCBI Gene database. The gene name is given on the left, followed by its description (unabbreviated name) in the second column. The species name is given in brackets at the end of the description entry. Additional gene information, including the chromosome location, is given in the columns farther to the right. Pick the top gene result (circled in red) for this tutorial.

  4. The gene page, shown in Figure 3, contains a large amount of information about a given gene.
Figure 3. The NCBI Gene database contains a large amount of information for any given gene. This tutorial explores the links in the sections titled "Table of contents" (circled in green) and "Related information" (circled in red), both on the right side of the page.
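
The same search can also be written as a URL for NCBI's E-utilities service, which may be handy if you ever want to script the lookup. The short Python sketch below only builds the query URL; it does not contact NCBI. The helper name `build_gene_search_url` is made up for this example, and the `[sym]` and `[orgn]` field tags are used on the assumption that they limit the search to the gene symbol and the organism.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities ESearch service.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_gene_search_url(symbol, organism="Homo sapiens"):
    """Return an ESearch URL that looks up a gene symbol in the Gene database.

    Hypothetical helper for illustration: it builds the URL but sends nothing.
    """
    # Assumed field tags: [sym] for the gene symbol, [orgn] for the organism.
    term = f"{symbol}[sym] AND {organism}[orgn]"
    params = {"db": "gene", "term": term, "retmode": "json"}
    return f"{ESEARCH_URL}?{urlencode(params)}"


# Build (but do not send) the query for the tutorial's example gene.
print(build_gene_search_url("CFTR"))
```

Pasting the printed URL into a browser should return a list of matching Gene IDs rather than the formatted results page shown in Figure 2.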

Use the table of contents (circled in green in Figure 3) to navigate to different information on the gene page. Table 1 gives an overview of the different types of information provided.

Link Name | What Information It Provides
Summary | Summary of the gene name and its known functions.
Genomic context | A graphical representation of where the gene is located on the chromosome.
Genomic regions, transcripts, and products | A graphical representation of different areas of the gene, including where known mutations are located.
Bibliography | Scientific articles related to this gene.
Phenotypes | Diseases and conditions related to mutations in this gene.
Variation | Collection of known variants of this gene.
Pathways from BioSystems | Metabolic pathways the gene is involved in.
Interactions | Proteins known to interact with the protein made by this gene.
General gene info | General information on the gene, including: other animals that have this gene (under "Homology"); pathways that this gene is involved in (under "Pathways from BioSystems"); and the different functions of the protein made from this gene (under "Gene Ontology").
General protein info | Names of the protein made from this gene.
NCBI Reference sequences (RefSeq) | Links to where you can find the entire DNA sequence of this gene.
Related sequences | Sequences closely related to this gene.
Additional links | Links to more information on this gene and other genetic tools.
Table 1. On the right side of the NCBI Gene page for a given gene, there is a list of links in the "Table of contents" (circled in green in Figure 3). This table shows what information these links will provide.

Use the "Related information" section (circled in red in Figure 3) to navigate to additional NCBI pages with information on the gene and its role in human biology. Table 2 highlights some of the links that are particularly relevant to learning more about the gene's normal and disease functions.

Link Name | What Information It Provides
BioProjects | Chromosome and sequencing studies that have involved the gene.
BioSystems | Bodily functions the gene may be involved in.
Conserved Domains | Functional domains, which are regions of the gene's DNA that code for distinct protein structures affecting the overall function of the protein. Functional domains are shared, or "conserved," among different members of the same gene family.
Full text in PMC | Scientific articles, with free access to full text, published on the gene.
GEO Profiles | How strongly this gene is expressed in different tissues and in scientific studies, referred to as the gene's expression profile.
HomoloGene | A list of potential homologs of the gene (evolutionarily related genes in different animals).
Nucleotide | Links to where you can find the DNA sequence of the gene.
OMIM | Information about the gene on the OMIM database. The links here discuss the history and discovery of the gene, its function, how the disease manifests, and more.
Protein | Links to where you can find the amino acid sequence of the protein the gene codes for.
PubMed | Scientific articles published on the gene. Note: Some articles cannot be freely accessed.
RefSeq Proteins | Amino acid sequence of the protein the gene codes for and additional gene information.
RefSeq RNAs | mRNA and amino acid sequences that the gene (DNA) codes for.
RefSeqGene | The genomic DNA sequence of the gene (includes introns and exons) and other information about the gene.
SNP | Links to where you can find short genetic variations of the gene.
SNP: GeneView | A list of short genetic variations of the gene and the functional amino acid changes they cause.
Variation Viewer | A list of the short genetic variations of the gene with detailed information about each variation, including what the DNA mutations are and which variations are pathogenic.
Table 2. On the right side of the NCBI Gene page for a given gene, there is a list of links in the "Related information" section (circled in red in Figure 3). This table shows what resources some of these links will provide.

I want to look up a gene involved in a genetic disease and find out how it is mutated in that disease. How can I do this?

This section assumes you have completed the tutorial section "How can I look up a gene and find out more information about it?" Here we will show how to find mutated versions of a gene that cause a genetic disease. For the purpose of simplifying the directions, we will again use cystic fibrosis as the example.

  1. Once you have located the NCBI Gene page for your gene of interest (step 4 in the previous section), scroll down through the "Related information" section on the right (circled in red in Figure 3) until you see the "Variation Viewer" link (circled in red in Figure 4). Click on this link.
Figure 4. Scroll down through the "Related information" section on the right side of your gene page until you see "Variation Viewer" (circled in red). Click on this link to learn about the different variations of this gene.

  2. A gene can have many different alleles, or alternative forms that occur through mutation of the DNA. Each row of data on this page, shown in Figure 5, lists a different allele for the gene you just searched for.
    1. On the left side of the page you can choose different options to filter the data. Click on "Pathogenic" and "Likely pathogenic" (circled in blue in Figure 5) to show only the alleles that meet these criteria. Here are the different clinical interpretations for alleles:
      1. "Likely pathogenic": Alleles that are thought likely to cause disease, although this has not been proven.
      2. "Pathogenic": Alleles that have been proven to cause disease.
      3. Blank: Alleles for which the "Clinical interpretation" column is empty have no data. These alleles could still be pathogenic.
    2. The other columns on this page give more information about each allele, including the variant type and the location of the mutation in the gene. See Figure 5 for details.
Figure 5. Clicking on "Variation Viewer" (circled in red in Figure 4) takes you to a table listing different alleles, or alternative forms that occur through mutation of the DNA, for your gene. Each row is a different allele of the gene. You can filter these alleles by their "Most severe clinical significance" (circled in blue), sort by "Variant type" (circled in green), or find more information about them by clicking on their "Variant ID" (circled in red).

  3. Once you have applied all your filter criteria (variant type, clinical significance, and so on), click on the arrow to the left of the variant ID (circled in yellow in Figure 5 and Figure 6) to open a drop-down window that provides more information on this specific gene variant. Here you will find more allele information, such as the "Transcript change," which lists what the DNA mutation is (circled in green in Figure 6), or the "Protein change" that results from the mutation (circled in red in Figure 6).
Figure 6. Clicking on the small arrow (circled in yellow) to the left of the variant ID (circled in blue) pulls up more allele information, such as the “Transcript change” (circled in green) or “Protein change” (circled in red).

  4. For each selected allele, click on its "Variant ID" link (circled in blue in Figure 6) to go to a new page with information on that specific allele. This information is part of the SNP Database (http://www.ncbi.nlm.nih.gov/projects/SNP/).
    1. For each allele page, scroll down to the section titled "Gene View," shown in Figure 7.
    2. Look where "Residue change" is listed (circled in yellow in Figure 7). There should be an amino acid mutation that matches the "Protein change" listed for this allele on the previous page (circled in red in Figure 6).
      1. For example, the CFTR allele listed in Figure 6 had a protein mutation of "Met1Val". This means that the first amino acid in the protein has been changed from methionine (abbreviated Met or M) to valine (abbreviated Val or V). This matches the "Residue change," which is listed as "M [Met] → V [Val]" at position "1".
    3. What other interesting information is available on the gene?
Figure 7. The SNP Database gives information on the different alleles for a given gene, including the amino acid differences between alleles, under "Residue change," circled in yellow.
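
The filtering and "Protein change" steps above can be sketched in a few lines of Python. This is only an illustration on made-up rows: the record layout, the variant IDs, and the function names are assumptions, not the format Variation Viewer uses, and only the "Met1Val" change comes from the tutorial's example.

```python
import re

# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

# Made-up rows standing in for the allele table; they are not real CFTR data.
ALLELES = [
    {"variant_id": "example-1", "significance": "Pathogenic", "protein_change": "Met1Val"},
    {"variant_id": "example-2", "significance": "Benign", "protein_change": "Gly2Ala"},
    {"variant_id": "example-3", "significance": "Likely pathogenic", "protein_change": "Arg3Trp"},
    {"variant_id": "example-4", "significance": "", "protein_change": "Leu4Pro"},
]


def filter_by_significance(alleles, wanted=("Pathogenic", "Likely pathogenic")):
    """Keep only the alleles whose clinical significance is in `wanted`."""
    return [allele for allele in alleles if allele["significance"] in wanted]


def parse_protein_change(change):
    """Split a change such as "Met1Val" into ("M", 1, "V")."""
    match = re.fullmatch(r"([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})", change)
    if match is None:
        raise ValueError(f"Unrecognized protein change: {change!r}")
    original, position, replacement = match.groups()
    if original not in THREE_TO_ONE or replacement not in THREE_TO_ONE:
        raise ValueError(f"Unknown amino acid code in: {change!r}")
    return THREE_TO_ONE[original], int(position), THREE_TO_ONE[replacement]


# Show the one-letter form of each pathogenic or likely pathogenic change.
for allele in filter_by_significance(ALLELES):
    original, position, replacement = parse_protein_change(allele["protein_change"])
    print(allele["variant_id"], f"{original}{position}{replacement}")
```

Note that the blank-significance row is dropped by the filter, just as it is on the website, even though such alleles could still be pathogenic.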

Where can I find additional help on using the NCBI Gene database?

  1. To find basic genetic tutorials, go to the main NCBI Gene page (http://www.ncbi.nlm.nih.gov/gene) and click on "How To" in the top left corner. The tutorials include topics like:
    1. "Genes & Expression"
    2. "Genetics & Medicine"
    3. "Genomes & Maps"
  2. To find tips on navigating the NCBI Gene database, go to the main NCBI Gene page (http://www.ncbi.nlm.nih.gov/gene) and under "Using Gene," click on "Gene Quick Start."
    1. Explore the other links under "Using Gene" for additional tips, advice, and tools for finding data in the database.
  3. To follow a useful tutorial on using the NCBI Gene database, go to the main NCBI Gene page (http://www.ncbi.nlm.nih.gov/gene) and under the section titled "Getting Started" at the bottom of the page, click on "Training & Tutorials."
    1. From here, click on "Documentation."
    2. Find "Gene" in the list and click on "Factsheet" (on the right) for the NCBI Gene database tutorial.

I do not understand some of the terms or concepts used in the NCBI Gene database. Where can I look up more information?

  1. For a glossary of terms used on the NCBI databases, from the main NCBI Gene page (http://www.ncbi.nlm.nih.gov/gene) go to the section titled "Getting Started" and click on "Training & Tutorials."
    1. On the very bottom of the page under NCBI, click on the "Resources list."
    2. Under "N" you will find the "NCBI Glossary," which includes a long list of terms.
  2. For information about nomenclature, on the main NCBI Gene page (http://www.ncbi.nlm.nih.gov/gene), look under "Using Gene" for a link to go to the "FAQ" page.
  3. To learn more about biology/genetics in general, see Table 3.
Resource Area | Resource Name | Website | What You Will Learn
General Genetics | Genetics Home Reference (National Institutes of Health) | http://ghr.nlm.nih.gov/ | Terms and concepts related to genetics and what genes cause different genetic conditions.
General Genetics | Human Genetics and Medical Research: A Revolution in Progress (National Institutes of Health) | http://history.nih.gov/exhibits/genetics/index.htm | General genetics concepts, including what genes are, information on the Human Genome Project, and how gene therapy works. Includes a cartoon guide for kids.
General Genetics | Human Genome Project Information (Oak Ridge National Laboratory) | http://www.ornl.gov/sci/techresources/Human_Genome/project/info.shtml | How the Human Genome Project was done and what it can tell us about our genetics.
General Genetics | Learn.Genetics, Genetic Science Learning Center (The University of Utah) | http://learn.genetics.utah.edu/ | Terms and concepts related to genetics, including how DNA turns into protein and heredity. Includes an animated "tour" and a game to build a DNA molecule.
General Genetics | DNA from the Beginning (Cold Spring Harbor Laboratory) | http://www.dnaftb.org/ | Terms and concepts related to general genetics and information on historic genetics experiments.
General Genetics | Gene Screen app (Cold Spring Harbor Laboratory: Dolan DNA Learning Center, Harlem DNA Lab & DNA Learning Center West) | http://www.dnalc.org/resources/gene_screen_app.html | Interactive explanations of general genetics concepts, including inheritance. Interactive iPhone/iPod Touch app.
Genetics & Diseases | Genes and Disease (National Center for Biotechnology Information) | http://www.ncbi.nlm.nih.gov/books/NBK22183/ | Genes and the genetic disorders and diseases that they cause.
Genetics & Diseases | Your Genes, Your Health (Cold Spring Harbor Laboratory: Dolan DNA Learning Center) | http://www.ygyh.org/ | Information on genetic diseases, including their incidence, testing, symptoms, causes, treatments, and more.
Gene Testing | The Genetics of Cancer (National Cancer Institute at the National Institutes of Health) | http://www.cancer.gov/about-cancer/causes-prevention/genetics/genetic-testing-fact-sheet#q1 | What genes are and how to have gene testing done.
Table 3. There are many resources available online to help provide a basic understanding of genetics concepts and terms.