Areas of Science: Genetics & Genomics; Medical Biotechnology; Big Data; Pandemics – COVID-19
Time Required: Average (6-10 days)
Prerequisites: Excellent computer skills. Basic understanding of immunology and protein sequences or willingness to learn about these topics.
Material Availability: None required
Cost: Very Low (under $20)
Safety: No issues


Remember going to the doctor and getting vaccine shots? It is no fun getting poked with a needle, but fortunately, a vaccine gives you protection against a serious illness for years to come. But what about the flu vaccine? How come there is a new one every year? This science fair project will show you why.


Use free Internet-based computer tools to analyze and estimate the effectiveness of different flu vaccines.



Author: Kirindi V. Choi; Teisha Rowland, PhD, Science Buddies
Sponsor: The Molecular Sciences Institute, Berkeley, CA
Editor: Ken Hess, Science Buddies

Cite This Page

General citation information is provided here. Be sure to check the formatting, including capitalization, for the method you are using and update your citation, as needed.

MLA Style

Science Buddies Staff. "BLASTing Flu Viruses." Science Buddies, 20 Nov. 2020, Accessed 15 May 2021.

APA Style

Science Buddies Staff. (2020, November 20). BLASTing Flu Viruses. Retrieved from

Last edit date: 2020-11-20


Influenza, commonly known as the flu, is caused by a virus that attacks the respiratory tract (i.e., the nose, the throat, and the lungs). Cold and dry weather allows the virus to survive longer outside the body than warm weather does. Therefore, in temperate regions like North America, when we are planning to enjoy Halloween, Thanksgiving, or Christmas, it is also the time when we or our family members have a higher chance of getting the flu.

There are three types of influenza virus: A, B and C. Type A can infect humans, other mammals and birds, and it can spread fast and affect many people. Types B and C affect mainly humans, and type C causes only a mild infection. Influenza type A viruses are divided into subtypes based on two proteins on the surface of the virus: hemagglutinin and neuraminidase. The virus uses the hemagglutinin protein (often abbreviated "H" or "HA") to latch on to the host's cell and uses the neuraminidase protein (often abbreviated "N" or "NA") to spread the infection. Type A and B viruses continually evolve genetically, with changes accumulating in the amino acid sequences of the H and N proteins. Since a host's immune system recognizes the H and N surface proteins to identify and attack the virus, by changing these proteins a little bit the virus prevents hosts from enjoying any prolonged protection against it.

When a person is vaccinated with the influenza vaccine, it should stimulate a protective immune response, particularly against the viral surface proteins in the viral strains used to make the specific vaccine. The influenza vaccine typically contains three virus strains: two are subtypes of type A and one is of type B. Type C is not included in the vaccine because it only causes a mild illness and does not lead to epidemics. To make the influenza vaccine, gene fragments that encode the H and N viral surface proteins are used from each strain. For the vaccine to give a person good protection against the virus, the protein sequences for the H and N proteins that are used in the vaccine should closely match the sequences in the strains the person may be exposed to. Every February, the World Health Organization (WHO), based on the analysis of various laboratories across the globe, decides which influenza virus strains to include in the vaccine for the new year.

How can scientists check that the protein sequence of the H and N proteins used in the vaccine match the ones in the virus strains they want to protect people against? If you imagine that you can hold the H or N protein with both hands and stretch it out, you will then have a linear protein sequence in your hands. A protein sequence is made up of amino acids. Unlike the English alphabet, which has 26 letters, there are 20 standard amino acids that can be used to "spell" a protein. In English, it is easy to align two words and compare their spellings. Even so, there is often more than one possible alignment, as shown in Figures 1 and 2. In Figure 1, one possible alignment of the words "strawberry" and "blueberry" is shown, where the only matching letter, "r," is highlighted in red.

s t r a w b e r r y
b l u e b e r r y _
Figure 1. One possible alignment of the words "strawberry" and "blueberry," showing the matching single letter "r" in this alignment highlighted in red.

In Figure 2, another possible alignment of these words is shown, where several matching letters, spelling "berry," are highlighted in red.

s t r a w b e r r y
_ b l u e b e r r y
Figure 2. A second possible alignment of the words "strawberry" and "blueberry," showing the matching letters "berry" in this alignment highlighted in red.

For the words "strawberry" and "blueberry," the alignment in Figure 2 clearly gives us a greater number of matched letters between these words. Similarly, you can take two protein sequences and check whether their spellings are alike; this is called sequence alignment in bioinformatics.
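The comparison in Figures 1 and 2 can be sketched in a few lines of Python. This is only an illustration of sliding one word along the other and counting matching letters; the function names `count_matches` and `best_offset` are made up for this example, and real alignment tools also handle gaps inside the words.

```python
def count_matches(top, bottom, offset):
    """Count matching letters when `bottom` is shifted right by `offset` places."""
    matches = 0
    for i, letter in enumerate(bottom):
        j = i + offset  # position in `top` that lines up with bottom[i]
        if 0 <= j < len(top) and top[j] == letter:
            matches += 1
    return matches


def best_offset(top, bottom):
    """Try every shift where the words overlap and return the one with the most matches."""
    offsets = range(-len(bottom) + 1, len(top))
    return max(offsets, key=lambda off: count_matches(top, bottom, off))


# Figure 1 corresponds to a shift of 0, Figure 2 to a shift of 1.
print(count_matches("strawberry", "blueberry", 0))  # 1 match ("r")
print(count_matches("strawberry", "blueberry", 1))  # 5 matches ("berry")
print(best_offset("strawberry", "blueberry"))       # 1
```

Trying every shift is practical for two short words, but it becomes slow and incomplete for long protein sequences, which is one reason dedicated tools are used.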

The alignment example is simple enough that we can do it manually. However, when we want to align two protein sequences, they can be over 100 letters long and consequently it is much more difficult and more time-consuming to do it manually. Luckily, bioinformatics comes to the rescue. Bioinformatics is the collection and analysis of large amounts of biological data using computers and computational/statistical methods.

A powerful Internet-based bioinformatics tool for aligning sequences is BLAST, which stands for Basic Local Alignment Search Tool. It aligns your query sequence of interest to a collection of sequences stored in the database, or to a specific second sequence you are interested in. It compares the results, telling you which sequences or segments are similar to your query sequence.
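To give a feel for what "local alignment" means, here is a minimal Python sketch. It does not reproduce how BLAST works internally (BLAST uses a faster, approximate search); instead it uses the classic Smith-Waterman dynamic-programming method, and the scoring values (+1 for a match, -1 for a mismatch, -1 for a gap) are assumptions chosen only for illustration. Real protein alignments use more detailed scoring tables.

```python
def local_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return the best local alignment score between strings a and b (Smith-Waterman)."""
    # score[i][j] holds the best score of a local alignment that ends
    # after the first i letters of a and the first j letters of b.
    score = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                0,                        # start a fresh alignment here
                score[i - 1][j - 1] + pair,  # align the two letters
                score[i - 1][j] + gap,    # gap in b
                score[i][j - 1] + gap,    # gap in a
            )
            best = max(best, score[i][j])
    return best


print(local_alignment_score("strawberry", "blueberry"))  # 5, from the shared "berry"
```

A local method looks for the best-matching stretch ("berry") rather than forcing the two whole words to line up end to end.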

All else being equal, we would expect a strong match between the protein sequences for the H and/or N proteins used in the vaccine virus and the corresponding sequences in the "wild" virus to result in good protection against that virus. On the other hand, a poor match would result in weak protection against the virus. But to create a strong match, the WHO would need to accurately predict which strains people should be vaccinated against for the upcoming flu season. Is the prediction always accurate? How often is there a good match, and how often does the prediction fail, so that the vaccine does not give good protection against the common strains of the season? In this genetics and genomics science project, you will use BLAST to measure the quality of the match and estimate the effectiveness of a vaccine against different viruses.
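One common way to put a number on the quality of a match is percent identity: the share of aligned positions where the two sequences have the same letter. The Python sketch below shows the arithmetic on two already-aligned sequences. The function names and the two short fragments are invented for illustration and are not real hemagglutinin sequences; a "-" stands for a gap.

```python
def percent_identity(aligned_a, aligned_b):
    """Percentage of alignment columns where both sequences have the same letter."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    identical = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-"
    )
    return 100.0 * identical / len(aligned_a)


def mismatch_positions(aligned_a, aligned_b):
    """1-based positions in the alignment where the two sequences differ."""
    return [
        i + 1 for i, (x, y) in enumerate(zip(aligned_a, aligned_b)) if x != y
    ]


# Invented 10-letter fragments that differ at one position.
common = "MKAILVVLLY"
vaccine = "MKAILVALLY"
print(percent_identity(common, vaccine))    # 90.0
print(mismatch_positions(common, vaccine))  # [7]
```

Listing the positions that differ, and not just the percentage, makes it easier to see whether changes cluster in one region of a protein.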

Terms and Concepts

  • Influenza (or flu)
  • Virus
  • Influenza virus type
  • Surface proteins
  • Vaccine
  • Virus strains
  • Epidemic
  • Protein sequence
  • Amino acids
  • Sequence alignment
  • Bioinformatics
  • Flu notation


Questions

  • How are flu viruses named?
  • BLAST can be used to align and compare both DNA (nucleotide) sequences and protein (amino acid) sequences. What are some reasons for using a protein alignment instead of a DNA alignment?
  • There are many H and N subtypes for influenza A. Why is it that in recent years the annual vaccine has only included influenza A subtypes H1N1 and H3N2? What is happening with the other subtypes? Under what conditions might they be included in the annual vaccine?
  • How does a vaccine help prevent the spread of a disease?


Bibliography

To do this science project you will need to use this database.
  • National Center for Biotechnology Information (NCBI). (August 27, 2012). GenBank Overview. NCBI GenBank, U.S. National Library of Medicine. Retrieved January 18, 2013.
This article from the Centers for Disease Control and Prevention describes how strains of influenza are selected for vaccines. This website has BLAST, as well as a BLAST tutorial. This website has sequence information as well as a BLAST tool. You can use the strain information from the CDC reports (from the CDC resource) to search this database. Table 1 in the Procedure was created using data from this database.


Materials and Equipment

  • Computer with an Internet connection
  • Lab notebook

Experimental Procedure

  1. First, study the Terms and Concepts in the Background tab. It is especially important that you research and understand flu notation.
  2. Next, pick an influenza season that you would like to investigate from Table 1. The seasons are listed in the far left column, under "Influenza Season."
    1. Note that each influenza season spans two years. This is from October of the first year to May of the second year.
      1. For example, if the influenza season is listed as "2002 - 2003" this means that the data is from October 2002 to May 2003.
    2. If you want to study more recent flu data for your science project (data that are not included in Table 1), go to the Flu Activity & Surveillance webpage on the U.S. Centers for Disease Control and Prevention (CDC) website.
      1. Click on the link for "Past Weekly Surveillance Reports" and choose the influenza season you want to investigate.
      2. You will want to investigate a season that has ended so that there is a complete influenza season summary available.
      3. Click on "Go!" next to the influenza season you want to investigate. Make sure that the middle column has the season summary selected, and not a weekly report.
      4. To find the most common influenza strains subtyped that season, read the section titled "Antigenic Characterization."
      5. To find information on the strains in that season's influenza vaccine, you will need to go back to the "Past Weekly Surveillance Reports" webpage and select the previous influenza season. For example, if you are investigating the 2012–2013 influenza season, to find information about the 2012–2013 vaccine you will need to look at the 2011–2012 influenza season summary and read the section titled "Composition of the 2012–2013 Influenza Vaccine."
    3. In your lab notebook, record the years of the influenza season that you chose to investigate.
| Influenza Season | Influenza Type | Most Common Influenza Strains Subtyped That Season | Strains in That Season's Influenza Vaccine |
|---|---|---|---|
| 2000–2001 | A (H1N1) | A/New Caledonia/20/99 | A/New Caledonia/20/99 |
| | A (H3N2) | *A/Panama/2007/99* | A/Moscow/10/99 |
| | B | B/Beijing/184/93 | B/Beijing/184/93 |
| 2001–2002 | A (H1N1) | A/New Caledonia/20/99 | A/New Caledonia/20/99 |
| | A (H3N2) | *A/Panama/2007/99* | A/Moscow/10/99 |
| | B | *B/Yamagata/16/88* | B/Sichuan/379/99 |
| 2002–2003 | A (H1N1) | A/New Caledonia/20/99 | A/New Caledonia/20/99 |
| | A (H3N2) | *A/Panama/2007/99* | A/Moscow/10/99 |
| | B | B/Hong Kong/330/01 | B/Hong Kong/330/2001 |
| 2003–2004 | A (H1N1) | A/New Caledonia/20/99 | A/New Caledonia/20/99 |
| | A (H3N2) | *A/Fujian/411/2002* | A/Moscow/10/99 |
| | B | B/Hong Kong/330/01 | B/Hong Kong/330/2001 |
| 2004–2005 | A (H1N1) | A/New Caledonia/20/99 | A/New Caledonia/20/99 |
| | A (H3N2) | *A/Wyoming/3/2003* | A/Fujian/411/2002 |
| | B | *B/Yamagata/16/88* | B/Shanghai/361/2002 |
| 2005–2006 | A (H1N1) | A/New Caledonia/20/99 | A/New Caledonia/20/99 |
| | A (H3N2) | A/California/7/2004 | A/California/7/2004 |
| | B | B/Shanghai/361/2002 | B/Shanghai/361/2002 |
| 2006–2007 | A (H1N1) | A/New Caledonia/20/99 | A/New Caledonia/20/99 |
| | | *A/Solomon Islands/3/2006* | |
| | A (H3N2) | A/Wisconsin/67/2005 | A/Wisconsin/67/2005 |
| | B | *B/Ohio/1/2005* | B/Malaysia/2506/2004 |
| 2007–2008 | A (H1N1) | A/Solomon Islands/3/2006 | A/Solomon Islands/3/2006 |
| | A (H3N2) | A/Wisconsin/67/2005 | A/Wisconsin/67/2005 |
| | B | *B/Yamagata/16/88* | B/Malaysia/2506/2004 |
| 2008–2009 | A (H1N1) | A/Brisbane/59/2007 | A/Brisbane/59/2007 |
| | A (H3N2) | A/Brisbane/10/2007 | A/Brisbane/10/2007 |
| | B | *B/Yamagata/16/88* | B/Florida/4/2006 |
| 2009–2010 | A (H1N1) | A/Brisbane/59/2007 | A/Brisbane/59/2007 |
| | | *A/California/07/2009* | |
| | A (H3N2) | A/Brisbane/10/2007 | A/Brisbane/10/2007 |
| | B | B/Brisbane/60/2008 | B/Brisbane/60/2008 |
| 2010–2011 | A (H1N1) | A/California/07/2009 | A/California/07/2009 |
| | A (H3N2) | A/Perth/16/2009 | A/Perth/16/2009 |
| | B | B/Brisbane/60/2008 | B/Brisbane/60/2008 |

Table 1. This table lists the most commonly subtyped (characterized) strains of influenza from different influenza seasons, as well as the influenza strains used in the influenza vaccine for that season. If a common strain was different from any strains used in the vaccine for that year, the common strain's name has been bolded (shown here between asterisks). The information used to generate this table was collected from the Flu Activity & Surveillance webpage at the CDC website.
  3. Take a moment to look at the different strains listed in Table 1 for the season you selected. Look at both the strains that were most commonly subtyped that season and the strains that were in that season's vaccine.
    1. The influenza type of each specific virus strain is listed in the column labeled "Influenza Type," on the same row as the virus strain's name.
      1. For example, in the 2010–2011 season, the A/California/07/2009 strain is listed as a type A virus, specifically an H1N1 virus. The "H" and "N" refer to the type of hemagglutinin and neuraminidase surface proteins on the virus.
      2. You may notice that in some seasons there were multiple common strains of the same type (or subtype) of influenza virus. For example, in the 2009–2010 season, there were two common Type A (H1N1) strains, specifically A/Brisbane/59/2007 and A/California/07/2009.
    2. Note how the strains are written in flu notation. What do the notations tell you about the strains?
      1. For example, in the 2010–2011 season, the strain "A/California/07/2009" is listed. The notation means that this viral strain is a type A virus and it was the 7th influenza virus isolated in 2009 in California.
  4. In your lab notebook, record all of the data listed in Table 1 for your season of interest. To do this, you may want to make a small data table similar to Table 1.
  5. Next, you will compare the sequences for the strains that were common that season to the strains that were used in that season's vaccine. This will show you how good a match the vaccine was to the strains that were prevalent. But before you do this, check the data table in your lab notebook to see if any of the common strains have the same name as the strains in the vaccine. If they are the same, you will not need to compare their sequences since they should be identical.
    1. To make this easier for you to spot, in Table 1, if a common strain was different than the ones used in the vaccine, the common strain's name has been bolded.
    2. For example, in Table 1, all of the common strains for the 2010–2011 influenza season are the same as the strains that were used in the vaccine that year (none of the strains are bolded).
    3. As another example, in the 2009–2010 influenza season, the A/Brisbane/59/2007 strain was both common and included in the vaccine. However, another Type A (H1N1) strain, A/California/07/2009, was also common but was not included in the vaccine that season. (The A/California/07/2009 strain has consequently been bolded there.)
    4. If, in your data table, any of the strains common that season were the same as the strains used in the vaccine, do not use that common strain in the next steps of the Procedure.
  6. Compare the sequences of one of the common virus strains to the same type of virus that was included in the vaccine that season. You will only be analyzing the sequences for the hemagglutinin and neuraminidase proteins. (If you are unsure of why this is, reread the Introduction in the Background tab.) Obtain the sequences for these strains from the NCBI GenBank website.
    1. On the top of the webpage next to the search bar, select "Protein" from the drop-down menu.
    2. Type the name of one of the common influenza strains from your data table (e.g., A/California/07/2009) into the search box.
    3. Look for the full-length protein sequence of the hemagglutinin protein. When you find this result, click on it.
      1. For example, for the A/California/07/2009 strain, the desired result is listed as "hemagglutinin [Influenza A virus (A/California/07/2009(H1N1))]" and on the next line says "566 aa protein," meaning this protein contains 566 amino acids. Do not select a result that says "partial" in its title, as this is not the full protein.
      2. If there is no data on the hemagglutinin protein for this strain (the protein is not listed in the results), skip this strain. Start step 6 over using a different common strain from your data table.
    4. You should now be on a webpage for the hemagglutinin protein for the strain. It should look similar to this hemagglutinin example.
    5. Copy the accession number for the full-length hemagglutinin protein, listed at the top of the webpage. Record this number in your lab notebook. You may want to add to your existing data table so that it can easily include this information.
      1. For example, for the A/California/07/2009 strain's hemagglutinin protein, the accession number is AFM72832.1.
    6. Repeat sub-steps 1 to 5 above, but this time in sub-step 2 type in the name of the vaccine strain that is the same type of influenza as the common strain you just searched for.
      1. For example, if you are investigating the 2009–2010 influenza season and just searched for the A/California/07/2009 strain, you will now want to search for the A/Brisbane/59/2007 strain (the Type A, H1N1, strain used in the vaccine that season).
      2. Do not forget to record the accession number for hemagglutinin protein for this strain in your lab notebook.
    7. Click on the "Run BLAST" link (column on right side of webpage, at the top).
    8. Click the box next to "Align two or more sequences."
    9. In the query box at the top, enter the accession number for the hemagglutinin protein of either the common strain or the vaccine strain that is the same type of influenza. In the query box below that, enter the accession number for the other strain's hemagglutinin protein.
    10. The program should be set automatically to use the algorithm "blastp."
    11. Click the BLAST button at the bottom. You may need to wait a few seconds for your results to appear.
    12. Look at your BLAST results. How similar are the two sequences to each other? In your lab notebook, record the percentage that is identical between the two sequences.
  7. Repeat step 6 until you have compared the hemagglutinin protein of each common strain in your data table with the same type of influenza virus that was used in the vaccine for that season.
    1. If you want a more advanced challenge, print your BLAST results each time. When you do comparisons between the different strains, see if there is a region of the hemagglutinin protein that tends to be different from the strain used in the vaccine.
      1. Tip: Differences between BLASTed sequences show up on the line of text that is between the query and subject lines of text.
  8. Repeat steps 6 to 7, but this time use the neuraminidase protein sequences for the strains instead of the hemagglutinin protein sequences.
  9. Repeat steps 2–8 four more times using different influenza seasons (from Table 1) so that you have analyzed a total of five influenza seasons. Do you notice any trends?
    1. Overall, how similar are the hemagglutinin and neuraminidase protein sequences in the common influenza strains to the same type of influenza virus that was used in the vaccine for that season?
    2. Based on how similar the sequences are, how well do you think the vaccine protected a vaccinated person from the different strains in a given season?
    3. How well does it seem that an influenza vaccine from one year will protect a person against the common influenza strains one, two, three, or more years later?
    4. Do you notice any other trends in your data?
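The name check in the Procedure (deciding whether a common strain and a vaccine strain are the same before running BLAST) can be sketched in Python. This is a rough helper, not part of any official tool: the function names are made up, it only handles the four-part notation used in Table 1 (type/location/isolate number/year), and the rule for expanding two-digit years (50 or above means 19xx, otherwise 20xx) is an assumption that happens to fit the strains in this project.

```python
def parse_strain(name):
    """Split a name like 'A/California/07/2009' into (type, location, isolate, year)."""
    parts = [part.strip() for part in name.split("/")]
    if len(parts) != 4:
        raise ValueError("expected type/location/isolate/year: " + name)
    virus_type, location, isolate, year = parts
    year = int(year)
    if year < 100:
        # Assumed pivot for two-digit years: 50-99 -> 1950-1999, 00-49 -> 2000-2049.
        year += 1900 if year >= 50 else 2000
    return (virus_type, location, int(isolate), year)


def same_strain(name_a, name_b):
    """True if two names describe the same strain, even if the year is written differently."""
    return parse_strain(name_a) == parse_strain(name_b)


print(parse_strain("A/California/07/2009"))
print(same_strain("B/Hong Kong/330/01", "B/Hong Kong/330/2001"))  # True
print(same_strain("A/Panama/2007/99", "A/Moscow/10/99"))          # False
```

A check like this catches pairs such as B/Hong Kong/330/01 and B/Hong Kong/330/2001 in Table 1, which look different at a glance but name the same strain.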

If you like this project, you might enjoy exploring these related careers:

Career Profile
Do you like a good mystery? Well, an epidemiologist's job is all about solving mysteries—medical mysteries—but instead of figuring out "who done it" like a police detective would, they figure out "what caused it." They find relationships between a medical condition and things like human behavior, environmental toxins, genes, medical treatments, other diseases, and geographical location. For example, they ask questions like what causes multiple sclerosis? How can we prevent brain… Read more
Career Profile
The human body can be viewed as a machine made up of complex processes. Scientists are working on figuring out how these processes work and on sequencing and correlating the sections of the genome that correspond to the individual processes. (The genome is an organism's complete set of genetic material.) In the course of doing so, they generate large amounts of data. So large, in fact, that to make sense of it, the data must be organized into databases and labeled. This is where bioinformatics… Read more
Career Profile
Microorganisms (bacteria, viruses, algae, and fungi) are the most common life-forms on Earth. They help us digest nutrients; make foods like yogurt, bread, and olives; and create antibiotics. Some microbes also cause diseases. Microbiologists study the growth, structure, development, and general characteristics of microorganisms to promote health, industry, and a basic understanding of cellular functions. Read more
Career Profile
Many aspects of peoples' daily lives can be summarized using data, from what is the most popular new video game to where people like to go for a summer vacation. Data scientists (sometimes called data analysts) are experts at organizing and analyzing large sets of data (often called "big data"). By doing this, data scientists make conclusions that help other people or companies. For example, data scientists could help a video game company make a more profitable video game based on players'… Read more


Variations

  • Pick an influenza season from Table 1 in the Procedure. If you could travel back in time and redesign the influenza vaccine for the year you pick, which influenza strains would you use for the vaccine? Do you think you could make the vaccine more effective than it was? Based on sequence alignment, if the virus strains you suggest are not available, are there any alternative strains that you think are similar enough to still make an effective vaccine?
  • When designing an influenza vaccine, is it important to make sure the vaccine targets certain types of influenza virus more than other types? To figure this out, you can look into how common the different types of influenza virus are each season. The following resource, which is listed in the Bibliography in the Background tab, contains this information. To find the information, follow the second sub-step of step 2 in the Procedure (the one about more recent flu data). You will need to carefully read through the influenza summary webpage for the season you are interested in. Which types of influenza virus are most common? Does this change from season to season, or does it stay fairly constant?
    1. Centers for Disease Control and Prevention (CDC). (2009). Flu Activity & Surveillance. Retrieved January 18, 2013, from
