Hi Swimmy - I have been in contact with the team of scientists here at Science Buddies and with the author of the project you have been working on. Based on the questions you've raised, we are revising the project to make parts of it clearer. I have some answers to pass along to you from the project author, and I hope they help. I'm copying each of your questions in here along with his answer:
1. The page with the sequences of the vaccines does not work.
http://www.flu.lanl.gov/vaccine - OUT OF COMMISSION
Could I use sequences of vaccines from
http://www.biohealthbase.org/GSearch/va ... =influenza instead?
Answer: Yes. But it might be easier to use the NCBI site:
http://www.ncbi.nlm.nih.gov/Genbank/
2. Where would the raw sequences for the actual flu be? How could I find them? Are they on the NCBI page?
Answer: See
http://www.ncbi.nlm.nih.gov/genomes/FLU/SwineFlu.html
3. Should I compare the hemagglutinin protein of the vaccines and viruses or the neuraminidase protein?
Answer: Use the hemagglutinin (HA) protein sequence. This is the protein measured by the hemagglutination assay, which is used to test how well the vaccine is working. Protein sequence is also more informative than DNA, since DNA accumulates silent mutations (changes that do not alter the protein) that complicate interpretation of the BLAST results.
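To illustrate the point about silent mutations, here is a minimal Python sketch. The DNA fragments are made up for illustration and are not taken from any flu sequence, and the codon table is deliberately partial (only the four codons used, from the standard genetic code).

```python
# Partial codon table from the standard genetic code.
# Only the codons used in this illustration are included.
CODON_TABLE = {"GAT": "D", "GAC": "D", "AAT": "N", "AAC": "N"}


def translate(dna):
    """Translate a DNA string codon by codon using the partial table above."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def count_differences(a, b):
    """Count positions where two equal-length strings differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


# Hypothetical fragments, invented for this example.
original = "GATAAT"  # encodes D N
silent = "GACAAC"    # two DNA changes, but still encodes D N
changed = "AATAAT"   # one DNA change that turns D into N

for name, dna in [("silent", silent), ("changed", changed)]:
    dna_diffs = count_differences(original, dna)
    protein_diffs = count_differences(translate(original), translate(dna))
    print(name, "-", dna_diffs, "DNA differences,", protein_diffs, "protein differences")
```

The "silent" fragment differs from the original at the DNA level but not at the protein level, which is the kind of noise a DNA comparison would pick up and a protein comparison would not.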
4. Could I compare how similar all the vaccines are to swine flu? To find out if they are effective for the swine flu outbreak?
Answer: Yes. See the NCBI flu page for the sequence:
http://www.ncbi.nlm.nih.gov/genomes/FLU/SwineFlu.html
5. I also don't really get the results that appear after trying the experimental procedure with NCBI's BLAST. What percentage should be matched for the vaccine to be effective?
Answer: It is hard to say up front what percentage should be matched. I suggest that you formulate a new question that the BLAST output is able to address. I have not tried any of the projects below, so they may not be good ways to proceed, but they will give you an idea of the kind of project I have in mind. For example, you could ask:
A) What part of the protein is most subject to change? You might try to form a hypothesis as to why certain parts of the protein are more subject to change. You might also want to focus on one variation to keep the project manageable. For example, this part of the HA protein has a change from a D (aspartic acid) to an N (asparagine), marked by the "+" in the middle line:
Query 121 YASLRSLVASSGTLEFNDESFNWTGVTQNGT
          YASLRSLVASSGTLEFN+ESFNWTGVTQNGT
Sbjct 121 YASLRSLVASSGTLEFNNESFNWTGVTQNGT
B) How are various strains distributed geographically?
C) Can you identify when (what year) a certain variant appeared and how it spread?
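For question A, the D-to-N change in the alignment fragment above can also be located programmatically. The following is a minimal Python sketch, not part of the project procedure. It assumes the two sequences are already aligned, have equal length, and contain no gaps, which holds for this fragment but not for BLAST alignments in general. The function names are hypothetical.

```python
# The two aligned fragments copied from the BLAST output above.
QUERY_START = 121  # residue number of the first letter, as shown in the alignment
query = "YASLRSLVASSGTLEFNDESFNWTGVTQNGT"
sbjct = "YASLRSLVASSGTLEFNNESFNWTGVTQNGT"


def substitutions(a, b, start=1):
    """List (residue number, letter in a, letter in b) for each mismatch.

    Assumes a and b are aligned, gap-free, and of equal length.
    """
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return [(start + i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


def percent_identity(a, b):
    """Percentage of aligned positions where the two sequences match."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


for position, q_res, s_res in substitutions(query, sbjct, QUERY_START):
    print(f"Residue {position}: {q_res} in query, {s_res} in subject")
print(f"Identity over this fragment: {percent_identity(query, sbjct):.1f}%")
```

Run on a set of strains, a tally of which residue numbers show up most often would be one way to approach "what part of the protein is most subject to change." Note that the identity figure here covers only this 31-residue fragment, not the whole HA protein.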
6. Should I just enter the DNA sequence of the vaccine, or Align two or more sequences? If I do just the DNA sequence of the vaccine, then it says "No significant similarities found" and there are no results.
Answer: Use the protein sequence.
7. I also don't get the "database" for comparing to flu sequences. Which one should I use, if any?
Answer: Use the protein database.
8. Also, what vaccine should I choose?
Answer: It is up to you! I would pick a year for which there is good molecular data, and not too recent. Say 2004.
I hope these help, Swimmy. If you have additional questions, please post here, and I'll follow up with the scientist. I'll also post here and let everyone know when the project is officially updated.
Amy Cowen
Science Buddies