
Computational Exploration of Protein Function

Posted: Mon Dec 01, 2014 7:30 pm
by deleted-247940
Hello,

I am doing the Computational Exploration of Protein Function project and I have the amino acid sequence for a specific protein. However, when I search for the mRNA sequence, a proper sequence does not show up. Instead, titles like "transcript variant 1", "transcript variant 2", "DNA clone", etc. come up. Which one do I choose as the mRNA sequence? I have checked this with other proteins as well and the results look the same.
Also, do we only research proteins with unknown functions, known functions, or both?

Thank you!

Re: Computational Exploration of Protein Function

Posted: Mon Dec 01, 2014 8:14 pm
by SciB
Hi,

Sometimes when you get a protein or peptide sequence from NCBI, the entry will also have the cDNA sequence from which you can determine the mRNA, but if not then you can use this reverse translation tool: http://www.bioinformatics.org/sms2/rev_trans.html

Just follow the instructions, paste your amino acid sequence into the box, and click Submit. You will get the most likely DNA sequence for that amino acid sequence, but it will probably not be the exact natural sequence: most amino acids are encoded by more than one codon, and codon usage varies with the species the protein came from.
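To see why a reverse-translated sequence is only one of many possibilities, here is a minimal sketch of reverse translation. The codon table is an assumption made up for illustration: it picks one valid codon per amino acid by hand and is NOT real codon-usage data, whereas tools like the one linked above weight codons by the source species.

```python
# Illustrative only: ONE valid codon per amino acid, chosen by hand.
# This is NOT real codon-usage data for any species.
CODON_FOR = {
    "A": "GCC", "R": "CGC", "N": "AAC", "D": "GAC", "C": "TGC",
    "Q": "CAG", "E": "GAG", "G": "GGC", "H": "CAC", "I": "ATC",
    "L": "CTG", "K": "AAG", "M": "ATG", "F": "TTC", "P": "CCC",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAC", "V": "GTG",
    "*": "TGA",  # stop codon
}


def reverse_translate(protein):
    """Return ONE possible coding DNA sequence for a one-letter protein sequence."""
    protein = protein.upper().replace(" ", "").replace("\n", "")
    return "".join(CODON_FOR[aa] for aa in protein)


# "MKTW" is a made-up peptide, not a real protein fragment.
print(reverse_translate("MKTW"))  # ATGAAGACCTGG
```

Because most amino acids have several codons (e.g. K could also be AAA), many different DNA sequences encode the same protein, which is why the real cDNA record is preferable when one exists.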

Once you have a coding DNA sequence (from a cDNA entry or from the reverse translation), you can convert it to the corresponding mRNA sequence using a sequence editor: http://www.fr33.net/seqedit.php
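The conversion the sequence editor performs is simple enough to sketch. This assumes the input is the coding (sense) strand written 5' to 3', which is how cDNA and mRNA records are normally listed; the function name is hypothetical.

```python
def dna_to_mrna(coding_dna):
    """Rewrite a coding-strand (sense) DNA sequence in the RNA alphabet (T -> U)."""
    # Drop whitespace/line breaks that often come along when copying from a web page.
    seq = "".join(coding_dna.split()).upper()
    return seq.replace("T", "U")


print(dna_to_mrna("ATGAAG ACCTGG"))  # AUGAAGACCUGG
```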

If you type the name and species of the gene into the NCBI search box (http://www.ncbi.nlm.nih.gov/pubmed) and choose the Nucleotide database from the drop-down menu instead of PubMed, you should be able to get the cDNA sequence that way. Is that what you tried? Sometimes you have to look through a number of hits before you find the right one, which usually says "complete cDNA sequence".
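The same Nucleotide search can also be written as a query to NCBI's Entrez E-utilities (the `esearch.fcgi` endpoint with `db=nuccore`). This sketch only builds the URL and does not contact NCBI. The gene symbol and the search field tags (`[Gene Name]`, `[Organism]`, `biomol_mrna[PROP]`) are assumptions to double-check against NCBI's search help before relying on them.

```python
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def nucleotide_search_url(gene, organism):
    """Build an ESearch URL for mRNA records of a gene in one organism (not sent)."""
    # Field tags are assumptions; the same term can be pasted into the Nucleotide search box.
    term = f"{gene}[Gene Name] AND {organism}[Organism] AND biomol_mrna[PROP]"
    return ESEARCH + "?" + urlencode({"db": "nuccore", "term": term})


# Example inputs only: NGB is used here as the human neuroglobin gene symbol.
print(nucleotide_search_url("NGB", "Homo sapiens"))
```

Narrowing by organism and molecule type is one way to cut down the clones, variants, and partial sequences that clutter a plain keyword search.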

Let us know how this works out for you and if you have more questions.

Good luck!

Sybee

Re: Computational Exploration of Protein Function

Posted: Sat Dec 06, 2014 6:57 pm
by deleted-247940
Hello,

I used the reverse translation tool: I entered the amino acid sequence of my protein and got a corresponding DNA sequence. I wasn't sure which DNA sequence to use, so I used the sequence of most likely codons. I then used the sequence editor's button for turning a DNA sequence into an mRNA sequence. However, you said to convert the cDNA into the mRNA sequence, and I used the reverse-translated DNA sequence instead. Would that be correct as well?

Thank you!

Re: Computational Exploration of Protein Function

Posted: Sun Dec 07, 2014 5:30 pm
by SciB
Hi,

Human genes and those of most eukaryotes contain variable numbers of noncoding sequences called introns (https://www.dartmouth.edu/~cbbc/courses ... Genes.html). The DNA sequences in a gene that code for the protein are in elements called exons. When RNA polymerase transcribes a gene, it copies both introns and exons into a precursor RNA; the introns are then removed and the exons are joined, sometimes in different combinations, to make the final mRNA. This process is called RNA splicing (alternative splicing when different exon combinations are used), and it is a critical part of turning a gene sequence into a working protein.
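Splicing can be sketched as cutting out the exon segments and joining them. The gene and exon coordinates below are made up for illustration; for a real gene the exon positions come from the annotation in its genomic record. Alternative splicing would correspond to passing a different list of exons.

```python
def splice(genomic, exons):
    """Join exon segments in order. Coordinates are 1-based and inclusive."""
    return "".join(genomic[start - 1:end] for start, end in exons)


# Made-up gene: two exons (upper case) separated by one intron (lower case).
gene = "ATGAAG" + "gtaagtcctttcag" + "ACCTGG"
exons = [(1, 6), (21, 26)]  # hypothetical exon coordinates

print(splice(gene, exons).upper())  # ATGAAGACCTGG
```

Using the whole genomic string instead of the spliced one would put the intron bases into the "mRNA", which is why a genomic sequence cannot be converted directly.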

So, what I am trying to tell you is that you cannot use a genomic DNA sequence to generate the corresponding mRNA sequence unless the stretch you are using lies entirely within one exon. cDNA is made by copying mRNA, so it contains only the spliced exon sequence, and cDNA records in NCBI are normally written in the same orientation as the mRNA, so you can convert one to the other directly. When we clone DNA sequences that represent a protein or peptide we usually clone the cDNA. If your search returned the gene sequence, try again and include 'cdna' in the search terms. Are you looking at an entire protein or just part of one?

Once you have what you think is the correct mRNA sequence for your protein, you can test it by searching it against the NCBI databases (for example with BLAST) and seeing whether the correct protein comes up.
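A quick local check to complement the NCBI search is to translate the mRNA back with the standard genetic code and compare the result with your protein. This sketch assumes the sequence starts exactly at the start codon and contains only A, C, G, and U (or T); the function names are hypothetical.

```python
BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... GGG.
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def translate(mrna):
    """Translate from the first base until the first stop codon."""
    mrna = "".join(mrna.split()).upper().replace("T", "U")
    protein = []
    for pos in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[pos:pos + 3]]
        if aa == "*":  # stop codon ends translation
            break
        protein.append(aa)
    return "".join(protein)


def matches_protein(mrna, protein):
    """True if the mRNA translates exactly to the given one-letter protein sequence."""
    return translate(mrna) == protein.upper()


print(translate("AUGAAGACCUGGUGA"))  # MKTW
```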

If you have more questions let us know.

Good luck!

Sybee

Re: Computational Exploration of Protein Function

Posted: Sun Dec 07, 2014 8:31 pm
by deleted-247940
Hi,

Thank you so much for that information! I searched NCBI with my mRNA sequence for the entire protein to check whether it was correct, and the protein did come up, but the sequences contained the letters A, T, G, and C. From what I learned before, mRNA has U's instead of T's. Are there any other ways to make sure my mRNA sequences are correct? When I looked this up, I saw that to convert a DNA sequence into mRNA, the T's are changed to U's. It was also difficult for me to find the cDNA. I did see entries like mRNA (cDNA clone for...), but I wasn't sure whether that was the proper sequence.

Thank you!

Re: Computational Exploration of Protein Function

Posted: Mon Dec 08, 2014 8:34 am
by SciB
You are correct: the RNA sequence should have U instead of T. I think NCBI records show T instead of U because most of the actual manipulation (cloning, sequencing, etc.) is done with DNA, and it is understood that RNA polymerase will insert U opposite A in the template.

If you did a search using your derived mRNA sequence and got the correct protein then you have it. You don't need to do any more than that. There are many other interesting and useful programs that you can use to get information from an amino acid sequence and comparing two similar sequences can give some ideas about how protein structure is related to activity. If you have more questions about the protein analysis software let us know.

Sybee

Re: Computational Exploration of Protein Function

Posted: Mon Dec 08, 2014 8:31 pm
by deleted-247940
Hello!

Thank you very much. Just a quick question: how would I look up the entire mRNA sequence on the NCBI website? Is there a place to search it to check whether I have the correct protein? I have tried including "mRNA" in my NCBI searches, but unrelated results came up.

Thank you.

Re: Computational Exploration of Protein Function

Posted: Tue Dec 09, 2014 4:48 pm
by SciB
Hi,

The mRNA sequence is not usually given in a protein search on NCBI. When you do a search for your protein do you see an entry that gives the cDNA that corresponds to it? If you can find the cDNA you can easily convert it to mRNA using one of the editing programs. When you get the sequence just change the T's to U's and it should be identical to the mRNA.

If you tell me the name of your protein, I will search for it and make sure you are using the right entry. I know what you mean about getting too much information when you do a search! The NCBI database sometimes seems so cluttered with partial sequences, clones, variants, and duplicate entries submitted by different researchers that it's almost impossible to find what you want.

Good luck!

Sybee

Re: Computational Exploration of Protein Function

Posted: Tue Dec 09, 2014 7:51 pm
by deleted-247940
Hello!

That's great. Sometimes, I get confused whether I am looking at the right cDNA sequence or if it even is cDNA! The words in the title, such as mRNA (cDNA clone...), make me wonder whether the sequence is cDNA. Would the sequence I would be looking at be cDNA? The protein I am doing is Neuroglobin, but I am doing a few more. Quick question - can the function of the protein we choose be known, or do they have functions that are not yet finalized?

Thank you so much!

Re: Computational Exploration of Protein Function

Posted: Fri Dec 12, 2014 7:55 pm
by SciB
Hi,

Sorry I didn't see your question sooner, but it was way down in the list.

I know what you mean about the entries being confusing. If you do a protein search and a resulting hit has a nucleotide sequence it will be cDNA unless it says otherwise. When the GENE sequence is included it is always stated as the gene and usually the exons and introns are also given.

The amount of information known about a specific protein varies widely. You can learn something about a protein's function by comparing its amino acid sequence to those of proteins with known functions. Proteins have domains, distinct regions of the amino acid chain that function in catalysis, substrate binding, phosphorylation, and other activities. When a comparison shows that your protein has the same kinds of domains as another protein, it usually means their functions are similar. Protein researchers would like to be able to take the amino acid sequence of a protein, run it through a program, and determine how the protein folds into its native 3D structure, but we're still a long way from that.
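Real domain searches use curated databases and profile-based tools, which are far more sensitive than anything shown here. As a much simpler illustration of scanning a sequence for a known functional feature, this sketch looks for the PROSITE pattern N-{P}-[ST]-{P} (the N-glycosylation site pattern, PS00001) rewritten as a regular expression. The test peptide is made up, and a pattern hit is only a candidate site, not evidence that the site is actually used.

```python
import re

# PROSITE N-{P}-[ST]-{P} as a regex; the lookahead lets overlapping hits be found.
N_GLYC = re.compile(r"(?=N[^P][ST][^P])")


def find_motif(protein, pattern=N_GLYC):
    """Return 1-based start positions of every pattern match in the protein."""
    return [m.start() + 1 for m in pattern.finditer(protein.upper())]


print(find_motif("MKNVSAWNPTL"))  # [3]  (N-P-T at position 8 is excluded)
```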

Play around with your neuroglobin amino acid sequence using the existing protein software and see what you get. I'm sure there is a lot known about neuroglobin's activity and you could see if the structural features that show up in the sequence analysis match up with neuroglobin's known functions.

Good luck!

Sybee

Re: Computational Exploration of Protein Function

Posted: Mon Dec 15, 2014 4:40 pm
by deleted-247940
Hello,

Thank you for that information.
When I aligned the mRNA sequences, the position numbers did not line up. For example, position 79 of my query lined up with position 1 of the subject. That was quite surprising. Why did that happen, and is it okay for me to use that sequence comparison?

Thank you!

Re: Computational Exploration of Protein Function

Posted: Mon Dec 15, 2014 7:04 pm
by SciB
What did you align with what?

Does your neuroglobin mRNA sequence start with AUG? Was the alignment from position 79 to the end perfect?
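One common reason for an offset like "query 79 = subject 1" is that one record includes a 5' untranslated region before the start codon while the other begins at the start codon. This sketch finds the first ATG/AUG; the record is made up, with 78 bases placed before the coding sequence on purpose. Note that the first ATG is not always the true start codon, so check it against the annotated coding region in the NCBI record.

```python
def first_start_codon(seq):
    """Return the 1-based position of the first ATG/AUG, or None if there is none."""
    idx = "".join(seq.split()).upper().replace("U", "T").find("ATG")
    return idx + 1 if idx != -1 else None


# Made-up record: 78 nt of 5' UTR followed by a short coding sequence.
utr5 = "GC" * 39
record = utr5 + "ATGAAGACCTGGTGA"
print(first_start_codon(record))  # 79
```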

Send me some more specific info so I can try to understand what is going on.

Sybee

Re: Computational Exploration of Protein Function

Posted: Mon Dec 15, 2014 8:22 pm
by deleted-247940
Hello,

I actually aligned the cDNA before converting it into mRNA, because NCBI would not convert cDNA to mRNA. Isn't the cDNA sequence the same as the mRNA sequence, except that the T's become U's? I aligned the human (Homo sapiens) neuroglobin cDNA with the cDNA of a gorilla and of a Northern white-cheeked gibbon. For some reason, there were dashes in one organism's sequence that cut out an entire part of the sequence. I will give you links to explain it better. When I compared the gorilla to the human, there were large numbers of dashes. I am assuming the alignments were perfect.

http://www.ebi.ac.uk/Tools/services/web ... nucleotide
(First sequence = human, second sequence = gorilla)

http://www.ebi.ac.uk/Tools/services/web ... nucleotide
(First sequence = human, second sequence = Northern white-cheeked gibbon )

Thank you so much!

Re: Computational Exploration of Protein Function

Posted: Sat Dec 20, 2014 6:58 pm
by SciB
Hi,

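The first strand of cDNA is made by copying the mRNA, so it is the reverse complement of the mRNA; the 'c' in cDNA stands for complementary. However, cDNA sequences in databases such as NCBI are normally listed in the same orientation as the mRNA, so, as you said, they match the mRNA except that they have T where the mRNA has U. When you do an alignment you use the cDNA sequences as they appear in the database; there is no need to convert them to mRNA first.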
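To make "reverse complement" concrete, here is a minimal sketch: complement each base (A/T, C/G) and then reverse the order. If a record ever turns out to be listed in the opposite orientation from the mRNA, taking its reverse complement brings it back to the mRNA orientation. The sequence is made up.

```python
# Translation table pairing each DNA base with its complement (case preserved).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna):
    """Complement every base, then reverse the order of the sequence."""
    return dna.translate(COMPLEMENT)[::-1]


coding = "ATGAAGACCTGG"
print(reverse_complement(coding))  # CCAGGTCTTCAT
```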

The dashes are gaps: places where one sequence has bases that the other lacks. Gaps at the ends of an alignment often just mean that one record includes untranslated regions that the other leaves out, or that one record is only a partial sequence. Gaps in the middle may indicate different isoforms of neuroglobin: other primates may have neuroglobin genes in which the exons are spliced differently from those in humans. Parts of the protein's amino acid sequence are then the same while other parts differ. The function of the protein is usually the same, but there may be some differences in post-translational modifications like phosphorylation or in how the protein is secreted. The differences may be important or relatively minor. See if you can find some papers that describe the architecture of neuroglobin so that you know which domains in the protein are doing what.
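A simple way to summarize an alignment like the ones linked above is to count the gap columns and compute percent identity over the columns where both sequences have a base. This is a sketch with made-up aligned strings; alignment tools define identity in different ways (some include gap columns in the denominator), so numbers from this function will not necessarily match what the EBI pages report.

```python
def alignment_summary(aligned_a, aligned_b):
    """Count gap columns and percent identity over ungapped columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = list(zip(aligned_a.upper(), aligned_b.upper()))
    gap_columns = sum(1 for a, b in columns if a == "-" or b == "-")
    compared = [(a, b) for a, b in columns if a != "-" and b != "-"]
    identical = sum(1 for a, b in compared if a == b)
    identity_pct = 100.0 * identical / len(compared) if compared else 0.0
    return {"gap_columns": gap_columns, "identity_pct": identity_pct}


# Made-up alignment: three gap columns and one mismatch (G vs A) at the end.
print(alignment_summary("ATGAAGACCTGG", "ATG---ACCTGA"))
```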

You can make a really interesting story out of differences in amino acid sequences between humans and other animals. Our genome is very similar to that of other primates but physically and mentally we seem very different. The big question for the 21st century is to understand how the genes are turned on or off to produce a human instead of a gibbon.

Good luck!

Sybee