Question about bio informatics
Moderators: AmyCowen, kgudger, MadelineB, Moderators
-
kyhekm
- Posts: 45
- Joined: Tue Jun 16, 2009 12:18 am
- Occupation: Student
- Project Question: How to start selecting projects.... etc //many several questions!
- Project Due Date: not started yet
- Project Status: I am just starting
Re: Question about bio informatics
Hello
Sorry for the late post!
Answers to your questions:
1. Do you have your hypothesis yet?
My first hypothesis was to find common parts between the genes in the two groups, look for unknown genes that also share those parts, and, if possible, find the function of these genes. I hoped to find more genes with the ability to regenerate the heart so that they could later be applied to humans.
2. Remind me again how you got your two sets of genes.
The zebrafish genes came from a graduate student. I contacted a professor who researches zebrafish, and she put me in touch with him.
From his research, he found that all 15 genes came up in retina, fin, and heart regeneration. (Zebrafish can regenerate all three: fins, retina, and heart.)
I found the list of genes related to heart disease through medical journals and websites, by searching Google. I know Google is not a good place to search when doing research because it can give inaccurate data, but it was the only way I could find them.
3. Do you think there will be motifs that are present in both sets of genes but are not exclusive to heart function? How would you control for this?
I am sure there will be some motifs that are not related to heart function at all. I think I should check whether other known heart regeneration genes (besides my list of 15) also have the motifs I found. And if there is a website like NCBI where I can search for motifs, I will need to look up the functions of the motifs there.
I am not sure I clearly answered your questions!
Best,
Ryan
-
kyhekm
Re: Question about bio informatics
Hi
Are you out of town, Caroline?
I haven't heard from you in a while, so I was just wondering.
Best,
Ryan
-
carolinethorn
- Former Expert
- Posts: 393
- Joined: Tue Sep 20, 2005 2:40 pm
Re: Question about bio informatics
Hi Ryan,
Sorry it's taken so long. I've been busy with work, and then I needed to really think about your project before answering.
I think we need to work on refining your hypothesis, and that should make it easier to narrow down what you have to do. You don't have a clear hypothesis right now. That's OK: it is common in bioinformatics to have a hypothesis-generating part of the work and then move on to hypothesis testing.
So we use your sets of genes to try to find a motif; we call this a "training set". Then we set the hypothesis (that the motif is responsible for a shared function among heart genes) and test it on other genes, maybe also finding some new ones. But if the training set is not reliable, it can be hard to get a good hypothesis.
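As a rough illustration of the hypothesis-generating step, here is a minimal Python sketch that treats a "motif" as an exact word of k bases (a k-mer) and lists the k-mers present in every sequence of a training set. This is a simplification assumed for illustration only: dedicated motif finders (for example MEME) allow mismatches and score statistical over-representation. The sequences below are made up.

```python
def kmers(seq, k):
    """Return the set of all length-k words (k-mers) in a DNA sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmers(training_set, k):
    """Return the k-mers found in every sequence of the training set."""
    if not training_set:
        return set()
    common = kmers(training_set[0], k)
    for seq in training_set[1:]:
        common &= kmers(seq, k)  # keep only k-mers also present in this sequence
    return common


# Made-up sequences standing in for a heart-related training set.
training = ["ACGTTGCA", "ttgcaaac", "GGTTGCAT"]
print(sorted(shared_kmers(training, 5)))  # ['TTGCA']
```

Anything found this way is only a candidate; the hypothesis-testing step is checking it against genes outside the training set.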
You said your zebrafish gene set came from an experiment by a grad student. You need to find out what kind of experiment that was: was it a gene expression experiment, such as a microarray?
I am a little worried about the reliability of your human gene set. I think it would be better either to use a set of human genes from a microarray experiment on heart cells (don't worry, you don't need to do the experiment; we would search the GEO (Gene Expression Omnibus) database to find data from one) or to use a list of genes that were curated for their role in heart disease. Curated means that a scientist has read papers about the genes and diseases and listed them together in a database. This is actually what I do for a job: I work for a database that curates data on human genes, drugs, and diseases, mostly looking at how genes and drugs interact. It's called the Pharmacogenomics Knowledge Base (PharmGKB). We have lists of genes associated with heart disease on this page:
http://www.pharmgkb.org/do/serve?objId= ... bview=tab2
There is also a database called HuGEnet that is more disease-focused and might have a more comprehensive list of genes associated with heart disease. It has a search tool called Gene Prospector that should give a list of genes associated with heart disease, ranked by how much literature and evidence there is for each.
http://hugenavigator.net/HuGENavigator/ ... artPage.do
I think that would be a better place than Google to get the human list.
My last question was meant to get you thinking about whether you might need a negative control: genes that are not associated with heart regeneration or heart disease, to show that your motif is specific to the heart. I was thinking of the motif called the TATA box, a signal that helps RNA polymerase bind DNA and start transcription; it might show up as a motif in your genes simply because it is present in many genes.
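One simple way to run that kind of negative control, sketched in Python with made-up data: for each candidate motif, compare the fraction of heart-related sequences that contain it with the fraction of control sequences that contain it. A motif that is just as common in the control set (as a TATA-box-like word such as TATAAA might be) is not evidence of heart specificity. Exact substring matching and the 2x cut-off in the hypothetical `looks_specific` rule are illustrative assumptions, not a standard test; a real analysis would use a proper statistical test.

```python
def fraction_containing(motif, sequences):
    """Fraction of sequences that contain the motif as an exact substring."""
    if not sequences:
        return 0.0
    motif = motif.upper()
    hits = sum(1 for seq in sequences if motif in seq.upper())
    return hits / len(sequences)


def looks_specific(motif, heart_seqs, control_seqs, min_ratio=2.0):
    """Hypothetical rule: call a motif heart-specific if it is at least
    min_ratio times more common in heart sequences than in controls."""
    heart = fraction_containing(motif, heart_seqs)
    control = fraction_containing(motif, control_seqs)
    if control == 0:
        return heart > 0
    return heart / control >= min_ratio


# Made-up sequences for illustration only.
heart = ["GGTATAAAGCGCGC", "CCTATAAAGG", "AAGCGCGCTT"]
control = ["TATAAATTTT", "GGGTATAAAC", "CCCCCCCCCC"]
for motif in ["TATAAA", "GCGCGC"]:
    print(motif,
          fraction_containing(motif, heart),
          fraction_containing(motif, control),
          looks_specific(motif, heart, control))
```

Here TATAAA appears equally often in both sets, so it would be flagged as not specific, while GCGCGC appears only in the heart set.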
Hope this all makes sense,
Caroline
-
kyhekm
Re: Question about bio informatics
If I made you rush to write this post while you were busy, I'm so sorry.
I just wanted to know what was going on.
As you said, my hypothesis isn't clear yet. How about I gradually build it up as I compare genes?
The zebrafish genes are from a microarray experiment. This is what the grad student said: “I have done quite a bit of bioinformatics work comparing microarray datasets from regenerating tissues in zebrafish (heart, fin, retina) which lead to a very short list of common genes differentially expressed in all three tissues after damage.”
About the human gene set: I have 14 human genes in total that are related to heart disease. When I searched HuGEnet for “coronary artery disease”, I found 13 of the 14 genes in the list. So they are related to coronary artery disease, but some of them are ranked really low; for example, GATA2 is ranked 239th. Do you think I should use high-ranked genes rather than low-ranked ones? I also realized that I set the scope of heart diseases too broadly at first, so it may be a good idea to focus on one heart disease, such as coronary artery disease.
I get what you mean by a negative control. So I guess it would be a good idea to have two controls, right? I won't have any problems comparing genes with FASTA. The real problem is setting up the hypothesis. As I said earlier, my first hypothesis was to find common parts (in this case, motifs) between heart regeneration genes and heart disease genes, and then find other genes that have the same motifs and could cure heart disease. What do you think about this? I think this hypothesis has a problem, because I can't actually perform experiments to check whether the genes I find can cure heart disease.
Best,
Ryan
-
carolinethorn
Re: Question about bio informatics
Hi Ryan,
I am happy to help. I juggle working as a curator with being a mom, so I can't be a full-time mentor for you, but I try to get to your questions! This is a really good project, and I want to make sure I am setting you off on the right path at the beginning; that's why there is so much talk about the best approach. Experimental design is really important: it's what makes the difference between an OK project and a good one. It's much better to make sure you have thought about all the controls and possible sources of error before you start than to realize it partway through.
So it sounds like your zebrafish gene set is a good-quality set. It is key to remember, though, that the student said "heart, fin, retina", so it is not completely specific to the heart.
I agree that narrowing down the disease may be helpful. I am not a clinician, but my instinct is to choose a heart disease that is more about the failure of heart tissue to maintain itself than about damage and plaque buildup in the heart's vessels, so I think "heart failure" may be a better choice than "coronary artery disease". I think picking the top-ranked genes will give you a better chance of success than the lower-ranked ones.
Lastly, let's talk about what is feasible to discover with this project. Genes cannot "cure" disease. But knowing which genes are involved in disease and repair can provide information that is used to design new drugs to treat disease. Knowing the genes involved may also help identify gene variants that promote disease, so we can identify people who are at risk for the disease or who may respond to treatment. What you can hope to do here is identify a new candidate gene or give supporting evidence for a candidate gene that is ranked lower.
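To make "identify a new candidate gene" a bit more concrete, here is one possible hypothesis-testing step, sketched in Python: scan genes that are not in the training set for the candidate motifs and rank them by how many of the motifs each contains. The gene names, sequences, and motifs are hypothetical placeholders, and counting exact matches is a simplifying assumption. A high score would only flag a gene for a closer look in the literature; it does not show that the gene is involved in heart disease or repair.

```python
def motif_score(seq, motifs):
    """Number of distinct motifs found (as exact substrings) in one sequence."""
    seq = seq.upper()
    return sum(1 for m in motifs if m.upper() in seq)


def rank_candidates(candidates, motifs):
    """Sort candidate genes by motif score, highest first (ties broken by name)."""
    scored = [(name, motif_score(seq, motifs)) for name, seq in candidates.items()]
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Hypothetical genes outside the training set, with made-up sequences.
candidates = {
    "geneX": "AATTGCAGGCGCGCAA",
    "geneY": "CCCCTTTTGGGG",
    "geneZ": "TTGCAAAAAA",
}
print(rank_candidates(candidates, ["TTGCA", "GCGCGC"]))
```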
Do you have someone at school helping guide you in this project as well? Do you have a timeline for the science fair, when things are due?
best wishes,
Caroline
-
kyhekm
Re: Question about bio informatics
Hello
You must be really busy with everything! I totally understand if you can't post quickly. I will just wait for your answer.
You said those genes are not completely specific to the heart. Do you mean it is better to find genes that are specific only to the heart? I thought it would be better to have genes that are common to retina, heart, and fin. Anyway, I was reading a research paper (http://www.ncbi.nlm.nih.gov/pmc/article ... ool=pubmed), and I think I may be able to get a new regeneration gene set from it.
Next, about the human gene set: I understand what you are saying. I will get rid of my current list and replace it with the genes ranked 1 to 15 for heart failure.
I think identifying a new candidate gene is a good idea. I know that I first need to find motifs in order to find a new gene. After I find the motifs, what steps should I follow?
I don't have anyone else helping me at school. I didn't start this as a science fair project but as an independent study at my school, so there is a science teacher who checks my work every week, but he doesn't know anything about bioinformatics. Also, I am doing this research for my own sake. Of course, I am willing to enter the science fair if the project is finished before science fair registration, but my first goal is to finish this project well.
Best,
Ryan
-
carolinethorn
Re: Question about bio informatics
Thanks for your answers, Ryan. It's good to know what you are aiming for and that you don't have any pressing deadlines. It's great that you are doing this on your own initiative and out of your own interest. One key thing I forgot to ask: what grade are you in?
About the zebrafish list: I went back and looked at the list, and I think it says they are all changed in heart as well as in fin and retina, so that should be fine. For some reason I thought it was a mix and that maybe not all of them were changed in heart. The paper you linked looks like it could be a good list too. I would stick with your original list but keep this paper in case it comes in handy for testing later. You mentioned before that you were having trouble finding genomic sequences for some of the genes on your list; did you have any more luck with that, or are several sequences still missing?
Have a good week,
Caroline
-
kyhekm
Re: Question about bio informatics
Hello,
I am a senior now. When I first started this project as my senior-year independent study, I was trying to finish it by the end of December so I could send it as a supplement to the universities I applied to. However, as I worked on the project, I realized that would be hard, so I decided to forget about the "deadline".
I tried to upload a zip file with the list of zebrafish genes, but it doesn't seem to work. How do I attach a file to a post?
Yes, I have had problems finding the sequences of some zebrafish heart regeneration genes, such as wu:fc60b09, wu:fc59b06, wu:fb58g10, and LOC100001907. For 4 of the 15 genes, there is either no genomic sequence or the data have been removed, so I don't think I can use them in my project.
I have started comparing the new gene sets and saved all the results, but I am not sure how to interpret them. What do you think is the best way to interpret the results?
Best,
Ryan
-
carolinethorn
Re: Question about bio informatics
We need to figure out a way to test whether the regions highlighted as similar/identical in your comparisons are the same regions coming up again and again, or all different.
Here is a possible way: make a spreadsheet that records the range of bases in the gene that matched. Are you good with Excel?
Pick one zebrafish gene and start a sheet. List each human gene it was compared with, and then the range of bases from the zebrafish gene that matched. If you put the start and end of each range in two separate cells, you should be able to plot a graph that shows a line for each match and see whether the matches cluster in a particular part of the gene.
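The same check can be sketched in a few lines of Python instead of a spreadsheet: record the matched ranges on one zebrafish gene for each human gene it was compared with, count how many comparisons cover each base, and report the stretches covered by at least a chosen number of comparisons. The ranges, the gene pairings, and the threshold of 2 below are made up for illustration. Ranges are treated as inclusive, and a range written high-to-low (as can happen when the position numbers count down) is accepted too.

```python
from collections import Counter


def coverage(match_ranges):
    """For each base position on the zebrafish gene, count how many human
    genes had a match covering it. match_ranges maps a human gene name to a
    list of inclusive (start, end) ranges."""
    counts = Counter()
    for ranges in match_ranges.values():
        covered = set()
        for start, end in ranges:
            lo, hi = min(start, end), max(start, end)  # accept high-to-low ranges
            covered.update(range(lo, hi + 1))
        counts.update(covered)  # each human gene counts once per position
    return counts


def hotspots(counts, min_genes):
    """Merge consecutive positions covered by at least min_genes comparisons
    into (start, end) regions."""
    regions = []
    for pos in sorted(p for p, c in counts.items() if c >= min_genes):
        if regions and pos == regions[-1][1] + 1:
            regions[-1][1] = pos  # extend the current region
        else:
            regions.append([pos, pos])  # start a new region
    return [tuple(r) for r in regions]


# Made-up match ranges on one zebrafish gene.
example = {
    "ACE": [(10000, 10009), (10293, 10307)],
    "ADRB2": [(10005, 10020)],
    "GATA2": [(10310, 10300)],  # written high-to-low on purpose
}
print(hotspots(coverage(example), min_genes=2))  # [(10005, 10009), (10300, 10307)]
```

Regions that show up for several human genes are the ones worth looking at as possible shared motifs; a region matched by only one gene is more likely a one-off.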
Let me know how it goes,
Caroline
-
kyhekm
Re: Question about bio informatics
Hello
I want to ask you what you mean by the range of bases.
46020 46010 46000 45990 45980 45970 45960 45950
ref|N- AGGAAGCAGAGGCTGCAGTGAGCCGAGATGGCACCATTGCACTCCAGCCTGGACATCCCAGCAAGACTCAATCTCAAAAA
:::::: : : : ::: ::: : : : ::: ::
gnl|AS TCACAAATGTTTGCACACTGATGTATGAAAAGCTGTAATGACTCCAACTGGCAGATCACAG-TCGCCAGTACTGCAACAA
900 910 920 930 940 950 960 970
In this comparison, would the range of bases be 45987~45959?
And I don't quite understand what you mean by making a spreadsheet. I know how to use Excel and what a spreadsheet is, but I'm not sure I follow.
Here is what I think:
---------------------------------------Anxa5
ACE--------------------------Range 45987~45959
ADRB2-----------------------Range xxxxx~xxxxx
(ACE, ADRB2 are human genes and ANXA5 is a zebrafish gene. And please ignore the dashes. Just view them as blanks.)
Is this the kind of Excel chart you were thinking of? If not, can you explain further?
Thank you
Ryan
-
carolinethorn
Re: Question about bio informatics
Hi Ryan,
I have tried to do a mock-up of what I was thinking. See the attached Word file. It has comparisons for some made-up genes. Then see how I logged them in Excel; I couldn't upload both files, so I just pasted the table into the Word file. Hope this gives a good illustration of what I meant.
I couldn't get Excel to draw the kind of graph I wanted yet, though! I will have to work on that. I thought it could draw a simple line showing each match from start to finish.
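If Excel won't draw it, one workaround is a short Python script that prints each match as a run of '#' characters along a scaled-down picture of the gene, one row per human gene, so you can see by eye whether matches pile up in the same place. This is a sketch with made-up ranges and an assumed gene length; positions are 1-based and inclusive.

```python
def draw_tracks(match_ranges, gene_length, width=60):
    """Return a text picture with one row per compared gene: '#' marks
    matched stretches, '.' marks the rest, scaled to `width` characters."""
    scale = width / gene_length
    rows = []
    for gene, ranges in match_ranges.items():
        row = ["."] * width
        for start, end in ranges:
            lo, hi = min(start, end), max(start, end)
            first = int((lo - 1) * scale)  # 1-based position -> 0-based column
            last = min(int((hi - 1) * scale), width - 1)
            for col in range(first, last + 1):
                row[col] = "#"
        rows.append(f"{gene:<8}" + "".join(row))
    return "\n".join(rows)


# Made-up ranges on a hypothetical 1,200-base zebrafish gene.
example = {
    "ACE": [(100, 180), (900, 960)],
    "ADRB2": [(120, 200)],
    "GATA2": [(910, 980)],
}
print(draw_tracks(example, gene_length=1200))
```

Each row is one comparison; columns of '#' that line up vertically across rows point to a region matched by several genes.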
Best wishes,
-Caroline
-
kyhekm
Re: Question about bio informatics
Hello
Now I get what you are saying. I will make an example spreadsheet and try to upload it here so you can see.
I got a bit confused when choosing genomic sequences.
For example, when I looked for the genomic sequence of ACE, there were two different types of genomic sequence:
1) RefSeqs maintained independently of Annotated Genomes
Genomic
NG_011648.1 RefSeqGene
2) RefSeqs of Annotated Genomes: Build 37.1
Genome Reference Consortium Human Build 37 (GRCh37), Primary_Assembly
Genomic
NC_000017.10
I have always used genomic sequences like #2. Did I choose correctly, or doesn't it matter?
-------------------------------------
And if I have multiple similar regions, I need to include all of them, like the example below, right?
For example,
---------------------------------ABCDE
ACD-------------------------10000-10009
ACD-------------------------10293-10307
Best,
Ryan
-
carolinethorn
Re: Question about bio informatics
I think you did the right thing in choosing the sequence. It shouldn't matter, but it's better to be consistent if possible.
Yes, include all the regions of similarity from each gene.
Looking good,
Caroline
-
kyhekm
Re: Question about bio informatics
Hello
I will attach my questions in a Microsoft Word file.
Best,
Ryan
-
carolinethorn
Re: Question about bio informatics
When you pasted these into the Word file, the formatting changed a little, so the colons that indicate identity between the sequences shifted and the numbers don't line up as well. There is a trick that helps with this: change the font to one where all the letters are the same width (a monospaced font). I usually use Courier at size 8 so everything fits, then view the file at 150%. Then you can see more clearly where the identity starts and stops, and line up the colons. Getting the numbers to line up again looks more difficult; I counted from the beginning of the line.
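Another way around the shifting colons is to rebuild the identity line from the two aligned rows themselves. Here is a minimal Python sketch, assuming you paste in just the sequence part of each row (without the "ref|..." / "gnl|..." labels) and that both rows cover the same alignment columns. It marks ':' where the bases are identical and lists the column ranges (counting from 0) of identical runs at least `min_len` long. The `min_len` cut-off and the example rows are illustrative.

```python
def identity_line(top, bottom):
    """':' where both rows have the same base, ' ' elsewhere (gaps never match)."""
    return "".join(
        ":" if a.upper() == b.upper() and a != "-" else " "
        for a, b in zip(top, bottom)
    )


def identical_runs(top, bottom, min_len=1):
    """0-based (first, last) column ranges of identical stretches."""
    marks = identity_line(top, bottom)
    runs, start = [], None
    for i, mark in enumerate(marks + " "):  # trailing space closes a final run
        if mark == ":" and start is None:
            start = i
        elif mark != ":" and start is not None:
            if i - start >= min_len:
                runs.append((start, i - 1))
            start = None
    return runs


top = "ACGTAC-GT"
bottom = "ACCTACAGT"
print(top)
print(identity_line(top, bottom))
print(bottom)
print(identical_runs(top, bottom, min_len=3))  # [(3, 5)]
```

Because the output is printed in a fixed-width terminal, the colons stay lined up with the bases no matter what a word processor does to the font.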
You file attached with it marked in red what i did on first example.
Hope this helps,
Caroline
-
kyhekm
- Posts: 45
- Joined: Tue Jun 16, 2009 12:18 am
- Occupation: Student
- Project Question: How to start selecting projects.... etc //many several questions!
- Project Due Date: not started yet
- Project Status: I am just starting
Re: Question about bio informatics
Hello
I got your file, but I am still confused.
Let's say there is a pair like this:
---------------2950----2960
ABC----------ATGC---ACGC
When I read the base 'A' that is under 2955, will it be 2960 or 2987?
When I read the numbers, which direction should I read from?
I mean, if I read from the left, the A under 2960 will be 2960, and if I read from the right, the A under 2960 will be 2987.
I am confused about which direction I should read from.
-----------------------
And here is a question about what you did in the file "questionplusanswer".
In the first question, you said the span of bases is 2890 to 2951. However, shouldn't it be 2891-2947?
Best,
Ryan
-
carolinethorn
- Former Expert
- Posts: 393
- Joined: Tue Sep 20, 2005 2:40 pm
Re: Question about bio informatics
Hi Ryan,
The numbers get moved by the formatting too, and it can be hard to get them to line up again. Either note the results as you get them or, if it's too late for that, measure from the start of the sequence shown. The number at the beginning is the first base shown, so you can highlight the bases up to (but not including) the first base of identical sequence and use "Word Count" (under the Tools menu) to see how many characters (bases) there are. Then add that count to the starting number.
So for example 1, the first base was 2850 + 40 = 2890,
and in the second row 2930 + 21 = 2951, so the span was 2890-2951.
It is a little trickier when the numbering runs backwards, as in example 2, because you have to adjust. You might want to count down a few bases to check yourself.
280 - 31 -1 = 248
280 - 67 = 223
so span was 223-248
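The rule behind this arithmetic is: coordinate = number printed for the line plus (or, when the numbering runs backwards, minus) the number of bases that come before the target base. That also answers the direction question: count columns from the left either way, and only the sign changes. A minimal Python sketch, assuming (as described above) that the printed number is the coordinate of the first base on the line, and that '-' gap characters take up a column but no coordinate:

```python
def column_to_coordinate(line: str, first_coord: int, col: int,
                         descending: bool = False) -> int:
    """Coordinate of the base in column `col` (0-based) of one alignment line.

    Assumes `first_coord` is the coordinate of the first base on the line and
    that '-' marks a gap, which takes up a column but not a coordinate.
    """
    if line[col] == "-":
        raise ValueError("that column is a gap, not a base")
    bases_before = sum(1 for ch in line[:col] if ch != "-")   # gaps are skipped
    step = -1 if descending else 1          # backwards numbering counts down
    return first_coord + step * bases_before


# Example 1 above: 40 bases come before the first identical base.
print(column_to_coordinate("A" * 41, 2850, 40))   # 2890
```

Counting in code instead of with Word Count removes the off-by-one risk of highlighting one character too many or too few.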
Hope this helps,
Caroline
-
kyhekm
- Posts: 45
- Joined: Tue Jun 16, 2009 12:18 am
- Occupation: Student
- Project Question: How to start selecting projects.... etc //many several questions!
- Project Due Date: not started yet
- Project Status: I am just starting
Re: Question about bio informatics
Hello
I just finished comparing anxa5 (a zebrafish gene) with the fifteen human genes.
However, I am not sure what kind of graph I should make.
There are so many graph types in Excel, so if you can tell me what kind of graph is suitable, that would be great.
Best,
Ryan
-
carolinethorn
- Former Expert
- Posts: 393
- Joined: Tue Sep 20, 2005 2:40 pm
Re: Question about bio informatics
Hi Ryan,
Good job with getting all your alignments for anxa5!
I'm having trouble with all the graph styles too. Attached is a quick sketch of what I am looking for: human genes on the y axis and base numbering for anxa5 on the x axis, with a line for each human gene that shows the span from start to stop. The order of the human genes down the y axis doesn't matter. But hopefully, when you show them all, any part of anxa5 that a bunch of them span will stand out, and that's your putative motif region. It may be quicker to do it the old-fashioned way with graph paper and a pen!
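The question the graph answers ("do many human genes span the same part of anxa5?") can also be computed directly. A minimal Python sketch that counts how many aligned regions cover each anxa5 position and reports the most-covered stretch; the gene names and coordinates are made up, and a gene with two overlapping regions would be counted twice:

```python
def most_covered(spans):
    """Find the anxa5 stretch(es) covered by the most aligned regions.

    spans: list of (human_gene, start, end) with inclusive anxa5 coordinates.
    Returns (depth, regions) where regions are (start, end) stretches.
    """
    events = []
    for _gene, start, end in spans:
        events.append((start, +1))     # a region opens at its start
        events.append((end + 1, -1))   # and closes just after its end
    events.sort()                      # at equal positions, closes sort before opens

    depth = best = 0
    best_regions = []
    for i, (pos, delta) in enumerate(events):
        depth += delta
        # Only record once all events at this position have been applied.
        if i + 1 < len(events) and events[i + 1][0] > pos:
            region = (pos, events[i + 1][0] - 1)
            if depth > best:
                best, best_regions = depth, [region]
            elif depth == best and depth > 0:
                best_regions.append(region)
    return best, best_regions


# Made-up alignment results, not real data.
spans = [("geneA", 100, 400), ("geneB", 250, 500),
         ("geneC", 300, 350), ("geneD", 900, 950)]
print(most_covered(spans))   # (3, [(300, 350)])
```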
best wishes,
Caroline
-
kyhekm
- Posts: 45
- Joined: Tue Jun 16, 2009 12:18 am
- Occupation: Student
- Project Question: How to start selecting projects.... etc //many several questions!
- Project Due Date: not started yet
- Project Status: I am just starting
Re: Question about bio informatics
Hello
I finished drawing the graph of anxa5 against the other 15 human genes. (I tried to attach the graph, but somehow it doesn't work.)
On the graph, I can see that the regions around 3000 bp and 6500 bp may contain putative motifs, because those are the regions where the dots are concentrated.
Do I need to find the exact range?
Once I have the range of the putative motif, do you think it would be good to compare it with other zebrafish genes?
Best,
Ryan
-
carolinethorn
- Former Expert
- Posts: 393
- Joined: Tue Sep 20, 2005 2:40 pm
Re: Question about bio informatics
Hi Ryan,
Good job. 3000-6500 is a bit too broad a range to be a motif; I would expect a motif to be approximately 100 bp or less. Or did you mean there is a cluster near 3000 and one near 6500, and both might be candidate motifs? If you are going to use those regions to test for function, you will need a good definition of them.
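Separating "one broad range" from "two clusters" can be done by grouping hits that lie close together. A minimal Python sketch of simple gap-based grouping; the hits are made up, and the 200-base gap threshold is an arbitrary assumption, not a standard value:

```python
def cluster_spans(spans, max_gap=200):
    """Group (start, end) spans into clusters of nearby hits.

    A span joins the current cluster if it starts no more than `max_gap`
    bases after the cluster's end; otherwise it starts a new cluster.
    max_gap=200 is an arbitrary choice, not a standard value.
    """
    clusters = []
    for start, end in sorted(spans):
        if clusters and start <= clusters[-1][1] + max_gap:
            clusters[-1][1] = max(clusters[-1][1], end)   # extend the cluster
        else:
            clusters.append([start, end])                 # start a new cluster
    return [tuple(c) for c in clusters]


# Made-up hits: two groups, one near 3000 and one near 6500.
hits = [(2950, 3010), (2990, 3060), (6480, 6540), (6500, 6560)]
for start, end in cluster_spans(hits):
    print(start, end, end - start + 1, "bp")   # length helps judge "~100 bp or less"
```

Printing each cluster's length makes it easy to compare against the rough 100 bp expectation above.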
The first thing I would do is look at the annotations for anxa5 in the ZFIN database and see whether those base regions have any function annotated.
Let me know if you see anything.
best,
-Caroline
-
kyhekm
- Posts: 45
- Joined: Tue Jun 16, 2009 12:18 am
- Occupation: Student
- Project Question: How to start selecting projects.... etc //many several questions!
- Project Due Date: not started yet
- Project Status: I am just starting
Re: Question about bio informatics
Hello
I meant that there are two separate regions that look like putative motifs.
However, I just realized that the region around 3000 bp is the only legitimate one.
As you know, when I compare two genes, the output shows the similar regions, for example (2882-2943:8872-8930).
And I just realized that those two ranges are basically the same region with different numbering.
In other words, if I number the common region based on anxa5, it is 2882-2943; if I number it based on the other (human) gene, it is 8872-8930.
So they are basically the same region, and I will just use 2882-2943 since it is numbered according to anxa5.
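This reading of the output can be checked mechanically. A minimal Python sketch that splits a "(a-b:c-d)" range pair into its two halves, assuming (as found above) that the first pair is numbered on anxa5, the query, and the second on the human gene:

```python
import re

RANGE_PAIR = re.compile(r"\((\d+)-(\d+):(\d+)-(\d+)\)")


def parse_range_pair(text: str) -> tuple[tuple[int, int], tuple[int, int]]:
    """Parse '(a-b:c-d)' into ((a, b), (c, d)): the same aligned stretch
    numbered on the first sequence and on the second sequence."""
    m = RANGE_PAIR.search(text)
    if m is None:
        raise ValueError(f"no range pair in {text!r}")
    a, b, c, d = map(int, m.groups())
    return (a, b), (c, d)


anxa5_region, human_region = parse_range_pair("(2882-2943:8872-8930)")
print(anxa5_region, anxa5_region[1] - anxa5_region[0] + 1)   # (2882, 2943) 62
print(human_region, human_region[1] - human_region[0] + 1)   # (8872, 8930) 59
```

The two lengths differ (62 vs 59 bases), which is what you would expect if the alignment contains gaps; it is still one aligned stretch described in two numbering systems.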
Here are my questions.
1) I tried to find the definite region of the putative motif. However, the results varied a lot between 2900 and 3100, so I was confused about how to find the definite region.
For example, I got this data:
Start End
2902 3052
2899 2999
3027 3077
2988 3267
I definitely know that they are around 3000. However, when I find the common region, should I only take the region that is common to all of them?
2) I went to ZFIN, and I think this page ( http://zfin.org/cgi-perl/gbrowse/current/) is the one to use to find the function of a region.
However, I am not sure what I should type in the box. As an example, the website shows the result for "16:15087000..15149000".
What do 16 and the other numbers mean?
If I want to look up the region 2902-3052, should I type x:2902..3052?
Thank you
Ryan Kim
-
carolinethorn
- Former Expert
- Posts: 393
- Joined: Tue Sep 20, 2005 2:40 pm
Re: Question about bio informatics
Hi Ryan,
Good job getting your target area outlined. At this stage I would choose the largest target area as the putative motif, so from your numbers that would be 2899-3267.
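The two options Ryan asked about, the region common to all spans versus the largest target area, can be written down directly. A small Python sketch using the four spans from his post; for these numbers the common region is empty, because 2899-2999 ends before 3027-3077 begins, so only the larger area is defined:

```python
def common_to_all(spans):
    """Intersection: the stretch inside every (start, end) span, or None."""
    start = max(s for s, _ in spans)
    end = min(e for _, e in spans)
    return (start, end) if start <= end else None


def envelope(spans):
    """Largest target area: smallest start to largest end.
    Only meaningful when the spans belong to one cluster."""
    return min(s for s, _ in spans), max(e for _, e in spans)


spans = [(2902, 3052), (2899, 2999), (3027, 3077), (2988, 3267)]
print(common_to_all(spans))   # None
print(envelope(spans))        # (2899, 3267)
```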
The GBrowse section you found at ZFIN is for looking at how zebrafish genes are arranged on the chromosomes. The notation 16:15087000..15149000 means chromosome 16, bases 15087000 to 15149000 (yes, chromosomes are so big that each one has millions of bases!)
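Read as chromosome:start..end, the box expects chromosome coordinates, not positions within one gene, so 1:2902..3052 would ask for bases 2902-3052 of chromosome 1 as a whole. A minimal Python sketch that splits the notation into its parts:

```python
import re


def parse_locus(text: str) -> tuple[str, int, int]:
    """Split 'chromosome:start..end' into (chromosome, start, end)."""
    m = re.fullmatch(r"\s*(\w+):(\d+)\.\.(\d+)\s*", text)
    if m is None:
        raise ValueError(f"expected chromosome:start..end, got {text!r}")
    return m.group(1), int(m.group(2)), int(m.group(3))


chrom, start, end = parse_locus("16:15087000..15149000")
print(chrom, start, end, end - start + 1)   # 16 15087000 15149000 62001
```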
I was hoping the ZFIN page about anxa5 would have information about motifs.
http://zfin.org/cgi-bin/webdriver?MIval ... -080220-29
The link above shows that the gene has a known molecular function of calcium binding but doesn't say which part of the gene or protein is responsible.
What we need is a website with finer detail about the sequence within the anxa5 gene. I think the best thing would be to go back to the site where you got your anxa5 genomic sequence and get it in GenBank format instead of FASTA. There should be notes about the sequence giving the positions of its features. Then we can see whether 2899-3267 is part of the coding sequence that makes protein (the GenBank shorthand for this is CDS) or part of an intron, the sequence between the protein-coding parts.
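Once the GenBank record is open, the CDS line gives a location such as join(start..end,start..end,...). A minimal Python sketch that reads a simple location string and reports how much of a region falls inside the coding parts; the location string below is made up, not the real anxa5 annotation, and the parser deliberately handles only plain start..end parts:

```python
import re


def parse_location(location: str) -> list[tuple[int, int]]:
    """Pull (start, end) pairs out of a simple GenBank location string.

    Handles forms like '120..300', 'join(120..300,450..610)' and
    'complement(join(...))'; partial markers '<' and '>' are ignored.
    Single-base or remote (accession:range) locations are not handled.
    """
    return [(int(a), int(b)) for a, b in re.findall(r"<?(\d+)\.\.>?(\d+)", location)]


def classify(region: tuple[int, int], cds_parts: list[tuple[int, int]]) -> str:
    """Report how much of region (1-based, inclusive) falls inside the CDS parts."""
    r_start, r_end = region
    length = r_end - r_start + 1
    overlap = sum(max(0, min(r_end, e) - max(r_start, s) + 1) for s, e in cds_parts)
    if overlap == 0:
        return "outside CDS"   # intron, untranslated or flanking sequence
    if overlap == length:
        return "inside CDS"
    return f"partly CDS ({overlap} of {length} bases)"


# HYPOTHETICAL location string, not the real anxa5 annotation.
cds = parse_location("join(120..300,2950..3010,4000..4100)")
print(classify((2899, 3267), cds))   # partly CDS (61 of 369 bases)
```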
Best of luck,
Caroline
-
kyhekm
- Posts: 45
- Joined: Tue Jun 16, 2009 12:18 am
- Occupation: Student
- Project Question: How to start selecting projects.... etc //many several questions!
- Project Due Date: not started yet
- Project Status: I am just starting
Re: Question about bio informatics
Hello
Before I go further, I have some questions.
1) Do you remember that I had a list of 15 zebrafish genes?
That list has links that lead directly to the GenBank website.
So I was checking the FASTA sequence of anxa5. <-- (http://www.ncbi.nlm.nih.gov/nuccore/284 ... rt=GenBank)
And I figured out that it is an mRNA sequence.
My question is this: anxa5b, which I found, is a genomic (gene) sequence, while anxa5, which is in the list of 15 genes, is an mRNA sequence. So they have totally different sequences.
If I use the gene sequence instead of the listed mRNA sequence, will it matter?
2) I checked all the lists and found out that they are all mRNA sequences.
Do you think I should redo everything with mRNA sequences, or can I continue my research with the gene sequences I have been using?
3) I looked for the 2899-3267 region in ZFIN and it showed nothing. (I typed 1:2899..3267 since the gene is on chromosome 1.)
Does this mean nothing is going on in that region, even though my research says it is the common region that appeared in all of them?
4) I was comparing "si:ch211-142e24.2" with CNTN5, and the FASTA program showed no results at all. Out of curiosity, I cut the CNTN5 sequence in half and ran FASTA again, and then it showed a result. Does this mean that if a sequence is too long, the program won't show a result because of some error?
I mean, the CNTN5 sequence is 360 pages long, so I thought that might be the reason.
Please give me your thoughts!
Thank you!
Ryan Kim
-
carolinethorn
- Former Expert
- Posts: 393
- Joined: Tue Sep 20, 2005 2:40 pm
Re: Question about bio informatics
Hi Ryan,
1.
Let's recap on some terminology.
There are different parts to a gene: the parts that make the protein and the parts that control when the protein is made. The whole sequence of the gene on the chromosome is called the genomic sequence. When the gene undergoes transcription, the first step in the process of making protein, an RNA copy of the gene is made; after the introns are spliced out, the copy is called the mRNA. It contains the coding sequence, the part of the gene that codes for the protein (plus some untranslated sequence at each end). So when looking for sequences for a particular gene, you usually find the whole thing (genomic) as well as the mRNA, just the part that goes on to make protein.
Toward the start of this project we talked about using protein sequences or DNA sequences to look for motifs, and I suggested DNA so we might find motifs to do with protein structure or function and also motifs to do with gene regulation. This is why we used the whole genomic sequence of the gene, not just the mRNA part that makes protein.
So yes, it makes a difference whether you use the mRNA sequence or the genomic sequence. But the sequences are not completely different: the mRNA sequence is a subset of the genomic sequence. If aligned against each other, the mRNA would match up completely where the exons are in the gene, with gaps in between where the introns are.
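The subset relationship can be shown with a toy example: joining the exons of a genomic sequence gives the mRNA-like sequence, and every mRNA position maps back to one genomic position. A minimal Python sketch on a made-up gene (database mRNA records are written with T rather than U); it ignores UTR details and genes on the reverse strand:

```python
def splice(genomic: str, exons: list[tuple[int, int]]) -> str:
    """Join exon intervals (1-based, inclusive, in gene order) into one sequence."""
    return "".join(genomic[s - 1:e] for s, e in exons)


def mrna_to_genomic(pos: int, exons: list[tuple[int, int]]) -> int:
    """Map a 1-based position in the spliced sequence back to the genomic sequence."""
    for s, e in exons:
        length = e - s + 1
        if pos <= length:
            return s + pos - 1
        pos -= length          # skip past this exon and keep going
    raise ValueError("position is beyond the end of the spliced sequence")


# Toy gene: exons in upper case, introns in lower case.
genomic = "ATGAAA" + "gtaag" + "CCCGGG" + "gtcag" + "TTTTAA"
exons = [(1, 6), (12, 17), (23, 28)]
mrna = splice(genomic, exons)
print(mrna)                         # ATGAAACCCGGGTTTTAA
print(mrna_to_genomic(7, exons))    # 12, the first base of the second exon
```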
2. I think it's fine that the list you had referred to the mRNA sequences and that we used the corresponding genomic sequences.
3. Just because you didn't find anything annotated in that region doesn't mean it has no function. Some genes are very well annotated: someone has entered the important regions into a database that we can search. For other genes, people have reported functions for different regions, but that research is only in the text of papers and has not been entered into a database. It just means we need to search some more.
4. There probably is a size limit for FASTA. I haven't looked, but I can go and try to find out. That was a smart idea to cut the sequence in half to get it to work.
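Cutting a long sequence into exactly two halves can split a real match at the cut. A safer variant uses overlapping windows and remembers each window's offset so hits can be converted back to full-sequence positions. A minimal Python sketch; the actual size limit of the FASTA server is not known here, so the window size and overlap are placeholders to choose yourself:

```python
def chunks(seq: str, size: int, overlap: int):
    """Yield (offset, piece) windows of at most `size` bases.

    Neighbouring windows share `overlap` bases, so any match no longer than
    `overlap` lies entirely inside at least one window. `offset` is the
    0-based start of the piece, so position p (1-based) in the piece is
    position offset + p in the full sequence.
    """
    if not 0 <= overlap < size:
        raise ValueError("need 0 <= overlap < size")
    step = size - overlap
    start = 0
    while True:
        yield start, seq[start:start + size]
        if start + size >= len(seq):
            break                      # this window already reaches the end
        start += step


for offset, piece in chunks("ABCDEFGHIJ", size=4, overlap=1):
    print(offset, piece)   # 0 ABCD, then 3 DEFG, then 6 GHIJ
```

Setting the overlap at least as long as the longest match you care about is what guarantees nothing is lost at a cut.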
Best wishes,
-Caroline
-
kyhekm
- Posts: 45
- Joined: Tue Jun 16, 2009 12:18 am
- Occupation: Student
- Project Question: How to start selecting projects.... etc //many several questions!
- Project Due Date: not started yet
- Project Status: I am just starting
Re: Question about bio informatics
Hello
AHH! I just had the first snow, haha. It looks very beautiful outside right now.
Anyway, here are some more questions!
1. I made a graph that marks the gene regions. I want you to see it, but this website does not allow me to upload jpg files or files with other extensions.
Do you know how to upload that kind of file here?
2. I found a website (http://uswest.ensembl.org/Danio_rerio/Info/Index).
I think this website is almost the same as ZFIN, but I am not sure which one is more useful.
According to my research, the two putative motifs in the ANXA5 gene are 2881-3533 and 6031-6336.
They are both on chromosome 1.
I searched for them on that website and got:
2881-3533 ( http://uswest.ensembl.org/Danio_rerio/L ... :2881-3533 )
6031-6336 ( http://uswest.ensembl.org/Danio_rerio/L ... :6031-6336 )
I can see a greenish spot in the 6031-6336 region, so I searched it. What does this region mean?
And do you think this website is better than ZFIN?
3. I want to restate the purpose of my research. After finding putative motifs, I was going to look for similarities between them and look for new genes that have them.
What do you think about this? I want to do more research on the unknown putative motifs, but I guess there is no way to do this without performing an actual lab experiment.
I just thought this was a good time to restate my purpose!
4. In the previous post, you talked about papers whose results have not yet been entered into the zebrafish database websites. What do you mean by those papers? Do you mean papers in PubMed? And how do I search for the 2882-3533 region in those papers? I tried to find the region in PubMed but could not find anything.
Thank you,
Ryan
-
carolinethorn
- Former Expert
- Posts: 393
- Joined: Tue Sep 20, 2005 2:40 pm
Re: Question about bio informatics
Hi Ryan,
Sorry it's been a while.
1. Can you import the jpg as a picture into a Word file or PowerPoint and attach that? Otherwise I will ask Amy, the board administrator.
2. That does look like another useful website, but we have got some wires crossed with the numbering here: the numbering you have for your putative motifs is within ANXA5. When you search those numbers in the database, it searches the whole of chromosome 1, so what you see in the browser is not specific to ANXA5.
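To look at the right place in a genome browser, a position inside the ANXA5 genomic record has to be shifted by where that record starts on the chromosome (and flipped if the gene is on the reverse strand). A minimal Python sketch; the chromosome span used below is hypothetical, and the real one has to be looked up from the record's accession number:

```python
def gene_to_chromosome(pos: int, record_start: int, record_end: int, strand: str) -> int:
    """Convert a 1-based position in a gene's genomic record to a chromosome coordinate.

    record_start..record_end is the chromosome span the record covers; on the
    '-' strand the record's first base is the span's last chromosome base.
    """
    if strand == "+":
        return record_start + pos - 1
    if strand == "-":
        return record_end - pos + 1
    raise ValueError("strand must be '+' or '-'")


def browser_query(chrom: str, region: tuple[int, int],
                  record_start: int, record_end: int, strand: str) -> str:
    """Build a 'chromosome:start..end' string for a gene-relative region."""
    a = gene_to_chromosome(region[0], record_start, record_end, strand)
    b = gene_to_chromosome(region[1], record_start, record_end, strand)
    return f"{chrom}:{min(a, b)}..{max(a, b)}"


# HYPOTHETICAL placement of the genomic record; look up the real one by accession.
print(browser_query("1", (2881, 3533), 1_000_001, 1_020_000, "+"))   # 1:1002881..1003533
```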
I went back to the gene page for ANXA5 and, instead of the FASTA format of the genomic sequence, looked at the GenBank format. This format lists the locations of features in the sequence: exons, coding sequence and occasionally motifs. I didn't see anything annotated for your putative motif regions. So I think we need to try searching the sequences of the putative motif regions against some databases. Maybe a simple BLAST search is the best starting point to see what else it pulls out. It would also be useful to figure out whether your regions are in the protein-coding part of the gene or in the regulatory parts. What is the accession number of the genomic sequence, so I know exactly what your numbering is based on?
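To BLAST a putative motif region, its bases have to be cut out of the genomic sequence and pasted in as a query, usually in FASTA format. A minimal Python sketch that does this for a 1-based, inclusive region; the sequence and the header name are placeholders:

```python
def fasta_record(name: str, seq: str, start: int, end: int, width: int = 60) -> str:
    """Return bases start..end (1-based, inclusive) of seq as a FASTA record."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError("region is outside the sequence")
    piece = seq[start - 1:end]
    lines = [piece[i:i + width] for i in range(0, len(piece), width)]
    return "\n".join([f">{name}:{start}-{end}", *lines]) + "\n"


# Toy sequence; in practice `seq` would be the anxa5 genomic sequence.
print(fasta_record("toy", "ACGTACGTAC", 3, 8, width=4), end="")
```

Putting the region in the header keeps a record of which numbering the query came from.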
3. Yes, the idea is to find a possible role for your putative motif. And yes, it's tough to do that without being able to do lab experiments, but there are some "in silico" experiments that can be done (in silico means by computer). For example, if your putative motif is in a regulatory region and you think it's specific for expression in heart cells, maybe we can search existing microarray data in databases to look at the genes the motif is present in and their pattern of expression.
4. It's tough to find something so specific in PubMed. There are very few papers on anxa5 in zebrafish, and no one seems to have done a detailed analysis of the gene's sequence with respect to function. As I said above about the numbering, those numbers are specific to your genomic sequence for anxa5, so searching with them globally is not going to turn anything up.
hope this helps,
Caroline
-
kyhekm
- Posts: 45
- Joined: Tue Jun 16, 2009 12:18 am
- Occupation: Student
- Project Question: How to start selecting projects.... etc //many several questions!
- Project Due Date: not started yet
- Project Status: I am just starting
Re: Question about bio informatics
First of all, let me ask about the things I did not understand in your response.
1. You said, "It would be useful to figure out whether your regions are in the protein coding part of the gene or the regulatory parts of the gene also. What is the accession number of the genomic sequence so i know exactly what your numbering is based on?"
How do I find out whether my regions are in the protein-coding part or in the regulatory parts?
I did a quick search on accession numbers, and it says an accession number is a specific identifier for a sequence. So if I get common regions such as "2300-2500" from ANXA5, will that be the accession number?
2. When I run BLAST, should I search against specific genes or against all zebrafish gene sequences?
3. It seems like the FASTA program hasn't been working for more than a week. Is it not working for you either?
4. I am trying to write an abstract of my research to submit to colleges. Would you be able to look over my abstract once it is done?
Thank you,
Ryan
-
carolinethorn
- Former Expert
- Posts: 393
- Joined: Tue Sep 20, 2005 2:40 pm
Re: Question about bio informatics
Hi Ryan,
I am just on my way to catch a flight to England, so I can't answer everything now.
I will try and post later in the week.
I am happy to look at your abstract for college applications. When is it due, and where are you applying? (You might want to style it a little differently depending on the kind of college: ones with lots of research and medical schools versus liberal arts colleges, etc.) Make sure you remind the reader in the first paragraph that it would not be possible to test the whole human genome in functional assays, so we need bioinformatics approaches to narrow down possible target areas to focus on.
best,
Caroline
-
kyhekm
- Posts: 45
- Joined: Tue Jun 16, 2009 12:18 am
- Occupation: Student
- Project Question: How to start selecting projects.... etc //many several questions!
- Project Due Date: not started yet
- Project Status: I am just starting
Re: Question about bio informatics
Hello!
It's been a long time!
I have been really busy writing essays and doing other things.
I am hoping to resume my research by early January.
Here is the abstract I would like you to look over.
I know it is the end of the year, so you are unlikely to check this forum right now,
but if you do see this, I would appreciate it if you could edit it.
I need to mail it by 12/31 (US time), so if you can look it over and get back to me before then, that would be super great!
I hope you end your year strong and have a blast!
Ryan

