Virtual Screening Project From Scratch

Ask questions about projects relating to: computer science or pure mathematics (such as probability, statistics, geometry, etc...).

Moderators: kgudger, bfinio, Moderators

Locked
ericjang
Posts: 83
Joined: Sat Nov 01, 2008 3:13 pm
Occupation: Student
Project Question: virtual screening using html5
Project Due Date: Jan 27
Project Status: I am conducting my research

Virtual Screening Project From Scratch

Post by ericjang »

Hi All,

I'm trying to implement my own virtual screening software in javascript (to be run on a web browser). In particular my project wants to identify compounds that bind well to a certain nucleic acid structure and would reveal quite a bit about the function and behavior of this relatively obscure molecule (PDB code 3IBK).

I came up with this project on my own, and as a result of having no lab mentors my knowledge in statistical mechanics and molecular simulation is patchy at best. However I'm willing to work very hard and study up whatever need be. I've asked a couple questions on Blue Obelisk but in the end, sciencebuddies experts tend to be much more open and willing to explain stuff in detail. :D

Please correct me if any of my reasoning below is incorrect or patchy.

So I've read from a lot of papers that virtual screening consists of 3 main parts.

1. docking - searching the conformational space of the receptor with the ligand, usually accounting for flexibility
2. scoring - function to evaluate free binding energy of a particular ligand pose
3. The best-binding molecules are the ones with the lowest energy determined by the score. Not always but this is a common evaluation?

I'm using the ChemDoodle Web Components API to represent my molecules, as it already provides a nice data model for defining molecules, their coordinates and accessing/setting bond lengths, along with a few cheminformatics algorithms. See picture http://cl.ly/9w9J to get a kind of the idea of what the data looks like: (JSON)

I don't think any textbook or paper actually describes the implementation process in code or even pseudocode, so I'd like to ask if my reasoning below is correct, and what other things I need to insert.

1. Subject ligand and receptor to molecular mechanics minimization, done separately. This will produce 'good' bond lengths and angles.

2. Construct minimized ligand and receptor in 3D Cartesian space, with the ligand close to a target site on the ligand. A key objective of my project is find RNA-selective compounds over DNA because I am looking for a ligand that binds selectively to the RNA quadruplex over the DNA quadruplex.

3. Using Metropolis Monte Carlo conformational search, translate the ligand around the region, also change global rotation of the ligand and torsion angles. Restrict ligand to only search in a particular area of interest reduce some degrees of freedom.
3.1: How do I decide which parameters to change in a given Monte Carlo iterative step?

4. Scoring function checks binding energy of each iteration of the Monte Carlo algorithm.
4.1: I know that some studies use a separate function for ranking poses and ranking ligands, but I'll stick with just one for now.
4.2: I'm not going to count on an 'easier' way out with an empirical scoring function, so I'll assume that I'm going to have to implement a force field of some sort. Aside from the force field coming into play during scoring, where else will I need it?

Please let me know if I missed anything in the overal design of things.

Thanks so much! More than 3 years later, science buddies is still my favorite resource.

- Eric
hhemken
Former Expert
Posts: 266
Joined: Mon Oct 03, 2005 3:16 pm

Re: Virtual Screening Project From Scratch

Post by hhemken »

ericjang,

Overall you appear to be on the right track. I am somewhat disturbed by "I'm going to have to implement a force field of some sort" if by "implement" you mean write one yourself. That is a truly monumental task that goes far, far beyond the scope of your project. If by "implement" you mean "get one from somewhere and use it," then that's fine. A popular free one is ghemical, and I just found this interesting site. There are a few others.

Your highlighted steps look good. Your remarks 3.1, 4.1, and 4.2 are related. I would assert that for each complex generated by a Monte Carlo iteration, you would have to run it through an energy minimization to get rid of steric conflicts, unreasonable empty space, strained structures, etc. That step would generate metrics that could be used for scoring, at very least the global conformational energy value, possibly per atom (since the energy will also vary by ligand size. You can also count hydrogen bonds as well as their quality, and no doubt many other things. 4.2 asks when you would need to use the force field, so the answer would then be that you would use it in 1 as planned as well as for each complex generated by Monte Carlo. For scoring, you might want to only use the lowest energy Monte Carlo-generated energy-minimized complex per ligand, under the assumption that it would be the most common one or "best" one that would actually exist in a real system, and the rest would be uncommon enough that you can ignore them.

BTW, Math and Computer Science might not be the ideal place for this project, since it is mainly computational chemistry (which I used to do back in the day). I'll leave it up to you whether you want to stay here or move it elsewhere. I only monitor this section, though.

Good luck, and keep us posted!
Heinz Hemken
Mentor
Science Buddies Expert Forum
ericjang
Posts: 83
Joined: Sat Nov 01, 2008 3:13 pm
Occupation: Student
Project Question: virtual screening using html5
Project Due Date: Jan 27
Project Status: I am conducting my research

Re: Virtual Screening Project From Scratch

Post by ericjang »

Hi hhemken,

Sure, I suppose it would be a better idea to migrate this thread to chemistry while I'm still trying to figure out theoretical stuff. In regards to implementation of statistical mechanics algorithms, I'll probably ask those mathy-compsci-stuff here when the time comes.

Could you please clarify what you meant by

4.2 asks when you would need to use the force field, so the answer would then be that you would use it in 1 as planned as well as for each complex generated by Monte Carlo.


By that, do you mean that I need to use Force Fields to perform energy minimization? How is that process different from scoring then?

Thanks again!
ericjang
Posts: 83
Joined: Sat Nov 01, 2008 3:13 pm
Occupation: Student
Project Question: virtual screening using html5
Project Due Date: Jan 27
Project Status: I am conducting my research

Re: Virtual Screening Project From Scratch

Post by ericjang »

bump! anybody out there?
hhemken
Former Expert
Posts: 266
Joined: Mon Oct 03, 2005 3:16 pm

Re: Virtual Screening Project From Scratch

Post by hhemken »

ericjang,

Sorry for the delay, I must not have seen the alert that you had posted something.

If you are using the Monte Carlo method to randomly generate docking structures, i.e. the ligand sitting in the receptor, I assume you are going to calculate a score for the complex to decide whether it is a good complex or a bad complex, correct? If so, you will probably need to do an energy minimization so that atoms don't overlap and you don't have any unreasonable conformations. You would thus be scoring a chemically reasonable docking complex.

How did you plan on scoring them? I may be misunderstanding something in your approach.
Heinz Hemken
Mentor
Science Buddies Expert Forum
ericjang
Posts: 83
Joined: Sat Nov 01, 2008 3:13 pm
Occupation: Student
Project Question: virtual screening using html5
Project Due Date: Jan 27
Project Status: I am conducting my research

Re: Virtual Screening Project From Scratch

Post by ericjang »

Hi hhemken,

Thanks so much for the reply! Sorry I've been away for awhile trying to read literature and figure this out (with little success).
I had a couple questions about this energy minimization process, along with some clarification questions (some answers from other threads are thoroughly confusing me so I'm not sure if I phrased my question properly or if I'm looking at multiple ways to tackle the same problem):

1. I would like to find a free force field (possibly ghemical) so that I can assign a binding energy 'S' such that the best pose of the best ligand will return the most energetically favorable (negative) value of 'S'. I'm not interested in the actual conformation so much as the binding value.
2. The monte carlo is a conformational search algorithm that adjusts the ligand's position and shape in little increments at a time. So for each of these increments I have to perform energy minimization? Wouldn't that cause structures to just revert to their original shapes if I minimize a structure that I just attempted to modify using MC? how does minimization integrate with the docking algorithm? Seems like docking algorithm (e.g. metropolis) also guides the conformational search, but if docking gets to guide the molecules, why is minimization required?

3. I read that a basic force field (scoring function?) is simply MME = ΣEstr + ΣEbend + ΣEoop + ΣEtor + ΣEvdw + ΣEele. But I don't know how MME can tell me the 'binding affinity of a molecule' - does MME by itself have any meaning?
4. Suppose I have the ligand sitting in some cavity of the receptor. How should I go about iterating over pairs of atoms? How do I even begin to calculate values describing the interactions between two molecules in cartesian space? Sorry if this is a silly question...

- Eric
hhemken
Former Expert
Posts: 266
Joined: Mon Oct 03, 2005 3:16 pm

Re: Virtual Screening Project From Scratch

Post by hhemken »

ericjang,

You are struggling with the same problems any other scientist would have to face, so you are not alone. You will have to compromise, and in your case that means make approximations. There are two separate issues that are related to each other but difficult to calculate at the same time: conformational energy and binding energy. The conformational energy of a complex will reflect the binding energy of the ligand to the receptor, but it is probably hard to tease that value out of the complex. Your calculations will probably be done in the absence of solvation, so that is another approximation you need to accept unless you have significant compute resources.

Also, there is the chicken and egg problem of Monte Carlo versus minimization. If you are doing Monte Carlo using rigid ligands and receptors as people used to do back in the day, you will get very crude values. If you energy-minimize each time then you are correct that they will converge to a smaller set of final ligand-receptor complexes than the starting set of Monte Carlo candidates, which sort of defeats the purpose of MC. You say that "the Monte Carlo is a conformational search algorithm that adjusts the ligand's position and shape in little increments at a time." It sounds like you do not want to do energy minimization because the MC method is changing the ligand-receptor complex on its own. In that case you can use ghemical (or whatever) to just calculate a static conformational energy (no minimization) and use that. This is the scoring function you mention in item 3. You can do a Monte Carlo search to find the lowest such static energy for a given ligand and receptor, and then do the same for lots of different ligands and the same receptor. Then you will have in effect calculated an energy minimized ligand-receptor conformation for each, and you can use that energy as an approximate representation of the binding energy. It isn't really a binding energy, but you can claim that as a first approximation, the binding energy should be roughly proportional to the stability of the complex. Note that the combination of the MC conformation tweaking scored by static conformational calculations constitutes an energy minimization scheme, so you would not need to minimize in ghemical (or whatever such tool you use). I am assuming that your MC method does in fact tweak the conformation of the ligand as well as its position and orientation in the receptor. Sounds like a pretty expensive calculation, though.

Item 4 is not a silly question, it is a difficult problem. Since this is your first attempt to figure the problem out, I suspect you should not try to go that far. The energies you will get from the MC calculations will be a very good initial simulation of ligand binding, and that would be a good set of results in and of itself. You could go further some time in the future.

It sounds to me like you are on the right track. It would be cool if you could post some of your initial results to see how it goes.

Good Luck!

Heinz
Heinz Hemken
Mentor
Science Buddies Expert Forum
ericjang
Posts: 83
Joined: Sat Nov 01, 2008 3:13 pm
Occupation: Student
Project Question: virtual screening using html5
Project Due Date: Jan 27
Project Status: I am conducting my research

Re: Virtual Screening Project From Scratch

Post by ericjang »

Hi hhemken,

Ah, that does make sense now. So theoretically I could use a (Metropolis) Monte Carlo to do the conformational search, and simply evaluate the static conformational energy at that point as an approximation of 'best ranking pose' (using force field calculations), then repeat for a lot of ligands. For starters, I may as well implement this basic system before implementing faster algorithms.

But like you said, if the minimization scheme is implemented as part of the MC search, then it may be computationally expensive. I may want to implement energy minimization in the future, so how does that factor into the docking/scoring process? The way I currently understand it, energy minimization is more deterministic/faster than the stochastic MC search, so minimization is used to 'refine' the pose generated by MC. Basically, MC only translates/rotates entire molecule and changes torsion angles for ligand flexibility, but the other parameters (bond angle, bond length, etc) are optimized using the energy minimization algorithm for speed improvements? Please correct me if my understanding is wrong...

Sorry, I think I might have made my 4th question a bit unclear: Basically I am trying to create this project from scratch, and I have a working model for chemical structures. The whole thing is web-based and html5-powered, and I know javascript isn't as speedy as C/fortran but this is sort of a proof-of-concept/self-teaching project that I thought would be cool. That said, I am using a chemical model that represents molecule objects in Cartesian space, but contains JSON data to describe bond lengths, rings, individual atoms, etc (see images below)

Image
JSON object describing chemical in human-readable format. ^

Image
Each atom has the following info ^
Image
Each atom-atom bond has the following info ^

Suppose I had two molecules:

4a. Superpositioning the two molecules next to each other sounds easy enough, I just constrain the molecule near the desired 'binding site' of the receptor and make sure the VDW radii don't clash...
4b. Okay, the two molecules are close to each other now. I want to use the force field scoring functions, so which atoms do I take into account into the calculation?

I know that many molecular modeling packages use matrices and/or graph theory to represent chemical models, but at this point I am constrained to my type of chemical model. Will the implementing of force fields to calculate energies between these JSON models be impossible?

Thanks!
- Eric
hhemken
Former Expert
Posts: 266
Joined: Mon Oct 03, 2005 3:16 pm

Re: Virtual Screening Project From Scratch

Post by hhemken »

Eric,

I think you should consider the Monte Carlo and the energy minimization approaches as separate and only do one or the other. I also suggest you don't even think of writing your own energy minimization program, it is not trivial at all. It was a monumental achievement back in the day. Use as much existing software as possible. Only code things yourself when you absolutely have to.

A quick and dirty test of the energy minimization approach would be to load the ligand and the receptor into ghemical or some other package and do a minimization. You can use the final conformational energy as your score. You would still have the problem of testing several different ligand orientations, but maybe you could use the MC approach as a starting point.

Question 4b: You should use the total conformational energy of the complex. Choosing atoms becomes nightmarish because you have to explain how to compare ligands with very different structures.

"MC only translates/rotates entire molecule and changes torsion angles for ligand flexibility:" This is fine. It is not such a bad approximation to keep bond lengths and angles static. Most of the flexibility is in the torsional angles.

"Will the implementing of force fields to calculate energies between these JSON models be impossible?" Most molecular modeling packages, including ghemical, can load molecules using only the Cartesian coordinates of all of the atoms. You might need to write some code to 1) load the coordinate files of the ligand and the receptor, 2) orient the ligand using your Monte Carlo techniques, 3) write the coordinate file for the resulting complex to a new file, and 4) load it into ghemical to calculate the initial static energy and optionally minimize it. Since in step 4 you pretty much only have to do a mouse click to minimize the energy, you might as well do it. What kind of computer will you be using, i.e. is it fast? Is it Windows, Mac, or Linux?

Instead of running you javascript code in the browser, you may want to figure out how to run your scripts at the command line, since you will be reading and writing a lot of coordinate files, among other things. Try googling this:

Code: Select all

running javascript from the command line
I think your ready to start doing some preliminary calculations at this point, possibly after modifying your code a bit to deal with ghemical-compatible coordinate files.

Good Luck!
Heinz Hemken
Mentor
Science Buddies Expert Forum
ericjang
Posts: 83
Joined: Sat Nov 01, 2008 3:13 pm
Occupation: Student
Project Question: virtual screening using html5
Project Due Date: Jan 27
Project Status: I am conducting my research

Re: Virtual Screening Project From Scratch

Post by ericjang »

Hi hhemken,

I do believe you when you say that minimization and force fields are non-trivial. But my project design (with intent on web-based distributed computing) requires that I at least re-implement everything in javascript, even if I don't do algorithm modification. I will try my best, and even if I do not succeed I think I will learn a lot. After all, nothing ventured, nothing gained!
Regarding your answer to my question 4b:
You should use the total conformational energy of the complex. Choosing atoms becomes nightmarish because you have to explain how to compare ligands with very different structures.
By this, do you mean that total conformational energy is measured by both the ligand and receptor considered as a single 'molecule'? That does make sense, but at the most basic level there needs to be some way to iterate through each atom of the complex in calculating the conformational energy?

According to http://employees.csbsju.edu/hjakubowski ... ynam2.html, the molecular mechanics energy is given by:

Energy (E) = E Stretch + EBending + ETorsion + E Non-bonded Interactions

I don't know if this is even a reasonable request, but suppose I have a propane molecule 1 Angstrom away from a water molecule. Can you describe how the Monte Carlo search algorithm would proceed to dock the water molecule (ligand) into the propane molecule (receptor), and explain how the total conformational energy of the complex would be calculated?

But MM energies itself has no meaning, only the difference between 2 or more conformations have meaning. Could you also also explain what this means?

Thanks so much!
- Eric
hhemken
Former Expert
Posts: 266
Joined: Mon Oct 03, 2005 3:16 pm

Re: Virtual Screening Project From Scratch

Post by hhemken »

ericjang,
By this, do you mean that total conformational energy is measured by both the ligand and receptor considered as a single 'molecule'?
Yes, I do. It is the interaction of the ligand and the receptor that you are interested in.

I'm worried about your re-implementing everything in javascript. Even if things go well, you will either 1) do a much simplified version with no minimization (although your use of Monte Carlo may compensate for this), or 2) take a very long time to finish. Software development is like house to house combat. You never know what you will find, it is always a lot harder than you thought it would be, and all you have is what you are carrying with you.

Why don't you just use one of the freely available packages? You'll still have to write some code to move data around. Are you guiding yourself with any existing source code?
Heinz Hemken
Mentor
Science Buddies Expert Forum
Locked

Return to “Grades 9-12: Math and Computer Science”