For the sci-fi minded. was: Re: Big Bad Beowulfs

Sun May 14 22:56:06 PDT 2000

(caution, deep speculation mode ahead)

Borries Demeler writes:

 > 1. What theoretical approaches are currently available to
 > deterministically design proteins/enzymes (i.e., simulated annealing,
 > charge density/surface map calculation, folding predictions, 3-d sequence
 > comparisons, etc)?
 > If random or semi-random sequence design and basic trial and error 
 > is the way to go, Beowulfs would be of little value to this project.

Designing de novo enzymes is equivalent to solving the Inverse Protein
Folding Problem. Right now the only simple way to do it, AFAICS, is to
run evolutionary algorithms on a large number of engines operating
simultaneously, each capable of solving the Protein Folding Problem
(PFP) in realtime or near realtime (~minutes to hours for a single
design iteration). IBM's 100 M$ BlueGene project attempts to solve the
PFP by brute force molecular dynamics on massively parallel hardware
(to be developed), using embedded RAM technology. (By the time BlueGene
is deployed, Beowulf-grade COTS hardware is likely to use embedded RAM
as well -- right now it is only used in the Playstation 2 and a single
switch chipset.) Provided a sufficiently accurate forcefield for
protein folding can be developed, this will require about a year of
runtime on the abovementioned first generation of BlueGene for an
amino acid sequence of moderate size. (Obviously lots of tweakable
knobs are necessary: running a GA on the forcefield parameters, using
empirically solved structures (Brookhaven PDB), and using forcefield
constants derived from the QM layer as a source of constraints all
sound like applicable ideas.) IMO a rather sound approach, as brute
force goes.
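
To make the evolutionary design loop a bit more concrete, here is a
minimal Python sketch. Everything in it is an assumption made for
illustration: toy_fitness() is a hypothetical stand-in for a real
folding engine (it only checks a hydrophobic/polar pattern), and the
names mutate() and evolve() as well as all parameter values are made
up. In a real setup each fitness evaluation would be a folding run
farmed out to a separate node.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
HYDROPHOBIC = set("AVILMFWC")  # crude, illustrative classification


def toy_fitness(seq, target_pattern):
    """Hypothetical stand-in for a folding engine.

    Counts positions whose hydrophobicity matches the target pattern,
    where 'H' means hydrophobic and anything else means polar.
    """
    return sum((aa in HYDROPHOBIC) == (t == "H")
               for aa, t in zip(seq, target_pattern))


def mutate(seq, rate, rng):
    """Replace each residue by a random one with probability `rate`."""
    return "".join(rng.choice(AMINO_ACIDS) if rng.random() < rate else aa
                   for aa in seq)


def evolve(target_pattern, pop_size=50, generations=200, rate=0.05, seed=0):
    """Elitist evolutionary search for a sequence matching the pattern."""
    rng = random.Random(seed)
    n = len(target_pattern)
    pop = ["".join(rng.choice(AMINO_ACIDS) for _ in range(n))
           for _ in range(pop_size)]
    for _ in range(generations):
        # In practice this ranking is the expensive, parallel part.
        pop.sort(key=lambda s: toy_fitness(s, target_pattern), reverse=True)
        if toy_fitness(pop[0], target_pattern) == n:
            break  # perfect match under the toy score
        elite = pop[: pop_size // 5]  # survivors are kept unchanged
        pop = elite + [mutate(rng.choice(elite), rate, rng)
                       for _ in range(pop_size - len(elite))]
    return max(pop, key=lambda s: toy_fitness(s, target_pattern))
```

Calling evolve("HPHPPHHPHH") returns the best sequence found under the
toy score. Because the elite is carried over unchanged, the best score
never decreases from one generation to the next.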

After the code base has settled, this can be moved into a massively
parallel ASIC bank, or whatever technology is available by then.

Similar to what rendering engines have done for the game market
(it is a long way from ray tracing to current game engines and
hardware accelerated voxel rendering), it should be possible to use
similar methods to accelerate the computation of physical laws, with
compromises to accuracy that are insignificant for the problem domain.
The best candidates for such shortcut algorithms still appear to be
integer lattice gas based algorithms. Both the steady pace of progress
and a relatively recent result
	    http://physics.bu.edu/~bruceb/MolSim/ 
seem to indicate that large scale MD on cellular automata machines
is within reach.
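
For readers who have not met lattice gases: the sketch below is the
textbook HPP model on a square grid, written in plain Python. It is
not the method behind the URL above, just the simplest member of the
family, and the function names (collide, step, particle_count,
random_grid) are made up for this example. Each cell holds four bits,
one per direction. An update is a purely local collision followed by
moving every particle one cell, which is why this kind of rule maps
well onto cellular automata hardware.

```python
import random

# One bit per velocity direction in each cell.
N, E, S, W = 1, 2, 4, 8


def collide(cell):
    """Head-on pairs rotate by 90 degrees; every other state is unchanged."""
    if cell == (E | W):
        return N | S
    if cell == (N | S):
        return E | W
    return cell


def step(grid):
    """One HPP update: collide in every cell, then stream with periodic wrap."""
    h, w = len(grid), len(grid[0])
    post = [[collide(c) for c in row] for row in grid]
    new = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            c = post[y][x]
            if c & N:
                new[(y - 1) % h][x] |= N
            if c & S:
                new[(y + 1) % h][x] |= S
            if c & E:
                new[y][(x + 1) % w] |= E
            if c & W:
                new[y][(x - 1) % w] |= W
    return new


def particle_count(grid):
    """Total number of set direction bits, i.e. particles on the lattice."""
    return sum(bin(c).count("1") for row in grid for c in row)


def random_grid(h, w, density, seed=0):
    """Fill each direction slot independently with probability `density`."""
    rng = random.Random(seed)
    return [[sum(b for b in (N, E, S, W) if rng.random() < density)
             for _ in range(w)] for _ in range(h)]
```

Both the collision and the streaming step conserve particle number, so
particle_count() stays constant over any number of calls to step().
HPP is known to be too crude for real fluid dynamics, let alone MD;
it only illustrates the integer, bit-level style of computation.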

 > 2. What software is available to do this?

As a first starting point, http://SAL.KachinaTech.COM/Z/2/index.shtml
seems like a good choice.

 > 3. If it is available, has anything been implemented in parallel Beowulf
 > fashion?

Yes. Large scale MD runs quite nicely on Beowulfs. See SPaSM for an
illustration: http://bifrost.lanl.gov/MD/MD.html

 > 4. What groups are working on this? 

In academia, too many to name here. The PFP is a valid Grand Challenge
in structural biology. I presume that big pharma will soon realize the
importance of solving it as far as possible and will allocate
sufficient funds to it, not unlike what Celera did for the large
scale sequencing projects.

 > And most importantly: 1) Is it a lack of computational power that is limiting
 > the success of protein design, or is it 2) a lack in physical/theoretical
 > understanding that's a bottleneck? (assuming b-f is really routine and 
 > can't be improved much. I could see that if random sequences are the way
 > to go, b-f are really the bottleneck)

Both. However, 2) should be solvable with dirty methods.

 > If it is 1, then the prediction may well be true, otherwise it's really hard
 > to say when we would be able to do such a thing.

It very much depends on how much priority a PFP project can obtain.
With sufficient funds and effort, things can be accelerated.