[Beowulf] Maker2 genomic software license experience?

Tim Cutts tjrc at sanger.ac.uk
Thu Nov 8 06:10:12 PST 2012


On 8 Nov 2012, at 13:52, Skylar Thompson <skylar.thompson at gmail.com> wrote:

> I guess if your development time is sufficiently shorter than the
> equivalent compiled code, it could make sense.

This is true, and a lot of what these guys are writing is pipeline glue joining other bits of software together, for which scripting languages are perfect.  But there is an element of the "to the man with a hammer, everything looks like a nail" thing going on, and people are writing analysis algorithms in these languages too.  That's fine for prototyping, but once the code runs in production and is going to use thousands of CPU-years, it would be nice if the prototypes were occasionally replaced with something that could do the same work in hundreds of CPU-years instead.  In those cases, investing a few extra weeks in implementing in a "harder" language is cost-effective.
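
To put rough numbers on that, here is a back-of-envelope sketch in Python.  The figures are purely illustrative assumptions (a 1000 CPU-year prototype and a 10x speedup, matching the "thousands versus hundreds" order of magnitude above), not measurements from any real pipeline, and the function name is made up for the example.

```python
def cpu_years_saved(prototype_cpu_years: float, speedup: float) -> float:
    """CPU-years saved by replacing a prototype with code `speedup` times faster."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    # The rewritten code needs prototype_cpu_years / speedup to do the same work.
    return prototype_cpu_years - prototype_cpu_years / speedup


# Assumed, illustrative figures only: 1000 CPU-years of production work, 10x speedup.
saved = cpu_years_saved(1000.0, 10.0)
print(f"CPU-years saved: {saved:.0f}")
```

Whether that saving outweighs a few weeks of developer time depends on what a CPU-year and a developer-week cost locally, which this sketch deliberately leaves out.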

> In Genome Sciences here
> at University of Washington, the grad students are taught Python and R,
> and there's a number of people who love the Python MPI bindings. We also
> have some C MPI users, but it's not as popular as Python.
> 
> I suppose what you can say is, for the right application, Python MPI
> certainly is faster than serial Python.

Maybe, maybe not.  If the problem is embarrassingly parallel, which many genomics problems are, often not.  To take an old example, we never adopted MPI-BLAST at Sanger, because the throughput was always far greater running multiple independent serial BLAST jobs, at least in a mixed environment where the BLAST searches weren't terribly predictable.
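
For the embarrassingly parallel case, the "multiple independent serial jobs" approach usually amounts to splitting the input and letting the batch scheduler run one serial process per piece.  A minimal sketch of the splitting step, assuming the queries are just a list of identifiers (the function name and the `seqN` identifiers are hypothetical, and this is not how any particular Sanger pipeline did it):

```python
from typing import List, Sequence


def split_into_chunks(items: Sequence[str], n_chunks: int) -> List[List[str]]:
    """Deal items round-robin into up to n_chunks lists, one per serial job."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    chunks: List[List[str]] = [[] for _ in range(n_chunks)]
    for i, item in enumerate(items):
        chunks[i % n_chunks].append(item)
    # Drop empty chunks so no job is submitted with nothing to do.
    return [chunk for chunk in chunks if chunk]


# Hypothetical query identifiers; in practice these would come from the input file.
queries = [f"seq{i}" for i in range(10)]
jobs = split_into_chunks(queries, 4)
for job_id, chunk in enumerate(jobs):
    # Each chunk would be handed to its own independent serial job.
    print(job_id, chunk)
```

Each chunk needs no communication with the others, so the scheduler can start, delay or rerun any of them independently, which is what makes this approach forgiving when run times are unpredictable.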

Plus, of course, the MPI version of the code is much harder to get right than the serial version, which goes against the original argument for keeping the development time short.

I realise I'm playing devil's advocate here, to a great extent.  But most genomics that I've dealt with so far is really about high throughput, not about short turnaround time of a single analysis job.  Of course there are some exceptions, and I'm making far too many sweeping generalisations here.

Tim

-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 


