[Beowulf] Clustering vs Hadoop/spark [EXT]

John Hearns hearnsj at gmail.com
Wed Nov 25 09:46:20 UTC 2020


Or to put it simply:  "Alexa - sequence my genome"

On Wed, 25 Nov 2020 at 09:45, John Hearns <hearnsj at gmail.com> wrote:

> Tim, that is really smart. Over on the Julia discourse forum I have blue
> skyed about using Lambdas to run Julia functions (it is an inherently
> functional language) (*)
> Blue skying further, for exascale compute needs can we think of 'Science
> as a Service'?
> As in your example, the scientist thinks about the analysis and how it is
> performed, then sends it off to be executed. Large chunks are run using
> Lambda functions.
> Crucially, if a Lambda (or whatever) fails the algorithm should be able to
> continue. People building web scale applications think like this today
> anyway.
> Do you REALLY think you are connected to Amazon's single web server when
> you make a purchase? But it looks that way.
> Also if you are about to purchase something and your Wifi goes down - as a
> customer you would be very angry if you were billed for this item.
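
A minimal sketch of the "a failed Lambda should not stop the algorithm" idea
above. Nothing here calls AWS: flaky_task is a hypothetical stand-in for one
Lambda invocation, and a local thread pool stands in for the service. Failed
chunks are resubmitted on the next round, and whatever still fails after
max_attempts is reported back rather than aborting the whole run.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def flaky_task(chunk_id: int, attempt: int) -> int:
    """Hypothetical stand-in for one Lambda invocation (not an AWS API).

    Fails deterministically on early attempts so the retry path is exercised.
    """
    if attempt < chunk_id % 3:
        raise RuntimeError(f"chunk {chunk_id} failed on attempt {attempt}")
    return chunk_id * chunk_id


def run_all(chunk_ids, max_attempts: int = 3, max_workers: int = 8):
    """Run every chunk, resubmitting failures; return (results, failed_ids)."""
    results = {}
    pending = list(chunk_ids)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for attempt in range(max_attempts):
            if not pending:
                break
            futures = {pool.submit(flaky_task, c, attempt): c for c in pending}
            pending = []
            for fut in as_completed(futures):
                chunk = futures[fut]
                try:
                    results[chunk] = fut.result()
                except RuntimeError:
                    # One failed chunk does not stop the others; retry it later.
                    pending.append(chunk)
    return results, sorted(pending)


if __name__ == "__main__":
    results, failed = run_all(range(10))
    print(len(results), "chunks done,", len(failed), "given up on")
```

The caller decides what to do with the chunks that never succeeded, which is
the part that has to be designed into the algorithm rather than bolted on.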
>
> (*) It is possible to insert your own 'payload' in a Lambda. There are
> standard ones, such as Python, obviously.
> However, at the time I looked there was a small size limit on the payload.
>
> Re-reading my own response
> https://discourse.julialang.org/t/lambda-or-cloud-functions-eventually-possible/39128/5
> you CAN have a larger payload, but this has to be in an S3 bucket
> https://docs.aws.amazon.com/lambda/latest/dg/nodejs-package.html
>
> BTW, I am sure everyone knows this, but if you have a home assistant such
> as Alexa, every time you ask it something a Lambda is spun up.
>
> On Wed, 25 Nov 2020 at 09:27, Tim Cutts <tjrc at sanger.ac.uk> wrote:
>
>>
>>
>> On 24 Nov 2020, at 18:31, Alex Chekholko via Beowulf <beowulf at beowulf.org>
>> wrote:
>>
>> If you can run your task on just one computer, you should always do that
>> rather than having to build a cluster of some kind and all the associated
>> headaches.
>>
>>
>> If you take on the cloud message, that of course isn’t necessarily the
>> case.  If you use very high level cloud services like Lambda, you don’t
>> have to build that infrastructure.  It’s very unlikely to be anywhere near
>> as efficient, of course, but throughput efficiency is not what your average
>> scientist cares about.  What they care about is getting their answer
>> quickly (and, to a lesser extent, cheaply).
>>
>> I saw a recent example where someone took a fairly simple sequencing read
>> alignment process, which normally runs on a single 16-core node in about 6
>> hours, and split the input files small enough that the alignment code's
>> execution time and memory use would fit within AWS Lambda’s envelope.  The
>> result executed in a couple of minutes, elapsed, but used about four times
>> as many core-hours as the optimised single node version.  Of course, this
>> is an embarrassingly parallel problem, so this is a relatively easy
>> analysis to move to this sort of design.
>>
>> From the scientist’s point of view, which is better?  Getting their
>> answer in 5 minutes or 6 hours?  Especially if they’ve also reduced their
>> development time, because they don’t have to worry so much about
>> infrastructure and optimisation.
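
To put rough numbers on that trade-off, here is a back-of-envelope sketch
using only the figures quoted above: 16 cores for 6 hours, about four times
the core-hours on Lambda, and 5 minutes elapsed. The function name and its
parameters are made up for illustration, and treating Lambda usage as
"core-hours" follows the wording above rather than how AWS actually meters it.

```python
def compare(cores=16, node_hours=6.0, overhead=4.0, lambda_minutes=5.0):
    """Back-of-envelope comparison; all inputs are the thread's round figures."""
    node_core_hours = cores * node_hours              # single-node cost
    lambda_core_hours = overhead * node_core_hours    # "about four times"
    speedup = node_hours * 60 / lambda_minutes        # elapsed-time ratio
    # Average number of cores that must be busy at once to burn that many
    # core-hours in that little wall-clock time.
    avg_concurrency = lambda_core_hours * 60 / lambda_minutes
    return {
        "node_core_hours": node_core_hours,
        "lambda_core_hours": lambda_core_hours,
        "speedup": speedup,
        "avg_concurrency": avg_concurrency,
    }


if __name__ == "__main__":
    for name, value in compare().items():
        print(f"{name}: {value:g}")
```

With these inputs that is 96 core-hours against 384, a 72x shorter wait, and
several thousand core-equivalents busy at once on average. It says nothing
about the dollar cost, which depends on pricing not given here.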
>>
>> The total value is hard to work out; many of these considerations are
>> hard to put a dollar value on.  When I saw that article, I did ask the
>> author how much the analysis actually cost, and she didn’t have a number.
>> But I don’t think we can dogmatically say that we should always run a task
>> on a single machine if we can.
>>
>> Tim