[Beowulf] Large amounts of data to store and process

John Hearns hearnsj at googlemail.com
Sun Mar 10 04:05:47 PDT 2019


Also interesting to me is mixed-precision arithmetic, which Julia makes
easy.
We are going to see more and more codes choose lower precision
to save energy, not just for running Deep Learning models on GPUs.

I share a code snippet not written by me. I think it is a brilliant idea.
Here a researcher in ocean modelling is able to change the types of numbers
his model uses: run with lower precision and see what changes.
I guess this would be easy in C/C++ too, but the concept is fantastic.

# NUMBER FORMAT OPTIONS
const Numtype = Float32                      # standard 32-bit floats
#const Numtype = Posit{16,2}                 # 16-bit posits with 2 exponent bits
#const Numtype = Main.FiniteFloats.Finite16  # 16-bit floats without Inf/NaN (FiniteFloats package)
#const Numtype = BigFloat                    # arbitrary-precision floats
#setprecision(7)                             # e.g. limit BigFloat to a 7-bit significand

Using Julia's type system you can change the type of the numbers.
Here the calculation is run with Numtype, and you can make Numtype 32-bit
floats or arbitrarily large floats.
I see Float64 is not listed - that should be there.
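
To make the pattern concrete, here is a minimal sketch (not the researcher's
code, just an illustration of the idea): everything downstream is written
against Numtype, so changing one constant changes the precision of the whole
run. The trapezoid-rule integrator is a made-up stand-in for the real model.

const Numtype = Float32                 # swap for Float64, BigFloat, ... and re-run

# Integrate f over [a, b] with n trapezoids, doing all arithmetic in Numtype.
function trapezoid(f, a, b, n)
    a, b = Numtype(a), Numtype(b)
    h = (b - a) / Numtype(n)
    s = (f(a) + f(b)) / Numtype(2)
    for i in 1:n-1
        s += f(a + Numtype(i) * h)
    end
    return h * s
end

approx = trapezoid(sin, 0, pi, 1000)            # exact answer is 2
println(typeof(approx), "  ", approx)
println("error: ", abs(Float64(approx) - 2.0))  # what the lower precision costs

Run it once with Float32 and once with Float64 and diff the output - that is
essentially the experiment the snippet above enables at whole-model scale.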


On Sun, 10 Mar 2019 at 10:57, John Hearns <hearnsj at googlemail.com> wrote:

> Jonathan, damn good question.
> There is a lot of debate at the moment on how 'traditional' HPC can
> co-exist with 'big data' style HPC.
>
> Regarding Julia, I am a big fan of it, and it brings a task-level paradigm
> to HPC work.
> To be honest though, traditional Fortran codes will be with us forever.
> No-one is going to refactor, say, a weather forecasting model in a national
> centre.
> Also, Python has the mindshare at the moment. I have seen people in my
> company enthusiastically taking up Python.
> Not because of a measured choice made after scanning dozens of learned papers,
> Reddit reviews, etc.
> If that were the case they might opt for Go or some niche language.
> No, the choice is made because their colleagues already use Python and
> pass on start-up codes, and there is a huge Python community.
>
> Same with traditional HPC codes really - we all know that batch scripts
> are passed on through the generations like Holy Books,
> and most scientists don't have a clue what these scratches on clay tablets
> actually DO.
> This leads people to continue running batch jobs which are hard-wired for 12
> cores on a 20-core machine, etc. etc.
>
> (*)  this is worthy of debate. In Formula 1, whenever we updated the
> version of our CFD code we re-ran a known simulation and made sure we still
> had correlation.
> It is inevitable that old versions of codes will stop being supported.
>
>
>
>
>
>
>
> On Sun, 10 Mar 2019 at 09:29, Jonathan Aquilina <jaquilina at eagleeyet.net>
> wrote:
>
>> Hi All,
>>
>> Basically I have sat down with my colleague and we have opted to go down
>> the route of Julia with JuliaDB for this project. But here is an
>> interesting thought that I have been pondering: if Julia is an up-and-coming
>> fast language for working with large amounts of data, how will that affect
>> HPC, the way it is currently used, and the way HPC systems are designed?
>>
>> Regards,
>> Jonathan
>>
>> -----Original Message-----
>> From: Beowulf <beowulf-bounces at beowulf.org> On Behalf Of Michael Di
>> Domenico
>> Sent: 04 March 2019 17:39
>> Cc: Beowulf Mailing List <beowulf at beowulf.org>
>> Subject: Re: [Beowulf] Large amounts of data to store and process
>>
>> On Mon, Mar 4, 2019 at 8:18 AM Jonathan Aquilina <jaquilina at eagleeyet.net>
>> wrote:
>> >
>> > As previously mentioned, we don't really need to have anything indexed,
>> > so I am thinking flat files are the way to go; my only concern is the
>> > performance of large flat files.
>>
>> potentially, there are many factors in the workflow that ultimately
>> influence the decision, as others have pointed out.  my flat file example is
>> only one, where we just repeatedly blow through the files.
>>
>> > Isn't that what HDFS is for, to deal with large flat files?
>>
>> large is relative.  a 256GB file isn't "large" anymore.  i've pushed TB
>> files through hadoop and run the terabyte sort benchmark, and yes it can be
>> done in minutes (time-scale), but you need an astounding amount of hardware
>> to do it (the last benchmark paper i saw, it was something like
>> 1000 nodes).  you can accomplish the same feat with less, and less
>> complicated, hardware/software
>>
>> and if your devs aren't willing to adapt to the hadoop ecosystem, you're
>> sunk right off the dock.
>>
>> to get a more targeted answer from the numerous smart people on the list,
>> you'd need to open up the app and workflow to us.  there are just too many
>> variables
>> _______________________________________________
>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
>> To change your subscription (digest mode or unsubscribe) visit
>> http://www.beowulf.org/mailman/listinfo/beowulf
>>
>