[Beowulf] Large amounts of data to store and process

Thu Mar 14 14:02:06 PDT 2019

> Then, given we are reaching these limitations, how come we don't integrate
> certain things from the HPC world into everyday computing, so to speak?

Scalable/parallel computing is hard, and hard costs time and money.
In HPC the performance often justifies the means; in other
sectors the cost must justify the means.

HPC has traditionally trickled down into other sectors. However,
many of the HPC problem types are not traditional computing
problems. This situation is changing a bit with things
like Hadoop/Spark/TensorFlow.

--
Doug

>
> On 14/03/2019, 19:14, "Douglas Eadline" <deadline at eadline.org> wrote:
>
>
>     > Hi Douglas,
>     >
>     > Isn't there quantum computing being developed in terms of CPUs at
>     > this point?
>
>     QC is (theoretically) unreasonably good at some things; for others
>     there may be classical algorithms that work better. As far as I know,
>     there has been no demonstration of "quantum supremacy," where a
>     quantum computer is shown to be faster than the best classical
>     algorithm.
>
>     Getting there, not there yet.
>
>     BTW, if you want to know what is going on with QC,
>     read Scott Aaronson's blog:
>
>     https://www.scottaaronson.com/blog/
>
>     I usually get through the first few paragraphs, and
>     then it whooshes over my scientific pay grade.
>
>
>     > Also, is it really about the speed anymore, rather than how
>     > optimized the code is to take advantage of the multiple cores that
>     > a system has?
>
>     That is because the clock rate increase slowed to a crawl.
>     Adding cores was a way to "offer" more performance, but it introduced
>     the "multi-core tax." That is, programming for multi-core is
>     harder and costlier than for a single core. It is also much
>     harder to optimize. In HPC we are lucky; we are used to
>     designing MPI codes that scale with more cores (no matter
>     where they live: same die, next socket, another server).
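A minimal sketch of the decomposition idea behind MPI-style codes, written in Python for illustration and assuming a simple block distribution: each worker ("rank") owns a contiguous slice of the index space, computes a partial result, and a reduction combines the partials. The ranks are simulated in a plain loop here rather than launched with a real MPI library (such as mpi4py), and the sum of squares is only a stand-in for real work; `block_range`, `local_work`, and `parallel_sum_of_squares` are hypothetical names.

```python
def block_range(n: int, rank: int, size: int) -> range:
    """Return the indices of [0, n) owned by `rank` out of `size` workers.

    The first n % size ranks get one extra element, so the load differs
    by at most one item between ranks.
    """
    base, extra = divmod(n, size)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return range(start, stop)


def local_work(indices: range) -> int:
    # Per-rank computation; a sum of squares stands in for real work.
    return sum(i * i for i in indices)


def parallel_sum_of_squares(n: int, size: int) -> int:
    # Each simulated rank computes a partial result on its own block...
    partials = [local_work(block_range(n, rank, size)) for rank in range(size)]
    # ...and a reduction (MPI_Reduce/MPI_Allreduce in real code) combines them.
    return sum(partials)


if __name__ == "__main__":
    n = 100_003
    serial = local_work(range(n))
    for size in (1, 4, 7, 64):
        assert parallel_sum_of_squares(n, size) == serial
    print("decomposition matches the serial result for every worker count")
```

Because each rank only needs `n`, its own rank, and the rank count, the same decomposition works whether the ranks share a die or sit on different servers; the extra bookkeeping is a small taste of the "multi-core tax."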
>
>     Also, more cores usually means lower single-core
>     frequency to fit into a given power envelope (die shrinks help
>     with this, but based on everything I have read, we are about
>     at the end of the line). It also means lower absolute memory
>     BW per core, although more memory channels help a bit.
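To put rough numbers on the per-core bandwidth point, here is a back-of-the-envelope sketch. It assumes theoretical peak bandwidth = channels × transfer rate × 8 bytes per 64-bit DDR transfer, divided evenly across cores; sustained bandwidth is lower in practice. The socket configurations below are hypothetical examples, not measurements of specific processors.

```python
def peak_bw_per_core_gbs(channels: int, mt_per_s: int, cores: int,
                         bytes_per_transfer: int = 8) -> float:
    """Theoretical peak memory bandwidth per core, in GB/s.

    channels * (mega-transfers/s) * (bytes per transfer) gives MB/s for the
    whole socket; divide by 1000 for GB/s and by the core count for a
    per-core share.
    """
    socket_gbs = channels * mt_per_s * bytes_per_transfer / 1000.0
    return socket_gbs / cores


# Hypothetical sockets, all using DDR4-2666 (2666 MT/s) for a fair comparison.
CONFIGS = [
    ("8 cores, 4 channels", 4, 2666, 8),
    ("28 cores, 6 channels", 6, 2666, 28),
    ("64 cores, 8 channels", 8, 2666, 64),
]

if __name__ == "__main__":
    for label, channels, mts, cores in CONFIGS:
        per_core = peak_bw_per_core_gbs(channels, mts, cores)
        print(f"{label:>22}: {per_core:6.2f} GB/s per core (theoretical peak)")
```

Going from 4 to 8 channels doubles socket bandwidth, but going from 8 to 64 cores cuts each core's share to a quarter of what it was: more channels help a bit, but not enough to keep up with core counts.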
>
>     --
>     Doug
>
>
>     >
>     > On 13/03/2019, 22:22, "Douglas Eadline" <deadline at eadline.org> wrote:
>     >
>     >
>     >     I realize it is bad form to reply to one's own post, but
>     >     I forgot to mention something.
>     >
>     >     Basically, the HW performance parade is getting harder
>     >     to celebrate. Clock frequencies have been increasing slowly
>     >     while core counts are multiplying rather quickly.
>     >     Single-core performance boosts are mostly coming
>     >     from accelerators. Add to that the fact that speculative
>     >     execution, once patched for security, slows things down.
>     >
>     >     What this means is that the focus on software performance
>     >     and optimization is going to increase, because we can't just
>     >     buy new hardware and improve things anymore.
>     >
>     >     I believe languages like Julia can help with this situation.
>     >     For a while.
>     >
>     >     --
>     >     Doug
>     >
>     >     >> Hi All,
>     >     >> Basically, I have sat down with my colleague and we have opted to
>     >     >> go down the route of Julia with JuliaDB for this project. But here
>     >     >> is an interesting thought that I have been pondering: if Julia is an
>     >     >> up-and-coming fast language to work with for large amounts of data,
>     >     >> how will that affect HPC, the way it is currently used, and the way
>     >     >> HPC systems are created?
>     >     >
>     >     >
>     >     > First, IMO good choice.
>     >     >
>     >     > Second, a short list of actual conversations.
>     >     >
>     >     > 1) "This code is written in Fortran." I have been met with
>     >     > puzzled looks when I say the word "Fortran." Then it
>     >     > comes: "... ancient language, why not port to modern ..."
>     >     > If you are asking that question, young Padawan, you have
>     >     > much to learn; maybe try web pages.
>     >     >
>     >     > 2) "I'll just use Python because it works on my laptop."
>     >     > Later, "It will just run faster on a cluster, right?"
>     >     > and "My little Python program is now kind of big and has
>     >     > become slow, should I use TensorFlow?"
>     >     >
>     >     > 3) <mcoy>
>     >     > "Dammit Jim, I don't want to learn/write Fortran, C, C++, and MPI.
>     >     > I'm a (fill in domain-specific scientific/technical position)."
>     >     > </mcoy>
>     >     >
>     >     > My reply: "I agree and wish there were a better answer to that
>     >     > question. The computing industry has made great strides in HW with
>     >     > multi-core, clusters, etc. Software tools have always lagged
>     >     > hardware. In the case of HPC it is a slow process, and
>     >     > in HPC the whole programming "thing" is not as "easy" as
>     >     > it is in other sectors; warp drives and transporters
>     >     > take a little extra effort."
>     >     >
>     >     > 4) Then I suggest Julia: "I invite you to try Julia. It is
>     >     > easy to get started, fast, and can grow with your application."
>     >     > Then I might say, "In a way it is HPC BASIC; if you are old
>     >     > enough you will understand what I mean by that."
>     >     >
>     >     > The question with languages like Julia (or Chapel, etc.) is:
>     >     >
>     >     >   "How much performance are you willing to give up for convenience?"
>     >     >
>     >     > The goal is to keep the programmer close to the problem at hand
>     >     > and away from the nuances of the underlying hardware. Obviously,
>     >     > the more performance needed, the closer you need to get to the
>     >     > hardware. This decision goes beyond software tools; there are all
>     >     > kinds of costs/benefits that need to be considered. And then there
>     >     > is IO ...
>     >     >
>     >     > --
>     >     > Doug
>     >     >
>     >     >> Regards,
>     >     >> Jonathan
>     >     >>
>     >     >> -----Original Message-----
>     >     >> From: Beowulf <beowulf-bounces at beowulf.org> On Behalf Of Michael Di Domenico
>     >     >> Sent: 04 March 2019 17:39
>     >     >> Cc: Beowulf Mailing List <beowulf at beowulf.org>
>     >     >> Subject: Re: [Beowulf] Large amounts of data to store and process
>     >     >>
>     >     >> On Mon, Mar 4, 2019 at 8:18 AM Jonathan Aquilina <jaquilina at eagleeyet.net>
>     >     >> wrote:
>     >     >>> As previously mentioned, we don't really need to have anything
>     >     >>> indexed, so I am thinking flat files are the way to go. My only
>     >     >>> concern is the performance of large flat files.
>     >     >>
>     >     >> potentially, there are many factors in the work flow that ultimately
>     >     >> influence the decision, as others have pointed out.  my flat file
>     >     >> example is only one, where we just repeatedly blow through the files.
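For the "just repeatedly blow through the files" pattern, a single sequential pass with constant memory is often all that is needed, however large the file. A minimal Python sketch, assuming a hypothetical headerless comma-separated format with a numeric value in a given column; `stream_stats` is an illustrative name, and the 1 MiB read buffer is an arbitrary choice.

```python
import os
import tempfile


def stream_stats(path: str, column: int = 1, sep: str = ","):
    """One sequential pass over a flat file using O(1) memory.

    Returns (count, mean, min, max) of the numeric values in `column`.
    Assumes no header line; blank lines are skipped.
    """
    count = 0
    total = 0.0
    lo = float("inf")
    hi = float("-inf")
    # A large buffer keeps the reads big and sequential.
    with open(path, "r", encoding="utf-8", buffering=1 << 20) as f:
        for line in f:
            if not line.strip():
                continue
            value = float(line.rstrip("\n").split(sep)[column])
            count += 1
            total += value
            lo = min(lo, value)
            hi = max(hi, value)
    mean = total / count if count else float("nan")
    return count, mean, lo, hi


if __name__ == "__main__":
    # Build a small throwaway file in the assumed "id,value" format.
    with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
        for i in range(1000):
            tmp.write(f"{i},{i % 10}\n")
        path = tmp.name
    try:
        print(stream_stats(path))
    finally:
        os.remove(path)
```

Memory use stays flat as the file grows, so run time is bounded mainly by how fast the storage can stream the bytes.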
>     >     >>
>     >     >>> Isn't that what HDFS is for, to deal with large flat files?
>     >     >>
>     >     >> large is relative.  a 256GB file isn't "large" anymore.  i've pushed
>     >     >> TB files through hadoop and run the terabyte sort benchmark, and yes,
>     >     >> it can be done in minutes (time-scale), but you need an astounding
>     >     >> amount of hardware to do it (the last benchmark paper i saw, it was
>     >     >> something like 1000 nodes).  you can accomplish the same feat using
>     >     >> less and less complicated hardware/software.
>     >     >>
>     >     >> and if your devs are willing to adapt to the hadoop ecosystem, you're
>     >     >> sunk right off the dock.
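The terabyte sort benchmark spreads the work across many nodes, but the single-node technique for data larger than RAM is the classic external merge sort: sort memory-sized chunks into runs on disk, then stream a k-way merge of the runs. A minimal Python sketch (not how Hadoop implements its sort benchmark), assuming newline-delimited text records compared as plain strings; `external_sort` and `max_lines_in_memory` are illustrative names.

```python
import heapq
import os
import random
import tempfile
from typing import Iterable, Iterator, List


def _write_run(sorted_lines: List[str], tmpdir: str) -> str:
    """Write one sorted run to its own temporary file and return the path."""
    fd, path = tempfile.mkstemp(dir=tmpdir, suffix=".run")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.writelines(sorted_lines)
    return path


def external_sort(lines: Iterable[str], max_lines_in_memory: int,
                  tmpdir: str) -> Iterator[str]:
    """Yield newline-terminated records in sorted order using bounded memory.

    Phase 1: sort chunks of at most max_lines_in_memory records into runs on disk.
    Phase 2: k-way merge of all runs with heapq.merge, reading them as streams.
    """
    run_paths: List[str] = []
    buffer: List[str] = []
    for line in lines:
        if not line.endswith("\n"):
            line += "\n"  # keep records separable once written to a run file
        buffer.append(line)
        if len(buffer) >= max_lines_in_memory:
            buffer.sort()
            run_paths.append(_write_run(buffer, tmpdir))
            buffer = []
    if buffer:
        buffer.sort()
        run_paths.append(_write_run(buffer, tmpdir))

    files = [open(p, "r", encoding="utf-8") for p in run_paths]
    try:
        # Each file is already sorted, so a lazy k-way merge yields global order.
        yield from heapq.merge(*files)
    finally:
        for f in files:
            f.close()


if __name__ == "__main__":
    rng = random.Random(42)
    records = [f"{rng.randrange(10**6):07d}\n" for _ in range(10_000)]
    with tempfile.TemporaryDirectory() as d:
        result = list(external_sort(records, max_lines_in_memory=1_000, tmpdir=d))
    assert result == sorted(records)
    print(f"sorted {len(result)} records using 10 on-disk runs")
```

Only one chunk plus one buffered line per run is held in memory at a time, which is why modest hardware can sort large files, just more slowly than a large cluster.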
>     >     >>
>     >     >> to get a more targeted answer from the numerous smart people on the
>     >     >> list, you'd need to open up the app and workflow to us.  there are
>     >     >> just too many variables.
>     >     >> _______________________________________________
>     >     >> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
>     >     >> To change your subscription (digest mode or unsubscribe) visit
>     >     >> http://www.beowulf.org/mailman/listinfo/beowulf

-- 
Doug