[Beowulf] Large amounts of data to store and process

Jeffrey Layton laytonjb at gmail.com
Thu Mar 14 14:08:34 PDT 2019


I don't want to interrupt the flow, but I'm feeling cheeky. One word can
solve everything: "Fortran." There, I said it.

Jeff


On Thu, Mar 14, 2019, 17:03 Douglas Eadline <deadline at eadline.org> wrote:

>
> > Then, given we are reaching these limitations, how come we don't
> > integrate certain things from the HPC world into everyday computing,
> > so to speak?
>
> Scalable/parallel computing is hard, and hard costs time and money.
> In HPC the performance often justifies the means; in other
> sectors the cost must justify the means.
>
> HPC has traditionally trickled down into other sectors. However,
> many of the HPC problem types are not traditional computing
> problems. This situation is changing a bit with things
> like Hadoop/Spark/TensorFlow.
>
> --
> Doug
>
>
> >
> > On 14/03/2019, 19:14, "Douglas Eadline" <deadline at eadline.org> wrote:
> >
> >
> >     > Hi Douglas,
> >     >
> >     > Isn't there quantum computing being developed in terms of CPUs
> >     > at this point?
> >
> >     QC is (theoretically) unreasonably good at some things; at others
> >     there may be classical algorithms that work better. As far as I know,
> >     there has been no demonstration of "quantum
> >     supremacy," where a quantum computer is shown
> >     to be faster than the best classical algorithm.
> >
> >     Getting there, not there yet.
> >
> >     BTW, if you want to know what is going on with QC,
> >     read Scott Aaronson's blog:
> >
> >     https://www.scottaaronson.com/blog/
> >
> >     I usually get through the first few paragraphs and
> >     then it goes whoosh, over my scientific pay grade.
> >
> >
> >     > Also, is it really about the speed anymore, rather than how
> >     > optimized the code is to take advantage of the multiple cores
> >     > that a system has?
> >
> >     That is because the clock rate increase slowed to a crawl.
> >     Adding cores was a way to "offer" more performance, but it introduced
> >     the "multi-core tax." That is, programming for multi-core is
> >     harder and costlier than for a single core. It is also much
> >     harder to optimize. In HPC we are lucky: we are used to
> >     designing MPI codes that scale with more cores (no matter
> >     where they live: same die, next socket, another server).
> >
> >     Also, more cores usually means lower single-core
> >     frequency to fit into a given power envelope (die shrinks help
> >     with this, but based on everything I have read, we are about
> >     at the end of the line). It also means lower absolute memory
> >     BW per core, although more memory channels help a bit.
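> >
> >     Just to illustrate the "no matter where they live" point, here is a
> >     rough sketch in Julia (my own toy example, using the Distributed
> >     standard library rather than MPI; the worker counts and host names
> >     are made up). The same reduction runs on local cores or on cores
> >     sitting in another server:
> >
> >         using Distributed
> >         addprocs(4)                        # extra workers on local cores ...
> >         # addprocs(["node02", "node03"])   # ... or on other servers, same code
> >
> >         @everywhere f(x) = x^2             # make the work function visible everywhere
> >
> >         # partial results are computed on the workers and reduced with (+)
> >         total = @distributed (+) for i in 1:10_000_000
> >             f(i)
> >         end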
> >
> >     --
> >     Doug
> >
> >
> >     >
> >     >     > On 13/03/2019, 22:22, "Douglas Eadline" <deadline at eadline.org> wrote:
> >     >
> >     >
> >     >     I realize it is bad form to reply to one's own post, but
> >     >     I forgot to mention something.
> >     >
> >     >     Basically, the HW performance parade is getting harder
> >     >     to celebrate. Clock frequencies have been slowly
> >     >     increasing while cores are multiplying rather quickly.
> >     >     Single-core performance boosts are mostly coming
> >     >     from accelerators. Add to that the fact that speculation
> >     >     technology, when managed for security, slows things down.
> >     >
> >     >     What this means is that the focus on software performance
> >     >     and optimization is going to increase, because we can't just
> >     >     buy new hardware and improve things anymore.
> >     >
> >     >     I believe languages like Julia can help with this situation.
> >     >     For a while.
> >     >
> >     >     --
> >     >     Doug
> >     >
> >     >     >> Hi All,
> >     >     >> Basically, I have sat down with my colleague and we have opted
> >     >     >> to go down the route of Julia with JuliaDB for this project.
> >     >     >> But here is an interesting thought that I have been pondering:
> >     >     >> if Julia is an up-and-coming fast language for working with
> >     >     >> large amounts of data, how will that affect HPC, the way it is
> >     >     >> currently used, and the way HPC systems are created?
> >     >     >
> >     >     >
> >     >     > First, IMO good choice.
> >     >     >
> >     >     > Second, a short list of actual conversations.
> >     >     >
> >     >     > 1) "This code is written in Fortran." I have been met with
> >     >     > puzzling looks when I say the the word "Fortran." Then it
> >     >     > comes, "... ancient language, why not port to modern ..."
> >     >     > If you are asking that question young Padawan you have
> >     >     > much to learn, maybe try web pages"
> >     >     >
> >     >     > 2) "I'll just use Python because it works on my laptop."
> >     >     > Later, "It will just run faster on a cluster, right?"
> >     >     > and "My little Python program is now kind of big and has
> >     >     > become slow, should I use TensorFlow?"
> >     >     >
> >     >     > 3) <mcoy>
> >     >     > "Dammit Jim, I don't want to learn/write Fortran, C, C++, and MPI.
> >     >     > I'm a (fill in domain-specific scientific/technical position)"
> >     >     > </mcoy>
> >     >     >
> >     >     > My reply,"I agree and wish there was a better answer to that
> >     > question.
> >     >     > The computing industry has made great strides in HW with
> >     >     > multi-core, clusters etc. Software tools have always lagged
> >     >     > hardware. In the case of HPC it is a slow process and
> >     >     > in HPC the whole programming "thing" is not as "easy" as
> >     >     > it is in other sectors, warp drives and transporters
> >     >     > take a little extra effort.
> >     >     >
> >     >     > 4) Then I suggest Julia: "I invite you to try Julia. It is
> >     >     > easy to get started, fast, and can grow with your application."
> >     >     > Then I might say, "In a way it is HPC BASIC; if you are old
> >     >     > enough you will understand what I mean by that."
> >     >     >
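> >     >     > To make the "grows with your application" point concrete, here
> >     >     > is a tiny sketch (my own toy example, not anything from
> >     >     > Jonathan's project). The serial version reads like BASIC; the
> >     >     > threaded version is the same computation with the work split
> >     >     > into one chunk per thread:
> >     >     >
> >     >     >     # serial: close to the math, nothing about the hardware
> >     >     >     sumsq(data) = sum(x -> x^2, data)
> >     >     >
> >     >     >     # threaded: same idea, each iteration handles one chunk
> >     >     >     function sumsq_threaded(data)
> >     >     >         n = Threads.nthreads()
> >     >     >         parts = zeros(n)
> >     >     >         Threads.@threads for t in 1:n
> >     >     >             lo = div((t - 1) * length(data), n) + 1
> >     >     >             hi = div(t * length(data), n)
> >     >     >             parts[t] = sum(x -> x^2, view(data, lo:hi))
> >     >     >         end
> >     >     >         return sum(parts)
> >     >     >     end
> >     >     >
> >     >     >     data = rand(10_000_000)
> >     >     >     sumsq(data) ≈ sumsq_threaded(data)   # same answer, more cores
> >     >     >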
> >     >     > The question with languages like Julia (or Chapel, etc.) is:
> >     >     >
> >     >     >   "How much performance are you willing to give up for convenience?"
> >     >     >
> >     >     > The goal is to keep the programmer close to the problem at hand
> >     >     > and away from the nuances of the underlying hardware. Obviously,
> >     >     > the more performance needed, the closer you need to get to the
> >     >     > hardware. This decision goes beyond software tools; there are all
> >     >     > kinds of cost/benefit trade-offs that need to be considered. And
> >     >     > then there is I/O ...
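> >     >     >
> >     >     > Since JuliaDB came up, here is a rough sketch of the kind of
> >     >     > thing I mean (the file name and column name are made up, and the
> >     >     > loadtable/filter/save calls are the JuliaDB API as I understand
> >     >     > it; check the JuliaDB docs before leaning on this):
> >     >     >
> >     >     >     using JuliaDB
> >     >     >
> >     >     >     t = loadtable("measurements.csv")    # ingest a flat CSV into a table
> >     >     >     good = filter(r -> r.value > 0, t)   # row filter, no explicit loop
> >     >     >     save(good, "clean.jdb")              # binary format for faster re-reads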
> >     >     >
> >     >     > --
> >     >     > Doug
> >     >     >
> >     >     >
> >     >     >> Regards,
> >     >     >> Jonathan
> >     >     >> -----Original Message-----
> >     >     >> From: Beowulf <beowulf-bounces at beowulf.org> On Behalf Of Michael Di Domenico
> >     >     >> Sent: 04 March 2019 17:39
> >     >     >> Cc: Beowulf Mailing List <beowulf at beowulf.org>
> >     >     >> Subject: Re: [Beowulf] Large amounts of data to store and process
> >     >     >>
> >     >     >> On Mon, Mar 4, 2019 at 8:18 AM Jonathan Aquilina <jaquilina at eagleeyet.net> wrote:
> >     >     >>> As previously mentioned, we don't really need to have anything
> >     >     >>> indexed, so I am thinking flat files are the way to go; my only
> >     >     >>> concern is the performance of large flat files.
> >     >     >> potentially, there are many factors in the workflow that
> >     >     >> ultimately influence the decision, as others have pointed out.
> >     >     >> my flat file example is only one, where we just repeatedly blow
> >     >     >> through the files.
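> >     >     >>
> >     >     >> fwiw, the kind of loop i mean looks roughly like this in julia
> >     >     >> (made-up file name, julia only because it's the language
> >     >     >> upthread); the file is streamed a line at a time, so it never
> >     >     >> has to fit in memory:
> >     >     >>
> >     >     >>     open("big_flat_file.dat") do io
> >     >     >>         for line in eachline(io)    # streams, never loads the whole file
> >     >     >>             # parse and accumulate per record here
> >     >     >>         end
> >     >     >>     end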
> >     >     >>> Isn't that what HDFS is for, to deal with large flat files?
> >     >     >> large is relative.  a 256GB file isn't "large" anymore.  i've
> >     >     >> pushed TB files through hadoop and run the terabyte sort
> >     >     >> benchmark, and yes it can be done in minutes (time-scale), but
> >     >     >> you need an astounding amount of hardware to do it (the last
> >     >     >> benchmark paper i saw, it was something like 1000 nodes).  you
> >     >     >> can accomplish the same feat using less, and less complicated,
> >     >     >> hardware/software.
> >     >     >> and if your devs aren't willing to adapt to the hadoop
> >     >     >> ecosystem, you're sunk right off the dock.
> >     >     >> to get a more targeted answer from the numerous smart people
> >     >     >> on the list, you'd need to open up the app and workflow to us.
> >     >     >> there's just too many variables.
> >     >     >
> >     >     >
> >     >     > --
> >     >     > Doug
> >     >     >
> >     >     >
> >     >     >
> >     >     >
> >     >     >
> >     >
> >     >
> >     >     --
> >     >     Doug
> >     >
> >     >
> >     >
> >     >
> >
> >
> >     --
> >     Doug
> >
> >
> >
> >
>
>
> --
> Doug
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>