[Beowulf] Intel Phi musings

Tue Feb 12 08:38:01 PST 2013

Hey Stuart,

Thanks for your answer ...

That sounds compelling.  May I ask a few more questions?

So should I assume that this was a threaded SMP-type application
(OpenMP, pthreads), or is it MPI-based?  Is the supporting CPU of the
multi-core Sandy Bridge vintage?  Have you been able to compare
the hyper-threaded, multi-core scaling on the Sandy Bridge side of the
system with that on the Phi (fewer cores to compare, of course)?  Using the
Intel compilers, I assume ... how well do your kernels vectorize?  I am
curious about the observed benefits of hyper-threading, which generally
offers little to floating-point-intensive HPC computations where contention
for functional units is an issue.  You said you have 2 Phis per node.  Were
you running a single job across both?  Were the Phis in separate PCIe
slots or on the same card (sorry, I should know this, but I have just
started looking at Phi)?  If they are on separate cards in separate
slots, can I assume that I am limited to MPI parallel implementations
when using both?

Maybe that is more than a few questions ... ;-) ...

Regards,

Richard Walsh
Thrashing River Consulting

On Tue, Feb 12, 2013 at 10:46 AM, Dr Stuart Midgley <sdm900 at gmail.com> wrote:

> It was simple, really.  Within 1 hr, I had recompiled a large amount of our
> codes to run on the Phis and then ssh'ed to the Phi and ran them… Saw that
> a single Phi was faster than our current 4-socket AMD 6276 (64 cores) and
> then ordered machines with 2 Phis in them :)
>
> I didn't bother with any of the compiler directives etc… just treated them
> like a 240-core (hyper-threaded) computer… and saw great scaling.
>
>
> --
> Dr Stuart Midgley
> sdm900 at sdm900.com
>
>
>
>
> On 12/02/2013, at 11:12 PM, Richard Walsh <rbwcnslt at gmail.com> wrote:
>
> >
> > Hey Stuart,
> >
> > I am interested in what sold you on the Phi.  My cursory look
> > suggested that using the Phi in Intel's offload mode (which
> > preserves the scalar performance) was not much easier to
> > program than writing in CUDA ... and that using the Phi as
> > a standalone processor, while a programming convenience,
> > suffers on scalar code.  Even that programming convenience
> > is limited by the fact that you have to think both in terms of
> > vectors and threads.
> >
> > Also, the speedups I have seen generally seem modest,
> > understanding that GPU performance hype is exaggerated.
> >
> > Hearing what you like would be interesting.
> >
> > Thanks,
> >
> >
> > Richard Walsh
> > Thrashing River Consulting
> >
> > On Tue, Feb 12, 2013 at 10:02 AM, Dr Stuart Midgley <sdm900 at gmail.com> wrote:
> > I've started a blog to document the process I'm going through to get our
> > Phi's going.
> >
> >     http://phi-musings.blogspot.com.au
> >
> > It's very sparse at the moment, but will get filled in a lot over the
> > next day or so… I've finally got them booting.
> >
> > FYI we currently have 100 co-processors and should have the next 160 or
> > so in a few weeks.
> >
> >
> > --
> > Dr Stuart Midgley
> > sdm900 at sdm900.com
> >
> >
> >
> >
> > _______________________________________________
> > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> > To change your subscription (digest mode or unsubscribe) visit
> > http://www.beowulf.org/mailman/listinfo/beowulf
> >
>
>