[Beowulf] Cluster OpenMP
lindahl at pathscale.com
Tue May 16 19:54:06 PDT 2006
On Tue, May 16, 2006 at 09:53:59PM -0400, Joe Landman wrote:
> SGI's Altix and O3k (and O2k) were hardware versions of DSM upon which
> OpenMP was based. It is possible to do it and to do it well, though if
> you want it fast it is going to cost you money (low latency). This is
> where coupling this with something like Infinipath and similar
> technologies *might* be interesting.
I'm not really holding my breath. InfiniPath can beat the Altix
outright on some MPI programs and clobber it in price/performance on
most MPI programs, but software DSM is a dog. And this Intel gizmo
requires source code changes (labeling shared variables), and it's
probably much harder to tune than your usual OpenMP system
(synchronization points are expensive). It's a hard problem to solve,
and the Intel guys sound like they've done all the right things, but
I'm really skeptical about the practicality of the whole thing.
I'd be willing to bet that most OpenMP users would get more bang for
their $$s by buying a 4-socket dual-core Opteron and not changing
their code. Heck, buy a stack of 'em...
> I admit freely that I like the semantics of programming with OpenMP. It
> is easy to do, and it isn't all that hard to figure out how to get
> performance out of it, and where your performance has gone.
Most people have a problem with OpenMP performance. The ease of use is
that you don't have to worry too much about locality. Then performance
tuning involves worrying too much about locality, and false sharing.
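
To illustrate the false-sharing point, here is a minimal sketch in C with
OpenMP pragmas. Per-thread partial results that sit next to each other in
an array can land on the same cache line, so every update by one thread
invalidates that line for the others. Padding each slot to a full line is
one common workaround. The names count_positive, padded_counter,
MAX_THREADS, and the 64-byte CACHE_LINE value are assumptions made for
this example, not something from the original post.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_LINE  64  /* assumed cache-line size in bytes */
#define MAX_THREADS 8   /* hypothetical fixed number of work slices */

/* One counter per slice, padded so that two counters are always
   CACHE_LINE bytes apart and can never share a line. */
typedef struct {
    long value;
    char pad[CACHE_LINE - sizeof(long)];
} padded_counter;

/* Count the strictly positive entries of x[0..n-1]. */
long count_positive(const double *x, size_t n)
{
    padded_counter partial[MAX_THREADS] = {{0}};
    long total = 0;

    assert(x != NULL || n == 0);

    /* Each iteration t owns one contiguous slice of x and writes only
       partial[t]. Without the padding, neighbouring slots would share a
       cache line and the threads would fight over it. */
    #pragma omp parallel for num_threads(MAX_THREADS)
    for (int t = 0; t < MAX_THREADS; t++) {
        size_t lo = n * (size_t)t / MAX_THREADS;
        size_t hi = n * (size_t)(t + 1) / MAX_THREADS;
        for (size_t i = lo; i < hi; i++)
            if (x[i] > 0.0)
                partial[t].value++;
    }

    /* Serial combine after the implicit barrier of the parallel loop. */
    for (int t = 0; t < MAX_THREADS; t++)
        total += partial[t].value;
    return total;
}
```

In real code a reduction(+:total) clause would do the same job with less
effort; the explicit array is used here only to make the memory layout
visible. The code also compiles and runs serially when OpenMP is not
enabled, since the pragma is then ignored.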
Yeah, you can program halo-exchange for a stencil algorithm
efficiently in OpenMP, by standing on your head and hopping. Or you
can just write it in MPI, and the locality will always be perfect.
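
For readers who have not seen it, a rough sketch of what an explicit halo
exchange looks like when written in OpenMP style follows. It assumes a 1-D
three-point averaging stencil, split into fixed-size blocks that each carry
one halo (ghost) cell per side. The names block_t, stencil_step, NBLOCKS,
and BLOCK are hypothetical and chosen for this example only. In MPI each
block would live in a separate process and the first loop would become a
pair of send/receive calls.

```c
#include <assert.h>
#include <stddef.h>

#define NBLOCKS 4  /* hypothetical number of blocks, one per thread */
#define BLOCK   4  /* interior cells per block */

/* cell[1..BLOCK] is the interior; cell[0] and cell[BLOCK+1] are halos. */
typedef struct {
    double cell[BLOCK + 2];
} block_t;

/* One step of u_new[i] = (u[i-1] + u[i] + u[i+1]) / 3 over all blocks,
   with fixed values left_bc and right_bc outside the global domain. */
void stencil_step(block_t blk[NBLOCKS], double left_bc, double right_bc)
{
    assert(blk != NULL);

    /* Phase 1: halo exchange. Each block reads its neighbours' edge
       interior cells and writes only its own two halo cells, so the
       iterations do not conflict. */
    #pragma omp parallel for
    for (int b = 0; b < NBLOCKS; b++) {
        blk[b].cell[0] =
            (b == 0) ? left_bc : blk[b - 1].cell[BLOCK];
        blk[b].cell[BLOCK + 1] =
            (b == NBLOCKS - 1) ? right_bc : blk[b + 1].cell[1];
    }
    /* The implicit barrier at the end of the loop above guarantees all
       halos are filled before any block starts updating. */

    /* Phase 2: purely local update. Each block touches only its own
       cells, going through a temporary so old values are not clobbered. */
    #pragma omp parallel for
    for (int b = 0; b < NBLOCKS; b++) {
        double tmp[BLOCK];
        for (int i = 1; i <= BLOCK; i++)
            tmp[i - 1] = (blk[b].cell[i - 1] + blk[b].cell[i]
                          + blk[b].cell[i + 1]) / 3.0;
        for (int i = 1; i <= BLOCK; i++)
            blk[b].cell[i] = tmp[i - 1];
    }
}
```

Even in this toy form the data decomposition is the same one an MPI
program would use, which is the point of the comparison: getting good
locality means blocking the data and managing halos by hand either way.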