[Beowulf] hpl size problems
landman at scalableinformatics.com
Tue Oct 4 06:01:32 PDT 2005
Last I checked it was really hard to get mpich to happily switch between
ch_p4, ch_gm, ch_shmem, ch_p4.iwarp, ... easily. In fact you have to
build a new one for each interface (this is *bad* IMO, but the MPICH
folks on the list must have had a really good reason for doing this, or
at least I hope it was a good reason). This leads *precisely* to the
scenario that Chris indicates.
There are some mpich versions out there that let you dynamically link at
run time (this is *good*), such as the one from Scali. There is mvapich.
There is LAM. Some handle this a little better than others.
But you are still stuck with an explosion of mpich/... . And this is, in
the truest sense of the word, a nightmare.
It must be nice to have a small single group of mandated codes,
libraries, and ABIs. "Thou shalt not use anything but what we provide,
and the heck with your requests". Only works until you get that one
user who will use lots of your cycles, but you know, they really need
xyz-2.0.1, and you have xyz-1.2.3. So how do you handle this?
(rhetorical question). Short answer is you need discipline in your
flexibility. You can be draconian and say "if it ain't RPM then we
aren't gonna do it", in which case they supply you an RPM which breaks
other people's codes. ... Hello. Is this thing on? That doesn't work,
wrong discipline IMO. You can err the other way, "Sure we will install
it wherever you like", and whammo, some poor supercomputing user who
has been using default paths for everything just happens to be on the
business end of your nice shiny new (and incompatible) xyz-2.0.1, as
theirs is linked against 1.2.3. Can't happen you say? Happens all the
time, with RPMs for that matter as well as with tarballs.
It's all about setting up a discipline for change, educating users,
understanding that it is inevitable, and that you need to adapt to it if
you have more than 1 (group of) user(s). If you need to have 14
different mpich, then you need to make sure your administrative and
installation processes can handle it. This turns out to be what
modules is exceptionally good at helping with. You can also change the
default install paths (remember the thing you buzzed me for
previously?), and select paths based upon a well-defined algorithm
rather than a "database" lookup (modules). Lots of folks use this happily.
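To make the "well-defined algorithm" idea concrete, here is a minimal sketch in Python of one way it could look: every build gets an install prefix computed from its configuration, so 14 mpich builds land in 14 distinct, predictable directories. The /opt root, the directory layout, the function names, and the "1.2.7" version string are all assumptions for illustration, not anything a particular site or tool mandates.

```python
from pathlib import PurePosixPath

def install_prefix(package, version, compiler, device, bits, root="/opt"):
    """Map a build configuration to a unique install prefix.

    The layout root/package/version/compiler-device-bits is a
    hypothetical convention chosen for this sketch.
    """
    if bits not in (32, 64):
        raise ValueError("bits must be 32 or 64")
    return PurePosixPath(root) / package / version / f"{compiler}-{device}-{bits}"

def environment(prefix):
    """Settings that point a shell at exactly one build."""
    return {
        "PATH": str(prefix / "bin"),
        "LD_LIBRARY_PATH": str(prefix / "lib"),
    }

# Two of the many possible mpich builds (version string is a placeholder).
p4 = install_prefix("mpich", "1.2.7", "gcc", "ch_p4", 64)
gm = install_prefix("mpich", "1.2.7", "xlc", "ch_gm", 32)

print(p4)
print(gm)
print(environment(p4)["PATH"])
```

Because the prefix is a pure function of the build parameters, nothing has to be looked up: a user or a job script that knows which compiler, device, and word size it wants can compute where that build lives. A modulefile per build would typically set the same two variables; the two approaches can coexist.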
In short, this is a real problem for large shared computing resource
facilities with lots of users whose varying code requirements are often
beyond the initial scope of deployment and system build. If you
don't have a good method of handling such cases, you can either deny
they exist and insist upon LARTs (bad), or come up with a method to
adapt to the need (can be bad or good, depending upon how hard you work
at setting up a sensible system). I know it doesn't jibe with "The One
Robert G. Brown wrote:
> Chris Samuel writes:
>> On Fri, 30 Sep 2005 08:15 pm, Leif Nixon wrote:
>>> So what about us poor souls who need to have several versions of the
>>> same package installed in parallel?
>> Our IBM Power5 Linux cluster has *14* different MPICHs, combinations
>> of 32/64-bits, myrinet/p4 and various mixes of gcc (different
>> versions) with xlc and xlf.
> You, my friend, need a sucker rod badly. [Ref: man syslogd, /Sucker].
> Or perhaps in Australia you have a different instrument?
>> Christopher Samuel - (03)9925 4751 - VPAC Deputy Systems Manager
>> Victorian Partnership for Advanced Computing http://www.vpac.org/
>> Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax : +1 734 786 8452
cell : +1 734 612 4615