[Beowulf] hpl size problems
Joe Landman landman at scalableinformatics.com
Tue Oct 4 06:01:32 PDT 2005
??? Last I checked it was really hard to get mpich to happily switch between ch_p4, ch_gm, ch_shmem, ch_p4.iwarp, ... easily. In fact you have to build a new one for each interface (this is *bad* IMO, but the MPICH folks on the list must have had a really good reason for doing this, or at least I hope it was a good reason). This leads *precisely* to the scenario that Chris indicates.

There are some mpich versions out there that let you dynamically link at run time (this is *good*) such as the Scali. There is mvapich. There is LAM. Some handle this a little better than others. But you are still stuck with an explosion of mpich/... . And this is, in the truest sense of the word, a nightmare.

It must be nice to have a small single group of mandated codes, libraries, and ABIs. "Thou shalt not use anything but what we provide, and the heck with your requests". That only works until you get that one user who will use lots of your cycles, but you know, they really need xyz-2.0.1, and you have xyz-1.2.3.

So how do you handle this? (rhetorical question). Short answer is you need discipline in your flexibility. You can be draconian and say "if it ain't RPM then we aren't gonna do it", in which case they supply you an RPM which breaks other people's codes. ... Hello. Is this thing on? That doesn't work, wrong discipline IMO.

You can err the other way, "Sure, we will install it wherever you like", and whammo, some poor supercomputing user who has been using default paths for everything just happens to be on the business end of your nice shiny new (and incompatible) xyz-2.0.1, as theirs is linked against 1.2.3. Can't happen, you say? Happens all the time, with RPMs for that matter as well as with tarballs.

It's all about setting up a discipline for change, educating users, understanding that it is inevitable, and that you need to adapt to it if you have more than 1 (group of) user(s).
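One way to keep that explosion of builds from stepping on each other is to derive every install prefix from the attributes of the build itself, so that two different builds can never land in the same directory. The sketch below is only an illustration under stated assumptions, not anything used in this thread: the function name `install_prefix`, the `/opt` root, the directory layout, and the version number in the usage note are all hypothetical.

```python
from pathlib import PurePosixPath


def install_prefix(package, version, device, compiler, bits, root="/opt"):
    """Build a per-build install prefix from the build's attributes.

    Hypothetical layout (an assumption, not a standard):
        <root>/<package>/<version>/<device>-<compiler>-<bits>
    """
    for name, value in (("package", package), ("version", version),
                        ("device", device), ("compiler", compiler)):
        # Reject empty fields and embedded slashes so that each attribute
        # maps to exactly one path component.
        if not value or "/" in value:
            raise ValueError(f"bad {name}: {value!r}")
    if bits not in (32, 64):
        raise ValueError(f"bits must be 32 or 64, got {bits!r}")
    return PurePosixPath(root) / package / version / f"{device}-{compiler}-{bits}"
```

For example, `install_prefix("mpich", "1.2.7", "ch_gm", "gcc", 64)` and `install_prefix("mpich", "1.2.7", "ch_p4", "xlc", 32)` give two distinct directories under the same package and version, so a user linked against one build is not silently handed the other.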
If you need to have 14 different mpich, then you need to make sure your administrative and installation processes can handle it. This turns out to be what modules is exceptionally good at helping with. You can also change the default install paths (remember the thing you buzzed me for previously?), and select paths based upon a well defined algorithm rather than a "database" lookup (modules). Lots of folks use this happily.

In short, this is a real problem for large shared computing resource facilities with lots of users of varying code requirements that are often beyond the initial scope of deployment and system build. If you don't have a good method of handling such cases, you can either deny they exist and insist upon LARTs (bad), or come up with a method to adapt to the need (can be bad or good, depending upon how hard you work at setting up a sensible system).

I know it doesn't jibe with "The One True Way"(tm).

Joe

Robert G. Brown wrote:
> Chris Samuel writes:
>
>> On Fri, 30 Sep 2005 08:15 pm, Leif Nixon wrote:
>>
>>> So what about us poor souls who need to have several versions of the
>>> same package installed in parallel?
>>
>> Our IBM Power5 Linux cluster has *14* different MPICHs, combinations
>> of 32/64-bits, myrinet/p4 and various mixes of gcc (different
>> versions) with xlc and xlf.
>
> You, my friend, need a sucker rod badly. [Ref: man syslogd, /Sucker].
> Or perhaps in Australia you have a different instrument?
>
>    rgb
>
>>
>> Chris
>> --
>> Christopher Samuel - (03)9925 4751 - VPAC Deputy Systems Manager
>> Victorian Partnership for Advanced Computing http://www.vpac.org/
>> Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia
>>
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 734 786 8452
cell : +1 734 612 4615
