[Beowulf] A couple of interesting comments
prentice at ias.edu
Tue Sep 23 13:08:01 PDT 2008
Oops. e-mailed to the wrong address. The cat's out of the bag now! No
big deal. I was 50/50 about CC-ing the list, anyway. Just remove the
phrase "off-list" in the first sentence, and that last bit about not
posting to the list because...
Great. I'll never get a job that requires security clearance now! ;)
Prentice <---- still can't figure out how to use e-mail properly
Prentice Bisbal wrote:
> I wanted to let you know off-list that I'm going through the same
> problems right now. I thought you'd like to know you're not alone. We
> purchased a cluster from what is *allegedly* the same vendor. The PXE
> boot and keyboard errors were the least of our problems.
> First, our cluster was delayed 2 months due to shortages of the network
> hardware we specified. It was not the vendor standard for clustering,
> but still a brand they resold.
> When it did arrive, the doors were damaged by the inadequately
> equipped delivery company.
> When the technician arrived to finish setting up the cluster, he
> discovered that the IB cables provided were too short to be within
> spec: the bend radius would have been too tight, and there was not
> enough slack to support the cables from above the connectors.
> And, the final problem I'm going to mention: the fiber network cables to
> connect our ethernet switches to each other (we have Ethernet and IB
> networks in this cluster) were missing.
> It's been over two weeks since our cluster arrived, and one week since
> the technician noticed these shortages and reported them. We still
> haven't had these problems rectified, and the technician will have to
> fly to our site again in a couple of weeks to complete the installation.
> I'm writing an article about this experience for Doug to publish. I
> haven't posted this to the mailing list because I'm not sure what my
> management will be happy with me sharing (the article will be reviewed
> by them before publishing).
>> We recently purchased a set of hardware for a cluster from a hardware
>> vendor. We've encountered a couple of interesting issues with bringing
>> the thing up that I'd like to get group comments on. Note that the RFP
>> and negotiations specified this system was for a cluster installation,
>> so there would be no misunderstanding...
>> 1. We specified "No OS" in the purchase so that we could install CentOS
>> as our base. Instead, we got a set of systems with a stub OS and an
>> EULA for the diagnostics embedded on the disk. After we click through
>> the EULA, the stub tells us we have no OS on the disk, but it does not
>> fail over to PXE.
>> 2. The BIOS had a couple of interesting defaults, including warn on
>> keyboard error. (Keyboard? Not intentionally. This is a compute node,
>> and should never require a keyboard. Ever.) We also find the BIOS is
>> set to boot from hard disk THEN PXE. But due to item 1, above, we can
>> never fail over to PXE unless we hook up a keyboard and monitor and hit
>> F12 to drop to PXE.
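[Editor's note: a common field workaround for this pair of problems, sketched below as a suggestion rather than vendor guidance, is to zero the disk's first sector so the BIOS finds nothing bootable and falls through to PXE on its own; on IPMI-capable nodes the next boot device can also be forced remotely, with no keyboard or monitor. The demo runs against a scratch image file; the real target device (e.g. /dev/sda) and the BMC host/credentials shown are placeholders you'd substitute.]

```shell
# Demonstrated against a scratch image file; on a real compute node the
# target would be the boot disk itself (e.g. /dev/sda) -- destructive!
truncate -s 1M disk.img
printf 'FAKEBOOTCODE' | dd of=disk.img conv=notrunc 2>/dev/null

# Wipe the first 512-byte sector (MBR and boot signature). With nothing
# bootable on disk, a "hard disk THEN PXE" boot order falls through to
# PXE without anyone pressing F12.
dd if=/dev/zero of=disk.img bs=512 count=1 conv=notrunc 2>/dev/null

# Verify the sector is now all zeroes:
cmp -s <(head -c 512 disk.img) <(dd if=/dev/zero bs=512 count=1 2>/dev/null) \
  && echo "boot sector cleared"

# Alternatively, on nodes with a BMC, force the next boot to PXE
# remotely (hostname and credentials here are placeholders):
#   ipmitool -H <bmc-host> -U <user> -P <pass> chassis bootdev pxe
```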
>> In discussions with our sales rep, I'm told that we'd have had to pay
>> extra to get a truly bare hard disk, and that, for a fee, they'd have
>> been willing to custom-configure the BIOS. OK, with the BIOS this isn't
>> too unreasonable: they have a standard BIOS for all systems, and if you
>> want something special, paying for it is the norm... But, still, this is
>> a CLUSTER installation we were quoted, not a desktop.
>> Also, I'm now told that "almost every customer" orders their cluster
>> configuration service at several kilobucks per rack. Since the team I'm
>> working with has a fair amount of experience configuring and installing
>> hardware and software on computational clusters, spanning at least 10
>> separate cluster installations, this seemed like an unnecessary
>> expense. However, we're finding vendor gotchas that are, at the least,
>> annoying, and sometimes cost significant workaround time and effort.
>> Finally, our sales guy yesterday was somewhat baffled as to why we'd
>> ordered without an OS, and further why we were using Linux rather than
>> Windows for HPC. Without trying to revive the recent rant-fest about
>> Windows HPC capabilities, can anyone cite real HPC applications
>> generally run on significant clusters under Windows? (I'll accept
>> Cornell's work, although I remain personally convinced that the bulk of
>> their Windows HPC work has been dedicated to maintaining grant funding
>> rather than doing real work.)
>> No, I won't identify the vendor.
>> Gerry Creager -- gerry.creager at tamu.edu
>> Texas Mesonet -- AATLT, Texas A&M University
>> Cell: 979.229.5301 Office: 979.862.3982 FAX: 979.862.3983
>> Office: 1700 Research Parkway Ste 160, TAMU, College Station, TX 77843