[Beowulf] A couple of interesting comments

Prentice Bisbal prentice at ias.edu
Tue Sep 23 13:08:01 PDT 2008


Oops. E-mailed to the wrong address. The cat's out of the bag now! No
big deal.  I was 50/50 about CC-ing the list, anyway. Just remove the
phrase "off-list" in the first sentence, and that last bit about not
posting to the list because...

Great. I'll never get a job that requires security clearance now! ;)

--
Prentice <---- still can't figure out how to use e-mail properly



Prentice Bisbal wrote:
> Gerry,
> 
> I wanted to let you know off-list that I'm going through the same
> problems right now. I thought you'd like to know you're not alone.  We
> purchased a cluster from *allegedly* the same vendor. The PXE boot and
> keyboard errors were the least of our problems.
> 
> First, our cluster was delayed 2 months due to shortages of the network
> hardware we specified. It was not the vendor standard for clustering,
> but still a brand they resold.
> 
> When it did arrive, the doors were damaged by the inadequately equipped
> delivery co.
> 
> When the technician arrived to finish setting up the cluster, he
> discovered that the IB cables provided were too short to be within spec:
> the bend radius would be too tight, and they were too short to be
> supported from above the connectors.
> 
> And, the final problem I'm going to mention: the fiber network cables to
> connect our ethernet switches to each other (we have Ethernet and IB
> networks in this cluster) were missing.
> 
> It's been over two weeks since our cluster arrived, and one week since
> the technician noticed these shortages and reported them. Still haven't
> had these problems rectified, and the technician will have to fly to our
> site again in a couple weeks to complete the installation.
> 
> I'm writing an article about this experience for Doug to publish. I
> haven't posted this to the mailing list b/c I'm not sure what my
> management will be happy with me sharing (the article will be reviewed
> by them before publishing).
> 
> --
> Prentice
> 
> 
>> We recently purchased a set of hardware for a cluster from a hardware 
>> vendor.  We've encountered a couple of interesting issues with bringing 
>> the thing up that I'd like to get group comments on.  Note that the RFP 
>> and negotiations specified this system was for a cluster installation, 
>> so there would be no misunderstanding...
>>
>> 1.  We specified "No OS" in the purchase so that we could install CentOS 
>> as our base.  We got a set of systems with a stub OS, and an EULA for 
>> the diagnostics embedded on the disk.  After clicking thru the EULA, it 
>> tells us we have no OS on the disk, but does not fail over to PXE.
>>
>> 2.  BIOS had a couple of interesting defaults, including warn on 
>> keyboard error (Keyboard?  Not intentionally.  This is a compute node, 
>> and should never require a keyboard.  Ever.)  We also find the BIOS is 
>> set to boot from hard disk THEN PXE. But due to item 1, above, we never 
>> can fail over to PXE unless we load up a keyboard and monitor, and hit 
>> F12 to drop to PXE.
>>
>> In discussions with our sales rep, I'm told that we'd have had to pay 
>> extra to get a real bare hard disk, and that, for a fee, they'd have 
>> been willing to custom-configure the BIOS. OK, with the BIOS this isn't 
>> too unreasonable: They have a standard BIOS for all systems and if you 
>> want something special, paying for it's the norm...  But, still, this is 
>> a CLUSTER installation we were quoted, not a desktop.
>>
>> Also, I'm now told that "almost every customer" ordered their cluster 
>> configuration service at several kilobucks per rack.  Since the team I'm 
>> working with has some degree of experience in configuring and installing 
>> hardware and software on computational clusters, now measured in at 
>> least 10 separate cluster installations, this seemed like an unnecessary 
>> expense.  However, we're finding vendor gotchas that are annoying at the 
>> least, and sometimes cause significant work-around time/effort.
>>
>> Finally, our sales guy yesterday was somewhat baffled as to why we'd 
>> ordered without OS, and further why we were using Linux over Windows for 
>> HPC.  Not trying to revive the recent rant-fest about Windows HPC 
>> capabilities, can anyone cite real HPC applications generally run on 
>> significant clusters (I'll accept Cornell's work, although I remain 
>> personally convinced that the bulk of their Windows HPC work has been 
>> dedicated to maintaining grant funding rather than doing real work)?
>>
>> No, I won't identify the vendor.
>> -- 
>> Gerry Creager -- gerry.creager at tamu.edu
>> Texas Mesonet -- AATLT, Texas A&M University
>> Cell: 979.229.5301 Office: 979.862.3982 FAX: 979.862.3983
>> Office: 1700 Research Parkway Ste 160, TAMU, College Station, TX 77843
> 

-- 
Prentice


