[Beowulf] Building new cluster - estimate
landman at scalableinformatics.com
Wed Aug 6 19:01:17 PDT 2008
Eric Thibodeau wrote:
>> Advantage of modules is you can upgrade them without upgrading the
>> kernel. Go ahead, build in that e1000 driver. I dare yah... :(
> Ok...I didn't put enough emphasis on "main" stuff....as in, _all you
> need to get the system booted, which essentially means HDD chipset
> drivers, the rest I do build as a module (NIC, video and such).
>> More to the point it does give some good flexibility for end users
>> with a need to keep the core "separate" from the drivers for maintenance.
>> Initrd is subtle and quick to anger. One must use burnt offerings to
>> placate the spirits of initrd.
... now I don't mean hardware burnt offerings ... smoke rising from
your motherboard may not placate the spirits of initrd, and it may
definitely impede further operations ...
>> Well, it would be a heck of a lot nicer if the tools were a little
>> more forgiving ... Oh you don't have this driver in your initrd ... ok
>> ... PANIC (mwahahahaha)
> Pahahahahah... Case in point, I am building a CD-only cluster system
> (based on Gentoo) and I am currently _NOT_ using initrd because all that
> really needs to be built in is NFSroot support and drivers for all the
> NICs I care to put in. Obviously this is a deprecated approach but it's
> proven to be the most effective and easy to maintain in my case.
We build an integrated NFSroot and e1000 and a few other things for a
customer. Fixed hardware for their cluster. From bare-metal-off to
operational infiniband compute node in ~45-60 seconds (I say 45, but a
few things took a little longer to start, like SGE).
>>> ...and such. I'd tell you to use the Gentoo Clustering LiveCD but
>>> that's work in progress...you could still build the cluster using
>>> Gentoo...if you're performance savvy...and want things like OpenMP
>>> capable compiler
>> I have been hearing claims like this for a long time. I have not seen
>> any real tests that back these claims up. Do you have any?
> I'm actually working on such benchmarks. Did you know that compiling
> with the default ICC optimization will cause your bridge to crumble due
> to floating point assumptions?...
> Ok, so my computations have diverged horribly, mostly because I am
> computing 47(vector size)*5000(K-Means clusters)*6,787,955(learning
> dataset)*5(iterations to convergence) for a total of 7,975,847,125,000
> floating point operations (about 8 x 10^12) as part of an iterative
> learning process, so the error adds up. So performance is very sensitive
> to what your intended goal is too ;)
Hmmm.... sounds like a fun computation. Error definitely adds up.
Renormalization is your friend (well, sometimes, assuming a linear system).
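
For readers who want to see the "error adds up" effect in isolation, here is a minimal Python sketch. It is not the K-Means code discussed in the thread: the function names, the term 0.1, and the count of one million additions are assumptions chosen only for illustration. It reproduces the operation count quoted above and compares a naive running sum against `math.fsum`, which tracks partial sums to return a correctly rounded result.

```python
import math

# Operation count quoted in the thread:
# vector size * K-Means clusters * learning dataset * iterations.
VECTOR_SIZE = 47
CLUSTERS = 5000
SAMPLES = 6_787_955
ITERATIONS = 5
TOTAL_OPS = VECTOR_SIZE * CLUSTERS * SAMPLES * ITERATIONS


def naive_sum(values):
    """Left-to-right accumulation; each addition rounds the running total."""
    total = 0.0
    for v in values:
        total += v
    return total


def accumulated_error(n=1_000_000, term=0.1):
    """Return (naive, reference, abs difference) for n copies of term.

    n and term are illustrative assumptions, not values from the thread.
    """
    values = [term] * n
    naive = naive_sum(values)
    reference = math.fsum(values)  # correctly rounded sum of the inputs
    return naive, reference, abs(naive - reference)


if __name__ == "__main__":
    print(f"total operations: {TOTAL_OPS:,}")
    naive, reference, err = accumulated_error()
    print(f"naive={naive!r} reference={reference!r} abs error={err:.3e}")
```

The drift after a million additions is small in absolute terms, but it grows with the number of operations, which is why a run of roughly 8 x 10^12 operations is sensitive to how the compiler orders and rounds floating point arithmetic.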
>> Most of the arguments I have heard are "oh but it's compiled with
>> -O3" or whatever. Any decent HPC code person will tell you that that
>> is most definitely not a guaranteed way to a faster system ...
> Hey...as I stated above, one would have to be quite silly to claim -O3
> as the all well and all good optimization solution. At least you can
> rest assured your solutions will add up correctly with GCC. To get a
Well, sometimes. You still need to be careful with it.
This said, I am not sure icc/pgi/... are uniformly better than gcc. I
did an admittedly tiny study of this http://scalability.org/?p=470 some
time ago. What I found was that gcc really held its own. It did a very
good job on a very simple test case.
Then again, the fortran version was simply faster than the C version,
but that can be explained ... by ... er ... ah ... something.
> "faster" system, you really have to look at your app, use strace, ltrace
> and gprof, then you can play with that. What I _am_ saying though is
> that Gentoo _does_ empower the administrator by giving him the ability
> to customize the OS if a bottleneck is to be identified.
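
The "look at your app first" workflow above can be sketched in a few lines. The thread names strace, ltrace and gprof for compiled codes; the sketch below swaps in Python's standard `cProfile` and `pstats` modules purely as a stand-in for that idea. The toy nearest-centroid workload and every function name in it are hypothetical and are not taken from anyone's code in this thread.

```python
import cProfile
import io
import pstats


def squared_distance(a, b):
    """Deliberately plain inner loop so it shows up as the hotspot."""
    total = 0.0
    for x, y in zip(a, b):
        total += (x - y) ** 2
    return total


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    labels = []
    for p in points:
        best = min(range(len(centroids)),
                   key=lambda k: squared_distance(p, centroids[k]))
        labels.append(best)
    return labels


def profile_assign():
    """Profile one assignment pass and return (labels, text report)."""
    # Small synthetic data set; sizes are arbitrary illustration values.
    points = [[float(i + j) for j in range(8)] for i in range(200)]
    centroids = [[float(10 * k + j) for j in range(8)] for k in range(10)]

    profiler = cProfile.Profile()
    profiler.enable()
    labels = assign(points, centroids)
    profiler.disable()

    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(10)
    return labels, stream.getvalue()


if __name__ == "__main__":
    _, report = profile_assign()
    print(report)
```

The report lists where the time went by function, which is the information one wants in hand before deciding whether a different compiler flag, a different library, or a rewrite of the inner loop is worth the effort.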
Yup. There is nothing like a profile of an app running the code, to see
where it is spending its time to decide between code shifts and
>>> (gcc-4.3.1, or ICC ;) ) _integrated_ into your system (not a hackish
>> Er... We often use several different compilers in several different
>> trees. Several gccs, pgi, icc, eieio ... you name it. All are
> Are you currently able to run GCC-4.3.x versions on your current setup,
Currently running 4.2.3-2ubuntu7 on my laptop. Another machine
(development box) has something like 4 different gccs there. I haven't
tried 4.3.x yet ... had planned to, but work gets in the way.
> I'm actually eager to know. I'm still living under the ASSumption of
> binary distributions not coping too well with multi-library
> environments. Case in point, one of my colleagues _really_ wanted
No, our systems (Ubuntu, SuSE, Centos) seem to have no real problems
apart from the occasional broken hard wired /usr/lib with the wrong ABI
in a configure/make file. Usually easy to fix.
> firefox 3 on his ubuntu system. The installer trickled down to having to
> uninstall glibc...and he forced it to YES (and this is just a browser,
> not something that is used to _make_ code and would be tied to glibc)
Hmmm... I have firefox 3 on this system (64 bit) and I run icecat for 32
bit access (java and other things). No glibc changes (apart from
security patches). He must have done something horribly wrong. We have
multiple mixed ABI ubuntu/centos/suse systems, and haven't had issues.
>>> afterthought of an RPM that pulls in a new glibc that breaks the install
>> Er ... not the slightest clue as to what you are talking about. I
>> haven't seen gcc, icc, pgi, ... touch our glibc.
>> Maybe I am missing the fun. Which ICC version is this? Which gcc is
>> this, which glibc is this?
> Sorry about that, I might have been misleading. GCC is generally the one
> most sensitive to glibc, not the other ones, although the latest ICC
> (10.1.x series) does claim compatibility with the GNU environment so it
> might get a little more dependency there.
We have installed the 10.1.015 on customer machines from Centos 5.2
through SuSE 10.x through Ubuntu with nary a problem. Very different
glibc's. No issues with code generation.
Binary distributions aren't evil. They do work, quite well in most cases.
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax : +1 866 888 3112
cell : +1 734 612 4615