[Beowulf] hpl size problems

Tue Sep 27 15:58:58 PDT 2005

I keep forgetting that RGB is actually a cluster of text generation 
'bots able to crank out more characters per second than most folks can 
speak ... ;)

Robert G. Brown wrote:
> Joe Landman writes:
> 
>> <slight aside>
>>

[... edited for clarity ...]

>> This has been a pet peeve of mine for a while.  I like the
                   ^^^^^^^^^
(arrows should be under pet peeve)

>> install-minimum and add-needed-bits philosophy more than I like the 
>> everything-including-the-kitchen-sink.  Lots of services seem to get 
>> activated when you install the-kitchen-sink.

As for stuff that gets activated when you install it, yes, there are a 
fair number of things which get put onto the machine and start happily 
absorbing cycles, causing context switches, etc.

For the little test machine I have set up with our SuSE cluster, we are 
getting zero load according to vmstat, and about 30 context switches per 
second (cs/s).  This is what I expect.  Under heavy computational load, I 
don't expect the cs/s to rise much, unless we start beating on the IO 
system.
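
A minimal sketch of how one might check the cs/s figure without vmstat,
assuming a Linux node where /proc/stat carries the usual "ctxt" and
"intr" totals since boot.  The function names (read_counters, rates,
sample) and the 5 second interval are illustrative choices, not anything
from the cluster described here.

```python
import time


def read_counters(stat_text):
    """Return (context_switches, interrupts) totals parsed from /proc/stat text."""
    counters = {}
    for line in stat_text.splitlines():
        fields = line.split()
        # "ctxt N" is the total number of context switches since boot; the
        # first number after "intr" is the total interrupt count since boot.
        if fields and fields[0] in ("ctxt", "intr"):
            counters[fields[0]] = int(fields[1])
    return counters["ctxt"], counters["intr"]


def rates(before, after, seconds):
    """Per-second rates between two (ctxt, intr) samples taken `seconds` apart."""
    return tuple((b - a) / seconds for a, b in zip(before, after))


def sample(interval=5.0, path="/proc/stat"):
    """Read the counters twice, `interval` seconds apart (assumed Linux-only)."""
    with open(path) as f:
        first = read_counters(f.read())
    time.sleep(interval)
    with open(path) as f:
        second = read_counters(f.read())
    return rates(first, second, interval)
```

Calling sample() on an idle node returns a (cs/s, interrupts/s) pair
averaged over the interval; running it again under compute load shows
whether the context switch rate actually moves.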

> What gets activated when you install tex?  Absolutely nothing.  It is a
> userspace non-daemonic application.  So this particular example is
> mostly irrelevant 

Where it is relevant is the time and space wasted installing it.  This is 
actually one of the reasons why Warewulf is so interesting, as it puts 
all the "wasted" stuff off on the side, and only "installs" what you need.

> -- it might annoy you personally to have things
> installed that are never used, 

Hence the phrase "pet peeve".

> but it should have zero impact on node
> performance.  Open Office ditto -- it doesn't jump out and run itself
> AFAIK when nobody is logged into a console interface (or if the system
> HAS no console interface).  

As I remember, some of these things do, though in a number of cases you 
need to install and activate them first.

> At most you waste a few seconds and a few
> hundred megabytes of disk by leaving them in, 

Latter yes (more like GB), former no.  Trimming the fat from the SuSE 
cluster install got it from over an hour down to about 8 minutes with 
everything, per node.  It can go even further if I want to push it, but 
I think I have hit the point of diminishing returns.  I don't mind waiting 
up to about 10 minutes for a reload; beyond that, I mind.
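
As a rough illustration of why the per-node figure matters even when
installs run in parallel, here is a sketch of the wall-clock arithmetic.
The 64-node cluster and the limit of 16 concurrent installs are purely
hypothetical, "over an hour" is rounded to 60 minutes, and the sketch
assumes a wave of installs takes as long as a single node install (that
is, the install server is not the bottleneck).

```python
import math


def reload_wall_clock_minutes(nodes, minutes_per_node, concurrent_installs):
    """Wall-clock minutes to reinstall `nodes` machines in waves.

    Assumes every wave of up to `concurrent_installs` nodes finishes in
    `minutes_per_node`, i.e. the install server keeps up with the load.
    """
    waves = math.ceil(nodes / concurrent_installs)
    return waves * minutes_per_node


# Hypothetical cluster: 64 nodes, 16 installs running at a time.
fat = reload_wall_clock_minutes(64, 60, 16)   # untrimmed install
slim = reload_wall_clock_minutes(64, 8, 16)   # trimmed install
print(f"fat: {fat} min, slim: {slim} min")
```

Under those assumptions a full reload drops from 240 minutes to 32.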

> but the install time is
> generally parallelized and nearly irrelevant over an installation
> lifetime of weeks or months and the smallest disks one can buy nowadays
> are huge, huge, huge compared to the fattest possible linux install.

Generally true, but I wouldn't call them irrelevant.  Lost in the noise 
of use is more like it.  Still annoying though.

> You could install ALL the major linux distros and their entire repos on
> the 160 GB disk that comes in $600 systems that are Best Buy specials,
> and probably have room left for your entire ogg collection, a few
> movies, and your application's scratch space.  So if you have a disk at
> all, what you put on it is probably irrelevant.

???  A fair number of the distros are starting to top out at 10 GB +/- a 
bit.  Even with a 160 GB drive, this is still quite a few bytes to push.
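
To put a number on "quite a few bytes to push", a back-of-the-envelope
sketch follows.  The 10 GB figure is the one above; the link speeds
(100 Mbit/s and 1 Gbit/s) and the efficiency parameter are assumptions
for illustration, and the result ignores disk write speed and package
unpacking time entirely.

```python
def push_seconds(image_gb, link_mbit_per_s, efficiency=1.0):
    """Seconds to move `image_gb` (decimal GB) over a network link.

    `efficiency` is the assumed fraction of the nominal link rate that
    is actually achieved; 1.0 means ideal line rate.
    """
    bits = image_gb * 1e9 * 8
    return bits / (link_mbit_per_s * 1e6 * efficiency)


# 10 GB install image at ideal line rate on two assumed link speeds.
for mbit in (100, 1000):
    print(f"{mbit} Mbit/s: {push_seconds(10, mbit):.0f} s per node")
```

At ideal line rate that is 800 seconds per node on 100 Mbit/s and 80
seconds on gigabit, before any contention from nodes installing at the
same time.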

[...]

>  http://www.phy.duke.edu/~rgb/wulfweb/vmstat.html
> 
> you can see stats (and for that matter look at the actual running
> applications) on almost all of our dual opterons at a glance.  Even
> nodes carrying load averages of 14 sustained (crazed grad student, don't
> ask:-) still see a measly 30 or so context switches per sec, ballpark
> of 1000 interrupts per second, and manage to run without swapping or
> paging.  Services that are offered but not being used just don't take
> that much by way of resources.

Better to turn them off, or even better, to un-install them completely.

[...]

> If a node/system has a disk and EVER might need a given application
> (even tex) I wouldn't hesitate to install it -- disk is cheap, services
> can be turned off, and installing things (and turning them off if need
> be in %post) can be most easily done via a kickstart file and then
> forgotten.  Much better than having to remember what was added to a node
> afterwards, MUCH better than having to add things to a node (or all
> nodes) afterwards by hand. 

Hmmm.  We have somewhat different philosophies.  I like a really small 
core install atop which you add what you need per node.  Applications 
are served (NFS, et al.).  I especially like managing the thousand copies 
of bioperl or other bits you need to install on sizeable clusters.  Some 
of these things are simply not amenable to RPMs, so RPMs are not an option 
for them (except in a very rudimentary way).

> OTOH, sure, if you know something isn't necessary (X11, for example, on
> a system without a video card or console) by all means leave it out, and
> leave out strange applications that DO suck up resources for sure

I view resources as machine time + disk space + headache of dealing with 
dependencies ...

> (recognizing that there aren't that many of them, really; even a fat old
> standard workstation install plus this and that probably doesn't have
> any).  Kickstart is ideal for this as you can tinker with your node
> configuration until your package list and %post are just right, and then
> just do a full (re)install.  Most of the stuff that DOES need some sort
> of handiwork or that DOES run and consume real-time resources is
> associated with e.g. video, audio, multimedia, and most cluster nodes
> don't need it.

Yes.  The point I was making with TeX (though I could have picked a 
better example) was that there is lots of extraneous stuff that has no 
business being on a cluster node: wvdial, for example.  Some of it may 
be listening/active, some might be dormant/inactive.

-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 734 786 8452
cell : +1 734 612 4615