[Beowulf] Anybody using Redhat HPC Solution in their Beowulf

Lux, Jim (337C) james.p.lux at jpl.nasa.gov
Thu Oct 28 07:32:55 PDT 2010




On 10/28/10 7:09 AM, "Ellis H. Wilson III" <ellis at runnersroll.com> wrote:

> On 10/27/10 12:32, Lux, Jim (337C) wrote:
>> I don't know about this model.
>> This is like developing software on prototype hardware.  The hardware guys
>> and gals keep wanting to change the hardware, and the software developers
>> complain that their software keeps breaking, or that the hardware is buggy
>> (and it is).
> 
> I wasn't suggesting the CS guys affect the correctness of the stack or
> kernel, my comment was purely performance-specific:
> 
> "CS guys...can once in a while trace workloads, test new load balancing
>   mechanisms, try different kernel settings for performance, etc."
> 
> Obviously if you are altering things that endanger the correctness of
> the scientific workload people will be upset.  If your tracer fails,
> your load balancer degrades performance slightly, or your new cache
> replacement policy sucks then the program might run slow but it should
> complete correctly.

And I agree with you here, but the problem is what I next commented on:
You're asking the CS department (full of researchers wanting to do novel
research for their dissertation or to move them towards tenure) to be
sysadmins.  Being an SA is fun, once.
> 
>> But I don't think the CS guys would drool over the possibility of
>> administering a cluster. The CS guys get to be sysadmin/maintenance
>> types...not very fun for them, and not the kind of work that would work for
>> their dissertation.
> 
> The difficulty I have getting access to alter and research root-level
> stuff on clusters is so great that administration by me or my adviser
> would allow my dissertation to move forward much more rapidly.  Instead
> systems researchers try and simulate large systems, which as you can
> imagine often leads to inaccurate or downright incorrect results and
> consequent publications.

> 
> Frankly, I'd be the rock-star of the CS department if I had
> administrative control of a reasonably-sized cluster.  Everyone (in CS)
> would be coming to me to get their research done.  So it requires a
> little administration??  With all my spare cycles not having to write
> simulation codes for an entire I/O stack it would be totally worth it.
> 
Yes, but that would be more like "sharing a cluster" as opposed to CS
providing support and SA services.  And "sharing a cluster" means that the
cluster architecture has to be appropriate for both people, which is
challenging as has been addressed here recently.  Then there's the "if
you're getting benefit, can you kick in part of the cash needed" which gets
into a whole other area of complexity.

It works like this: you (A) need 1000 units of resources but can only
afford 500 by yourself. However, you don't need your 1000 units all the
time. So you find someone else who has similar needs who can share, call
them B. They've only got 300 resource units of cash, so you both go find
someone else who needs similar stuff (call them C), and they need 1200
units, but only 20% of the time, but they've got enough cash, along with A,
to do the deal.  You don't want to leave B out in the cold so you make the
cluster a little bit bigger (say, 1500), and use A, B, and C money.  You're
moving along, got the procurements in place, etc.  Now C gets the unhappy
news that their funding stream has been "rephased" so you need to find a
fourth party "D" to pick up the slack.  Meanwhile, B is unhappy about C
coming and going, because B was excited about getting a bigger cluster and
revised their research plans to take advantage of it, to their sponsor's
delight and pleasure. Now that they won't get the bigger cluster, they have
to go back to the sponsor and descope from their recent upscope.  D saves
the day, for a while, but because their research needs a different
interconnect than A, B, and C were going to use, we have to change the
cluster architecture, just a bit.  Meanwhile A, who started the whole thing,
gets real tired of spending all their time negotiating cluster usage
agreements and looking for funding, and they throw up their hands and bail
out of the project, buying their own cluster with half as many computers
that are 3 times as fast. B, C, and D are now gently twisting in the wind,
trying to figure out what to do next, because the deadline for the paper and
grant applications is coming up soon.

The institution steps in and says: cease this wasteful squabbling;
henceforth all computing resources will be managed by the institution: "to
each according to their needs", and we'll come up with a fair way to do the
"from each according to their ability". Just submit your computing resource
request to the steering committee and we'll decide what's best for the
institution overall.

Yes. Local control of a "personal supercomputer" is a very, very nice
thing.

And so it goes...


> ellis
> 




