[Beowulf] Mixing 32-bit compute nodes with 64-bit head nodes
Andrew D. Fant
fant at pobox.com
Wed May 10 18:53:22 PDT 2006
Joe Landman wrote:
> Andrew D. Fant wrote:
>> I know that the common wisdom on this subject is "don't do that", but for
> Shouldn't be an issue if you have a sane distribution and distribution
> load system, a way to automatically handle the ABI (bit width) during
> installation/package selection. Distros which do this (mostly)
> correctly include FCx, SuSE, Centos, ...
We're using Gentoo, so I'm not worried about the system having ABI issues.
Compiling from scratch has eliminated most of the shared library and bit-width
issues I used to worry about.
>> into. The 64-bit motivation is mostly about providing adequate memory for
>> multiple users running gui applications.
> Hmmm... so you want to provide a single 64 bit machine to run GUI code
> on rather than hacking stuff for the cluster? Assuming I understood
> this right, apart from contention for that resource, this should be
> fine. Is there any reason why the SGE/PBS methods (qrsh/qsub -I)
> wouldn't work? Or is this the pain of which you speak?
We're using a certain proprietary distributed load management facility that uses
its own job launch protocol and doesn't support ssh tunneling of $DISPLAY
back to the desktop ($0.0021 if you can guess which one it is). We are forced
to have wrapper scripts that launch the application on a compute node after
asking the DLM for the least loaded system. We've found that it's not trivial
to validate things like $CWD and pass it cleanly to the remote system, so the
scripts are pretty limited in what they can do. Also, because the jobs don't
run under the auspices of the DLM, they don't show up in the batch accounting
logs and I end up having to mangle the pacct results from 40 different systems
to satisfy the reporting requirements of my (multiple) management chains. My
uncle used to tell me that there is no problem that cannot be solved by the
suitable application of high explosives. Given my limited staff and time,
throwing a big box at the problem seems easiest.
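To illustrate why passing a working directory cleanly to a remote system is fiddly, here is a minimal Python sketch of the quoting step only. It is not the wrapper script described above. The function name build_remote_command and the application name "myapp" are hypothetical, and the sketch assumes a POSIX shell on the compute node and a shared filesystem, so that the same absolute path is valid on both ends.

```python
import os
import shlex


def build_remote_command(cwd, argv):
    """Build a shell command line that changes to cwd and runs argv.

    Assumptions (not from the original post): the remote shell is a
    POSIX sh, and cwd names a directory on a filesystem shared with
    the compute node.
    """
    if not argv:
        raise ValueError("no command given")
    if not os.path.isabs(cwd):
        # A relative path would be resolved against the remote login
        # directory, which is almost never what the user meant.
        raise ValueError("working directory must be an absolute path")
    if "\n" in cwd or "\0" in cwd:
        raise ValueError("working directory contains control characters")

    quoted_cwd = shlex.quote(os.path.normpath(cwd))
    quoted_argv = " ".join(shlex.quote(arg) for arg in argv)
    # 'exec' replaces the remote shell with the application, so signals
    # and the exit status reach the launcher directly.
    return f"cd {quoted_cwd} && exec {quoted_argv}"
```

For example, build_remote_command("/home/user/my runs", ["myapp", "-g"]) returns a single string with the directory name quoted, so a space in the path cannot split it into two arguments on the remote side. How that string is then handed to the compute node depends entirely on the launch mechanism in use.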
>> Has anyone had any success with this approach, or failing that, any
>> stories that would support the more flexible approach of separating
>> the shell
>> server from the head node?
> I think this is actually a good practice. You really don't want users
> logging onto a management node to run jobs. You would likely prefer them
> to run on some sort of user-login-node. Lots of cluster distros do fuse
> these two. This is assuming a non-SSI machine (e.g. not
Yeah. I don't see many people doing this, and it surprises me. If the users
can connect to a machine that isn't the super-duper must-be-up system, it means
that adding front-end capacity and fault-tolerance with simple round-robin DNS
becomes easier, and it means that a user bzipping up their last set of runs on
the head node won't siphon off all the CPU cycles to the point that system
logging and batch scheduling are starved. (Don't laugh: I had a user fire off 10
bzips on a 2-CPU system and set my pager off in the middle of a staff meeting.
If cluster management had been on the user node, bad things could have happened.)
> The only major issue is that if they then submit a job with a binary
> which happens to be the wrong ABI, you will get lots of dud runs and
> unhappy users. You can fix that with some clever defaults on the
> submission side for each user-login-node.
This is what my boss and I were worried about as we wargamed various scenarios
this afternoon. My bias is to put some prelaunch scripts in place that verify
binary formats for jobs coming from the "central" login node, and bomb out if
someone tries to run a 64-bit binary on a 32-bit machine. For jobs that
originate on the "local" user login node, I am more inclined to avoid that pain,
since people who are power-user enough to use the local cluster user node ought
to be smart enough to understand executable formats. It's people using the
central shell server who might get confused.
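As a rough illustration of the kind of prelaunch check described above, here is a minimal Python sketch. It is one possible approach, not a script that exists on this cluster. It relies on the ELF file format, where the first four bytes are the magic number and the fifth byte (EI_CLASS) is 1 for 32-bit and 2 for 64-bit objects. The function names and the node_bits parameter are hypothetical; how the prelaunch hook learns the target node's word size is left as an assumption.

```python
ELF_MAGIC = b"\x7fELF"
EI_CLASS_OFFSET = 4   # fifth byte of the ELF identification block
ELFCLASS32 = 1
ELFCLASS64 = 2


def elf_bits_from_header(header):
    """Return 32 or 64 for an ELF header, or None if it is not ELF."""
    if len(header) < 5 or header[:4] != ELF_MAGIC:
        return None
    return {ELFCLASS32: 32, ELFCLASS64: 64}.get(header[EI_CLASS_OFFSET])


def elf_bits(path):
    """Read just enough of the file at path to classify it."""
    with open(path, "rb") as f:
        return elf_bits_from_header(f.read(5))


def check_job_binary(path, node_bits):
    """Return (ok, message) for running the file at path on a node.

    node_bits is 32 or 64; obtaining it for the target node is outside
    this sketch. Non-ELF files (shell scripts, for example) are passed
    through unchecked rather than rejected.
    """
    bits = elf_bits(path)
    if bits is None:
        return True, "not an ELF binary; not checked"
    if bits > node_bits:
        return False, f"{bits}-bit binary cannot run on a {node_bits}-bit node"
    return True, "ok"
```

A prelaunch hook could call check_job_binary on the job's executable and bomb out with the returned message when the first element is False. The sketch only looks at the top-level executable, so a job script that invokes a 64-bit binary further down would still slip through.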
Now, if we could only get more application vendors to accept that separating the
GUI and the computational engine via a batch engine is a good idea. If the
government can make this work with Ecce, and Ansys can do it, why can't (for the
sake of argument) Fluent understand that someone might want to set up a big
calculation in the GUI without running the simulation on the same system? And
don't get me started on vendors that ship a version of MPI that depends on a
static list of hosts. How many people really build a cluster for each application?
Andrew Fant | And when the night is cloudy | This space to let
Molecular Geek | There is still a light |----------------------
fant at pobox.com | That shines on me | Disclaimer: I don't
Boston, MA | Shine until tomorrow, Let it be | even speak for myself