[Beowulf] Mixing 32-bit compute nodes with 64-bit head nodes

Wed May 10 17:37:39 PDT 2006

Andrew D. Fant wrote:
> I know that the common wisdom on this subject is "don't do that", but for

Shouldn't be an issue if you have a sane distribution and distribution 
load system, i.e. one that automatically handles the ABI (bit width) 
during installation/package selection.  Distros which do this (mostly) 
correctly include FCx, SuSE, CentOS, ...

> various reasons, I have to look at the possibility of putting a 64-bit system
> (probably EMT as opposed to Opteron) as the user node of our cluster, I have a
> separate management node that handles the batch scheduler, license management
> and compute node imaging, and related duties, which would remain a 32-bit Xeon,
> so that isn't going to directly factor into the decision.  This is motivated by
> a desire to allow users to run interactive jobs on the user node instead of
> playing games with wrapper scripts to run them on compute nodes.  My personal
> preference would be to have a separate system that can remotely submit to the
> existing cluster via the batch queues, but there is a desire by management to
> limit the number of different systems that a user needs to know about logging
> into.  The 64-bit motivation is mostly about providing adequate memory for
> multiple users running gui applications.

Hmmm... so you want to provide a single 64-bit machine to run GUI code 
on rather than hacking stuff for the cluster?  Assuming I understood 
this right, apart from contention for that resource, this should be 
fine.  Is there any reason why the SGE/PBS methods (qrsh/qsub -I) 
wouldn't work?  Or is this the pain of which you speak?

> Has anyone had any success with this approach, or failing that, any horror
> stories that would support the more flexible approach of separating the shell
> server from the head node?

I think separating them is actually a good practice.  You really don't want users 
logging onto a management node to run jobs. You would likely prefer them 
to run on some sort of user-login-node.  Lots of cluster distros do fuse 
these two.  This is assuming a non-SSI machine (e.g. not 
Scyld/bproc/Clustermatic/...).

The only major issue is that if they then submit a job with a binary 
which happens to be the wrong ABI, you will get lots of dud runs and 
unhappy users.  You can fix that with some clever defaults on the 
submission side for each user-login-node.
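
As a rough sketch of what such a submission-side default could look like 
(an illustration only, not a feature of SGE or PBS): a wrapper on the 
login node can read the ELF header of the binary and refuse to queue it 
if its bit width exceeds that of the compute nodes.  The names elf_bits 
and check_submission are hypothetical, and the rule "binary width must 
not exceed node width" assumes any 64-bit nodes carry 32-bit 
compatibility libraries.

```python
ELF_MAGIC = b"\x7fELF"
# EI_CLASS byte (offset 4 of the ELF header): 1 = 32-bit, 2 = 64-bit
ELF_CLASSES = {1: 32, 2: 64}


def elf_bits(path):
    """Return 32 or 64 for an ELF binary, or None if the file is not ELF
    (for example a shell script) or has an unknown class byte."""
    with open(path, "rb") as f:
        header = f.read(5)
    if len(header) < 5 or header[:4] != ELF_MAGIC:
        return None
    return ELF_CLASSES.get(header[4])


def check_submission(path, node_bits):
    """Return True if the file at path looks runnable on compute nodes
    whose word size is node_bits (32 or 64).

    Assumption: non-ELF files are let through, since this check cannot
    judge scripts; a binary is accepted when its width <= node width.
    """
    bits = elf_bits(path)
    if bits is None:
        return True
    return bits <= node_bits
```

On the 64-bit user node, a wrapper around the submit command could call 
check_submission(binary, 32) and reject the job with a clear message 
before it ever reaches the 32-bit compute nodes.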

> 
> Thanks,
> 	Andy
> 

-- 

Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 734 786 8452 or +1 866 888 3112
cell : +1 734 612 4615