[Beowulf] cluster building advice?

Jörg Saßmannshausen j.sassmannshausen at ucl.ac.uk
Tue Sep 18 01:56:56 PDT 2012


Dear all,

Really good advice here!

I would like to add something: for a smaller cluster, if you don't want to 
use puppet, what I do is rsync the nodes from a local directory on the 
headnode. That way I can update the software easily by simply adding it to 
the node-directory on the headnode and running rsync on the nodes. 
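
As a rough illustration of the rsync approach, here is a minimal Python 
sketch that builds the command a node would run to pull its tree from the 
headnode. The host name "headnode", the path /export/node-root and the 
function name are only placeholders I made up for this example, not the 
actual setup; -a and --dry-run are standard rsync options.

```python
def build_rsync_command(headnode, node_dir, dest, dry_run=True):
    """Return the rsync argv a node would run to pull its tree from the headnode.

    headnode, node_dir and dest are placeholders for this sketch.
    """
    # A trailing slash on the source tells rsync to copy the directory's
    # contents rather than the directory itself.
    src = f"{headnode}:{node_dir.rstrip('/')}/"
    cmd = ["rsync", "-a"]
    if dry_run:
        # Only report what would change; nothing is written.
        cmd.append("--dry-run")
    cmd += [src, dest]
    return cmd


if __name__ == "__main__":
    # Hypothetical host name and paths, for illustration only.
    print(" ".join(build_rsync_command("headnode", "/export/node-root", "/")))
```

Doing a dry run first and checking the list of changes before the real 
run is a cheap safeguard when the destination is the root of a live node.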

As you are doing a PXE boot on the nodes, you might want to add something like 
memtest; I have also set up an NFS boot here. So if there is a problem with 
a node I can look into it (memtest for memory; for any other issue, such as 
disc problems, the NFS boot is good). I am also using the NFS boot for the 
installation (same as above: copy the files over via rsync). 
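
To make the PXE idea more concrete, here is a minimal Python sketch that 
renders a pxelinux-style menu with a local-disk default, a memtest entry 
and an NFS-root rescue entry. It assumes pxelinux is the boot loader; the 
file names (memtest, vmlinuz, initrd.img), the label names and the function 
itself are assumptions made for this example. root=/dev/nfs, nfsroot= and 
ip=dhcp are the usual kernel parameters for an NFS root.

```python
def pxe_menu(nfs_server, nfs_root, kernel="vmlinuz",
             initrd="initrd.img", memtest_image="memtest"):
    """Render a small pxelinux-style menu as text.

    All file names and labels are placeholders for this sketch.
    """
    lines = [
        "DEFAULT local",
        "PROMPT 1",
        "TIMEOUT 50",  # pxelinux counts the timeout in tenths of a second
        "",
        "LABEL local",
        "  LOCALBOOT 0",  # fall through to the local disk
        "",
        "LABEL memtest",
        f"  KERNEL {memtest_image}",
        "",
        "LABEL nfsrescue",
        f"  KERNEL {kernel}",
        f"  APPEND initrd={initrd} root=/dev/nfs "
        f"nfsroot={nfs_server}:{nfs_root} ip=dhcp ro",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical server address and export path.
    print(pxe_menu("10.0.0.1", "/export/rescue"), end="")
```

With a menu like this, a misbehaving node can be rebooted into memtest or 
into the NFS root without touching its local disc.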

I hope that helps a bit.

All the best from London

Jörg

On Tuesday 18 September 2012 07:42:58 Bill Broadley wrote:
> On 09/16/2012 02:52 PM, Jeffrey Rossiter wrote:
> > The intention is for the system to be
> > used for scientific computation.
> 
> That doesn't narrow it down much.
> 
> > I am trying to decide on a linux
> > distribution to use.
> 
> I suggest doing it yourself based on whatever popular Linux distro you
> have experience with.  Assuming general Linux systems administrator
> proficiency, it's not particularly hard.  I'd suggest starting with
> Scientific Linux (especially if your applications assume it) or
> Debian/Ubuntu (which seem to have larger repositories).  I'd lean
> towards Ubuntu if you are running new hardware since Sandy Bridge (new
> Intel) and Bulldozer (new AMD) seem to benefit from the latest kernels.
> 
> Then add:
> * Cobbler for PXE installing (or functionally similar software), network
>   configuration, dhcp, dns, mac address, IP addresses, etc.
> * Puppet/Chef for configuration management (everything post-install)
> * Torque/Slurm for batch queue
> * Environment Modules or similar to let users easily load the
>   libraries/apps/environment they need in a reproducible way.
> * Ganglia/cacti/munin for graphing resource utilization.
> * /share/apps/<application name>-<version number> for anything you
>   install that's not in the repositories.
> 
> Get nodes to netboot, netinstall, and mount a shared /home.  Once users
> start using it listen to their needs and adapt accordingly.
> 
> Some suggestions:
> * If your campus has a standard username for each user, use it.
> * Use ssh certs for user authentication; you really don't want your
>   users' passwords, nor do they want to type them often.
> * Start a wiki for documentation; allow users to edit it.
> * Have Environment Modules output the name/version on module load;
>   it is much easier to figure out what a user has done when you have
>   the exact info to reproduce a run in the run's output.
> * Set hardware physically to always netboot, then depend on the
>   central server to decide if it should boot from local disk or start
>   a new install.
> * Have compute nodes use host based ssh keys for auth (not user ssh
>   keys)
> * Have head node use user based keys for login, do not allow
>   ~/.ssh/authorized_keys
> * Allow exactly one ssh key per user.
> * Keep your configuration files in git or similar version control.  Or
>   if managed by puppet/chef, keep puppet/chef files in version control.
> * Strongly encourage any users writing source code to use a distributed
>   version control system like git.
> * Be very very clear on the status/lack of backups.  Be clear that loss
>   of files will happen and it's only a matter of time.
> * Use software RAID.
> 
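
The "always netboot, let the central server decide" suggestion above could 
be sketched roughly as follows. This is only an illustration assuming a 
pxelinux-style boot loader; the function name, the set of hosts flagged for 
reinstall and the installer file names are hypothetical. Tools such as 
Cobbler do this bookkeeping for you.

```python
def boot_config(hostname, reinstall_pending,
                installer_kernel="installer/vmlinuz",
                installer_initrd="installer/initrd.gz"):
    """Return the per-node boot config text the central server would serve.

    reinstall_pending is a set of host names flagged for a fresh install;
    every other node is told to boot from its local disk.
    """
    if hostname in reinstall_pending:
        return (
            "DEFAULT install\n"
            "LABEL install\n"
            f"  KERNEL {installer_kernel}\n"
            f"  APPEND initrd={installer_initrd}\n"
        )
    return (
        "DEFAULT local\n"
        "LABEL local\n"
        "  LOCALBOOT 0\n"
    )


if __name__ == "__main__":
    pending = {"node02"}  # hypothetical host flagged for reinstall
    for host in ("node01", "node02"):
        print(f"# {host}")
        print(boot_config(host, pending), end="")
```

The point of the design is that reinstalling a node only needs a change on 
the server plus a reboot; nobody has to touch the BIOS boot order.
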
> > Does it matter all that much?
> 
> Not particularly.  Random commercial software seems to assume RHEL based
> distros.  Ubuntu/Debian seems to have the largest repositories (read
> that as the most likely to have a user request handled by apt-get install).
> 
> > Any advice would be
> > greatly appreciated.
> 
> You didn't mention your current experience; if the above sounds daunting,
> then start with Warewulf.
> 
> 

-- 
*************************************************************
Jörg Saßmannshausen
University College London
Department of Chemistry
Gordon Street
London
WC1H 0AJ 

email: j.sassmannshausen at ucl.ac.uk
web: http://sassy.formativ.net

Please avoid sending me Word or PowerPoint attachments.
See http://www.gnu.org/philosophy/no-word-attachments.html


