[Beowulf] Building a 2 node cluster using mpich

Reuti reuti at staff.uni-marburg.de
Sun Dec 30 14:33:27 PST 2007


Hi,

On 27.12.2007 at 18:33, Kalpana Kanthasamy wrote:

> Hi guys, I am a beginner with Linux and with clusters, but I really
> need to experiment with this for my project. I have documented what
> I have done so far, but I got stuck at a certain point. Let me
> explain what I have done.
>
> After searching through the internet for a few days, I decided to use
>
> http://blizzard.rwic.und.edu/~nordlie/deuce/
> http://www.mcsr.olemiss.edu/bookshelf/articles/how_to_build_a_cluster.html
>
> 1. Installed a Linux distribution (I am using openSUSE on both
> computers in the cluster).
>
> 2. During the installation process, assign hostnames and, of course,
> unique IP addresses to each node in the cluster; the gateway is the
> router. Hostname: localhost, domain: localdomain
>
>
> 3. The cluster is private. I have used IP address 192.168.0.190 for the
> master node and 192.168.0.191 for the slave node.
>
> 4. Finally, create identical user accounts on each node. In our case,
> we create the user DevArticle on each node in our cluster. You can
> either create the identical user accounts during installation, or you
> can use the adduser command as root.

Better to use NIS (or LDAP), so you only have to define the users once.

>
>
> Configuration on all nodes
>
> On all nodes
> 5. We now need to configure rsh on each node in our cluster. Create
> .rhosts files in the user and root directories. Our .rhosts files for
> the DevArticle users are as follows:
> Master DevArticle
> Slave DevArticle
>
> The .rhosts files for root users are as follows:
>
> Master root
> Slave root
>
>
>
> On all nodes
> 6. Next, I modified the /etc/hosts.equiv file, identically on both
> Master and Slave:
>
> 192.168.0.190 Master.localhost.localdomain Master
> 127.0.0.1          localhost
> 192.168.0.191  Slave.localhost.localdomain Slave

Only the hostnames belong in that file, hence just two lines:

Master
Slave
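
As a minimal sketch, the two-line file could be generated like this. EQUIV_FILE is a hypothetical variable introduced here so that the snippet can be tried without touching the real /etc/hosts.equiv; the hostnames are the ones from your mail.

```shell
# Sketch only: writes to a sample file, not to /etc/hosts.equiv.
# EQUIV_FILE is a hypothetical variable used for illustration.
EQUIV_FILE="${EQUIV_FILE:-./hosts.equiv.sample}"

# One trusted hostname per line, no IP addresses or aliases.
printf '%s\n' Master Slave > "$EQUIV_FILE"

# Show the result for review.
cat "$EQUIV_FILE"
```

After checking the output, the file would be copied to /etc/hosts.equiv as root on each node.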

>
> 7. Do not remove the 127.0.0.1 localhost line. The hosts.allow file on
> each node was modified by adding ALL+ as the only line in the file.
> This allows anyone on any node permission to connect to any other node
> in our private cluster.
>
>
>
> On all nodes
> 8. To allow root users to use rsh, I had to add the following lines to
> the /etc/securetty file:
>
> rsh
> rlogin
> rexec
>
> pts/0
> pts/1
>
>
> On all nodes
> 9. Also, I modified the /etc/pam.d/rsh file:
> #%PAM-1.0
> # For root login to succeed here with pam_securetty, "rsh" must be
> # listed in /etc/securetty.
> auth       sufficient   /lib/security/pam_nologin.so
> auth       optional     /lib/security/pam_securetty.so

You can try commenting out the line above.

> auth       sufficient   /lib/security/pam_env.so
> auth       sufficient   /lib/security/pam_rhosts_auth.so
> account  sufficient   /lib/security/pam_stack.so service=system-auth
> session   sufficient   /lib/security/pam_stack.so service=system-auth
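
Commenting out the pam_securetty line could be scripted roughly as follows. This is only a sketch: comment_out_securetty is a made-up helper name, and it assumes the line looks like the one quoted above. It keeps a backup copy next to the file.

```shell
# comment_out_securetty FILE (hypothetical helper):
# prefix the active pam_securetty.so line with '#',
# keeping the previous version in FILE.bak.
comment_out_securetty() {
    cp "$1" "$1.bak" &&
    sed '/^[^#].*pam_securetty\.so/ s/^/#/' "$1.bak" > "$1"
}

# Intended use, as root on each node:
#   comment_out_securetty /etc/pam.d/rsh
```

Lines that are already commented are left alone, so running it twice does no harm.
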
>
> On all nodes
> rsh, rlogin, telnet and rexec are disabled by default. To change this,
> I navigated to the /etc/xinetd.d directory and modified each of the
> service files (rsh, rlogin, telnet and rexec), changing the disable =
> yes line to disable = no.
>
> Once the changes were made to each file (and saved), I closed the
> editor and issued the following commands:
>
> Turn on the rsh daemon using the chkconfig command: chkconfig rsh on
> 1. To check the rsh daemon's status, run the chkconfig command:
> chkconfig --list rsh
> 2. Run the /etc/rc.d/xinetd restart command.
> 3. Restart xinetd with /sbin/service xinetd restart
>
>
>
> The Mounting Process
>
> On the Master node
> I edited /etc/exports.
>
> I used the YaST NFS server tool, then double-checked my /etc/exports
> file. This is how it looks:
>
> /home		192.168.1.190/255.255.255.0(rw,no_root_squash)
> /usr/local	192.168.1.190/255.255.255.0(rw,no_root_squash)
>
>
> On the Slave node
> I edited /etc/fstab.
>
> I used the YaST NFS client tool, then double-checked my /etc/fstab
> file. This is how it looks:
> ----------------------------------------------------------------------
> /dev/disk/by-id/scsi-SATA_WDC_WD800VE-00H_WD-WXEZ06F66679-part6	/	ext3	acl,user_xattr 1 1
> /dev/disk/by-id/scsi-SATA_WDC_WD800VE-00H_WD-WXEZ06F66679-part1	/windows/C	ntfs-3g	users,gid=users,fmask=133,dmask=022,locale=en_US.UTF-8 0 0
> /dev/disk/by-id/scsi-SATA_WDC_WD800VE-00H_WD-WXEZ06F66679-part5	swap	swap	defaults 0 0
> proc	/proc	proc	defaults 0 0
> sysfs	/sys	sysfs	noauto 0 0
> debugfs	/sys/kernel/debug	debugfs	noauto 0 0
> usbfs	/proc/bus/usb	usbfs	noauto 0 0
> devpts	/dev/pts	devpts	mode=0620,gid=5 0 0
> /dev/fd0	/media/floppy	auto	noauto,user,sync 0 0
> Master:/home	/home	nfs	rw 0 0
> Master:/usr/local	/usr/local	nfs	ro 0 0
>
>
>
>
> I also changed the /etc/mtab file, according to the mpich
> documentation:

I would never change /etc/mtab by hand, as it is maintained by the
system (mount and umount update it). Where does the mpich documentation
say to touch it?
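
To see what is actually mounted without editing anything, a read-only check along these lines should be enough. It is a minimal sketch, assuming a Linux system where /proc/mounts is available; list_nfs_mounts is a name made up for this example.

```shell
# list_nfs_mounts [FILE] (hypothetical helper):
# print source and mount point of every entry whose filesystem
# type (third field) is nfs or nfs4. FILE defaults to /proc/mounts.
list_nfs_mounts() {
    awk '$3 == "nfs" || $3 == "nfs4" { print $1, $2 }' "${1:-/proc/mounts}"
}

# Intended use on the slave node:
#   list_nfs_mounts
```

If the fstab entries are in effect, the output on the slave should list Master:/home and Master:/usr/local.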

> ----------------------------------------------------------------------
> /dev/sda5 / ext3 rw,acl,user_xattr 0 0
> proc /proc proc rw 0 0
> sysfs /sys sysfs rw 0 0
> debugfs /sys/kernel/debug debugfs rw 0 0
> udev /dev tmpfs rw 0 0
> devpts /dev/pts devpts rw,mode=0620,gid=5 0 0
> /dev/sda1 /windows/C fuseblk rw,noexec,nosuid,nodev,noatime,allow_other,default_permissions,blksize=4096 0 0
> securityfs /sys/kernel/security securityfs rw 0 0
> nfsd /proc/fs/nfsd nfsd rw 0 0
> rpc_pipefs /var/lib/nfs/rpc_pipefs rpc_pipefs rw 0 0
>
> Master:/home /rmt/Master/home nfs noac 0 0
>
> Master:/usr/local /rmt/Master/usr/local nfs noac 0 0 0
> ----------------------------------------------------------------------
>
> After that I did this
>
> On each node, type ifconfig and make sure that the machine has its
> appropriate interior IP address. (Such as 192.168.0.X).
> On each node, go to /etc/rc.d and type ./network stop.
> On the master node, also type ./nfs stop.
> On the master node, type ./nfs start.
> On each node, type ./network start.
>
>
> I guess I mounted everything properly, because I made sure I followed
> the websites. I could also access the files from the slave machine.
>
>
> I could ping both machines, and if I type
> Master:/ # rsh Slave
> Master:/ # ls -a
> or
> Slave:/ # rsh Master
> Slave:/ # ls -a
>
>
> it works on both machines, and when I type ls -a I get to see
> the files. But when I type a full command like the ones below, it fails
> and permission denied appears. I emptied my hosts.allow and hosts.deny
> files on both Master and Slave.
>
>
>
> But when I type commands like
> Master:/ # rsh Slave date
> Master:/ # permission denied
>
> or
>
> Master:/ # rsh Master pwd
> Master:/ # permission denied
>
> OK, here is where I am stuck, because I tried installing mpich, but
> neither rsh nor ssh was detected during configuration (permission
> denied). I think it has something to do with my NFS. Any ideas?

There is no need to allow it for root at all. Is it working for a  
normal user? Then you can already run parallel programs.
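
A quick way to test this as the normal user could look like the sketch below. check_remote is a hypothetical helper, and it assumes that the remote shell command exits non-zero when the connection is refused (as with "permission denied").

```shell
# check_remote RSH_CMD HOST... (hypothetical helper):
# run `date` on each host through the given remote shell command
# and report whether the call succeeded.
check_remote() {
    rsh_cmd=$1
    shift
    for host in "$@"; do
        if "$rsh_cmd" "$host" date </dev/null >/dev/null 2>&1; then
            echo "$host: ok"
        else
            echo "$host: FAILED"
        fi
    done
}

# Intended use, logged in as DevArticle (not root):
#   check_remote rsh Master Slave
```

If both hosts report ok for the normal user, the root setup can be left out entirely.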

-- Reuti


