Scyld + myrinet mpich-gm?
Dave Johnson ddj at cascv.brown.edu
Thu Feb 22 08:52:30 PST 2001
In the nearly three weeks since my posting of Feb 4, I have managed to
sort out most of the problems (including new ones that came up in the
process) getting MPICH-GM jobs started on the diskless slave nodes
using bproc/bpsh. I still wish there were searchable archives of this
and other mailing lists available.

Right now I'm looking for pointers on tuning the network interface (the
gigabit card on the master is showing a fair number of TX drops).

I will insert a few notes below as to what went on....

On Sun, Feb 04, 2001 at 12:15:58AM -0500, Dave Johnson wrote:
> I've gotten myself involved in bringing a small cluster up and
> into production. I'm learning as I go, with the help of the
> archives of this mailing list. Unfortunately the searchable
> archives at Supercomputer.org seem to be off line (I get internal
> server error), and out of date (the last messages seem to be from
> around May 2000).
>
> The current setup is one master with 100base-T to the world, gigabit
> fiber to a 16-10/100 + 2-1000 switch, and 12 diskless slaves with
> 10/100 and myrinet interfaces. The Scyld release of last Monday is
> up and running, and I can bpsh to my heart's content.
>
> I'm stuck at the point of trying to deploy MPI. Scyld supplies mpi-beowulf
> which does not appear to me to use bproc, and /usr/bin/mpirun and mpprun
> which do. I've built the mpich-gm from Myricom, but their mpirun command
> does not grok bpsh, and expects either rsh or ssh daemons on each slave.
>

I was clearly confused here -- the software in /usr/mpi-beowulf is
linked against the bproc libraries, but the /usr/mpi-beowulf/bin/mpirun
script is generic mpich. The /usr/bin/mpirun binary is intended to
replace the script, but there are some limitations, some of which were
mentioned before:

- the script version handles the case where "." is not in $PATH,
  but the binary mpirun doesn't.
- the binary mpirun doesn't ignore -mvhome and -mvback.
  - the error in this case is misleading -- instead of complaining
    about unrecognized options, it says "Failed to exec target
    program: No such file or directory".
- I tried to get the mpprun SRPM to build properly, but gave up.

> I've tried a number of approaches that start out looking like they might
> work, but have gotten stuck after a few hours down each cowpath.
>
> Here is a list of some of the snags (I've lost track of some others):
>
> bpsh is not a full blown shell, doesn't deal well with redirection, changing
> directory before running a command, and in particular it can't be swapped for
> rsh or ssh when configuring mpich (ie -rsh=bpsh).

Bproc's bpsh does have the interesting feature that the current working
directory on the master is inherited by the slave, as long as it exists
on the slave. This made it possible to hack mpirun.ch_gm.in to use bpsh.

>
> The master node is outside the myrinet, I haven't a clue how to get
> it to cooperate with the slaves over ethernet yet have the slaves
> use myrinet as much as possible.

Starting from the suggestions I received, I was able to do this. The
master's gigabit interface is connected to a switch with 16 RJ45 10/100
ports and 2 fiber gigabit ports. The 10/100/1000 net is named beonet
(192.168.1.0) and the myrinet is 192.168.2.0. The master's address on
beonet (eth1) is 192.168.1.1. For the master I added the line:

    eth1 net myrinet netmask 255.255.255.0

to /etc/sysconfig/static-routes.

For the slaves, I added a hook in the /etc/beowulf/node_up script to
call two of my new scripts, setup_local and setup_myrinet. setup_local
computes IP addresses from the node number, sets the proper hostname,
creates a slave version of the resolv.conf file, changes the IP routing
to use myrinet GM-IP whenever possible, and sets some network
parameters via /proc/sys/net/core. setup_myrinet is similar, but it
first loads the GM driver and brings up the myri0 interface. It also
starts the GM mapper on one of the nodes.
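
The setup_local script itself is not included in this message. As a
rough illustration of the "computes IP addresses from the node number"
step only, here is a minimal Python sketch. The two network numbers
come from the text above; everything else is an assumption made for the
example: the function name node_addresses, the FIRST_HOST offset (the
message does not say how node numbers map to host octets), and the
"nodeN" hostname scheme.

```python
import ipaddress

# Networks named in the message; netmask 255.255.255.0 corresponds to /24.
BEONET = ipaddress.ip_network("192.168.1.0/24")
MYRINET = ipaddress.ip_network("192.168.2.0/24")

# ASSUMPTION: host part used for node 0. The real offset is not given
# in the message; 100 is only a placeholder.
FIRST_HOST = 100


def node_addresses(node, first_host=FIRST_HOST):
    """Derive illustrative per-node settings from a node number."""
    host = first_host + node
    # Keep the host part inside the usable range of a /24 network.
    if node < 0 or host > 254:
        raise ValueError(f"node {node} does not fit in a /24 network")
    return {
        # ASSUMPTION: hostname scheme is hypothetical.
        "hostname": f"node{node}",
        # Same host part on both networks, so the addresses differ
        # only in the third octet.
        "beonet": str(BEONET.network_address + host),
        "myrinet": str(MYRINET.network_address + host),
    }
```

With these placeholder values, node_addresses(3) gives 192.168.1.103 on
beonet and 192.168.2.103 on the myrinet. Using the same host part on
both networks is a design choice of the sketch, not something the
message states.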

In the end, the master's routes look like:

Destination     Gateway         Genmask         Flags Metric Ref Use Iface
master.beonet   *               255.255.255.255 UH    0      0     0 eth1
128.148.160.xx  *               255.255.255.255 UH    0      0     0 eth0
myrinet         *               255.255.255.0   U     0      0     0 eth1
beonet          *               255.255.255.0   U     0      0     0 eth1
128.148.160.0   *               255.255.255.0   U     0      0     0 eth0
loopback        *               255.0.0.0       U     0      0     0 lo
default         128.148.160.yy  0.0.0.0         UG    0      0     0 eth0

The slaves have routing like:

Destination     Gateway         Genmask         Flags Metric Ref Use Iface
master.beonet   *               255.255.255.255 UH    0      0     0 eth0
myrinet         *               255.255.255.0   U     0      0     0 myri0
beonet          *               255.255.255.0   U     0      0     0 myri0
loopback        *               255.0.0.0       U     0      0     0 lo
default         master.beonet   0.0.0.0         UG    0      0     0 eth0

and an /etc/resolv.conf file like:

domain cfm.brown.edu
nameserver 192.168.1.1
search beonet cfm.brown.edu

>
> I tried hacking on the first test in mpich-1.2..4/examples/test
> (pt2pt/third) that you get when you do make testing or runtests -check.
> Tried to get it to use /usr/bin/mpirun. Had to get rid of -mvhome and
> -mvback args first, then tried to use bpsh to start up the mpirun on
> one node, hoping it could use GM to start up on the other slaves.
> After creating the directory in /var where it could create shm_beostat,
>
> Now I get truckloads of errors:
> shmblk_open: Couldn't open shared memory file: /shm_beostat
> shmblk_open failed.
>
> I suppose these might be from the other nodes, expecting everyone is
> sharing /var, but I'm leery of nfs mounting all of the master's /var
> on each slave.

What worked here was to make sure "." was on my path, hack out the
-mvhome and -mvback arguments from the runtests script, ignore bpsh,
and use /usr/bin/mpirun instead of ../../bin/mpirun.

>
> I tried applying the Scyld patches against the 1.2.0 mpich sources to
> the 1.2..4 sources from Myricom, but most of them went into the mpid/ch_p4
> directory, which is not built when --with-device=ch_gm is specified.
>
> Then I thought I'd look into the mpprun sources, but I couldn't get
> them to build even before I started hacking on them... decided to look
> elsewhere for a while.
>
> Tried getting sshd2 up and running on a slave node. So far it insists
> on asking for my password and won't accept it at all.

Having tried all these blind alleys, I concentrated on mpich-gm, and
stopped playing with /usr/bin/mpirun. As it worked out, I had to make
changes to only one file, mpirun.ch_gm.in, and then I was able to run
the scripts in examples/test and examples/pertest. The patch will be
attached (hopefully) to this message.

Hope this is helpful to somebody out there. Thanks for the tips that
got me going again.

	-- ddj

Dave Johnson
ddj at cascv.brown.edu
-------------- next part --------------
--- mpid/ch_gm/mpirun.ch_gm.in.orig	Mon Aug 21 19:46:18 2000
+++ mpid/ch_gm/mpirun.ch_gm.in	Thu Feb 15 15:21:03 2001
@@ -37,6 +37,7 @@
     if ($_ eq $host) {
       exec($cmd_ln);
     } else {
+      s/.*\D0*(\d+)$/$1/ if $bpsh;
       exec($rsh,'-n',$_,$cmd_ln);
     }
   }
@@ -63,6 +64,7 @@
 $SIG{'QUIT'} = 'cleanup';
 $rsh="#RSHCOMMAND#";
+$bpsh = ($rsh =~ m|/?bpsh$|);
 $host = $ENV{'HOST'} || `uname -n`;
 chomp $host;
 $display=$ENV{'DISPLAY'};
@@ -262,6 +264,7 @@
   $_ = read_line;
   die "bad line in $gmpifile: $_" unless /^([^\s]*)\s+(\d+)/;
   $mach[$i] = $1;
+  $mach[$i] = $host if ($1 eq "master");
   # $node_id[$i] = $2;
   $port_id[$i] = $2;
   $board_id[$i] = 0;
@@ -335,9 +338,10 @@
     -d "$dir" or mkdir("$dir",0777) or die "cannot make directory $dir\n"
   }
   $gmpi_opts = " GMPI_OPTS=m$lnode,n$nbnode ";
+  $cmdpref = "cd $dir;$mget env $varenv $gmpi_opts";
   if (defined($debug[$lnode])) {
     my $cmd = $argv{$lnode}->[0];
-    $cmdline = "cd $dir;$mget env $varenv $gmpi_opts xterm -e gdb $cmd $mrel";
+    $cmdsuff = "xterm -e gdb $cmd $mrel";
   } elsif ($tview){
     # my $cmd = "@{$argv{$lnode}}";
     #
@@ -354,16 +358,17 @@
       for ($j=1;$j<=$numargs;$j++) {
         $cmdLineArgs = $cmdLineArgs . " $argv{$lnode}->[$j]";
       }
-      $cmdline = "cd $dir;$mget env $varenv $gmpi_opts $totalview $cmd -a -mpichtv $cmdLineArgs";
+      $cmdsuff = "$totalview $cmd -a -mpichtv $cmdLineArgs";
     } else {
       my $cmd = "@{$argv{$lnode}}";
-      $cmdline = "cd $dir;$mget env $varenv $gmpi_opts $cmd -mpichtv $mrel";
+      $cmdsuff = "$cmd -mpichtv $mrel";
     }
   } else {
     my $cmd = "@{$argv{$lnode}}";
-    $cmdline = "cd $dir;$mget env $varenv $gmpi_opts $cmd $mrel";
+    $cmdsuff = "$cmd $mrel";
   }
+  $cmdline = $cmdpref . $cmdsuff;
   print STDERR "starting on $_: $cmdline\n" if ($verbose || !$doit);
   # print "starting on $_: $cmdline\n";
@@ -371,11 +376,16 @@
   if ($_ eq $host) {
     exec($cmdline);
   } else {
+    if ($bpsh) {
+      s/.*\D0*(\d+)$/$1/;
+      $cmdline = $cmdpref . "$rsh -n $_ " . $cmdsuff;
+      exec($cmdline);
+    } else {
     exec($rsh,'-n',$_,$cmdline);
+    }
   }
   die "$rsh $_ $argv{$lnode}->[0]:$!\n"
-  }
-  else {
+  } else {
   exit 0;
   }
 }
-------------- next part --------------
The first change, after the original line 40, is to convert hostnames
such as node01 or slave-20 into just the numerical suffix, without any
leading zeros.

The new line of code after line 65 is to set the $bpsh flag based on
the tail of $rsh matching "bpsh". This made later changes simpler and
more readable.

The addition after line 264 is a gross hack to deal with the fact that
the master node is multi-homed, and `uname` gives "trapeza" but
192.168.1.1 maps to "master" in reverse DNS lookups.

The new code after line 337 sets the prefix $cmdpref, which had been
the same for all the cases to follow. The original line 340 now just
sets the suffix part. I haven't tried any debugger on the cluster
yet... the changes would be necessary to get a debugger started via
bpsh, but are probably not enough by themselves.

Lines 357, 360, 365, and the line following 367 are more of the same.

The remaining changes after line 373 rearrange the command line if
$bpsh is set, and again the slave hostnames are translated into node
numbers.
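
The two regular expressions in the patch carry most of the bpsh-specific
logic, so it may help to see what they do to sample inputs. The sketch
below restates them in Python rather than Perl, purely for illustration;
the function names is_bpsh and node_number are made up for this example
and do not appear in mpirun.ch_gm.in.

```python
import re


def is_bpsh(rsh):
    # Same test as m|/?bpsh$| in the patch: true when the configured
    # remote-shell command ends in "bpsh", with or without a path.
    return re.search(r"/?bpsh$", rsh) is not None


def node_number(hostname):
    # Same substitution as s/.*\D0*(\d+)$/$1/ in the patch: keep only
    # the trailing digits that follow the last non-digit character and
    # drop leading zeros. Names that do not end in digits are returned
    # unchanged, because the pattern does not match them.
    return re.sub(r".*\D0*(\d+)$", r"\1", hostname)


if __name__ == "__main__":
    for name in ("node01", "slave-20", "trapeza"):
        print(name, "->", node_number(name))
```

For the two hostnames mentioned in the notes, node01 becomes 1 and
slave-20 becomes 20, which is the bare node number form that the patch
passes to bpsh. A name with no numeric suffix, such as trapeza, passes
through untouched.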
More information about the Beowulf mailing list
