From pirabhur at MPI-SoftTech.Com Thu Dec 4 12:07:01 2003
From: pirabhur at MPI-SoftTech.Com (Pirabhu Raman)
Date: Tue Nov 9 01:14:27 2010
Subject: [scyld-users] slave processes hang
Message-ID:

I am trying to use totalview to debug MPI processes. While it works fine
on rank 0, all the processes hang on pthread_create call on other ranks
when started with totalview for debugging. The same program runs fine
without totalview. I am using the basic/demo version of Scyld operating
system.

I was wondering whether debugging works at all in Scyld systems. If so,
could anyone provide any suggestions or pointers to resolve this issue.

Thanks in advance,
Pirabhu
----------------------------------------------------------------------
Pirabhu Raman
Software Engineer
MPI Software Technology, Inc.
110, 12th Street, North
Suite D103, Birmingham, AL 35203
Ph: 205-314-3471 Ext-207
Fax: 205-314-3475

From becker at scyld.com Thu Dec 4 12:55:01 2003
From: becker at scyld.com (Donald Becker)
Date: Tue Nov 9 01:14:27 2010
Subject: [scyld-users] slave processes hang
In-Reply-To:
Message-ID:

On Thu, 4 Dec 2003, Pirabhu Raman wrote:

> I am trying to use totalview to debug MPI processes. While it works fine
> on rank 0, all the processes hang on pthread_create call on other ranks
> when started with totalview for debugging. The same program runs fine
> without totalview. I am using the basic/demo version of Scyld operating
> system.

What version?
(The 28 series is the version that we use with TotalView.)

> I was wondering whether debugging works at all in Scyld systems.

Yes.

-- 
Donald Becker                       becker@scyld.com
Scyld Computing Corporation         http://www.scyld.com
914 Bay Ridge Road, Suite 220       Scyld Beowulf cluster system
Annapolis MD 21403                  410-990-9993

From linuxbeak at elfess.com Sat Dec 6 11:49:01 2003
From: linuxbeak at elfess.com (linuxbeak@elfess.com)
Date: Tue Nov 9 01:14:27 2010
Subject: [scyld-users] Slave node boot problems
Message-ID: <1070729668.3fd209c4ce8d9@webmail.elfess.com>

Hello all,

I just installed version 5.1cz on a PowerEdge 4300 the other day. The
cluster that will eventually be running will have 16 nodes. I have
successfully test-booted 4 nodes, but eight others aren't working for
various reasons.

I'm noticing a trend in about half of the nonworking nodes... When
bpslave tries to kick in, it hangs for a bit and spits out the following
message:

eth0: transmit timed out, tx_status 00 status e000.

This is getting rather annoying; I was wondering if anyone has any ideas?

Thanks.

-------------------------------------------------
This mail sent through IMP: http://horde.org/imp/

From linuxbeak at elfess.com Sat Dec 6 14:05:01 2003
From: linuxbeak at elfess.com (linuxbeak@elfess.com)
Date: Tue Nov 9 01:14:27 2010
Subject: [scyld-users] Re: Slave Node Boot Problems
In-Reply-To: <200312061709.hB6H9ZS27282@NewBlue.scyld.com>
References: <200312061709.hB6H9ZS27282@NewBlue.scyld.com>
Message-ID: <1070737813.3fd22995a55ac@webmail.elfess.com>

Hello again,

I suppose next time I'll do a little more tinkering around with machines
before crying for help.

I found out that the problem lay in the network card... I've been using a
mixture of 3Com 3c905's and 3c905B's... it seems that the system likes the
latter a lot more.

Nevertheless, I STILL need to use the original 3c905, because I've run
out of 3c905Bs... what do you suggest?

From saville at comcast.net Sat Dec 6 14:49:01 2003
From: saville at comcast.net (Gregg Germain)
Date: Tue Nov 9 01:14:27 2010
Subject: [scyld-users] Upgrading the Linux on SCYLD
Message-ID: <3FD2352A.39595267@comcast.net>

Hi,

I have installed the shareware SCYLD Beowulf system which uses RH Linux
6.2.

I'd like to know if I can upgrade the Linux to more modern versions
without disturbing the Scyld setup.

Is this possible?

thanks

Gregg

From mikemac at mikemac.com Sat Dec 6 22:53:01 2003
From: mikemac at mikemac.com (Mike McDonald)
Date: Tue Nov 9 01:14:27 2010
Subject: [scyld-users] Upgrading the Linux on SCYLD
In-Reply-To: Your message of "Sat, 06 Dec 2003 14:59:39 EST." <3FD2352A.39595267@comcast.net>
Message-ID: <200312062000.hB6K0gc07392@saturn.mikemac.com>

>To: scyld-users@scyld.com
>Date: Sat, 06 Dec 2003 14:59:39 -0500
>From: Gregg Germain
>
>Hi,
>
> I have installed the shareware SCYLD Beowulf system which uses RH Linux
>6.2.
>
> I'd like to know if I can upgrade the Linux to more modern versions
>without disturbing the Scyld setup.
>
>Is this possible?

If you do it really, really carefully, it's "possible". You just can't
update any of the Scyld modified packages. That includes the kernel,
glibc, modutils, and a bunch of others. Kind of makes "upgrading" not
worth the attempt.

Mike McDonald
mikemac@scyld.com

From FriedmJD at nv.doe.gov Mon Dec 8 10:48:01 2003
From: FriedmJD at nv.doe.gov (Friedman, Joshua)
Date: Tue Nov 9 01:14:27 2010
Subject: [scyld-users] Master not responding to new nodes
Message-ID:

Skipped content of type multipart/alternative

From becker at scyld.com Mon Dec 8 10:54:01 2003
From: becker at scyld.com (Donald Becker)
Date: Tue Nov 9 01:14:27 2010
Subject: [scyld-users] Master not responding to new nodes
In-Reply-To:
Message-ID:

On Mon, 8 Dec 2003, Friedman, Joshua wrote:

> I have a working seven node beowulf cluster.
> I am trying to integrate two new nodes into the cluster. I am booting
> the new nodes off of the Scyld CD (version 27Cz-9).
> They are making it through the phase I boot, and sending a RARP to
> the master.

Are you seeing incoming packets on the master?
What NICs are you using?

-- 
Donald Becker                       becker@scyld.com
Scyld Computing Corporation         http://www.scyld.com
914 Bay Ridge Road, Suite 220       Scyld Beowulf cluster systems
Annapolis MD 21403                  410-990-9993

From becker at scyld.com Mon Dec 8 11:04:00 2003
From: becker at scyld.com (Donald Becker)
Date: Tue Nov 9 01:14:27 2010
Subject: [scyld-users] Upgrading the Linux on SCYLD
In-Reply-To: <3FD2352A.39595267@comcast.net>
Message-ID:

On Sat, 6 Dec 2003, Gregg Germain wrote:

> I have installed the shareware SCYLD Beowulf system which uses RH
> Linux 6.2.

That's a quite old release.

> I'd like to know if I can upgrade the Linux to more modern versions
> without disturbing the Scyld setup.

The kernel and libraries have performance enhancements and cluster
extensions. For instance, Scyld added LFS (Large File Summit) support,
which was not in the 2.2 kernels or libraries of that time.

There are hooks for the BProc system, which has changed its interface
over time. We've written cross-compatible libraries, but only for a
subset of the functions.

There were many other bug fixes and performance enhancements we added,
but most are either now incorporated in the standard package releases or
are transparent if missing.

-- 
Donald Becker                       becker@scyld.com
Scyld Computing Corporation         http://www.scyld.com
914 Bay Ridge Road, Suite 220       Scyld Beowulf cluster systems
Annapolis MD 21403                  410-990-9993

From saville at comcast.net Mon Dec 8 11:05:02 2003
From: saville at comcast.net (Gregg Germain)
Date: Tue Nov 9 01:14:27 2010
Subject: [scyld-users] Master not responding to new nodes
References:
Message-ID: <3FD4A399.4C467310@comcast.net>

> I have a working seven node beowulf cluster. I am trying to integrate
> two new nodes into the cluster.
> I am booting the new nodes off of the Scyld CD (version 27Cz-9). They
> are making it through the phase I boot, and sending a RARP to the
> master. The master node is not responding to the request.

It has been my experience that unanswered RARP requests can occur if the
network connection doesn't go through a hub or router.

How did you connect the two new nodes to the cluster network?

Gregg
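
One way to check whether RARP requests from the new nodes are reaching
the master at all is to capture frames on the cluster-side interface and
look for EtherType 0x8035. The following is only a minimal Python
sketch under stated assumptions, not a Scyld tool: the function names
and the interface name "eth1" are made up for illustration, and the
capture part needs root on a Linux master. A packet sniffer such as
tcpdump would answer the same question.

```python
import socket
import struct

ETHERTYPE_RARP = 0x8035   # EtherType assigned to Reverse ARP (RFC 903)
RARP_REQUEST = 3          # opcode "request reverse"


def parse_rarp_request(frame: bytes):
    """Return the sender MAC as a string if frame is a RARP request, else None."""
    # 14-byte Ethernet header + 28-byte RARP payload (Ethernet/IPv4 sizes)
    if len(frame) < 42:
        return None
    ethertype = struct.unpack("!H", frame[12:14])[0]
    if ethertype != ETHERTYPE_RARP:
        return None
    # Hardware type, protocol type, address lengths, opcode
    htype, ptype, hlen, plen, oper = struct.unpack("!HHBBH", frame[14:22])
    if oper != RARP_REQUEST or hlen != 6:
        return None
    sender_mac = frame[22:28]
    return ":".join(f"{b:02x}" for b in sender_mac)


def watch(interface: str = "eth1") -> None:
    """Print the MAC of each RARP request seen on interface.

    Assumption: Linux with AF_PACKET sockets, run as root; "eth1" is a
    hypothetical name for the master's cluster-side NIC.
    """
    sock = socket.socket(socket.AF_PACKET, socket.SOCK_RAW,
                         socket.htons(ETHERTYPE_RARP))
    sock.bind((interface, ETHERTYPE_RARP))
    while True:
        frame, _ = sock.recvfrom(65535)
        mac = parse_rarp_request(frame)
        if mac:
            print("RARP request from", mac)
```

If watch() prints nothing while a new node is sitting in its phase 1
boot, the requests are probably not arriving at the master, which points
at cabling, the switch or hub, or the node's NIC driver rather than at
the master's configuration.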