From rgb@phy.duke.edu Wed, 30 Sep 1998 15:19:15 -0400 Date: Wed, 30 Sep 1998 15:19:15 -0400 From: Robert G. Brown rgb@phy.duke.edu Subject: bWatch On Wed, 30 Sep 1998, Matthew Hixson wrote: > On Wed, 30 Sep 1998, Robert G. Brown wrote: > > I'm working with Jacek writing my own > > version of "procd", a daemon that I already have running in alpha that > > can provide on demand what amounts to a cat of any (permitted) file in > > /proc, all via a TCP connection without the overhead of a shell. The > > procd can also lookup or query/build anything accessible in userspace, > > e.g. it calculates and returns number of users or local time the hard > > way to avoid shell calls to uptime (I'm trying to leave it > > unprivileged). I'd be happy to add code to procd to query the CPU > > I've also written a little daemon that I call procd. It is a simple > socket based app that listens for a connection on a specified TCP port. > You telnet to the port and type the name of the file you want to view. It > restricts the files you can see to those available from /proc. So if you > want /proc/cpuinfo that's exactly what you type into your telnet session > and you get the results right there. There isn't any reason why another > program couldn't be written to communicate with this one. The output from > the file requested continues until you see a '.' on a line by itself, much > like the termination of a DATA segment in SMTP, so it would be easy to > parse. > This runs as a nonpriveleged user also, no need for root. > I don't have this up on the web yet so if anyone would like to see it > just drop me an email. Its GPL'd, as it should be. > -M@ This almost exactly describes what I'm doing, except that I plan to add some authentication and a couple of other commands, and that I do the actual querying with a perl script rather than telnet. perl parses anything totally trivially. Great minds think alike, eh? 
:-) I'd love to see your source just to see if I left anything out of mine -- being lazy, I actually send a "done" to the caller instead of a "." (in perl, just /^done/ which is actually a bit more readable than /^\./:-) and haven't actually installed the "send filename" command, but it is next on the agenda, possibly this very afternoon. Thanks, rgb Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From kragen@pobox.com Wed, 30 Sep 1998 15:25:17 -0400 Date: Wed, 30 Sep 1998 15:25:17 -0400 From: Kragen kragen@pobox.com Subject: bWatch On Wed, 30 Sep 1998, Robert G. Brown wrote: > Thanks (I found the lm_sensors myself from Kragen's hint, but had to > work to do so:-) Sorry about that. I was in a hurry. Kragen -- Kragen Sitaker A well designed system must take people into account. . . . It's hard to build a system that provides strong authentication on top of systems that can be penetrated by knowing someone's mother's maiden name. -- Schneier From kragen@pobox.com Wed, 30 Sep 1998 15:43:20 -0400 Date: Wed, 30 Sep 1998 15:43:20 -0400 From: Kragen kragen@pobox.com Subject: Mixed hardware On Wed, 30 Sep 1998, Bryan J. Welch wrote: > Forgive me if this is another 'newbie' question, but after searching > www.beowulf.org I don't see anything on (or against) building a Beowulf > cluster of unlike computers. We have a couple Dec Alphas and a pile of > old 486/586 pcs. Has anyone tried building a cluster combining hardware > like this? Any reasons this wouldn't work? This will work. Here are some things you have to think about: - Load balancing will be a little more difficult. - You won't be able to use MOSIX to distribute processes transparently between Alphas and 486s. But MOSIX isn't out yet anyway. Most of the big Beowulfs cost lots of money, and were purchased all at once, and thus have identical nodes. Hope this helps. 
Kragen -- Kragen Sitaker A well designed system must take people into account. . . . It's hard to build a system that provides strong authentication on top of systems that can be penetrated by knowing someone's mother's maiden name. -- Schneier From hendriks@cesdis1.gsfc.nasa.gov Wed, 30 Sep 1998 15:44:38 -0400 Date: Wed, 30 Sep 1998 15:44:38 -0400 From: Erik Arjan Hendriks hendriks@cesdis1.gsfc.nasa.gov Subject: bWatch On Wed, 30 Sep 1998, Robert G. Brown wrote: > On Wed, 30 Sep 1998, Kragen wrote: > > > I think the Linux program is called lmwatch. It's been discussed a lot > > on freshmeat.net recently. > > lm_sensors, not lmwatch. > > " > lm_sensors 1.4.2 > lm_sensors is an effort to provide some essential tools for monitoring > the hardware health of Linux > systems containing hardware health monitoring hardware such as the > LM78 and LM75 connected > via the SMBus (usually found in P6 and P-II systems). For those of us that have motherboards w/o an SMBus, I've written a driver that supports the LM78 and the Winbond W83781D via the ISA IO port interface. (delivers status in /proc) http://beowulf.gsfc.nasa.gov/software/lm78-0.3.1.tar.gz ftp://beowulf.gsfc.nasa.gov/www/software/lm78-0.3.1.tar.gz - Erik ------------------------------------------------------------ Erik Hendriks hendriks@cesdis.gsfc.nasa.gov From jason@primenet.com Wed, 30 Sep 1998 16:21:47 -0400 Date: Wed, 30 Sep 1998 16:21:47 -0400 From: Jason Wagner jason@primenet.com Subject: No subject Can someone please send me the e-mail address of the list admin(s)? Thanks. Jason Jason Wagner Applications Systems Analyst University of Arizona at Sierra Vista jason@uasv.arizona.edu http://indigo.uasv.arizona.edu 12821209 The opinions expressed here are strictly my own, and in no way reflect the opinions of The University of Arizona. From rgb@phy.duke.edu Wed, 30 Sep 1998 17:27:10 -0400 Date: Wed, 30 Sep 1998 17:27:10 -0400 From: Robert G. Brown rgb@phy.duke.edu Subject: Installing new system over network? 
On Wed, 30 Sep 1998, Huntress, Gary B. wrote:

> Hi,
>
> I've been tinkering with my 10 node system for a while...adding
> to it a box at a time. This entails booting from a linux boot floppy,
> running fdisk, running setup, installing, recompiling my kernel,
> configuring the network, then rebooting....not too painful, but not
> convenient either.
>
> I've just been given 6 more boxes. The magic number "6" is the
> physical limitation of my Toyota Celica. There are 40 or so
> more......Needless to say, I want to learn how to install over a network
> :)
>
> I would greatly appreciate it if someone could summarize the
> steps necessary for a network installation (or maybe even just a URL).
> In the words of Paul Reiser "...talk to me like I'm four..."

OK, I've been meaning to do this for a while, and it's about time I did it, even if it isn't complete. Get:

www.phy.duke.edu/brahma/diskless.tar.gz

Put it on your server system and untar it. You will need a couple hundred megabytes free to unpack it. It contains root/, generic_[etc,var,tmp,dev]/, and makefloppy/. In makefloppy are scripts to make a diskless boot/root disk, given a kernel and some other information that can be entered on the script command line (or more likely, changed in the script). One option to this script will make copies of the generic_* directories and edit key files in them so that they are ready to mount.

If you export the root and the resulting hostname_[etc,var,dev,tmp] directories just as they are, and boot from the boot floppy, you should get a functioning linux on the system. It will probably be missing e.g. /usr/local -- you may need to make certain changes in the generic_fstab to get your own exported /usr/local/ to mount. You will need a reasonably current version of expect in /usr/local to proceed.

With a full linux running diskless, you can easily install the local disk by hand.
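For illustration only (this is not part of the kit), the export entries for one node could be generated along the following lines. The sketch assumes the tarball was unpacked under /export/diskless, that the node is called b01, and that root is exported read-only and the per-host directories read-write with no_root_squash. All of these are assumptions, so adjust them to your own site.

```python
def exports_for(host, base="/export/diskless"):
    """Return /etc/exports lines for one diskless node.

    `base` is a hypothetical unpack location; the kit does not dictate it.
    """
    # The shared root can be exported read-only.
    lines = [f"{base}/root {host}(ro,no_root_squash)"]
    # The per-host copies of etc, var, tmp and dev must be writable.
    for part in ("etc", "var", "tmp", "dev"):
        lines.append(f"{base}/{host}_{part} {host}(rw,no_root_squash)")
    return lines


if __name__ == "__main__":
    # "b01" is a made-up node name.
    print("\n".join(exports_for("b01")))
```

The output would be appended to /etc/exports on the server, after which the NFS server has to be told to re-read the file.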
However, the MAIN reason I'm writing all this to you is that in the root/ directory are two scripts:

cloneroot and install_disk.exp

The first is a perfectly boring /bin/sh script that uses the install_disk expect script to fdisk and mke2fs the disk (with hardcoded partitions/sizes -- obviously they will need to be altered to describe your "standard" disk). It then mounts the disk and clones the (presumed mounted) diskless root onto the disk. It even installs lilo for you and makes and installs four 128 MB swapfiles (complete to the entry in /etc/rc.d/rc.local to start them up).

Whether or not you use the script (or even whether or not it will WORK for you -- I make no guarantees) it is an instruction set on what you need to install a system. If you cannot clone this particular root in your environment, you can export your own favorite root and clone that. Or you can follow the steps by hand. Whatever.

Anyway, using this script I can take a box with NO MONITOR, plug it in and connect it to the network, boot it from its own custom boot floppy, log in to it from a console over the net, type /cloneroot, and in 15 to 20 minutes (depending on network speed) I've got a system ready to remove the floppy and reboot. For me, at least, it works very reliably.

I haven't really "sanitized" the scriptset so that it is guaranteed to be portable -- I'm sure that there are a couple of local idiosyncrasies lurking that I'll hear about when somebody tries it at their place -- and you do have to know what you are doing (I've got some kernel images in the makefloppy directory, but you'll probably want to use your own kernel with diskless boot support (root on NFS and all that) built in). My kernels are mostly SMP and a lot of them have an aic7xxx 5.1.0preX built in.

If this approach to automated installation proves popular, I may take the time to clean up the kit.
It really shouldn't rely on the root that I provide (only one or two files therein are really "customized" and I can probably think up a way to replace them with symlinks in a clever way). It DOES rely on one providing at least var, etc, tmp and dev rw (root can be exported ro) -- obviously var and etc have to contain ld.so.cache and utmp, permissions need to be changeable on devices, and tmp is used by a lot of stuff.

Even if you cannot use it "right out of the box", though, it ought to be plenty for a decent systems programmer to go on to hack into something that will work just fine for them.

rgb

Robert G. Brown http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu

From rgb@phy.duke.edu Wed, 30 Sep 1998 17:42:08 -0400
Date: Wed, 30 Sep 1998 17:42:08 -0400
From: Robert G. Brown rgb@phy.duke.edu
Subject: Mixed hardware

On Wed, 30 Sep 1998, Bryan J. Welch wrote:

> Forgive me if this is another 'newbie' question, but after searching
> www.beowulf.org I don't see anything on (or against) building a Beowulf
> cluster of unlike computers. We have a couple Dec Alphas and a pile of
> old 486/586 pcs. Has anyone tried building a cluster combining hardware
> like this? Any reasons this wouldn't work?

Sure, it'll work. PVM is even "designed" to run parallel code on an inhomogeneous environment like this -- I would guess that MPI will do it too, but less efficiently. The only catch is that you will have to work to accommodate scale/speed differences in your hardware in your parallel application. If you are doing coarse grain parallel calculations (e.g. Monte Carlo) it hardly matters -- more samples are more samples. If you want to try something fine grained, you'll have to figure out some optimizations by hand and tune them by experience.
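As a rough sketch of the coarse-grained case (not code from any of the posters), independent samples can be handed out in proportion to a relative speed measured for each node. The node names and speed figures here are made up for illustration, and the relative speeds would have to come from your own benchmark.

```python
def split_samples(total, speeds):
    """Divide `total` independent samples among nodes in proportion to speed.

    `speeds` maps a node name to a positive relative speed (any unit).
    """
    total_speed = sum(speeds.values())
    # Proportional share for each node, rounded down to a whole sample.
    shares = {node: int(total * s / total_speed) for node, s in speeds.items()}
    # Rounding down can leave a few samples over; give them to the
    # fastest nodes first so the slow ones are not held up.
    leftover = total - sum(shares.values())
    for node in sorted(speeds, key=speeds.get, reverse=True)[:leftover]:
        shares[node] += 1
    return shares


if __name__ == "__main__":
    # Hypothetical cluster: one Alpha rated four times faster than one 586.
    print(split_samples(1000, {"alpha": 4, "p586": 1}))
```

With these made-up figures the Alpha gets 800 of the 1000 samples and the 586 gets 200, so both should finish at about the same time. A fine-grained problem needs more than this, since communication cost does not scale with CPU speed.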
Oh, and obviously you'll have to have the right binaries compiled for the right architecture and on the right paths -- again, PVM does most of this for you if you use it right. > Also, if someone could point me to more detailed how-to info, I'd be much > appreciative. Maybe I need to download the RPMs and read what's in > those? I don't think that you'll do anything particularly different. Install linux and as much of the EL CD or any other beowulf source list that you like, minimally PVM (your best bet as a parallel medium). The EL CD image is linked to the main beowulf page, www.beowulf.org and has all the software you need therein (and then some). Obviously I think you should start with PVM, but other tools may prove useful as well. I've seen PVM results presented where one part of the calculation was done on a Cray, another part on a cluster of Suns, and a third on a cluster of DEC's, all with very different speeds and strengths. Obviously breaking the problem up was probably a bit of a chore, but it seemed to give very good results. rgb Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From lindahl@cs.virginia.edu Wed, 30 Sep 1998 18:30:24 -0400 Date: Wed, 30 Sep 1998 18:30:24 -0400 From: Greg Lindahl lindahl@cs.virginia.edu Subject: Mixed hardware > Forgive me if this is another 'newbie' question, but after searching > www.beowulf.org I don't see anything on (or against) building a Beowulf > cluster of unlike computers. I have a mixed cluster. We did it on purpose so we can study scheduling. The main trick is using systems of different speeds in a computation. If you problem can use static or dynamic load balancing, then go for it. -- g From glamm@ece.umn.edu Wed, 30 Sep 1998 18:56:51 -0400 Date: Wed, 30 Sep 1998 18:56:51 -0400 From: Bob Glamm glamm@ece.umn.edu Subject: Mixed hardware (fwd) On Wed, 30 Sep 1998, Bryan J. 
Welch wrote: > Forgive me if this is another 'newbie' question, but after searching > www.beowulf.org I don't see anything on (or against) building a Beowulf > cluster of unlike computers. We have a couple Dec Alphas and a pile of > old 486/586 pcs. Has anyone tried building a cluster combining hardware > like this? Any reasons this wouldn't work? This will work. Here are some things you have to think about: - Load balancing will be a little more difficult. - You won't be able to use MOSIX to distribute processes transparently between Alphas and 486s. But MOSIX isn't out yet anyway. In addition to the above, you'll need to at least think about endian + data size issues between the Alphas and the x86 machines from a distributed application point of view. MPI/PVM have *pack functions; raw programming you'll use XDR & . -Bob From jacek@usq.edu.au Wed, 30 Sep 1998 19:43:20 -0400 Date: Wed, 30 Sep 1998 19:43:20 -0400 From: Jacek Radajewski jacek@usq.edu.au Subject: bWatch Wasn't there a bproc kernel module ? The ideal thing would be to access all nodes' /proc on one machine. > I've also written a little daemon that I call procd. It is a simple > socket based app that listens for a connection on a specified TCP port. > You telnet to the port and type the name of the file you want to view. It > restricts the files you can see to those available from /proc. So if you > want /proc/cpuinfo that's exactly what you type into your telnet session > and you get the results right there. There isn't any reason why another > program couldn't be written to communicate with this one. The output from > the file requested continues until you see a '.' on a line by itself, much > like the termination of a DATA segment in SMTP, so it would be easy to > parse. > This runs as a nonpriveleged user also, no need for root. > I don't have this up on the web yet so if anyone would like to see it > just drop me an email. Its GPL'd, as it should be. 
> -M@ This almost exactly describes what I'm doing, except that I plan to add some authentication and a couple of other commands, and that I do the actual querying with a perl script rather than telnet. perl parses anything totally trivially. Great minds think alike, eh? :-) I'd love to see your source just to see if I left anything out of mine -- being lazy, I actually send a "done" to the caller instead of a "." (in perl, just /^done/ which is actually a bit more readable than /^\./:-) and haven't actually installed the "send filename" command, but it is next on the agenda, possibly this very afternoon. Thanks, rgb From jacek@usq.edu.au Wed, 30 Sep 1998 19:46:59 -0400 Date: Wed, 30 Sep 1998 19:46:59 -0400 From: Jacek Radajewski jacek@usq.edu.au Subject: bWatch Sure. If its just a matter of reading from /proc then its not a problem. I'll put it on the list of things to do. Jacek -----Original Message----- From: West, Jeff [mailto:Jeff.West@ssc.nasa.gov] Sent: Wednesday, September 30, 1998 11:44 PM To: Jacek Radajewski Cc: beowulf@cesdis1.gsfc.nasa.gov Subject: RE: bWatch Jacek: I downloaded bWatch a few months ago and played with it a bit. I now have funding to build a 48 cluster. I will be doing domain decomposition CFD on it. In the process of procuring I have noted that some motherboards have the ability to report the board temperature. I think that would be a nice addition to bWatch. Do you know how to access this information? The vendor I talked to said that he knew of NT programs that could access this info, but not Linux programs. He said the chip that reported this info was the LM97 chip, sounds like the LM87 performance monitoring chip from Pentium Pro discussions. What do you think/know? Jeff ---------------------------------------------------------------------------- --------------------- Jeff West, Ph.D. Associate Engineer, SR Lockheed Martin Stennis Operations Voice: (228) 688-1562 Bldg. 
8306 Fax: (228) 688-1106 Stennis Space Center, MS 39529 jeff.west@ssc.nasa.gov
---------------------------------------------------------------------------- ---------------------

From hogue@mshri.on.ca Wed, 30 Sep 1998 20:28:32 -0400
Date: Wed, 30 Sep 1998 20:28:32 -0400
From: Christopher Hogue hogue@mshri.on.ca
Subject: Mixed hardware (fwd)

Hi folks,

With regard to data transmission and big/little endian problems: this has been addressed by the little-used ASN.1 protocol, originally developed in 1990 for the OSI networking model. While OSI died a death of too many three-letter abbreviations, ASN.1 lives on. It is meant for mixed hardware data exchange. We biologists use it all the time for transmitting binary data (sequences, 3-D molecular structures) from host UNIX servers to network clients on all platforms.

For an example of the text version of ASN.1 data, see:

http://bioinfo.mshri.on.ca/cgi-bin/Structure/mmdbsrv?db=t&form=6&uid=4MT2&Dopt=i&save=See

Binary versions are less pretty, but do the endian job wonderfully and can handle arbitrarily complex data descriptions, like 3-D scenes of macromolecular complexes such as ours that don't fit into VRML very well.

ASN.1 info is at:

http://www.oss.com
http://www.inria.fr/rodeo/personnel/hoschka/asn1.html

The "Beowulf" software we are working on for molecular structure prediction is in fact platform independent and will work not only on our new Linux cluster, but also simultaneously on our SGI and Sun servers, as well as some NT clients. We need all the CPUs we have!

Chris.

--------------------------------------
Christopher W.V. Hogue, Ph.D.
Samuel Lunenfeld Research Institute
Mt. Sinai Hospital
600 University Ave.
Toronto Ontario Canada M5G 1X5
(416) 586-4800 xt2866 fax (416) 586-8857
hogue@mshri.on.ca http://bioinfo.mshri.on.ca

From nav@pop.jaring.my Wed, 30 Sep 1998 20:35:07 -0400
Date: Wed, 30 Sep 1998 20:35:07 -0400
From: Khalid nav@pop.jaring.my
Subject: PROBLEM WITH 3COM NIC DRIVER

With the 3c590 driver, the 3Com NIC 3C905B-TX works only at 10 Mbps instead of 100 Mbps. Is there a driver for the 900 series? I am using kernel 2.0.35 (Red Hat 5.1 Mandrake).

Thanks in advance for your help.

Khalid.

From becker@cesdis1.gsfc.nasa.gov Wed, 30 Sep 1998 20:52:58 -0400
Date: Wed, 30 Sep 1998 20:52:58 -0400
From: Donald Becker becker@cesdis1.gsfc.nasa.gov
Subject: PROBLEM WITH 3COM NIC DRIVER

On Thu, 1 Oct 1998, Khalid wrote:

> With 3c590 driver, the 3com nic card 3C905B-TX works only at 10 Mbps instead
> of 100 Mbps. Is there a driver for 900 series?
> I am using kernel 2.0.35 (Red Hat 5.1 Mandrake)

Use driver v0.99E or v0.99G

ftp://cesdis.gsfc.nasa.gov/pub/linux/drivers/test/3c59x.c

Note: RedHat is using 2.0.34-pre6, not a real 2.0.35, thus the older driver version. The Extreme Linux CD has the updated version.

Donald Becker becker@cesdis.gsfc.nasa.gov
USRA-CESDIS, Center of Excellence in Space Data and Information Sciences.
Code 930.5, Goddard Space Flight Center, Greenbelt, MD. 20771
301-286-0882 http://cesdis.gsfc.nasa.gov/people/becker/whoiam.html

From mcghee@mech.uq.edu.au Wed, 30 Sep 1998 21:07:31 -0400
Date: Wed, 30 Sep 1998 21:07:31 -0400
From: Andrew Mc.Ghee mcghee@mech.uq.edu.au
Subject: Mixed hardware

Bryan,

Yes, we run a heterogeneous system of PPro, PII and DEC Alpha workstations using MPI. It works well, or at least it used to until we upgraded to a private fast ethernet system between the systems. For some reason MPI now seems to fail on running jobs on the DEC Alphas when using gethostbyname (only when using the fast network; this is probably a small teething problem in the setup of MPI, or our network).
If you are using MPI (MPICH or LAM are two freely available versions), you'll have to have a copy of MPI installed on at least one machine of each architecture (to allow building your MPI application for that machine type) and have the mpirun command on your master node (to distribute rsh commands to start your program on each of the nodes) and of course rsh enabled on each node.

MPI is very favourable for heterogeneous cluster programming, but you have to make sure you use the proper data types when transmitting information between machines. Floating point numbers on Intel and DEC Alphas are internally stored differently - MPI can handle this (as, I'm sure, can PVM).

regards,
Andrew

========================================================================
Andrew Mc.Ghee
Mechanical Engineering
University of Queensland
St.Lucia, Brisbane. 4072. AUSTRALIA.
Phone: (07) 3365-3536 Fax: (07) 3365-4799
email: mcghee@mech.uq.edu.au
========================================================================

From jacek@usq.edu.au Wed, 30 Sep 1998 21:37:35 -0400
Date: Wed, 30 Sep 1998 21:37:35 -0400
From: Jacek Radajewski jacek@usq.edu.au
Subject: Mixed hardware

We also have a cluster of P120s, dual P166, and dual PII233. It works well.

-----Original Message-----
From: Greg Lindahl [mailto:lindahl@cs.virginia.edu]
Sent: Thursday, October 01, 1998 8:30 AM
To: bjwelch@bell-labs.com
Cc: beowulf@cesdis1.gsfc.nasa.gov
Subject: Re: Mixed hardware

> Forgive me if this is another 'newbie' question, but after searching
> www.beowulf.org I don't see anything on (or against) building a Beowulf
> cluster of unlike computers.

I have a mixed cluster. We did it on purpose so we can study scheduling. The main trick is using systems of different speeds in a computation. If your problem can use static or dynamic load balancing, then go for it.
-- g

From bill@math.ucdavis.edu Thu, 1 Oct 1998 00:04:25 -0400
Date: Thu, 1 Oct 1998 00:04:25 -0400
From: Bill Broadley bill@math.ucdavis.edu
Subject: Installing new system over network?

>> I would greatly appreciate it if someone could summarize the
>> steps necessary for a network installation (or maybe even just a URL).
>> In the words of Paul Reiser "...talk to me like I'm four..."
>
>OK, I've been meaning to do this for a while, and it's about time I did it,
>even if it isn't complete. Get:
>
>www.phy.duke.edu/brahma/diskless.tar.gz

I have a single floppy + script I run that will take a pc and turn it into a diskless workstation. It's great for low-maintenance, cheap nodes without a disk. There's no "installation", you just boot.

Of course with a beowulf cluster you would want local swap (if you're swapping at all), and local data (for better bandwidth), but that would be easy to support.

I have /etc /sbin /lib /bin /opt all mounted read-only (had to modify a few binaries to fix that: mount, HOSTNAME, drift, issue, mtab, ssh_random_seed etc.). Only /var and /home are mounted read/write. I'd guess that most disk I/O for beowulf type uses would be swapping or to a data disk.

I find them a pleasure to admin (almost zero admin per node), quite cool and quiet without a local disk, and cheap. Of course adding a local data/swap disk offsets the advantage by a good bit.

What are people's thoughts on diskless (but not dataless) machines for beowulf clusters?

Should I make what I've already done available?

So far I have 18 diskless nodes: 17" nokia, 4 MB matrox, p5-200MMX, 64 MB RAM, netgear 100 mbit cards, atx case, asus motherboard, well under $1000 per seat 4 months ago or so. They make great xterminals that can run apps locally. Even 3d visualization using geomview is quite speedy. Amusingly geomview is faster than on the PII-350/riva 128 systems we have here.

-- Bill

From rgb@phy.duke.edu Thu, 1 Oct 1998 02:29:20 -0400
Date: Thu, 1 Oct 1998 02:29:20 -0400
From: Robert G.
Brown rgb@phy.duke.edu Subject: Installing new system over network? On Wed, 30 Sep 1998, Bill Broadley wrote: > What are peoples thoughts on diskless (but not dataless) machines for > beowulf clusters? I ran our cluster diskless for a month or two, and it was a bit annoying -- one of our users was generating REALLY BIG datasets output and would saturate the network writing to NFS space on the server (which was also my desktop, alas). One also has to worry about swap when users are running really big jobs, and load time, although fast enough for many purposes, is definitely perceptibly slower than a nice fast local disk. I wouldn't hesitate to install diskless desktop systems (we ran diskless SLC and ELC Suns for years and they worked fine with far less resources than any system has today), but I'd avoid running a beowulf diskless for performance reasons UNLESS, of course, your calculation(s) don't use disk to speak of! Having the diskless boot and diskless boot install options, though, is great. If a local disk crashes I don't care. I yank it, pop in the floppy, and boot and in five minutes the system is back up at only a small and disk-oriented performance penalty. As you note, it also really streamlines maintenance. Every system, diskless or not, is identical and can be reinstalled identical should it become corrupted in a very few minutes. > Should I make what I've already done available? I certainly think that diskless boot floppies are a REALLY valuable tool, and that a diskless operation mode should be an option for the cash strapped and disk-unbound. Putting together a portable "kit" for it would be great -- the tarball I posted above is such a kit, but I haven't worked to make it portable. So sure, go for it. rgb Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 
27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From caskey@technocage.com Thu, 1 Oct 1998 02:39:28 -0400 Date: Thu, 1 Oct 1998 02:39:28 -0400 From: Caskey L. Dickson caskey@technocage.com Subject: Installing new system over network? On Wed, 30 Sep 1998, Bill Broadley wrote: > I have a single floppy + script I run that will take a pc and turn > it into a diskless workstation. It's great for low maintance, cheap > nodes without a disk. Theres no "installation" you just boot. > > Of course with a beowulf cluster you would want local swap (if > your swapping at all), and local data (for better bandwidth), but > that would be easy to support. Has anyone tried using a swap file mounted from an msdos partition? If that works, then just have the boot disk create a 100MB file in the root of C:, and add a line to autoexec.bat to automagically remove it next boot. Instant local swap that removes itself. C=) -------------------------------------------------------------------------- There is hardly a thing in the world that some man can not make a little worse and sell a little cheaper. -------------------------------------------------------------------------- Caskey /// pager.818.698.2306 TechnoCage Inc. ///| gpg: 1024D/7BBB1485 -------------------------------------------------------------------------- I didn't fight my way to the top of the food chain to be a vegetarian. From thorpe@cerco.ups-tlse.fr Thu, 1 Oct 1998 04:00:55 -0400 Date: Thu, 1 Oct 1998 04:00:55 -0400 From: Simon Thorpe thorpe@cerco.ups-tlse.fr Subject: Beowulf And Digital Signal Processing - Success! >> Texas instrument is selling its (peak) 60MFlop chip TMS320C32 for $9.95, > >But for most people's purposes, they need a system built around it, >not just a cheap chip. > >> The columbia QCD machine has 8192 such hips, connected by a 4d >> serial network(Space+time). > >And the SYSTEM is very expensive. 
I suspect that it is not more
>cost-effective than a commodity cluster, despite being built with $10
>cpus.
>
>> One could easily host this on a Linux
>> box, and even write for it--I believe egcs has a port to this series of
>> chips.(cross-compiler)
>
>That is an area of active interest in the military and other areas:
>boards plugged into a general purpose system. Deep Thought was built
>that way. But ease-of-use, pricing, and keeping up with technology
>trends have always been problems. The issues are the same as with
>"attached processors" or "array processors" from the 1960's and
>1970's.
>
>-- g

One of the possible future developments of the 8 module multiprocessor board that we have been developing with Chaltech (see Kragen's web page http://www.dnaco.net/~kragen/sa-beowulf/) is the idea of producing alternative CPU daughtercards using processors other than StrongARM. For example, I seem to remember Neil Carson talking about the possibility of using SHARC DSP chips.

Anyway, it seems to me that the possibility of just changing the daughterboard to try out new processors in a Beowulf environment is an interesting one which would make it much easier to develop new hardware options.

If there are people interested in this route, then I suggest that you get in contact with Chaltech - they're a very reasonable bunch :-)

Cheers

Simon

__________________
Simon Thorpe
Centre de Recherche Cerveau et Cognition
133, route de Narbonne
31062 Toulouse France
Tel 33 (0)5 62 17 28 03. Fax 33 (0)5 62 17 28 09
http://www.cerco.ups-tlse.fr/private/simon.html
__________________

From doering@iti.mu-luebeck.de Thu, 1 Oct 1998 04:09:14 -0400
Date: Thu, 1 Oct 1998 04:09:14 -0400
From: Andreas C. Doering doering@iti.mu-luebeck.de
Subject: remote boot for embedded board

Hi,

this topic more or less relates to the topics "Beowulf - Single Board Computer" and diskless booting. I want to use an SBC for debugging a hardware project.
I want to run linux on this board and boot it via (fast - if possible) ethernet. Furthermore I want to plug in a Myrinet card. And I need a bunch of digital ports on a suitable connector to interface to my hardware. Thus I need a board with one PCI slot, an ethernet interface which is able to dual boot and enough memory for running Linux. Any suggestions? Andreas --------------------------------------------------------------- Andreas Doering Medizinische Universitaet zu Luebeck Institut fuer Technische Informatik Ratzeburger Allee 160 D-23538 Luebeck Germany Tel.: +49 451 500-3741 Fax: +49 451 500-3687 Email: doering@iti.mu-luebeck.de ---------------------------------------------------------------- From Andre.Landwehr@Bertelsmann.de Thu, 1 Oct 1998 05:49:34 -0400 Date: Thu, 1 Oct 1998 05:49:34 -0400 From: Andre.Landwehr@Bertelsmann.de Andre.Landwehr@Bertelsmann.de Subject: Installing new system over network? > Has anyone tried using a swap file mounted from an msdos > partition? If > that works, then just have the boot disk create a 100MB file > in the root > of C:, and add a line to autoexec.bat to automagically remove it next > boot. Instant local swap that removes itself. I just tried to create (via dd), "mkswap" and "swapon" a file on a FAT-partition on my system, none of the commands threw an error -> it seems to work! Of course you got to mount the FAT-Partition first and then create the swap-file, so I think you can't simply use the fstab-entry and let the rest be done automagically... Maybe you might include the lines in a later called rc-script.. Take care, Andre From caskey@technocage.com Thu, 1 Oct 1998 05:58:35 -0400 Date: Thu, 1 Oct 1998 05:58:35 -0400 From: Caskey L. Dickson caskey@technocage.com Subject: Installing new system over network? On Thu, 1 Oct 1998 Andre.Landwehr@Bertelsmann.de wrote: > > Has anyone tried using a swap file mounted from an msdos > > partition? 
If > > that works, then just have the boot disk create a 100MB file > > in the root > > of C:, and add a line to autoexec.bat to automagically remove it next > > boot. Instant local swap that removes itself. > > I just tried to create (via dd), "mkswap" and "swapon" a file on a > FAT-partition on my system, none of the commands threw an error -> it > seems to work! Of course you got to mount the FAT-Partition first and > then create the swap-file, so I think you can't simply use the > fstab-entry and let the rest be done automagically... Maybe you might > include the lines in a later called rc-script.. I meant to follow up this suggestion with one about also making an xxMB ext2fs file on the DOS FS. This may address the issue of big data sets that get chewed upon while processing. Also, it would allow for a compressed filesystem of a few 'important' tidbits to be either carried on the floppy or loaded once across the network. Ofc, the same junk about injecting "DEL C:\EXT2.FS" in the autoexec.bat file applies. C=) -------------------------------------------------------------------------- There is hardly a thing in the world that some man can not make a little worse and sell a little cheaper. -------------------------------------------------------------------------- Caskey /// pager.818.698.2306 TechnoCage Inc. ///| gpg: 1024D/7BBB1485 -------------------------------------------------------------------------- I didn't fight my way to the top of the food chain to be a vegetarian. From deadline@plogic.com Thu, 1 Oct 1998 07:34:30 -0400 Date: Thu, 1 Oct 1998 07:34:30 -0400 From: Douglas Eadline deadline@plogic.com Subject: Beowulf in a Box (fwd) On Tue, 29 Sep 1998, Robert G. Brown wrote: > On Tue, 29 Sep 1998, Kragen wrote: > > > On Tue, 29 Sep 1998, Robert G. Brown wrote: > > > I wouldn't be surprised if an Intel > > > Human or two listens in on the beowulf list, but for obvious reasons > > > (if one thinks about it) they need to be mousey-quiet.
> > > > Well, I've thought about it. Why do they need to be mousey-quiet? --snip-- > > I'm curious -- Doug (Eadline), do you guys have to pass your turnkey > beowulfs through any kind of customs thing? Have you run up against > export controls, or do you ship your systems as "components" and avoid > the problem? Or do you export your systems at all? We pay very close attention to export controls, although there seems to be no one in the Gov. (that we can find) who can answer questions about MTOPS - millions of theoretical operations per second (how systems are rated). I sometimes think the main goal of bureaucrat workers is to push all questions to another bureaucrat worker - OK so I'm naive. We have to check the export documentation for restricted sites etc. We have not exported any systems yet, although we have several customers who are interested and are looking at our systems. As for the "are they PCs or are they a Beowulf" question, I have no answer. If we ship a rack of dual PII-450s one day and a switch the next to the same customer, is this a supercomputer or is it a network of PCs? (I ask this question with respect to the export laws - we all know that it depends how you use it.) There are restrictions on interconnect technologies, and I suspect that this is where the regulators will focus their efforts. They will have limits on the "box" in terms of MTOPS and limits on the interconnect. For a coarse-grained decryption problem, though, the interconnect limit has no impact. In any case, before we export anything we will always make every attempt to comply with the current laws. Doug ------------------------------------------------------------------- Paralogic, Inc.
| PEAK | Voice:+610.861.6960 115 Research Drive | PARALLEL | Fax:+610.861.8247 Bethlehem, PA 18017 USA | PERFORMANCE | http://www.plogic.com ------------------------------------------------------------------- From shachar@vipe.technion.ac.il Thu, 1 Oct 1998 07:38:49 -0400 Date: Thu, 1 Oct 1998 07:38:49 -0400 From: Shachar Tal shachar@vipe.technion.ac.il Subject: Beowulf And Digital Signal Processing - Success! Hi, On Thu, 1 Oct 1998, Simon Thorpe wrote: > One of the possible future developments of the 8 module multiprocessor > board that we have been developing with Chaltech (see Kragen's web page > http://www.dnaco.net/~kragen/sa-beowulf/ is the idea of producing > alternative CPU daughtercards using processors other than StrongARM. For > example, I seem to remember Neil Carson talking about the possibility of > using SHARC DSP chips. Anyway, it seems to me that the possibility of just > changing the daughterboard to try out new processors in a Beowulf > environment is an interesting one which would make it much easier to > develop new hardware options. I believe such daughterboards (and system design in general) would then need to take into consideration a lot of architectural differences, such as big/little-endian byte order, integer size and FP notation (and other, more complicated issues, such as having a binary for each architecture and process migration between architectures, if that is even feasible). Just my $0.02, Shachar. Shachar Tal ------------- Taub Computer Center, Technion, Israel Institute of Technology KeyID 0481FEF1 fingerprint = 52 1B 97 6A F2 77 AE C6 64 B6 5A 5E 14 28 8E 7E From lindahl@cs.virginia.edu Thu, 1 Oct 1998 07:49:18 -0400 Date: Thu, 1 Oct 1998 07:49:18 -0400 From: Greg Lindahl lindahl@cs.virginia.edu Subject: Mixed hardware (fwd) > With regard to data transmission, big/little endian problems. This has > been addressed by the little used ASN.1 protocol developed originally > in 1990 for the OSI networking model.
It's also automatically dealt with in systems such as PVM and Legion. Isn't ASN.1 that protocol which essentially involves converting everything into text and back? Yech. PVM uses xdr, and Legion uses receiver-makes-right, so no transformation is required if the sender and receiver happen to be of the same type. -- g From efinch@cais.com Thu, 1 Oct 1998 07:52:01 -0400 Date: Thu, 1 Oct 1998 07:52:01 -0400 From: Ed Finch efinch@cais.com Subject: Installing new system over network? "Huntress, Gary B." wrote: > I would greatly appreciate it if someone could summarize the > steps necessary for a network installation (or maybe even just a URL). > In the words of Paul Reiser "...talk to me like I'm four..." I thought the Extreme Linux CD supported a cookie-cutter installation where you 1) created a floppy with your install preferences, 2) booted the new nodes from the floppy and 3) completed the install via FTP. Regards, Ed -- Q: Why do PCs have a reset button on the front? A: Because they are expected to run Microsoft operating systems. From kragen@pobox.com Thu, 1 Oct 1998 07:59:57 -0400 Date: Thu, 1 Oct 1998 07:59:57 -0400 From: Kragen kragen@pobox.com Subject: Beowulf in a Box (fwd) On Thu, 1 Oct 1998, Douglas Eadline wrote: > There are restrictions on interconnect technologies. > And I suspect that this where the regulators will > focus their efforts. It could be a rather bad thing if they decided that exporting PCI buses was illegal. :) Kragen -- Kragen Sitaker A well designed system must take people into account. . . . It's hard to build a system that provides strong authentication on top of systems that can be penetrated by knowing someone's mother's maiden name. -- Schneier From kragen@pobox.com Thu, 1 Oct 1998 08:11:12 -0400 Date: Thu, 1 Oct 1998 08:11:12 -0400 From: Kragen kragen@pobox.com Subject: Beowulf And Digital Signal Processing - Success! 
On Thu, 1 Oct 1998, Simon Thorpe wrote: > One of the possible future developments of the 8 module multiprocessor > board that we have been developing with Chaltech (see Kragen's web page > http://www.dnaco.net/~kragen/sa-beowulf/ I'd rather people published the URL http://www.pobox.com/~kragen/sa-beowulf/ instead. > is the idea of producing > alternative CPU daughtercards using processors other than StrongARM. For > example, I seem to remember Neil Carson talking about the possibility of > using SHARC DSP chips. Anyway, it seems to me that the possibility of just > changing the daughterboard to try out new processors in a Beowulf > environment is an interesting one which would make it much easier to > develop new hardware options. Well, the hardware part is the easy part. It's getting software to run on new and interesting hardware that makes it worthwhile. Getting C code to run on the StrongARMs is pretty simple; you boot NetBSD on one, NFS mount your disk, and use gcc to build an executable. Getting the same code to run on a DSP could be a lot more difficult: first of all, I'm not at all certain that gcc even supports any DSPs, so you'd probably have to use the vendor's compiler (which is generally expensive); second, you can't run Unix on them, because they don't have MMUs (at least, the ones I looked at the datasheets for yesterday don't have MMUs -- do any DSPs?) so you have to run some kind of embedded OS; third, while you'll probably get some performance gain from the fast multiply units and VLIW architectures that seem to be popular these days, you won't get any performance gain from the hardware FFT acceleration unless your compiler puts in code that uses the FFT instructions. The TigerSHARC board ran special-purpose code for a raytracer. Kragen -- Kragen Sitaker A well designed system must take people into account. . . . 
It's hard to build a system that provides strong authentication on top of systems that can be penetrated by knowing someone's mother's maiden name. -- Schneier From lindahl@cs.virginia.edu Thu, 1 Oct 1998 08:17:38 -0400 Date: Thu, 1 Oct 1998 08:17:38 -0400 From: Greg Lindahl lindahl@cs.virginia.edu Subject: Beowulf And Digital Signal Processing - Success! > One of the possible future developments of the 8 module multiprocessor > board that we have been developing with Chaltech (see Kragen's web page > http://www.dnaco.net/~kragen/sa-beowulf/ is the idea of producing > alternative CPU daughtercards using processors other than StrongARM. For > example, I seem to remember Neil Carson talking about the possibility of > using SHARC DSP chips. If the SHARC can run Unix, then great -- you can provide a software environment that looks like a cluster of unix machines. If SHARC can't run Unix, then those boards won't look anything like the StrongARM-based boards, from a software viewpoint. > Anyway, it seems to me that the possibility of just > changing the daughterboard to try out new processors in a Beowulf > environment is an interesting one which would make it much easier to > develop new hardware options. Yes, although it's been done before and wasn't a great success. However, keeping your design fingers in several pots is a good way to make sure you don't wake up tomorrow to find that your product is obsolete and you have to start over. The trick is making sure that your software environment changes slowly. -- g From kragen@pobox.com Thu, 1 Oct 1998 08:43:21 -0400 Date: Thu, 1 Oct 1998 08:43:21 -0400 From: Kragen kragen@pobox.com Subject: Installing new system over network? On Thu, 1 Oct 1998, Caskey L. Dickson wrote: > On Thu, 1 Oct 1998 Andre.Landwehr@Bertelsmann.de wrote: > > > Has anyone tried using a swap file mounted from an msdos > > > partition?
If > > > that works, then just have the boot disk create a 100MB file > > > in the root > > > of C:, and add a line to autoexec.bat to automagically remove it next > > > boot. Instant local swap that removes itself. > > . . . > > Ofc, the same junk about injecting "DEL C:\EXT2.FS" in the autoexec.bat > file applies. Don't do this if you run dosemu. :) Kragen -- Kragen Sitaker A well designed system must take people into account. . . . It's hard to build a system that provides strong authentication on top of systems that can be penetrated by knowing someone's mother's maiden name. -- Schneier From STU2997@atuvm.atu.edu Thu, 1 Oct 1998 10:19:46 -0400 Date: Thu, 1 Oct 1998 10:19:46 -0400 From: HORTON, DEREK K STU2997@atuvm.atu.edu Subject: netboot (what is netboot) Hello Not to sound stupid but what does the netboot package do? Thanks for your help Derek Horton stu2997@atuvm.atu.edu From cbohn@afit.af.mil Thu, 1 Oct 1998 11:11:48 -0400 Date: Thu, 1 Oct 1998 11:11:48 -0400 From: Bohn, Christopher A. cbohn@afit.af.mil Subject: Configuration difficulties Good day, Well, I just wanted to repeat my thanks to everybody -- I've got the system recognizing memory beyond 64MB; I've got all the swap space I wanted. And I've even isolated the problem with the network gateway. By talking to cooperative system administrators for a couple of the UNIX LANs here, we "snooped" and pinged/tracerouted/telneted several ways from Linux & NT, and determined that for some reason when the gateway forwarded a packet from one of the other NT boxes, it translated the "local" IP address (172.16.0.x) into the "world-visible" IP address (129.92.108.240), but when it forwarded IP packets from a Linux box, it forwarded the packet without modifying the "from" field -- so the targeted system was receiving the packets, but it didn't know where to send the reply. 
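To make the symptom concrete, here is a rough toy model in Python (an illustration only, not the gateway's actual configuration). The 172.16.0.x range and 129.92.108.240 come from the description above; the host numbers .5 and .6, the outside address 192.0.2.10, and all function names are made up for the example.

```python
from dataclasses import dataclass, replace

PRIVATE_PREFIX = "172.16."      # the "local" range from the message above
PUBLIC_ADDR = "129.92.108.240"  # the gateway's world-visible address


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str


def forward(packet, translate):
    """Forward a packet outward, rewriting a private source only if asked to."""
    if translate and packet.src.startswith(PRIVATE_PREFIX):
        return replace(packet, src=PUBLIC_ADDR)
    return packet


def reply_is_routable(packet):
    """The far end can only answer a source address it can route back to."""
    return not packet.src.startswith(PRIVATE_PREFIX)


remote = "192.0.2.10"  # hypothetical outside host

# Hypothetical NT box: the gateway rewrites the "from" field.
from_nt = forward(Packet("172.16.0.5", remote), translate=True)
# Hypothetical Linux box: the packet goes out with its private source intact.
from_linux = forward(Packet("172.16.0.6", remote), translate=False)

print(from_nt.src, reply_is_routable(from_nt))
print(from_linux.src, reply_is_routable(from_linux))
```

In this model the untranslated packet still reaches the outside host, but the reply would be addressed to a 172.16.0.x source that means nothing outside the local net, which matches what the snooping showed.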
When the NT admin gets back from vacation or wherever he's run off to for the last couple weeks, I'm going to have to talk to him to see how we can resolve this. Thanks again, take care, cb *-*-*-*-*-*-*-* Capt Christopher A. Bohn Graduate Student, Electrical (digital) Engineering Air Force Institute of Technology Phone (937)255-3636 (DSN 785) AFIT/EN638 Lab x4606 Voicemail x6638 2950 P St, Box 4638 email cbohn@afit.af.mil Wright-Patterson AFB OH 45433-7765 EngrBohn@aol.com http://members.aol.com/EngrBohn/ *-*-*-*-*-*-*-* From rgb@phy.duke.edu Thu, 1 Oct 1998 11:20:28 -0400 Date: Thu, 1 Oct 1998 11:20:28 -0400 From: Robert G. Brown rgb@phy.duke.edu Subject: Beowulf in a Box (fwd) On Thu, 1 Oct 1998, Kragen wrote: > On Thu, 1 Oct 1998, Douglas Eadline wrote: > > There are restrictions on interconnect technologies. > > And I suspect that this where the regulators will > > focus their efforts. > > It could be a rather bad thing if they decided that exporting PCI buses > was illegal. :) This would be funnier if it weren't so likely to be true...quite soon, actually. PCI-2 at 4 Gbps-500 MB/sec bandwidth as the basis for a beowulf made up of a bunch of alpha SBC's would "do" nicely, I think. PCI-2 with multiple fast channels (Myrinet or GBE) might do it with commodity parts. It won't be long before the unstoppable force (Moore's Law) hits the immovable object (stupidity). As always, we should expect chaos (what the Chinese call "interesting times") at the interface. rgb Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From n5jxs@tamu.edu Thu, 1 Oct 1998 11:28:16 -0400 Date: Thu, 1 Oct 1998 11:28:16 -0400 From: Gerry Creager n5jxs@tamu.edu Subject: System reconfig time I seem to have screwed up something in my configuration, and now neither MPICH nor PVM runs. I've a small cluster of 6 machines.
5 have identical configuration and file structure, while the 6th has a "unique" file structure (more disk, more real use, worldly node...) but the PVM and MPICH paths are common. When I try to use RSH, the remote node can't find the appropriate path. If I do an 'rsh set' the path only points to . If I do an 'rsh echo $PATH' it gives the same path I get if I actually log in. I am now officially confused. I thought rsh read .profile, but that's apparently not the case. I'm open to suggestions. Thanks, gerry creager Mapping Sciences Laboratory Texas A&M University From hendriks@cesdis1.gsfc.nasa.gov Thu, 1 Oct 1998 11:36:08 -0400 Date: Thu, 1 Oct 1998 11:36:08 -0400 From: Erik Arjan Hendriks hendriks@cesdis1.gsfc.nasa.gov Subject: netboot (what is netboot) On Thu, 1 Oct 1998, HORTON, DEREK K wrote: > Not to sound stupid but what does the netboot package do? The netboot package provides boot code that uses BOOTP and TFTP to download a kernel image. Basically, it performs the same function as LILO (getting a kernel image into RAM and executing it) except that netboot gets the image via a network instead of from a hard disk. The netboot package will provide you with images you can put on an EEPROM on a network card, on a floppy disk and one or two other devices. (Flashcards?) It may also be possible to flash the boot ROM image into your system BIOS. (This is what I've done on our cluster here.) - Erik ------------------------------------------------------------ Erik Hendriks hendriks@cesdis.gsfc.nasa.gov From philm@uidaho.edu Thu, 1 Oct 1998 11:58:24 -0400 Date: Thu, 1 Oct 1998 11:58:24 -0400 From: Philip J. Matheson philm@uidaho.edu Subject: Kiva-3 on a Beowulf? Hi, I'm interested in running this software (Kiva-3) on a beowulf system. I'm wondering if anyone here has had experience with Kiva or if anyone can tell me if this application is suited for running on a beowulf.
The website (http://gopher.ccs.ornl.gov/ccii/enginesim.html) mentions PVM/MPI but says the code is optimized for running on Intel Paragon Supercomputers. I'm not sure what this means for me: will I have to completely rewrite code or will I just need to "optimize" the code to run on a beowulf? I'm assuming here that I'll have access to the code... Thanks, -- Philip J. Matheson Software Engineer / System Administrator National Institute for Advanced Transportation Technology philm@uidaho.edu (208)-883-9656 From johns@fishnet.caltech.edu Thu, 1 Oct 1998 12:07:48 -0400 Date: Thu, 1 Oct 1998 12:07:48 -0400 From: John Salmon johns@fishnet.caltech.edu Subject: sqrt 10x slower in 2.0.7-19 than 2.0.6-9 ??? I am encountering some very strange performance numbers with sqrt in the glibc math library on the alpha. Executive summary: sqrt from the glibc-2.0.6-9.alpha.rpm appears to be over 10 times faster than sqrt in glibc-2.0.7-19.alpha.rpm. Details: I have two machines on which I can run tests. Both are 533 MHz EV56 systems. The primary hardware difference is that one is an LX164 and the other is an SX164. I don't think that's relevant to the problem, but 'full disclosure' is always a good idea. /proc/cpuinfo is attached at the end of this note. Machine 1 (LX164) has RH5.1 and glibc-2.0.7-19 installed from the updates rpm. Machine 2 (SX164) has RH5.0 and glibc-2.0.6-9 installed from an rpm (source unknown, possibly from updates, but it no longer exists at ftp.redhat.com). A very simple code that times a loop of 1 million calls to sqrt is attached. I compiled it with -static on both platforms and then ran the resulting executables on both platforms. On both platforms, the glibc-2.0.6-9 code runs nearly eleven times faster. For example: [johns@paranal junk]$ sqrtloop.static.06 1000000 sqrts in 0.1952 sec, 5.12295e+06 per sec [johns@paranal junk]$ sqrtloop.static.07 1000000 sqrts in 2.13549 sec, 468277 per sec [johns@paranal junk]$ Compiling without -static gives consistent results.
I.e., the machine with the 2.0.6 dynamic library is faster by about a factor of 10, regardless of where the compilation was performed. So the question is: Why is the 2.0.7 sqrt so much slower than the 2.0.6?? I thought I'd take a look at glibc-2.0.6-9.src.rpm to try to find out. I can't find it anywhere! (HotBot, ftp.redhat.com). Does anybody have an archived copy? Thanks for your help, John Salmon Appendix 1: C source code to sqrtloop.c [johns@paranal junk]$ cat sqrtloop.c
---------------------------
#include <math.h>   /* sqrt */
#include <stdio.h>  /* printf */
#include <time.h>   /* clock, CLOCKS_PER_SEC */

#define NTIMES 1000000

int main(int argc, char **argv){
  int i = NTIMES;
  double x = 0.7;
  clock_t tstart, tend;
  double seconds;

  tstart = clock();
  while(i--){  /* exactly NTIMES calls to sqrt */
    x = sqrt(x) + 0.5;
  }
  tend = clock();
  seconds = ((double)(tend-tstart))/CLOCKS_PER_SEC;
  printf("%d sqrts in %g sec, %g per sec\n", NTIMES, seconds, NTIMES/seconds);
  return 0;
}
-----------------------------
Appendix 2: /proc/cpuinfo details Machine 1 (glibc-2.0.7-19.rpm) [johns@paranal junk]$ cat /proc/cpuinfo cpu : Alpha cpu model : EV56 cpu variation : 0 cpu revision : 0 cpu serial number : Linux_is_Great! system type : EB164 system variation : LX164 system revision : 0 system serial number : MILO-0000 cycle frequency [Hz] : 0 timer frequency [Hz] : 1024.00 page size [bytes] : 8192 phys. address bits : 40 max. addr. space # : 127 BogoMIPS : 530.57 kernel unaligned acc : 0 (pc=0,va=0) user unaligned acc : 134 (pc=155556b77a0,va=11ffff8c8) platform string : N/A Machine 2 (glibc-2.0.6-9.rpm): [johns@avalon2 libm]$ cat /proc/cpuinfo cpu : Alpha cpu model : EV56 cpu variation : 0 cpu revision : 0 cpu serial number : Linux_is_Great! system type : EB164 system variation : SX164 system revision : 0 system serial number : MILO-0000 cycle frequency [Hz] : 0 timer frequency [Hz] : 1024.00 page size [bytes] : 8192 phys. address bits : 40 max. addr.
space # : 127 BogoMIPS : 528.48 kernel unaligned acc : 0 (pc=0,va=0) user unaligned acc : 220 (pc=155556dc6b0,va=11ffffc40) platform string : N/A From aj@arthur.rhein-neckar.de Thu, 1 Oct 1998 17:01:32 -0400 Date: Thu, 1 Oct 1998 17:01:32 -0400 From: Andreas Jaeger aj@arthur.rhein-neckar.de Subject: sqrt 10x slower in 2.0.7-19 than 2.0.6-9 ??? >>>>> John Salmon writes: > I am encountering some very strange performance numbers with sqrt in > the glibc math library on the alpha. > Executive summary: sqrt from the glibc-2.0.6-9.alpha.rpm appears to be > over 10 times faster than sqrt in glibc-2.0.7-19.alpha.rpm. > [...] > So the question is: > Why is the 2.0.7 sqrt so much slower than the 2.0.6?? > [...] Have a look at PR libc/423 (via http://www-gnats.gnu.org:8080/cgi-bin/wwwgnats.pl) and check exactly why the following fix has been made (ignore the timestamp, the fix is not in 2.0.6): 1997-09-04 13:19 Richard Henderson * sysdeps/alpha/w_sqrt.S: Removed. * sysdeps/alpha/fpu/e_sqrt.c: New. Obey -mieee and -mieee-with-inexact and build a version that is as fast as possible given the constraint. [PR libc/423]. In a nutshell: The code is slower - but it gives better (exacter) results. Andreas -- Andreas Jaeger aj@arthur.rhein-neckar.de jaeger@informatik.uni-kl.de for pgp-key finger ajaeger@aixd1.rhrk.uni-kl.de From pedward@sun4.apsoft.com Thu, 1 Oct 1998 17:40:38 -0400 Date: Thu, 1 Oct 1998 17:40:38 -0400 From: Perry Harrington pedward@sun4.apsoft.com Subject: sqrt 10x slower in 2.0.7-19 than 2.0.6-9 ??? Have you compared the unaligned accesses of before/after for each of the machines? It's possible that the newer glibc code has an unaligned access that is being emulated by the PAL code. --Perry > ----------------------------- > > Appendix 2: /proc/cpuinfo details > > Machine 1 (glibc-2.0.7-19.rpm) > [johns@paranal junk]$ cat /proc/cpuinfo > cpu : Alpha > cpu model : EV56 > cpu variation : 0 > cpu revision : 0 > cpu serial number : Linux_is_Great! 
> system type : EB164 > system variation : LX164 > system revision : 0 > system serial number : MILO-0000 > cycle frequency [Hz] : 0 > timer frequency [Hz] : 1024.00 > page size [bytes] : 8192 > phys. address bits : 40 > max. addr. space # : 127 > BogoMIPS : 530.57 > kernel unaligned acc : 0 (pc=0,va=0) > user unaligned acc : 134 (pc=155556b77a0,va=11ffff8c8) > platform string : N/A > > Machine 2 (glibc-2.0.6-9.rpm): > [johns@avalon2 libm]$ cat /proc/cpuinfo > cpu : Alpha > cpu model : EV56 > cpu variation : 0 > cpu revision : 0 > cpu serial number : Linux_is_Great! > system type : EB164 > system variation : SX164 > system revision : 0 > system serial number : MILO-0000 > cycle frequency [Hz] : 0 > timer frequency [Hz] : 1024.00 > page size [bytes] : 8192 > phys. address bits : 40 > max. addr. space # : 127 > BogoMIPS : 528.48 > kernel unaligned acc : 0 (pc=0,va=0) > user unaligned acc : 220 (pc=155556dc6b0,va=11ffffc40) > platform string : N/A > -- Perry Harrington Linux rules all OSes. APSoft () email: perry@apsoft.com Think Blue. /\ From johns@cacr.caltech.edu Thu, 1 Oct 1998 19:42:56 -0400 Date: Thu, 1 Oct 1998 19:42:56 -0400 From: John Salmon johns@cacr.caltech.edu Subject: sqrt 10x slower in 2.0.7-19 than 2.0.6-9 ??? >>>>> John Salmon writes: > I am encountering some very strange performance numbers with sqrt in > the glibc math library on the alpha. > Executive summary: sqrt from the glibc-2.0.6-9.alpha.rpm appears to be > over 10 times faster than sqrt in glibc-2.0.7-19.alpha.rpm. > [...] > So the question is: > Why is the 2.0.7 sqrt so much slower than the 2.0.6?? > [...] Andreas Jaeger wrote: > 1997-09-04 13:19 Richard Henderson > > * sysdeps/alpha/w_sqrt.S: Removed. > * sysdeps/alpha/fpu/e_sqrt.c: New. Obey -mieee and -mieee-with-inexact > and build a version that is as fast as possible given the constraint. > [PR libc/423]. > > > In a nutshell: The code is slower - but it gives better (exacter) > results. Yup! That's it. 
If I read the #ifdef's and comments correctly, the newer library always uses a bit-by-bit purely-integer algorithm. There must be a better way. A factor of 10 in all cases seems like a high price to pay so that extreme input values are handled correctly. Is it known why the Reciproot algorithm fails (only?) on alphas? The problem-report indicates that the problem was observed for extreme argument values (2^-1002, I think). Is there any possibility of introducing a test for input values in the problem-range, but allowing the common case to be computed with one of the faster algorithms? Thanks again, John Salmon From efinch@cais.com Thu, 1 Oct 1998 21:01:21 -0400 Date: Thu, 1 Oct 1998 21:01:21 -0400 From: Ed Finch efinch@cais.com Subject: Beowulf in a Box (fwd) Douglas Eadline wrote: > > On Tue, 29 Sep 1998, Robert G. Brown wrote: > > If we ship > a rack of dual PII-450s one day and switch the next to the > same customer, is this a supercomputer or is it net work of PC's? > (I ask this question with respect to the export laws - we all > know that it depends how you use it.) Hypothetical question: What if someone builds a Beowulf cluster and makes it accessible to the world? Are there restrictions on the use/availability of compute time? Regards, Ed -- Q: Why do PCs have a reset button on the front? A: Because they are expected to run Microsoft operating systems. From jcthomso@mtu.edu Thu, 1 Oct 1998 21:59:46 -0400 Date: Thu, 1 Oct 1998 21:59:46 -0400 From: Jimmy C. Thomson jcthomso@mtu.edu Subject: cd mirrors Does anyone know where there are mirrors for ftp://beowulf.gsfc.nasa.gov/mirror/extreme_linux/ I am planning on burning a cd and this site doesn't support the .tar extention on the end of extreme_linux/ Thanks, Jimmy From rgb@phy.duke.edu Thu, 1 Oct 1998 23:04:57 -0400 Date: Thu, 1 Oct 1998 23:04:57 -0400 From: Robert G. 
Brown rgb@phy.duke.edu Subject: System reconfig time On Thu, 1 Oct 1998, Gerry Creager wrote: > When I try to use RSH, the remote node can't find the appropriate path. > If I do an 'rsh set' the path only points to > . If I do an 'rsh echo > $PATH' it gives the same path I get if I actually log in. > > I am now officially confused. I thought rsh read .profile, but that's > apparently not the case. Stupid question, but I assume that the default shell in question is /bin/sh? If it were csh or tcsh, the difference could be because of something screwing up asymmetrically in .cshrc vs .login. Or it could be a screwup in code inside one of those clever little if (! $?prompt ) conditionals, that are only executed (or not executed:-) when you are running a remote shell. rgb Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From jacek@usq.edu.au Fri, 2 Oct 1998 01:38:55 -0400 Date: Fri, 2 Oct 1998 01:38:55 -0400 From: Jacek Radajewski jacek@usq.edu.au Subject: bWatch now auto refreshes Erik Cumps has provided two patches for bWatch. First patch allows to view uptime of each node, and the other provides auto-refresh. Both of these patches will be included in new releases. ftp://ftp.sci.usq.edu.au/pub/jacek/bWatch/contrib Jacek
From prachya@science.gmu.edu Fri, 2 Oct 1998 06:03:28 -0400 Date: Fri, 2 Oct 1998 06:03:28 -0400 From: Prachya Chalermwat prachya@science.gmu.edu Subject: Swap Size Hi, Is the 64MB swap size is OK for the machine with 128 MB RAM? The RH5.1 disk partition program does not allow me to create more than 64MB swap. --Prachya --------------------------------------------------------------------------- Prachya Chalermwat George Mason University Graduate Research Assistant (703) 993-4322 Computational Sciences and Informatics (FAX) 993-1980 George Mason University Email: prachya@science.gmu.edu --------------------------------------------------------------------------- URL: http://spaceops.science.gmu.edu "Imagination is more important than knowledge." A. Einstein From deadline@plogic.com Fri, 2 Oct 1998 06:56:39 -0400 Date: Fri, 2 Oct 1998 06:56:39 -0400 From: Douglas Eadline deadline@plogic.com Subject: Beowulf in a Box (fwd) On Thu, 1 Oct 1998, Ed Finch wrote: > Douglas Eadline wrote: > > > > On Tue, 29 Sep 1998, Robert G. Brown wrote: > > > > If we ship > > a rack of dual PII-450s one day and switch the next to the > > same customer, is this a supercomputer or is it net work of PC's? > > (I ask this question with respect to the export laws - we all > > know that it depends how you use it.) > > Hypothetical question: What if someone builds a Beowulf cluster > and makes it accessible to the world? Are there restrictions on > the use/availability of compute time?
> I assume there are some restrictions in the US. Or at least you would probably be visited by some people who would consider the public availability of such a system not a good idea. However, a freely available resource would also mean the user would be showing his "cards" so it might deter "open" computing of secret things. Doug ------------------------------------------------------------------- Paralogic, Inc. | PEAK | Voice:+610.861.6960 115 Research Drive | PARALLEL | Fax:+610.861.8247 Bethlehem, PA 18017 USA | PERFORMANCE | http://www.plogic.com ------------------------------------------------------------------- From cbohn@afit.af.mil Fri, 2 Oct 1998 07:09:46 -0400 Date: Fri, 2 Oct 1998 07:09:46 -0400 From: Capt Bohn, Christopher A. cbohn@afit.af.mil Subject: Swap Size I recently dealt with a similar issue, and the group pointed out a couple of different ways to deal with it, including creating multiple swap partitions (e.g., 64MB+64MB=128MB). Take care, cb *-*-*-*-*-*-*-* Capt Christopher A. Bohn Graduate Student, Electrical (digital) Engineering Air Force Institute of Technology Phone (937)255-3636 (DSN 785) AFIT/EN638 Lab x4606 Voicemail x6638 2950 P St, Box 4638 email cbohn@afit.af.mil Wright-Patterson AFB OH 45433-7765 EngrBohn@aol.com http://members.aol.com/EngrBohn/ *-*-*-*-*-*-*-* -----Original Message----- From: owner-beowulf@beowulf.gsfc.nasa.gov [mailto:owner-beowulf@beowulf.gsfc.nasa.gov]On Behalf Of Prachya Chalermwat Sent: Friday, October 02, 1998 6:03 AM To: beowulf@beowulf.gsfc.nasa.gov Subject: Swap Size Hi, Is the 64MB swap size is OK for the machine with 128 MB RAM? The RH5.1 disk partition program does not allow me to create more than 64MB swap.
--Prachya --------------------------------------------------------------------------- Prachya Chalermwat George Mason University Graduate Research Assistant (703) 993-4322 Computational Sciences and Informatics (FAX) 993-1980 George Mason University Email: prachya@science.gmu.edu --------------------------------------------------------------------------- URL: http://spaceops.science.gmu.edu "Imagination is more important than knowledge." A. Einstein From gordg@caprice.mb.ca Fri, 2 Oct 1998 07:53:47 -0400 Date: Fri, 2 Oct 1998 07:53:47 -0400 From: gordon grieder gordg@caprice.mb.ca Subject: Freeowulf (was Re: Beowulf in a Box (fwd)) Douglas Eadline wrote: > > On Thu, 1 Oct 1998, Ed Finch wrote: > > > Douglas Eadline wrote: > > Hypothetical question: What if someone builds a Beowulf cluster > > and makes it accessible to the world? Are there restrictions on > > the use/availability of compute time? > > > I assume there are some restrictions in the US. Or at least > you would probably be visited by some people who would > consider the public availability of such a system not > a good idea. I was thinking of doing exactly this, which is why I'm lurking on the list. Canada's export laws aren't quite as draconian as those down south in the US. I live comfortably thanks to these machines we all use and love and was planning to have a small open Beowulf so that people without the resources for such a beast would be able to access this at no charge. FWIW, I'm planning on calling it "Freeowulf" > However, a freely available resource would also mean the user > would be showing his "cards" so it might deter "open" computing > of secret things. Perhaps, but a student in a less 'wired' or technologically progressed country may be able to prove his/her thesis with a small cluster. There's no reason that such potential in a person should be wasted because of a line on a map. 
-- gordon grieder phone 204.775.4473 caprice distributors ltd fax 204.772.9548 winnipeg, mb, canada From n5jxs@tamu.edu Fri, 2 Oct 1998 08:38:00 -0400 Date: Fri, 2 Oct 1998 08:38:00 -0400 From: Gerry Creager n5jxs@tamu.edu Subject: System reconfig time Robert G. Brown wrote: > > On Thu, 1 Oct 1998, Gerry Creager wrote: > > > When I try to use RSH, the remote node can't find the appropriate path. > > If I do an 'rsh set' the path only points to > > . If I do an 'rsh echo > > $PATH' it gives the same path I get if I actually log in. > > > > I am now officially confused. I thought rsh read .profile, but that's > > apparently not the case. > > Stupid question, but I assume that the default shell in question is > /bin/sh? If it were csh or tcsh, the difference could be because of > something screwing up asymmetrically in .cshrc vs .login. Or it could > be a screwup in code inside one of those clever little > > if (! $?prompt ) > > conditionals, that are only executed (or not executed:-) when you are > running a remote shell. I considered that. The default shell is tcsh. I looked at /etc/csh.cshrc and /etc/csh.login, made appropriate changes to put the path construction outside the conditional, and tried it again. No joy. Maybe I should chsh back to /bin/sh? Thanks, gerry From rgb@phy.duke.edu Fri, 2 Oct 1998 09:19:57 -0400 Date: Fri, 2 Oct 1998 09:19:57 -0400 From: Robert G. Brown rgb@phy.duke.edu Subject: Swap Size On Fri, 2 Oct 1998, Prachya Chalermwat wrote: > Hi, > > Is the 64MB swap size is OK for the machine with 128 MB RAM? The RH5.1 > disk partition program does not allow me to create more than 64MB swap. So make swapfiles in an ext2 partition. See man mkswap. Also, see the enclosed script below (second or third time I've sent this out this week, is that weird or what). I'm surprised that RH only allows 64 MB swap partitions -- the linux limit is 128 MB per partition or file. There are various rituals for determining the ideal amount of swap. 
Some of them involve black roosters and robes. Others a spinner. Seriously, the way to think of swap is as an extension of your physical memory on a system. That is, the total size of stuff you can run is determined not by your "real" memory, but by the sum of real memory plus swap, which is the size of your "virtual memory". Actually, your virtual memory is even larger than this because the system is smart enough to know when a memory page in a running process comes from a particular inode on disk, and instead of swapping a redundant copy of this page it just reloads the page from disk. So you can think of VM as something like: Real Memory + Swap + Page Memory (refreshed from disk images). Page memory is somewhat variable in size and can even vary in size for a given process -- if you run a program the system will page to and from the program if/when real memory is exhausted, but if you delete the running program while the job is running (and things work right -- this has not always worked for me on various operating systems;-) the program's pages have to be read into real memory in case they are needed again. If RM gets full, these pages are then swapped. If you are running a beowulf (or really ANY system that does a lot of numerical work) then you should know that swapping is baaaaad. Disk (even the very fastest and bestest of disks) is slooooow compared to memory, and if your task has to go to that particular well a lot, a 10 minute job can easily turn into a 10 hour job. Indeed, one reason to set up a beowulf is that by partitioning certain jobs that would swap if run on a single host, you can avoid swapping at the expense of some network IPC's. If you have a fast network, this is a worthwhile tradeoff, although disks have become fast enough that this may no longer be a winning strategy if your network is relatively slow. Still, it is a good idea to have a bit of swap because it makes the system overall run more efficiently.
Suppose that you are running your system close to the wire, but are careful not to actually exhaust physical memory. You might still be surprised to see a bit of swap being used. What happened? The system uses physical memory for a lot of things. In particular, it uses all the space it can reasonably grab to buffer disk access, to cache data and libraries, and stuff like that. If you like, the kernel AUTOMATICALLY builds a ramdisk and whenever possible services a request for disk data from the ramdisk instead of the actual disk. Memory fast, disk slow -- this is one reason linux is very, very pleasant to use interactively, even on a loaded system. The system constantly resizes this transparent virtual storage medium, releasing old/unused stuff and filling it with new stuff, trying to keep a certain amount of it "free" and ready to be released QUICKLY to programs that request it with a malloc. Overall performance can easily be seen to suffer when the system can no longer buffer the disk with this memory -- it has to go to disk too often. SO, the system has the bright idea of writing out to swap some of the pages that it cannot refresh from disk but that have been idle for a long time. It can refresh them from swap fairly efficiently, because it doesn't have to check all sorts of stat information like it does for stuff in the filesystem. This lets it use whatever real memory it has for more important and volatile things, increasing overall performance. Finally, having swap is good if you have programs that SOMETIMES exceed physical + page memory. Without swap, a program simply fails (depending on how robustly it is written:-) when it tries to malloc VM that ain't there, and one can lose a LOT more time from failure than from running slow by swapping, especially if it is a Poisson process type thing and only swaps for a few seconds while three programs push it over the line and then goes back to cool running when one of them terminates or frees a big temporary structure.
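For anyone who wants to put numbers on the accounting above, here is a minimal sketch that adds real memory and swap to get the size of "virtual memory" (leaving out the page memory that can be reloaded from disk images, since that varies). It is only an illustration: the function name vm_total_kb is made up, and it assumes a meminfo-style file with MemTotal: and SwapTotal: lines in kB, as /proc/meminfo provides on Linux.

```shell
#!/bin/sh
# Sketch only: vm_total_kb is a hypothetical helper, not part of any package.
# It reads a meminfo-style file (default: /proc/meminfo) and prints
# MemTotal + SwapTotal in kB, i.e. real memory plus swap.
vm_total_kb() {
    awk '$1 == "MemTotal:"  { mem = $2 }
         $1 == "SwapTotal:" { swap = $2 }
         END { print mem + swap }' "${1:-/proc/meminfo}"
}

# Usage:
#   vm_total_kb              (reads /proc/meminfo)
#   vm_total_kb saved.file   (reads a saved copy)
```

With no argument the function reads the live /proc/meminfo; pointing it at a saved copy makes it easy to compare nodes without logging in to each one.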
Now, how much swap is "enough"? The ritual answer is "one to two times real memory". Sun used to swear that you needed at least 2x real memory, but then they used to swear this back when average real memory size was 4-16 MB and the OS needed 4-8 all by itself. As an old guy (by computing standards, anyway:-) I cannot bring myself to give a system less than 1x real memory, but on a linux box with 128 MB of main memory and processes that, on the average, consume no more than half that, 64 is probably MORE than enough. If you tend to run close to the line, sometimes dipping into VM (so that when you check "free" you see that some swap has been consumed) you should probably go up to 1x RM. More is appropriate if your system is small -- 2x RM is reasonable if all you have is 32 or 64. Less is reasonable if your system is big and not overloaded -- I REALLY doubt that I need 512 MB of swap on a 512 MB system but being paranoid and conservative I provide it anyway. I seem to recall discussion that suggests that it is possible to provide "too much" swap. I don't know if this is true or not -- kernel experts might comment -- but I'd avoid going to extremes like 4x RM or more. First of all, if you actually USE 4x RM in a running calculation on a system with (say) 512 MB of RM, it will probably be Christmas of NEXT year before it finishes (figuratively speaking). Then, having 16 128 MB swapfiles might "stress" the kernel as it tries to manage all those pages. That is, if it can handle that many -- there probably is a limit, but I don't know what it is because it is almost certainly irrelevantly large. rgb Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu
%< Snip Snip ============================================================
#!/bin/sh
# This is a FRAGMENT of a longer script, not a script.  If you want to use
# it, edit it so that it works for you!  Otherwise, it is a "recipe" for
# what to do at the command line...  /swap doesn't have to exist or
# anything -- you can do this in any ext2 partition with enough room.

# Now we make four swapfiles in the /swap partition
echo "Now we make four swapfiles in the ext2 /swap partition"
# /swap should already exist and not be mounted on diskless systems.
mount /dev/sda2 /swap
if [ ! $? = 0 ]; then
  cat << END
Could not mount /dev/sda2 on /swap!  Clean up and try again.
END
  exit 1
fi

for SWAP in swap1 swap2 swap3 swap4
do
  echo "Making /swap/$SWAP"
  # 130752 blocks of 1024 bytes keeps each file just under 128 MB
  dd if=/dev/zero of=/swap/$SWAP bs=1024 count=130752
  mkswap -c /swap/$SWAP 130752
  if [ ! $? = 0 ]; then
    cat << END
mkswap -c /swap/$SWAP 130752 failed.  Clean up and try again.
END
    exit 1
  fi
  sync
done

# Add swapfile startup to /etc/rc.d/rc.local (it should be safe to do so by
# now)
# (the path below has a /mnt prefix, presumably because the longer script
# runs with the target root mounted there -- adjust it to suit)
echo "Fix /etc/rc.d/rc.local so it starts swapping on new swapfiles."
cat >> /mnt/etc/rc.d/rc.local << EOT
# Start to swap on all swapfiles in /swap
for SWAP in /swap/swap*
do
  swapon \$SWAP
done
EOT
From stevehi@soc.plym.ac.uk Fri, 2 Oct 1998 09:20:52 -0400 Date: Fri, 2 Oct 1998 09:20:52 -0400 From: Steve Hill stevehi@soc.plym.ac.uk Subject: extreme linux cd installation crash Apologies if this message is going to the wrong place; if so please let me know where it should be directed to. I'm suffering from a consistent crash in the extreme linux installation from cd. The system is a PPro200 with twin 2.1Gb drives and a standard IDE cdrom installed. Because my SCSI card is not in the install SCSI list I unplug it, and set it up later. My system has been running RedHat (from ftp downloads) since redhat 3.something, so the system does run Linux quite happily. Most recently it has been running 5.0. Since I needed to change some things round, I decided to migrate to extreme linux so I could play with beowulf. anyhow....
The install consistently crashes in the package installation (usually after about 20Mb of installation, somewhere around grep or fvwm; I'm not installing emacs!). I get a signal(7) message followed by a shutdown. If I reboot and try again without reformatting the hard drive, the system completely fails to find the RPMs. So, my questions are: 1. Has anyone else experienced this sort of problem? 2. Is it my hardware (stripping out boards seems to make no difference!)? 3. Any ideas on a fix? Thanks in advance Steve Hill Centre for Neural and Adaptive Systems University of Plymouth ============================================================================== It has thus become increasingly apparent that physical `reality', no less than social `reality', is at bottom a social and linguistic construct; that scientific `knowledge', far from being objective, reflects and encodes the dominant ideologies and power relations of the culture that produced it Alan Sokal From rdab100@hermes.cam.ac.uk Fri, 2 Oct 1998 10:33:44 -0400 Date: Fri, 2 Oct 1998 10:33:44 -0400 From: Dominic Baines rdab100@hermes.cam.ac.uk Subject: Beowulf in a Box (fwd) Douglas Eadline wrote: > On Thu, 1 Oct 1998, Ed Finch wrote: > > > Douglas Eadline wrote: > > > > > > On Tue, 29 Sep 1998, Robert G. Brown wrote: > > > > > > If we ship > > > a rack of dual PII-450s one day and switch the next to the > > > same customer, is this a supercomputer or is it net work of PC's? > > > (I ask this question with respect to the export laws - we all > > > know that it depends how you use it.) > > > > Hypothetical question: What if someone builds a Beowulf cluster > > and makes it accessible to the world? Are there restrictions on > > the use/availability of compute time? > > > I assume there are some restrictions in the US. Or at least > you would probably be visited by some people who would > consider the public availability of such a system not > a good idea.
> > However, a freely available resource would also mean the user > would be showing his "cards" so it might deter "open" computing > of secret things. > Is there a publicly available cluster anywhere (that could be accessed over the internet under a guest login and run a process on) anyway ? Would it be of use to create one ? Dominic From kragen@pobox.com Fri, 2 Oct 1998 10:45:58 -0400 Date: Fri, 2 Oct 1998 10:45:58 -0400 From: Kragen kragen@pobox.com Subject: Beowulf in a Box (fwd) On Thu, 1 Oct 1998, Ed Finch wrote: > Hypothetical question: What if someone builds a Beowulf cluster > and makes it accessible to the world? Are there restrictions on > the use/availability of compute time? Yes. Kragen -- Kragen Sitaker A well designed system must take people into account. . . . It's hard to build a system that provides strong authentication on top of systems that can be penetrated by knowing someone's mother's maiden name. -- Schneier From kragen@pobox.com Fri, 2 Oct 1998 11:09:43 -0400 Date: Fri, 2 Oct 1998 11:09:43 -0400 From: Kragen kragen@pobox.com Subject: Swap Size On Fri, 2 Oct 1998, Robert G. Brown wrote: > If you are running a beowulf (or really ANY system that does a lot of > numerical work) then you should know that swapping is baaaaad. Disk > (even the very fastest and bestest of disks) is slooooow compared to > memory, sloooow is a good way to describe it. For the uninitiated, a typical memory access time these days is 10 nanoseconds; a typical disk access time is 7 milliseconds, or 7,000,000 nanoseconds; a typical instruction execution time is 2-3 nanoseconds, and sometimes you can get better throughput than that. Disks are reasonably fast if you're reading or writing continuous chunks of data. Good disks can sustain six megabytes a second transfer rate worst-case for continuous chunks of data, if I understand correctly. PC100 SDRAM can transfer 400 megabytes a second, which is only 66 times as fast as the disk, instead of 700,000 times as fast. 
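The ratios above can be checked with a few lines of shell arithmetic. This is just a sketch using the round figures quoted in the message (10 ns memory access, 7 ms disk access, 6 MB/s disk and 400 MB/s SDRAM transfer); the helper name ratio is made up for illustration, and the division is integer-only, so fractions are dropped.

```shell
#!/bin/sh
# Sketch only: ratio is a hypothetical helper that prints the integer
# quotient of its two arguments.  The figures are the ones quoted above.
ratio() {
    echo $(( $1 / $2 ))
}

mem_ns=10           # typical memory access time, in nanoseconds
disk_ns=7000000     # 7 ms disk access time, expressed in nanoseconds

echo "random access: memory is $(ratio $disk_ns $mem_ns) times as fast as disk"
echo "streaming:     memory is $(ratio 400 6) times as fast as disk"
```

The point of the comparison is that the gap between memory and disk is several orders of magnitude for scattered accesses but only a couple for long sequential transfers, which is why swapping (mostly scattered) hurts so much.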
> If you have fast network, this is a worthwhile tradeoff, > although disk have become fast enough that this may no longer be a > winning strategy if your network is relatively slow. I've heard you can reasonably get 30MB/s across dual Fast Ethernet with Linux. I don't know how fast ping times are these days, though. > I seem to recall discussion that suggests that it is possible to provide > "too much" swap. I don't know if this is true or not -- kernel experts > might comment -- but I'd avoid going to extremes like 4x RM or more. I'm not a kernel expert, but the other day, I was running X, Afterstep, Netscape, xv, and GhostScript (to print out xv's PostScript output) on my machine, and I *really* wished I had less swap. It took more than a minute to go from one virtual desktop to the next one over; Ctrl-Alt-F1 took ten seconds to take effect. If I'd had less swap, GhostScript would just have died with an out-of-memory problem, and I would have written the PS to a file, killed xv and Netscape, and run GhostScript. It would have been much faster. Kragen -- Kragen Sitaker A well designed system must take people into account. . . . It's hard to build a system that provides strong authentication on top of systems that can be penetrated by knowing someone's mother's maiden name. -- Schneier From phall@dufus.cc.westga.edu Fri, 2 Oct 1998 11:22:50 -0400 Date: Fri, 2 Oct 1998 11:22:50 -0400 From: Price Hall phall@dufus.cc.westga.edu Subject: Freeowulf (was Re: Beowulf in a Box (fwd)) I've been thinking about this myself. I'm thinking we may see the emergence of computing 'cooperatives', similar to electrical utilities that helped with the rural electrification of America. A computing cooperative could fund a large cluster, manage its operation, and oversee the membership accounting, etc. Of course, being on the net would mean the cooperative would not be restricted by geography (except where law, as you mentioned, intrudes). 
A cooperative could also offer discounted or free time to certain classes of members - students, etc. As these clusters approach a level of standard setups, modular construction, and the software is enhanced, I think 'public utility' supercomputers are inevitable. Price Hall Information Technology Services State University of West Georgia phall@westga.edu From gordg@caprice.mb.ca Fri, 2 Oct 1998 11:43:52 -0400 Date: Fri, 2 Oct 1998 11:43:52 -0400 From: gordon grieder gordg@caprice.mb.ca Subject: Freeowulf (was Re: Beowulf in a Box (fwd)) Price Hall wrote: [snip] > Of course, being on the net would mean > the cooperative would not be restricted by geography (except where law, as > you mentioned, intrudes). [snip] I think the powers that be are misleading themselves. To believe that unfriendly nations do not have access to military grade encryption or powerful computer systems is entirely delusional. Simple case in point: When my previously unnamed IP number could not resolve to a name known to be in the US or Canada, netscape.com would not allow me to download the 128 bit version of Communicator. I grabbed it from relay.com which is in the Netherlands along with the 128 bit modules for Apache. ANYHOW this is a beowulf list and I'm getting into politics. -- gordon grieder phone 204.775.4473 caprice distributors ltd fax 204.772.9548 winnipeg, mb, canada From rgb@phy.duke.edu Fri, 2 Oct 1998 12:13:32 -0400 Date: Fri, 2 Oct 1998 12:13:32 -0400 From: Robert G. Brown rgb@phy.duke.edu Subject: Swap Size On Fri, 2 Oct 1998, Kragen wrote: > I've heard you can reasonably get 30MB/s across dual Fast Ethernet with > Linux. I don't know how fast ping times are these days, though. Neat! 100,000,000/8 = 12,500,000 x 2 = 25 MB/sec theoretical peak in one direction, less UDP and/or TCP overhead. Practical maximum (the highest I've measured with UDP and "perfect" packet sizes) is around 11.8 MB/sec. 
I'd believe 22-23 MB/sec in one direction with dual controllers if everything were tuned perfectly. 30 MB/sec might be about right if one counts both directions (full duplex) and accounts for the not insignificant loading of the CPUs during the transfers -- sends are "fast" because the sender controls the scheduler and can parallelize the transfer with other stuff, but receives are "slow" (on a loaded CPU) because packet reception is asynchronous and Poissonian and the CPU is constantly interrupted and has to do a context switch to handle the packet :-(. Hence network transfer speeds really benefit from having an idle CPU or an extra CPU, but fall off considerably as the extrinsic load (e.g. those annoying JOBS that generate all the data:-) increases. I'd even believe a bit more than 30 MB/sec is possible on a dual+ CPU, dual controller system. More Stupid Questions I could Probably Find The Answer To Myself If I Spent A Long Time Looking: Does the kernel practice any sort of predictive modeling of packet reception times in an incoming data stream? In a long transmission, packet times are very likely NOT Poissonian; they are probably nearly fixed as the kernel on the sending end gets transmission synchronized with the other work being done on a sleep/wait cycle (I'm assuming DMA-type FE controllers, which I think most are now, so the CPU can do other things after setting up the packet send). If the kernel used a very simple model of keeping track of, say, the last four interpacket times and forced the context switch just BEFORE the anticipated arrival of a fifth, sixth, ...
packet whenever the four preceding times fall into a small range (indicating a stream of some sort) and quits (restoring normal interrupt driven behavior) whenever a packet fails to arrive within some narrow window of the predicted time, it might serve to restore packet RECEPTION rates on a numerically or otherwise loaded CPU to close to what they are on an unloaded CPU (at some hopefully moderate cost in the CPU available to the background task(s)). This might really improve IPC latencies in a beowulf environment, without enforcing a strictly synchronous communications model. rgb Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From efinch@eos.EAST.HITC.COM Fri, 2 Oct 1998 12:16:14 -0400 Date: Fri, 2 Oct 1998 12:16:14 -0400 From: Ed Finch efinch@eos.EAST.HITC.COM Subject: Freeowulf (was Re: Beowulf in a Box (fwd)) gordon grieder wrote: > > Douglas Eadline wrote: > > > However, a freely available resource would also mean the user > > would be showing his "cards" so it might deter "open" computing > > of secret things. > > Perhaps, but a student in a less 'wired' or technologically progressed > country may be able to prove his/her thesis with a small cluster. > There's no reason that such potential in a person should be wasted > because of a line on a map. I was thinking that a cluster could be built and compute time sold or auctioned off (not necessarily for money). Acceptable bids would be based on the number of nodes desired, length of run, etc. The owner of the cluster could receive the code, run it and return the results without anyone from the outside having access to the cluster itself. Ed Finch -- Q: Why do PCs have a reset button on the front? A: Because they are expected to run Microsoft operating systems. From philm@uidaho.edu Fri, 2 Oct 1998 12:17:19 -0400 Date: Fri, 2 Oct 1998 12:17:19 -0400 From: Philip J. 
Matheson philm@uidaho.edu Subject: extreme linux cd installation crash I've experienced this problem with many different RedHat installs (4.2, 5.0, 5.1). The problem can be fixed (sometimes) by making sure your HD is in LBA mode. Other things that have worked for me are: (in no particular order) 1) Make sure master/slave hd/cdrom relationship is correct 2) Try using another CD-ROM drive 3) Check the BIOS and make sure all cards/com ports/etc. are not conflicting (a Com port conflict especially) If none of this works, I usually just plug the HD into another machine, install RH, and then put that drive into the machine that was causing problems. Your NIC and X windows will need to be reconfigured but that is much better than trying to install RH over and over and over again... :) My experience has been that some machines don't play nice with the RH install, but then once the install is up and running there are never any problems. I'm not sure why this is, but considering a lot of the installations I do are on marginal hardware I don't think this reflects on the quality of RedHat's products. Steve Hill wrote: > apologies if this message is going to the wrong place; if so please let me > know where it should be directed to. > > i'm suffering from a consistant crash in the extreme linux installation > from cd. The system is a PPro200 with twin 2.1Gb drives and a standard > IDE cdrom installed.Because my SCSI card is not in the install SCSI list > I unplug it, and set it up later . My system has > been running RedHat (from ftp downloads) since redhat 3.something, so the > system does run Linux quite happily. Most recently it has been running > 5.0. Since I needed to change some things round, I decided to migrate to > extreme linux so I could play with beowulf. anyhow.... > > The install consistantly crashes in the package installation (usually > after about 20Mb of Installation, about in grep or fvwm (i'm not > installing emacs!).
I get a signal(7) message followed by a shutdown. If I > reboot and try again without the hard drive formatting the system > completely fails to find the RPM's. > > So, my questions are: > 1. Has anyone else experienced this sort of problem? > 2. Is it my hardware (stripping out boards seems to make no difference!)? > 3. Any ideas on a fix? > > Thanks in advance > > Steve Hill > Centre for Neural and Adaptive Systems > University of Plymouth > > ============================================================================== > It has thus become increasingly apparent that physical `reality', no less > than social `reality', is at bottom a social and linguistic construct; that > scientific `knowledge', far from being objective, reflects and encodes the > dominant ideologies and power relations of the culture that produced it > > Alan Sokal > -- Philip J. Matheson Software Engineer / System Administrator National Institute for Advanced Transportation Technology philm@uidaho.edu (208)-883-9656 From kragen@pobox.com Fri, 2 Oct 1998 12:39:22 -0400 Date: Fri, 2 Oct 1998 12:39:22 -0400 From: Kragen kragen@pobox.com Subject: Swap Size On Fri, 2 Oct 1998, Robert G. Brown wrote: > On Fri, 2 Oct 1998, Kragen wrote: > > I've heard you can reasonably get 30MB/s across dual Fast Ethernet with > > Linux. I don't know how fast ping times are these days, though. > > Neat! 100,000,000/8 = 12,500,000 x 2 = 25 MB/sec theoretical peak in > one direction, less UDP and/or TCP overhead. I did say *dual* Fast Ethernet. > Practical maximum (the > highest I've measured with UDP and "perfect" packet sizes) is around > 11.8 MB/sec. I'd believe 22-23 MB/sec in one direction with dual > controllers if everything were tuned perfectly. 30 MB might be about > right if one counts both directions (full duplex) and account for the > not insignificant loading of the CPUs during the transfers I thought someone (Miguel Barreiro Paz?) told me he'd actually measured 30 MB/s with Linux. 
Maybe I'm misremembering. What are the latencies like? Latencies are the killer with disks. Kragen -- Kragen Sitaker A well designed system must take people into account. . . . It's hard to build a system that provides strong authentication on top of systems that can be penetrated by knowing someone's mother's maiden name. -- Schneier From rauch@inf.ethz.ch Fri, 2 Oct 1998 12:53:44 -0400 Date: Fri, 2 Oct 1998 12:53:44 -0400 From: Felix Rauch rauch@inf.ethz.ch Subject: Beowulf in a Box (fwd) On Fri, 2 Oct 1998, Dominic Baines wrote: > Is there a publicly available cluster anywhere (that could be accessed > over the internet under a guest login and run a process on) anyway ? > > Would it be of use to create one ? I guess such a cluster would soon be overloaded with raytracing and cracking jobs :-/ - Felix -- Felix Rauch | Email: rauch@inf.ethz.ch Institute for Computer Systems | Homepage: http://www.cs.inf.ethz.ch/~rauch/ ETH Zentrum / RZ H15 | Phone: ++41 1 632 7489 CH - 8092 Zuerich / Switzerland | Fax: ++41 1 632 1307 From wrankin@ee.duke.edu Fri, 2 Oct 1998 13:28:32 -0400 Date: Fri, 2 Oct 1998 13:28:32 -0400 From: William T. Rankin wrankin@ee.duke.edu Subject: Freeowulf (was Re: Beowulf in a Box (fwd)) On Fri, 2 Oct 1998, Price Hall wrote: > I've been thinking about this myself. I'm thinking we may see the > emergence of computing 'cooperatives' www.ncsc.org From what experience tells us, it is extremely difficult to make something like this a viable commercial endeavor. The technology curve and equipment depreciation alone can kill you financially. -bill From wrankin@ee.duke.edu Fri, 2 Oct 1998 13:41:45 -0400 Date: Fri, 2 Oct 1998 13:41:45 -0400 From: William T. Rankin wrankin@ee.duke.edu Subject: Freeowulf (was Re: Beowulf in a Box (fwd)) On Fri, 2 Oct 1998, gordon grieder wrote: > Price Hall wrote: > [snip] > > Of course, being on the net would mean > > the cooperative would not be restricted by geography (except where law, as > > you mentioned, intrudes).
Not true. Geographic restriction (and associated costs) for bandwidth between locations is very real. I have a high bandwidth pipe to the local supercomputing center (NCSC), but not to the west coast or the northeast. Very important if you are running any sort of visualization or interactive simulations. For high-perf computing, a T1 ain't gonna cut it. > [snip] > > I think the powers that be are misleading themselves. I think the "powers that be" also have a lot more information on this subject than either you or I have access to. From prachya@science.gmu.edu Fri, 2 Oct 1998 13:51:15 -0400 Date: Fri, 2 Oct 1998 13:51:15 -0400 From: Prachya Chalermwat prachya@science.gmu.edu Subject: Swap Size Hi, Thanks for all your comprehensive discussion. On Fri, 2 Oct 1998, Kragen wrote: > On Fri, 2 Oct 1998, Robert G. Brown wrote: > > If you are running a beowulf (or really ANY system that does a lot of > > numerical work) then you should know that swapping is baaaaad. Disk > > (even the very fastest and bestest of disks) is slooooow compared to > > memory, > > sloooow is a good way to describe it. For the uninitiated, a typical > memory access time these days is 10 nanoseconds; a typical disk access > time is 7 milliseconds, or 7,000,000 nanoseconds; a typical instruction > execution time is 2-3 nanoseconds, and sometimes you can get better > throughput than that. > I agree with this. I would like the program to be terminated (if there is not enough memory in VM) instead of staying alive but very very very very slow (or almost equivalent to an unworkable state :< ). The program should return and say "Not enough memory!!" instead of continuing to swap. Thank you.
--Prachya --------------------------------------------------------------------------- Prachya Chalermwat George Mason University Graduate Research Assistant (703) 993-4322 Computational Sciences and Informatics (FAX) 993-1980 George Mason University Email: prachya@science.gmu.edu --------------------------------------------------------------------------- URL: http://spaceops.science.gmu.edu "Imagination is more important than knowledge." A. Einstein From sean@ntr.net Fri, 2 Oct 1998 13:59:09 -0400 Date: Fri, 2 Oct 1998 13:59:09 -0400 From: Sean McPherson sean@ntr.net Subject: Ultra2-Wide SCSI I'm looking to set up a mondo server ( not a cluster, but a single machine ) for disk accesses. I'm considering the Symbios NCR895 Ultra-2 Wide SCSI cards. They claim to run at 80 MB/sec. Has anyone had any luck with these cards? I know they are basically a low voltage differential controller for a 7200-10000 disk, but do they really go as fast as they should? The machine will be a PII-350 (single) with 1 or 2 4.5 or 9.1 GB 7200 or 10000 RPM U2W disks (I know, a lot of options, but what I get kind of depends on any data I can find). If I don't hear anything, I'm buying the card with a single 9.1 GB Quantum, since the card is $195 and the drive is $619. Any info or pointers to other info would be greatly appreciated. Oh, the reason I'm not looking at Adaptec cards are A) Price, they run about $400 and B) drivers, I know Gerard Ruodier does the driver for the 895 and my 875 flies, while Adaptec has been troublesome in the past. Sean McPherson sean@ntr.net Network Operations Technician Information Systems ntr.net Corporation From lindahl@cs.virginia.edu Fri, 2 Oct 1998 14:37:17 -0400 Date: Fri, 2 Oct 1998 14:37:17 -0400 From: Greg Lindahl lindahl@cs.virginia.edu Subject: Swap Size > I agree with this. I like the program to be terminated (if there is not > enough memory in VM) instead of being alive but very very very very slow > (or almost equivalent to unworkable state :< ). 
The program should return
> and say "Not enough memory!!" instead of keep swapping. Thank you.

Unfortunately, it is difficult to arrange for the correct program to be killed. Often you will instead lose some other process like inetd or portmapper or amd. So it is better to try to set limits on per-process memory usage and let that kill the overly-large task. This won't reflect the usage of other tasks, unfortunately, but at least it won't cause your system to become unusable.

-- g

From glamm@ece.umn.edu Fri, 2 Oct 1998 14:40:39 -0400
Date: Fri, 2 Oct 1998 14:40:39 -0400
From: Bob Glamm glamm@ece.umn.edu
Subject: Freeowulf (was Re: Beowulf in a Box (fwd)) (fwd)

> I've been thinking about this myself. I'm thinking we may see the
> emergence of computing 'cooperatives', similar to electrical utilities
> that helped with the rural electrification of America. A computing
> cooperative could fund a large cluster, manage its operation, and oversee
> the membership accounting, etc. Of course, being on the net would mean
> the cooperative would not be restricted by geography (except where law, as
> you mentioned, intrudes). A cooperative could also offer discounted or
> free time to certain classes of members - students, etc. As
> these clusters approach a level of standard setups, modular construction,
> and the software is enhanced, I think 'public utility' supercomputers are
> inevitable.
>
> Price Hall
> Information Technology Services
> State University of West Georgia
> phall@westga.edu

Funny, that almost sounds like... a SUPERCOMPUTING center... ;)

At least that's the model that the Minnesota Supercomputer Institute (MSI) is more or less run on (granted, it's for U of MN faculty and their associated researchers, but for the students of faculty it's free). Or other supercomputing sites where one can buy SU's (these fund the supercomputing site) - say MSC, PSC, SDSC (?), NCSA (?)..
the only drawback to this is that there's no "free" time offered to students (unless they're part of the researcher's team of code monkeys ;) A lot of private industry jobs were run through MSC while I was directly involved with MSI.

I don't think that 'public utility' supercomputers will ever become available (it's my opinion that commercially networked computer clusters are not supercomputers, including Beowulf). I'll hedge on the 'public utility' high-performance computing clusters and say that it could happen, but right now I can't see any economic reason why it should happen the way that's described above. If it happens I think all the time on them will be sold to private industry/small universities.

-Bob

From becker@cesdis1.gsfc.nasa.gov Fri, 2 Oct 1998 14:44:12 -0400
Date: Fri, 2 Oct 1998 14:44:12 -0400
From: Donald Becker becker@cesdis1.gsfc.nasa.gov
Subject: Swap Size

On Fri, 2 Oct 1998, Robert G. Brown wrote:
> > I've heard you can reasonably get 30MB/s across dual Fast Ethernet with
> > Linux. I don't know how fast ping times are these days, though.
>
> Neat! 100,000,000/8 = 12,500,000 x 2 = 25 MB/sec theoretical peak in
> one direction, less UDP and/or TCP overhead. Practical maximum (the
> highest I've measured with UDP and "perfect" packet sizes) is around
> 11.8 MB/sec. I'd believe 22-23 MB/sec in one direction with dual

30MB/sec is roughly the maximum unidirectional performance with three (or four -- more doesn't help at this point) channel bonded links. I haven't measured bidirectional aggregate performance (Tx + Rx) -- 30MB/sec sounds about right with two channels.

> not insignificant loading of the CPUs during the transfers -- sends are
> "fast" because the sender controls the scheduler and can parallelize the
> transfer with other stuff, but receives are "slow" (on a loaded CPU)
> because packet reception is asynchronous and Poissonian and the CPU is
> constantly interrupted and has to do a context switch to handle the
> packet :-(.
The asymmetry is often dominated by the work of decoding packets, rather than cache or scheduling. It's easy to load the constant '4' into a packet header. It takes considerably more work to dispatch on a set of possible values.

> I'd even believe a bit more that 30 MB/sec is possible on a dual+ CPU,
> dual controller system.

My numbers come from uniprocessor tests. The early SMP kernels didn't help raw network performance (i.e. no compute-intensive application running) much.

> Does the kernel practice any sort of predictive modeling of packet
> reception times...

No, almost all network drivers are configured to interrupt as soon as a packet is received. This minimizes latency in the case where the machine is idle waiting for more work. (The typical state for most machines, cluster workloads excluded.)

A few drivers (example: the Hamachi gigabit driver) take advantage of interrupt aggregation hardware on the adapter to defer the Rx interrupt to see if a packet follows immediately, but I do not use similar interrupt aggregation hardware on Fast Ethernet adapters.

> If the kernel used a very simple model of keeping track
> of say, the last four interpacket times and forced the context switch
> just BEFORE the anticipated arrival of a fifth, sixth, ... packet
> whenever the four preceding times fall into a small range (indicating a
> stream of some sort) and quits (restoring normal interrupt driven
> behavior) whenever a packet fails to arrive within some narrow window of
> the predicted time, it might serve to restore packet RECEPTION rates on
> a numerically or otherwise loaded CPU to close to what they are on an
> unloaded CPU (at some hopefully moderate cost in the CPU available to
> the background task(s)).

The Linux Queue/Bottom-Half ("BH") model serves the same end.

First, a note: A received packet doesn't cause a process-level context switch; it triggers the interrupt handler to run roughly as if the process had made a system call.
Rather than interpret packets as they are removed from the hardware queue, Linux moves them onto a receive queue and sets a "BH" flag. After all interrupts are handled, the BH dispatch checks the flag and runs the network protocol layer if needed.

In the lightly-loaded case this structure adds latency and a little overhead vs. the "do everything at interrupt time" model. On a heavily loaded machine this structure allows the driver to remove multiple packets for each interrupt, and then process them all at once, improving access locality and throughput and reducing interrupt latency for other interrupt handlers.

Donald Becker becker@cesdis.gsfc.nasa.gov
USRA-CESDIS, Center of Excellence in Space Data and Information Sciences.
Code 930.5, Goddard Space Flight Center, Greenbelt, MD. 20771
301-286-0882 http://cesdis.gsfc.nasa.gov/people/becker/whoiam.html

From patkus@helix.nih.gov Fri, 2 Oct 1998 14:52:56 -0400
Date: Fri, 2 Oct 1998 14:52:56 -0400
From: Mark Patkus patkus@helix.nih.gov
Subject: Beowulf networking problem - Tulip & 3c905 cards

I'm hoping that someone has encountered the following and can advise on a solution. My group has an 8-node Beowulf system - 330 MHz Pentium systems with 128 MB RAM, running RedHat 5.0, kernel version 2.0.33. The monolithic kernel has both the Tulip driver (latest version written by Donald Becker) and a 3c59x driver (version 0.99E by Becker) built into it. Each system currently has two 3COM Etherlink PCI XL (3c905) network cards and we are trying to replace one of the two network cards with a Kingston KNE100TX card (DEC 21040-based Tulip board) and we cannot seem to get the Kingston card to work in conjunction with the 3c905 card. Using one network card of each type, the system only seems to recognize the 3c905 card on boot up - one of the messages on boot-up says "eth1 initialization delayed". The output from the ifconfig command only lists the 3c905 card at the eth0 interface.
The output from /proc/pci is listed below, which only appears to show information regarding the Tulip card. Tried the case of substituting both 3c905 cards with the Tulip cards and that worked as the system was able to recognize both cards and they worked fine - thus, at a minimum the cards and drivers would appear to be good. If anyone has any suggestions on resolving this situation, I would appreciate the help. Mark Patkus Center for Information Technology National Institutes of Health Bethesda, MD Output from /proc/pci: PCI devices found: Bus 0, device 14, function 0: Ethernet controller: DEC DC21140 (rev 34). Medium devsel. Fast back-to-back capable. IRQ 10. Master Capable. Latency=64. Min Gnt=20.Max Lat=40. I/O at 0xde00. Non-prefetchable 32 bit memory at 0xefffff80. Bus 0, device 7, function 3: Bridge: Intel 82371AB PIIX4 Power Management (rev 1). Medium devsel. Fast back-to-back capable. Bus 0, device 7, function 2: USB Controller: Intel 82371AB PIIX4 (rev 1). Medium devsel. Fast back-to-back capable. IRQ 11. Master Capable. Latency=64. I/O at 0xdc00. Bus 0, device 7, function 1: IDE interface: Intel 82371AB 430TX PIIX4 (rev 1). Medium devsel. Fast back-to-back capable. Master Capable. Latency=64. I/O at 0xffa0. Bus 0, device 7, function 0: ISA bridge: Intel 82371AB PIIX4 (rev 1). Medium devsel. Fast back-to-back capable. Master Capable. No bursts. Bus 1, device 0, function 0: VGA compatible controller: Cirrus Logic GD 5465 (rev 3). Medium devsel. Fast back-to-back capable. IRQ 10. Master Capable. Latency=64. Min Gnt=16.Max Lat=16. Non-prefetchable 32 bit memory at 0xec000000. Non-prefetchable 32 bit memory at 0xefdf0000. Bus 0, device 1, function 0: PCI bridge: Intel Unknown device (rev 3). Vendor id=8086. Device id=7181. Medium devsel. Fast back-to-back capable. Master Capable. Latency=64. Min Gnt=11. Non-prefetchable 32 bit memory at 0x40010100. Non-prefetchable 32 bit memory at 0x22a0c0c0. Non-prefetchable 32 bit memory at 0xefe0ebe0. 
      Non-prefetchable 32 bit memory at 0xe3c0e3c0.
  Bus 0, device 0, function 0:
    Host bridge: Intel Unknown device (rev 3).
      Vendor id=8086. Device id=7180.
      Medium devsel. Fast back-to-back capable. Master Capable. Latency=64.
      Prefetchable 32 bit memory at 0xe4000000.

From rgb@phy.duke.edu Fri, 2 Oct 1998 15:22:06 -0400
Date: Fri, 2 Oct 1998 15:22:06 -0400
From: Robert G. Brown rgb@phy.duke.edu
Subject: Swap Size

On Fri, 2 Oct 1998, Kragen wrote:
> I thought someone (Miguel Barreiro Paz?) told me he'd actually measured
> 30 MB/s with Linux. Maybe I'm misremembering.

Well, 30 MB/sec data efficiency in one direction is possible only if he uses compression, as wirespeed max is less than that. I've never done full duplex tests (something on my agenda if I EVER have time) so I don't know how even one channel manages when it is both transmitting as fast as it can and being hammered by another transmitter, also going as fast as it can. Oh, heck...

OK, unidirectional I get around 95.70 Mbps UDP_STREAM (netperf) between two beowulf nodes (dual 400 MHz PII's with eepro100's) with at least one idle CPU apiece and size 1472 messages. This is 11.96 x 10^6 bytes/second. The maximum packet size is 1514 (1500 MTU plus 14 ethernet header). The maximum data packet size for UDP is (pause while I look this up again)... 1500 - 20 (IP) - 8 (UDP) = 1472, so theoretical maximum is 1472/1514 x 10^8 = 0.9722 x 10^8 bits per second of data = 97.22 Mbps = 12.15 x 10^6 bytes/second. Unidirectional efficiency is around 95.7/97.2 = 98.5%. Not bad, not bad at all.

Bidirectional between the same two hosts (simultaneous netperfs running in opposite directions) I get -- uhh -- call it 95.65 Mbps. Not even worth redoing the calculation. We conclude that my beowulf nodes, on a good day and with perfect packet sizes, can get a smashing 99% of theoretical wirespeed for an aggregate full duplex bandwidth of just under 24 Mbytes/sec.
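The efficiency arithmetic above can be reproduced with a short sketch. It counts only the overheads named in the message (14-byte Ethernet header, 20-byte IP header, 8-byte UDP header) and, like the message, ignores preamble, frame check sequence and inter-frame gap; the function names are illustrative assumptions.

```python
ETH_HEADER = 14          # bytes, as counted in the message
IP_HEADER = 20
UDP_HEADER = 8
LINE_RATE_BPS = 100e6    # Fast Ethernet


def max_udp_payload(mtu=1500):
    """Largest UDP payload that fits in one unfragmented packet."""
    return mtu - IP_HEADER - UDP_HEADER


def max_data_rate_bps(mtu=1500):
    """Payload bits per second if every packet on the wire is full sized."""
    return LINE_RATE_BPS * max_udp_payload(mtu) / (mtu + ETH_HEADER)


def efficiency(measured_mbps, mtu=1500):
    """Measured throughput as a fraction of the payload ceiling."""
    return measured_mbps * 1e6 / max_data_rate_bps(mtu)


if __name__ == "__main__":
    print(f"max payload : {max_udp_payload()} bytes")
    print(f"ceiling     : {max_data_rate_bps() / 1e6:.2f} Mbps")
    print(f"efficiency  : {efficiency(95.70):.1%}")
```

Counting the remaining per-frame costs would lower the ceiling slightly and so raise the computed efficiency; the sketch deliberately stays with the accounting used in the message.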
If your message sizes are certain larger quanta, one can shave off just a hair of overhead in a long transmission stream (only the first packet needs the UDP header) and call it 24 MB/sec even. This is (just now) measured, not theoretical. I would bet my sweet bippy that if I had a second eepro100 in each chassis running to the switch and had two idle CPU's on either end, I could get within a whisker of 48 MB/sec aggregate data bandwidth, 24 MB/sec in either direction.

> What are the latencies like? Latencies are the killer with disks.

Alas, networks too, only worse. There are several latencies to consider. If I now send a MINIMUM packet size UDP stream between the SAME two hosts I get all of 300 k (yes, that's kilo) bps unidirectional, and 50-200 kbps full duplex. The range there, by the way, reflects the fact that reception on the host running ONE SMALL JOB on a two processor system (plus the task running the transmission the other way) drops the reception rate by a factor of four.

These ranges are typical. I've found that a 400 MHz PII can transmit as many as 75 kpps, but can only receive around 40 kpps, nearly independent of the payload size for small payloads. The data transmission rate therefore scales linearly with payload size until it begins to hit the nonlinear bottleneck associated with MTU and real line bandwidth. Inverting 40 kpps, one gets an effective interpacket latency of 25 microseconds (better than PPro's, which were around 50 microseconds, but still not great). The transmitter is nearly twice as good as the receiver (it can send faster than the receiver can receive) and a LOADED receiver's efficiency drops off to zilch very rapidly as the delays associated with the context switches and timesharing get right up there and even exceed the time required to actually handle the packet by quite a large amount. At 50 kbps (about 6 kilobytes per second) a modem starts to look fast.
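A minimal sketch of the small-packet behaviour described above, assuming throughput is the smaller of two ceilings: a packets-per-second limit set by the receiving host, and the payload share of the 100 Mbps wire. The 40 kpps receive figure is taken from the message; the 42-byte per-packet overhead (Ethernet, IP and UDP headers), the function names, and the neglect of minimum-frame padding are assumptions made for illustration.

```python
LINE_RATE_BPS = 100e6
HEADER_BYTES = 14 + 20 + 8    # Ethernet + IP + UDP (assumed accounting)


def interpacket_latency_us(packets_per_second):
    """Effective time per packet, obtained by inverting the packet rate."""
    return 1e6 / packets_per_second


def udp_throughput_bps(payload_bytes, pps_limit=40_000):
    """Payload throughput under a host packet-rate limit and the wire limit."""
    host_bound = pps_limit * payload_bytes * 8
    wire_bound = LINE_RATE_BPS * payload_bytes / (payload_bytes + HEADER_BYTES)
    return min(host_bound, wire_bound)


if __name__ == "__main__":
    print(f"latency at 40 kpps: {interpacket_latency_us(40_000):.0f} microseconds")
    for size in (1, 64, 256, 1024, 1472):
        print(f"{size:5d}-byte payload: {udp_throughput_bps(size) / 1e6:8.3f} Mbps")
```

In this model the rate grows linearly with payload size until the wire limit takes over at a few hundred bytes, which is consistent with the advice to keep messages large.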
There is one other measure of latency supported by netperf -- UDP and TCP request-response. Alas, I don't really know how to read and interpret it. I think that its results suggest RR rates on the order of 1000-5000 but I don't know what that is or even if big is good or bad. I should probably read up on this. If one compares disk seek times to network latencies, I suspect that the network wins but it would be interesting to learn how to work this out and measure it.

So, there are a number of lessons to be learned from all this raw benchmarking. The most important is that big packets and messages are good, small ones are bad. A 100 Mbps switched network on a very fast bus with very fast CPUs can yield performance little better than a common modem if sending the data in one-byte payloads. To get optimal performance, pack your data into certain very specific message lengths -- some of the math is worked out on www.phy.duke.edu/brahma/ -- or at least use messages bigger than around 1K to get into the flattish part of the network saturation curve. This of course leads to a discussion of PVM and MPI and how (or if) they handle optimization of payload sizes. It would be extraordinarily easy to write either PVM or MPI code that shrinks your "expected" 95 Mbps data throughput to less than 1 Mbps.

Second, network speed (especially latencies -- raw bandwidth is less affected because once a packet is coming in it comes in efficiently enough -- it is the setup that kills you) is VERY CPU load dependent, according to my measurements, and a cursory look at what is going on helps one understand why. I'd much rather put a second CPU in a box with one network card than put two network cards with one CPU if I expect the box to be receiving data at the same time a calculation is going on, as I've measured halved reception rates even for big packet messages when the receiver CPU's were running a job at the same time.
A second controller with an independent data stream with uncorrelated simultaneous incoming streams might make the efficiency multiplier even lower and could easily rob you of all your expected gains.

rgb

Robert G. Brown http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu

From rgb@phy.duke.edu Fri, 2 Oct 1998 15:54:07 -0400
Date: Fri, 2 Oct 1998 15:54:07 -0400
From: Robert G. Brown rgb@phy.duke.edu
Subject: Swap Size

On Fri, 2 Oct 1998, Donald Becker wrote:

(a whole bunch of stuff).

Don, that was great! I'm going to save that reply and frame it. I've wanted to learn a lot of that forever.

(p.s., ignore all lines involving "context switch" in my last message or two and replace them with "decode burden". Although on my desktop, the context, intr and rx procmeter windows go neatly together with near direct proportionality on a UDP_STREAM receive, which explains why I thought there were a lot of context switches...)

rgb

Robert G. Brown http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu

From kragen@pobox.com Fri, 2 Oct 1998 15:56:20 -0400
Date: Fri, 2 Oct 1998 15:56:20 -0400
From: Kragen kragen@pobox.com
Subject: Swap Size

On Fri, 2 Oct 1998, Greg Lindahl wrote:
> Unfortunately, it is difficult to arrange for the correct program to
> be killed. Often you will instead lose some other process like inetd
> or portmapper or amd. So it is better to try to set limits on per-
> process memory usage and let that kill the overly-large task. This
> won't reflect the usage of other tasks, unfortunately, but at least it
> won't cause your system to become unusuable.

This is not a sufficient solution. Per-process memory-usage limits would not have prevented the problem I cited earlier, wherein xv and GhostScript were causing the system to thrash.
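For readers who want to try the per-process limit Greg describes, here is a minimal sketch using Python's standard `resource` module on a POSIX system. It caps the address space (`RLIMIT_AS`) of a child process and reports whether an allocation inside that child succeeds. The 2 GiB cap, the allocation sizes, and the helper names are illustrative assumptions. With this kind of limit the kernel refuses the oversized allocation and the task normally exits on the resulting error, rather than being killed outright.

```python
import resource
import subprocess
import sys

GIB = 1024 ** 3


def _cap_address_space(limit_bytes):
    """Build a hook that caps the address space of the process it runs in."""
    def apply_limit():
        resource.setrlimit(resource.RLIMIT_AS, (limit_bytes, limit_bytes))
    return apply_limit


def child_survives(limit_bytes, alloc_bytes):
    """Start a child under the cap and let it allocate alloc_bytes.

    Returns True if the child exits cleanly, False if the allocation failed.
    """
    code = f"buf = bytearray({alloc_bytes})"
    result = subprocess.run(
        [sys.executable, "-c", code],
        preexec_fn=_cap_address_space(limit_bytes),  # runs in the child only
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


if __name__ == "__main__":
    print("16 MiB under a 2 GiB cap:", child_survives(2 * GIB, 16 * 1024 * 1024))
    print("8 GiB under a 2 GiB cap: ", child_survives(2 * GIB, 8 * GIB))
```

As the reply above points out, a cap like this only bounds each task separately; several tasks that each stay under their own limit can still push the machine into thrashing together.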
Kragen -- Kragen Sitaker A well designed system must take people into account. . . . It's hard to build a system that provides strong authentication on top of systems that can be penetrated by knowing someone's mother's maiden name. -- Schneier From rgb@phy.duke.edu Fri, 2 Oct 1998 19:23:05 -0400 Date: Fri, 2 Oct 1998 19:23:05 -0400 From: Robert G. Brown rgb@phy.duke.edu Subject: Beowulf networking problem - Tulip & 3c905 cards On Fri, 2 Oct 1998, Mark Patkus wrote: > I'm hoping that someone has encountered the following and can advise > on a solution. My group has an 8-node Beowulf system - 330 MHz Pentium > systems with 128 MB RAM, running RedHat 5.0, kernel version 2.0.33. The > monolithic kernel has both the Tulip driver (latest version written by > Donald Becker) and a 3c59x driver (version 0.99E by Becker) built into it. > Each system currently has two 3COM Etherlink PCI XL (3c905) network cards > and we are trying to replace one of the two network cards with a Kingston > KNE100TX card (DEC 21040-based Tulip board) and we can not seem to get the > Kinston card to work in conjunction with the 3c905 card. Using one > network card of each type, the system only seems to recognize the 3c905 > card on boot up - one of the messages on boot-up says "eth1 initialization > delayed". The output from the ifconfig command only lists the 3c905 card > at the eth0 interface. The output from /proc/pci is listed below, which > only appears to show information regarding the Tulip card. > > Tried the case of substituting both 3c905 cards with the Tulip cards > and that worked as the system was able to recognize both cards and they > worked fine - thus, at a minimum the cards and drivers would appear to be > good. > > If anyone has any suggestions on resolving this situation, I would > appreciate the help. I've had similar problems myself in a similar configuration. Try rearranging the cards on your motherboard. This helped me a couple of times in similar situations. 
You might try 2.0.35, too.

rgb

Robert G. Brown http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu

From STU2997@atuvm.atu.edu Fri, 2 Oct 1998 19:20:13 -0400
Date: Fri, 2 Oct 1998 19:20:13 -0400
From: HORTON, DEREK K STU2997@atuvm.atu.edu
Subject: Beowulf howto ???

Hello,

The Beowulf howto makes references to the add_node script and the setup_template script. I can't find these and was wondering if someone could point me in the right direction to look for them or send them to me?

Thanks for the help

Derek Horton
stu2997@atuvm.atu.edu

From rgb@phy.duke.edu Fri, 2 Oct 1998 19:33:25 -0400
Date: Fri, 2 Oct 1998 19:33:25 -0400
From: Robert G. Brown rgb@phy.duke.edu
Subject: Swap Size

On Fri, 2 Oct 1998, Kragen wrote:
> On Fri, 2 Oct 1998, Greg Lindahl wrote:
> > Unfortunately, it is difficult to arrange for the correct program to
> > be killed. Often you will instead lose some other process like inetd
> > or portmapper or amd. So it is better to try to set limits on per-
> > process memory usage and let that kill the overly-large task. This
> > won't reflect the usage of other tasks, unfortunately, but at least it
> > won't cause your system to become unusuable.
>
> This is not a sufficient solution. Per-process memory-usage limits
> would not have prevented the problem I cited earlier, wherein xv and
> GhostScript were causing the system to thrash.

Yeah, but there is no sufficient solution except getting more memory. I'm with Greg -- I think it is a bit foolhardy to run with no swap at all or rather run without VM at least half again as large as any amount of memory all your tasks are likely to occupy.

The point is that ANYTHING that happens when your system runs out of VM is "bad". A not unlikely outcome is that it will crash and you will lose time, work, money. Memory is, really, pretty unbelievably cheap.
The only solution I could ever recommend is to add enough memory to hold everything you expect to run with a factor of two tolerance (you can reduce the margin to 10% on a big memory beowulf node). Install 0.5-2 x RM as swap, with a sort of minimum of 32 MB (just enough to enable the system to move long-idle pages to swap to improve cache/buffer performance). If you find yourself in Kragen's situation, where you start to push the limits of your existing RM and have negative experiences with swap (more often, swapping just means a half-beat delay as you try to launch a process, not a multisecond long delay while the system swaps out a huge contiguous image) then get more memory or learn what cannot be safely run together.

It's sort of: Live in the house you've got or buy a bigger house, but don't put bars on the windows and invite in a crowd -- there might be a fire.

rgb

Robert G. Brown http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu

From johns@cacr.caltech.edu Fri, 2 Oct 1998 19:46:30 -0400
Date: Fri, 2 Oct 1998 19:46:30 -0400
From: John Salmon johns@cacr.caltech.edu
Subject: [PATCH] Re: sqrt 10x slower in 2.0.7-19 than 2.0.6-9 ???

To whom it may concern,

Here is a patch to glibc-2.0.7/sysdeps/alpha/fpu/e_sqrt.c that recovers most of the performance lost when the 'fast' versions were switched off due to erroneous values at or near DBL_MIN. Accuracy is NOT compromised.

Also included is a 'torture' test that verifies results by testing the value returned by sqrt for several million values throughout the entire range of double precision values. Of course, this test is not exhaustive. Proving the correctness of floating point functions is notoriously hard, but I believe this patch is reliable.
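The heart of the fix in this message is a change in evaluation order inside one Newton step for 1/sqrt(x), so that no intermediate product underflows. A minimal sketch of that idea follows. Python floats do support gradual underflow, so the `ftz` helper imitates hardware that flushes subnormal results to zero; that helper, the function names, and the starting guess that is 1% too large are assumptions for illustration and are not taken from the glibc source.

```python
import math
import sys

DBL_MIN = sys.float_info.min   # smallest normal double


def ftz(value):
    """Flush subnormal results to zero, imitating hardware without gradual underflow."""
    return 0.0 if 0.0 < abs(value) < DBL_MIN else value


def newton_step_original(x, y):
    """y*(1.5 - 0.5*x*y*y), multiplied left to right."""
    t = ftz(0.5 * x)           # becomes 0 for x in [DBL_MIN, 2*DBL_MIN)
    t = ftz(t * y)
    t = ftz(t * y)
    return ftz(y * ftz(1.5 - t))


def newton_step_reordered(x, y):
    """y*(1.5 - 0.5*y*(x*y)): x*y is close to sqrt(x), far from underflow."""
    xy = ftz(x * y)
    t = ftz(ftz(0.5 * y) * xy)
    return ftz(y * ftz(1.5 - t))


if __name__ == "__main__":
    x = DBL_MIN
    exact = 1.0 / math.sqrt(x)
    guess = 1.01 * exact       # estimate of 1/sqrt(x) that is 1% too large
    for step in (newton_step_original, newton_step_reordered):
        error = abs(step(x, guess) / exact - 1.0)
        print(f"{step.__name__}: relative error {error:.3g}")
```

With the original ordering the correction term vanishes and the step simply multiplies the guess by 1.5; with the reordered product the step converges as a Newton iteration should.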
I understand that there are 'fast libraries' out there, but I believe that a tiny patch that can recover a factor of five in performance is worth making part of glibc even if somebody else has a faster library. The underlying cause of the problem is that the alpha does not really fully support the IEEE-754 standard in hardware. In particular, the 'gradual underflow' provisions of the standard are meant to be implemented by software which does not appear to be part of glibc yet. This means that arithmetic on values at or near DBL_MIN can be counter-intuitive. In particular, the expression in e_sqrt.c: y = y*(one_and_a_half - half*x*y*y); first computes half*x, which results in 0.0 for x in the problem-range! From then on, the calculation is garbage. The problem is easy to repair. Simply reorder the multiplications. The product x*y is properly normalized (it's nearly 1./sqrt(x)), so rewrite the expression as: y = y*((one_and_a_half - two_to_minus_30) - half*y*(x*y)); After fixing the above, I encountered an additional problem with a floating point exception (underflow?) being thrown from the evaluation of: high = high - x; for very small values of x. Another simple reorganization seems to fix it (by comparing high with x, rather than high-x with 0). I did not repair the assembly version of the code. I added additional remarks to the descriptive comment instead. Tests: 1 - sqrttorture.c (see below) tests millions of logarithmically spaced arguments to sqrt(). The worst case appears to be 1ulb. The mean error appears to be 0, and the mean absolute error appears to be about 1/3 ulb when I run this test. These are exactly what one expects. 
For example:

[johns@paranal glibc]$ sqrttorture 64
evaluated 185884154 sqrts in range 2.22507e-308 to 1.79769e+308
worst ratio: 1.00 ulb for x=1.1162e-103
mean error: -6.47361e-05 ulb
mean absolute error: 0.34 ulb
[johns@paranal glibc]$ sqrttorture 31.415926535
evaluated 378679352 sqrts in range 2.22507e-308 to 1.79769e+308
worst ratio: 1.00 ulb for x=1.45519e-11
mean error: 8.52193e-06 ulb
mean absolute error: 0.34 ulb

2 - sqrtloop.c calls sqrt in a timing loop. The performance is about 5 times the original 2.0.7 version. It's about half of the buggy 2.0.6 version:

[johns@paranal glibc]$ ./sqrtloop
1000000 sqrts in 0.4392 sec, 2.27687e+06 per sec

----------------------------patch--------------------
[johns@paranal BUILD]$ diff -u glibc-2.0.7/sysdeps/alpha/fpu/e_sqrt.c.orig glibc-2.0.7/sysdeps/alpha/fpu/e_sqrt.c
--- glibc-2.0.7/sysdeps/alpha/fpu/e_sqrt.c.orig Fri Oct 2 11:48:57 1998
+++ glibc-2.0.7/sysdeps/alpha/fpu/e_sqrt.c Fri Oct 2 15:25:16 1998
@@ -22,9 +22,7 @@
  * We have three versions, depending on how exact we need the results.
  */
 
-/* Alternative versions are disabled because they currently don't work
-   properly with and near DBL_MIN. */
-#if 1 || defined(_IEEE_FP) && defined(_IEEE_FP_INEXACT)
+#if defined(_IEEE_FP) && defined(_IEEE_FP_INEXACT)
 
 /* Most demanding: go to the original source. */
 #include
@@ -56,7 +54,9 @@
   0x1527f,0x1334a,0x11051,0xe951, 0xbe01, 0x8e0d, 0x5924, 0x1edd }
 };
 
-#ifdef _IEEE_FP
+/* Alternative versions are disabled because they currently don't work
+   properly with and near DBL_MIN. */
+#if 1 || defined(_IEEE_FP)
 /*
  * This version is much faster than the standard one included above,
  * but it doesn't maintain the inexact flag.
@@ -110,10 +110,10 @@
   y = initial_guess(x, k >> 32, ptr);
   half = Double(__half);
   one_and_a_half = Double(__one_and_a_half);
-  y = y*(one_and_a_half - half*x*y*y);
+  y = y*(one_and_a_half - half*y*(x*y));
   dn = Double(__dn);
   two_to_minus_30 = Double(__two_to_minus_30);
-  y = y*((one_and_a_half - two_to_minus_30) - half*x*y*y);
+  y = y*((one_and_a_half - two_to_minus_30) - half*y*(x*y));
   up = Double(__up);
   z = x*y;
   one = Double(__one);
@@ -122,6 +122,9 @@
   choppedmul(z,dn,zp);
   choppedmul(z,up,zn);
 
+#if 0
+  /* This looks ok, but with x near DBL_MIN, the computation
+     of high-x seems to underflow and generate a SIGFPE. */
   choppedmul(z,zp,low);
   low = low - x;
   choppedmul(z,zn,high);
@@ -134,6 +137,12 @@
   __asm__("fcmovlt %2,%3,%0"
           :"=f" (z)
           :"0" (z), "f" (high), "f" (zn));
+#else
+  choppedmul(z,zp,low);
+  choppedmul(z,zn,high);
+  if( low >= x ) z = zp;
+  if( high < x ) z = zn;
+#endif
   return z; /* Argh! gcc jumps to end here */
 
 special:
@@ -159,7 +168,13 @@
 #else
 /*
  * This version is much faster than generic sqrt implementation, but
- * it doesn't handle exceptional values or the inexact flag.
+ * it doesn't handle exceptional values or the inexact flag, or
+ * values in the range [DBL_MIN, 2.0*DBL_MIN). The last problem is
+ * due to the alpha not performing gradual underflow as specified by IEEE,
+ * so when 0.5*x is computed and stored in %f11, the result is 0, which
+ * sends the rest of the calculation completely awry! This
+ * could be repaired in several ways, but what would be the point unless
+ * the other problems are resolved too.
  */
 
 asm ("\
[johns@paranal BUILD]$
-------------------------------
---------------sqrttorture.c-------------
/* Usage: sqrttorture fpvalue
   ratio of successive tests will be 1.0 + fpvalue*FLT_EPSILON
   fpvalue defaults to 64.0, which takes a couple of minutes on a 21164 */
/* The header names were lost in the archived copy; these are the
   headers the code below needs. */
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <float.h>

#define FACTOR (1.0 + 64.*(FLT_EPSILON))
#define XMIN (DBL_MIN)
#define XMAX (DBL_MAX) /* DBL_MAX */

int main(int argc, char **argv){
  double x, resid, sqrtx;
  double factor, xlast;
  double ratio, worstratio, worstx;
  /* Assume longs are 8 bytes. */
  unsigned long int ix;
  unsigned long int ir;
  double sum, sumabs;
  int nsamples;

  if( argc > 1 ){
    factor = 1.0 + (atof(argv[1]) * FLT_EPSILON);
  }else{
    factor = FACTOR;
  }
  worstratio = -1.;
  sumabs = sum = 0.;
  nsamples = 0;
  xlast = XMAX/(factor*1.001);
  /* The loop header and the start of its body were lost in the archived
     copy; they are reconstructed here from the XMAX check below. */
  for(x = XMIN; x < xlast; x *= factor){
    sqrtx = sqrt(x);
    resid = x - sqrtx*sqrtx;
    ratio = (resid/x)/DBL_EPSILON;
    sum += ratio;
    sumabs += fabs(ratio);
    nsamples++;
    if( fabs(ratio) > worstratio ){
      worstratio = fabs(ratio);
      worstx = x;
    }
  }
  /* Finish off by checking XMAX, just for fun. */
  x = XMAX;
  sqrtx = sqrt(x);
  resid = x - sqrtx*sqrtx;
  ratio = (resid/x)/DBL_EPSILON;
  sum += ratio;
  sumabs += fabs(ratio);
  nsamples++;
  if( fabs(ratio) > worstratio ){
    worstratio = fabs(ratio);
    worstx = x;
  }
  fprintf(stdout, "evaluated %d sqrts in range %g to %g\n", nsamples, XMIN, XMAX);
  fprintf(stdout, "worst ratio: %.2f ulb for x=%g\n", worstratio, worstx);
  fprintf(stdout, "mean error: %g ulb\n", sum/nsamples);
  fprintf(stdout, "mean absolute error: %.2f ulb\n", sumabs/nsamples);
  return 0;
}
-----------------------------
---------------sqrtloop.c--------------
/* The header names were lost in the archived copy; these are the
   headers the code below needs. */
#include <stdio.h>
#include <math.h>
#include <time.h>

#define NTIMES 1000000

int main(int argc, char **argv){
  int i = NTIMES;
  double x = 0.7;
  clock_t tstart, tend;
  double seconds;

  tstart = clock();
  while(--i){
    x = sqrt(x) + 0.5;
  }
  tend = clock();
  seconds = ((double)(tend-tstart))/CLOCKS_PER_SEC;
  printf("%d sqrts in %g sec, %g per sec\n", NTIMES, seconds, NTIMES/seconds);
  return 0;
}

From richieb@netlabs.net Fri, 2 Oct 1998 21:56:51 -0400
Date: Fri, 2 Oct 1998 21:56:51 -0400
From: Richie Bielak
richieb@netlabs.net
Subject: extreme linux cd installation crash

Steve Hill wrote:

[...]

> The install consistantly crashes in the package installation (usually
> after about 20Mb of Installation, about in grep or fvwm (i'm not
> installing emacs!). I get a signal(7) message followed by a shutdown. If I
> reboot and try again without the hard drive formatting the system
> completely fails to find the RPM's.
>
> So, my questions are:
> 1. Has anyone else experienced this sort of problem?
> 2. Is it my hardware (stripping out boards seems to make no difference!)?
> 3. Any ideas on a fix?

I installed Extreme Linux on two small P120 machines with 16Megs each. The first one installed fine; the second install would randomly crash at various points. I replaced the memory and now the system is fine.

Perhaps that's your problem too?

...richie

-- "It is a good day to code." http://www.netlabs.net/~richieb

From rahul@reno.cis.upenn.edu Sat, 3 Oct 1998 00:54:36 -0400
Date: Sat, 3 Oct 1998 00:54:36 -0400
From: Rahul Dave rahul@reno.cis.upenn.edu
Subject: How to get rid of FIN_WAIT1 sockets

Hi,

As I'd reported some time back, I've been having problems with these MPICH socket timeouts, with sockets going into FIN_WAIT1's and never coming out. Short of rebooting, is there a command I can use to kill them? I'm not even able to experiment with them piling up. I tried ifdowning and then ifup'ing them (ifconfig down followed by ifconfig up) but it does nothing to clear the sockets.

Thanks,
Rahul

From deadline@plogic.com Sat, 3 Oct 1998 07:50:15 -0400
Date: Sat, 3 Oct 1998 07:50:15 -0400
From: Douglas Eadline deadline@plogic.com
Subject: Freeowulf (was Re: Beowulf in a Box (fwd))

On Fri, 2 Oct 1998, Price Hall wrote:
>
> I've been thinking about this myself. I'm thinking we may see the
> emergence of computing 'cooperatives', similar to electrical utilities
> that helped with the rural electrification of America.
A computing > cooperative could fund a large cluster, manage its operation, and oversee > the membership accounting, etc. Of course, being on the net would mean > the cooperative would not be restricted by geography (except where law, as > you mentioned, intrudes). A cooperative could also offer discounted or > free time to certain classes of members - students, etc. As > these clusters approach a level of standard setups, modular construction, > and the software is enhanced, I think 'public utility' supercomputers are > inevitable. > In my opinion, this would not work. It made more sense when computers were very expensive (this is why we had SC centers). I think the appeal of a Beowulf is the ownership: it is yours, and you get to use it when you want and how you want. Now that the cost is so low for the hardware, money spent on a shared resource could go toward your own system. To really make any money, you would need to attract corporate users. Most of these guys are VERY security conscious. If they could buy their own system and not let code and data leave the fort.... Here is an idea - what about building a "stone soup" co-operative where donors get time on the system and others may need to pay a nominal membership fee? You would not have the latest and greatest system, but it might be a great test bed for ideas - testing ideas on large numbers of CPUs (they do not always need to be fast) - let's say 500-1000 would be attractive. Besides, there are a lot of old computers out there. Get some corporate sponsors (to pay for some good networking, electricity, space, and your time). Just some thoughts. Doug ------------------------------------------------------------------- Paralogic, Inc.
| PEAK | Voice:+610.861.6960 115 Research Drive | PARALLEL | Fax:+610.861.8247 Bethlehem, PA 18017 USA | PERFORMANCE | http://www.plogic.com ------------------------------------------------------------------- From neil@causality.com Sat, 3 Oct 1998 14:07:50 -0400 Date: Sat, 3 Oct 1998 14:07:50 -0400 From: Neil A. Carson neil@causality.com Subject: Beowulf in a Box (fwd) Jukka E Isosaari wrote: > Forgive my ignorance, and e-mailing both lists, but has anyone > talked to Intel people about Beowulf projects? It would seem to > me that they should have an interest in sponsoring this kind of > super-computer development, given that most of them would probably > use Intel processors. I plan to discuss this with a representative soon. Neil -- Neil A. Carson From Loic.Prylli@ens-lyon.fr Sat, 3 Oct 1998 16:55:25 -0400 Date: Sat, 3 Oct 1998 16:55:25 -0400 From: Loic Prylli Loic.Prylli@ens-lyon.fr Subject: linux SMP 2.0 -- interrupt reentrancy problems [message cross-posted to the beowulf list (as well as linux-smp), people might be interested there ] Hello, It seems some people have had problems with reentrant interrupts for some time; this is more often seen when using parallel applications on clusters with fast networks: - without workarounds in the drivers to tolerate reentrant interrupts, this can kill the whole machine with something like "IRQ DEADLOCK DETECTED...", or sometimes various oopses like "double unlock on device queue...". I saw the problem with at least 3 different network cards/drivers (tulip, 3c59x, myrinet drivers) on different clusters. - with the workaround in the driver, there may be interrupts that are missed, potentially leading to various timeout errors.
I may have found a possible cause in the Linux interrupt code in asm-i386/irq.h; correct me if I am wrong: The code assumes a processor acquires the global kernel lock before manipulating the cache_A1 and cache_21 variables and changing the interrupt mask register. But this is not respected by the MSGIRQ handler associated with IPI interrupts: there seems to be a race condition where the end of the IPI handler can be executed concurrently with a normal interrupt, leading to wrong settings for the interrupt mask registers and cache_A1. The problem will occur more often with fast networks because there are more frequent interrupts, but I think any SMP configuration could potentially be affected, even with no network. If you had problems before, you can try the patch below; I hope it does not break anything. There are two parts: - in arch/i386/kernel/irq.c we try to detect whether we are in an inconsistent state. For instance, without the other change I get this kind of output: kernel: IRQ 15 (proc 0):cache_x1=0x72,INT mask=0xd2 kernel: IRQ 15 (proc 0):cache_x1=0x52,INT mask=0x52 kernel: IRQ 15 (proc 0):cache_x1=0x52,INT mask=0xf2 As you can see, we can be processing interrupt 15 without the corresponding bit set in cache_A1. Sometimes the interrupt mask register bit is set, sometimes not. Moreover there are strange things occurring with the bits associated with the IPI interrupt. - in include/asm-i386/irq.h, I changed the IPI interrupt code so that it touches neither the interrupt mask register nor cache_A1. This patch works well for me, but as I have hardly any knowledge about the IOAPIC used for IPI, I would appreciate comments from someone more knowledgeable.
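The lost-update pattern described above can be sketched as a small deterministic model. This is an illustration only, not the kernel code: the struct, the helper names, and the choice of IRQ 13 as the second interrupt are assumptions made for the example; only cache_A1, the mask register at port 0xA1, and IRQ 15 come from the message.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical model: a software copy of the slave PIC mask (like cache_A1)
   and the mask register it is supposed to mirror (port 0xA1). */
struct pic_model {
    unsigned char cache;  /* software shadow of the mask */
    unsigned char imr;    /* modelled interrupt mask register */
};

/* Bit in the slave PIC mask for IRQ 8..15, the same computation as
   "1 << (irq - 8)" in the patch. */
static unsigned char irq_bit(int irq)
{
    assert(irq >= 8 && irq <= 15);
    return (unsigned char)(1u << (irq - 8));
}

/* First half of a read-modify-write: snapshot the shadow copy. */
static unsigned char rmw_load(const struct pic_model *p)
{
    return p->cache;
}

/* Second half: write the modified snapshot to shadow and register. */
static void rmw_store(struct pic_model *p, unsigned char v)
{
    p->cache = v;
    p->imr = v;
}

/* With a lock held, each update finishes before the next one starts. */
unsigned char run_serialized(void)
{
    struct pic_model p = { 0x00, 0x00 };
    unsigned char a = rmw_load(&p);
    rmw_store(&p, a | irq_bit(15));   /* handler A masks IRQ 15 */
    unsigned char b = rmw_load(&p);
    rmw_store(&p, b | irq_bit(13));   /* handler B masks IRQ 13 */
    return p.imr;
}

/* Without the lock, both handlers may snapshot before either stores. */
unsigned char run_interleaved(void)
{
    struct pic_model p = { 0x00, 0x00 };
    unsigned char a = rmw_load(&p);   /* A reads 0x00 */
    unsigned char b = rmw_load(&p);   /* B reads 0x00 as well */
    rmw_store(&p, a | irq_bit(15));   /* A writes 0x80 */
    rmw_store(&p, b | irq_bit(13));   /* B writes 0x20, dropping A's bit */
    return p.imr;
}
```

Calling the two functions from a test program shows the difference: the serialized order keeps both mask bits, while the interleaved order loses the bit for IRQ 15, which resembles the reported symptom of handling IRQ 15 with its cache_A1 bit clear.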
Loic
Attachment: smp-irq.patch
--- linux/arch/i386/kernel/irq.c.std Sat Oct 3 20:22:39 1998
+++ linux/arch/i386/kernel/irq.c Sat Oct 3 20:36:15 1998
@@ -345,7 +345,25 @@
 {
     struct irqaction * action = *(irq + irq_action);
     int do_random = 0;
-
+    int c,intm,mask;
+    static int count;
+    if (smp_processor_id() != 0 && count++ < 1000)
+        printk("IRQ %d: done by CPU %d\n",irq,smp_processor_id());
+    if (irq >= 8) {
+        c = cache_A1;
+        intm = inb(0xA1);
+        mask = 1 << (irq - 8);
+    } else {
+        c = cache_21;
+        intm = inb(0x21);
+        mask = 1 << irq;
+    }
+    if (!(c & mask) || !(intm & mask)) {
+        printk("IRQ %d (proc %d):cache_x1=0x%x,INT mask=0x%x\n", irq, smp_processor_id(),c,intm);
+        /* better to return because the interrupt may be asserted again,
+           the bad thing is that we may loose some interrupts */
+        return;
+    }
 #ifdef __SMP__
     if(smp_threads_ready && active_kernel_processor!=smp_processor_id())
         panic("IRQ %d: active processor set wrongly(%d not %d).\n", irq, active_kernel_processor, smp_processor_id());
--- linux/include/asm-i386/irq.h.std Sat Oct 3 20:22:59 1998
+++ linux/include/asm-i386/irq.h Sat Oct 3 22:35:29 1998
@@ -108,6 +108,17 @@
     "1:\tjmp 1f\n" \
     "1:\toutb %al,$0x20\n\t"
 
+/* do not modify the ISR nor the cache_A1 variable */
+#define MSGACK_SECOND(mask,nr) \
+    "inb $0xA1,%al\n\t" \
+    "jmp 1f\n" \
+    "1:\tjmp 1f\n" \
+    "1:\tmovb $0x20,%al\n\t" \
+    "outb %al,$0xA0\n\t" \
+    "jmp 1f\n" \
+    "1:\tjmp 1f\n" \
+    "1:\toutb %al,$0x20\n\t"
+
 #define UNBLK_FIRST(mask) \
     "inb $0x21,%al\n\t" \
     "jmp 1f\n" \
@@ -302,34 +313,14 @@
 __asm__( \
 "\n"__ALIGN_STR"\n" \
 SYMBOL_NAME_STR(IRQ) #nr "_interrupt:\n\t" \
-    "pushl $-"#nr"-2\n\t" \
-    SAVE_ALL \
-    ENTER_KERNEL \
-    ACK_##chip(mask,(nr&7)) \
-    "incl "SYMBOL_NAME_STR(intr_count)"\n\t"\
-    "sti\n\t" \
-    "movl %esp,%ebx\n\t" \
-    "pushl %ebx\n\t" \
-    "pushl $" #nr "\n\t" \
-    "call "SYMBOL_NAME_STR(do_IRQ)"\n\t" \
-    "addl $8,%esp\n\t" \
-    "cli\n\t" \
-    UNBLK_##chip(mask) \
-    GET_PROCESSOR_ID \
-    "btrl $" STR(SMP_FROM_INT) ","SYMBOL_NAME_STR(smp_proc_in_lock)"(,%eax,4)\n\t" \
-    "decl "SYMBOL_NAME_STR(intr_count)"\n\t" \
-    "incl "SYMBOL_NAME_STR(syscall_count)"\n\t" \
-    "jmp ret_from_sys_call\n" \
-"\n"__ALIGN_STR"\n" \
 SYMBOL_NAME_STR(fast_IRQ) #nr "_interrupt:\n\t" \
     SAVE_MOST \
-    ACK_##chip(mask,(nr&7)) \
+    MSGACK_##chip(mask,(nr&7)) \
     SMP_PROF_IPI_CNT \
     "pushl $" #nr "\n\t" \
     "call "SYMBOL_NAME_STR(do_fast_IRQ)"\n\t" \
     "addl $4,%esp\n\t" \
     "cli\n\t" \
-    UNBLK_##chip(mask) \
     RESTORE_MOST \
 "\n"__ALIGN_STR"\n" \
 SYMBOL_NAME_STR(bad_IRQ) #nr "_interrupt:\n\t" \
From STU2997@atuvm.atu.edu Sat, 3 Oct 1998 18:21:23 -0400 Date: Sat, 3 Oct 1998 18:21:23 -0400 From: HORTON, DEREK K STU2997@atuvm.atu.edu Subject: netboot help? Hello, I have downloaded the netboot rpm and installed onto my machine. In the documentation it is talking about dos exe and com files what are these needed for? The installation copied some binary files on to my computer in the dir /usr/lib/netboot/binaries these are floppy.bin floppy86.bin etc... what do these do? and how do I execute them? Also in the client part of the install it says to issue the command make bootrom what directory do I need to be in for this to work? I want to create a couple of boot disks for my beowulf cluster, so how do i tell them what their ip addresses and names are?
Thanks for the help Derek Horton stu2997@atuvm.atu.edu From beckman@acl.lanl.gov Sun, 4 Oct 1998 03:30:59 -0400 Date: Sun, 4 Oct 1998 03:30:59 -0400 From: Pete Beckman beckman@acl.lanl.gov Subject: CFP: 2nd Workshop on Runtime Systems for Parallel Programming (RTSPP) CALL FOR PAPERS 3rd Workshop on Runtime Systems for Parallel Programming (RTSPP) San Juan, Puerto Rico, April 12, 1999 to be held in conjunction with the 13th International Parallel Processing Symposium ( IPPS/SPDP 1999 ) Motivation Runtime systems are critical to the implementation of parallel programming languages and libraries. They provide the core functionality of a particular programming model and the glue between the model and the underlying hardware and operating system. As such, runtime systems have a large impact on the performance and portability of parallel programming systems. Despite the importance of runtime systems, there are few forums in which practitioners can exchange their ideas, and these are typically forums showcasing peripheral areas, such as languages, operating systems, and parallel computing. RTSPP provides a forum for bringing together runtime system designers from various backgrounds to discuss the state-of-the-art in designing and implementing runtime systems for parallel programming. This one-day workshop includes technical sessions of refereed papers and panel discussions. The first two workshops (RTSPP'97 and RTSPP'98) were very successful and generated a lot of interest. Both the reviewed papers and invited talks were well received. Scope The focus of the workshop is on the design and implementation of runtime systems for parallel programming languages and libraries. While the problem domain is restricted to parallel programming, papers that deal with fundamental issues in runtime systems that are applicable to parallel programming systems are also encouraged. 
The topics for the workshop include, but are not limited to, the following: - Techniques to reduce the tension between portability and efficiency in runtime systems - Relationship between runtime systems and programming models - Interfaces between compiler-generated code and runtime systems - Operating system support for runtime systems - Performance evaluation of runtime systems - The design and implementation of runtime systems for [PC] clusters and high-speed networks - The design and implementation of thread systems - Runtime systems for distributed shared memory - Extensibility and adaptability in runtime systems Authors are invited to submit manuscripts that demonstrate original and unpublished research in the area of runtime systems that support parallel programming. Accepted papers will be published by the IPPS/SPDP organization, both on paper and CD-ROM. One of the authors will be required to attend the workshop and present the work. RTSPP focuses on low-level issues in implementing parallel programming systems. Another workshop, HIPS, also in conjunction with IPPS/SPDP 1999, is directed towards higher-level language issues. Submission Instructions All submissions must be completed papers whose length --- including abstract, figures, bibliographies, and appendices --- does not exceed 3 US letter (8.5" x 11"; no A4 please) pages; papers longer than 3 pages will be summarily rejected. Text must be set in no smaller than a 10 pt font, be single-spaced and single-columned, and include page numbers. Each paper will be reviewed by the program committee. It is expected that Springer-Verlag will publish the accepted papers (which will be given additional pages) in their Lecture Notes series. Submit an electronic version of your complete manuscript in uuencoded, gzipped, postscript format to: rtspp@cs.ucdavis.edu. Please do not send your paper as an attachment (e.g., MIME). 
Please use standard postscript fonts (Times, Helvetica, Courier) and ensure that the postscript file is viewable using the "ghostview" tool. The corresponding author is requested to include at the start of the message containing the paper: 1. complete postal address 2. e-mail address 3. phone number 4. fax number 5.a list of keywords Receipt of submissions will be promptly acknowledged by e-mail. Late submissions will not be accepted; sorry, no exceptions. In cases where electronic submission is not possible, please contact the program chair (see below). Important Dates October 30, 1998: Papers due. December 7, 1998: Notification sent to authors. January 8, 1999: Final copies due for publication in the proceedings. Program Committee Henri Bal, Vrije Universiteit, The Netherlands Pete Beckman, Los Alamos National Laboratory, USA Greg Benson University of San Francisco, USA Matthew Haines University of Wyoming, USA Laxmikant V. Kale, University of Illinois at Urbana Champaign, USA Koen Langendoen, Delft University of Technology, The Netherlands David Lowenthal, University of Georgia, USA Frank Mueller, Humboldt-Universitaet zu Berlin, Germany Ron Olsson, University of California, Davis, USA Raju Pandey, University of California, Davis, USA Alan Sussman, University of Maryland, USA Registration This workshop is being held as part of IPPS/SPDP 1999. All workshop attendees are expected to pay for the IPPS/SPDP 1999 registration, which includes access to all sessions and workshops, coffee break and refreshments, as well as a number of lunches and receptions. Information about IPPS/SPDP 1999 can be obtained over the Web at the following URL: http://www.ippsxx.org/ipps99/ . Workshop Organizers Laxmikant V. Kale (kale@cs.uiuc.edu) -- General Chair Department of Computer Science University of Illinois 1304 W. 
Springfield Urbana, IL 61801, USA Phone: 217-244-0094 Fax: 217-333-3501 Pete Beckman (beckman@acl.lanl.gov) -- Co Chair Los Alamos National Laboratory CIC/ACL, MS B287 Los Alamos, NM 87545, USA Phone: 505-665-0800 Fax: 505-665-4939 Ron Olsson (olsson@cs.ucdavis.edu) -- Program Chair Department of Computer Science University of California, Davis One Shields Avenue Davis, CA 95616-8562, USA Phone: 530-752-7004 Fax: 530-752-4767 [ This call for papers can be found at http://elysium.cs.ucdavis.edu/~olsson/rtspp/1999/ ] --- ======================================================================== | Peter H. Beckman | Advanced Computing Laboratory | | Los Alamos National Laboratory | Phone: 505-665-0800 | | CIC/ACL MS-B287 | Fax: 505-665-4939 | | Los Alamos, NM 87545 | email: beckman@acl.lanl.gov | ======================================================================== From jacek@usq.edu.au Sun, 4 Oct 1998 18:35:01 -0400 Date: Sun, 4 Oct 1998 18:35:01 -0400 From: Jacek Radajewski jacek@usq.edu.au Subject: beobase RPM Hi, What is the story with the beobase rpm? Does any one use it? Is there a plan for any further development? I think that some of the concepts introduced in the package are really good. Could the author(s) please contact me? Cheers Jacek From enano@ceu.fi.udc.es Mon, 5 Oct 1998 02:53:22 -0400 Date: Mon, 5 Oct 1998 02:53:22 -0400 From: Miguel Barreiro Paz enano@ceu.fi.udc.es Subject: Swap Size > > I thought someone (Miguel Barreiro Paz?) told me he'd actually measured > 30 MB/s with Linux. Maybe I'm misremembering. Certainly not me. This is the message where I sent some results a couple of weeks ago: K>> If I remember correctly (and I may not), the current top TCP speed on K>> Linux is only a little over 100Mbps (over bonded dual Ethernet K>> channels), much less than PCI's top speed, which is 800Mbps (according K>> to -- soon it'll be MBP> MBP> It's higher than that. 
With two tulips, I consistently get MBP> about 17 MBytes/sec between the cheap K6's using two TCP streams, one MBP> over each card. I'm probably limited by the CPU speed; ttcp reports MBP> 75-80% CPU used, and the time spent by the kernel in the bottom MBP> handlers is not accounted, IIRC. 30 MB/s would be simply impossible with two fast ethernet channels, indeed. It's faster than wire speed. BTW, I'd like to know for sure what time remains unaccounted in a network transmission, anyone? Regards, Miguel From cd_rasmussen@yahoo.com Mon, 5 Oct 1998 11:38:12 -0400 Date: Mon, 5 Oct 1998 11:38:12 -0400 From: CD Rasmussen cd_rasmussen@yahoo.com Subject: beobase RPM I am interested in responses to this, please post all responses. Thanks, Costa Jacek Radajewski wrote: > > Hi, > > What is the story with the beobase rpm? Does any one use it? Is there a > plan for any further development? I think that some of the concepts > introduced in the package are really good. Could the author(s) please > contact me? > > Cheers > > Jacek > _________________________________________________________ DO YOU YAHOO!? Get your free @yahoo.com address at http://mail.yahoo.com From hatridge@straubing.baynet.de Mon, 5 Oct 1998 14:59:54 -0400 Date: Mon, 5 Oct 1998 14:59:54 -0400 From: Jim Hatridge hatridge@straubing.baynet.de Subject: Extreme Linux Hi All; Can anyone give me an URL to download parts of Extreme Linux? I tried the Extreme homepage but it seemed to have only parts of it (the wrong parts). Also I tried RH's ftp site and could not get on it was too busy. :( Mainly what I want to d/l is the manual to study. Thanks J I M ----------------------------------------- Jim Hatridge Germany hatridge@straubing.baynet.de M$ -- Ghostdriver* on the road to the future! (*German Slang for the guy driving on the wrong side of the road!) From rgb@phy.duke.edu Mon, 5 Oct 1998 17:13:32 -0400 Date: Mon, 5 Oct 1998 17:13:32 -0400 From: Robert G.
Brown rgb@phy.duke.edu Subject: Extreme Linux On Mon, 5 Oct 1998, Jim Hatridge wrote: > Hi All; > > Can anyone give me an URL to download parts of Extreme Linux? I tried the There is a link on the beowulf site: www.beowulf.org, right up there near the top. I keep meaning to set a mirror up on my beowulf mirror site but haven't yet remembered at a time that I can do the work. This link always seems to work, though. rgb Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From caskey@technocage.com Mon, 5 Oct 1998 23:34:23 -0400 Date: Mon, 5 Oct 1998 23:34:23 -0400 From: Caskey L. Dickson caskey@technocage.com Subject: How to get rid of FIN_WAIT1 sockets On Sat, 3 Oct 1998, Rahul Dave wrote: > As I'd reported somne time back, I've been having problems with > these MPICH socket timeouts, with sockets going into FIN_WAIT1's and > never coming out. A socket enters the FIN_WAIT_1 state when one side of a connection calls close() on an open socket (causing a FIN to be transmitted to the other end). It stays in this state whilst waiting for the other end to respond with an ACK to the FIN that was transmitted to it. The remote (should) automatically send the ACK, causing the client to enter the FIN_WAIT_2 state (this is done by the kernel). It remains in this state until the remote sends its own FIN, which happens when the other side calls close() on its end of the socket (the remote then sits in LAST_ACK until that FIN is acknowledged). At that point the client will enter the TIME_WAIT state where it will stay for the 2MSL timeout (30, 60 or 180 seconds typically, linux == 60). From my interpretation, one end of your application is not responding to the socket close requests that the client is making, thus suspending the socket connection in the FIN_WAIT_1 state until the connection times out.
I can't imagine how this could be happening, as a crash on the remote end would cause the socket to be closed by the OS, and the ACK that gets it from FIN_WAIT_1 to FIN_WAIT_2 (should) be sent by the OS. > Short of rebooting, is there a command I can use to kill them? Alas, no. At least, not without forging some packets. You may, however, be able to solve it another way. > I'm not even able to experiment with them piling up. I tried ifdowning > and then ifup'ing them(ifconfig down followed by ifconfig up) but it > does nothing to clear the sockets. Unfortunately the ifdown/up of the interface should (correctly) have no effect on the connection. One solution I recommend (as it may allow you to re-use socket numbers while waiting for the timeout) is to enable the SO_REUSEADDR option. My quick skimming of STEVENS 1994, section 18.6 does not directly address FIN_WAIT_1; however, it may help. The more typical problem I have seen is a bunch of sockets in the FIN_WAIT_2 or TIME_WAIT states. Here's the magic snip of code from our socket class. ... int value = 1; setsockopt( Socket, SOL_SOCKET, SO_REUSEADDR, &value, sizeof( value ) ); ... Another solution would be to go straight to the source (pun intended) and change the various TCP timeouts in include/net/tcp.h. My copy of the source declares three interesting timers. I didn't read tcp.c carefully to see their effects, but you have TCP_TIMEOUT_LEN which is 15 minutes, TCP_TIMEWAIT_LEN (60s), which is the time to spend waiting for a socket to close(!), and TCP_FIN_TIMEOUT which is 3 minutes. I don't know which one would apply to a socket in FIN_WAIT_1. In a beowulf, however, I can't imagine measuring timeouts in minutes as being remotely applicable. C=) P.S. setsockopt(2) & _TCP/IP_Illustrated_Volume_1_ section 18.6 should prove useful.
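For readers who want to try the SO_REUSEADDR suggestion above in plain C, here is a minimal sketch assuming a POSIX sockets API. The helper names enable_reuseaddr and reuseaddr_enabled are hypothetical, not part of any library. The option value is an int passed by address, and the length argument is the size of that int, not of a pointer to it.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: turn on SO_REUSEADDR for an existing socket.
   Returns 0 on success, -1 on error (errno is set by setsockopt). */
int enable_reuseaddr(int fd)
{
    int on = 1;  /* non-zero enables the option */

    assert(fd >= 0);  /* a negative descriptor is a caller bug */
    return setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof on);
}

/* Hypothetical helper: read the option back.
   Returns 1 if enabled, 0 if disabled, -1 on error. */
int reuseaddr_enabled(int fd)
{
    int val = 0;
    socklen_t len = sizeof val;

    if (getsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &val, &len) != 0)
        return -1;
    return val != 0;
}
```

The option has to be set before bind() to have any effect on address reuse. As the message says, it is aimed at connections lingering in TIME_WAIT, so it may or may not help with sockets stuck in FIN_WAIT_1.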
-------------------------------------------------------------------------- There is hardly a thing in the world that some man can not make a little worse and sell a little cheaper. -------------------------------------------------------------------------- Caskey /// pager.818.698.2306 TechnoCage Inc. ///| gpg: 1024D/7BBB1485 -------------------------------------------------------------------------- I didn't fight my way to the top of the food chain to be a vegetarian. From stevehi@soc.plym.ac.uk Tue, 6 Oct 1998 04:26:30 -0400 Date: Tue, 6 Oct 1998 04:26:30 -0400 From: Steve Hill stevehi@soc.plym.ac.uk Subject: extreme linux installation problem - fix FYI here is how I got round the extreme linux cd installation problem I had. I had been partitioning drives with disk druid, and had left some space on the hdb drive for an NT partition, although I hadn't turned this into a partition yet. If I formatted any partitions, the installation crashed somewhere in the package install, leaving a signal 7 error, and an invitation to reboot. If I didn't format any partitions, the install failed to find the RPMs. So I tried using fdisk (the old favourite) instead, because disk druid was leaving me with a primary partition on hdb (hdb1) and an extended partition (hdb2), which had one partition in it (hdb5). Using fdisk I repartitioned it to 3 primary partitions (hdb1, hdb2, hdb3), hdb3 being a WIN95 partition. And hey presto, the problem vanished. Now, I don't know exactly where the problem lies, but I suspect that either it is somewhere in the disk druid script in the install, or that the install script doesn't like unallocated areas on drives. I had kind of got to the end of my patience, and once I had something working, I wasn't really prepared to start an install again to nail the problem ;) Hope this may help anyone who has similar problems.
Thanks to everyone who offered suggestions Steve ============================================================================== It has thus become increasingly apparent that physical `reality', no less than social `reality', is at bottom a social and linguistic construct; that scientific `knowledge', far from being objective, reflects and encodes the dominant ideologies and power relations of the culture that produced it Alan Sokal From ymorin@enib.fr Tue, 6 Oct 1998 06:20:35 -0400 Date: Tue, 6 Oct 1998 06:20:35 -0400 From: Yann E. MORIN ymorin@enib.fr Subject: Extreme Linux Hi! Here is what I found: an ftp site at beowulf.gsfc.nasa.gov/extreme_linux; it's a mirror of the extreme linux CD from RedHat (which I'm currently running on 4 PCs). What I wonder about is why you want to download it. It cost my school only 150FF (around $25) to buy the CD... I don't think it's worth downloading... Regards, Yann. PS. I'm in charge of that small beowulf at school, but I'm a newbie... Soon, we'll get a TRUE machine with 64 Mother Boards (PPC) and an ATM switch on which we'll either use RT Linux or Extreme Linux... My plans are for EL, but teachers are for RTL... Anyone to help make our choice??? -- Yann E. MORN --------------------------------------- Ecole Nationale d'Ingenieurs de Brest Laboratoire d'Informatique Industrielle Technopole Brest Iroise CP 15 - 29608 Brest Cedex - FRANCE --------------------------------------- tel : (+33) 298 056 600 (Ext 6318) fax : (+33) 298 056 629 email : bureau : ymorin@enib.fr perso : ymorin@france-mail.fr yann.morin@hol.fr --------------------------------------- From hjstein@bfr.co.il Tue, 6 Oct 1998 07:18:02 -0400 Date: Tue, 6 Oct 1998 07:18:02 -0400 From: Harvey J. Stein hjstein@bfr.co.il Subject: PVM robustness problems. We're using PVM on a switched 100mbps ethernet with 60 alpha boxes running Linux (RH 4.2 based). We've run into two robustness issues that maybe someone on the list has some solutions for.
We have a master/slave setup - 1 master spawns 60 slaves, 1 per machine in the virtual machine. The master calls pvm_notify to be notified if any of the slaves dies. If they do it prints a message & exits. Each slave calls pvm_notify to be notified if the master dies. If it does it exits. Thus, if anything dies then everything should exit. We have another mechanism for restarting work when this happens. The master is a server - it's given requests (via a TCP/IP connection), farms them out across the slaves & returns the final result. Communication overhead seems to be small compared to the work that the slaves do for each part of a request. Problem 1 - Long timeouts/slow work. Sometimes a slave will die & it seems like it takes a long time for the set of tasks to shut down - as long as an hour. Also, sometimes a given request will take much longer than expected. Has anyone else seen such problems? Does anyone have any ideas/know of any mechanisms for tracking down such problems? Problem 2 - Master pvmd death. With PVM, if the master pvmd dies then the entire virtual machine shuts down. We'd like to be able to recover from such an event. We've been thinking of implementing something which runs on another machine on the cluster & if it sees pvm go down it'd start pvm & the tasks again. Has anyone already addressed this problem? Does anyone have any mechanisms for dealing with master pvmd deaths? Or should we just go about implementing our own home grown solution? Thanks, Harvey J. 
Stein BFM Financial Research hjstein@bfr.co.il From deadline@plogic.com Tue, 6 Oct 1998 08:09:34 -0400 Date: Tue, 6 Oct 1998 08:09:34 -0400 From: Douglas Eadline deadline@plogic.com Subject: linux SMP 2.0 -- interrupt reentrancy problems On Sat, 3 Oct 1998, Loic Prylli wrote: > > [message cross-posted to the beowulf list (as well as linux-smp), > people might be interested there ] > > Hello, > > > It seems some people have had problems with reentrant interrupts for > some time, this is more often seen when using parallel applications on clusters > with fast networks: I have been having problems on a Supermicro P6DBE (testing other boards today). The problems seem related to this issue. I tried your patch, but it made no difference :( My problem, described previously results in lock-ups, wrong answers, or stalled MPI communication, when doing lots of communication on a cluster. The problem is absent when I have a single copy of the program running on a single SMP node, but as soon as there are more than one copy running and communicating to other nodes the instability starts. This seems to occur in both 2.0.X kernels and 2.1.X kernels. With both LAM-MPI, MPICH and with fast Ethernet and Myrinet. Which leads me to the conclusion that linux SMP is not stable for my hardware configuration (which is pretty standard). Doug Eadline ------------------------------------------------------------------- Paralogic, Inc. | PEAK | Voice:+610.861.6960 115 Research Drive | PARALLEL | Fax:+610.861.8247 Bethlehem, PA 18017 USA | PERFORMANCE | http://www.plogic.com ------------------------------------------------------------------- From lindahl@cs.virginia.edu Tue, 6 Oct 1998 09:29:10 -0400 Date: Tue, 6 Oct 1998 09:29:10 -0400 From: Greg Lindahl lindahl@cs.virginia.edu Subject: PVM robustness problems. > We're using PVM on a switched 100mbps ethernet with 60 alpha boxes > running Linux (RH 4.2 based). 
We've run into two robustness issues > that maybe someone on the list has some solutions for. You should really use comp.parallel.pvm for questions like this. PVM seems to figure out deaths based on TCP socket closures. As everyone should know, TCP is really bad at figuring out the other end is gone. PVM should be sending keepalives, but it doesn't. Does your application only infrequently send messages? If so, sure, you can go a long time before a message failing to be received causes a link to close. You should get better behavior if the master sends keepalives (a message that the slave can simply discard) to the slaves every N minutes. The slaves should notice that the master is dead when they send their results back. -- g From HuntressGB@code831.npt.nuwc.navy.mil Tue, 6 Oct 1998 12:42:16 -0400 Date: Tue, 6 Oct 1998 12:42:16 -0400 From: Huntress, Gary B. HuntressGB@code831.npt.nuwc.navy.mil Subject: Demming's 14 Quality Points While perusing sci.stat.consult I found a reference to the 14 Demming quality points. As I re-read them, I felt the following comments gel in my head...see what you think: :) 1: Create constancy of purpose for improvement of product or service Lots of identical boxes 2: Adopt the new philosophy Beowulf 3: Cease dependence on mass inspection for quality control cease dependence on mass produced software/OS 4. End the practice of awarding business on the basis of price tag Supercomputers are high priced high tech Viagra (tm) 5. Improve quality, cost decrease because of fewer reworks, etc ... ??? 6: Institute more thorough, better job-related training well, I have been meaning to learn perl, Tcl/Tk, and python 7: Institute leadership I will continue shouting "Beowulf" from the rooftops 8: Drive out fear, so that everyone may work effectively for the company unix is not evil 9: Break down barriers between departments "They're just as scared of you as you are of them" 10. 
Eliminate slogans, exhortations, and targets for the work force that ask for zero defects and new levels of productivity. "Where do you want to go today?" 11. Eliminate work standards on the factory floor A tough one...I'm *forced* by rule, to use MSOffice, etc....*blork* 12. Remove the barriers that rob employees at all levels in the company of their right to pride of workmanship If you can't get your IT manager to order a $2 cd from lsl.com, download it! 13. Institute a vigorous program of education and self-improvement add http://www.slashdot.org to your hotlist 14. Put everybody in the organization to work to accomplish the transformation. Tell them what you've done......nothing succeeds like success! Regards, Gary Huntress Code 8313 Naval Undersea Warfare Center Newport, RI 02841 1-800-669-NUWC x28990 http://www.nyx.net/~ghuntres/superid.html From cstod@vvm.com Tue, 6 Oct 1998 13:06:27 -0400 Date: Tue, 6 Oct 1998 13:06:27 -0400 From: Chris Stoddard cstod@vvm.com Subject: Extreme Linux I would recommend buying the CD; everything, or mostly everything, you need is contained on it. The only reason I'd do the download is if I already had a copy of Linux up and running. As for which flavor of RH you should buy, it depends: RH 5.1 has some bug fixes, but requires you to download the Beowulf software. EL is based on RH5.0, but includes the Beowulf stuff. If you are building a cluster machine, go with EL; if you are just building a network of linux machines, go for RH5.1 -Chris Stoddard "Jesus died for somebody's sins, but not mine." -----Original Message----- From: Yann E. MORIN To: beowulf@beowulf.gsfc.nasa.gov Date: Tue, 6 Oct 1998 13:06:27 -0400 Subject: Re: Extreme Linux >Hi! > >Here what I found: > >a ftp site at beowulf.gsfc.nasa.gov/extreme_linux >it's a mirror of the extreme linux CD from RedHat (which I'm currently running >on 4 PCs). > >What I wonder about is why you want to download it. It cost my school only 150FF >(around $25) to buy the CD...
I don't think it's worth being downloaded...
>
>Regards,
>
>Yann.
>
>PS. I'm in charge of that small beowulf at school, but I'm a newbie... Soon,
>we'll get a TRUE machine with 64 Mother Boards (PPC) and an ATM switch on which
>we'll either use RT Linux or Extreme Linux... My plans are for EL, but teachers
>are for RTL... Anyone to help make our choice???
>
>--
>
>Yann E. MORIN
>---------------------------------------
>Ecole Nationale d'Ingenieurs de Brest
>Laboratoire d'Informatique Industrielle
>Technopole Brest Iroise
>CP 15 - 29608 Brest Cedex - FRANCE
>---------------------------------------
>tel : (+33) 298 056 600 (Ext 6318)
>fax : (+33) 298 056 629
>email : bureau : ymorin@enib.fr
>        perso : ymorin@france-mail.fr
>                yann.morin@hol.fr
>---------------------------------------
>

From newt@hq.nasa.gov Tue, 6 Oct 1998 13:25:06 -0400 Date: Tue, 6 Oct 1998 13:25:06 -0400 From: Daniel Ridge newt@hq.nasa.gov Subject: beobase RPM

> What is the story with the beobase rpm? Does any one use it? Is there a
> plan for any further development? I think that some of the concepts
> introduced in the package are really good. Could the author(s) please
> contact me?

I wrote it.

-Dan

-------------------------------------+---------------------------------
Daniel Ridge                         | Computer Crime Division
                                     | N A S A
email: dridge@hq.nasa.gov            | Office of Inspector General
tel: 202-358-1901                    | 300 E Street SW
fax: 202-358-3439                    | Washington, D.C. 20546
NexTel: 301-440-9153                 |
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

From STU2997@atuvm.atu.edu Tue, 6 Oct 1998 17:44:54 -0400 Date: Tue, 6 Oct 1998 17:44:54 -0400 From: HORTON, DEREK K STU2997@atuvm.atu.edu Subject: beobase rpm (what is it for?)

Hi

Once again I am here asking what is probably a stupid question, but what does beobase do?

Thanks
Derek Horton
stu2997@atuvm.atu.edu

From cbohn@afit.af.mil Tue, 6 Oct 1998 22:51:48 -0400 Date: Tue, 6 Oct 1998 22:51:48 -0400 From: Capt Bohn, Christopher A.
cbohn@afit.af.mil Subject: Top Ten Reasons to Use Beowulf-Class Supercomputers

This is a multi-part message in MIME format. ------=_NextPart_000_0002_01BDF17C.547DA100 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 8bit

Good day,

We recently started a new class of students here, and I was asked by my advisor to throw together a very brief presentation as a part of a larger presentation to try to convince a captive audience to try parallel & distributed computing. Part of what I came up with was my Top Ten Reasons to Use Beowulf-Class Supercomputers (largely tongue-in-cheek). Bearing in mind this was thrown together in a couple of hours, and there was more substantial information I wanted to concentrate on, I didn't rack my brains trying to optimize this list. So, enjoy it, flame it, improve upon it, share it, whatever.

It's derived from another list I'd gotten my hands on: Top Ten Reasons to use MPPs. Since I got that list third-hand, I can't cite the source, but I think it came from SC95.

First, the Top Ten Reasons to use MPPs:
10. MPPs are way cool
9. Really humongous peak speeds
8. Lots of blinking lights
7. Great tee-shirts at SuperComputing ’95
6. Win valuable prizes
5. Al Gore loves them
4. DARPA will give you one for free
3. Help U.S. balance of trade
2. They’re hard to program, so you’ll always have a job
1. They are (or should be) scalable

And now the Top Ten Reasons to Use Beowulf-Class Supercomputers:
10. MPPs are way expensive
9. Sustained speeds competitive with MPPs
8. Lots of idle processors just waiting to be used
7. That Penguin is just so darn cute
6. Configure the hardware to suit your needs
5. Rewrite the operating system to suit your needs
4. DARPA doesn’t have to give you one for free
3. They’re cheap, so more money can go to your annual raise
2. They’re hard to program, so you’ll always have a job
1. They are (or should be) scalable

Enjoy, and take care,
cb

*-*-*-*-*-*-*-*
Capt Christopher A.
Bohn
Graduate Student, Electrical (digital) Engineering
Air Force Institute of Technology     Phone (937)255-3636 (DSN 785)
AFIT/EN638                              Lab x4606   Voicemail x6638
2950 P St, Box 4638                         email cbohn@afit.af.mil
Wright-Patterson AFB OH 45433-7765                 EngrBohn@aol.com
                                   http://members.aol.com/EngrBohn/
*-*-*-*-*-*-*-*
------=_NextPart_000_0002_01BDF17C.547DA100--

From Jeff.West@ssc.nasa.gov Wed, 7 Oct 1998 10:59:45 -0400 Date: Wed, 7 Oct 1998 10:59:45 -0400 From: West, Jeff Jeff.West@ssc.nasa.gov Subject: PC100 SDRAM and BX M/B ?

Doug & others:

I had ordered a PII-400 BX-based system. Ignorant of this thread I specified PC100 SDRAM, the DIMM has a nice little -10 printed on the IC's. Lo and behold I seem to have problems with crashes. Here is a snippet of the system log message from a crash.

/*****
Oct 7 07:40:10 ssc1939648 kernel: Unable to handle kernel paging request at virtual address f6323031
Oct 7 07:40:10 ssc1939648 kernel: current->tss.cr3 = 05088000, %cr3 = 05088000
Oct 7 07:40:10 ssc1939648 kernel: *pde = 00000000
Oct 7 07:40:10 ssc1939648 kernel: Oops: 0000
Oct 7 07:40:10 ssc1939648 kernel: CPU: 0
Oct 7 07:40:10 ssc1939648 kernel: EIP: 0010:[<0011ee60>]
Oct 7 07:40:10 ssc1939648 kernel: EFLAGS: 00010002
Oct 7 07:40:10 ssc1939648 kernel: eax: 36323031 ebx: 001d2db8 ecx: 05950410 edx: 05950000
Oct 7 07:40:10 ssc1939648 kernel: esi: 36323031 edi: 0000001f ebp: 00000246 esp: 07e84f44
Oct 7 07:40:10 ssc1939648 kernel: ds: 0018 es: 0018 fs: 002b gs: 002b ss: 0018
Oct 7 07:40:10 ssc1939648 kernel: Process kerneld (pid: 51, process nr: 7, stackpage=07e84000)
Oct 7 07:40:10 ssc1939648 kernel: Stack: 05950418 05133398 bffffcb0 bffffb8c 00000000 00115f91 05950418 0880b000
Oct 7 07:40:10 ssc1939648 kernel: 05950418 001e2e80 00115cd5 0594f810 0804b40c bffffb68 bffffb30 0000000c
Oct 7 07:40:10 ssc1939648 kernel: 00000400 00000000 fffffffc 00000000 00000000 4007efae 00000023 00000212
Oct 7 07:40:10 ssc1939648 kernel: Call Trace: [<00115f91>] [<0880b000>] [<00115cd5>] [<0010a91d>]
Oct 7 07:40:10 ssc1939648 kernel: Code: 8b 06 85 c0 74 66 39 d0 74 06 89 c6 eb f2 89 f6 8b 02 89 06
****/

Is this consistent with a slow memory problem? I am concerned about chasing the wrong goose here. I am sure glad I ordered a checkout system instead of having this problem with a whole 48 node cluster.
My spec. for the remainder will be detailed indeed. The BIOS is AWARD, is there a way to insert a wait state and verify this is indeed causing the problem? Thanks: Jeff Jeff West, Ph.D. Lockheed Martin Stennis Operations Bldg. 8306 Stennis Space Center, MS 39529 *(228) 688-1562 Fax: (228) 688-1106 *jeff.west@ssc.nasa.gov > -----Original Message----- > From: Douglas Eadline [SMTP:deadline@plogic.com] > Sent: Saturday, September 26, 1998 7:38 AM > To: jhpark@nurapt.kaist.ac.kr > Cc: beowulf@beowulf.gsfc.nasa.gov > Subject: Re: PC100 SDRAM and BX M/B ? > > On Sat, 26 Sep 1998, Jeong Hwan Park wrote: > > > > > We are about to build a Beowulf system. > > > > I don't have any experience with PC100 SDRAM and BX M/B. > > I heard the cycle time of PC100 SDRAM must be below 8ns. > > If I use the 10 ns (tCLK) PC100 ECC SDRAM, is it safe? > > > > Is there any suggestion of PC100 ECC SDRAM? > > We have found that just the PC100 certification is > not a guarantee that the memory will work in all cases. It seems > Linux pushes the memory a bit harder than Windows or NT > and that some systems will run fine using W95 or NT, but > eventually fail with Linux (after several day of running, a random > crash etc.) > > We now use 7ns or better ECC SDRAM in all our systems. > > Also, there seem to be motherboard/memory interaction > rules that are not quite understood. For example: > "micron does not work well with supermicro" > > > > > I think the MPI timeout problem on the BX M/B due to slow PC100 SDRAM. > > How do you think about? > No. > > Doug > ------------------------------------------------------------------- > Paralogic, Inc. | PEAK | Voice:+610.861.6960 > 115 Research Drive | PARALLEL | Fax:+610.861.8247 > Bethlehem, PA 18017 USA | PERFORMANCE | http://www.plogic.com > ------------------------------------------------------------------- From HuntressGB@code831.npt.nuwc.navy.mil Wed, 7 Oct 1998 11:41:38 -0400 Date: Wed, 7 Oct 1998 11:41:38 -0400 From: Huntress, Gary B. 
HuntressGB@code831.npt.nuwc.navy.mil Subject: Installing legacy network cards This is not specifically a Beowulf question, although it does affect my cluster. I have not gotten much help elsewhere, my apologies for cluttering up the list. I've got 6 new systems (old 486's) with 6 apparently no-name ethernet cards. No one here has any documentation, and the best guess is that they are NE2000 clones (the only marking on them is "made in taiwan" and 2000a). I compiled NE2000 support into my kernel, and the cards appear detected at boot, and with the correct interrupt. Yet, the net appears dead....I can't ping any hosts. (I eliminated the usual suspects, cables, terminators etc) I basically have two questions: 1) If these are NOT NE2000 clones then I obviously have the wrong driver. Would the NE2K driver still detect a non-ne2k card? (i.e. am I being misled by the msg that it found the card?) 2) These cards are three way (AUI, coax, and twisted pair). I *assume* that all three are always available. That is, I don't have to jumper one interface to enable it. Is this true? Thanks in advance, Regards, Gary Huntress Code 8313 Naval Undersea Warfare Center Newport, RI 02841 1-800-669-NUWC x28990 http://www.nyx.net/~ghuntres/superid.html From hatridge@straubing.baynet.de Wed, 7 Oct 1998 12:03:03 -0400 Date: Wed, 7 Oct 1998 12:03:03 -0400 From: Jim Hatridge hatridge@straubing.baynet.de Subject: Extreme Linux On Tue, 6 Oct 1998, Yann E. MORIN wrote: > Hi! > > Here what I found: > > a ftp site at beowulf.gsfc.nasa.gov/extreme_linux > it's a mirror of the extreme linux CD from RedHat (which I'm currently running > on 4 PCs). > > What I wonder about is why you want to download it. It cost my school only > 150FF > (around $25) to buy the CD... I don't thimk it's worth being downloaded... > > Regards, > > Yann. Hi Yann, I know that it's not very costly. The deal is that at this moment I don't really need it all. (But I had someone offer me a CD of it.) 
Right now I have only one computer (486/33). Around March '99 I will have some money (money CD, Festgeld in German) come to me. I plan / hope to buy 4 - 6 old 486's and set up a Beowulf. Until that time I am studying all I can find about Beowulfs. I tried to get to the RH/Extreme ftp sites, but they were always busy. So I turned to the group for help. They (and you) have given me a lot. That's one of the great things about Linux!

Thanks,
J I M

-----------------------------------------
Jim Hatridge Germany hatridge@straubing.baynet.de
M$ -- Ghostdriver* on the road to the future!
(*German Slang for the guy driving on the wrong side of the road!)

From czimmer@insync.net Wed, 7 Oct 1998 14:26:31 -0400 Date: Wed, 7 Oct 1998 14:26:31 -0400 From: Chris Zimmerman czimmer@insync.net Subject: PC100 SDRAM and BX M/B ?

I ran into the same problem when I switched my desktop PC from a PII 266 to an AMD K6-2 300 with the PC100 RAM. I did not do much troubleshooting; instead I loaded another hard drive with the base Linux install, then copied my old data over from my old hard drive. Please let me know what is happening here.....

Thank you,
Chris Zimmerman

From rdab100@hermes.cam.ac.uk Wed, 7 Oct 1998 14:37:24 -0400 Date: Wed, 7 Oct 1998 14:37:24 -0400 From: Dominic Baines rdab100@hermes.cam.ac.uk Subject: Installing legacy network cards

Gary,

Huntress, Gary B. wrote:
> This is not specifically a Beowulf question, although it does affect my
> cluster. I have not gotten much help elsewhere, my apologies for
> cluttering up the list.
>
> I've got 6 new systems (old 486's) with 6 apparently no-name ethernet
> cards. No one here has any documentation, and the best guess is that
> they are NE2000 clones (the only marking on them is "made in taiwan" and
> 2000a). I compiled NE2000 support into my kernel, and the cards appear
> detected at boot, and with the correct interrupt. Yet, the net appears
> dead....I can't ping any hosts.
(I eliminated the usual suspects,
> cables, terminators etc)

That may indicate that you are unlucky enough to have cards that are configured for a particular network interface (i.e. RJ45, BNC, etc.) and store that setting in the card's memory. Did you get any driver software (DOS usually) with them?

> I basically have two questions:
>
> 1) If these are NOT NE2000 clones then I obviously have the wrong
> driver. Would the NE2K driver still detect a non-ne2k card? (i.e. am
> I being misled by the msg that it found the card?)
>
> 2) These cards are three way (AUI, coax, and twisted pair). I *assume*
> that all three are always available. That is, I don't have to jumper
> one interface to enable it. Is this true?

No, this may be the problem. The different interfaces may need to be enabled if the cards don't auto-detect which to use, or if they have previously been set to a different interface from the one you want.

Hope that helps some.

Dominic Baines

From efinch@eos.EAST.HITC.COM Wed, 7 Oct 1998 14:58:52 -0400 Date: Wed, 7 Oct 1998 14:58:52 -0400 From: Ed Finch efinch@eos.EAST.HITC.COM Subject: Extreme Linux

Jim Hatridge wrote:
>
> I know that it's not very costly. The deal is that at this moment I don't
> really need it all.

A coworker said he saw the Extreme Linux CD at Micro Center in Northern Virginia for $19. It pays to shop around.

--
Ed Finch
Q: Why do PCs have a reset button on the front?
A: Because they are expected to run Microsoft operating systems.

From jcthomso@mtu.edu Wed, 7 Oct 1998 15:09:05 -0400 Date: Wed, 7 Oct 1998 15:09:05 -0400 From: Jimmy C. Thomson jcthomso@mtu.edu Subject: Redhat to extreme?

Ok, I already run Redhat 5.1. What do I have to do to make a Beowulf cluster? Is there a patch, or do you have to go and reinstall everything?

*********************************************************
Jimmy Thomson
Computer Science Undergrad
Michigan Technological University
"I don't think that Microsoft is evil in itself; I just think that they make really crappy operating systems."
---Linus Torvalds--- ********************************************************* From kairu@obsidian.calstatela.edu Wed, 7 Oct 1998 15:54:57 -0400 Date: Wed, 7 Oct 1998 15:54:57 -0400 From: Wing kairu@obsidian.calstatela.edu Subject: PC100 SDRAM and BX M/B ? On Wed, 7 Oct 1998, West, Jeff wrote: > Doug & others: > > I had ordered a PII-400 BX-based system. Ignorant of this thread I > specified PC100 SDRAM, the DIMM has a nice little -10 printed on the IC's. > Lo and behold I seem to have problems with crashes. Here is a snippet of > the system log message from a crash. I've noticed quite a few vendors of memory chips noting that 10ns/pc100 isn't enough to perform at the 100mhz bus speed. My own memory that I use at home is 7ns and 8ns. Waiting for my 100mhz board to come back after being replaced... bad cache. ^_^; > Is this consistent with a slow memory problem? I am concerned about > chasing the wrong goose here. I am sure glad I ordered a checkout system > instead of having this problem with a whole 48 node cluster. My spec. for > the remainder will be detailed indeed. Most likely, it is because of slow memory. 10ns for a 100mhz system won't cut it. A few notes which you may already be aware of: - Some resellers of chips may be remarking their chips and altering the eeproms(if any) on the chips to report pc100 compatibility and selling those as such. - Some resellers are selling chips "pc100 certified", but are rated at only 10ns. I agree with another post before that 8ns or better would be the way to go. (5ns would be great, if anyone can find some of it. ^_^;) > The BIOS is AWARD, is there a way to insert a wait state and verify > this is indeed causing the problem? > Thanks: Jeff I don't think changing the wait state will help you much, Jeff. @100mhz, a few wait states are not going to do much to keep the system stable. 
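The point about 10ns parts on a 100mhz bus comes down to simple arithmetic, which can be sketched in a few lines of Python. This is only an illustration: the function names are made up here, and treating "margin" as the clock period minus the rated cycle time is a simplifying assumption, not Intel's PC100 test procedure.

```python
def clock_period_ns(bus_mhz: float) -> float:
    """Length of one bus clock cycle in nanoseconds."""
    return 1000.0 / bus_mhz


def timing_margin_ns(bus_mhz: float, rated_cycle_ns: float) -> float:
    """Slack per clock; zero or negative means no headroom at this bus speed."""
    return clock_period_ns(bus_mhz) - rated_cycle_ns


if __name__ == "__main__":
    # Compare the -10 part from the thread with the 8ns and 7ns parts suggested.
    for rated in (10.0, 8.0, 7.0):
        margin = timing_margin_ns(100.0, rated)
        print(f"{rated:4.1f} ns part on a 100 MHz bus: margin {margin:+.1f} ns")
```

Under that assumption a 10ns part has no slack at all at 100 MHz, while the same part on a 66 MHz bus has about 5 ns to spare, which fits the reports of memory that was fine on older boards failing on BX boards.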
If this is just a checkout system, you might want to see if the group you are ordering from has a checklist of compatible 100mhz boards with memory chips. Some groups out there stand by the chips they sell, but they will charge a premium for them. (When you order from them, they will ask for the type and model of the motherboard you plan on using the memory in.)

Wing.

From warnes@biostat.washington.edu Wed, 7 Oct 1998 16:40:29 -0400 Date: Wed, 7 Oct 1998 16:40:29 -0400 From: Gregory R. Warnes warnes@biostat.washington.edu Subject: Freeowulf (was Re: Beowulf in a Box (fwd))

On Sat, 3 Oct 1998, Douglas Eadline wrote:
>
>
> In my opinion, this would not work. It makes more sense when
> the computers were very expensive (this is why we had SC centers)
> I think the appeal of a Beowulf is the ownership - it is yours
> you get to use when you want and how you want. Now that the cost
> is so low for the hardware, money spent on shared resource
> could go toward your own system.

It will eventually work. Right now, I see groups of researchers _within_ an organization pooling resources for a Beowulf cluster. This is on the verge of happening where I admin a Beowulf cluster.

As I see it, you either contribute machines or $$ to get into the cooperative. Then you get access to the cluster for your research. This should work wonderfully for those groups that have sporadic, very demanding computations. This will allow using the free cycles when another group's machines would be sitting idle, at the cost of having to share _your_ machines when the demands coincide.

[I don't think this would work well for groups that have long-term full-out computing.]

-Greg

-------------------------------------------------------------------------------
Gregory R. Warnes             | It is high time that the ideal of success
warnes@biostat.washington.edu | be replaced by the ideal of service.
                              | Albert Einstein
-------------------------------------------------------------------------------

From cbohn@afit.af.mil Wed, 7 Oct 1998 20:52:46 -0400 Date: Wed, 7 Oct 1998 20:52:46 -0400 From: Capt Bohn, Christopher A. cbohn@afit.af.mil Subject: g77 problems with Extreme Linux

This is a multi-part message in MIME format. ------=_NextPart_000_0003_01BDF234.E679B900 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit

Good day,
Various factors have led me to run a reinstallation of our existing nodes while installing our new nodes -- this time with the Extreme Linux CD. I'm running into a bit of a snag with the g77 compiler, though ... even with serial Fortran.

Here's what happened when I tried to compile pi3.f from MPICH/examples:

[cbohn@abc04 cbohn]$ make pi3
g77 -I/usr/mpich/include -c pi3.f
gcc: language f77 not recognized
gcc: pi3.f: linker input file unused since linking not done
g77 -o pi3 pi3.o -L/usr/mpich/lib/LINUX/ch_p4 -lmpi
gcc: pi3.o: No such file or directory
make: *** [pi3] Error 1

and with a serial Fortran file:

[cbohn@abc04 cbohn]$ g77 pi.f
gcc: language f77 not recognized
ld:pi.f: file format not recognized; treating as linker script
ld:pi.f:1: parse error

Anyone else encounter this? Any "easy & obvious" fixes (preferably one that'll make me slap my forehead with a "duh!")? Or am I going to need to download & install the latest & greatest gcc & g77 that'll play nice together?

Thanks for your time.
Take care,
cb

*-*-*-*-*-*-*-*
Capt Christopher A. Bohn
Graduate Student, Electrical (digital) Engineering
Air Force Institute of Technology     Phone (937)255-3636 (DSN 785)
AFIT/EN638                              Lab x4606   Voicemail x6638
2950 P St, Box 4638                         email cbohn@afit.af.mil
Wright-Patterson AFB OH 45433-7765                 EngrBohn@aol.com
                                   http://members.aol.com/EngrBohn/
*-*-*-*-*-*-*-*
------=_NextPart_000_0003_01BDF234.E679B900-- From kragen@pobox.com Thu, 8 Oct 1998 09:10:09 -0400 Date: Thu, 8 Oct 1998 09:10:09 -0400 From: Kragen kragen@pobox.com Subject: g77 problems with Extreme Linux On Wed, 7 Oct 1998, Capt Bohn, Christopher A. wrote: > Good day, > Various factors have led me to run a reinstallation of our existing > nodes while installing our new nodes -- this time with the Extreme Linux CD. > I'm running into a bit of a snag with the g77 compiler, though ... even with > serial Fortran. > > Here's what happened when I tried to compile pi3.f from MPICH/examples: > > [cbohn@abc04 cbohn]$ make pi3 > g77 -I/usr/mpich/include -c pi3.f > gcc: language f77 not recognized > Anyone else encounter this? Any "easy & obvious" fixes (preferably one > that'll make me slap my forehead with a "duh!")? Or am I going to need to > download & install the lastest & greatest gcc & g77 that'll play nice > together? g77 is a program that invokes gcc and tells gcc to use the Fortran front-end. If gcc doesn't have a Fortran front-end, it won't work. The g77 source package includes some patches to add to the gcc source tree to get it to work. Kragen -- Kragen Sitaker A well designed system must take people into account. . . . It's hard to build a system that provides strong authentication on top of systems that can be penetrated by knowing someone's mother's maiden name. -- Schneier From lindahl@cs.virginia.edu Thu, 8 Oct 1998 10:22:26 -0400 Date: Thu, 8 Oct 1998 10:22:26 -0400 From: Greg Lindahl lindahl@cs.virginia.edu Subject: g77 problems with Extreme Linux > g77 is a program that invokes gcc and tells gcc to use the Fortran > front-end. If gcc doesn't have a Fortran front-end, it won't work. > The g77 source package includes some patches to add to the gcc source > tree to get it to work. Note that RH 5.0 -- which is what I think the Extreme Linux CD includes -- is the last version of RedHat to not use egcs. 
The easiest way to get g77 is to install egcs, which has it integrated. -- g From Jeff.West@ssc.nasa.gov Thu, 8 Oct 1998 11:14:41 -0400 Date: Thu, 8 Oct 1998 11:14:41 -0400 From: West, Jeff Jeff.West@ssc.nasa.gov Subject: g77 problems with Extreme Linux Chris: Sound like gcc and g77 are not mated properly. (g77 is just a front-end to gcc) When g77 invokes the back end of gcc, gcc complains. I suggest you get gcc and g77 that are set up to work together. I am no expert here but I think you are using gcc that comes with EL, scrap it and get both gcc and g77 together. There is an rpm series with gcc, g++ and g77 all that work nicely together. Also the egcs guys distribute a gcc/g77 combination, as well as the Pentium Compiler Group, which have added pentium-specific opts to the egcs package. URLs of these sites escape me right now but I bet a net search would reveal them. Adding Pentium-specific gcc/g++/g77 to the EL CD seems like a good idea. I am assuming that g77 is not part of the EL CD, just as it is not part of the RH CDs. Why doesn't RH add g77 to their CDs? They are still sticking with the f2c package. My $0.02 worth. Jeff Jeff West, Ph.D. Lockheed Martin Stennis Operations Bldg. 8306 Stennis Space Center, MS 39529 *(228) 688-1562 Fax: (228) 688-1106 *jeff.west@ssc.nasa.gov > -----Original Message----- > From: Capt Bohn, Christopher A. [SMTP:cbohn@afit.af.mil] > Sent: Wednesday, October 07, 1998 7:56 PM > To: Beowulf > Subject: g77 problems with Extreme Linux > > Good day, >      Various factors have led me to run a reinstallation of our existing > nodes while installing our new nodes -- this time with the Extreme Linux > CD.  I'm running into a bit of a snag with the g77 compiler, though ... > even with serial Fortran. 
>   > Here's what happened when I tried to compile pi3.f from MPICH/examples: >   > [cbohn@abc04 cbohn]$ make pi3 > g77 -I/usr/mpich/include  -c pi3.f > gcc: language f77 not recognized > gcc: pi3.f: linker input file unused since linking not done > g77  -o pi3 pi3.o -L/usr/mpich/lib/LINUX/ch_p4  -lmpi  > gcc: pi3.o: No such file or directory > make: *** [pi3] Error 1 >   > and with a serial Fortran file: > > [cbohn@abc04 cbohn]$ g77 pi.f > gcc: language f77 not recognized > ld:pi.f: file format not recognized; treating as linker script > ld:pi.f:1: parse error > > Anyone else encounter this?  Any "easy & obvious" fixes (preferably one > that'll make me slap my forehead with a "duh!")?  Or am I going to need to > download & install the lastest & greatest gcc & g77 that'll play nice > together? >   > Thanks for your time. > Take care, > cb > > *-*-*-*-*-*-*-* > Capt Christopher A. Bohn > Graduate Student, Electrical (digital) Engineering > Air Force Institute of Technology     Phone (937)255-3636 (DSN 785) > AFIT/EN638                              Lab x4606   Voicemail x6638 > 2950 P St, Box 4638                         email cbohn@afit.af.mil > Wright-Patterson AFB OH 45433-7765                 EngrBohn@aol.com >                > *-*-*-*-*-*-*-* > >   From josip@icase.edu Thu, 8 Oct 1998 11:15:40 -0400 Date: Thu, 8 Oct 1998 11:15:40 -0400 From: Josip Loncaric josip@icase.edu Subject: PC100 SDRAM and BX M/B ? Wing wrote: > > I've noticed quite a few vendors of memory chips noting that 10ns/pc100 > isn't enough to perform at the 100mhz bus speed. My own memory that I use > at home is 7ns and 8ns. Waiting for my 100mhz board to come back after > being replaced... bad cache. ^_^; As far as I know, Intel's idea of PC100 compatibility is 7ns or 8ns access time, not 10ns access time. However, it seems that some 10ns SDRAM gets called PC100 anyway. Josip P.S. http://developer.intel.com/design/pcisets/memory/index.htm#sdram may be of interest. 
The PC100 testing summary says that Intel specification at 50pf load at both test conditions (condition A: Vcc=3.6V, Ta=0C; condition B: Vcc=3.0V, Tc=85C) requires the following timings: Tac: 6ns max Toh: 3ns min Tset: 2ns max Thold: 1ns max -- Dr. Josip Loncaric, Senior Staff Scientist ICASE, M/S 403, NASA Langley Research Center, Hampton, VA 23681-2199 Phone: (757) 864-2192 mailto:josip@icase.edu Fax: (757) 864-6134 http://www.icase.edu/~josip/ From rahul@reno.cis.upenn.edu Thu, 8 Oct 1998 11:19:13 -0400 Date: Thu, 8 Oct 1998 11:19:13 -0400 From: Rahul Dave rahul@reno.cis.upenn.edu Subject: g77 problems with Extreme Linux egcs/pentiumgcc have g77 integrated in. The last time I benchmarked, a while back: gcc version pgcc-2.90.27 980315 (egcs-1.0.2 release) on a pentium pro, I got a ~>10% improvement in speed on some of my boltzmann codes Rahul > > On Wed, 7 Oct 1998, Capt Bohn, Christopher A. wrote: > > Good day, > > Various factors have led me to run a reinstallation of our existing > > nodes while installing our new nodes -- this time with the Extreme Linux CD. > > I'm running into a bit of a snag with the g77 compiler, though ... even with > > serial Fortran. > > > > Here's what happened when I tried to compile pi3.f from MPICH/examples: > > > > [cbohn@abc04 cbohn]$ make pi3 > > g77 -I/usr/mpich/include -c pi3.f > > gcc: language f77 not recognized > > > Anyone else encounter this? Any "easy & obvious" fixes (preferably one > > that'll make me slap my forehead with a "duh!")? Or am I going to need to > > download & install the lastest & greatest gcc & g77 that'll play nice > > together? > > g77 is a program that invokes gcc and tells gcc to use the Fortran > front-end. If gcc doesn't have a Fortran front-end, it won't work. > The g77 source package includes some patches to add to the gcc source > tree to get it to work. > > Kragen > > -- > Kragen Sitaker > A well designed system must take people into account. . . . 
It's hard to
> build a system that provides strong authentication on top of systems that
> can be penetrated by knowing someone's mother's maiden name. -- Schneier
>

From lindahl@cs.virginia.edu Thu, 8 Oct 1998 12:05:51 -0400 Date: Thu, 8 Oct 1998 12:05:51 -0400 From: Greg Lindahl lindahl@cs.virginia.edu Subject: g77 problems with Extreme Linux

> Adding Pentium-specific gcc/g++/g77 to the EL CD seems like a good
> idea. I am assuming that g77 is not part of the EL CD, just as it is not
> part of the RH CDs. Why doesn't RH add g77 to their CDs? They are still
> sticking with the f2c package.

RH, as of 5.1, is shipping egcs as their compiler set. They are also still shipping f2c, which is a fine thing.

Presumably the next Extreme Linux CD is going to be based on a later version of RedHat that includes egcs.

-- g

From juanr@averroes.ivic.ve Thu, 8 Oct 1998 14:37:00 -0400 Date: Thu, 8 Oct 1998 14:37:00 -0400 From: Juan Rivero juanr@averroes.ivic.ve Subject: g77 problems with Extreme Linux

On Thu, Oct 08, 1998 at 10:15:04AM -0500, West, Jeff wrote:
> Chris:
> Sound like gcc and g77 are not mated properly. (g77 is just a
> front-end to gcc) When g77 invokes the back end of gcc, gcc complains.
> ...
>
> URLs of these sites escape me right now but I bet a net search would
> reveal them.

http://egcs.cygnus.com

--
Dr. Juan Rivero      email juanr@averroes.ivic.ve    Tel +582-504-1772
Centro de Quimica    WWW http://averroes.ivic.ve/    Fax +582-504-1350
Instituto Venezolano de Investigaciones Cientificas  Cel +5814-933 93 88
IVIC, AP 21827, Caracas 1020A, Venezuela

From cbohn@afit.af.mil Thu, 8 Oct 1998 15:07:55 -0400 Date: Thu, 8 Oct 1998 15:07:55 -0400 From: Bohn, Christopher A. cbohn@afit.af.mil Subject: g77 problems with Extreme Linux

Good day,
Well, I installed egcs and egcs-g77 ... and I can now successfully compile serial Fortran.
But it still doesn't seem to be linking with the MPI libraries (I'm using MPICH): [cbohn@abc04 cbohn]$ make pi3 g77 -I/usr/mpich/include -c pi3.f g77 -o pi3 pi3.o -L/usr/mpich/lib/LINUX/ch_p4 -lmpi pi3.o: In function `MAIN__': pi3.o(.text+0x39): undefined reference to `mpi_init__' pi3.o(.text+0x4e): undefined reference to `mpi_comm_rank__' pi3.o(.text+0x63): undefined reference to `mpi_comm_size__' pi3.o(.text+0x15f): undefined reference to `mpi_bcast__' pi3.o(.text+0x21f): undefined reference to `mpi_reduce__' pi3.o(.text+0x289): undefined reference to `mpi_finalize__' pi3.o(.data+0x0): undefined reference to `mpi_dup_fn__' pi3.o(.data+0x4): undefined reference to `mpi_null_delete_fn__' pi3.o(.data+0x8): undefined reference to `mpi_null_copy_fn__' collect2: ld returned 1 exit status make: *** [pi3] Error 1 Any suggestions? Thanks. cb *-*-*-*-*-*-*-* Capt Christopher A. Bohn Graduate Student, Electrical (digital) Engineering Air Force Institute of Technology Phone (937)255-3636 (DSN 785) AFIT/EN638 Lab x4606 Voicemail x6638 2950 P St, Box 4638 email cbohn@afit.af.mil Wright-Patterson AFB OH 45433-7765 EngrBohn@aol.com http://members.aol.com/EngrBohn/ *-*-*-*-*-*-*-* From guilin@ix.netcom.com Thu, 8 Oct 1998 16:45:08 -0400 Date: Thu, 8 Oct 1998 16:45:08 -0400 From: guilin@ix.netcom.com guilin@ix.netcom.com Subject: fault tolerance Hi In a beowulf system, if one of the slave nodes fail, will the entire system crash? Or will the system continue with just one less node? Thanks Richard From harter@feeding.frenzy.com Thu, 8 Oct 1998 16:52:52 -0400 Date: Thu, 8 Oct 1998 16:52:52 -0400 From: Sam Hayes Merritt, III harter@feeding.frenzy.com Subject: Beowulf in a Box (fwd) On Fri, 2 Oct 1998, Dominic Baines wrote: > Douglas Eadline wrote: > > Is there a publicly available cluster anywhere (that could > be accessed over the internet under a guest login and run > a process on) anyway ? I had had plans of developing one. > Would it be of use to create one ? Yes. 
If you monitor it and make sure it is not abused. I think it would be highly useful to persons without access to any other high powered systems if they wished to develop ideas and test out things. Sam From rahul@reno.cis.upenn.edu Thu, 8 Oct 1998 17:58:10 -0400 Date: Thu, 8 Oct 1998 17:58:10 -0400 From: Rahul Dave rahul@reno.cis.upenn.edu Subject: Linux SMP 2.1.124 and 2.1.79 TCP comparision, 2.1.79 FIN_WAIT1 problem Hi, I was using linux 2.1.79 on my cluster. The machines are Dell Poweredges 6100, Quad Ppro, Gigabyte memory, Onboard Adaptec 2940, intel etherexpress pro, communicating over an Intel 510T switch. I am running Message Passing (MPI) programs on these machines. Mostly in jobs with two processes on one node and one on the other, on terminating the job prematurely by a kill or ctrl-C, sockets get left in FIN_WAIT1. Sometimes the corresponding socket is in LAST_ACK, indicating that one end went into LAST_ACK without doing the needful to get FIN_WAIT1 to FIN_WAIT2 or something like that. The FIN_WAIT1's never go away (from netstat): Proto Recv-Q Send-Q Local Address Foreign Address State tcp 0 23025 eip11.cluster01.en:4242 eip11.cluster01.en:4244 FIN_WAIT1 tcp 0 1 eip11.cluster01.en:4234 eip11.cluster01.en:4232 FIN_WAIT1 This behaviour goes away when I use 2.1.79 uniprocessor. There, the two processes on the same machine are on the same processor. Upgrading to 2.1.124 took away the FIN_WAIT1 problem, but the performance is very much slower. For jobs with 40,000 broadcasts and reduces, which take 8 seconds on 2.1.79, take 80 seconds on 2.1.124. I benchmarked the network performance for TCP and MPI, and the TCP performance of 2.1.124 is almost 105 lower. The results, using the Netpipe benchmark are at http://reno.cis.upenn.edu/~rahul/perf/, with graphs of throughput against block transfer time, and blocksize in both PostScript and gif format. Is there some way to fix the FIN_WAIT1 problem? Or, why is the throughput less on 2.1.124?
Is there some way to fix that instead? Does IO-APIC have anything to do with it (2.1.79 doesn't seem to have it)? Thanks a lot, Rahul From bryan@bighorn.dr.lucent.com Thu, 8 Oct 1998 21:50:12 -0400 Date: Thu, 8 Oct 1998 21:50:12 -0400 From: Bryan J. Welch bryan@bighorn.dr.lucent.com Subject: High availability? In looking at Beowulf as a new build server, some questions came up recently about high availability. What happens to the cluster (and any jobs running) when a single machine in the cluster fails? Is it possible to take down and upgrade/replace a single machine in the cluster without taking down the entire cluster? How long has some of you kept you your cluster? Could we expect to keep the cluster running for months at a time without taking it down? thanks, Bryan -- Bryan Welch - Bell Labs - bjwelch@bell-labs.com - N0SFG - PP-ASEL From eugene@liposome.genebee.msu.su Fri, 9 Oct 1998 00:47:00 -0400 Date: Fri, 9 Oct 1998 00:47:00 -0400 From: Eugene Leitl eugene@liposome.genebee.msu.su Subject: RAID controller question... On Thu, 10 Sep 1998, Alan Cox wrote: > Not if you have hotswap hardware and software raid. While speaking of hotswap hardware: Kingston drive frames are supposed to be best, yet at $98 (hotswappable probably even more expensive) do spell large expenses if used in quantity. Any alternative vendors, anybody? Thanks! Regards, Eugene Leitl From chen@next.co.jp Fri, 9 Oct 1998 01:56:35 -0400 Date: Fri, 9 Oct 1998 01:56:35 -0400 From: Qing Chen chen@next.co.jp Subject: PVM/MPI and performance Hi there, I have several PCs in our LAN. The operating systems are Linux and FreeBSD. Also, I have already installed PVM/MPI packages on those computers. The problem I have now is how to measure the performance of this beowulf cluster. I heard that Linpack is one of the benchmarks. But does it work only with MPP supercomputers, or with PC clusters as well? Please help.
-- David From rdab100@hermes.cam.ac.uk Fri, 9 Oct 1998 02:23:13 -0400 Date: Fri, 9 Oct 1998 02:23:13 -0400 From: Dominic Baines rdab100@hermes.cam.ac.uk Subject: Beowulf in a Box (fwd) Sam, I'm sure Douglas would love to run one as I suggested but I think you over snipped the email !! These comments were mine, I can't locate what Douglas actually said but it was along the line of "....cooperating on a cluster by some organisations was both possible and probable." (Apologies Douglas if I've misquoted) Sam Hayes Merritt, III wrote: > On Fri, 2 Oct 1998, Dominic Baines wrote: > > > Douglas Eadline wrote: > > > > Is there a publicly available cluster anywhere (that could > > be accessed over the internet under a guest login and run > > a process on) anyway ? > > I had had plans of developing one. > > > Would it be of use to create one ? > > Yes. If you monitor it and make sure it is not abused. > I think it would be highly useful to persons without access > to any other high powered systems if they wished to develop > ideas and test out things. I will and do have a couple of clusters that were used for specific tasks and another facility that is in essence a public workstation facility and it is this last one that will become a 'public' Beowulf. In time details will arrive. > Sam Dominic From davem@dm.cobaltmicro.com Fri, 9 Oct 1998 02:26:42 -0400 Date: Fri, 9 Oct 1998 02:26:42 -0400 From: David S. Miller davem@dm.cobaltmicro.com Subject: Linux SMP 2.1.124 and 2.1.79 TCP comparision, 2.1.79 FIN_WAIT1 problem From: Rahul Dave Date: Thu, 8 Oct 1998 17:50:11 -0400 (EDT) Upgrading to 2.1.124 took away the FIN_WAIT1 problem, but the performance is very much slower. For jobs with 40,000 broadcasts and reduces, which take 8 seconds on 2.1.79, take 80 seconds on 2.1.124. What networking card? TCP did not go slower at all during this time frame, in fact it should have become slightly faster. Later, David S.
Miller davem@dm.cobaltmicro.com From rdab100@hermes.cam.ac.uk Fri, 9 Oct 1998 02:47:35 -0400 Date: Fri, 9 Oct 1998 02:47:35 -0400 From: Dominic Baines rdab100@hermes.cam.ac.uk Subject: High availability? Bryan J. Welch wrote: > In looking at Beowulf as a new build server, some questions came up > recently about high availability. > > What happens to the cluster (and any jobs running) when a single machine > in the cluster fails? > > Is it possible to take down and upgrade/replace a single machine in the > cluster without taking down the entire cluster? > > How long has some of you kept you your cluster? Could we expect to keep > the cluster running for months at a time without taking it down? > > thanks, > > Bryan I believe a lot depends on what your cluster topology and structure is and how internode dependent your code is. One early cluster I built for my own amusement was a flat 8 node into an 8 port hub with each node having a full Slackware linux distribution installed on the HDD (so the box would operate independently too!). LAM MPI, C code, NFS mounted /data from all nodes to one node as server with the DAT tape drive :-). Early on had one client node fail (memory) which didn't kill the cluster but as it was node 6 out of the 8 no new process would start on 7 or 8 until I had rebooted the LAM MPI software from the server node (perhaps my setup was screwy or I didn't boot it with fault tolerance at the time). However, I did let all the nodes' other processes complete to save the data before rebooting and that included 7 and 8's !!. Once the memory was replaced I haven't had the problem again. I believe it is possible to have just added the node 6 back again without using ' wipe -v lamhosts, lamboot -v lamhosts etc....' I was and still am a little new to the idiosyncratic behaviour and syntax of LAM MPI. About 3 months later the secondary HDD (NFS /data mount) on the server failed. Result was whole cluster came down.
I would have thought that just NFS timeouts would have occurred and no writes would have occurred until the server node was back but everything seemed to have frozen. (MPI problem ?) That's the hardware and software cluster part however, if your code is run so that it isn't node specific then I believe that the first scenario may not have caused major problems (unless important code was on that node for the others). The second would be more critical. Now if the nodes had been diskless then if the server fails you would obviously lose the lot. Single client node then possibly just that code until you added it once again. Trouble is I don't really know how LAM MPI would behave when trying to add or remove cluster nodes, or what would happen if the 'server' node was killed off, looks like some testing is in order. Has anyone done this ? Longest run so far is some 62 days but it was killed due to a site move and a suitable data checkpoint had been reached. Had Linux server on identical hardware run for at least double that before so can't see why you couldn't expect to have a cluster running for some time. Regards, Dominic Baines From srshanbh@cat.syr.edu Fri, 9 Oct 1998 04:25:21 -0400 Date: Fri, 9 Oct 1998 04:25:21 -0400 From: Sachin R. Shanbhag srshanbh@cat.syr.edu Subject: Distributed Shared Memory on Beowulf. Can anyone tell me if the kernel with beowulf tweaks implements support for Distributed Shared memory ? If so how can I utilize/test it ? Are there any applications that use this feature ? Any help in this regard will be appreciated. Sachin From caskey@technocage.com Fri, 9 Oct 1998 04:49:40 -0400 Date: Fri, 9 Oct 1998 04:49:40 -0400 From: Caskey L. Dickson caskey@technocage.com Subject: High availability? Linux == uptime On Fri, 9 Oct 1998, Dominic Baines wrote: > Bryan J. Welch wrote: > > > > In looking at Beowulf as a new build server, some questions came up > > recently about high availability. > > > > How long has some of you kept you your cluster?
Could we expect to keep > > the cluster running for months at a time without taking it down? > > Longest run so far is some 62 days but it was killed due to a site move > and a suitable data checkpoint had been reached. Had Linux server on > identical hardware run for at least double that before so can't see why > you couldn't expect to have a cluster running for some time. These are our *current* run times of several of our primary servers. As you may note, 70 days ago this group moved to a new facility. All of them are linux, all of them are under heavy usage as servers. In fact the two busiest machines happen the first two in the list and have the longest uptimes. Interestingly enough, the first one is the test machine where we run buggy versions of the code that will be deployed upon the second. Dozens of revisions have gone through and not once have we had to reboot them. 1:42am up 213 days, 18:16, 2 users, load average: 2.37, 1.96, 1.74 1:35am up 145 days, 23:52, 0 users, load average: 1.08, 1.02, 1.01 1:36am up 128 days, 3:24, 0 users, load average: 1.31, 1.28, 1.12 6:51pm up 70 days, 3:08, 0 users, load average: 0.00, 0.00, 0.00 1:37am up 70 days, 2:08, 0 users, load average: 0.08, 0.02, 0.01 1:37am up 70 days, 2:03, 4 users, load average: 0.00, 0.00, 0.00 6:55pm up 70 days, 2:16, 0 users, load average: 0.00, 0.00, 0.00 1:38am up 70 days, 1:58, 6 users, load average: 0.00, 0.00, 0.03 1:41am up 69 days, 11:47, 1 user, load average: 0.08, 0.05, 0.02 (It would seem, however, that two of them are drastically mistaken as to the current time.) The only real issue is that duplication of hardware reduces MTBF. Even given that, long run times are not unlikely. C=) -------------------------------------------------------------------------- There is hardly a thing in the world that some man can not make a little worse and sell a little cheaper. -------------------------------------------------------------------------- Caskey /// pager.818.698.2306 TechnoCage Inc. 
///| gpg: 1024D/7BBB1485 -------------------------------------------------------------------------- I didn't fight my way to the top of the food chain to be a vegetarian. From ymorin@enib.fr Fri, 9 Oct 1998 04:50:04 -0400 Date: Fri, 9 Oct 1998 04:50:04 -0400 From: Yann E. MORIN ymorin@enib.fr Subject: Setting up a beowulf... Hi! As I said in a former post, I'm trying to build a Linux beowulf at school. I got 4 PCs, an Extreme Linux CD, time for the thing, but what I miss is a good HOWTO, or documentation on how to setup the whole thing. I'm new to all this stuff (PVM & MPI). Although I know what it means, I don't really know what software I need to run... The first try I'm having is to run PVM POV-Ray, enclosed with E.L. CD. But I lack this knowledge of 'Message Passing' or 'Parallel Virtual Machine'... Any suggestion of book/www/else? Thanks, Yann. -- Yann E. MORIN --------------------------------------- Ecole Nationale d'Ingenieurs de Brest Laboratoire d'Informatique Industrielle Technopole Brest Iroise CP 15 - 29608 Brest Cedex - FRANCE --------------------------------------- tel : (+33) 298 056 600 (Ext 6318) fax : (+33) 298 056 629 email : bureau : ymorin@enib.fr perso : ymorin@france-mail.fr yann.morin@hol.fr --------------------------------------- From hjstein@bfr.co.il Fri, 9 Oct 1998 05:33:05 -0400 Date: Fri, 9 Oct 1998 05:33:05 -0400 From: Harvey J. Stein hjstein@bfr.co.il Subject: High availability? Dominic Baines writes: > About 3 months later the secondary HDD (NFS /data mount) on the > server failed. Result was whole cluster came down. I would have > thought that just NFS timeouts would have occurred and no writes > would have occurred until the server node was back but everything > seemed to have frozen. (MPI problem ?) We have a cluster of alphas with data distribution & a couple of head machines as NFS backups for the data. We're not using MPI.
We just had one of the head machines go offline with an ethernet driver bug & another machine's ethernet seized up when this happened. The logs showed RPC timeouts. Are your machines alphas? Are they headless? We've also had problems with logins hanging on headless machines because syslogd would hang. We think it's hanging when trying to write to the console because if we connect to the console we get some log messages & logins are ok again. Maybe there's an NFS bug? Has anyone else seen problems of this sort? -- Harvey J. Stein BFM Financial Research hjstein@bfr.co.il From FENG@duvm.ocs.Drexel.edu Fri, 9 Oct 1998 06:48:08 -0400 Date: Fri, 9 Oct 1998 06:48:08 -0400 From: FENG@duvm.ocs.Drexel.edu FENG@duvm.ocs.Drexel.edu Subject: HUBS jobs To: All colleagues in the four states region From: Tom Harris, Ph.D. Divisional Manager and Vice President, SAIC Da Hsuan Feng, Ph.D. General Manager-HUBS, SAIC (on-leave) M. Russell Wehr Professor, Drexel University Greg Swarts Program Manager, SAIC HUBS PROJECT DESCRIPTION Science Applications International Corporation (SAIC) is leading an exciting information technology initiative know as "HUBS," which stands for Hospitals, Universities, Businesses, and Schools. The HUBS vision is most ambitious: to establish the world's first "Smart Region" in the Four State Region consisting of Delaware, Maryland, New Jersey, and Pennsylvania. HUBS plans to establish a powerful information technology (IT) environment in this Four State Region that will foster business growth, create new job opportunities, and improve the quality of life through telemedicine, health informatics, and technology-based educational programs. In order to realize these goals, SAIC is developing a suite of advanced information technology applications including data fusion, data warehousing, data mining, distributed modeling and simulation, telemedicine, distance learning, and collaborative design and engineering. 
The HUBS software platforms will provide linkage across existing networks, and enable collaborations among common and complementary communities throughout the region. The HUBS development is sponsored by a number of federal agencies, and the initiative enjoys broad Congressional support. SAIC was awarded two HUBS contracts in 1998, one from the Department of Education and another from the Defense Advanced Research Projects Agency (DARPA). The DARPA contract is one element of the Next Generation Internet program. The HUBS Team is actively seeking a significant number of talented and motivated technologists to participate in this fascinating and technically challenging project. We believe that HUBS offers unusual career growth opportunities for those interested in the development and integration of advanced software/middleware solutions. Though the HUBS initiative, SAIC has an opportunity to impact many community segments using advanced networking capabilities and applications. We look forward to hearing from you if you feel you have the "right stuff" to assist SAIC in achieving the ambitious goals ahead of us. For contact: Please send email to Dr. Da Hsuan Feng at (feng@duvm.ocs.drexel.edu) or (fengd@saic.com) or call 610-687-4440 and ask for Greg Swarts. Thank you very much for your attention in this matter. From hjstein@bfr.co.il Fri, 9 Oct 1998 07:20:06 -0400 Date: Fri, 9 Oct 1998 07:20:06 -0400 From: Harvey J. Stein hjstein@bfr.co.il Subject: High availability? Linux == uptime "Caskey L. Dickson" writes: > These are our *current* run times of several of our primary > servers. 
> 1:42am up 213 days, 18:16, 2 users, load average: 2.37, 1.96, 1.74 > 1:35am up 145 days, 23:52, 0 users, load average: 1.08, 1.02, 1.01 > 1:36am up 128 days, 3:24, 0 users, load average: 1.31, 1.28, 1.12 > 6:51pm up 70 days, 3:08, 0 users, load average: 0.00, 0.00, 0.00 > 1:37am up 70 days, 2:08, 0 users, load average: 0.08, 0.02, 0.01 > 1:37am up 70 days, 2:03, 4 users, load average: 0.00, 0.00, 0.00 > 6:55pm up 70 days, 2:16, 0 users, load average: 0.00, 0.00, 0.00 > 1:38am up 70 days, 1:58, 6 users, load average: 0.00, 0.00, 0.03 > 1:41am up 69 days, 11:47, 1 user, load average: 0.08, 0.05, 0.02 On what hardware is this? If alpha, what kernel, libs, etc, are you using? -- Harvey J. Stein BFM Financial Research hjstein@bfr.co.il From lindahl@cs.virginia.edu Fri, 9 Oct 1998 10:09:51 -0400 Date: Fri, 9 Oct 1998 10:09:51 -0400 From: Greg Lindahl lindahl@cs.virginia.edu Subject: High availability? > In looking at Beowulf as a new build server, some questions came up > recently about high availability. > > What happens to the cluster (and any jobs running) when a single machine > in the cluster fails? It all depends on how you have set things up. Best case, you share as little as possible (i.e. no huge NFS cross-mounts), and only the jobs on that one machine are clobbered. Then you can reconfigure the cluster and move on. Worst case, you have one machine serving out all the files and it dies. Or, you have inflexible software so that you are screwed if you suddenly have N-1 nodes instead of N nodes. In general, the hardware is all independent, so you can turn off and physically remove one machine without affecting others. The "cluster in box" doesn't have this property. > Is it possible to take down and upgrade/replace a single machine in the > cluster without taking down the entire cluster? Sure, see above. > How long has some of you kept you your cluster? Could we expect to keep > the cluster running for months at a time without taking it down? 
My cluster runs for months without any failure. -- g From lindahl@cs.virginia.edu Fri, 9 Oct 1998 10:17:56 -0400 Date: Fri, 9 Oct 1998 10:17:56 -0400 From: Greg Lindahl lindahl@cs.virginia.edu Subject: High availability? > We have a cluster of alphas with data distribution & a couple of head > machines as NFS backups for the data. We're not using MPI. We just > had one of the head machines go offline with an ethernet driver bug & > another machine's ethernet seized up when this happened. The logs > showed RPC timeouts. You didn't say anything about what driver version, what hardware, etc. I saw an ethernet driver screw up once; it was tulip.c 0.89H, and I had attempted to use some defines to set the speed. When I don't do that, it works fine, except that on my particular hardware, it's well known that it won't deal with a fixed speed hub. > Are your machines alphas? Are they headless? We've also had problems > with logins hanging on headless machines because syslogd would hang. > We think it's hanging when trying to write to the console because if > we connect to the console we get some log messages & logins are ok > again. That's a broken implementation of a headless console. Stuff gets sent to the console on an Alpha all the time; every time a program segfaults, for example. It should be simple for you to test this theory by inciting the kernel to print stuff to the console, and seeing if that hangs the system. > Maybe there's an NFS bug? Has anyone else seen problems of this sort? On my pile of machines, the only NFS bugs I've seen are: 1) NFS timeouts jump the clock by about 10ms 2) Occasionally the NFS cache gets really screwed up, probably aided and abetted by (1). BTW, I think this might be a discussion better held on an alpha mailing list, and not a cluster computing mailing list. -- g From rgb@phy.duke.edu Fri, 9 Oct 1998 12:17:23 -0400 Date: Fri, 9 Oct 1998 12:17:23 -0400 From: Robert G. Brown rgb@phy.duke.edu Subject: High availability? 
On Fri, 9 Oct 1998, Dominic Baines wrote: > Bryan J. Welch wrote: > > > In looking at Beowulf as a new build server, some questions came up > > recently about high availability. > > > > What happens to the cluster (and any jobs running) when a single machine > > in the cluster fails? > > > > Is it possible to take down and upgrade/replace a single machine in the > > cluster without taking down the entire cluster? > > > > How long has some of you kept you your cluster? Could we expect to keep > > the cluster running for months at a time without taking it down? > > > > thanks, A lot of it depends on design. Some designs are more robust than others. For example, a beowulf built on a fast ethernet switch, with local binaries on each node and a server node with a RAID (or dual server nodes with one active and the other on fallback/standby) is pretty darn fault tolerant. If a node goes down, the other nodes should not crash; if the server goes down then whether or not the nodes recover depends on the reason -- if you shut it down for maintenance it will probably come back up and continue where it left off (possibly blocking NFS writes and node processes in the meantime) but if it crashes and you have to rewrite the disk of course all the inode mappings will change and your NFS clients will very likely hang with a "stale NFS mount" message. If you use a hypernet of any sort (where each node is connected to N other nodes via crossover cable) and still have local binaries and swap, you will probably be less stable unless you really work on building a robust and dynamic routing mechanism. I would guess that making "each" node a real router with protocols for establishing dynamic routes would make it moderately fault tolerant but at a high systems cost. If you instead built static routes with a watchdog process that detected a failed node and rebuilt routes around it and used a "reliable" transport protocol (TCP, or a reliable UDP in e.g.
NFS or PVM) you could achieve pretty high stability but obviously you have to really know what you are doing. A switch is cheap and easy, so this is for experts (or folks with appropriate problems who are forced to become experts:-) only. All designs will be less stable (and often a bit slower) without local binaries and swap. If stability is your prime directive, there are also lots of things you can do to increase the stability of any given topology or overall design, but they all involve costs in real dollar amounts and performance. The stability of a beowulf or cluster WRT the survivability of the underlying parallel application is similarly very problem-specific. If you are running mostly coarse-grained parallel applications, you may lose very little work if a node goes down. If you run very fine grained parallel applications (where every node carries critical data required by all other nodes) then you may lose all the work since "the epoch" if a node goes down. In that case you have a simple cost/benefit optimization to perform. You have to calculate or estimate probability of failure of a node, set a "value" for (or cost of) the time lost in the event of such a failure, and establish the cost (in time) of adding, e.g., a periodic save of the entire state of the machine to disk or some other mechanism that decreases the expected value of the overall cost function. I'll assume that you can work this out, but it is obviously a very important step of robust parallel design that you WILL TAKE the first time you try to run a single application for a time commensurate with the expected uptime of a host divided by the number of hosts (where the probability of failure is unacceptably high). So I would argue that for most applications, a beowulf is stable and reliable, and that the fault tolerance of ANY beowulf depends more on the design goals and optimax criteria of the designers and the amount of work they are willing to put in to achieve them.
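As a rough illustration of the cost/benefit estimate described above, here is a minimal Python sketch. It assumes failures are independent, so the cluster's mean time between failures is the per-host figure divided by the number of hosts, and that a failure costs on average half a checkpoint interval of lost work. The square-root rule for the interval is the usual first-order approximation generally credited to Young, swapped in here for illustration; the post itself does not name a formula. The function names and every number (60-day host MTBF, 16 hosts, 3-minute checkpoint) are made up for the example.

```python
import math


def cluster_mtbf(host_mtbf_hours, n_hosts):
    """Expected time between failures anywhere in the cluster.

    Assumption: hosts fail independently at the same constant rate.
    """
    return host_mtbf_hours / n_hosts


def overhead_fraction(interval_hours, checkpoint_cost_hours, mtbf_hours):
    """Approximate fraction of run time lost to checkpointing plus rework.

    First term: time spent writing checkpoints.
    Second term: on each failure, about half an interval of work is redone.
    """
    return (checkpoint_cost_hours / interval_hours
            + interval_hours / (2.0 * mtbf_hours))


def best_interval(checkpoint_cost_hours, mtbf_hours):
    """Interval that minimizes overhead_fraction (first-order approximation)."""
    return math.sqrt(2.0 * checkpoint_cost_hours * mtbf_hours)


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to make the arithmetic easy to follow.
    host_mtbf = 60 * 24.0      # one host fails about every 60 days
    hosts = 16
    save_cost = 3.0 / 60.0     # a full state save takes about 3 minutes

    mtbf = cluster_mtbf(host_mtbf, hosts)
    interval = best_interval(save_cost, mtbf)
    loss = overhead_fraction(interval, save_cost, mtbf)

    print("cluster MTBF (hours):       %.1f" % mtbf)
    print("checkpoint every (hours):   %.2f" % interval)
    print("expected overhead fraction: %.3f" % loss)
```

The point of the exercise is the shape of the tradeoff: saving state too often wastes time writing to disk, saving too rarely wastes time redoing lost work, and the balance shifts toward more frequent saves as the number of hosts grows.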
I think that it is possible to build a 'wulf that is awesomely stable, even for fine grained code, although there will be a very definite tradeoff in dollar cost and overall speed for the stability. I (and I suspect most persons) sort of automatically optimize all this to our personal comfort level borne of experience, but if you are talking a commercial application or even a big dollar project you can make a lot of this rigorous. Finally, I routinely run our cluster for months at a time without a single host crashing. linux (even SMP linux) is awesomely stable at this point. I honestly think that it is the most stable Unixoid OS I've ever used (in 12 odd years I've used the offerings of nearly all the big vendors at least once). Even with e.g. development/beta drivers for networking cards and SCSI adapters in place that wouldn't survive a reboot without my personally tickling the host (a problem that is nicely fixed now, thank you Doug Ledford:-) our Dell Poweredge 2300's would stay up for months. At this moment I have one host that has been up 43 days, most of that with a crashed automount daemon (most annoying but not fatal). I'm waiting on a guy who has been running a calculation that WHOLE TIME: rgb@ganesh|T:142>>b10 ps -ruaxww Could not chdir to home directory /home/einstein/rgb: I/O error /usr/X11/bin/xauth: error in locking authority file /home/einstein/rgb/.Xauthority USER PID %CPU %MEM SIZE RSS TTY STAT START TIME COMMAND poeschl 3143 99.9 78.9 406764 406224 ? R Aug 30 57321:01 qgp rgb 1225 0.0 0.0 856 328 ? R 12:04 0:00 ps -ruaxww 57,000 minutes of 100% CPU and 80% memory consumption is not bad 'tall, but I really wish his job would end so I could reboot with the newest aic7xxx drivers and at the same time fix the amd:-). I have LOTS of hosts with SMP kernels that have been up 40-50 days, and the only reason the number is so small is that 46 days ago or thereabouts I finally shut them down (most with ~100 days of uptime) to upgrade 2.0.33->2.0.35.
That is, scheduled maintenance is about the only way these systems go down, even under heavy load. Awesome. Now, YMMV so be careful. I have fairly carefully selected hardware (e.g. motherboards, disks, SVGA cards) known to be problem free, and have had to work very hard in a few cases to stabilize things with particular hardware combos even then. Even now there are a FEW folks who complain about regular intermittent server crashes, for example, probably caused by something peculiar to their particular system that tweaks what may well be one of the several remaining bugs in the kernel. In others, it is just bad hardware (not compatible with the driver selected due to OEM variation in the underlying device or the like). The rule here is to PROTOTYPE before going out and buying a whole slew of systems to build a beowulf. Some motherboards have great reputations -- ASUS and SuperMicro tend to be nearly trouble free, for example. Others sometimes work great but are finicky -- Tyan's in particular are reputed to be very particular about what they will accept as memory (and as CPU speeds keep cranking up several other mobos are also getting a bit finicky in this regard). Still others are not common enough to have a reputation one way or another -- they may be just fine, they may be trouble. The same is true of disks, SCSI controllers, networking cards, and SVGA cards, and sometimes it isn't the hardware but the combination -- two PCI devices are just buggy enough that they collide and create stability problems for linux (or probably WinXX, but who could tell in a system with a mean time between crashes measured in days anyway:-). In conclusion, I wouldn't hesitate to make a beowulf a "build server" (I assume that you mean to distribute a very large make for a huge application). The make is likely to take at most hours (probably FAR less in massive parallel). The MTBF is measured in months/host. It isn't even worth it to try to make it any more fault tolerant. rgb Robert G.
Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From rdab100@hermes.cam.ac.uk Fri, 9 Oct 1998 12:52:08 -0400 Date: Fri, 9 Oct 1998 12:52:08 -0400 From: Dominic Baines rdab100@hermes.cam.ac.uk Subject: High availability? Harvey J. Stein wrote: > Dominic Baines writes: > > > About 3 months later the secondary HDD (NFS /data mount) on the > > server failed. Result was whole cluster came down. I would have > > thought that just NFS timeouts would have occurred and no writes > > would have occurred until the server node was back but everything > > seemed to have frozen. (MPI problem ?) > > We have a cluster of alphas with data distribution & a couple of head > machines as NFS backups for the data. We're not using MPI. We just > had one of the head machines go offline with an ethernet driver bug & > another machine's ethernet seized up when this happened. The logs > showed RPC timeouts. > > Are your machines alphas? I wish .... No this particular cluster was 8 nodes each 64MB & Cyrix 6x86MX 233 CPU's > Are they headless? Onboard SVGA but no monitors or keyboards and BIOS setup to ignore errors from either. > We've also had problems > with logins hanging on headless machines because syslogd would hang. > We think it's hanging when trying to write to the console because if > we connect to the console we get some log messages & logins are ok > again. > > Maybe there's an NFS bug? Has anyone else seen problems of this sort? Quick test on another cluster, not being used .... rebooted the server and once again the cluster died. I am going to have to look at MPI booting to see if this is the problem. I don't know what would cause the problem you describe. > - > Harvey J. Stein > BFM Financial Research > hjstein@bfr.co.il Regards, Dominic Baines From bryan@bighorn.dr.lucent.com Fri, 9 Oct 1998 15:41:00 -0400 Date: Fri, 9 Oct 1998 15:41:00 -0400 From: Bryan J.
Welch bryan@bighorn.dr.lucent.com Subject: High availability? Thanks to everyone for the great info on high availability. The next consideration in build servers is how to achieve fast access to large amounts of disk space. I'm wondering if we'd lose that much with NFS access to some RAID servers, or if that's expensive and local disk is best. Maybe have one slave node with tons of disk? (We're talking a few hundred GB for large builds.) We talked to Altatech this morning, and I'm getting a bunch of info from them soon on pre-made Beowulf clusters. If people would like, I'll post some of the Q & A we had with them here. -bjw -- Bryan Welch - Bell Labs - bjwelch@bell-labs.com - N0SFG - PP-ASEL From sws@lacasa.com Fri, 9 Oct 1998 17:05:41 -0400 Date: Fri, 9 Oct 1998 17:05:41 -0400 From: Steven Schramm sws@lacasa.com Subject: High availability? Watch out for NFS performance. This has raised it's ugly head for us. Some configurations work well, others quite poorly (most notably with a Sun Slowaris 2.6 NFS server). Steven Schramm Bryan J. Welch wrote: > Thanks to everyone for the great info on high availability. > > The next consideration in build servers is how to achieve fast access to > large amounts of disk space. I'm wondering if we'd lose that much with > NFS access to some RAID servers, or if that's expensive and local disk is > best. Maybe have one slave node with tons of disk? (We're talking a few > hundred GB for large builds.) > > We talked to Altatech this morning, and I'm getting a bunch of info from > them soon on pre-made Beowulf clusters. If people would like, I'll post > some of the Q & A we had with them here. > > -bjw > > -- > Bryan Welch - Bell Labs - bjwelch@bell-labs.com - N0SFG - PP-ASEL -- Steven W. Schramm | sws@lacasa.com CASA, Inc. | Ph : (505) 662-6820 x137 Los Alamos, NM 87544 | Fax: (505) 662-0095 From hjstein@bfr.co.il Fri, 9 Oct 1998 18:08:18 -0400 Date: Fri, 9 Oct 1998 18:08:18 -0400 From: Harvey J. 
Stein hjstein@bfr.co.il
Subject: High availability?

"Bryan J. Welch" writes:

> Thanks to everyone for the great info on high availability.
>
> The next consideration in build servers is how to achieve fast access to
> large amounts of disk space. I'm wondering if we'd lose that much with
> NFS access to some RAID servers, or if that's expensive and local disk is
> best. Maybe have one slave node with tons of disk? (We're talking a few
> hundred GB for large builds.)

You should be able to get ~10 MB/sec from a SCSI disk. You should be able to get ~10 MB/sec over a 100 Mb/s Ethernet port.

Unless you use higher speed networking, you'll either get 10 MB/sec on each machine locally = total throughput of 10 MB/sec * # of machines, or get a total of 10 MB/sec from the NFS server.

You could have an NFS server with multiple interfaces on a switched 100 Mb/s Ethernet hub, in which case it'd be 10 MB/sec * # of NFS server ports. But then you need a very fast NFS server to support the full data bandwidth.

On the other hand, if you use local disks everywhere you might run into data distribution & synchronization problems.

--
Harvey J. Stein
BFM Financial Research
hjstein@bfr.co.il

From caskey@technocage.com Fri, 9 Oct 1998 18:34:41 -0400
Date: Fri, 9 Oct 1998 18:34:41 -0400
From: Caskey L. Dickson caskey@technocage.com
Subject: High availability?

On Fri, 9 Oct 1998, Bryan J. Welch wrote:

> Thanks to everyone for the great info on high availability.
>
> The next consideration in build servers is how to achieve fast access to
> large amounts of disk space. I'm wondering if we'd lose that much with
> NFS access to some RAID servers, or if that's expensive and local disk is
> best. Maybe have one slave node with tons of disk? (We're talking a few
> hundred GB for large builds.)

Having local storage would require that you write the code to manage distribution of the information to each node for processing the build.

As for RAID storage, Promise Technology Inc.
has come out with a product I'm eager to get. It's an external RAID enclosure that presents a single Fast/Wide SCSI drive to the host (i.e. all raid operations are encapsulated) but unlike traditional units of this type, the internal drive chain is ATA/EIDE drives. 10 or 5 depending upon the enclosure. They say you can get 48GB of usable raid 5 storage for ~$10K. The ultimate in commodity storage... if it works. If you use something like CODA to distribute the data to the worker nodes you should have better performance than with vanilla NFS. I don't know what the status of CODA on Alpha is, however. You mentioned Altatech in your message. As their systems have small drives dedicated to each board, you could probably get some pretty mean performance with CODA + a separate dedicated server with the Promise array attached. C=) -------------------------------------------------------------------------- There is hardly a thing in the world that some man can not make a little worse and sell a little cheaper. -------------------------------------------------------------------------- Caskey /// pager.818.698.2306 TechnoCage Inc. ///| gpg: 1024D/7BBB1485 -------------------------------------------------------------------------- I didn't fight my way to the top of the food chain to be a vegetarian. From cbohn@afit.af.mil Sat, 10 Oct 1998 16:18:07 -0400 Date: Sat, 10 Oct 1998 16:18:07 -0400 From: Capt Bohn, Christopher A. cbohn@afit.af.mil Subject: g77 problems with Extreme Linux Good day, Here's the problem du jour... I've now resolved the double-underscore problem (with -fno-second-underscore), and the latest problem appears to be library-related, except that I don't run into the problem with C code. For thoroughness' sake, I tried linking with -lmpi, with -lfmpi, with -lmpi -lfmpi, and with -lfmpi -lmpi . 
Here's what I get:

[cbohn@abc04 cbohn]$ make pi3
g77 -fno-second-underscore -I/usr/mpich/include -c pi3.f
g77 -o pi3 pi3.o -L/usr/mpich/lib/LINUX/ch_p4 -lmpi
/usr/mpich/lib/LINUX/ch_p4/libmpi.a(initf.o): In function `mpi_init_':
initf.o(.text+0x11): undefined reference to `mpir_iargc_'
initf.o(.text+0x138): undefined reference to `mpir_getarg_'
collect2: ld returned 1 exit status
make: *** [pi3] Error 1

[cbohn@abc04 ch_p4]$ nm libmpi.a | grep mpir
         U mpir_getarg_
         U mpir_iargc_
mpirutil.o:

Any suggestions?

Thanks.
cb

*-*-*-*-*-*-*-*
Capt Christopher A. Bohn
Graduate Student, Electrical (digital) Engineering
Air Force Institute of Technology
Phone (937)255-3636 (DSN 785)
AFIT/EN638 Lab x4606 Voicemail x6638
2950 P St, Box 4638
email cbohn@afit.af.mil
Wright-Patterson AFB OH 45433-7765
EngrBohn@aol.com
http://members.aol.com/EngrBohn/
*-*-*-*-*-*-*-*

From hjstein@bfr.co.il Sun, 11 Oct 1998 07:58:56 -0400
Date: Sun, 11 Oct 1998 07:58:56 -0400
From: Harvey J. Stein hjstein@bfr.co.il
Subject: PVM robustness problems.

Greg Lindahl writes:

> Harvey Stein writes:
> > Problem 1 - Long timeouts/slow work.
> >
> > Sometimes a slave will die & it seems like it takes a long time for
> > the set of tasks to shut down - as long as an hour. Also, sometimes
> > a given request will take much longer than expected.
> >
> > Has anyone else seen such problems? Does anyone have any ideas/know
> > of any mechanisms for tracking down such problems?
>
> PVM seems to figure out deaths based on TCP socket closures. As
> everyone should know, TCP is really bad at figuring out the other
> end is gone. PVM should be sending keepalives, but it doesn't.
> Does your application only infrequently send messages? If so, sure,
> you can go a long time before a message failing to be received
> causes a link to close. You should get better behavior if the
> master sends keepalives (a message that the slave can simply
> discard) to the slaves every N minutes.
> The slaves should notice that the master is dead when they send their results back.

Actually, that's not exactly the case. PVM daemons communicate with each other via UDP & the PVM daemon on a machine communicates with tasks on the same machine via TCP or via unix domain sockets.

Each PVM daemon estimates the round trip time to the other daemons. It initially resends a packet after 3x the estimated round trip time has elapsed without an acknowledgement being received. It doubles the retry wait for each additional retry, up to 18 seconds. The round trip time estimate itself is limited to 9 seconds. It'll retry at least 9 times before giving up & if it doesn't receive an ack after 3 minutes it considers the other daemon to be unreachable & calls hostfailentry().

(The above is from section 7.5.2 of "PVM Parallel Virtual Machine" by Al Geist et al)

On the other hand, I've yet to find a discussion of exactly how PVM decides if a task itself has gone down. It may be via socket closures, as you hypothesized.

BTW, it was during frequent communications that the problem was occurring, so I would have expected PVM to notice the problem faster than it did.

In any case, we seem to have traced the above problem down to the default Red Hat crontab config. Cron was cleaning out /tmp but PVM keeps some files in /tmp & will get flaky if they're deleted.

**** WARNING TO ALL USING PVM: ****

Check your cron jobs. If you're running tmpwatch on /tmp, you'd better touch -c /tmp/pvm* before it runs so that it doesn't delete the files that PVM needs.

--
Harvey J. Stein
BFM Financial Research
hjstein@bfr.co.il

From hjstein@bfr.co.il Sun, 11 Oct 1998 08:51:06 -0400
Date: Sun, 11 Oct 1998 08:51:06 -0400
From: Harvey J. Stein hjstein@bfr.co.il
Subject: Swap Size

Greg Lindahl writes:

> > I agree with this. I like the program to be terminated (if there is not
> > enough memory in VM) instead of being alive but very very very very slow
> > (or almost equivalent to unworkable state :< ).
The program should return > > and say "Not enough memory!!" instead of keep swapping. Thank you. > > Unfortunately, it is difficult to arrange for the correct program to > be killed. Often you will instead lose some other process like inetd > or portmapper or amd. So it is better to try to set limits on per- > process memory usage and let that kill the overly-large task. This > won't reflect the usage of other tasks, unfortunately, but at least it > won't cause your system to become unusuable. Actually, Rik van Riel has a kernel patch for selecting an appropriate task to kill when out of memory. He uses a heuristic based on whether or not suid, whether or not root run, elapsed run time vs elapsed cpu time vs memory usage, etc. It's reported to work quite well. You can get it at: http://www.phys.uu.nl/~riel/patches/OOM-kill-2.1.124.patch He also has a scheduler patch which is reported to improve process scheduling under high load: http://www.phys.uu.nl/~riel/patches/schedule-2.1.123.patch I'm not running into either problem so I haven't tried either one. YMMV. -- Harvey J. Stein BFM Financial Research hjstein@bfr.co.il From rauch@inf.ethz.ch Mon, 12 Oct 1998 04:35:56 -0400 Date: Mon, 12 Oct 1998 04:35:56 -0400 From: Felix Rauch rauch@inf.ethz.ch Subject: High availability? On 10 Oct 1998, Harvey J. Stein wrote: > You should be able to get ~10mb/sec from a SCSI disk. True. With Seagate ST34501W (Cheetah), we get about 12.5 MB/s for large reads. > You should be able to get ~10mb/sec over a 100mb/s ethernet port. In theory (or with TCP) yes. With NFS, I never got more than about 7.5 -- 8.0 MB/s max. I'd be very interested if someone has better performance with Linux and NFS. 
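
For reference, the ceilings behind these figures can be worked out in a few lines of Python. This is only a sketch under assumptions that are not stated in the thread: a 1500-byte MTU, IPv4 and TCP headers without options, standard Ethernet framing overhead, and 1 MB taken as 10^6 bytes. The function names are made up for illustration.

```python
# Rough throughput ceilings for bulk transfer over 100 Mb/s Ethernet.
# Assumptions (not from the thread): 1500-byte MTU, IPv4 + TCP headers
# with no options, standard Ethernet framing, 1 MB = 10**6 bytes.

LINK_BITS_PER_SEC = 100_000_000   # Fast Ethernet line rate
MTU = 1500                        # bytes of IP packet carried per frame
ETH_OVERHEAD = 8 + 14 + 4 + 12    # preamble+SFD, header, FCS, inter-frame gap
IP_TCP_HEADERS = 20 + 20          # IPv4 header + TCP header, no options


def wire_rate_mb_per_sec():
    """Raw line rate in MB/s."""
    return LINK_BITS_PER_SEC / 8 / 1e6


def tcp_payload_mb_per_sec():
    """Best-case TCP payload rate once framing and headers are paid for."""
    payload = MTU - IP_TCP_HEADERS      # user data bytes per frame
    on_wire = MTU + ETH_OVERHEAD        # bytes of wire time per frame
    return wire_rate_mb_per_sec() * payload / on_wire


def fraction_of_wire(measured_mb_per_sec):
    """Measured throughput as a fraction of the raw line rate."""
    return measured_mb_per_sec / wire_rate_mb_per_sec()


if __name__ == "__main__":
    print(f"wire rate:   {wire_rate_mb_per_sec():.2f} MB/s")
    print(f"TCP ceiling: {tcp_payload_mb_per_sec():.2f} MB/s")
    for measured in (7.5, 8.0):
        share = fraction_of_wire(measured)
        print(f"{measured} MB/s over NFS is {share:.0%} of the wire rate")
```

Under those assumptions the raw line rate is 12.5 MB/s and the best-case TCP payload rate is a little under 12 MB/s, so 7.5 to 8.0 MB/s over NFS is roughly 60-64% of the wire rate. The remainder would go to RPC/NFS overhead and request latency, which this sketch does not model.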
- Felix -- Felix Rauch | Email: rauch@inf.ethz.ch Institute for Computer Systems | Homepage: http://www.cs.inf.ethz.ch/~rauch/ ETH Zentrum / RZ H15 | Phone: ++41 1 632 7489 CH - 8092 Zuerich / Switzerland | Fax: ++41 1 632 1307 From hjstein@bfr.co.il Mon, 12 Oct 1998 08:28:44 -0400 Date: Mon, 12 Oct 1998 08:28:44 -0400 From: Harvey J. Stein hjstein@bfr.co.il Subject: High availability? Felix Rauch writes: > On 10 Oct 1998, Harvey J. Stein wrote: > > You should be able to get ~10mb/sec from a SCSI disk. > > True. With Seagate ST34501W (Cheetah), we get about 12.5 MB/s for > large reads. I've seen 13.5-13.8 for direct device reading (i.e. - hdparm -Tt, not through the file system). > > You should be able to get ~10mb/sec over a 100mb/s ethernet port. > > In theory (or with TCP) yes. With NFS, I never got more than > about 7.5 -- 8.0 MB/s max. I'd be very interested if someone > has better performance with Linux and NFS. I meant under the best possible conditions != NFS, for example ftp. Of course, that's actually an unfair comparison because people typically don't count all the connection building & teardown time when doing ftp timings. However, when you got 7.5 - 8.0, did you do any NFS tuning in an attempt to speed it up. For example, NFS caching & 8k packets? -- Harvey J. Stein BFM Financial Research hjstein@bfr.co.il From hahn@coffee.psychology.mcmaster.ca Mon, 12 Oct 1998 11:02:51 -0400 Date: Mon, 12 Oct 1998 11:02:51 -0400 From: Mark Hahn hahn@coffee.psychology.mcmaster.ca Subject: High availability? > > > You should be able to get ~10mb/sec from a SCSI disk. > > > > True. With Seagate ST34501W (Cheetah), we get about 12.5 MB/s for > > large reads. > > I've seen 13.5-13.8 for direct device reading (i.e. - hdparm -Tt, not > through the file system). um, 14 MB/s on a Cheetah is pretty poor. I've seen >15 MB/s, and that was on an older model, and through the filesystem (aic7895, 4k ext2, PII/333). IBM Ultrastar/ZX's should do as well. 
further, good (cheap/big/modern/cool) ide disks such as the Maxtor DiamondMax's, and IBM Deskstar's, will sustain well over 13 MB/s. and remember that DMA/UDMA is NOT less efficient than SCSI. regards, mark hahn. From josip@icase.edu Mon, 12 Oct 1998 13:09:37 -0400 Date: Mon, 12 Oct 1998 13:09:37 -0400 From: Josip Loncaric josip@icase.edu Subject: High availability? Bryan J. Welch wrote: > > The next consideration in build servers is how to achieve fast access to > large amounts of disk space. I'm wondering if we'd lose that much with > NFS access to some RAID servers, or if that's expensive and local disk is > best. Maybe have one slave node with tons of disk? (We're talking a few > hundred GB for large builds.) The price/performance bottleneck inherent in the (processor farm)===(disk farm) approach can be a serious limitation, and in fact the original Beowulf was in part motivated by the desire to bypass this problem. There is a paper by Sterling, Becker, Warren, Cwik, Salmon and Nitzberg: "Assessment of Beowulf-class Computing for NASA Requirements: Initial Findings from the First NASA Workshop on Beowulf-class Clustered Computing" where they point out the following: "The Beowulf project was initiated in late 1993 as a part of the NASA HPCC Earth and space sciences at the Goddard Space Flight Center. [...] An evaluation of the requirements for a scientific station for NASA showed that disk access capacity and bandwidth was far more important to user response time than was floating point performance. [...] Analysis showed that for the price of a high-end scientific workstation, assumed to be $50,000, a cluster of low cost PCs could be assembled with an order of magnitude larger disk capacity and approximately 8 times the disk bandwidth. [...] The Beowulf project was born." 
Earth and space sciences need truly massive amounts of data, and given a price constraint, multiple direct processor-disk connections can offer better aggregate performance than the (processor farm)===(disk farm) approach. Of course, this will only work for you if those large builds can be done in parallel.

Sincerely,
Josip

--
Dr. Josip Loncaric, Senior Staff Scientist
ICASE, M/S 403, NASA Langley Research Center, Hampton, VA 23681-2199
Phone: (757) 864-2192 mailto:josip@icase.edu
Fax: (757) 864-6134 http://www.icase.edu/~josip/

From richieb@netlabs.net Mon, 12 Oct 1998 21:44:28 -0400
Date: Mon, 12 Oct 1998 21:44:28 -0400
From: Richie Bielak richieb@netlabs.net
Subject: Problems setting up a small cluster

Hi,

I'm trying to set up a tiny cluster of two older P120 machines. I installed Extreme Linux on both and they both have the 3c509 network cards. I connect them via 10baseT LinkSys hub.

For some reason the machines do not see each other and I can't figure out why. Here is output of "ifconfig":

[root@ki /root]# ifconfig
lo        Link encap:Local Loopback
          inet addr:127.0.0.1 Bcast:127.255.255.255 Mask:255.0.0.0
          UP BROADCAST LOOPBACK RUNNING MTU:3584 Metric:1
          RX packets:63 errors:0 dropped:0 overruns:0
          TX packets:63 errors:0 dropped:0 overruns:0

eth0      Link encap:Ethernet HWaddr 00:20:AF:13:B2:F9
          inet addr:192.9.42.1 Bcast:192.9.42.255 Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0
          TX packets:0 errors:0 dropped:0 overruns:0
          Interrupt:10 Base address:0x300

At startup the cards are recognized:

eth0: 3c509 at 0x300 tag 1, 10baseT port, address 00:20:af:13:b2:f9, IRQ 10.
3c509.c:1.16C 2/5/98 becker@cesdis.gsfc.nasa.gov Swansea University Computer Society IPX 0.34 for NET3.035 The only odd thing is that when I cat the network devices I see this: [root@ki /root]# cat /proc/net/dev Inter-| Receive | Transmit face |packets errs drop fifo frame|packets errs drop fifo colls carrier lo: 95 0 0 0 0 95 0 0 0 0 0 eth0: 0 0 0 0 0 0 0 0 0 0 126 For some reason the number under carrier for eth0 not zero. It goes up after each time I try to ping. Finally ARP tables have no entries. When I enter these manually it doesn't seem to help. Also, there are no IRQ conflicts: [root@ki /root]# cat /proc/interrupts 0: 202649 timer 1: 918 keyboard 2: 0 cascade 4: 4 + serial 8: 1 + rtc 10: 0 3c509 13: 1 + IPI 14: 15873 + ide0 15: 0 + ide1 Could this mean bad hardware? How can I test it? Thanks in advance for any suggestions. ...richie -- "It is a good day to code." http://www.netlabs.net/~richieb From rauch@inf.ethz.ch Tue, 13 Oct 1998 04:05:10 -0400 Date: Tue, 13 Oct 1998 04:05:10 -0400 From: Felix Rauch rauch@inf.ethz.ch Subject: High availability? On 12 Oct 1998, Harvey J. Stein wrote: > However, when you got 7.5 - 8.0, did you do any NFS tuning in an > attempt to speed it up. For example, NFS caching & 8k packets? I used the options "-o rsize=4096,wsize=4096" as well as "-o rsize=8192,wsize=8192" (there results are usually nearly the same). I didn't use any other options and read a file of 32 MB multiple times with the file in the cache of the server. Should I use other options to speed it up? Are there any other experiences? Regards, Felix -- Felix Rauch | Email: rauch@inf.ethz.ch Institute for Computer Systems | Homepage: http://www.cs.inf.ethz.ch/~rauch/ ETH Zentrum / RZ H15 | Phone: ++41 1 632 7489 CH - 8092 Zuerich / Switzerland | Fax: ++41 1 632 1307 From hjstein@bfr.co.il Tue, 13 Oct 1998 05:30:14 -0400 Date: Tue, 13 Oct 1998 05:30:14 -0400 From: Harvey J. Stein hjstein@bfr.co.il Subject: High availability? 
Felix Rauch writes: > On 12 Oct 1998, Harvey J. Stein wrote: > > However, when you got 7.5 - 8.0, did you do any NFS tuning in an > > attempt to speed it up. For example, NFS caching & 8k packets? > > I used the options "-o rsize=4096,wsize=4096" as well > as "-o rsize=8192,wsize=8192" (there results are usually nearly the > same). I didn't use any other options and read a file of 32 MB > multiple times with the file in the cache of the server. > > Should I use other options to speed it up? Are there any other > experiences? 7.5 - 8.0 might be as good as it gets... I've never actually mucked with it, but I've heard of people doing so. It took a lot of digging around but I finally found some info on it in the NFS Howto in Section 4.2, Optimizing NFS (http://sunsite.unc.edu/LDP/HOWTO/NFS-HOWTO-4.html): Newer Linux kernels (since 1.3 sometime) perform read-ahead for rsizes larger or equal to the machine page size. On Intel CPUs the page size is 4096 bytes. Read ahead will significantly increase the NFS read performance. So on a Intel machine you will want 4096 byte rsize if at all possible. Remember to edit /etc/fstab to reflect the rsize/wsize you found. A trick to increase NFS write performance is to disable synchronous writes on the server. The NFS specification states that NFS write requests shall not be considered finished before the data written is on a non-volatile medium (normally the disk). This restricts the write performance somewhat, asynchronous writes will speed NFS writes up. The Linux nfsd has never done synchronous writes since the Linux file system implementation does not lend itself to this, but on non-Linux servers you can increase the performance this way with this in your exports file: /dir -async,access=linuxbox or something similar. Please refer to the exports man page on the machine in question. Please note that this increases the risk of data loss. It might be the case that linux now supports the above. 
This doc hasn't been updated since last November. Checking dejanews (searching for linux nfs performance) yielded a comment to use the kernel daemon in 2.1.125 + posted patches to improve NFS performance. Also it seems to be important to use the knfsd-980922 kernel patch (against 2.1.111 or newer). ftp://ftp.yggdrasil.com/private/hjl/knfsd-980922.tar.gz ftp://ftp.yggdrasil.com/private/hjl/knfsd-980920-980922.diff.gz ftp://ftp.kernel.org/pub/linux/devel/gcc/knfsd-980922.tar.gz ftp://ftp.kernel.org/pub/linux/devel/gcc/knfsd-980920-980922.diff.gz The author of the above patch (H.J. Lu) also mentioned that the user space NFS server "basically does async write". It seems that the kernel nfs server doesn't support that without the knfsd-980922 patch. One person reported the following sorts of performance (with a linux NFS server vs an SGI): Read a 30MB file from server to AIX 4.3 workstation: >From linux-2.1.122: about 45 sec >From IRIX 5.3: about 45-50 sec Both with a stopwatch, good enough if you look at the next results: Write a 30MB file from AIX to server, includes 30sec of data preparation (no network traffic) by Catia: To linux-2.1.122: about 585 sec To IRIX 5.3: about 90 sec 2. Test: copy 30MB from IRIX client to linux server and back: cp cptest.txt /achilles/catv4 : Write to linux = 385 sec cp /achilles/catv4/cptest.txt . : Read from linux = 42 sec Network is 10MBit Ethernet TP, low traffic. He also said his write performance went up to match the SGIs when he used the async option. Again, this is just by research, not by trying it all out, so we'll all be happy to hear about what you manage to achieve in the end. -- Harvey J. Stein BFM Financial Research hjstein@bfr.co.il From sws@lacasa.com Tue, 13 Oct 1998 10:36:09 -0400 Date: Tue, 13 Oct 1998 10:36:09 -0400 From: Steven Schramm sws@lacasa.com Subject: High availability? Caskey L. Dickson wrote: > As for RAID storage, Promise Technology Inc. has > come out with a product I'm eager to get. 
It's an external RAID enclosure > that presents a single Fast/Wide SCSI drive to the host (i.e. all raid > operations are encapsulated) but unlike traditional units of this type, > the internal drive chain is ATA/EIDE drives. 10 or 5 depending upon the > enclosure. They say you can get 48GB of usable raid 5 storage for ~$10K. > > The ultimate in commodity storage... if it works. > I have had many dealings with ISS, Corp., a high-end mass-storage VAR, which customizes RAID and tape libraries. We just ordered another RAID box from them which will give us just over 100G of useable RAID 5, with: - redundant, hot-swap power supplies - hot-swap disk trays - 128M cache - 9-bay, all steel Kingston enclosure - 18G UW SCSI disk drives all for ~$18k. And their support is excellent, should you ever need it. FWIW, Steve -- Steven W. Schramm | sws@lacasa.com CASA, Inc. | Ph : (505) 662-6820 x137 Los Alamos, NM 87544 | Fax: (505) 662-0095 From sws@lacasa.com Tue, 13 Oct 1998 10:41:40 -0400 Date: Tue, 13 Oct 1998 10:41:40 -0400 From: Steven Schramm sws@lacasa.com Subject: I.S.S. Corporate Home Page Sorry, here's the link. http://www.issc.net/ Steve From prachya@science.gmu.edu Tue, 13 Oct 1998 11:26:43 -0400 Date: Tue, 13 Oct 1998 11:26:43 -0400 From: Prachya Chalermwat prachya@science.gmu.edu Subject: Beowulf Seeds Hi, I know that there are hundred ways to create Beowulf clusters. Are there any generalized "Beowulf Seeds" that one can download and simply copy to a boot drive of the beowulf nodes to create a Beowulf cluster? I have my own "seed" of about 30MB (slakware base) and would like to know how others manage these kind of "Beowulf Seeds" or would like to post them somewhere for other newbies to share. I'm not sure this is redundant to the extreme linux CD but it should be quite straight forward to do (and explain). 
By "Beowulf Seeds" I mean the kernel image + linux tree that we can tar and compress to about 20-40 MB for Beowulf newbies to download, and it should be PVM-ready, MPI-ready, bview-ready, etc.

Installing RedHat 5.1 (base) requires about 100MB while Slackware 3.5 requires only about 30-40 MB. By tailoring these base Linux systems for PVM, MPI, BSP,..., and installing some administrative and monitoring tools, this Linux system becomes the "Beowulf Seed" that I am talking about.

The standard network configuration as recommended by lots of experts can also be configured and standardized. For example, a common way:

NODE   IP                NOTE
------------------------------------------------------
N1     192.168.1.1       Master node
       xxx.xxx.xxx.xxx   Address for outside network
N2     192.168.1.2       Node 2
N3     192.168.1.3       Node 3
.....

The /etc/fstab can also be generalized to mount /home and/or /usr from the master node.

Thanks,
--Prachya

---------------------------------------------------------------------------
Prachya Chalermwat George Mason University
Graduate Research Assistant (703) 993-4322
Computational Sciences and Informatics (FAX) 993-1980
George Mason University Email: prachya@science.gmu.edu
---------------------------------------------------------------------------
URL: http://spaceops.science.gmu.edu
"Imagination is more important than knowledge." A. Einstein

From bug@ruff.cs.jmu.edu Tue, 13 Oct 1998 11:33:27 -0400
Date: Tue, 13 Oct 1998 11:33:27 -0400
From: David Wilburn bug@ruff.cs.jmu.edu
Subject: Problems setting up a small cluster

Perhaps this is a silly question, but then again most of my difficult problems turn out to have similarly simple solutions anyway: have you run route to let it see the computers it shares a subnet with, or maybe run routed? Ifconfig just gives you an unloaded gun, route/routed/gated/etc loads the sucker.
My guess is that a route statement that would work for you would be the following (dunno if you have a router to get outside the network or not, that would be a separate route statement): route add -net 192.9.42.0 netmask 255.255.255.0 eth0 If you need more information, the NET-3-HOWTO is probably the best written howto I've ever seen from the Linux Documentation Project: http://sunsite.unc.edu/LDP/ldp.html -Dave Wilburn * David Wilburn, a.k.a. "Bug" * JMU Computer Science Student * Boycott naugahyde! Save the naugas! On Mon, 12 Oct 1998, Richie Bielak wrote: > Hi, > > I'm trying to set up a tiny cluster or two older P120 machines. > I installed Extreme Linux on both and they both have the > 3c509 network cards. I connect them via 10baseT LinkSys > hub. > > For some reason the machines do not see each other and I can't > figure out why. Here is output of "ifconfig": > > > [root@ki /root]# ifconfig > lo Link encap:Local Loopback > inet addr:127.0.0.1 Bcast:127.255.255.255 Mask:255.0.0.0 > UP BROADCAST LOOPBACK RUNNING MTU:3584 Metric:1 > RX packets:63 errors:0 dropped:0 overruns:0 > TX packets:63 errors:0 dropped:0 overruns:0 > > eth0 Link encap:Ethernet HWaddr 00:20:AF:13:B2:F9 > inet addr:192.9.42.1 Bcast:192.9.42.255 Mask:255.255.255.0 > UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 > RX packets:0 errors:0 dropped:0 overruns:0 > TX packets:0 errors:0 dropped:0 overruns:0 > Interrupt:10 Base address:0x300 > > > > At startup the cards are recognized: > > eth0: 3c509 at 0x300 tag 1, 10baseT port, address 00:20:af:13:b2:f9, IRQ 10. 
> 3c509.c:1.16C 2/5/98 becker@cesdis.gsfc.nasa.gov
> Swansea University Computer Society IPX 0.34 for NET3.035
>
>
> The only odd thing is that when I cat the network devices I see this:
>
> [root@ki /root]# cat /proc/net/dev
> Inter-|   Receive                  |  Transmit
>  face |packets errs drop fifo frame|packets errs drop fifo colls carrier
>     lo:     95    0    0    0    0       95    0    0    0     0       0
>   eth0:      0    0    0    0    0        0    0    0    0     0     126
>
> For some reason the number under carrier for eth0 not zero. It goes
> up after each time I try to ping.
>
> Finally ARP tables have no entries. When I enter these manually it doesn't
> seem to help.
>
> Also, there are no IRQ conflicts:
>
> [root@ki /root]# cat /proc/interrupts
>  0:     202649   timer
>  1:        918   keyboard
>  2:          0   cascade
>  4:          4 + serial
>  8:          1 + rtc
> 10:          0   3c509
> 13:          1 + IPI
> 14:      15873 + ide0
> 15:          0 + ide1
>
> Could this mean bad hardware? How can I test it?
>
> Thanks in advance for any suggestions.
>
> ...richie
>
>
> --
>
> "It is a good day to code." http://www.netlabs.net/~richieb
>

From ymorin@enib.fr Tue, 13 Oct 1998 17:25:39 -0400
Date: Tue, 13 Oct 1998 17:25:39 -0400
From: Yann E. MORIN ymorin@enib.fr
Subject: Beowulf : origins...

Hi!

For many days, I've been talking of 'beowulf'. I know it means a cluster of machines, but, forgive my ignorance, where does the word come from? I've heard of a legend, what is it about?

Regards,
Yann.

--
Yann E.
MORIN
---------------------------------------
Ecole Nationale d'Ingenieurs de Brest
Laboratoire d'Informatique Industrielle
Technopole Brest Iroise
CP 15 - 29608 Brest Cedex - FRANCE
---------------------------------------
tel   : (+33) 298 056 600 (Ext 6318)
fax   : (+33) 298 056 629
email : bureau : ymorin@enib.fr
        perso  : ymorin@france-mail.fr
                 yann.morin@hol.fr
---------------------------------------
From prachya@science.gmu.edu Tue, 13 Oct 1998 18:44:02 -0400
Date: Tue, 13 Oct 1998 18:44:02 -0400
From: Prachya Chalermwat prachya@science.gmu.edu
Subject: Beowulf Seeds

On Tue, 13 Oct 1998, Gregory R. Warnes wrote:

>
> I think this is a "Good Idea" (tm). I don't think it is redundant,
> because the extreme linux CD doesn't seem to help at all with the setup,
> which is the hard part! It can take a while to figure out how to
> configure PVM, MPI, etc.
>
> -Greg
>

Thank you. My goal is to simplify and help spread the use of Beowulf. I want to set up a "Beowulf Seeds" page that will collect the contributed "seeds" from different organizations' Beowulf projects. (Try http://spaceops.science.gmu.edu/beowulf-seeds within a few days)

Please let me know if anyone wants to join the club.

Each seed will have a description of requirements, hardware used, applications, link to creator, etc. This way, not only can newbies try the seeds of their interest, but the experts can also use them, and perhaps one can evaluate different seeds using some kind of existing benchmark.

I also want to know what is the smallest possible size of

o The kernel image
o The whole Linux tree

for the "Beowulf Seed".

Assume that Linux has been installed on the master node and that the "seed" is located at /home/ftp/pub/bseed1. In general, the master has a different (fuller) Linux installation from the slaves.
With or without remote boot (bootp), what I need is just a modified boot disk to run a shell script for:

1) Partitioning the disk
2) Copying the seed from the master and plant it on the drive
3) Auto or manually assigning IP and host name
4) Running lilo
5) Rebooting
6) Happy Beowulf :)

--Prachya

---------------------------------------------------------------------------
Prachya Chalermwat                        George Mason University
Graduate Research Assistant               (703) 993-4322
Computational Sciences and Informatics    (FAX) 993-1980
George Mason University                   Email: prachya@science.gmu.edu
---------------------------------------------------------------------------
URL: http://spaceops.science.gmu.edu
"Imagination is more important than knowledge." A. Einstein

From cbohn@afit.af.mil Tue, 13 Oct 1998 19:19:15 -0400
Date: Tue, 13 Oct 1998 19:19:15 -0400
From: Capt Bohn, Christopher A. cbohn@afit.af.mil
Subject: Beowulf : origins...

Good day,

In a nutshell, Beowulf is the oldest written English epic still in existence. An electronic copy is available at ftp://sunsite.unc.edu/pub/docs/books/gutenberg/etext97/bwulf10.txt .

Quoting from the back cover of my copy of Beowulf,

"Unique and beautiful, Beowulf brings to life a society of violence and honor, fierce warriors and bloody battles, deadly monsters and famous swords. Written by an unknown poet in about the eighth century, this masterpiece of Anglo-Saxon literature transforms legends, myth, history, and ancient songs into the richly colored tale of the hero Beowulf, the loathsome man-eater Grendel, his vengeful water-hag mother, and a treasure-hoarding firedragon. The earliest surviving epic poem in any modern European language, Beowulf is a stirring portrait of a heroic world -- somber, vast, and magnificent."

and from the leaf,

"...No historic Beowulf is known to have existed, but some events described in the poem did take place in the sixth century ... it is now generally accepted that, like the Iliad's Homer, there was one composer of Beowulf, who took the stories, legends, and myths of his culture's oral tradition and bound them together with his own artistic vision. Written in England at least fifty years after the conversion of the Anglo-Saxons to Christianity, perhaps later, the poem is recognized today as the longest and greatest poem extant in Old English--yet, it describes an ancient heroic society of Danes and Geats in Scandinavia; there is not one word about England, or about the people who came to be known as English, in the poem."

What I'd like to know, is what inspired the "Gigaflops Workstation" team to name the system "Beowulf."

Take care,
cb

*-*-*-*-*-*-*-*
Capt Christopher A. Bohn
Graduate Student, Electrical (digital) Engineering
Air Force Institute of Technology     Phone (937)255-3636 (DSN 785)
AFIT/EN638                            Lab x4606   Voicemail x6638
2950 P St, Box 4638                   email cbohn@afit.af.mil
Wright-Patterson AFB OH 45433-7765    EngrBohn@aol.com
                                      http://members.aol.com/EngrBohn/
*-*-*-*-*-*-*-*

-----Original Message-----
From: owner-beowulf@beowulf.gsfc.nasa.gov [mailto:owner-beowulf@beowulf.gsfc.nasa.gov]On Behalf Of Yann E. MORIN
Sent: Tuesday, October 13, 1998 5:20 PM
To: beowulf@beowulf.gsfc.nasa.gov
Subject: Beowulf : origins...

Hi!

For many days, I've been talking of 'beowulf'. I know it means a cluster of machine, but, forgive my ignorance, where does the word come from?
I've heard of a legend, what is it about?

Regards,
Yann.

--

Yann E. MORIN
---------------------------------------
Ecole Nationale d'Ingenieurs de Brest
Laboratoire d'Informatique Industrielle
Technopole Brest Iroise
CP 15 - 29608 Brest Cedex - FRANCE
---------------------------------------
tel   : (+33) 298 056 600 (Ext 6318)
fax   : (+33) 298 056 629
email : bureau : ymorin@enib.fr
        perso  : ymorin@france-mail.fr
                 yann.morin@hol.fr
---------------------------------------

From efinch@cais.com Tue, 13 Oct 1998 19:56:18 -0400
Date: Tue, 13 Oct 1998 19:56:18 -0400
From: Ed Finch efinch@cais.com
Subject: How to manage access to a cluster?

Greetings!

I've acquired 6 Gateway P-60 PCs for my first cluster. The master is up and running and I'm about to start the installation on the slaves. I actually have 3 issues:

1) This is the first Beowulf-style cluster on my project (NASA's Mission to Planet Earth. There are about 700 people in my building). I've received a fair amount of friction from management for pursuing this and want to move forward carefully. The issue is this: I want the cluster to be "open" to encourage people to play with it (and hopefully we'll get real funding) but I want some accountability to its use. That is, I want to track who uses it, for how long and for what purpose. Help!

2) Red Hat 5.0 wasn't that great. I'd really like to build the cluster on 5.1, which makes NIS, etc. much easier. What is the "best" way to assemble a cluster? That is, I want the most current software.

3) By default, the Extreme Linux doesn't install C development, C++ development or development libraries. Is this right?

Best regards,
Ed

--
Q: Why do PCs have a reset button on the front?
A: Because they are expected to run Microsoft operating systems.

From bob@drzyzgula.org Tue, 13 Oct 1998 20:26:08 -0400
Date: Tue, 13 Oct 1998 20:26:08 -0400
From: Bob Drzyzgula bob@drzyzgula.org
Subject: Beowulf : origins...

On Tue, Oct 13, 1998 at 11:20:29PM +0200, Yann E. MORIN wrote:
> Hi!
>
> For many days, I've been talking of 'beowulf'. I know it means a cluster of machine, but, forgive my ignorance, where does the word come from?
> I've heard of a legend, what is it about?
>
> Regards,
> Yann.

Beowulf is the name and subject of a medieval (~600AD) Anglo-Saxon poem, author unknown. Beowulf was the protagonist and eventual King of the Geats.
Grendel was a particularly vile monster and Wiglaf was the only loyal-to-the-end follower of Beowulf.

Try:

http://www.fas.harvard.edu/~layher1/medscan.html

for a whole bunch of related links, one of which is a link to the full text of Beowulf translated into English:

http://etext.lib.virginia.edu/cgibin/browse-mixed?id=AnoBeow&tag=public&images=images/modeng&data=/lv1/Archive/eng-parsed

--
============================================================
Bob Drzyzgula                             It's not a problem
bob@drzyzgula.org                until something bad happens
============================================================

From Paul.Courbis@crm.mot.com Wed, 14 Oct 1998 03:59:46 -0400
Date: Wed, 14 Oct 1998 03:59:46 -0400
From: Paul COURBIS Paul.Courbis@crm.mot.com
Subject: Beowulf Seeds

According to Prachya Chalermwat (on 10/14/98):
> On Tue, 13 Oct 1998, Gregory R. Warnes wrote:
>
> >
> > I think this is a "Good Idea" (tm). I don't think it is redundant,
> > because the extreme linux CD doesn't seem to help at all with the setup,
> > which is the hard part! It can take a while to figure out how to
             ^^^^^^^^^^^^^

I *do* agree... Furthermore, it's quite impossible to find documentation on what needs to be configured! If anyone has some doc, they are welcomed... (I guess I have to declare "slaves nodes" in the "master" but how ?...)

Paul

> > configure PVM, MPI, etc.
> >
> > -Greg
> >
>
> Thank you. My goal is to simplify and help spread the use of Beowulf. I
> want to set up a "Beowulf Seeds" page that will collect the contributed
> "seeds" from different organization Beowulf projects. (Try
> http://spaceops.science.gmu.edu/beowulf-seeds within a few days) Please
> let me know if one wants to join the club. Each seed will have description
> of requirements, hardware used, applications, link to creator, etc. This
> way, not only newbies can try the seeds of their interest, but also the
> experts can use them and perhaps one can evaluate different seeds using
> some kind of existing benchmark.
>
> I also want to know what is the smallest possible size of
>   o The kernel image
>   o The whole Linux tree
> for the "Beowulf Seed"
>
> Assuming that Linux has been installed on the master node and the "seed"
> is located at /home/ftp/pub/bseed1. In general, the master has different
> (more stuffs) Linux installation from the slaves. With or without remote
> boot (bootp), what I need is just a modified boot disk to run a shell
> script for:
>
> 1) Partitioning the disk
> 2) Copying the seed from the master and plant it on the drive
> 3) Auto or manually assigning IP and host name
> 4) Running lilo
> 5) Rebooting
> 6) Happy Beowulf :)
>
> --Prachya
>
> ---------------------------------------------------------------------------
> Prachya Chalermwat                        George Mason University
> Graduate Research Assistant               (703) 993-4322
> Computational Sciences and Informatics    (FAX) 993-1980
> George Mason University                   Email: prachya@science.gmu.edu
> ---------------------------------------------------------------------------
> URL: http://spaceops.science.gmu.edu
> "Imagination is more important than knowledge." A. Einstein
>

--
-=-=-=-=-=- Paul COURBIS -=- Responsable Systemes Informatiques -=-=-=-=-=-
-=-=-=- Centre de Recherche Motorola -=- Paul.COURBIS@crm.mot.com -=-=-=-
-=- Voice: +33 (0)1 69.35.25.37 Fax: +33 (0)1 69.35.25.01 -=-
Opinions hereabove are my own and not those of my organization

From amartin@wlu.edu Wed, 14 Oct 1998 11:51:08 -0400
Date: Wed, 14 Oct 1998 11:51:08 -0400
From: Andrew Martin amartin@wlu.edu
Subject: Linux and Support

Gentlemen and Ladies

One of the problems that face any company who is thinking about using Linux is the lack of support. I have an Idea that may help to rectify this problem.
If you would like to hear about my idea and what I propose to do with it drop me an email at amartin@liberty.uc.wlu.edu.

I will post this again in a couple of weeks

Andrew Martin

Download Neoplanet at http://www.neoplanet.com

From joelja@darkwing.uoregon.edu Wed, 14 Oct 1998 14:01:20 -0400
Date: Wed, 14 Oct 1998 14:01:20 -0400
From: Joel Jaeggli joelja@darkwing.uoregon.edu
Subject: Linux and Support

see:

http://www.redhat.com/support/providers.html

joelja

On 14 Oct 1998, Andrew Martin wrote:
> Gentlemen and Ladies
> One of the problems that face any company who is thinking about using Linux is the lack of support.
> I have an Idea that may help to rectify this problem. If you would like to hear about my idea and what I propose to do with it drop me an email at amartin@liberty.uc.wlu.edu..
> I will post this again in a couple of weeks
> Andrew Martin
>
>
>
> Download Neoplanet at http://www.neoplanet.com

--------------------------------------------------------------------------
Joel Jaeggli                                joelja@darkwing.uoregon.edu
Academic User Services                    consult@gladstone.uoregon.edu
PGP Key Fingerprint: 1DE9 8FCA 51FB 4195 B42A 9C32 A30D 121E
--------------------------------------------------------------------------
It is clear that the arm of criticism cannot replace the criticism of arms.
Karl Marx -- Introduction to the critique of Hegel's Philosophy of the right, 1843.

From stevehi@soc.plym.ac.uk Wed, 14 Oct 1998 14:26:43 -0400
Date: Wed, 14 Oct 1998 14:26:43 -0400
From: Steve Hill stevehi@soc.plym.ac.uk
Subject: Problems setting up a small cluster

Richie Bielak wrote:
> Hi,
>
> I'm trying to set up a tiny cluster or two older P120 machines.
> I installed Extreme Linux on both and they both have the
> 3c509 network cards. I connect them via 10baseT LinkSys
> hub.
>
> For some reason the machines do not see each other and I can't
> figure out why.
> Here is output of "ifconfig":
>
> [root@ki /root]# ifconfig
> lo        Link encap:Local Loopback
>           inet addr:127.0.0.1  Bcast:127.255.255.255  Mask:255.0.0.0
>           UP BROADCAST LOOPBACK RUNNING  MTU:3584  Metric:1
>           RX packets:63 errors:0 dropped:0 overruns:0
>           TX packets:63 errors:0 dropped:0 overruns:0
>
> eth0      Link encap:Ethernet  HWaddr 00:20:AF:13:B2:F9
>           inet addr:192.9.42.1  Bcast:192.9.42.255  Mask:255.255.255.0
>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>           RX packets:0 errors:0 dropped:0 overruns:0
>           TX packets:0 errors:0 dropped:0 overruns:0
>           Interrupt:10 Base address:0x300
>

This sounds like exactly the problem I have just been having whilst trying to set up extreme linux, on a Dan PPro200 with the 3com 509 card in it. I don't know why, but the CD install just doesn't do the business for me. I was running round like a blue-arsed fly. I tested all my hardware and it was OK. What I finally did was a redhat 5.0 ftp install from my local mirror (sunsite.doc.ic.ac.uk), then used the extreme linux cd to add the extra beowulf bits using glint. I haven't yet recompiled the kernel, but so far everything is ok.

Hope this helps

Steve

--
Steve Hill
Center for Neural and Adaptive Systems
School of Computing
University of Plymouth

From jeff.auwaerter@MCI.Com Wed, 14 Oct 1998 17:51:02 -0400
Date: Wed, 14 Oct 1998 17:51:02 -0400
From: Jeff Auwaereter jeff.auwaerter@MCI.Com
Subject: pvm

I am trying to get pvm3 up, but it dies with the following msg:

building in src
cd src; ../lib/aimk install
making in LINUX/ for LINUX
make[1]: Entering directory `/usr/local/pvm3/src/LINUX'
cc -O -DCLUMP_ALLOC -I../../include -DARCHCLASS=\"LINUX\" -DIMA_LINUX -DSYSVSIGNAL -DNOWAIT3 -DNOUNIXDOM -DRSHCOMMAND=\"/usr/bin/rsh\" -DNEEDENDIAN -c ../hoster.c
../hoster.c: In function `pl_startup':
../hoster.c:345: storage size of `rfds' isn't known
make[1]: *** [hoster.o] Error 1
make[1]: Leaving directory `/usr/local/pvm3/src/LINUX'
make: *** [s] Error 2

I have PVM_ROOT & PVM_DPATH set to /usr/local/pvm3 ...

I have no idea of what to try next, or of what is causing the error. Any help would be greatly appreciated.

tnx,
Jeff

From shachar@vipe.technion.ac.il Wed, 14 Oct 1998 18:21:38 -0400
Date: Wed, 14 Oct 1998 18:21:38 -0400
From: Shachar Tal shachar@vipe.technion.ac.il
Subject: Linux and Support

If you count out the hundreds of companies that fully support Linux commercially, check this out for major support for Linux. Guess who? Compaq.

Shachar Tal
-------------
Taub Computer Center, Technion, Israel Institute of Technology
KeyID 0481FEF1 fingerprint = 52 1B 97 6A F2 77 AE C6 64 B6 5A 5E 14 28 8E 7E

---------- Forwarded message ----------
> From: "Jon 'maddog' Hall, USG Senior Leader"
> Sender: owner-bod@li.org
> To: bod@li.org, tb@li.org, officers@li.org
> Cc: hall@zk3.dec.com
> Subject: Compaq's support of Linux
> Date: Tue, 13 Oct 98 21:43:12 -0400
>
> On Tuesday, September 15th, 1998 in Paris, France at our user event named
> "Eureka", and again on October 5th at DECUS in Los Angeles, Compaq Computer
> Corporation announced intent to extend their support to the Linux(R) operating
> system to include Intel as well as the Alpha platforms.
In addition to > extending this support to another architecture, Compaq is in the process of > putting together a comprehensive program of Linux support. > > This support includes, but is not limited to: > > o working with the Linux community to port Linux to new platforms > o qualification of Linux on both Intel and Alpha platforms > o providing selected platforms with no license, specifically for Linux > and other freely available operating systems > o working with the Linux community and distributions to provide > world-wide telephone and hardware support > o porting selected Compaq software products to both Intel and Alpha > platforms > > In continuing the concept of working with the Linux community, Compaq intends > to extend its Linux support through its extensive channels partner programs. > Compaq feels that this will give the broadest possible selection of products > and solutions to our end customers, with our VARs, OEMs, Distributors and > Resellers working with the customer to match the distribution, the layered > products and third party offerings to that customer's needs. > > While it will take a little while for this program to be put into complete > action, there has been a lot of excitement inside Compaq since this decision > was announced by John Rose, Vice President of Compaq, at our European User's > event. Indeed both Compaq and one of our channel partners, Hallmark, showed > a licenseless AlphaServer 800 machine qualified for the Alpha Linux operating > system at ISPcon in San Jose this month. > > Warmest regards, > > Jon "maddog" Hall > > -- > ============================================================================= > Jon "maddog" Hall Internet: maddog@zk3.dec.com > Senior Leader, UNIX Software Group Executive Director, Linux(R) Intern'l > > Compaq Computer Corporation Linux International > Mailstop ZK03-2/U15 80 Amherst St. > 110 Spit Brook Rd. Amherst, N.H. 03031-3032 U.S.A. > Nashua, N.H. 03062-2698 U.S.A. 
> > WWW: http://www.compaq.com WWW: http://www.li.org > Voice: +1.603.884.1341 Voice: +1.603.672.4557 > FAX: +1.603.884.6424 Board Member: Uniforum Association > Office: ZK03-2/V15 Board Member: USENIX Association > > (R)Linux is a trademark of Linus Torvalds in the United States and other > countries. From shachar@vipe.technion.ac.il Wed, 14 Oct 1998 18:27:57 -0400 Date: Wed, 14 Oct 1998 18:27:57 -0400 From: Shachar Tal shachar@vipe.technion.ac.il Subject: Problems setting up a small cluster Hi, On Wed, 14 Oct 1998, Steve Hill wrote: > > I'm trying to set up a tiny cluster or two older P120 machines. > > I installed Extreme Linux on both and they both have the > > 3c509 network cards. I connect them via 10baseT LinkSys > > hub. > > > > For some reason the machines do not see each other and I can't > > figure out why. Here is output of "ifconfig": > > > > [root@ki /root]# ifconfig > > lo Link encap:Local Loopback > > inet addr:127.0.0.1 Bcast:127.255.255.255 Mask:255.0.0.0 > > UP BROADCAST LOOPBACK RUNNING MTU:3584 Metric:1 > > RX packets:63 errors:0 dropped:0 overruns:0 > > TX packets:63 errors:0 dropped:0 overruns:0 > > > > eth0 Link encap:Ethernet HWaddr 00:20:AF:13:B2:F9 > > inet addr:192.9.42.1 Bcast:192.9.42.255 Mask:255.255.255.0 > > UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 > > RX packets:0 errors:0 dropped:0 overruns:0 > > TX packets:0 errors:0 dropped:0 overruns:0 > > Interrupt:10 Base address:0x300 try pinging each host from the other one, then look for a complete MAC address in the ARP cache ("arp -avn"). if there is none, try sending the output of "ifconfig" and "route -n" on both machines. if ping works, it's your software which doesn't work. 
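
The ARP check described above can be automated in a few lines. What follows is only a minimal Python sketch, assuming the BSD-style output that net-tools "arp -a" prints, where an unresolved entry shows <incomplete> in place of the hardware address. The function name unresolved_hosts and the second sample entry (192.9.42.3) are made up for illustration.

```python
import re

# Matches one BSD-style line of "arp -a" output, for example:
#   ? (192.9.42.2) at <incomplete> on eth0
#   ? (192.9.42.3) at 00:60:8C:BA:A9:15 [ether] on eth0
ARP_LINE = re.compile(r"\((?P<ip>[0-9.]+)\) at (?P<hw>\S+)")


def unresolved_hosts(arp_output):
    """Return the IP addresses whose ARP entry has no MAC address yet."""
    missing = []
    for line in arp_output.splitlines():
        match = ARP_LINE.search(line)
        if match and match.group("hw") == "<incomplete>":
            missing.append(match.group("ip"))
    return missing


# Hypothetical sample: the first entry mirrors the thread, the second is invented.
sample = """\
? (192.9.42.2) at <incomplete> on eth0
? (192.9.42.3) at 00:60:8C:BA:A9:15 [ether] on eth0
Entries: 2      Skipped: 0      Found: 2
"""

print(unresolved_hosts(sample))
```

An address that shows up in the returned list never answered an ARP request, which points at cabling, the hub or the NIC rather than at anything above the link layer.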
Shachar Tal
-------------
Taub Computer Center, Technion, Israel Institute of Technology
KeyID 0481FEF1 fingerprint = 52 1B 97 6A F2 77 AE C6 64 B6 5A 5E 14 28 8E 7E

From richieb@netlabs.net Wed, 14 Oct 1998 22:39:39 -0400
Date: Wed, 14 Oct 1998 22:39:39 -0400
From: Richie Bielak richieb@netlabs.net
Subject: Problems setting up a small cluster

Shachar Tal wrote:
[...]
> try pinging each host from the other one, then look for a complete MAC
> address in the ARP cache ("arp -avn"). if there is none, try sending the
> output of "ifconfig" and "route -n" on both machines. if ping works, it's
> your software which doesn't work.
>

Thanks a million for the suggestions. Here is the info you asked for:

On machine "ki":

[root@ki /root]# cat /etc/hosts
127.0.0.1       localhost localhost.localdomain
192.9.42.1      ki ki.beowulf
192.9.42.2      dojo dojo.beowulf
[root@ki /root]#

[root@ki /root]# ifconfig
lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Bcast:127.255.255.255  Mask:255.0.0.0
          UP BROADCAST LOOPBACK RUNNING  MTU:3584  Metric:1
          RX packets:18 errors:0 dropped:0 overruns:0
          TX packets:18 errors:0 dropped:0 overruns:0

eth0      Link encap:Ethernet  HWaddr 00:20:AF:13:B2:F9
          inet addr:192.9.42.1  Bcast:192.9.42.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0
          TX packets:0 errors:0 dropped:0 overruns:0
          Interrupt:10 Base address:0x300

*************** ARP tables here:

[root@ki /root]# arp -avn
? (192.9.42.2) at <incomplete> on eth0
Entries: 1      Skipped: 0      Found: 1

***************** Route table here:

[root@ki /root]# route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
dojo            *               255.255.255.255 UH    0      0        1 eth0
127.0.0.0       *               255.0.0.0       U     0      0        1 lo

*************** Ping and ARP after. No change

[root@ki /root]# ping dojo
PING dojo (192.9.42.2): 56 data bytes

--- dojo ping statistics ---
11 packets transmitted, 0 packets received, 100% packet loss
[root@ki /root]# arp -avn
? (192.9.42.2) at <incomplete> on eth0
Entries: 1      Skipped: 0      Found: 1

Now on machine "dojo":

[root@dojo /root]# ifconfig
lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Bcast:127.255.255.255  Mask:255.0.0.0
          UP BROADCAST LOOPBACK RUNNING  MTU:3584  Metric:1
          RX packets:2 errors:0 dropped:0 overruns:0
          TX packets:2 errors:0 dropped:0 overruns:0

eth0      Link encap:Ethernet  HWaddr 00:60:8C:BA:A9:15
          inet addr:192.9.42.2  Bcast:192.9.42.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0
          TX packets:0 errors:0 dropped:0 overruns:0
          Interrupt:10 Base address:0x300

[root@dojo /root]#
[root@dojo /root]# route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
ki              *               255.255.255.255 UH    0      0        2 eth0
127.0.0.0       *               255.0.0.0       U     0      0        1 lo
[root@dojo /root]#

************* I manually entered the addresses into the ARP table
************* otherwise it was the same as on ki.

[root@dojo /root]# arp
Address                 HWtype  HWaddress           Flags Mask            Iface
ki                      ether   00:20:AF:13:B2:F9   CM                    eth0
[root@dojo /root]# ping ki
PING ki (192.9.42.1): 56 data bytes

--- ki ping statistics ---
6 packets transmitted, 0 packets received, 100% packet loss
[root@dojo /root]# telnet ki
Trying 192.9.42.1...

****** Then nothing.

Finally I still have this, no interrupts on IRQ10 on both machines.

[root@dojo /root]# cat /proc/interrupts
 0:      71426   timer
 1:        852   keyboard
 2:          0   cascade
 4:          6 + serial
 8:          1 + rtc
10:          0   3c509
13:          1 + IPI
14:      15405 + ide0
15:          0 + ide1

------

Could it be a hardware problem? I reset IRQ 10 to be ISA (not PnP) and I set all DMA channels to ISA in the BIOS config. Still nothing.

At this point I'm ready to install RH 5.1...

Still puzzled.....

...richie

--
"It is a good day to code."
http://www.netlabs.net/~richieb

From richieb@netlabs.net Wed, 14 Oct 1998 23:17:24 -0400
Date: Wed, 14 Oct 1998 23:17:24 -0400
From: Richie Bielak richieb@netlabs.net
Subject: More info on problem with two machines

Hi,

(I'm sending this a second time - I think the first time the mail got lost).

My problem: I have two machines (ki and dojo) with Extreme Linux and 3c509 NICs and I can't get them to talk.

---

OK. Here is the info you guys asked for:

On machine "ki":

[root@ki /root]# cat /etc/hosts
127.0.0.1       localhost localhost.localdomain
192.9.42.1      ki ki.beowulf
192.9.42.2      dojo dojo.beowulf
[root@ki /root]#

[root@ki /root]# ifconfig
lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Bcast:127.255.255.255  Mask:255.0.0.0
          UP BROADCAST LOOPBACK RUNNING  MTU:3584  Metric:1
          RX packets:18 errors:0 dropped:0 overruns:0
          TX packets:18 errors:0 dropped:0 overruns:0

eth0      Link encap:Ethernet  HWaddr 00:20:AF:13:B2:F9
          inet addr:192.9.42.1  Bcast:192.9.42.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0
          TX packets:0 errors:0 dropped:0 overruns:0
          Interrupt:10 Base address:0x300

*************** ARP tables here:

[root@ki /root]# arp -avn
? (192.9.42.2) at <incomplete> on eth0
Entries: 1      Skipped: 0      Found: 1

***************** Route table here:

[root@ki /root]# route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
dojo            *               255.255.255.255 UH    0      0        1 eth0
127.0.0.0       *               255.0.0.0       U     0      0        1 lo

*************** Ping and ARP after. No change

[root@ki /root]# ping dojo
PING dojo (192.9.42.2): 56 data bytes

--- dojo ping statistics ---
11 packets transmitted, 0 packets received, 100% packet loss
[root@ki /root]# arp -avn
? (192.9.42.2) at <incomplete> on eth0
Entries: 1      Skipped: 0      Found: 1

Now on machine "dojo":

[root@dojo /root]# ifconfig
lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Bcast:127.255.255.255  Mask:255.0.0.0
          UP BROADCAST LOOPBACK RUNNING  MTU:3584  Metric:1
          RX packets:2 errors:0 dropped:0 overruns:0
          TX packets:2 errors:0 dropped:0 overruns:0

eth0      Link encap:Ethernet  HWaddr 00:60:8C:BA:A9:15
          inet addr:192.9.42.2  Bcast:192.9.42.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0
          TX packets:0 errors:0 dropped:0 overruns:0
          Interrupt:10 Base address:0x300

[root@dojo /root]#
[root@dojo /root]# route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
ki              *               255.255.255.255 UH    0      0        2 eth0
127.0.0.0       *               255.0.0.0       U     0      0        1 lo
[root@dojo /root]#

************* I manually entered the addresses into the ARP table
************* otherwise it was the same as on ki.

[root@dojo /root]# arp
Address                 HWtype  HWaddress           Flags Mask            Iface
ki                      ether   00:20:AF:13:B2:F9   CM                    eth0
[root@dojo /root]# ping ki
PING ki (192.9.42.1): 56 data bytes

--- ki ping statistics ---
6 packets transmitted, 0 packets received, 100% packet loss
[root@dojo /root]# telnet ki
Trying 192.9.42.1...

****** Then nothing.

Finally I still have this, no interrupts on IRQ10.

[root@dojo /root]# cat /proc/interrupts
 0:      71426   timer
 1:        852   keyboard
 2:          0   cascade
 4:          6 + serial
 8:          1 + rtc
10:          0   3c509
13:          1 + IPI
14:      15405 + ide0
15:          0 + ide1

------

Could it be a hardware problem? I reset IRQ 10 to be ISA (not PnP) and I set all DMA channels to ISA in the BIOS config. Still nothing.

At this point I'm ready to install RH 5.1...

--
"It is a good day to code."
http://www.netlabs.net/~richieb

From rgb@phy.duke.edu Thu, 15 Oct 1998 00:43:00 -0400
Date: Thu, 15 Oct 1998 00:43:00 -0400
From: Robert G.
Brown rgb@phy.duke.edu
Subject: Problems setting up a small cluster

On Wed, 14 Oct 1998, Richie Bielak wrote:
> [root@ki /root]# ifconfig
> lo        Link encap:Local Loopback
>           inet addr:127.0.0.1  Bcast:127.255.255.255  Mask:255.0.0.0
>           UP BROADCAST LOOPBACK RUNNING  MTU:3584  Metric:1
>           RX packets:18 errors:0 dropped:0 overruns:0
>           TX packets:18 errors:0 dropped:0 overruns:0
>
> eth0      Link encap:Ethernet  HWaddr 00:20:AF:13:B2:F9
>           inet addr:192.9.42.1  Bcast:192.9.42.255  Mask:255.255.255.0
>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>           RX packets:0 errors:0 dropped:0 overruns:0
>           TX packets:0 errors:0 dropped:0 overruns:0
>           Interrupt:10 Base address:0x300

It looks like your ethernet device is up normally. What are the details of your physical network? I assume, for example, that it is a class C (from the address and netmask) and that the machines are connected somehow. Are they both plugged into a common hub or switch? Are they connected directly with a wire? If the latter, is the wire a crossover wire? If the former, is the wire straight through (and NOT a crossover wire)?

Your findings below are consistent with the cards being up and happy but not being physically connected. The "incomplete" in the arp table, for example, just means that it hasn't had any response from 192.9.42.2 and therefore cannot map IP number to Ethernet address in the ARP table.

> *************** ARP tables here:
>
> [root@ki /root]# arp -avn
> ? (192.9.42.2) at <incomplete> on eth0
> Entries: 1      Skipped: 0      Found: 1
>
> ***************** Route table here:
>
> [root@ki /root]# route
> Kernel IP routing table
> Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
> dojo            *               255.255.255.255 UH    0      0        1 eth0
> 127.0.0.0       *               255.0.0.0       U     0      0        1 lo

I personally like to install a default route like:

   route add default eth0

if I'm having any trouble at all with the network connection.
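
Whether a table already contains a default entry can be checked mechanically. This is a minimal Python sketch, assuming the text comes from "route" (which prints the word "default") or "route -n" (which prints 0.0.0.0 for the same entry). The function name has_default_route is invented for illustration, and the sample is the table posted from ki.

```python
def has_default_route(route_output):
    """Return True if the routing table text contains a default route."""
    for line in route_output.splitlines():
        fields = line.split()
        # The first column is the destination: "route" shows "default",
        # "route -n" shows 0.0.0.0 for the same entry.
        if fields and fields[0] in ("default", "0.0.0.0"):
            return True
    return False


# Routing table as posted from machine "ki" in this thread.
table_from_ki = """\
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
dojo            *               255.255.255.255 UH    0      0        1 eth0
127.0.0.0       *               255.0.0.0       U     0      0        1 lo
"""

print(has_default_route(table_from_ki))
```

For the table from ki this reports that no default route is present; only the host route to dojo and the loopback network are listed.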
It "looks" like your route is ok as an explicit entry, but I always like to have a default route because, well, nothing works if the system cannot figure out how to route to it. If you want to be "efficient" you can even define route table entries for your local network (probably 192.9.42.XX) and your gateway on that local network (if you have one) for your default, although for mostly-local traffic just setting the default to the wire works pretty well. > Could it be a hardware problem? I reset IRQ 10 to be ISA (not PnP) and > I set all DMA channels to ISA in the BIOS config. Still nothing. It always "could" be, but it looks like you have fairly standard hardware and it looks like the devices were indeed recognized and the drivers for them successfully installed into the kernel. Otherwise ifconfig would probably fail or you would get some messages at boot time. I'm voting first for a bad cable (because it is the cheapest thing to test/replace at $5-10), second for a bad NIC (ALSO cheap -- I can get generic tulip 10/100BT cards that work perfectly for $30 at my local Intrex or Best Buy), and third for a "bad" hub or switch, where "bad" in this context can mean that it doesn't coexist peacefully and negotiate successfully with your cards as well as that it might be broken. 10base hubs are $50 for eight ports, although you really want a 100base switch (or at least a hub!) and new NICs anyway to build a beowulf. rgb Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From rdab100@hermes.cam.ac.uk Thu, 15 Oct 1998 01:38:20 -0400 Date: Thu, 15 Oct 1998 01:38:20 -0400 From: Dominic Baines rdab100@hermes.cam.ac.uk Subject: Linux and Support Don't forget the original: http://sunsite.unc.edu/LDP/HOWTO/Consultants-HOWTO.html It's a link off http://www.linux.org/help/ and we all know that one don't we :-) (Updated only 13 days ago now !) 
Dominic Joel Jaeggli wrote: > see: > > http://www.redhat.com/support/providers.html > > joelja > > On 14 Oct 1998, Andrew Martin wrote: > > > Gentlemen and Ladies > > One of the problems that face any company who is thinking about using Linux is the lack of support. Perhaps you haven't investigated this thoroughly enough. I think you'll find support is just as good as any of the other OS's out there and a lot depends on your contact, who provides the support and what you tell them the problem is. > > I have an idea that may help to rectify this problem. If you would like to hear about my idea and what I propose to do with it drop me an email at amartin@liberty.uc.wlu.edu. > > I will post this again in a couple of weeks > > Andrew Martin > > > > > > > > Download Neoplanet at http://www.neoplanet.com > > -------------------------------------------------------------------------- > Joel Jaeggli joelja@darkwing.uoregon.edu > Academic User Services consult@gladstone.uoregon.edu > PGP Key Fingerprint: 1DE9 8FCA 51FB 4195 B42A 9C32 A30D 121E > -------------------------------------------------------------------------- > It is clear that the arm of criticism cannot replace the criticism of > arms. Karl Marx -- Introduction to the critique of Hegel's Philosophy of > the right, 1843. From walt@parl.ces.clemson.edu Thu, 15 Oct 1998 10:29:37 -0400 Date: Thu, 15 Oct 1998 10:29:37 -0400 From: Walter B. Ligon III walt@parl.ces.clemson.edu Subject: PVM build under glibc problems To: "Jeff Auwaereter" Subject: Re: pvm Date: Thu, 15 Oct 1998 10:29:37 -0400 From: "Walter B. Ligon III" - - - -------- I recently rebuilt PVM and had the same problem. It comes from the change to glibc (I'll bet you are using a fairly new RedHat, right?). Now if I can only remember how I fixed it ...
Oh yeah - on line 345 of hoster.c the program declares a variable of type "rfds" Under glibc there is no such type, rather there is the type "struct rfds" If you simply insert the word "struct" before the word "rfds" on line 345 the file should compile OK. That is not the last problem you will see. I seem to remember having to change a couple of other things, but I can't remember where. Probably the best thing to do is to make the change I suggested and see what your next problem is - once I know what file it is in I can find it pretty quickly. What I really need to do is make a patch. My version of PVM has an added feature that tasks spawned from the "head" node of our beowulf are NOT scheduled on the head node. We generally only used the head node for interactive login and editing and stuff, and only use the rest of the nodes for computation. Anyway, that's another issue. Good luck. BTW, why are you rebuilding PVM? There are rpms of PVM already built and running for LINUX. I only rebuilt PVM to put in my "no head node" feature, that's why I happen to be aware of the problems. Walt -- Dr. Walter B. Ligon III Associate Professor ECE Department Clemson University From kragen@pobox.com Thu, 15 Oct 1998 12:17:36 -0400 Date: Thu, 15 Oct 1998 12:17:36 -0400 From: Kragen kragen@pobox.com Subject: fault tolerance On Thu, 8 Oct 1998 guilin@ix.netcom.com wrote: > In a beowulf system, if one of the slave nodes fail, will the entire system > crash? Or will the system continue with just one less node? The whole system won't crash, but depending on what mechanism you're using to distribute your work across the machines, your whole application might. Kragen -- Kragen Sitaker A well designed system must take people into account. . . . It's hard to build a system that provides strong authentication on top of systems that can be penetrated by knowing someone's mother's maiden name. 
-- Schneier From walt@parl.ces.clemson.edu Thu, 15 Oct 1998 13:56:11 -0400 Date: Thu, 15 Oct 1998 13:56:11 -0400 From: Walter B. Ligon III walt@parl.ces.clemson.edu Subject: pvm -------- > Why am I rebuilding pvm? Because I don't know what I'm doing!! I have tried > to install from the rpm by typing "rpm -i pvm3.3.11-10.src.rpm": it thinks > for just a few seconds, returns the prompt, but won't run. When I type > ./pvmd, I get > "pvmd: /usr/local/pvm3/lib/LINUX/pvmd3 does'nt exist. Make sure PVM is > built and PVM_ROOT is set correctly." > > I do not have a LINUX directory under /usr/local/pvm3/lib, it is under > /usr/src/redhat/SOURCES/pvm3/lib. > > I'm obviously doing something wrong and would appreciate any help. > thank you, > Jeff > Yes, you are installing the source rpm, which is attempting to recompile pvm - which you should not need to do. What you want is: pvm-3.3.11-6.i386.rpm This should be available at a number of ftp sites - I know the Georgia Tech site has it (ftp.cc.gatech.edu) but the easiest way is to go to the Beowulf homepage (www.beowulf.org), click on Beowulf Software and look about halfway down the page. There is a link for the binary RPM and one for the source RPM. You want the binary. This should install with no problems and you are on your way. Walt -- Dr. Walter B. Ligon III Associate Professor ECE Department Clemson University From bjwelch@bell-labs.com Thu, 15 Oct 1998 14:22:52 -0400 Date: Thu, 15 Oct 1998 14:22:52 -0400 From: bjwelch@bell-labs.com bjwelch@bell-labs.com Subject: Making another stone computer Here at Lucent we're ending up making another stone computer out of old 486 and pentium machines. We're just starting, and since many machines have only 325MB hard drives I was wondering what's a good minimal (!) set of partition sizes. The zero node will have plenty of space, of course, for PVM and tools.
After scouting the HOWTO papers, I ended up thinking these sizes should be acceptable partition sizes for minimal hard drive machines: 50MB for /root 2x RAM for swap 5MB for /boot all that's left goes to /usr Does this sound okay? Should I skimp on swap size to give more to /root or /usr? For machines with more hard drive space, I figured I'd keep to this model to keep consistency across the nodes. This leads to just what packages need to be installed. I'm a bit lost as to whether the nodes need stuff like Network Management and ftp. We're using the extreme linux cd with Red Hat 5.0. Could anyone recommend the minimal packages to install for the nodes? We're planning on using the cluster as a gcc/c++ build server. Working my way up the Beowulf learning curve, -Bryan -- Bryan Welch - Bell Labs - bjwelch@bell-labs.com - N0SFG - PP-ASEL From lindahl@cs.virginia.edu Thu, 15 Oct 1998 15:18:10 -0400 Date: Thu, 15 Oct 1998 15:18:10 -0400 From: Greg Lindahl lindahl@cs.virginia.edu Subject: Making another stone computer > Here at Lucent we're ending up making another stone computer out of old > 486 and pentium machines. We're just starting, and since many machines > have only 325MB hard drives I was wondering what's a good minimal > (!) set of partition sizes. How about 100% of the disk into / ? Then you have maximum flexibility. There's nothing worse than having to repartition all your disk drives. The only minus is that if you fill up a partition, you fill all the partitions... -- g From cobbjw@ornl.gov Thu, 15 Oct 1998 15:36:41 -0400 Date: Thu, 15 Oct 1998 15:36:41 -0400 From: John W. Cobb cobbjw@ornl.gov Subject: Commodity RAID pricing (Was Re: High availability?) You know if the core Beowulf Philosophy is optimum price/performance by using COTS products, then it is perhaps appropriate to talk about Beo-RAID (i.e. cheapest and best performance raid). The context is one of mounting all of the needed disk-space on the cluster's "mother node" as RAID.
While this won't scale well and won't give parallel I/O speeds, it may increase reliability. So what is the best price for RAID in MB/$ these days? Caskey gives: 48GB/~10K$ = 4.8MB/$ Steven gets 100GB/18K$ = 5.6MB/$ Let me add a data point. In an alternate life, I am in the business of providing file service for a building full of W95 users. We just bought a new NT server to be used primarily as a file-server. We got a quite good price. The server is being discounted because of technological obsolescence. The configuration and costs are as follows: 1) Server: IBM PC-Server 704 [includes 3 redundant 420 W power supplies, 12 hot-swappable drive bays, PPro 200MHz CPU 512K cache, 256MB RAM, SCSI Raid card (IBM SERVRAID-1) ] $2450 2) One Copy of NT 4.0 Server ~$800 3) 12 9.1GB Quantum 7200 drives @ $399/disk ==> $4788 (I don't have that much on this configuration myself, but all it would take is a call and money in my wallet) 4) @ 4.5 GB mirrored system disks: $500 5) 10/100 NIC ~$50 Total Cost: $8588 Total Storage capacity: 91 GB (10 disks, 1 for striping and 1 hot spare for rebuilding on the fly) Price/Perf. # = 91GB/$8588 > 10MB/$ --- RAID price. Now the three caveats must be noted: 1) This is a computer with RAID attached, not just RAID. One could view it simply as network attached storage independent of the mother-node instead of mounted on the mother-node. However, I did not talk about NFS server SW for NT, how it performs or what it costs (I'm currently not running NFS, only NT file sharing). Note however, that as network attached storage, this is very "High Availability" in that the system disk is mirrored, the data storage is RAID with a hot spare, the power supply is redundant, and if I want to, there is a slot for another PPro (cost ~$400?) 2) This is an NT configuration -- Not Linux. But if the point is price/perf, then the Beowulf ethic is not necessarily OS-religious -- whatever does the job.
If Linux is desired/required, then obviously the PPro can run Linux. I am not sure about drivers for the RAID card though. What is the status of Linux support for RAID cards? 3) This is not the highest performance disk setup. It is only 7200 RPM. But if the notion is to have a single file system for the entire cluster, then I/O performance is probably NOT the driver. If it was, one would set up a truly parallel file-system using node-local disks arranged in some form of redundancy. When I realized these numbers I was a bit astonished. I had usually thought of RAID as unreasonably expensive. What I think is occurring is that last year's servers are on fire-sale in order to clear inventory for the newer ones. The original list price of the 704 was something like 20K$. The 704s are available from people like Onsale or Computer Geeks. Info about the server specs can be found at and At 8:33 AM -0600 10/13/98, Steven Schramm wrote: >Caskey L. Dickson wrote: > > > >> As for RAID storage, Promise Technology Inc. has >> come out with a product I'm eager to get. It's an external RAID enclosure >> that presents a single Fast/Wide SCSI drive to the host (i.e. all raid >> operations are encapsulated) but unlike traditional units of this type, >> the internal drive chain is ATA/EIDE drives. 10 or 5 depending upon the >> enclosure. They say you can get 48GB of usable raid 5 storage for ~$10K. >> >> The ultimate in commodity storage... if it works. ... >I have had many dealings with ISS, Corp., a high-end mass-storage VAR, which >customizes RAID and tape libraries. We just ordered another RAID box from >them which will give us just over 100G of useable RAID 5, with: > >- redundant, hot-swap power supplies >- hot-swap disk trays >- 128M cache >- 9-bay, all steel Kingston enclosure >- 18G UW SCSI disk drives > > >all for ~$18k. And their support is excellent, should you ever need it. John W. Cobb cobbjw@ornl.gov Spallation Neutron Source V.
423.576.5439 Oak Ridge National Laboratory F. 423.576-3041 MS-8218 Oak Ridge, TN 37831-8218 http://www.ornl.gov/sns/ Talk to teach, Listen to learn. From jmdavis@hsc.vcu.edu Thu, 15 Oct 1998 17:29:34 -0400 Date: Thu, 15 Oct 1998 17:29:34 -0400 From: Mike Davis jmdavis@hsc.vcu.edu Subject: Making another stone computer Bryan, You don't provide information as to the amount of RAM. But a nice setup could be 2xRAM + small /var + remainder for /. This should provide the maximum amount of space for / while making sure that logs and messages don't inadvertently fill up /. I've run mail servers for groups of up to 25 people on old NeXT slabs with 100 MB hard drives and 16 MB of RAM. So if you make your boxes X-less you should have enough space to make everything work. Mike bjwelch@bell-labs.com wrote: > > Here at Lucent we're ending up making another stone computer out of old > 486 and pentium machines. We're just starting, and since many machines > have only 325MB hard drives I was wondering what's a good minimal > (!) set of partition sizes. > The zero node will have plenty of space, of course, for PVM and tools. > > After scouting the HOWTO papers, I ended up thinking these sizes should > be acceptable partition sizes for minimal hard drive machines: > 50MB for /root > 2x RAM for swap > 5MB for /boot > all that's left goes to /usr > > Does this sound okay? Should I skimp on swap size to give more to /root > or /usr? > For machines with more hard drive space, I figured I'd keep to this > model to keep consistency across the nodes. > > This leads to just what packages need to be installed. I'm a bit lost > as to whether the nodes need stuff like Network Management and ftp. > We're using the extreme linux cd with Red Hat 5.0. Could anyone > recommend the minimal packages to install for the nodes? We're planning > on using the cluster as a gcc/c++ build server.
> > Working my way up the Beowulf learning curve, > -Bryan > > -- > Bryan Welch - Bell Labs - bjwelch@bell-labs.com - N0SFG - PP-ASEL -- Mike Davis University Computing Services-MCV Campus SGI Systems Administrator Virginia Commonwealth University jmdavis@hsc.vcu.edu 804-828-9843 x142 (fax: 804-828-9807) From eugene@liposome.genebee.msu.su Thu, 15 Oct 1998 17:37:13 -0400 Date: Thu, 15 Oct 1998 17:37:13 -0400 From: Eugene Leitl eugene@liposome.genebee.msu.su Subject: Beowulf at ESO's VLT from Scientific Computing World, Issue 41 September 1998, pp. 31-32 data acquisition -- very large telescope Data systems support deep-space observation [...] What to do with the data The high data rates from the VLT pose particular challenges for data processing. In order to cope with the huge volumes and the complexity of the task, the ESO is developing high- performance computing facilities dedicated to analysing VLT data. ESO scientists are collaborating with scientists at the Center for Advanced Computation Research at the California Institute of Technology, USA, to build a parallel computer based on PC technology (and using the Beowulf paradigm) that will be attached to the VLT Science Archive for data calibration and data-mining activities. "The system will be targeted at performance in 10 GFlop range", said Quinn. For data analysis, the scientists use the ESO-MIDAS (Munich Image Data Analysis System) environment, and also the IRAF and IDL systems. ESO-MIDAS (avaliable from ESO) under the GNU General Public License) provides general tools for image processing and data reduction with emphasis on astronomical applications; in addition, it contains applications packages for stellar and surface photometry, image sharpening and decomposition, statistics and others. IRAF (Image Reduction and Analysis Facility) contains a selection of programs for general image processing and graphics, as well as programs for the reduction and analysis of optical and infrared astronomy data. 
IRAF is freely available for all platforms via anonymous ftp to ftp://iraf.noao.edu (from the National Optical Astronomy Observatories, Tucson, USA) and includes a complete programming environment for scientific applications. IDL is commercially available from the USA's Research Systems. [...] From jason@primenet.com Thu, 15 Oct 1998 18:10:53 -0400 Date: Thu, 15 Oct 1998 18:10:53 -0400 From: Jason Wagner jason@primenet.com Subject: List Admin Can someone please send me the email address for this list's administrator? Thanks. Jason Jason Wagner Applications Systems Analyst University of Arizona at Sierra Vista jason@uasv.arizona.edu http://indigo.uasv.arizona.edu 12821209 The opinions expressed here are strictly my own, and in no way reflect the opinions of The University of Arizona. From richieb@netlabs.net Thu, 15 Oct 1998 22:14:16 -0400 Date: Thu, 15 Oct 1998 22:14:16 -0400 From: Richie Bielak richieb@netlabs.net Subject: Problems setting up a small cluster Robert G. Brown wrote: > > On Wed, 14 Oct 1998, Richie Bielak wrote: > > > [root@ki /root]# ifconfig > > lo Link encap:Local Loopback > > inet addr:127.0.0.1 Bcast:127.255.255.255 Mask:255.0.0.0 > > UP BROADCAST LOOPBACK RUNNING MTU:3584 Metric:1 > > RX packets:18 errors:0 dropped:0 overruns:0 > > TX packets:18 errors:0 dropped:0 overruns:0 > > > > eth0 Link encap:Ethernet HWaddr 00:20:AF:13:B2:F9 > > inet addr:192.9.42.1 Bcast:192.9.42.255 Mask:255.255.255.0 > > UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 > > RX packets:0 errors:0 dropped:0 overruns:0 > > TX packets:0 errors:0 dropped:0 overruns:0 > > Interrupt:10 Base address:0x300 > > It looks like your ethernet device is up normally. What are the details > of your physical network? I assume, for example, that it is a class C > (from the address and netmask) and that the machines are connected > somehow. Are they both plugged into a common hub or switch? The machines are plugged into a hub with 10baseT cables.
Steve Hill sent me a DOS program to test the hardware and I was able to verify that the cards, cables and the hub work. [...] > > > *************** ARP tables here: > > > > [root@ki /root]# arp -avn > > ? (192.9.42.2) at <incomplete> on eth0 > > Entries: 1 Skipped: 0 Found: 1 > > > > ***************** Route table here: > > > > [root@ki /root]# route > > Kernel IP routing table > > Destination Gateway Genmask Flags Metric Ref Use Iface > > dojo * 255.255.255.255 UH 0 0 1 eth0 > > 127.0.0.0 * 255.0.0.0 U 0 0 1 lo > > I personally like to install a default route like: > > route add default eth0 I added this but it still didn't help. :-( > > if I'm having any trouble at all with the network connection. It > "looks" like your route is ok as an explicit entry, but I always like to > have a default route because, well, nothing works if the system cannot > figure out how to route to it. If you want to be "efficient" you can > even define route table entries for your local network (probably > 192.9.42.XX) and your gateway on that local network (if you have one) > for your default, although for mostly-local traffic just setting the > default to the wire works pretty well. Well, it's a trivial network just the two machines. [...] > > I'm voting first for a bad cable (because it is the cheapest thing to > test/replace at $5-10), second for a bad NIC (ALSO cheap -- I can get > generic tulip 10/100BT cards that work perfectly for $30 at my local > Intrex or Best Buy), and third for a "bad" hub or switch, where "bad" in > this context can mean that it doesn't coexist peacefully and negotiate > successfully with your cards as well as that it might be broken. 10base > hubs are $50 for eight ports, although you really want a 100base switch > (or at least a hub!) and new NICs anyway to build a beowulf. Unfortunately :-) the hardware seems OK. The DOS test works between the two machines. ...richie -- "It is a good day to code."
http://www.netlabs.net/~richieb From bob@drzyzgula.org Fri, 16 Oct 1998 00:26:52 -0400 Date: Fri, 16 Oct 1998 00:26:52 -0400 From: Bob Drzyzgula bob@drzyzgula.org Subject: Commodity RAID pricing (Was Re: High availability?) On Thu, Oct 15, 1998 at 03:37:24PM -0500, John W. Cobb wrote: > You know if the core Beowulf Philosophy is optimum price/performance by > using COTS products, then it is perhaps appropriate to talk about Beo-RAID > (i.e. cheapest and best performance raid). > > The context is one of mounting all of the needed disk-space on the > cluster's "mother node" as RAID. While this won't scale well and won't give > parallel I/O speeds, it may increase reliability. > > So what is the best price for RAID in MB/$ these days. All, At my office, we've been building RAID systems from parts for about a year now, and have been pretty happy with the results. We use them both on X86/NT systems and on Sun/Solaris systems. I've used one on a Linux system for a while, but it had to go into real production. I am designing one into a Beowulf that I'm about to propose to my employer. Note that, although we've been able to make this fly, I wouldn't want to try it without a real integration shop. We have a five-bench workshop with everything from crimp tools to oscilloscopes to a mechanic's vise, and on occasion we need all of it. The RAID systems are generally easy, but get tricky in a few places. The ribbon cables can be fussy and it can be a pain to spec them out before you have all the stuff in your hands; this is part of why we recently got all the crimp tools and teflon cable and stuff to build our own. Also, if you need to bring a 68-pin connector out to the back of a chassis and you don't have a hole for a 68-pin connector, it can sometimes be a pain. It took me about two years to find someone who would make me a draw-stud panel punch for a 68-pin connector.
I also couldn't find the 2-56 stand-offs and machine screws with the right dimensions for what we wanted to do at a reasonable price; I wound up having them custom made by Accurate Machine Screw company. This is not at all cost-effective in small batches, though. Also, there are some places where you may *have* to make a custom wiring harness, such as to wire the lead-acid battery to the SCSI controller. All this being said, I've listed below all the parts that we are using. The bottom line is that, for a JBOD configuration, the pricing comes out to 9.6MB/$ using eight 18.2GB drives (145GB). Made fully redundant and thus trimmed to 91GB, the pricing works out to about 6MB/$. If you use 9.1GB drives, the pricing won't be as favorable, about 4.3MB/$ fully redundant. Also note that I have not included the price of the server in the below, mostly because I figure pretty much everyone on this list knows how to build a server :-) I hope that y'all find this interesting, and let me know if you have any questions. Oh, and one last caveat: I use CMD 'cause I like them and because they are willing to work with us direct. Yes, I know of Mylex, DPT, and others, but I like the CMDs. No, I don't want to switch to another controller unless I have to. Yes, I know other people who hate CMDs and have had bad luck with them. As always, YMMV. Well, maybe another caveat after all: Yes, I very well may have missed something below, or some part number or price might be wrong. IF YOU DECIDE TO DO THIS PLEASE CHECK ALL THE PART AND ORDER NUMBERS YOURSELF. I am not responsible for you ordering the wrong thing. It is late and I should be in bed rather than typing. Please let me know if you notice anything. --Bob ------------------------------------------------ Kingston (http://www.kingston.com/prod/storage/) prices quoted below are from CMP Express, http://www.cmpexpress.com. Lower prices are almost certainly possible through competitive bidding.
* Kingston DS500-SR 9-bay Rackmount Cabinet, unwired with two power supplies @ $908.52
* Kingston DE100 SCA Tray and frame, consisting of:
    DE100I-RSW   Wide SCSI Frame    $113.60
    DE100I-CSWC  SCA Carrier Tray   $100.98
    DX100-SWC/H  Hot Swap Board      $55.91
    DXSOL/LOCK   Solenoid Lock       $30.28
                                   --------
                                    $300.77
                                        x 8
                                   --------
                                   $2406.16
* Kingston Loop cable from backplate to RAID controller, Ultra-2, DC500-DCW1-U2 @ $76.14
* Kingston Rack Mount Rails (actually made by Jonathon), DXRCK-SLIDE @ $217.42
* Kingston Battery Bracket DX500-BBRKT @ $15.98

Total Kingston Parts = $3624.22

FYI, the solenoid lock will keep an operator from pulling a drive until it is completely spun down and isolated from the SCSI chain. This is done by a simple timeout, rather than by sensing the state of the drive. Still, it is an effective protection against an operator in a stressful situation.

Note that there are other sources for similar equipment, although most of the OEMs tend to like Kingston. Trimm Industries makes such cabinets, and Granite Digital sells a line as well, although I'm not sure who makes it. Trimm also makes Fiber Channel cabinets. Digital Storageworks is another source, although in my experience the canisters are a royal pain in the tush.

One other source (that I use for PC tower and rackmount chassis but not RAID so far) is California PC products, http://www.calpc.com. They have a line of chassis and "dataport" hot swap canisters that will probably work out to be cheaper than the Kingstons. One nice thing about their canisters is that they each have fans in them, although if you depend on them you'll want to develop a schedule to check on them; I can't imagine that many fans won't have a healthy failure rate. They are also about to introduce a SCSI backplane that will allow you to plug in the "dataport" trays without the frames or ribbon cables on the back.
Drawbacks are that they don't yet have a mounting bracket for the Lead-acid battery, they use cheap General Devices mounting rails rather than the Jonathons, and there are only eight bays instead of nine. Still, I'm contemplating getting some of their RAID stuff for evaluation. ------------------------------------------------ * CMD (http://www.cmd.com) CRD-5440-001 Four-channel SCSI-to-SCSI RAID controller. Should cost about $1800-2000 from a distributor or reseller. DO! go to CMD's website and grab some of their documentation and white papers; there is a pile of good information available from there. ------------------------------------------------ * Internal Teflon 68-conductor ribbon cables from Granite Digital (http://www.scsipro.com) or other quality SCSI cable shop. You'll need three cables, two with four connectors and one with three connectors. You'll want to work out the measurements yourself (we build our own in-house and trim to fit), but they should cost about $200 for the three cables. The way we do it, we put three drives on channel 2, three on channel 3 and two on channel 4. This is overkill, in that you could put more drives in a single channel than that, but we didn't want to go from chassis to chassis with the SCSI cables and since the channels are there, well... Also note that it is possible to dual-host the CMD controller, so that you could have two host channels and two drive channels. Doing this you could use two SCSI interfaces on the host, potentially improving your throughput. I've never tried this. Note that Granite Digital's website is another good source of information on SCSI subsystem design. ------------------------------------------------ * External 68-pin SCSI-3 Certified cables from AMP (http://connect.amp.com), AMP part number 621885-1. Available from Allied Electronics (http://www.allied.avnet.com). Can't find this one in particular on their website but I know that they can get them; figure $100 or so for a 3-foot cable.
* External 68-pin SCSI-3 Single-ended active terminator, AMP part # 869516-1, about $50 from Allied Electronics or Digi-Key.

Total AMP parts, $150
------------------------------------------------
* Two 64MB 72-pin Parity SIMMs, maybe $200 total. (These are for caching in the CMD controller)
------------------------------------------------
* Panasonic LCR6V10BP-1 6V, 10Ah Lead-Acid battery. About $17 from Digi-Key. You'll need some Fast-On (or equivalent) connectors, a solderless connector crimp tool, and some hook-up wire, say 20 or 18AWG. This may be the hardest part; ideally, you should have the ability to crimp 0.1" pitch header connectors, such as the AMP MOD IV (these are the kind of connectors used to connect the reset switch, for example, to a motherboard). This would be to connect the battery to the CMD controller. You might be able to get a custom cable shop (say NuData/Workstation Express) to do it. CMD provides the crimp pins and contact housing, but I still haven't gotten the spec from them so that I can get the right crimp tool. So I've been using AMP MOD IV connectors, which only sort of work. Contact me for more information if you really want to do this. You really do need the battery. A 10Ah battery will hold the cache for most of a weekend, and remember, you can have up to 128MB of data in your cache!
-----------------------------------------------
So the total for all the parts net of the drives themselves would be (rounding quite a bit):

Kingston Parts:          $3630
CMD Controller:           2000
Ribbon Cables:             200
Ext Cable & Terminator:    150
Memory:                    200
Battery:                    20
--------------------------------
Total:                   $6200
----------------------------------------------
Then, of course, you need some disk drives to put into it. You'll need to pay very close attention to CMD's hardware compatibility list... this is extremely important, far more important than in any other thing I've built.
I've seen a RAID unit go from absolute scrap iron to flawless performance just from flashing the drives with a new firmware. You have been warned. I've really had very good luck with Fujitsu drives, with their technical support, with their warranty policy, and their coordination with CMD, so I'll recommend them, but CMD's HCL has a lot of other vendors listed, and, as always, YMMV. The Fujitsu MAA3182SC 18.2GB drive is selling for $1,106.94 at CMP Express; again, you'll get your best price shopping it around. So, for eight 18.2GB drives, you'll have to shell out about $8,800. ----------------------------------------------- With the drives and the enclosure, the total comes to just about $15,000, for 145GB of JBOD storage, or about 91GB of RAID-5 storage with one parity drive, one hot spare drive and one warm spare drive. (A hot spare is kept spun up and will start being rebuilt as soon as a drive failure occurs. The warm spare is kept spun down and doesn't spin up until the hot spare gets put into service.) This then works out to 9.6MB/$ unprotected and 6MB/$ protected. ---------------------------------------------- One thing to be careful of is heat. The Kingston chassis has six fans in it counting the ones in the power supplies, but it is sometimes not enough. If your building (like mine) turns the A/C off overnight, fear for your drives. We've found that we can only really run these things in our data center. This is, I've found, especially a problem with 1.6" drives of any sort. They don't have as much airspace around them and they throw off more heat to begin with. This is one good reason to stick with 9GB drives for now, because the 18GB drives are still all 1.6". -------------------------------------------- From jei@zor.hut.fi Fri, 16 Oct 1998 07:32:07 -0400 Date: Fri, 16 Oct 1998 07:32:07 -0400 From: Jukka E Isosaari jei@zor.hut.fi Subject: Commodity RAID pricing (Was Re: High availability?)
On Fri, 16 Oct 1998, Bob Drzyzgula wrote: > On Thu, Oct 15, 1998 at 03:37:24PM -0500, John W. Cobb wrote: > > You know if the core Beowulf Philosophy is optimum price/performance by > > using COTS products, then it is perhaps appropriate to talk about Beo-RAID > > (i.e. cheapest and best perforamnce raid). > > > > The context is one of mounting all of the needed disk-space on the > > cluster's "mother node" as RAID. While this won't scale well and won't give > > parallel I/O speeds, it may increase reliability. > > > > So what is the best price for RAID in MB/$ these days. > > All, > > At my office, we've been building RAID systems > from parts for about a year now, and have ... You make it sound so difficult. :-) Well, it may be, but I have been using software-RAID on my Linux for a few months now, and it was very easy to put up, and didn't cost me anything except for the hard disks themselves. I don't know if there's a performance penalty when the system grows.. However, I can mix IDE and SCSI systems, and different size disks. Well, at least it's cheap. :) ++ J From jeremic@clarkson.edu Fri, 16 Oct 1998 10:29:25 -0400 Date: Fri, 16 Oct 1998 10:29:25 -0400 From: jeremic@clarkson.edu jeremic@clarkson.edu Subject: which MPI? Hello All, I am trying to run NAS Parallel Benchmarks and MPICH seems to have some problem during linking phase: sokocalo.cee-NPB2.3 40>make bt NPROCS=4 CLASS=B ========================================= = NAS Parallel Benchmarks 2.3 = = MPI/F77/C = ========================================= cd BT; make NPROCS=4 CLASS=B make[1]: Entering directory `/home/jeremic/NPB2.3/BT' make[2]: Entering directory `/home/jeremic/NPB2.3/sys' make[2]: Nothing to be done for `all'. 
make[2]: Leaving directory `/home/jeremic/NPB2.3/sys' ../sys/setparams bt 4 B pgf77 -lm -o ../bin/bt.B.4 bt.o make_set.o initialize.o exact_solution.o exact_rhs.o set_constants.o adi.o define.o copy_faces.o rhs.o lhsx.o lhsy.o lhsz.o x_solve.o y_solve.o z_solve.o add.o error.o verify.o setup_mpi.o ../common/print_results.o ../common/timers.o -L/usr/mpich/lib/LINUX/ch_p4 -lmpi Linking: /usr/mpich/lib/LINUX/ch_p4/libmpi.a(initf.o): In function `mpi_init_': initf.o(.text+0x11): undefined reference to `mpir_iargc_' initf.o(.text+0x138): undefined reference to `mpir_getarg_' make[1]: *** [../bin/bt.B.4] Error 1 make[1]: Leaving directory `/home/jeremic/NPB2.3/BT' make: *** [bt] Error 2 sokocalo.cee-NPB2.3 41> I installed MPICH from rpm. Should I use LAM? (I have tried it, but it gave far more errors!) Has anybody experienced similar problems? Any hints are appreciated! Best regards, Boris From prachya@science.gmu.edu Fri, 16 Oct 1998 10:39:24 -0400 Date: Fri, 16 Oct 1998 10:39:24 -0400 From: Prachya Chalermwat prachya@science.gmu.edu Subject: MFLOPS of different CPUs Hi, Anyone knows how I can get the peak MFLOPS of different type of CPUs? Thanks. --Prachya --------------------------------------------------------------------------- Prachya Chalermwat George Mason University Graduate Research Assistant (703) 993-4322 Computational Sciences and Informatics (FAX) 993-1980 George Mason University Email: prachya@science.gmu.edu 4400 University Drv. MSN-5C3, Fairfax, VA 22030-4444 --------------------------------------------------------------------------- URL: http://spaceops.science.gmu.edu "Imagination is more important than knowledge." A. Einstein From cbohn@afit.af.mil Fri, 16 Oct 1998 11:11:47 -0400 Date: Fri, 16 Oct 1998 11:11:47 -0400 From: Bohn, Christopher A. cbohn@afit.af.mil Subject: which MPI? Gee, this looks awfully familiar. I've been fighting this problem, myself, with a far simpler application (pi3.f from the mpich installation on the Extreme Linux CD).
I've been offered three suggestions: - the mpich build I'm using is bad ... recompile mpich -- I've tried compiling the program on other platforms with the same results ... either all these mpich builds are bad, or this isn't the problem - the g77 compiler I'm using doesn't support getarg & iargc as intrinsic functions ... a compiler that does is nagf90 -- this is a commercially available compiler, with a link from the Beowulf Software page leading to it -- it's recently been superseded by nagf95 -- there's a 30-day trial version available for download -- I haven't tried this yet - mpich uses iargc & getarg which are not part of the standard (according to the person who offered this suggestion; Pacheco's "Parallel Programming with MPI" doesn't list them in the index, either). This suggestor offered two solutions: -- get another compiler or another mpi -- write the subroutine where MPI_INIT & MPI_FINALIZE are called in C, and the rest of the program in f77. This will result in initc.o getting linked, and iargc & getarg (or the equivalent) are a part of C. The trick, then, is calling C from f77 & vice-versa. -- I haven't tried this yet, either Hope this helps both you & me. Take care, cb *-*-*-*-*-*-*-* Capt Christopher A. Bohn Graduate Student, Electrical (digital) Engineering Air Force Institute of Technology Phone (937)255-3636 (DSN 785) AFIT/EN638 Lab x4606 Voicemail x6638 2950 P St, Box 4638 email cbohn@afit.af.mil Wright-Patterson AFB OH 45433-7765 EngrBohn@aol.com http://members.aol.com/EngrBohn/ *-*-*-*-*-*-*-* > -----Original Message----- > From: jeremic@clarkson.edu [SMTP:jeremic@clarkson.edu] > Sent: Friday, October 16, 1998 10:29 AM > To: beowulf@beowulf.gsfc.nasa.gov > Cc: jeremic@sokocalo.cee.clarkson.edu > Subject: which MPI?
> > Hello All, > > I am trying to run NAS Parallel Benchmarks and MPICH seems to have some > problem during linking phase: > > sokocalo.cee-NPB2.3 40>make bt NPROCS=4 CLASS=B > ========================================= > = NAS Parallel Benchmarks 2.3 = > = MPI/F77/C = > ========================================= > > cd BT; make NPROCS=4 CLASS=B > make[1]: Entering directory `/home/jeremic/NPB2.3/BT' > make[2]: Entering directory `/home/jeremic/NPB2.3/sys' > make[2]: Nothing to be done for `all'. > make[2]: Leaving directory `/home/jeremic/NPB2.3/sys' > ../sys/setparams bt 4 B > pgf77 -lm -o ../bin/bt.B.4 bt.o make_set.o initialize.o exact_solution.o > exact_rhs.o set_constants.o adi.o define.o copy_faces.o rhs.o lhsx.o > lhsy.o lhsz.o x_solve.o y_solve.o z_solve.o add.o error.o verify.o > setup_mpi.o ../common/print_results.o ../common/timers.o > -L/usr/mpich/lib/LINUX/ch_p4 -lmpi > Linking: > /usr/mpich/lib/LINUX/ch_p4/libmpi.a(initf.o): In function `mpi_init_': > initf.o(.text+0x11): undefined reference to `mpir_iargc_' > initf.o(.text+0x138): undefined reference to `mpir_getarg_' > make[1]: *** [../bin/bt.B.4] Error 1 > make[1]: Leaving directory `/home/jeremic/NPB2.3/BT' > make: *** [bt] Error 2 > sokocalo.cee-NPB2.3 41> > > > I installed MPICH from rpm. Should I use LAM (I have tried it but it > gave far more errors!). Has anybody experienced similar problems? > > > Any hints are appreciated! > > Best regards, Boris > From cbohn@afit.af.mil Fri, 16 Oct 1998 11:32:19 -0400 Date: Fri, 16 Oct 1998 11:32:19 -0400 From: Bohn, Christopher A. cbohn@afit.af.mil Subject: MFLOPS of different CPUs Good day, I don't know where there is such a listing, but you can upper-bound it quite simply by multiplying the number of independent FP pipelines by the clock rate. 
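As a minimal sketch of that rule of thumb (in Python; the function name `peak_flops` and its parameters are illustrative assumptions, not part of any benchmark suite), the upper bound is just the number of FP results that can be issued per cycle times the clock rate:

```python
def peak_flops(fp_ops_per_cycle, clock_hz):
    """Theoretical upper bound in flop/s: FP results per cycle times clock rate."""
    return fp_ops_per_cycle * clock_hz


# Hypothetical processor: 2 independent FP pipelines clocked at 500 MHz.
print(peak_flops(2, 500e6) / 1e9, "Gflops upper bound")
```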
For example, in a hypothetical processor with 2 FP pipelines and is clocked at 500MHz: 2 flop/cycle x 500e06 cycles/sec = 1e9 flop/sec = 1Gflops upper-bound Naturally, as soon as your problem no longer fits in the register file (to include reorder buffers, etc), then your realized performance will drop from the upper-bound. http://www.top500.org/ lists the LINPACK actual & theoretical flop ratings for the top 500 supercomputer sites -- not quite what you asked for, but possibly useful. Take care, cb *-*-*-*-*-*-*-* Capt Christopher A. Bohn Graduate Student, Electrical (digital) Engineering Air Force Institute of Technology Phone (937)255-3636 (DSN 785) AFIT/EN638 Lab x4606 Voicemail x6638 2950 P St, Box 4638 email cbohn@afit.af.mil Wright-Patterson AFB OH 45433-7765 EngrBohn@aol.com http://members.aol.com/EngrBohn/ *-*-*-*-*-*-*-* > -----Original Message----- > From: Prachya Chalermwat [SMTP:prachya@science.gmu.edu] > Sent: Friday, October 16, 1998 10:39 AM > To: beowulf@beowulf.gsfc.nasa.gov > Subject: MFLOPS of different CPUs > > Hi, > > Anyone knows how I can get the peak MFLOPS of different type of CPUs? > Thanks. > > --Prachya > > -------------------------------------------------------------------------- > - > Prachya Chalermwat George Mason University > Graduate Research Assistant (703) 993-4322 > Computational Sciences and Informatics (FAX) 993-1980 > George Mason University Email: prachya@science.gmu.edu > 4400 University Drv. MSN-5C3, Fairfax, VA 22030-4444 > -------------------------------------------------------------------------- > - > URL: http://spaceops.science.gmu.edu > "Imagination is more important than knowledge." A. 
Einstein From bill@math.ucdavis.edu Fri, 16 Oct 1998 12:04:27 -0400 Date: Fri, 16 Oct 1998 12:04:27 -0400 From: Bill Broadley bill@math.ucdavis.edu Subject: MFLOPS of different CPUs >For example, in a hypothetical processor with 2 FP pipelines and is clocked >at 500MHz: > >2 flop/cycle x 500e06 cycles/sec = 1e9 flop/sec = 1Gflops upper-bound > >Naturally, as soon as your problem no longer fits in the register file (to >include reorder buffers, etc), then your realized performance will drop from >the upper-bound. Keep in mind that, as mentioned, everything must be in the register file, and often the instruction mix must be a specific one as well. For instance on the Alpha 21164 all dependencies must be met, everything must be in registers, and the mix of instructions must be one add/sub and one multiply to hit 2 flops a cycle. Even that's impossible for more than a few instructions since you either need a long stream of instructions (causing stalls on icache misses) or loops (which cause stalls on compare/branch). There have been some codes that have managed almost 100% of peak performance; Digital put out a PR announcement about hitting 1 gflop on a 500 MHz 21164, not sure if they counted a load/store as a flop or not. It was a "real" code with a hand optimized inner loop that had a favorable mix of adds/multiplies. Peak performance is fun to talk about, but of little relevance when comparing hardware. With every generation traditional CPUs are getting more cache dependent. Vector machines and the Tera are the only machines I know of where large fractions of peak performance aren't a surprise. From cobbjw@ornl.gov Fri, 16 Oct 1998 12:22:36 -0400 Date: Fri, 16 Oct 1998 12:22:36 -0400 From: John W. Cobb cobbjw@ornl.gov Subject: Commodity RAID pricing (Was Re: High availability?) >You know if the core Beowulf Philosophy is optimum price/performance by >using COTS products, then it is perhaps appropriate to talk about Beo-RAID >(i.e. cheapest and best perforamnce raid). ...
>So what is the best price for RAID in MB/$ these days. > >Caskey gives: 48GB/~10K$ = 4.8MB/S >Steven gets 100GB/18K$ = 5.6MB/$ > >Let me add a data point.... as follows: > >1) Server: IBM PC-Server 704 [includes 3 redundant 420 W power supplies, 12 >hot-swappable drive bays, PPro 200MHz CPU 512K cache, 256MB RAM, SCSI Raid >card (IBM SERVRAID-1) ] $2450 >2) One Copy of NT 4.0 Server ~$800 >3) 12 9.1GB Quantum 7200 drives @ $399/disk ==> $4788 (I don't have that >much on this configuration myself, but all it would take is a call and >money in my wallet) >4) @ 4.5 GB mirrored system disks: $500 >5) 10/100 NIC ~$50 > >Total Cost: $8558 >Total Storage capacity: 91 GB (10 disks, 1 for striping and 1 hot spare for >rebuilding on the fly) > >Price/Perf. # = 91GB/$8538 > 10MB/$ --- RAID price. > >Now the three caveats must be noted: Oops, I forgot to mention the 4th one. What's been my biggest pain in bringing this system on line? Well, the system is sold w/o disks. No problem, I just buy disks on the open market. However, the disk carriers to hold the hot-swap disks in place are a single source item. Only IBM sells it. Well, the reason the 704 is currently so cheap is that IBM has discontinued it and resellers are offering it marked down. IBM looked at the product as ending it's lifetime, but they have currently been swamped with reseller's customers wanting a dozen carriers per server --- a demand they did not anticipate (they were discontinuing the servers right? ). Well, I have had to wait 4 weeks for carriers to come in. Their price was only $22/carrier -- not too expensive, although they are just a piece of plastic. However, the real hassle has been the wait and not knowing when they would arrive. The situation was so critical that I saw some carriers auctioned on Onsale for in excess of $100 a piece -- sheesh. I guess the lesson is to always be vigilant in the commodity world. 
Watch out for hidden sole-sources and make sure you don't end up in a long wait-state waiting for some minor but crucial component. cheers, -john .w cobb John W. Cobb cobbjw@ornl.gov Spallation Neutron Source V. 423.576.5439 Oak Ridge National Laboratory F. 423.576-3041 MS-8218 Oak Ridge, TN 37831-8218 http://www.ornl.gov/sns/ Talk to teach, Listen to learn. From rgb@phy.duke.edu Fri, 16 Oct 1998 13:20:16 -0400 Date: Fri, 16 Oct 1998 13:20:16 -0400 From: Robert G. Brown rgb@phy.duke.edu Subject: MFLOPS of different CPUs On Fri, 16 Oct 1998, Prachya Chalermwat wrote: > Hi, > > Anyone knows how I can get the peak MFLOPS of different type of CPUs? One used to use the Linpack benchmark (available from netlib?) for this purpose and probably still does, although: a) vendors started tuning CPU design and compilers for the linpack; b) if you simply fill a loop with a multiply, divide, add and subtract, time the loop, time the empty loop, subtract the two times, and divide the operation count by the difference, you'll often get FLOPS that is very DIFFERENT from the linpack FLOPS rating; c) raw FLOPS measured either of these ways doesn't necessarily test lots of things that affect real floating point performance, like cache size or design, memory bus speed, pipelining efficiency; d) even measures of all of these things don't show how well a system will perform on transcendentals - some vendors do transcendentals in hardware, others in software and there can be a large difference; e) finally, raw FLOPS (even measured in ways that account for all the above) are still only part of the "floating point performance equation". Integer performance matters, as most programs do a mix of float and integer operations. So does the efficiency of the operating system, as scheduling, context switches, interrupt handling, processor affinity and system calls in general affect perceived floating point performance. For all of these reasons FLOPS (and IPS) stopped being used as a common measure of CPU/system performance a decade or so ago.
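The naive loop-timing measurement in (b) can be sketched roughly as follows. This is Python, so interpreter overhead swamps the arithmetic and the figure it reports says almost nothing about the hardware; it only illustrates the method. The function name `naive_flops` and the iteration count are illustrative assumptions.

```python
import time


def naive_flops(n=200_000):
    """Time a loop of 4 FP ops, subtract the empty-loop time, return flop/s."""
    x, y = 1.000001, 0.999999

    # Time the empty loop to estimate pure loop overhead.
    start = time.perf_counter()
    for _ in range(n):
        pass
    empty = time.perf_counter() - start

    # Time the same loop with one multiply, divide, add and subtract per pass.
    acc = 1.0
    start = time.perf_counter()
    for _ in range(n):
        acc = ((acc * x) / x + y) - y
    full = time.perf_counter() - start

    # Guard against timer noise making the difference non-positive.
    elapsed = max(full - empty, 1e-9)
    return 4 * n / elapsed


print(f"{naive_flops() / 1e6:.1f} MFLOPS (interpreter-bound estimate)")
```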
Far more common today are SPECmarks -- see www.specbench.org. If your organization belongs to or joins the SPEC consortium, I believe that you can run your own SPECmarks, which by now is a fairly involved suite that tests lots of different code mixes and yields data that can be interpreted in terms of the quality of many of the various subsystems that relate to overall performance. Sometimes one can find a particular task in the suite that closely resembles what you want to accomplish, which is also useful when selecting a system or processor family. Nevertheless, my experience is that, venerable and faulty as the MIPS and MFLOPS ratings are, they are remarkably useful as measures of RELATIVE performance. Of course, so is the CPU clock, and for similar reasons. That is, although there are obviously exceptions for particular applications or task mixes, in many cases if one compares the MIPS/MFLOPS and SPEC(int/fp) rating of two different systems, the RELATIVE performance of the systems will be reasonably well represented by either of them, and in most cases will scale fairly obviously with system clock. By reasonably well, I mean that it is very rare indeed that they will differ by as much as a factor of 2, which is negligible on a log scale:-) Aside from comparing MIPS/MFLOPS or SPECint/SPECfp performance of different CPU families or systems, the next best thing to do is just benchmark YOUR code. After all, I don't much care how fast my system runs the linpack. What I care about is how fast it runs my application. It is the one benchmark that cannot be wrong or irrelevant. IHTH rgb Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From kragen@pobox.com Fri, 16 Oct 1998 13:59:11 -0400 Date: Fri, 16 Oct 1998 13:59:11 -0400 From: Kragen kragen@pobox.com Subject: High availability? On Fri, 9 Oct 1998, Caskey L.
Dickson wrote: > If you use something like CODA to distribute the data to the worker nodes > you should have better performance than with vanilla NFS. I don't know > what the status of CODA on Alpha is, however. The current status of Coda is that it's about as fast as NFS in most ways, but an order of magnitude slower in some ways. It requires a lot more babysitting than NFS, because it's a lot more complex and a lot less mature. Eventually Coda or something similar will be the way to go. Not yet, though. (Hey, something similar to all-cache-NUMA, but for disks, would be neat. The idea is that whenever a CPU wants to access a page of memory (disk in this case), it migrates from wherever it currently lives onto that CPU's disk, and its current owner gives up ownership of it. This would give you the better aggregate bandwidth that local disks give you, while giving you the better manageability that NFS gives you. Its major disadvantage is that a single crashed machine means a whole hung cluster, and a single corrupted disk means a whole destroyed filesystem. Has someone already done this? Surely!) Kragen -- Kragen Sitaker A well designed system must take people into account. . . . It's hard to build a system that provides strong authentication on top of systems that can be penetrated by knowing someone's mother's maiden name. -- Schneier From johns@cacr.caltech.edu Fri, 16 Oct 1998 14:55:34 -0400 Date: Fri, 16 Oct 1998 14:55:34 -0400 From: John Salmon johns@cacr.caltech.edu Subject: strange behavior with mpirun < file I'm seeing some strange behavior when redirecting stdin for mpirun. This is on Linux(kernel 2.1.103, bash 1.14.7, glibc). When I say: mpirun -nolocal -machinefile machines prog < existing_file I seem to lose output to stdout. I have added explicit 'fflush(stdout)' calls in prog (even where I think they should be unnecessary), and in particular, there is one immediately before MPI_Finalize(), but to no avail. 
If I do not redirect stdin, I see my stdout output. Similarly, if I do not say -nolocal, I also see my stdout output. Output to other open files (explicitly opened with fopen()) seems to indicate that the program finishes normally, but stdout is disappearing down a black hole. Is this a known bug? Some background may be relevant. Why am I doing this? Because I was calling mpirun in a script and I found that if I did not redirect stdin and tried to put the script in the background with '&', the whole thing would hang with, e.g.: % ./dotimings & % jobs [1]+ Stopped (tty input) ./dotimings This is a well-known symptom of rsh looking for terminal input even though none is available. The conventional treatment is to give it /dev/null as stdin. So I tried using mpirun -np 2 -machinefile machines prog < /dev/null inside the script and I got the same strange lack of stdout. The same problem happens with normal (not /dev/null) files used for redirection. Thanks in advance, John Salmon From gropp@mcs.anl.gov Fri, 16 Oct 1998 15:19:05 -0400 Date: Fri, 16 Oct 1998 15:19:05 -0400 From: William Gropp gropp@mcs.anl.gov Subject: [MPI #3853] strange behavior with mpirun < file At 11:54 AM 10/16/98 -0700, John Salmon wrote: > >I'm seeing some strange behavior when redirecting stdin for mpirun. >This is on Linux(kernel 2.1.103, bash 1.14.7, glibc). > >When I say: > >mpirun -nolocal -machinefile machines prog < existing_file All mpirun is doing is using rsh (or other remote command) to invoke the program. I tried rsh collage01 cat < f.f on our LINUX system, and that worked ok. I also tried this within a shell script: #!/bin/sh rsh collage01 cat and then invoked with foo.sh < f.f So my first question: does that work for you? (Using the rsh that mpirun is using, which you can get with the -t option). If it does, we'll have to dig deeper.
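One way to script the check described above is sketched below in Python: feed some data to a command on stdin and compare it with what comes back on stdout. The helper name `stdin_roundtrip_ok` is an assumption, and the local stand-in command would have to be replaced by the actual remote invocation (for instance the rsh command that mpirun uses) to exercise the real path.

```python
import subprocess
import sys


def stdin_roundtrip_ok(cmd, data):
    """Run cmd with data on stdin; True if stdout echoes it back unchanged."""
    result = subprocess.run(cmd, input=data, capture_output=True, timeout=60)
    return result.returncode == 0 and result.stdout == data


# Local stand-in for a remote "cat"; swap in the real remote command to test it.
local_cat = [sys.executable, "-c",
             "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())"]
print(stdin_roundtrip_ok(local_cat, b"hello\nworld\n"))
```

If the same data fails to come back when the command is the remote one, the loss is happening in the remote-shell layer rather than in the MPI program itself.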
Bill From josip@icase.edu Fri, 16 Oct 1998 15:28:32 -0400 Date: Fri, 16 Oct 1998 15:28:32 -0400 From: Josip Loncaric josip@icase.edu Subject: MFLOPS of different CPUs Prachya Chalermwat wrote: > > Anyone knows how I can get the peak MFLOPS of different type of CPUs? Peak (as in "guaranteed not to exceed") for Pentium II is MHz/2 (i.e. 400MHz Pentium II will not exceed 200 MFLOP/s). Linpack (N=100) performance of Pentium Pro 200MHz is 62 MFLOP/s (see http://performance.netlib.org/performance/html/PDSsearch.html). Matrix multiply result for a cluster of Pentium II 350MHz CPUs of 135MFLOP/s/CPU was reported by Giovanni Scalmani on this mailing list. Memory bandwidth can become a bottleneck, and then the STREAMS results apply (see http://www.cs.virginia.edu/stream/standard/MFLOPS.html for a complete list): > STREAM Memory MFLOPS --- John D. McCalpin, mccalpin@cs.virginia.edu > Revised to Wed Jun 10 11:19:07 PDT 1998 > > All results are in MFLOPS --- 1 MFLOPS=10^6 FP OPS/sec > > ---------------------------------------------------------- > Machine ID ncpus SCALE ADD TRIAD > ---------------------------------------------------------- > > Nightshade_NC440BX_350 1 15.9 12.5 21.4 > Generic_440BX_400 1 19.2 15.1 26.3 > Generic_440BX_350 1 17.5 13.7 22.6 There are other sources of CPU information (e.g. http://infopad.eecs.berkeley.edu/CIC/) and performance measurement methods (e.g. http://www.scl.ameslab.gov/Projects/HINT/HINThomepage.html) but if all you want is the "guaranteed not to exceed" peak performance number, MHz/2 is about right for Pentium IIs. Sincerely, Josip -- Dr. Josip Loncaric, Senior Staff Scientist mailto:josip@icase.edu ICASE, Mail Stop 403 http://www.icase.edu/~josip/ NASA Langley Research Center mailto:j.loncaric@larc.nasa.gov Hampton, VA 23681-2199, USA Tel. +1 757 864-2192 Fax +1 757 864-6134 From warnes@biostat.washington.edu Fri, 16 Oct 1998 15:54:45 -0400 Date: Fri, 16 Oct 1998 15:54:45 -0400 From: Gregory R. 
Warnes warnes@biostat.washington.edu Subject: High availability? On Fri, 16 Oct 1998, Kragen wrote: > Its major disadvantage is that a single crashed machine means a whole > hung cluster, and a single corrupted disk means a whole destroyed ^^^^^^^^^^^^^^^^^^^^^^^ > filesystem. Not necessarily, if you do RAID-style data-striping. It should be possible to keep extra copies of the data on several machines (provided there is space) with the current "canonical" copy moving around. In fact, in any reasonable implementation you would likely get this for short periods anyway -- since you are not going to reformat the sectors you just transfered to another machine.. ------------------------------------------------------------------------------- Gregory R. Warnes | It is high time that the ideal of success warnes@biostat.washington.edu | be replaced by the ideal of service. | Albert Einstein ------------------------------------------------------------------------------- From kragen@pobox.com Fri, 16 Oct 1998 16:01:22 -0400 Date: Fri, 16 Oct 1998 16:01:22 -0400 From: Kragen kragen@pobox.com Subject: High availability? On Fri, 16 Oct 1998, Gregory R. Warnes wrote: > On Fri, 16 Oct 1998, Kragen wrote: > > Its major disadvantage is that a single crashed machine means a whole > > hung cluster, and a single corrupted disk means a whole destroyed > ^^^^^^^^^^^^^^^^^^^^^^^ > > filesystem. > > Not necessarily, if you do RAID-style data-striping. It should be > possible to keep extra copies of the data on several machines (provided > there is space) with the current "canonical" copy moving around. In > fact, in any reasonable implementation you would likely get this for > short periods anyway -- since you are not going to reformat the sectors > you just transfered to another machine.. Yes, and then you have to worry about consistency in case of crashes, etc. It's definitely a soluble problem, but it's a problem of significant size. 
Kragen -- Kragen Sitaker A well designed system must take people into account. . . . It's hard to build a system that provides strong authentication on top of systems that can be penetrated by knowing someone's mother's maiden name. -- Schneier From josip@icase.edu Fri, 16 Oct 1998 16:14:56 -0400 Date: Fri, 16 Oct 1998 16:14:56 -0400 From: Josip Loncaric josip@icase.edu Subject: MFLOPS of different CPUs Josip Loncaric wrote: > > Prachya Chalermwat wrote: > > > > Anyone knows how I can get the peak MFLOPS of different type of CPUs? > > Peak (as in "guaranteed not to exceed") for Pentium II is MHz/2 (i.e. > 400MHz Pentium II will not exceed 200 MFLOP/s). Let me rephrase this: peak MFLOPS rating for Pentium II is equal to its MHz rating (see http://www.eng.yale.edu/it/doc/sc97/default.htm). Last year, Jack Dongarra actually reported seeing 250 MFLOPS on the Pentium II, out of 266 MFLOPS theoretical maximum. The fact that the L2 cache runs at only half the processor speed on Pentium II has less impact on the peak MFLOPS than I thought. A table of theoretical and observed peak MFLOPS can be found at http://www.cs.utk.edu/~rwhaley/ATL/INDEX.HTM#373 (listing various processors). Josip -- Dr. Josip Loncaric, Senior Staff Scientist mailto:josip@icase.edu ICASE, Mail Stop 403 http://www.icase.edu/~josip/ NASA Langley Research Center mailto:j.loncaric@larc.nasa.gov Hampton, VA 23681-2199, USA Tel. +1 757 864-2192 Fax +1 757 864-6134 From lindahl@cs.virginia.edu Fri, 16 Oct 1998 16:17:48 -0400 Date: Fri, 16 Oct 1998 16:17:48 -0400 From: Greg Lindahl lindahl@cs.virginia.edu Subject: High availability? > (Hey, something similar to all-cache-NUMA, but for disks, would be > neat. The idea is that whenever a CPU wants to access a page of memory > (disk in this case), it migrates from wherever it currently lives onto > that CPU's disk, and its current owner gives up ownership of it. 
This > would give you the better aggregate bandwidth that local disks give > you, while giving you the better manageability that NFS gives you. Sprite had a better trick -- let the file be cached everywhere until someone opens it for write. Then destroy all the caches and have everyone read and write to/from only the "master" copy. > Its major disadvantage is that a single crashed machine means a whole > hung cluster, and a single corrupted disk means a whole destroyed > filesystem. The first is easy to work around in the design, if you want to. The second requires more cleverness. -- g From kragen@pobox.com Fri, 16 Oct 1998 16:21:27 -0400 Date: Fri, 16 Oct 1998 16:21:27 -0400 From: Kragen kragen@pobox.com Subject: MFLOPS of different CPUs On Fri, 16 Oct 1998, Robert G. Brown wrote: > Nevertheless, my experience is that even venerable and faulty as the > MIPS and MFLOPS ratings are, they are nevertheless remarkably useful as > measures of RELATIVE performance. Of course, so is the CPU clock, and > for similar reasons. That is, although there are obviously exceptions > for particular applications or task mixes, in many cases if one compares > the MIPS/MFLOPS and SPEC(int/fp) rating of two different systems, the > RELATIVE performance of the systems will be reasonably well represented > by either of them, and will in most cases will scale fairly obviously > with system clock. By reasonably well, I mean that it is very rare > indeed that they will differ by as much as a factor of 2, which is > negligible on a log scale:-) I've been reading about the Tera MTA. It appears to be about as fast as a 14-processor Cray T90 (the two-processor 255-MHz MTA, that is, the only one so far built) although the T90 runs at 440 MHz. Tera's press releases claim that their machines should be able to perform about an order of magnitude better than traditional machines that have the same peak MFLOPS rating. Very interesting architecture. 
Kragen -- Kragen Sitaker A well designed system must take people into account. . . . It's hard to build a system that provides strong authentication on top of systems that can be penetrated by knowing someone's mother's maiden name. -- Schneier From hahn@coffee.psychology.mcmaster.ca Fri, 16 Oct 1998 16:33:46 -0400 Date: Fri, 16 Oct 1998 16:33:46 -0400 From: Mark Hahn hahn@coffee.psychology.mcmaster.ca Subject: performance of new, small-scale switches? has anyone measured the performance of this new crop of small-scale switches? I'm talking about 4-16-port 10/100 switches that actually cost only around $50/port. I'm guessing that they're cheap because they're not expandable, based on a new generation of highly integrated controllers. that doesn't imply that they're slow or limited in bandwidth, though. anyone have numbers? thanks, mark hahn. -- operator may differ from spokesperson. hahn@coffee.mcmaster.ca http://java.mcmaster.ca/~hahn From joelja@darkwing.uoregon.edu Fri, 16 Oct 1998 19:09:42 -0400 Date: Fri, 16 Oct 1998 19:09:42 -0400 From: Joel Jaeggli joelja@darkwing.uoregon.edu Subject: performance of new, small-scale switches? We have ordered several 4 port asante 10/100 switches. While I'n not going to use them in a beowulf cluster (we use bay networks 350T's for that) I can certainly report on their performance when they arrive... Asante claims a gigabit backplane on the 4 port switch which would imply full wirespeed on all 4 ports fdx. joelja On Fri, 16 Oct 1998, Mark Hahn wrote: > has anyone measured the performance of this new crop of small-scale > switches? I'm talking about 4-16-port 10/100 switches that actually > cost only around $50/port. I'm guessing that they're cheap because > they're not expandable, based on a new generation of highly integrated > controllers. that doesn't imply that they're slow or limited in bandwidth, > though. anyone have numbers? > > thanks, mark hahn. > -- > operator may differ from spokesperson. 
hahn@coffee.mcmaster.ca > http://java.mcmaster.ca/~hahn > -------------------------------------------------------------------------- Joel Jaeggli joelja@darkwing.uoregon.edu Academic User Services consult@gladstone.uoregon.edu PGP Key Fingerprint: 1DE9 8FCA 51FB 4195 B42A 9C32 A30D 121E -------------------------------------------------------------------------- It is clear that the arm of criticism cannot replace the criticism of arms. Karl Marx -- Introduction to the critique of Hegel's Philosophy of the right, 1843. From eugene.leitl@lrz.de Sat, 17 Oct 1998 06:19:40 -0400 Date: Sat, 17 Oct 1998 06:19:40 -0400 From: root eugene.leitl@lrz.de Subject: performance of new, small-scale switches? Mark Hahn writes: > has anyone measured the performance of this new crop of small-scale > switches? I'm talking about 4-16-port 10/100 switches that actually > cost only around $50/port. I'm guessing that they're cheap because > they're not expandable, based on a new generation of highly integrated > controllers. that doesn't imply that they're slow or limited in bandwidth, > though. anyone have numbers? Oh, yes, that's an interesting one. If they have around 1 Gbit total bandwidth and can do cut-through at few us switching latency, they would seem to have advantages as compared a 3..4 Tulip NIC/node 8-node design I'm contemplating to do. Pray also do mention the maker/model! Regards, Eugene From hjstein@bfr.co.il Sat, 17 Oct 1998 16:19:44 -0400 Date: Sat, 17 Oct 1998 16:19:44 -0400 From: Harvey J. Stein hjstein@bfr.co.il Subject: PVM build under glibc problems "Walter B. Ligon III" writes: > BTW, why are you rebuilding PVM? There are rpms of PVM already > built and running for LINUX. I only rebuilt PVM to put in my "no > head node" feature, that's why I happen to be aware of the > problems. What's this "no head node" feature all about? -- Harvey J. 
Stein BFM Financial Research hjstein@bfr.co.il From pu@ku.ac.th Sun, 18 Oct 1998 05:04:54 -0400 Date: Sun, 18 Oct 1998 05:04:54 -0400 From: Dr.Putchong Uthayopas pu@ku.ac.th Subject: MFLOPS of different CPUs Hi, We have data that we obtained from running linpackc on several processors; I will post it on the web and inform you about the site later. Roughly, the numbers are:
PP200: 49 MFLOPS
PII233: 50-51 MFLOPS
PII300: 59 MFLOPS
R10000 (on SGI Powerchallenge): 80 MFLOPS
RS6000 (77 MHz CPU on our SP2): 60 MFLOPS
Putchong Uthayopas Parallel Research Group Kasetsart University, Bangkok, Thailand. Prachya Chalermwat wrote: > Hi, > > Anyone knows how I can get the peak MFLOPS of different type of CPUs? > Thanks. > > --Prachya > > --------------------------------------------------------------------------- > Prachya Chalermwat George Mason University > Graduate Research Assistant (703) 993-4322 > Computational Sciences and Informatics (FAX) 993-1980 > George Mason University Email: prachya@science.gmu.edu > 4400 University Drv. MSN-5C3, Fairfax, VA 22030-4444 > --------------------------------------------------------------------------- > URL: http://spaceops.science.gmu.edu > "Imagination is more important than knowledge." A. Einstein From bob@drzyzgula.org Sun, 18 Oct 1998 19:05:02 -0400 Date: Sun, 18 Oct 1998 19:05:02 -0400 From: Bob Drzyzgula bob@drzyzgula.org Subject: Commodity RAID pricing (Was Re: High availability?) On Fri, Oct 16, 1998 at 12:23:02PM -0500, John W. Cobb wrote: > ...The situation was so > critical that I saw some carriers auctioned on Onsale for in excess of $100 > a piece -- sheesh... You know, this is kind of interesting, given how much stuff Computer Geeks supplies to Onsale. I had seen that server deal also, and finally decided against it because it was just sooo big. I don't recall any of the carriers being sold through Computer Geeks themselves.
--Bob -- ============================================================ Bob Drzyzgula It's not a problem bob@drzyzgula.org until something bad happens ============================================================ From rgb@phy.duke.edu Sun, 18 Oct 1998 23:01:45 -0400 Date: Sun, 18 Oct 1998 23:01:45 -0400 From: Robert G. Brown rgb@phy.duke.edu Subject: MFLOPS of different CPUs On Fri, 16 Oct 1998, Kragen wrote: > I've been reading about the Tera MTA. It appears to be about as fast > as a 14-processor Cray T90 (the two-processor 255-MHz MTA, that is, the > only one so far built) although the T90 runs at 440 MHz. > > Tera's press releases claim that their machines should be able to > perform about an order of magnitude better than traditional machines > that have the same peak MFLOPS rating. > > Very interesting architecture. How does it work? Massive parallelism on chip? A ten or twenty step pipeline? How does it run on von Neumann code where parallelism isn't much of an advantage? How are they going to keep such a beast fed? Enquiring minds want to know... rgb Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From bill@math.ucdavis.edu Sun, 18 Oct 1998 23:38:15 -0400 Date: Sun, 18 Oct 1998 23:38:15 -0400 From: Bill Broadley bill@math.ucdavis.edu Subject: MFLOPS of different cpu's >How does it work? Massive parallelism on chip? A ten or twenty step >pipeline? How does it run on von Neumann code where parallelism isn't >much of an advantage? How are they going to keep such a beast fed? > >Enquiring minds want to know... Basically the tera does a thread switch every cycle, so the processor state is replicated for the number of supported threads (I don't know how many that is offhand). (I.e. registers, and other state are replicated.) So to the thread the world looks like infinitely fast memory, i.e. 1 cycle to load. 
Hitting peak performance requires only that you have enough threads to run. It greatly simplifies the programming, compilers, OS etc. You can ignore multiple levels of the memory hierarchy, blocking, prefetching, branch prediction, cache lines, unrolling loops, inlining procedures, alignment, pages, register renaming etc. So given enough threads whatever the thread does will be ready by the time the thread makes it through the queue again. In a similar vein Digital and IBM are exploring SMT, which is similar except that multiple threads run on a single cpu sharing all resources on the chip, i.e. any instruction from any thread can go to any functional unit. Digital's paper in particular mentions up to 5.4 instructions per cycle on a mix of TeX and SPEC benchmarks. -- Bill Broadley From K.Gurney@sheffield.ac.uk Mon, 19 Oct 1998 03:59:31 -0400 Date: Mon, 19 Oct 1998 03:59:31 -0400 From: Dr Kevin Gurney K.Gurney@sheffield.ac.uk Subject: Beowulf as general compute server? I am new to this group and am seeking advice about building Linux-based compute servers. I will first explain my problem. We are a largely Macintosh-based Department but several of us use Matlab and Simulink in our teaching. Mathworks has apparently no intention of continued support for Matlab after rev 5.1. One solution would be to install a new class with 30 PCs running Linux, but I am not going to win this battle with my colleagues ....we have only recently upgraded the lab to new Macs. Having learned of the Beowulf project I was wondering if there may not be another solution: to install a smallish (6-10 node) Beowulf cluster and use this to serve the Macs, which could run PowerPC linux or the exodus Xserver software. We could then run Matlab (and a host of other Unix goodies that I can think of such as the genesis and Stuttgart neural simulators) on the cluster with no problems. My question is whether this is a suitable use of a Beowulf machine?
Would it support a basically multi-user interactive environment well (rather than batch processing of large concurrent problems for which I guess it was intended)? I can see no reason why not but I need to be armed with some facts before I approach the dept. with a proposal. Does anyone have ideas on the number of nodes I would need (my estimate is a pure guess) and a ball-park figure for the cost of the machine? I would also be grateful for any information on possible application to large scale problems in neural modelling. If we could use this as a research machine too, the request might carry more weight. many thanks for your help Kevin Gurney ----------------------------------- Dr. Kevin Gurney Dept. of Psychology , University of Sheffield, Sheffield S10 2TP, UK. K.Gurney@shef.ac.uk Tel: 0114 222 6566 Fax: 0114 276 6515 From mccsnrw@afs.mcc.ac.uk Mon, 19 Oct 1998 04:44:24 -0400 Date: Mon, 19 Oct 1998 04:44:24 -0400 From: mccsnrw@afs.mcc.ac.uk mccsnrw@afs.mcc.ac.uk Subject: Beowulf as general compute server? Kevin, First of all I have no experience running Matlab on linux at all, so I don't know the answer. Let me ask a couple of questions, though. 1- do you expect to use the cluster as a parallel machine, or rather as one server/user? This has to do with compute intensity .... 2- If it is parallel(isable), how much time are the tasks expected to take? 3- If not, how many simultaneous users do you expect? 4- There are some nice pieces of scheduling software I have seen on the list, that will send a task to the least busy host 5- generically, it should be possible! Niels -- Dr Niels R. Walet http://www.phy.umist.ac.uk/Theory/people/walet.html Dept. of Physics, UMIST, P.O. Box 88, Manchester, M60 1QD, U.K.
Phone: +44(0)161-2003693 Fax: +44(0)161-2004303 Niels.Walet@umist.ac.uk From Eugene.Leitl@lrz.uni-muenchen.de Mon, 19 Oct 1998 06:27:26 -0400 Date: Mon, 19 Oct 1998 06:27:26 -0400 From: Eugene Leitl Eugene.Leitl@lrz.uni-muenchen.de Subject: MFLOPS of different cpu's On Sun, 18 Oct 1998, Bill Broadley wrote: > Basically the tera does a thread switch every cycle, so the processor > state is replicated for the number of supported threads (I don't > know how many that is offhand). (I.e. registers, and other state are > replicated.) This is off-topic, but this sounds like a perfectly horrible architecture. Instead of having several simple CPUs on-die which communicate by message-passing, they bloat the die and thus kill the yield by having lots of funky machinery on it. Also, wonder how this design scales into high GHz range. Ok, I shut up now. Regards, Eugene P.S. Aargh! Still no data about new fast $50/port switches. Are they mythical, or what? From rauch@inf.ethz.ch Mon, 19 Oct 1998 07:46:06 -0400 Date: Mon, 19 Oct 1998 07:46:06 -0400 From: Felix Rauch rauch@inf.ethz.ch Subject: High availability? On 13 Oct 1998, Harvey J. Stein wrote: > Checking dejanews (searching for linux nfs performance) yielded a > comment to use the kernel daemon in 2.1.125 + posted patches to > improve NFS performance. Also it seems to be important to use > the knfsd-980922 kernel patch (against 2.1.111 or newer). I tried that (2.1.111 + knfsd-981010) and didn't get much better results. I read a 32 MB file with NFS over 100bT 10 times (the file was in the cache on the server but not on the client) with "-o rsize=4096,wsize=4096". With this configuration I got 7.6 - 7.9 MB/s (M=1000). Thanks for your information anyway. 
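To put those NFS read figures in context, here is a rough back-of-the-envelope sketch (not a measurement) of what fraction of the raw 100 Mbit/s link rate they represent. The function name and the constant are illustrative only, and the raw rate ignores Ethernet, IP and RPC overhead, so the achievable ceiling is somewhat below 12.5 MB/s.

```python
# Rough check: what fraction of Fast Ethernet's raw bit rate do the
# measured NFS read speeds represent? Uses M = 1000, as in the post.
# The names below are hypothetical, chosen only for this illustration.

LINK_MBIT_PER_S = 100.0  # nominal 100 Mbit/s signalling rate


def wire_fraction(measured_mb_per_s, link_mbit_per_s=LINK_MBIT_PER_S):
    """Return measured throughput as a fraction of the raw link rate."""
    link_mb_per_s = link_mbit_per_s / 8.0  # 100 Mbit/s -> 12.5 MB/s
    return measured_mb_per_s / link_mb_per_s


if __name__ == "__main__":
    for rate in (7.6, 7.9):
        print(f"{rate} MB/s is {wire_fraction(rate):.1%} of the raw link rate")
```

On these assumptions the reported 7.6 - 7.9 MB/s is a bit over 60% of the raw link rate, so there is still headroom on the wire itself.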
Regards, Felix -- Felix Rauch | Email: rauch@inf.ethz.ch Institute for Computer Systems | Homepage: http://www.cs.inf.ethz.ch/~rauch/ ETH Zentrum / RZ H15 | Phone: ++41 1 632 7489 CH - 8092 Zuerich / Switzerland | Fax: ++41 1 632 1307 From cbohn@afit.af.mil Mon, 19 Oct 1998 07:45:46 -0400 Date: Mon, 19 Oct 1998 07:45:46 -0400 From: Capt Bohn, Christopher A. cbohn@afit.af.mil Subject: MFLOPS of different cpu's On Sun 10/18/98 11:38 PM, Bill Broadley [bill@math.ucdavis.edu] wrote: > Basically the tera does a thread switch every cycle, so the processor > state is replicated for the number of supported threads (I don't > know how many that is offhand). (I.e. registers, and other state are > replicated.) > So to the thread the world looks like infinitely fast memory, i.e. 1 cycle > to load. Hitting peak performance requires only that you have enough > threads > to run. > It greatly simplies the programming, compilers, OS etc. You can ignore > multiple > levels of the memory hierarchy, blocking, prefetching, branch prediction, > cache lines, unrolling loops, inlining procedures, alignment, pages, > register renaming etc. On Mon 10/19/98 6:27 AM, Eugene Leitl [Eugene.Leitl@lrz.uni-muenchen.de] replied: > This is off-topic, but this sounds like a perfectly horrible architecture. > Instead of having several simple CPUs on-die which communicate by > message-passing, they bloat the die and thus kill the yield by having lots > of funky machinery on it. Also, wonder how this design scales into high > GHz range. On the other hand, this *is* intended to be the processor for a supercomputer, and in traditional supercomputers, cost is not high on the list of concerns. In terms of performance, I'm sure we'd all agree that the interprocess communication performance would be much higher if the message never had to leave the die -- the latency would be almost negligible. 
I do, however, have to question whether the illusion of 1-cycle memory access would allow the neglect of branch prediction. I'll agree that the other benefits listed are indeed possible if the illusion of 1-cycle memory access is maintained, but poor branch prediction cannot be undone by rapid memory access. There is another aspect that caught me off guard (don't judge my ignorance until after this paragraph). As I read the description, the apparent gain in memory performance and reduction in context switch costs is at the expense of computational performance. If only the registers & other state-holders are replicated, then the reason 1-cycle apparent memory access is available is because the apparent clock cycle is p times longer than the chip's clock cycle, where p is the number of process threads (up to the maximum the chip can hold) -- if the execution units are not also duplicated, then a process that does not require a memory access must still wait anyway, as though it did require a memory access. (This would eliminate the need for registers & cache! This is why register renaming, memory hierarchy, prefetching, etc, can be neglected.) If the execution units are indeed duplicated to avoid stalling the processes unnecessarily, then we are not providing the illusion of 1-cycle memory access. On the other hand, each process *does* need to access memory each cycle ... to fetch an instruction. I still find it hard to swallow that the performance can be improved by slowing down execution, but if they're willing to spend the development money, I won't judge them until after we see actual performance figures. Take care, cb *-*-*-*-*-*-*-* Capt Christopher A. 
Bohn Graduate Student, Electrical (digital) Engineering Air Force Institute of Technology     Phone (937)255-3636 (DSN 785) AFIT/EN638                              Lab x4606   Voicemail x6638 2950 P St, Box 4638                         email cbohn@afit.af.mil Wright-Patterson AFB OH 45433-7765                 EngrBohn@aol.com                http://members.aol.com/EngrBohn/ *-*-*-*-*-*-*-* From rgb@phy.duke.edu Mon, 19 Oct 1998 08:17:22 -0400 Date: Mon, 19 Oct 1998 08:17:22 -0400 From: Robert G. Brown rgb@phy.duke.edu Subject: MFLOPS of different cpu's On Sun, 18 Oct 1998, Bill Broadley wrote: > Basically the tera does a thread switch every cycle, so the processor > state is replicated for the number of supported threads (I don't > know how many that is offhand). (I.e. registers, and other state are > replicated.) > > So to the thread the world looks like infinitely fast memory, i.e. 1 cycle > to load. Hitting peak performance requires only that you have enough threads > to run. > > It greatly simplies the programming, compilers, OS etc. You can ignore multiple > levels of the memory hierarchy, blocking, prefetching, branch prediction, > cache lines, unrolling loops, inlining procedures, alignment, pages, > register renaming etc. > > So given enough threads whatever the thread does will be ready by > the time the thread makes it through the queue again. I have to confess that I'm still having difficulty envisioning how such a system can be fed a nonlocal code/data mix if it is as fast as claimed -- what have they done with the CPU -- memory bottleneck? The world may "look" like infinitely fast memory to a thread, but somewhere out there is silicon with a clock and a memory bus and it isn't infinitely fast... Is this something like a MTSD ("multiple thread, single data", to abuse some probably archaic acronyms) design? All the threads get the SAME data (so data access is slow enough that conventional memory buses can keep up) but might do different things with it? 
If so, doesn't that really slow down the system for anything but very special task mixes? I thought that more than half the effort of engineering a real parallel supercomputer was coming up with a fast enough memory subsystem to keep up... > On a similiar vein Digital and IBM are exploring SMT which is similiar > except multiple threads are running on a single cpu sharing all resources > on the chip. I.e. Any instruction from any thread can go to any functional > unit. Digitals paper in particular mentions up to 5.4 intructions per cycle > on a mix of TeX and spec benchmarks. OK, so instead of having e.g. floating point pipelines, they put a whole SMP architecture on a single chip with a single cache and single set of registers, expanded to hold multiple contexts/threads, and exploit the parallelism available in a heterogeneous task mix to distribute independent threads? If so, a nice idea for a server CPU (and curiously like SPARC with its multiple parallel ALU's and on-chip context switches, but more symmetric) but it isn't clear to me how this will handle data dependencies between threads and the inevitable CPU-memory bottleneck much better than multiple processors in an ordinary SMP system. Easier to program, sure, and just maybe more scalable although at some point you hit a limit of how much you can put on any single wafer. SMP provides a clear scaling route to more processors (effectively more silicon area), with various scaling limits on the interconnect. SMT sounds like it provides more on-chip parallelism (at some cost -- the real estate has to come from somewhere) and has its own scaling limits when one builds an MP SMT. All very interesting. Thanks (to you and the others who have responded) for the information. rgb Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 
27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From kragen@pobox.com Mon, 19 Oct 1998 09:11:33 -0400 Date: Mon, 19 Oct 1998 09:11:33 -0400 From: Kragen kragen@pobox.com Subject: MFLOPS of different CPUs On Sun, 18 Oct 1998, Robert G. Brown wrote: > On Fri, 16 Oct 1998, Kragen wrote: > > I've been reading about the Tera MTA. It appears to be about as fast > > as a 14-processor Cray T90 (the two-processor 255-MHz MTA, that is, the > > only one so far built) although the T90 runs at 440 MHz. > > > > Tera's press releases claim that their machines should be able to > > perform about an order of magnitude better than traditional machines > > that have the same peak MFLOPS rating. > > > > Very interesting architecture. > > How does it work? Massive parallelism on chip? A ten or twenty step > pipeline? 11-step pipeline, no memory cache, 100-nanosecond-or-so memory access, hardware context switches after every instruction, hardware support for 128 threads. So each thread only runs one instruction every 20-128 cycles, so memory has quite a few cycles to respond in. So on parallel code, it can run very close to maximum theoretical efficiency. > How does it run on von Neumann code where parallelism isn't > much of an advantage? Very slowly. > How are they going to keep such a beast fed? With six kilowatts. www.tera.com Kragen -- Kragen Sitaker A well designed system must take people into account. . . . It's hard to build a system that provides strong authentication on top of systems that can be penetrated by knowing someone's mother's maiden name. -- Schneier From walt@parl.ces.clemson.edu Mon, 19 Oct 1998 09:19:39 -0400 Date: Mon, 19 Oct 1998 09:19:39 -0400 From: Walter B. Ligon III walt@parl.ces.clemson.edu Subject: PVM build under glibc problems -------- > "Walter B. Ligon III" writes: > > > BTW, why are you rebuilding PVM? There are rpms of PVM already > > built and running for LINUX. 
I only rebuilt PVM to put in my "no > > head node" feature, that's why I happen to be aware of the > > problems. > > What's this "no head node" feature all about? > > -- > Harvey J. Stein > BFM Financial Research > hjstein@bfr.co.il OK, our beowulf machines all have one (or more) nodes set up as "head" nodes or "manager" nodes or whatever. They are connected to the internet for external access of the machine, and also act as servers for the rest of the nodes. We like to perform all of our computation on the other nodes - for a number of reasons. We also like to promote the illusion that the other nodes are not actually separate computers, thus on our systems we rarely if ever actually log into anything but the "head" nodes. PVM insists that the node you start PVM from be included in the virtual machine. We generally like to start the machine from the "head" node, BUT we do NOT want PVM to schedule tasks on that node. I made a 3 line modification to PVM that prevents automatic scheduling to the head node. Tasks can still be scheduled on the head node by specifying its name in the "where" variable of pvm_spawn(). That's what it's all about. -- Dr. Walter B. Ligon III Associate Professor ECE Department Clemson University From walt@parl.ces.clemson.edu Mon, 19 Oct 1998 10:12:51 -0400 Date: Mon, 19 Oct 1998 10:12:51 -0400 From: Walter B. Ligon III walt@parl.ces.clemson.edu Subject: PVM build under glibc problems -------- > I was hoping you'd tell me you modified PVM for robustness in the face > of a master PVM daemon outage. Any idea if anyone's done that? > > -- > Harvey J. Stein > BFM Financial Research > hjstein@bfr.co.il No, I've not heard of that. We rarely have that problem. Anyway, we ARE working on a new message passing system called BNM which eliminates the master pvmd - and the slave pvmd's for that matter. Essentially it integrates the parallel virtual machine with the operating system, so that as a whole it appears more seamless.
BNM is a low-level kind of thing; PVM, MPI, or other things could be implemented on top of it. We HAVE tried to make decisions that would make a robust system. The one daemon we do have is stateless, and can be restarted if it crashes. But, we really haven't been focusing on robustness. Anyway, that's the only thing I know about, maybe someone else out there knows something. Walt -- Dr. Walter B. Ligon III Associate Professor ECE Department Clemson University From cklempay@acm.jhu.edu Mon, 19 Oct 1998 10:16:48 -0400 Date: Mon, 19 Oct 1998 10:16:48 -0400 From: Corbett J. Klempay cklempay@acm.jhu.edu Subject: New JHU ACM cluster Hello all... (my first post..whee!) Due to the coolness of one of our professors, my ACM chapter was given $11k to spend to build a cluster. The hardware is roughly:
- 8 PII-350's
- 128 MB SDRAM in the 7 compute nodes, 384 in the master
- 6.4 GB Quantum Fireballs in the compute nodes
- 2 x 9.1 GB IBM 10k RPM USCSI drives (NCR 875-based Promise card) for the master
- APC UPS for the master
- Tulip-based Netgear Fast cards (2 in the master)
- 16 port Netgear Fast switch
- lame Trident 9750 4 MB video in the compute nodes (not that they'll be really used)
- 17" Hitachi monitor for the master, along with an 8 MB Matrox G200 AGP
- Plextor 4x write/12x read SCSI CD-R in the master
Our main ACM admin (to keep things straight in his head) made a schematic of what he thinks the layout is supposed to be...see http://www2.acm.jhu.edu/~slipcon/cluster.jpg This equipment should be arriving within the next 2-3 weeks, we think. (each node is going to already be assembled) The thing is, I'll probably be largely responsible for any technical issues (both hardware and software) relating to this cluster. Any advice on anything related to this setup and how it could best be used would be appreciated. My first concern: once this stuff arrives and we physically link everything together, what is the best/proper way to go about doing the installs on these?
Also: I have the retail Extreme Linux (which I think was ultimately courtesy of Erik Hendriks)...is this the most current, or should I be doing an FTP install? Also #2: Does anyone know of any other student ACM chapter with a cluster? I'm wondering if we have bragging rights :) Thanks in advance :) ------------------------------------------------------------------------------ Corbett J. Klempay Quote of the Week: http://www2.acm.jhu.edu/~cklempay "Keep your faith in all beautiful things; in the sun when it is hidden, in the Spring when it is gone." PGP Fingerprint: 7DA2 DB6E 7F5E 8973 A8E7 347B 2429 7728 76C2 BEA1 ------------------------------------------------------------------------------ From redhat@admin.big-orange.net Mon, 19 Oct 1998 10:38:17 -0400 Date: Mon, 19 Oct 1998 10:38:17 -0400 From: Redhat redhat@admin.big-orange.net Subject: Beowulf as general compute server? On Mon, 19 Oct 1998 09:00:05 +0000, you wrote: >Having learned of the Beowulf project I am was wondering if there may not >be another solution: To install a smallish (6-10 node) Beowulf cluster and >use this to serve the Macs which could run PowerPC linux or the exodus >Xserver software. A friend of mine installed Linux on a Power Mac and has never regretted it. blaze your trail -- Redhat redhat@admin.big-orange.net "I am become Shiva, destroyer of worlds." From hahn@coffee.psychology.mcmaster.ca Mon, 19 Oct 1998 11:10:08 -0400 Date: Mon, 19 Oct 1998 11:10:08 -0400 From: Mark Hahn hahn@coffee.psychology.mcmaster.ca Subject: small switches, also Tera > > Basically the tera does a thread switch every cycle, so the processor ... > This is off-topic, but this sounds like a perfectly horrible architecture. which shows that you have never done a seriously big parallel program whose main bottleneck was memory latency. this is the most common problem facing "supercomputing" apps, that memory is several hundred issue slots away (and cache is ineffective on large data.) > P.S. Aargh! 
Still no data about new fast $50/port switches. Are they > mythical, or what? here's what I have so far, pulled from NECX...
price  ports  vendor, model, comments
450    8      allied telesyn at-fs708 8p sw
204    4      asante fs4004e "frag-free or s/f" 1Gb, "wire speed", some mention of 14880 pps forwarding
410    8      asante fs4008e
1100   16     asante fs4016e
273    4      kingston kns400/r
490    8      kingston kns800/r
271    4      linksys exzs44
395    6      linksys exzs66
450    8      linksys
555    8      dlink des-1008 store-forward, 148,800 pps/port(hd)
294    4      netgear fs104 cutthrough vs s/f. 148K pps forward, 11 uS cutthrough; 20 uS 64byte packets
585    8      netgear fs508
945    16     netgear fs516 1.2 gb fabric, 148K pps, 80 uS max
505    8      smc ezswitch
I'd very much appreciate anyone who has practical comments on these switches, and/or pointers to real technical specs (rather than the marketing crap most readily accessible...) From kodym@mit.jyu.fi Mon, 19 Oct 1998 11:10:38 -0400 Date: Mon, 19 Oct 1998 11:10:38 -0400 From: Petr Ladislav Kodym kodym@mit.jyu.fi Subject: File size limit for Linux Hi, Could anybody tell me what's the maximum file size limit for Linux on Intel and Alpha platforms ? Many thanks in advance. Petr From scastill@nmsu.edu Mon, 19 Oct 1998 11:13:22 -0400 Date: Mon, 19 Oct 1998 11:13:22 -0400 From: Steven Castillo scastill@nmsu.edu Subject: MFLOPS of different CPUs On Fri, 16 Oct 1998, Josip Loncaric wrote: > Prachya Chalermwat wrote: > > > > Anyone knows how I can get the peak MFLOPS of different type of CPUs? > > Peak (as in "guaranteed not to exceed") for Pentium II is MHz/2 (i.e. > 400MHz Pentium II will not exceed 200 MFLOP/s). > > Linpack (N=100) performance of Pentium Pro 200MHz is 62 MFLOP/s (see > http://performance.netlib.org/performance/html/PDSsearch.html). > > Matrix multiply result for a cluster of Pentium II 350MHz CPUs of > 135MFLOP/s/CPU was reported by Giovanni Scalmani > on this mailing list.
> > Memory bandwidth can become a bottleneck, and then the STREAMS results > apply (see http://www.cs.virginia.edu/stream/standard/MFLOPS.html for a > complete list): >
> > STREAM Memory MFLOPS --- John D. McCalpin, mccalpin@cs.virginia.edu
> > Revised to Wed Jun 10 11:19:07 PDT 1998
> >
> > All results are in MFLOPS --- 1 MFLOPS=10^6 FP OPS/sec
> >
> > ----------------------------------------------------------
> > Machine ID               ncpus   SCALE    ADD    TRIAD
> > ----------------------------------------------------------
> >
> > Nightshade_NC440BX_350     1      15.9    12.5    21.4
> > Generic_440BX_400          1      19.2    15.1    26.3
> > Generic_440BX_350          1      17.5    13.7    22.6
> > There are other sources of CPU information (e.g. > http://infopad.eecs.berkeley.edu/CIC/) and performance measurement > methods (e.g. > http://www.scl.ameslab.gov/Projects/HINT/HINThomepage.html) but if all > you want is the "guaranteed not to exceed" peak performance number, > MHz/2 is about right for Pentium IIs. > > Sincerely, > Josip > Hello Prachya: We were able to achieve 87 mflops on a 333 MHz PII on LU factorization using a home-brew cyclic, block-column algorithm and Greg Henry's excellent dgemm (http://www.cs.utk.edu/~ghenry/distrib/) on a 1000x1000 double precision matrix. On a parallel version of this same code, we achieved 1.768 gflops on 32 333 MHz PII's (NMSU Beowulf, http://www.cs.nmsu.edu/pcl/) on a double precision 20,000 x 20,000 matrix. Local column pivoting was used for numerical stability. We did find that the single node performance was strongly affected by the block size of the dgemm. In fact, a block size that was not a power of 2 reduced the single node performance by about 30%. All the results above used a block size of 32.
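The LU figures above can be related to each other and to the quoted MHz/2 ceiling with a little arithmetic. This is only a sketch: the function names are made up for illustration, and the 2n^3/3 operation count is the usual textbook estimate for LU factorization, assumed here rather than taken from the post.

```python
# Relate the reported LU numbers to the "MHz/2" Pentium II ceiling quoted
# above. All function names are hypothetical, for illustration only.


def peak_mflops_pentium2(clock_mhz):
    """'Guaranteed not to exceed' peak, using the MHz/2 rule from the thread."""
    return clock_mhz / 2.0


def per_node_mflops(total_mflops, nodes):
    """Average rate per node in a parallel run."""
    return total_mflops / nodes


def parallel_efficiency(total_mflops, nodes, single_node_mflops):
    """Per-node rate in the parallel run relative to the single-node rate."""
    return per_node_mflops(total_mflops, nodes) / single_node_mflops


def lu_flops(n):
    """Textbook leading-order flop count for LU of an n x n matrix (assumed)."""
    return 2.0 * n ** 3 / 3.0


if __name__ == "__main__":
    single, total, nodes, mhz = 87.0, 1768.0, 32, 333
    print("fraction of MHz/2 peak:", single / peak_mflops_pentium2(mhz))
    print("per-node MFLOPS in parallel run:", per_node_mflops(total, nodes))
    print("parallel efficiency:", parallel_efficiency(total, nodes, single))
    print("estimated seconds for n=20000:", lu_flops(20000) / (total * 1e6))
```

With the reported values this gives about 52% of the MHz/2 ceiling on one node, 55.25 MFLOPS per node in the 32-node run (roughly 64% of the single-node rate), and, under the assumed flop count, a run time on the order of 50 minutes for the 20,000 x 20,000 case.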
For more information, please see http://emlab2.nmsu.edu/papers/par_LUsolve/ Even better performance could probably be had by using one of the public domain cyclic block torus solvers (scalapack or plapack), but we have not been able to successfully compile and run either of these on our system. Steve Castillo Electromagnetics Laboratory New Mexico State University scastill@nmsu.edu From kragen@pobox.com Mon, 19 Oct 1998 11:16:47 -0400 Date: Mon, 19 Oct 1998 11:16:47 -0400 From: Kragen kragen@pobox.com Subject: MFLOPS of different cpu's On Mon, 19 Oct 1998, Robert G. Brown wrote: > I have to confess that I'm still having difficulty envisioning how such > a system can be fed a nonlocal code/data mix if it is as fast as claimed > -- what have they done with the CPU -- memory bottleneck? The world may > "look" like infinitely fast memory to a thread, but somewhere out there > is silicon with a clock and a memory bus and it isn't infinitely fast... It doesn't really look like infinitely-fast memory --- more like memory that usually takes a few cycles to respond. The CPU is VLIW, and you can issue a memory operation each cycle, and specify how many instructions in the future the result will be used. It needs to be at least 5 (IIRC) to make sure your thread doesn't block. The memory units and the CPUs are connected to a packet-switched network that can serve up a memory word per cycle. So multiple fetches can be going on at once, improving the effective memory bandwidth. That's why the memory latency is so long, by the way --- 100ns is a fairly large latency these days, after all. :) > Is this something like a MTSD ("multiple thread, single data", to abuse > some probably archaic acronyms) design? All the threads get the SAME > data Nope. > SMP provides a clear scaling route to more processors > (effectively more silicon area), with various scaling limits on the > interconnect. The Tera architecture is SMP, btw. 
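As a rough illustration of why a long memory latency need not hurt on this kind of machine, the sketch below converts the roughly 100 ns latency into clock cycles at the 255 MHz mentioned earlier in the thread, and counts how many ready threads would be needed to issue an instruction every cycle. The model is deliberately crude and is an assumption of this sketch, not Tera's design: every instruction is treated as a memory reference that the issuing thread must wait on, and the lookahead field is ignored. The function names are hypothetical.

```python
import math

# Crude latency-hiding model (an assumption of this sketch): each thread
# issues one instruction and then waits a full memory latency before it
# can issue again, so the processor stays busy only if enough other
# threads are ready in the meantime.


def latency_cycles(latency_ns, clock_mhz):
    """Memory latency expressed in processor clock cycles."""
    return latency_ns * clock_mhz / 1000.0


def threads_to_hide_latency(latency_ns, clock_mhz):
    """Smallest whole number of threads that covers the latency."""
    return math.ceil(latency_cycles(latency_ns, clock_mhz))


if __name__ == "__main__":
    print(latency_cycles(100, 255))           # latency in cycles
    print(threads_to_hide_latency(100, 255))  # threads needed in this model
```

Under these assumptions 100 ns is 25.5 cycles, so about 26 runnable threads would cover it, comfortably within the 128 hardware threads mentioned earlier. Lookahead would lower the number further; code with only one thread gets none of the benefit.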
The smallest Tera they're talking about eventually building will be eight processors and will cost about $5e6, according to the NYTimes article. Kragen -- Kragen Sitaker A well designed system must take people into account. . . . It's hard to build a system that provides strong authentication on top of systems that can be penetrated by knowing someone's mother's maiden name. -- Schneier From cbohn@afit.af.mil Mon, 19 Oct 1998 11:23:54 -0400 Date: Mon, 19 Oct 1998 11:23:54 -0400 From: Bohn, Christopher A. cbohn@afit.af.mil Subject: g77 problems with Extreme Linux AND RE: which MPI? Good day! I've got the sample Fortran code from the Extreme Linux CD MPICH installation running, by commenting out the calls to MPI_Init & MPI_Finalize in the Fortran code and writing a wrapper in C that calls those functions. I've attached the appropriate files for the curious.

[cbohn@abc04 wrapper]$ make
g77 -fno-second-underscore -I/usr/mpich/include -c pi3f.f
egcs -DHAVE_CONFIG_H -I../../mpid/ch2 -I../../lib/LINUX/ch_p4 -I/usr/mpich/include -DMPI_LINUX -c pi3c.c
g77 -o pi3 pi3f.o pi3c.o -L/usr/mpich/lib/LINUX/ch_p4 -lmpi
[cbohn@abc04 wrapper]$ mpirun -np 1 pi3
 Process 0 of 1 is alive
Enter the number of intervals: (0 quits)
5
  pi is approximately: 3.1449258640033282  Error is: 0.0033332104135351
Enter the number of intervals: (0 quits)
0
[cbohn@abc04 wrapper]$

Take care, cb *-*-*-*-*-*-*-* Capt Christopher A. Bohn Graduate Student, Electrical (digital) Engineering Air Force Institute of Technology Phone (937)255-3636 (DSN 785) AFIT/EN638 Lab x4606 Voicemail x6638 2950 P St, Box 4638 email cbohn@afit.af.mil Wright-Patterson AFB OH 45433-7765 EngrBohn@aol.com http://members.aol.com/EngrBohn/ *-*-*-*-*-*-*-*

Attachment: Makefile

# Generated automatically from Makefile.in by configure.
##### User configurable options #####
# This is an example Makefile.in (or Makefile configured with mpireconfig)
# for the programs cpi, pi3, and cpilog.

ARCH        = LINUX
COMM        = ch_p4
INSTALL_DIR = /usr/mpich
CC          = egcs
F77         = g77
CLINKER     = egcs
FLINKER     = g77
OPTFLAGS    =
#
LIB_PATH    = -L/usr/mpich/lib/LINUX/ch_p4
FLIB_PATH   = -L/usr/mpich/lib/LINUX/ch_p4
LIB_LIST    = -lmpi
#
INCLUDE_DIR = -I$(INSTALL_DIR)/include

### End User configurable options ###

CFLAGS = -DHAVE_CONFIG_H -I../../mpid/ch2 -I../../lib/LINUX/ch_p4 $(OPTFLAGS) $(INCLUDE_DIR) -DMPI_$(ARCH)
FFLAGS = -fno-second-underscore $(INCLUDE_DIR) $(OPTFLAGS)
#FFLAGS = -fno-second-underscore -DHAVE_CONFIG_H -I../../mpid/ch2 -I../../lib/LINUX/ch_p4 $(OPTFLAGS) $(INCLUDE_DIR) -DMPI_$(ARCH)
LIBS = $(LIB_PATH) $(LIB_LIST)
FLIBS = $(FLIB_PATH) $(LIB_LIST)
#FLIBS = $(FLIB_PATH) -lfmpi $(LIB_LIST)
EXECS = pi3

default: pi3

all: $(EXECS)

pi3: pi3f.o pi3c.o $(INSTALL_DIR)/include/mpi.h $(INSTALL_DIR)/include/mpif.h
	$(FLINKER) $(OPTFLAGS) -o pi3 pi3f.o pi3c.o $(FLIBS)

clean:
	/bin/rm -f *.o *~ PI* $(EXECS)

.c.o:
	$(CC) $(CFLAGS) -c $*.c
.f.o:
	$(F77) $(FFLAGS) -c $*.f

Attachment: pi3c.c

#include "mpi.h"

/* g77 appends an underscore to external names, so the Fortran
   subroutine pi3 is visible from C as pi3_. */
extern void pi3_(void);

int main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);   /* initialize MPI from C instead of Fortran */
    pi3_();                   /* run the Fortran computation */
    MPI_Finalize();
    return 0;
}

Attachment: pi3f.f

c**********************************************************************
c   pi.f - compute pi by integrating f(x) = 4/(1 + x**2)
c
c   Each node:
c    1) receives the number of rectangles used in the approximation.
c    2) calculates the areas of its rectangles.
c    3) Synchronizes for a global summation.
c   Node 0 prints the result.
c
c  Variables:
c
c    pi          the calculated result
c    n           number of points of integration.
c    x           midpoint of each rectangle's interval
c    f           function to integrate
c    sum,pi      area of rectangles
c    tmp         temporary scratch space for global summation
c    i           do loop index
c**********************************************************************
      subroutine pi3

      include 'mpif.h'

      double precision  PI25DT
      parameter        (PI25DT = 3.141592653589793238462643d0)

      double precision  mypi, pi, h, sum, x, f, a
      integer n, myid, numprocs, i, rc
c                                 function to integrate
      f(a) = 4.d0 / (1.d0 + a*a)

c      call MPI_INIT( ierr )
      call MPI_COMM_RANK( MPI_COMM_WORLD, myid, ierr )
      call MPI_COMM_SIZE( MPI_COMM_WORLD, numprocs, ierr )
      print *, 'Process ', myid, ' of ', numprocs, ' is alive'

      sizetype   = 1
      sumtype    = 2

 10   if ( myid .eq. 0 ) then
         write(6,98)
 98      format('Enter the number of intervals: (0 quits)')
         read(5,99) n
 99      format(i10)
      endif

      call MPI_BCAST(n,1,MPI_INTEGER,0,MPI_COMM_WORLD,ierr)

c                                 check for quit signal
      if ( n .le. 0 ) goto 30

c                                 calculate the interval size
      h = 1.0d0/n

      sum  = 0.0d0
      do 20 i = myid+1, n, numprocs
         x = h * (dble(i) - 0.5d0)
         sum = sum + f(x)
 20   continue
      mypi = h * sum

c                                 collect all the partial sums
      call MPI_REDUCE(mypi,pi,1,MPI_DOUBLE_PRECISION,MPI_SUM,0,
     $     MPI_COMM_WORLD,ierr)

c                                 node 0 prints the answer.
      if (myid .eq. 0) then
         write(6, 97) pi, abs(pi - PI25DT)
 97      format('  pi is approximately: ', F18.16,
     +          '  Error is: ', F18.16)
      endif

      goto 10

c 30   call MPI_FINALIZE(rc)
c     return (rather than stop) so that control goes back to the C
c     wrapper, which then calls MPI_Finalize
 30   return
      end

From kragen@pobox.com Mon, 19 Oct 1998 11:30:43 -0400 Date: Mon, 19 Oct 1998 11:30:43 -0400 From: Kragen kragen@pobox.com Subject: MFLOPS of different cpu's On Mon, 19 Oct 1998, Capt Bohn, Christopher A. wrote: > I do, however, have to question whether the illusion of 1-cycle memory > access would allow the neglect of branch prediction. I'll agree that the > other benefits listed are indeed possible if the illusion of 1-cycle memory > access is maintained, but poor branch prediction cannot be undone by rapid > memory access. Branch prediction is useful because it avoids pipeline stalls. On the Tera MTA, at the time you execute the branch, the next instruction in your thread hasn't yet entered the pipeline, so branch prediction is unnecessary. > If only the registers & other > state-holders are replicated, then the reason 1-cycle apparent memory access > is available is because the apparent clock cycle is p times longer than the > chip's clock cycle, where p is the number of process threads (up to the > maximum the chip can hold) -- if the execution units are not also > duplicated, then a process that does not require a memory access must still > wait anyway, as though it did require a memory access. (This would > eliminate the need for registers & cache! This is why register renaming, > memory hierarchy, prefetching, etc, can be neglected.) If the execution > units are indeed duplicated to avoid stalling the processes unnecessarily, > then we are not providing the illusion of 1-cycle memory access. On the > other hand, each process *does* need to access memory each cycle ... to > fetch an instruction.
I still find it hard to swallow that the performance > can be improved by slowing down execution, but if they're willing to spend > the development money, I won't judge them until after we see actual > performance figures. Performance on single-threaded code is, as you say, extremely poor. But the CPU can still execute one instruction per cycle if it has enough threads to run, and without having to worry about memory latency. If you have 100 threads, each with 10 ms of work to do, then on a Pentium running Linux, each one will probably run for a sequential 10 ms. This is a total of 1 second. During that 10 ms, the CPU spends a lot of time waiting for cache misses. Perhaps the total time it would take each thread to do its work, if it didn't have to wait for the memory, would be only 3 ms. If you ran the same program on the MTA, then the CPU would execute the first instruction of all 100 threads, then the second instruction of all 100 threads, etc. The total time taken up by one thread will be only 3 ms; the thread does see a very slow cycle time, but all 100 threads get done in 300 ms. (This is neglecting things like running on a VLIW CPU where you can do memory access in parallel with other instructions, 64-bit memory access, etc.) Kragen -- Kragen Sitaker A well designed system must take people into account. . . . It's hard to build a system that provides strong authentication on top of systems that can be penetrated by knowing someone's mother's maiden name. -- Schneier From walt@parl.ces.clemson.edu Mon, 19 Oct 1998 11:45:22 -0400 Date: Mon, 19 Oct 1998 11:45:22 -0400 From: Walter B. Ligon III walt@parl.ces.clemson.edu Subject: MFLOPS of different cpu's -------- > On Sun, 18 Oct 1998, Bill Broadley wrote: > > > Basically the tera does a thread switch every cycle, so the processor > > state is replicated for the number of supported threads (I don't > > know how many that is offhand). (I.e. registers, and other state are > > replicated.) 
> > This is off-topic, but this sounds like a perfectly horrible architecture. > Instead of having several simple CPUs on-die which communicate by > message-passing, they bloat the die and thus kill the yield by having lots > of funky machinery on it. Also, wonder how this design scales into high > GHz range. I would have to disagree with you. Granted, the Tera architecture is in a COMPLETELY different class than an MP machine, but it isn't inherently better or worse. The idea behind it is latency hiding. Communication is via shared memory. All modern microprocessors get much of their performance from instruction level parallelism. The problem with ILP is that there are dependencies between instructions for a given program, so often the processor cannot execute them in parallel. The Tera architecture solves that problem by having multiple independent threads executing all at once. The instructions of each thread are by definition independent of the instructions from the other threads. Thus every cycle there is an instruction ready to execute. As for the die complexity, most of the changes are in register storage - which is pretty simple to extend, and can be made rather robust. In fact there is a lot of hardware on current microprocessors that deals with hardware interlock of instruction dependencies that isn't needed in Tera - so one can argue that the hardware is SIMPLER, and probably would reach BETTER yields in mass production. Of course, the weak point is that Tera MUST have more tasks than processors - empirical studies showed that 4 was about right (and that was the number supported in the prototypes). Finally technology scaling would also be on the side of this design - once again the idea is to remove a bunch of hardware that makes ILP work and replace it with multiple independent tasks that don't need that hardware. This will make critical paths in the chip shorter. Of course, no one will ever build a Tera, because it won't run MS Windows.
It is a cool research machine, and some of the ideas HAVE found their way into some newer RISC systems. But it is a very interesting architecture, and one should not dismiss it so lightly. Walt > > Ok, I shut up now. > > Regards, > Eugene > > P.S. Aargh! Still no data about new fast $50/port switches. Are they > mythical, or what? > -- Dr. Walter B. Ligon III Associate Professor ECE Department Clemson University From bill@math.ucdavis.edu Mon, 19 Oct 1998 11:46:46 -0400 Date: Mon, 19 Oct 1998 11:46:46 -0400 From: Bill Broadley bill@math.ucdavis.edu Subject: MFLOPS of different cpu's I didn't intend for this discussion to start up here; I'd suggest moving it to comp.arch. I'll make one final reply... > I have to confess that I'm still having difficulty envisioning how such > a system can be fed a nonlocal code/data mix if it is as fast as claimed > -- what have they done with the CPU -- memory bottleneck? The world may > "look" like infinitely fast memory to a thread, but somewhere out there > is silicon with a clock and a memory bus and it isn't infinitely fast... Well, bandwidth is easy (it costs money, of course), as opposed to latency, which is very hard. I believe Tera was designed with enough bandwidth to support peak execution, and with latency tolerance through threads to sustain it. Keep in mind Tera's not designed for very fast performance on a single thread, but for maximum throughput on a bunch of threads. Just the kind of thing you would expect in a large computing facility doing batch jobs. > Is this something like a MTSD ("multiple thread, single data", to abuse > some probably archaic acronyms) design? All the threads get the SAME > data (so data access is slow enough that conventional memory buses can Not sure what you would do with the same data, but no, each load gets the value stored at the address it asked for. If there are enough threads to keep the cpu busy in the meantime, then from the thread's point of view it gets the memory in one cycle.
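The latency-hiding argument above can be sketched as a toy simulation. This is only an illustration under stated assumptions, not a model of the real Tera hardware: the function name issue_utilization is made up, every instruction is assumed to keep its thread waiting for latency_cycles cycles, and the 20-cycle latency is an arbitrary example value.

```python
def issue_utilization(num_threads, latency_cycles, total_cycles=10_000):
    """Toy barrel processor: fraction of cycles in which an instruction issues.

    Assumption for illustration: after a thread issues an instruction it
    cannot issue again for `latency_cycles` cycles (as if every instruction
    waited on memory). At most one instruction issues per cycle.
    """
    ready_at = [0] * num_threads      # first cycle each thread may issue again
    next_thread = 0                   # where the round-robin scan starts
    issued = 0
    for cycle in range(total_cycles):
        for offset in range(num_threads):
            t = (next_thread + offset) % num_threads
            if ready_at[t] <= cycle:
                ready_at[t] = cycle + latency_cycles
                next_thread = (t + 1) % num_threads
                issued += 1
                break                 # only one issue slot per cycle
    return issued / total_cycles


if __name__ == "__main__":
    for n in (1, 10, 20, 40):
        print(n, "threads:", issue_utilization(n, latency_cycles=20))
```

In this toy model a single thread keeps the processor busy 5% of the time, ten threads 50%, and twenty or more threads 100%: once the thread count reaches the latency in cycles, each thread sees its result "one cycle" (one of its own turns) later.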
> thought that more than half the effort of engineering a real parallel > supercomputer was coming up with a fast enough memory subsystem to keep > up... A lot of what makes a supercomputer is memory and I/O fast enough to keep up with peak performance, as opposed to desktops, which depend on caches to make up for the 30-1 difference between cpu and memory latency. It's actually where that 30-1 difference kills performance that the supercomputers earn their worth (not to mention problems that need GB's of ram etc). > > On a similiar vein Digital and IBM are exploring SMT which is similiar > > except multiple threads are running on a single cpu sharing all resources > > on the chip. I.e. Any instruction from any thread can go to any functional > > unit. Digitals paper in particular mentions up to 5.4 intructions per cycle > > on a mix of TeX and spec benchmarks. Tera - Switches threads every cycle, up to 128 in hardware, runs 1 instruction per cycle from a single thread that's ready to run. SMT - collects a pool of ready-to-run instructions for, say, 8 threads, and each cycle uses any mix of those instructions to keep any available functional units busy. So a cpu that can do 4 int ops and 2 fp ops every cycle will try to find 6 instructions from the pool of threads that are ready to run. A similar design without SMT (approximately a 21264) tries to find 6 instructions of parallelism in a single process, something that's very hard to do because of the serial way most programs are written. SMT allows MUCH more memory latency tolerance than you get with traditional designs. With a 21264, every time you branch or do dependent operations, e.g. a=b+c; b++, you have to look elsewhere for independent instructions, or guess which way a branch goes. With multiple threads it's easy: just look for an independent op in a different thread.
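The issue-slot description above can be sketched in a few lines. This is a deliberately crude model, not the scheduler from the Digital paper: the function name slots_filled and the per-thread "ready" counts are hypothetical, and only the 4 int + 2 fp unit mix comes from the text.

```python
def slots_filled(ready_ops, int_units=4, fp_units=2):
    """Count functional units kept busy in one cycle.

    ready_ops holds one (int_ops, fp_ops) pair per hardware thread: how many
    independent instructions of each kind that thread has ready this cycle
    (hypothetical inputs). Any ready instruction may use any free unit of
    its kind, regardless of which thread it came from.
    """
    int_issued = min(int_units, sum(i for i, _ in ready_ops))
    fp_issued = min(fp_units, sum(f for _, f in ready_ops))
    return int_issued + fp_issued


one_thread = [(1, 1)]         # a single thread offering little parallelism
eight_threads = [(1, 1)] * 8  # eight such threads pooled together

print(slots_filled(one_thread))     # 2 of the 6 units busy
print(slots_filled(eight_threads))  # all 6 units busy
```

The point of the sketch is only that pooling ready instructions across threads fills units that a single dependent instruction stream would leave idle.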
Digital's paper "Exploiting Choice: Instruction Fetch and Issue on an Implementable Simultaneous Multithreading Processor" shows that even things like infinitely fast memory, infinite bandwidth caches, turning off branch prediction, etc have a very small effect on an 8 thread cpu. Basically, their relatively simple changes to support threads (2 extra cycles in the pipeline, replicated registers, replicated program counters) allow a cpu to get 2.5 times closer to peak performance. Quite a bang for the buck considering the transistor investment. > and the inevitable CPU-memory > bottleneck much better than multiple processors in an ordinary SMP Well, with cpu's in the 1-2 ns per clock range, and dram/sdram and similar still around 60 ns, it's the latency that kills you; multi GB/sec memory systems are feasible to build. The problem is how you keep a quad issue cpu busy during that 60 ns (240 issue slots at 1 ns per clock) when something important isn't in the cache. If you have 7 other threads you have a really good chance of finding something for the cpu to do. > system. Easier to program, sure, and just maybe more scalable although > at some point you hit a limit of how much you can put on any single > wafer. SMP provides a clear scaling route to more processors Well, today's cpu's are in the 7.5-15 million transistor range; next generation chips (21364, IA64, PA8500) all seem to be planning for 100 million transistors or so. The question is how you speed up today's cpu's significantly with the extra transistors. 6 times the transistors of a 21264 could let you go with a bigger on-chip cache, 8 int and 4 fp functional units. The question is whether you can keep them busy. What will the performance improvement be? So what to do with the extra transistors? SMT is a possible answer. A promise of 2.5 times the performance just by replicating the registers sounds pretty attractive to me. BTW, the paper I mentioned analyzes SMP as well.
Interconnects for SMP are much slower, you run into the same memory latency problems, and you end up with very expensive machines that leave a lot of their transistors idle a lot of the time. Keep in mind an SMT solution should cost very little, if any, more than a cpu. > (effectively more silicon area), with various scaling limits on the > interconnect. SMT sounds like it provides more on-chip parallelism (at > some cost -- the real estate has to come from somewhere) and has its own > scaling limits when one builds an MP SMT. SMT allows cpu's to exploit dramatically more parallelism; I don't see any effect (good or bad) when you use several of them. To get more performance SMT uses more bandwidth, but in a way that is much easier to deal with than in traditional cpu's (bandwidth limited, not latency limited). From hatridge@straubing.baynet.de Mon, 19 Oct 1998 12:02:37 -0400 Date: Mon, 19 Oct 1998 12:02:37 -0400 From: Jim Hatridge hatridge@straubing.baynet.de Subject: 2 Questions Hi all; A few days ago someone wrote that when you have a mixed Beowulf of different computer speeds, i.e. 33MHz, 66 MHz etc, the boss computer should be the slowest. Could anyone explain why? Next, Dr. Loncaric wrote that the max MFLOP/s was MHz/2. If I understand correctly, this would mean that my little 33MHz should do up to 16.5 MFLOP/s. Is this right? Also, if I were to make up a Beowulf of, say, 10x 33MHz's, could I then expect 165 MFLOP/s? Sorry if all this sounds silly, but we all had to learn sometime :). Thanks! JIM ----------------------------------------- Jim Hatridge Germany hatridge@straubing.baynet.de M$ -- Ghostdriver* on the road to the future! (*German Slang for the guy driving on the wrong side of the road!) From mccsnrw@afs.mcc.ac.uk Mon, 19 Oct 1998 12:03:58 -0400 Date: Mon, 19 Oct 1998 12:03:58 -0400 From: mccsnrw@afs.mcc.ac.uk mccsnrw@afs.mcc.ac.uk Subject: MFLOPS of different cpu's > Of course, noone will ever build a Tera, because it won't run MS Windows.
> It is a cool research machine, and some of the ideas HAVE found their way > into some newer RISC systems. But it is a very interesting architecture, > and one should not dismiss it so lightly. > As far as I know UC San Diego bought the first Tera - delivered late last or early this year! This is no vapourware! Niels -- Dr Niels R. Walet http://www.phy.umist.ac.uk/Theory/people/walet.html Dept. of Physics, UMIST, P.O. Box 88, Manchester, M60 1QD, U.K. Phone: +44(0)161-2003693 Fax: +44(0)161-2004303 Niels.Walet@umist.ac.uk From rgb@phy.duke.edu Mon, 19 Oct 1998 12:12:17 -0400 Date: Mon, 19 Oct 1998 12:12:17 -0400 From: Robert G. Brown rgb@phy.duke.edu Subject: MFLOPS of different CPUs On Mon, 19 Oct 1998, Kragen wrote: > 11-step pipeline, no memory cache, 100-nanosecond-or-so memory access, > hardware context switches after every instruction, hardware support for > 128 threads. So each thread only runs one instruction every 20-128 > cycles, so memory has quite a few cycles to respond in. So on parallel > code, it can run very close to maximum theoretical efficiency. > > > How does it run on von Neumann code where parallelism isn't > > much of an advantage? > > Very slowly. > > > How are they going to keep such a beast fed? > > With six kilowatts. > > www.tera.com Thanks, I read all that myself a few minutes ago. I'm especially amused by the water-cooled 6kw (10 if you do lots of I/O:-). This isn't exactly a workstation architecture... So basically, instead of traditional L2 cache, they have devoted all the cache memory to a stack of hardware contexts and have built a custom DMA controller stack that pipelines a stream of memory access requests from the various threads. I wonder how SPARC (with its hardware context switch) would compare if one (re)designed the OS to similarly exploit the latency window on delivery of memory data. 
I've LONG wondered if one could build a context stack out of L2 cache in software -- the few times I've asked on a list the answer has generally been "no" and that the CPU manufacturer doesn't leave cache programming and operation in the hands of mere mortals. It may be that this is ultimately turns out to be a mistake. rgb Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From kragen@pobox.com Mon, 19 Oct 1998 12:17:16 -0400 Date: Mon, 19 Oct 1998 12:17:16 -0400 From: Kragen kragen@pobox.com Subject: MFLOPS of different cpu's On Mon, 19 Oct 1998, Eugene Leitl wrote: > On Sun, 18 Oct 1998, Bill Broadley wrote: > > Basically the tera does a thread switch every cycle, so the processor > > state is replicated for the number of supported threads (I don't > > know how many that is offhand). (I.e. registers, and other state are > > replicated.) > > This is off-topic, but this sounds like a perfectly horrible architecture. > Instead of having several simple CPUs on-die which communicate by > message-passing, they bloat the die and thus kill the yield by having lots > of funky machinery on it. I don't know what their yields are like, but yes, they have had a lot of funky hardware problems they didn't really expect. On the other hand, they *have* been able to avoid caches and all sorts of complicated machinery mainstream RISC processors have to use. Hurrah for commodity hardware. :) > Also, wonder how this design scales into high GHz range. We'll probably see in the future. They've got it up to 240MHz now, and hope to get it up to 333MHz soon. > P.S. Aargh! Still no data about new fast $50/port switches. Are they > mythical, or what? Can't find anything on 'em, myself. Kragen -- Kragen Sitaker A well designed system must take people into account. . . . 
It's hard to build a system that provides strong authentication on top of systems that can be penetrated by knowing someone's mother's maiden name. -- Schneier From rgb@phy.duke.edu Mon, 19 Oct 1998 12:26:02 -0400 Date: Mon, 19 Oct 1998 12:26:02 -0400 From: Robert G. Brown rgb@phy.duke.edu Subject: MFLOPS of different cpu's On Mon, 19 Oct 1998, Kragen wrote: > > SMP provides a clear scaling route to more processors > > (effectively more silicon area), with various scaling limits on the > > interconnect. > > The Tera architecture is SMP, btw. The smallest Tera they're talking > about eventually building will be eight processors and will cost about > $5e6, according to the NYTimes article. I suppose the question would be (to bring this back to beowulfism, not that it every really left it): "how much of the idea underlying all this can be implemented in a beowulf design". It sounds like the switching memory architecture has homology to a switched network, for example. Can the synchronicity be profitably emulated? rgb Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From kragen@pobox.com Mon, 19 Oct 1998 13:03:22 -0400 Date: Mon, 19 Oct 1998 13:03:22 -0400 From: Kragen kragen@pobox.com Subject: MFLOPS of different cpu's On Mon, 19 Oct 1998, Robert G. Brown wrote: > I suppose the question would be (to bring this back to beowulfism, not > that it every really left it): "how much of the idea underlying all this > can be implemented in a beowulf design". It sounds like the switching > memory architecture has homology to a switched network, for example. > Can the synchronicity be profitably emulated? I think we need to have new processor designs to take advantage of these ideas. The underlying problem is that microprocessors have got faster a lot faster than significant-sized RAM has. 
You could get 300ns RAM in 1980; you can get 8ns RAM today, only 40 times faster. But the difference between a 4.77MHz 8086 and a 450MHz Pentium-II is a factor of 100 from the clock, a factor of 2 from the larger registers, a factor of 1.5 or so from the superscalarity, and a factor of 4-10 from the pipelining, so maybe conservatively it's 1200-3000 times as fast as the 8086 --- and that's only on integer stuff. So the amount of stuff a CPU can do in a given number of RAM cycles has increased by a factor of 40, conservatively. (The RAM is four times as wide, so it reduces to a factor of 10.) So the question is: how do we cope with this? Any time the relative performance of two important parts of a computer changes by an order of magnitude, it becomes likely that the strategies for achieving optimal performance will change. What architecture will best take advantage of the extremely slow RAM of today? 10ns is enough time for a theoretical 500MHz Pentium II to do ten or so operations. Something as radically different as the Tera MTA seems to be called for. The need is compelling at the high end, where scalability is hitting brick walls in some applications. But even my desktop PC is limited by its RAM speed. In a few years, when optical memory becomes a commodity, we'll have to rethink everything again. Our fastest memory system will be durable again, as it hasn't been since the 1970s with core memory, and it will have bandwidths an order of magnitude higher than current memory bandwidths. It may be WORM, which will necessitate a *lot* of changes in the way we do things. It will have capacities several orders of magnitude larger than our current disks. Mechanical disks will become obsolete, the last mechanical part of the computer. 
Suddenly filesystem designs tuned for seek times in which a megabyte of data could have been transferred will become obsolete; transaction-processing will become a simple matter of programming, instead of a slow hardware-limited process used only when it was absolutely necessary; programs written in low-level languages that assumed they were running in RAM, not WORM memory, will need to be rewritten; and a thousand other things we can't imagine. Kragen -- Kragen Sitaker A well designed system must take people into account. . . . It's hard to build a system that provides strong authentication on top of systems that can be penetrated by knowing someone's mother's maiden name. -- Schneier From prachya@science.gmu.edu Mon, 19 Oct 1998 13:10:57 -0400 Date: Mon, 19 Oct 1998 13:10:57 -0400 From: Prachya Chalermwat prachya@science.gmu.edu Subject: MFLOPS of different CPUs Hi all, Thanks for all your very useful information and discussions. Although the peak MFLOPS is sometimes meaningless, it is still something that we all want to be close to. Question! If we have N pipelines (FP units), 1 FLOPS/cycle, and C clock speed, will we have a peak of N flop/cycle x C cycles/sec FLOPS (theoretically)? What are the limitations of increasing the number of FP units besides the data depency problem? Thanks. --Prachya --------------------------------------------------------------------------- Prachya Chalermwat George Mason University Graduate Research Assistant (703) 993-4322 Computational Sciences and Informatics (FAX) 993-1980 George Mason University Email: prachya@science.gmu.edu 4400 University Drv. MSN-5C3, Fairfax, VA 22030-4444 --------------------------------------------------------------------------- URL: http://spaceops.science.gmu.edu "Imagination is more important than knowledge." A. 
Einstein From kragen@pobox.com Mon, 19 Oct 1998 13:17:35 -0400 Date: Mon, 19 Oct 1998 13:17:35 -0400 From: Kragen kragen@pobox.com Subject: MFLOPS of different cpu's On Mon, 19 Oct 1998, Walter B. Ligon III wrote: > mass production. Of course, the weak piont is that Tera MUST have more > tasks than processors - empirical studies showed that 4 was about right (and > that was the number supported in the prototypes). I'm not quite sure what you're saying. 4 tasks is about right? The current Tera prototype at SDSC needs to have at least 21 runnable threads to keep from spending time idle. > Of course, noone will ever build a Tera, because it won't run MS Windows. > It is a cool research machine, Maybe this is a joke? There's a prototype Tera MTA being evaluated at SDSC, and several more at Tera headquarters. MS Windows is not a requirement when your target market is the folks who are looking to replace their Cray T90, but of course *you* know that --- you've been using such systems for years! > and some of the ideas HAVE found their way into some newer RISC systems. Which ones? I'd like to read about them. Kragen -- Kragen Sitaker A well designed system must take people into account. . . . It's hard to build a system that provides strong authentication on top of systems that can be penetrated by knowing someone's mother's maiden name. -- Schneier From hanzl@noel.feld.cvut.cz Mon, 19 Oct 1998 13:42:53 -0400 Date: Mon, 19 Oct 1998 13:42:53 -0400 From: Vaclav Hanzl hanzl@noel.feld.cvut.cz Subject: Beowulf as general compute server? Dear Kevin, > Having learned of the Beowulf project I am was wondering if there may not > be another solution: To install a smallish (6-10 node) Beowulf cluster and > use this to serve the Macs which could run PowerPC linux or the exodus > Xserver software. > We could then run Matlab ... on the cluster with no problems. I think your solution is perfectly possible. We run Matlab 5.2 on our MAGI cluster without problem. 
Of course every Matlab process runs on one node only and there seemes to be no parallel solution available in the near future; the best one can do is to reserve the node for Matlab only (and select least loaded nodes for new processes). Fortunately all this can be done in script replacing 'matlab', you do not care too much if it called something like 'rsh node7 DISPLAY=$DISPLAY real_matlab'. Possible number of students per node depends on what they do, my guess would be something from 2 (for experienced ones) to 4 (for novices). Possible contras might be: - generally higher price of Matlab licenses on any UNIX (Of course there still is Octave, which is free.) - care about security of your cluster - that slight hassle of creating node-selection script Bye Vaclav Hanzl +-----------------------------------------------------------------------+ | Czech Technical University in Prague fax: (+420 2) 243 10 784 | | Faculty of Electrical Engineering, K331 or (+420 2) 311 1786 | | Technicka 2 | | 166 27 Prague 6, Czech Republic email: hanzl@noel.feld.cvut.cz | +-----------------------------------------------------------------------+ | Our cluster supercomputer page is at http://noel.feld.cvut.cz/magi | +-----------------------------------------------------------------------+ From bill@math.ucdavis.edu Mon, 19 Oct 1998 13:44:46 -0400 Date: Mon, 19 Oct 1998 13:44:46 -0400 From: Bill Broadley bill@math.ucdavis.edu Subject: MFLOPS of different CPUs On Mon, Oct 19, 1998 at 01:10:54PM -0400, Prachya Chalermwat wrote: > Hi all, > > Thanks for all your very useful information and discussions. Although the > peak MFLOPS is sometimes meaningless, it is still something that we all > want to be close to. > > Question! If we have N pipelines (FP units), 1 FLOPS/cycle, and C clock > speed, will we have a peak of > > N flop/cycle x C cycles/sec FLOPS (theoretically)? > > What are the limitations of increasing the number of FP units besides the > data depency problem? 
Transistors for extra FP units
Register pressure (to keep N functional units busy)
L1 cache latency/bandwidth (if you're not 100% in registers)
L2 cache latency/bandwidth (if you don't fit in L1)
Memory latency/bandwidth (if you don't fit in L2)
Disk latency/bandwidth (if you don't fit in memory)
Cpu's are getting better at tolerating latency with speculative execution, out of order execution, and branch prediction, but still rarely get more than 1 flop a cycle on production codes (for desktop cisc/risc anyways). Supercomputers like vector machines and Teras basically design in enough memory bandwidth to keep all the floating point units busy for any problem that fits in ram. They don't always run at peak mflops, but often it's a large percentage. Getting even 50% of a 21164's peak mflops (2 * clock speed) is quite a feat even with cache friendly code. Even 10% is impressive if it's from main memory. From walt@parl.ces.clemson.edu Mon, 19 Oct 1998 13:44:59 -0400 Date: Mon, 19 Oct 1998 13:44:59 -0400 From: Walter B. Ligon III walt@parl.ces.clemson.edu Subject: MFLOPS of different cpu's -------- > On Mon, 19 Oct 1998, Walter B. Ligon III wrote: > > mass production. Of course, the weak piont is that Tera MUST have more > > tasks than processors - empirical studies showed that 4 was about right (and > > that was the number supported in the prototypes). > > I'm not quite sure what you're saying. 4 tasks is about right? The > current Tera prototype at SDSC needs to have at least 21 runnable > threads to keep from spending time idle. The early research (early 90's) indicated 4 tasks per processor hid the latency - I don't know anything about the current commercial model. The number depends on a lot of stuff that may have changed since the earlier results were published. > > Of course, noone will ever build a Tera, because it won't run MS Windows. > > It is a cool research machine, > > Maybe this is a joke?
There's a prototype Tera MTA being evaluated at > SDSC, and several more at Tera headquarters. MS Windows is not a > requirement when your target market is the folks who are looking to > replace their Cray T90, but of course *you* know that --- you've been > using such systems for years! Yeah, it's a joke. Of course it's no joke that the likelihood that this company will survive in the supercomputer market is rather low. Once again if it ran MS Windows maybe more people (other than those who don't care about MS Windows) would buy them, creating a bigger market. Maybe if someone started writing video games for them ... :-) > > and some of the ideas HAVE found their way into some newer RISC systems. > > Which ones? I'd like to read about them. Manoj Franklin has been doing something called "multiscalar" architecture which also has hardware support for tasks and uses the inherent independence of instructions from different tasks to hide latency. He also uses some techniques from the VLIW stuff and traditional superscalar ILP techniques. I'm not much of an expert on all that - but it's pretty cool stuff. > > Kragen > > -- > Kragen Sitaker > A well designed system must take people into account. . . . It's hard to > build a system that provides strong authentication on top of systems that > can be penetrated by knowing someone's mother's maiden name. -- Schneier > Walt -- Dr. Walter B. Ligon III Associate Professor ECE Department Clemson University From kragen@pobox.com Mon, 19 Oct 1998 13:55:54 -0400 Date: Mon, 19 Oct 1998 13:55:54 -0400 From: Kragen kragen@pobox.com Subject: MFLOPS of different cpu's On Mon, 19 Oct 1998, Walter B. Ligon III wrote: > > On Mon, 19 Oct 1998, Walter B. Ligon III wrote: > Yeah, it's a joke. Of course its no joke that the likelyhood that this > company will survive in the supercomputer market is rather low.
Once > again if it ran MS Windows maybe more people (other than those who don't > care about MS Windows) would buy them creating a biger market. It may be the case that the likelihood that *any* company will survive in the supercomputer market is rather low. Praise the lord and pass the commodity processors. > Maybe > if someone started writing video games for them ... :-) Hmm, this could be pretty cool. What could you do with a machine 150 times as fast as a Pentium-II 450? ;) > > Which ones? I'd like to read about them. > > Manoj Franklin has been doing something called "multiscalar" architecture > which also has hardware support for tasks and uses the inherent independence > of instructions from different tasks to hide latency. He also uses some > techniques from the VLIW stuff and traditional superscalar ILP techniques. > I'm not much of an expert on all that - but it's pretty cool stuff. Cool! Kragen -- Kragen Sitaker A well designed system must take people into account. . . . It's hard to build a system that provides strong authentication on top of systems that can be penetrated by knowing someone's mother's maiden name. -- Schneier From walt@parl.ces.clemson.edu Mon, 19 Oct 1998 13:59:14 -0400 Date: Mon, 19 Oct 1998 13:59:14 -0400 From: Walter B. Ligon III walt@parl.ces.clemson.edu Subject: MFLOPS of different cpu's -------- > > There is another aspect that caught me off guard (don't judge my ignorance > until after this paragraph). As I read the description, the apparent gain > in memory performance and reduction in context switch costs is at the > expense of computational performance. 
If only the registers & other > state-holders are replicated, then the reason 1-cycle apparent memory access > is available is because the apparent clock cycle is p times longer than the > chip's clock cycle, where p is the number of process threads (up to the > maximum the chip can hold) -- if the execution units are not also > duplicated, then a process that does not require a memory access must still > wait anyway, as though it did require a memory access. (This would > eliminate the need for registers & cache! This is why register renaming, > memory hierarchy, prefetching, etc, can be neglected.) If the execution > units are indeed duplicated to avoid stalling the processes unnecessarily, > then we are not providing the illusion of 1-cycle memory access. On the > other hand, each process *does* need to access memory each cycle ... to > fetch an instruction. I still find it hard to swallow that the performance > can be improved by slowing down execution, but if they're willing to spend > the development money, I won't judge them until after we see actual > performance figures. > They can pipeline the execution units and then each instruction can issue in its cycle regardless of whether it does a memory access or not. The problem tends to be memory latency. Memory bandwidth is doable (lots of memory modules) and even shared memory is doable, but the weak point of most mechanisms for connecting a lot of shared memory to many processors is latency. Memory access can be pipelined to keep requests and responses flowing, but the round-trip latency is always a problem. A multi-tasking approach is superior at overcoming this problem when there are plenty of tasks available to run. There is no inherent reason to sacrifice inter-processor communication or execution speed or clock speed. 
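The latency-hiding argument above can be made concrete with a back-of-the-envelope model. The following Python sketch is illustrative only: it assumes each task runs for a fixed number of cycles between memory references, that every reference costs the same round-trip latency, and that switching between tasks is free. The function names and the cycle counts used below are hypothetical and are not Tera specifications.

```python
import math


def utilization(tasks: int, run_cycles: float, latency_cycles: float) -> float:
    """Fraction of cycles spent on useful work under the simple model.

    Each task computes for `run_cycles`, then waits `latency_cycles` for
    memory. While one task waits, the others may issue, so useful work per
    period scales with the task count until the processor saturates at 1.0.
    """
    if tasks < 1 or run_cycles <= 0 or latency_cycles < 0:
        raise ValueError("need tasks >= 1, run_cycles > 0, latency_cycles >= 0")
    return min(1.0, tasks * run_cycles / (run_cycles + latency_cycles))


def tasks_to_saturate(run_cycles: float, latency_cycles: float) -> int:
    """Smallest task count for which the modelled utilization reaches 1.0."""
    if run_cycles <= 0 or latency_cycles < 0:
        raise ValueError("need run_cycles > 0, latency_cycles >= 0")
    return math.ceil((run_cycles + latency_cycles) / run_cycles)


if __name__ == "__main__":
    # Illustrative numbers only: 1 compute cycle per memory reference.
    for latency in (3, 20):
        n = tasks_to_saturate(1, latency)
        print(f"latency={latency} cycles: {n} tasks, "
              f"single-task utilization={utilization(1, 1, latency):.2f}")
```

Under these assumptions the number of tasks needed grows directly with the round-trip latency, which fits the remark that the right number of tasks per processor depends on details of the machine.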
> I have to confess that I'm still having difficulty envisioning how such > a system can be fed a nonlocal code/data mix if it is as fast as claimed > -- what have they done with the CPU -- memory bottleneck? The world may > "look" like infinitely fast memory to a thread, but somewhere out there > is silicon with a clock and a memory bus and it isn't infinitely fast... Pipelined memory gives you the bandwidth you need, and the task switching means that the next instruction executed by a task is guaranteed to have the memory read completed so it won't stall. > Is this something like a MTSD ("multiple thread, single data", to abuse > some probably archaic acronyms) design? All the threads get the SAME > data (so data access is slow enough that conventional memory buses can > keep up) but might do different things with it? If so, doesn't that > really slow down the system for anything but very special task mixes? I > thought that more than half the effort of engineering a real parallel > supercomputer was coming up with a fast enough memory subsystem to keep > up... No, it's nothing like that at all. It is a MIMD machine. Also, there isn't a shared bus, so the memory bottleneck isn't typically a problem. The only problem is memory latency, and that is what the multithread support is for. > OK, so instead of having e.g. floating point pipelines, they put a whole > SMP architecture on a single chip with a single cache and single set of > registers, expanded to hold multiple contexts/threads, and exploit the > parallelism available in a heterogeneous task mix to distribute > independent threads? If so, a nice idea for a server CPU (and curiously > like SPARC with its multiple parallel ALU's and on-chip context > switches, but more symmetric) but it isn't clear to me how this will > handle data dependencies between threads and the inevitable CPU-memory > bottleneck much better than multiple processors in an ordinary SMP > system. 
There are no data dependencies between threads other than those one encounters in all shared memory programming, which must be handled via known synchronization mechanisms. The difference with SMP is in the memory interconnect. The "bottleneck" is removed by using a multistage interconnection network (MIN) with plenty of bandwidth but not very good latency - the latency is what all the multitasking is trying to hide. > Easier to program, sure, and just maybe more scalable although > at some point you hit a limit of how much you can put on any single > wafer. SMP provides a clear scaling route to more processors > (effectively more silicon area), with various scaling limits on the > interconnect. SMT sounds like it provides more on-chip parallelism (at > some cost -- the real estate has to come from somewhere) and has its own > scaling limits when one builds an MP SMT. Programming would be almost exactly the same as on an SMP - though you will need more threads per CPU to get the required efficiency - so you need more overall parallelism in your problem. There is no inherent limit to processors on this kind of machine; you can add as many as you like just as you would for any other multiprocessor. The limit is in the CPU/memory interconnect. Of course, a MIN interconnect is MUCH more scalable than a shared bus, so one can easily get to thousands of processors and still be effective - whereas a bus-based system cannot. As for whether to use a multi-threaded CPU, that is really independent of the CPU/memory interconnect, but it makes the most sense with a MIN-type interconnect. Walt -- Dr. Walter B. Ligon III Associate Professor ECE Department Clemson University From prachya@science.gmu.edu Mon, 19 Oct 1998 14:32:38 -0400 Date: Mon, 19 Oct 1998 14:32:38 -0400 From: Prachya Chalermwat prachya@science.gmu.edu Subject: Recommended Disk Utilization Strategies Hi, The Beowulf usually consists of nodes with one or more local disks. I found that they are very underutilized. 
Cross mounting them to all nodes does not seem to be quite useful since the mounted directories are not in contiguous space. If locality exploitation is not an issue, is there any simple tool like 'md' that can combine multiple mounted directories into one logical partition (RAID style)? What are the disadvantages of using existing linux parallel file systems? --Prachya --------------------------------------------------------------------------- Prachya Chalermwat George Mason University Graduate Research Assistant (703) 993-4322 Computational Sciences and Informatics (FAX) 993-1980 George Mason University Email: prachya@science.gmu.edu 4400 University Drv. MSN-5C3, Fairfax, VA 22030-4444 --------------------------------------------------------------------------- URL: http://spaceops.science.gmu.edu "Imagination is more important than knowledge." A. Einstein From josip@icase.edu Mon, 19 Oct 1998 15:33:31 -0400 Date: Mon, 19 Oct 1998 15:33:31 -0400 From: Josip Loncaric josip@icase.edu Subject: 2 Questions Jim Hatridge wrote: > > Next, Dr. Loncaric wrote that the max MFLOP/s was MHz/2. If I understand > this would mean that my little 33MHz should do upto 16.5 MFLOP/s. Is this > right? Also if I was to makeup a Beowulf of, say, 10x 33MHz's could I then > expect 165 MFLOP/s? I corrected that later to MFLOPS=MHz (for a Pentium II), since the L2 cache speed equal to MHz/2 does not affect the peak MFLOPS as much as I thought (average MFLOPS are another matter). However, your 33MHz machine is not a Pentium II but most likely a 486 or earlier chip whose performance figures almost certainly differ. At any rate, the theoretical peak is hard to approach. For most applications the average realized MFLOPS is at least an order of magnitude less than the peak. Moreover, 10 compute nodes running in parallel usually suffer various overheads so you cannot expect 10x speedup. 
Some people parallelize anyway because they would rather have results in three days than in three weeks and because a single compute node might not hold enough RAM to run their problem in the first place. Sincerely, Josip -- Dr. Josip Loncaric, Senior Staff Scientist mailto:josip@icase.edu ICASE, Mail Stop 403 http://www.icase.edu/~josip/ NASA Langley Research Center mailto:j.loncaric@larc.nasa.gov Hampton, VA 23681-2199, USA Tel. +1 757 864-2192 Fax +1 757 864-6134 From hahn@coffee.psychology.mcmaster.ca Mon, 19 Oct 1998 15:52:19 -0400 Date: Mon, 19 Oct 1998 15:52:19 -0400 From: Mark Hahn hahn@coffee.psychology.mcmaster.ca Subject: MFLOPS of different cpu's NOTE: this thread should move to comp.arch. > faster than significant-sized RAM has. You could get 300ns RAM in > 1980; you can get 8ns RAM today, only 40 times faster. But the it's worse than that: DRAM is still in the 70-150 ns range for a general read. unless you're talking SRAM, where the number is more like 30ns. of course, if you're reading a whole burst, you get an amortized cost of around 20 ns, but that's cheating just like cache. > today? 10ns is enough time for a theoretical 500MHz Pentium II to do > ten or so operations. Something as radically different as the Tera 100ns realistic random DRAM latency is long enough for a typical 300 MHz PII/K6 to try to issue around 100 instructions. there's no fix on the horizon that will signficantly change DRAM latency. regards, mark hahn. -- operator may differ from spokesperson. hahn@coffee.mcmaster.ca http://java.mcmaster.ca/~hahn From josip@icase.edu Mon, 19 Oct 1998 16:05:39 -0400 Date: Mon, 19 Oct 1998 16:05:39 -0400 From: Josip Loncaric josip@icase.edu Subject: MFLOPS of different CPUs Prachya Chalermwat wrote: > > Question! If we have N pipelines (FP units), 1 FLOPS/cycle, and C clock > speed, will we have a peak of > > N flop/cycle x C cycles/sec FLOPS (theoretically)? 
> > What are the limitations of increasing the number of FP units besides the > data depency problem? "In theory, there is no difference between theory and practice. In practice, there is." In theory, N*C=FLOPS but in practice there are enough problems in exploiting parallelism to employ an army of computer scientists and mathematicians. Just to mention a few: finding algorithms that parallelize well, dealing with memory latency via caching, ensuring sufficient bandwidth at all levels of memory hierarchy, etc. Sincerely, Josip -- Dr. Josip Loncaric, Senior Staff Scientist mailto:josip@icase.edu ICASE, Mail Stop 403 http://www.icase.edu/~josip/ NASA Langley Research Center mailto:j.loncaric@larc.nasa.gov Hampton, VA 23681-2199, USA Tel. +1 757 864-2192 Fax +1 757 864-6134 From deadline@plogic.com Mon, 19 Oct 1998 17:04:34 -0400 Date: Mon, 19 Oct 1998 17:04:34 -0400 From: Douglas Eadline deadline@plogic.com Subject: MFLOPS of different CPUs On Fri, 16 Oct 1998, Kragen wrote: > On Fri, 16 Oct 1998, Robert G. Brown wrote: > > Nevertheless, my experience is that even venerable and faulty as the > > MIPS and MFLOPS ratings are, they are nevertheless remarkably useful as > > measures of RELATIVE performance. Of course, so is the CPU clock, and > > for similar reasons. That is, although there are obviously exceptions > > for particular applications or task mixes, in many cases if one compares > > the MIPS/MFLOPS and SPEC(int/fp) rating of two different systems, the > > RELATIVE performance of the systems will be reasonably well represented > > by either of them, and will in most cases will scale fairly obviously > > with system clock. By reasonably well, I mean that it is very rare > > indeed that they will differ by as much as a factor of 2, which is > > negligible on a log scale:-) > > I've been reading about the Tera MTA. 
It appears to be about as fast > as a 14-processor Cray T90 (the two-processor 255-MHz MTA, that is, the > only one so far built) although the T90 runs at 440 MHz. > > Tera's press releases claim that their machines should be able to > perform about an order of magnitude better than traditional machines > that have the same peak MFLOPS rating. > > Very interesting architecture. Yes Indeed. If I had to guess what Transmeta is up to, I would say that it is something along these lines. This is mere speculation of course. Doug ------------------------------------------------------------------- Paralogic, Inc. | PEAK | Voice:+610.861.6960 115 Research Drive | PARALLEL | Fax:+610.861.8247 Bethlehem, PA 18017 USA | PERFORMANCE | http://www.plogic.com ------------------------------------------------------------------- From rgb@phy.duke.edu Mon, 19 Oct 1998 17:53:37 -0400 Date: Mon, 19 Oct 1998 17:53:37 -0400 From: Robert G. Brown rgb@phy.duke.edu Subject: MFLOPS of different cpu's On Mon, 19 Oct 1998, Mark Hahn wrote: > > NOTE: this thread should move to comp.arch. I'm not completely convinced of that. I think that serious beowulf design requires a fairly deep knowledge of parallel architectures, and this has been a most invigorating discussion from which I have learned a great deal. Some of it may even be useful in beowulfs. For example, it does suggest >>A<< way to constructively soak up IPC latency in a boring old PVM or MPI program, especially in an SMP box -- try to schedule data transfers with large latency in such a way that the data will be there when the node task gets to the point where it needs it. Or, run MULTIPLE pvm/mpi jobs and arrange it so that the communications phase in one overlaps the execution phase in the other. To the extent that the communications can be backgrounded (which, with DMA-driven network cards, is perhaps significant) one can increase overall efficiency at the expense of absolute speed on each task. 
After all, all observations that are being made about memory access latency are even more true for IPC latency. Anyway, I would like to thank ALL the folks who have contributed. You can't know too much about this sort of thing, and "threads" on threads are a great way to learn. rgb Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From prachya@science.gmu.edu Mon, 19 Oct 1998 18:31:04 -0400 Date: Mon, 19 Oct 1998 18:31:04 -0400 From: Prachya Chalermwat prachya@science.gmu.edu Subject: New JHU ACM cluster Hi, I looked at the picture. I believe that the most recommended local IP addresses should be 192.168.x.x (check previous discussions about this issue in the beowulf mailing list). Also get a simple set of "theHive commands" for parallel rsh (Psh) and bview (for monitoring) from http://newton.gsfc.nasa.gov/thehive/ (not sure these are on the Extreme linux CD or not). After everything is up (including PVM and MPI) then try to get MPBENCH at http://www.cs.utk.edu/~mucci/mpbench/ to test your communication performance for PVM and MPI. On Mon, 19 Oct 1998, Corbett J. Klempay wrote: > Hello all... (my first post..whee!) ... > > My first concern: once this stuff arrives and we physically link > everything together, what is the best/proper way to go about doing the > installs on these? > --Prachya --------------------------------------------------------------------------- Prachya Chalermwat George Mason University Graduate Research Assistant (703) 993-4322 Computational Sciences and Informatics (FAX) 993-1980 George Mason University Email: prachya@science.gmu.edu 4400 University Drv. MSN-5C3, Fairfax, VA 22030-4444 --------------------------------------------------------------------------- URL: http://spaceops.science.gmu.edu "Imagination is more important than knowledge." A. 
Einstein From caskey@technocage.com Mon, 19 Oct 1998 20:39:50 -0400 Date: Mon, 19 Oct 1998 20:39:50 -0400 From: Caskey L. Dickson caskey@technocage.com Subject: New JHU ACM cluster On Mon, 19 Oct 1998, Prachya Chalermwat wrote: > I looked at the picture. I believe that the most recommended local IP > addresses should be 192.168.x.x (check previous discussions about this > issue in the beowulf mailing list). Also get a simple set of "theHive > commands" for parallel rsh (Psh) and bview (for monitoring) from > http://newton.gsfc.nasa.gov/thehive/ (not sure these are on the Extreme > linux CD or not). Personal note: 192.168/16 and 10/8 are both valid private net addresses. Personally I always start with 10/16 so I have room to subnet my private network space easily. It has worked rather well as I have grown from one net (10.0/16) to six (10.1/16, 10.2/16, 10.3/16, 10.23/16, 10.4/16) without the need to re-number dozens of machines. A 24 bit mask would allow you an array of 252 machines with two addresses set aside for the head and a gateway. This is just my personal preference. Also, I find the 10.0.0.0 easier to type. C=) -------------------------------------------------------------------------- Heuer's Law: Any feature is a bug unless it can be turned off. -------------------------------------------------------------------------- Caskey /// pager.818.698.2306 TechnoCage Inc. ///| gpg: aiiieeeeeee!!! -------------------------------------------------------------------------- Early bird gets the worm, but the second mouse gets the cheese. From lindahl@cs.virginia.edu Mon, 19 Oct 1998 21:59:06 -0400 Date: Mon, 19 Oct 1998 21:59:06 -0400 From: Greg Lindahl lindahl@cs.virginia.edu Subject: Beowulf as general compute server? > Having learned of the Beowulf project I am was wondering if there may not > be another solution: To install a smallish (6-10 node) Beowulf cluster and > use this to serve the Macs which could run PowerPC linux or the exodus > Xserver software. Sure. 
Back in the recent good old days, we used xterms and compute servers. You're just doing the same thing. Btw, you can use a piece of software called "vnc" as a free, thin X client under MacOS. > I can see no reason why not but I need to armed with some facts before I > approach the dept. with a proposal. Does anyone have ideas on the number > of nodes I would need? (My estimate is a pure guess) - and a ball-park > figuer for the cost of themachine? You can guess whatever you like for these. $2000 buys a pretty nice node; you can often put 4-5 casual Matlab users on one node because they are often thinking more than they are running. > I would also be garteful for any information on possible application to > large scale problems in neural modelling. If we could use this a research > machine too, the request might carry more weight. We have a neural network simulator that runs using MPI that we run on our cluster. However, it doesn't do dynamic load balancing, so it's a lose to have it running against random matlab users. -- g From jhpark@nurapt.kaist.ac.kr Tue, 20 Oct 1998 00:39:10 -0400 Date: Tue, 20 Oct 1998 00:39:10 -0400 From: jhpark@nurapt.kaist.ac.kr jhpark@nurapt.kaist.ac.kr Subject: LAM-Absoft and mpich-Absoft RPM file I rebuilt LAM and mpich rpm packages included in Extreme Linux for Absoft Fortran. I have modifed some configuration file for compiler's underscore problem. Absoft's compiler option, -f -B108 is included in mpich-Absoft. LAM-Absoft doesn't include -f -B108 option. If you need, download from this site ftp://nurapt.kaist.ac.kr/pub From kragen@pobox.com Tue, 20 Oct 1998 08:07:32 -0400 Date: Tue, 20 Oct 1998 08:07:32 -0400 From: Kragen kragen@pobox.com Subject: LAM-Absoft and mpich-Absoft RPM file On Tue, 20 Oct 1998 jhpark@nurapt.kaist.ac.kr wrote: > I rebuilt LAM and mpich rpm packages included in Extreme Linux for Absoft > Fortran. > > I have modifed some configuration file for compiler's underscore problem. 
> > Absoft's compiler option, -f -B108 is included in mpich-Absoft. > LAM-Absoft doesn't include -f -B108 option. What do these options do? Are the results of your compilation useful to people who use other compilers? Kragen (who thinks it's nice that people are sharing this work) -- Kragen Sitaker A well designed system must take people into account. . . . It's hard to build a system that provides strong authentication on top of systems that can be penetrated by knowing someone's mother's maiden name. -- Schneier From jhpark@nurapt.kaist.ac.kr Tue, 20 Oct 1998 09:32:08 -0400 Date: Tue, 20 Oct 1998 09:32:08 -0400 From: jhpark@nurapt.kaist.ac.kr jhpark@nurapt.kaist.ac.kr Subject: LAM-Absoft and mpich-Absoft RPM file Absoft compiler option: -f : fold all symbolic names to lower case. (absoft compiler distinguish upper case from lower case) -B108 : append a single underscore to the names of subroutines and functions. I think that LAM-Absoft and mpich-Absoft can treat the lower case + underscore names of subroutines included in libmpi.a . LAM and mpich package included in Extreme Linux append double underscore in their fortran subroutine in libmpi.a. I think it is better to exclude "-f -B108" option in mpich-Absoft. On Tue, 20 Oct 1998, Kragen wrote: > On Tue, 20 Oct 1998 jhpark@nurapt.kaist.ac.kr wrote: > > I rebuilt LAM and mpich rpm packages included in Extreme Linux for Absoft > > Fortran. > > > > I have modifed some configuration file for compiler's underscore problem. > > > > Absoft's compiler option, -f -B108 is included in mpich-Absoft. > > LAM-Absoft doesn't include -f -B108 option. > > What do these options do? Are the results of your compilation useful > to people who use other compilers? > > Kragen (who thinks it's nice that people are sharing this work) > > -- > Kragen Sitaker > A well designed system must take people into account. . . . 
It's hard to > build a system that provides strong authentication on top of systems that > can be penetrated by knowing someone's mother's maiden name. -- Schneier > From walt@parl.ces.clemson.edu Tue, 20 Oct 1998 09:39:57 -0400 Date: Tue, 20 Oct 1998 09:39:57 -0400 From: Walter B. Ligon III walt@parl.ces.clemson.edu Subject: Recommended Disk Utilization Strategies -------- > Dr. Ligon, > > On Mon, 19 Oct 1998, Walter B. Ligon III wrote: > [...] > > > > > > > I don't know if there are any codes that allow you to strip among remote ^^^^^ > > mounted file systems. Some of the early Intel parallel file systems did > > exactly that. > > > > We have developed a linux parallel file system (PVFS) that does what you > > want, but it doesn't use NFS - it uses its own servers. It is generally > > a lot faster if you are doing large transfers and about the same for small > > transfers. > > > > There are a couple of disadvantages. First we don't have mmap implemented > > well for it. We DO have mmap, but it is a limited implementation and doesn't > > perform very well. Your codes should run, but it's not the way to do things. > > Thank you. Does every I/O-bound application call mmap? > If I want to do a user-level mapping over the cross-mounted > directories for parallel read/write, do you think it is worth the effort > (if it's not already been done somewhere)? > Best regards, > > --Prachya > BTW, the word was stripe, not strip (although that certainly has interesting connotations ...) Anyway, almost every linux program calls mmap() to load libraries and stuff, but as long as those are on "normal" file systems everything is OK. In Linux the regular I/O calls go through the block cache, and thus operate like mmap anyway, but this is not a problem for PVFS because it is independent of the block cache. Most normal applications do not use mmap() for their own I/O, so in general you should be pretty much OK. 
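For readers following the stripe/strip exchange, here is a minimal sketch of what striping a file across several servers means. It is written in Python purely for illustration; the round-robin layout, the function name `locate`, and the parameters `stripe_size` and `num_servers` are assumptions made for this example and are not taken from PVFS.

```python
from typing import Tuple


def locate(offset: int, stripe_size: int, num_servers: int) -> Tuple[int, int]:
    """Map a logical file offset to (server index, offset in that server's file).

    Assumes a plain round-robin layout: stripe unit 0 goes to server 0,
    unit 1 to server 1, and so on, wrapping around after the last server.
    """
    if offset < 0 or stripe_size <= 0 or num_servers <= 0:
        raise ValueError("offset must be >= 0; sizes and counts must be > 0")
    unit = offset // stripe_size                 # which stripe unit holds the byte
    server = unit % num_servers                  # round-robin choice of server
    local = (unit // num_servers) * stripe_size + offset % stripe_size
    return server, local


if __name__ == "__main__":
    # Hypothetical layout: 64 KiB stripe units over 4 servers.
    for off in (0, 65536, 4 * 65536 + 10):
        print(off, "->", locate(off, 65536, 4))
```

With a layout like this, a large sequential transfer touches every server in turn, so the servers can work in parallel, while a small transfer lands on a single server. That is one plausible reading of why large transfers benefit more than small ones.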
As for doing a user-level NFS striping thing, I think it HAS been done, but probably not in a form of any use to us. I have thought about doing exactly what you propose many times in the past so that I could compare that approach to PVFS in terms of performance. So, I guess my response is yes, if you have time to write some code like that and make it available, do it. BTW, if you DO decide to do it, you might want to talk with us about setting up a user-level file system - we've spent a couple of years figuring out how to make it work smoothly. Of course, the BEST thing is to integrate it with the kernel via the VFS, but for initial development user-level code is much easier to work with. Walt -- Dr. Walter B. Ligon III Associate Professor ECE Department Clemson University From zatonsk@cacr.ioc.ac.ru Tue, 20 Oct 1998 09:41:07 -0400 Date: Tue, 20 Oct 1998 09:41:07 -0400 From: Zatonsky George V. zatonsk@cacr.ioc.ac.ru Subject: File size limit for Linux Petr Ladislav Kodym wrote: > > Hi, > > Could anybody tell me what's the maximum file size limit for Linux on > Intel and Alpha platforms ? > > Many thanks in advance. > > Petr Dear Petr, a month ago I investigated this problem. For 32-bit OSes (including Linux) the maximum file size is 2 GB. This is absolute, because the kernel internally uses calls with a signed long int as the file offset parameter. Alpha uses a 64-bit OS, and hence the max file size is several TB (I don't remember exactly). There is a kernel patch I have not tried yet (I am going to test it this week): ftp://atrey.karlin.mff.cuni.cz//pub/local/mj/linux/smugfs-0.0.tar.gz. This patch will allow large files under Linux. The second alternative is filesystem compression. I have found DOUBLE in the sunsite archive disk sets. This is also untested. I would greatly appreciate it if anybody who has experience would share the results :). George Below I post some replies to the same question I asked a month ago: From:Colin Plumb > I have a program which easily produces large (>2 GB) files. 
Is it possible > to handle such files under ext2fs? Short answer: no. Longer answer: only on a 64-bi Linux port, like Linux-Alpha. It's not ext2, but the entire VFS layer that can't deal with files > 2 GB, and fixing that is, as they say, "non-trivial". (By which we mean "very hard".) It will be done, but it's going to affect *every* file system and so won't be in 2.2. -- -Colin > On 64-bit platforms, ext2fs will handle large files. Handling large > files on 32-bit platforms requires more development work, and programs > would have to use an alternative API to access such files, since the 2GB > limit is really a 32-bit signed integer limit, and most of the system > types involved are 32-bit on 32-bit platforms. From: Martin Mares I don't see any large problem on the API side. It seems to me that the only syscalls affected are of the stat() family. Some time ago, I tried to implement a 64-bit clean filesystem and aside from the stat problem, everything else was relatively simple (except for several filesystems doing divisions on file offsets and us not linking libgcc with the kernel). You can look at the result at ftp://atrey.karlin.mff.cuni.cz/ /pub/local/mj/linux/smugfs-0.0.tar.gz (very experimental code, but seems to work well). From: Matti Aarnio The problem is within the VFS layer, and at the syscalls. That layer carries thru only 32-bit SIGNED offsets, and stat() et.al. carry 32-bit (unsigned?) sizes, which mean the current interface to user-space can handle only 2 GB files even when late 2.1 series can handle up to 4GB files in the EXT2 filesystem code itself. I recall the stopper against doing 64-bit API at 32-bit platforms has been partly the VFS, and mostly buffer-cache. From: "Stephen C. Tweedie" Hi, On Wed, 30 Sep 1998 15:48:02 +0300 (EEST), Matti Aarnio said: > The problem is within the VFS layer, and at the syscalls. > That layer carries thru only 32-bit SIGNED offsets, and > stat() et.al. carry 32-bit (unsigned?) 
sizes, which mean > the current interface to user-space can handle only 2 GB > files even when late 2.1 series can handle up to 4GB files > in the EXT2 filesystem code itself. Yep. The user-visible off_t is a signed int (and lseek() semantics absolutely require signed, not unsigned, values). > I recall the stopper against doing 64-bit API at 32-bit > platforms has been partly the VFS, and mostly buffer-cache. The buffer cache has nothing to do with the matter: buffers are indexed by physical block numbers, which get converted along the way into sector offsets, so the only limit in the buffer cache is 2^31 * 512 (bytes-per-sector) which gives an upper limit of 1TB per physical block device. > The buffer-cache system needs to have file offsets for > each block, you see.. Reworking those into high-performance > 64-bit-at-32-bit-systems was deemed too late in the cycle > some 6 months ago when this was brought up. It's the page cache and virtual memory system which is the bigger problem, not the buffer cache and block device IO system. The whole virtual memory system assumes that the fundamental VM units, pages and vm areas, correspond to int byte offsets into files. > Which way ever, I would like to see the necessary new > syscalls being implemented even if our underlying facilities > are not up to speed in all parts. For a reference, see: > http://www.sas.com/standards/large.file/index.html That LFS standard is already implemented in glibc as of 2.1. It's only the kernel interface which remains to be done. --Stephen -- \\\|/// \\ ~ ~ // ( @ @ ) *-------oOOo-(_)-oOOo-------* George V.Zatonsky NMR laboratory N.D. Zelinsky Institute of Organic Chemistry, Leninsky pr., 47 Moscow 117913 Russia e-mail: zatonsk@nmr1.ioc.ac.ru tel.: (7-095) 135-9094 fax : (7-095) 135-5328 From rgb@phy.duke.edu Tue, 20 Oct 1998 10:48:34 -0400 Date: Tue, 20 Oct 1998 10:48:34 -0400 From: Robert G. Brown rgb@phy.duke.edu Subject: MFLOPS of different cpu's (fwd) (From Bob Glamm) ... 
> Can you bounce > what I just sent you to the beowulf list? I keep forgetting that > the 'r' key sends it back to the sender & not the list... > > Thanks! > > -Bob rgb Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu ---------- Forwarded message ---------- Date: Tue, 20 Oct 1998 10:48:34 -0400 From: Bob Glamm To: "Robert G. Brown" Subject: Re: MFLOPS of different cpu's > > NOTE: this thread should move to comp.arch. > > I'm not completely convinced of that. I think that serious beowulf > design requires a fairly deep knowledge of parallel architectures, and > this has been a most invigorating discussion from which I have learned a > great deal. Some of it may even be useful in beowulfs. For example, it Robert is absolutely correct here. For high-performance computing (which is what beowulf is) it is essential to have an excellent understanding of the underlying architecture AND of any ideas that may contribute to increasing efficiency/performance/workload throughput. The transfer of novel ideas from one paradigm to another (i.e. Tera's fast-context switching to more efficient message-passing/faster workload throughput) is how a lot of new and good research and development is done, especially in the area of high-performance computing. It also helps the discussion base out. There are three camps on this mailing list: the "complete newbie but interested in parallel computing" camp, the "scientist with a varying level of computer architecture knowledge" camp, and the "advanced degree in comp. arch/comp. eng." camp. Any transfer of knowledge that can be achieved from the latter camps to the previous camps (or vice versa :) to advance the state of the beowulf community in general (as this discussion apparently did) should be encouraged. 
-Bob From eroman@dirac.ams.sunysb.edu Tue, 20 Oct 1998 12:44:41 -0400 Date: Tue, 20 Oct 1998 12:44:41 -0400 From: eroman@dirac.ams.sunysb.edu eroman@dirac.ams.sunysb.edu Subject: Clusters running 2.1 don't scale. I'm running a cluster of 70 Dual Pentium-II workstations. We use Red Hat Linux 5.1 with kernel version 2.1.125. Since 2.1.99 or so, MPICH has not worked properly. If I run the NAS Benchmarks, or our own code, the jobs die after about 1 minute. One node will report a load of 1. If I netstat on that node I see the following: tcp 0 7796 star3.messier:1160 star4.messier:1139 ESTABLISHED We see that a message is stuck in a send buffer somewhere. We're using p4 for communication and Intel EtherExpress Pro 100B cards. I've tried recompiling MPICH-1.1.0, changing the kernel versions, installing the latest EtherExpress Pro drivers, and modifying the p4 code. Nothing fixes the problem. We did not have this problem on kernel version 2.0.33. No one here knows enough about the networking stack to fix the problem, or even debug it. I'd like to see this fixed. Our cluster now has 128 processors and we're looking forward to giving it some real work to do. Until this problem is solved though, our machine is worthless. -- Eric Roman Department of Applied Mathematics (516)632-8545 SUNY/Stony Brook From lindahl@cs.virginia.edu Tue, 20 Oct 1998 14:38:49 -0400 Date: Tue, 20 Oct 1998 14:38:49 -0400 From: Greg Lindahl lindahl@cs.virginia.edu Subject: Clusters running 2.1 don't scale. > I'm running a cluster of 70 Dual Pentium-II workstations. We use Red > Hat Linux 5.1 with kernel version 2.1.125. Since 2.1.99 or so, MPICH > has not worked properly. If you run non-production software, you may run into problems. 2.1 series kernels are often very stable, which has lulled some folks into a false sense of security. If you really want to debug this, you can strace processes on various nodes and see where they are. But it sounds like a kernel bug.
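[Editorial aside] Before reaching for strace, it may help to find which connections actually have data sitting in a send buffer, as in the netstat line Eric quotes. The following is a minimal sketch, assuming the column layout shown in that line (protocol, Recv-Q, Send-Q, local address, remote address, state). The function name `stuck_sends` and the `min_bytes` threshold are hypothetical, made up for illustration. A non-zero Send-Q in a single sample is not proof of a hang, so it would make sense to sample twice a few seconds apart and compare.

```python
def stuck_sends(netstat_text, min_bytes=1):
    """Return (local, remote, send_q) for ESTABLISHED TCP lines with queued data.

    Assumes whitespace-separated columns in the order:
    proto, Recv-Q, Send-Q, local address, remote address, state.
    """
    stuck = []
    for line in netstat_text.splitlines():
        fields = line.split()
        # Skip headers, blank lines, and non-TCP sockets.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        _proto, _recv_q, send_q, local, remote, state = fields[:6]
        if state != "ESTABLISHED" or not send_q.isdigit():
            continue
        if int(send_q) >= min_bytes:
            stuck.append((local, remote, int(send_q)))
    return stuck


# The line from the message above, used as sample input.
sample = "tcp 0 7796 star3.messier:1160 star4.messier:1139 ESTABLISHED"
print(stuck_sends(sample))
# -> [('star3.messier:1160', 'star4.messier:1139', 7796)]
```

Run against the netstat output of each node, this would narrow down which pair of hosts to strace first.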
-- g From bug@ruff.cs.jmu.edu Tue, 20 Oct 1998 15:02:58 -0400 Date: Tue, 20 Oct 1998 15:02:58 -0400 From: David Wilburn bug@ruff.cs.jmu.edu Subject: File size limit for Linux Is this to say that Alpha Linux automagically supports >2GB files, or does it require a patch to VFS and/or ext2? -Dave Wilburn * David Wilburn, a.k.a. "Bug" * JMU Computer Science Student * Boycott naugahyde! Save the naugas! On Tue, 20 Oct 1998, Zatonsky George V. wrote: > a month ago I've investigated this problem. For 32-OSes (include Linux) > the upper file size is 2Gb. Absolutely; because kernel inside uses call > with signed long int as parameter for file offset. Alpha use 64bit OS, > and hence max file size is several TB (I don't remember exactly). There > is a patch to kernel I have not tried yet (I am going to test it this > week): > ftp://atrey.karlin.mff.cuni.cz//pub/local/mj/linux/smugfs-0.0.tar.gz. > This patch will allow large files under linux. > The second alternative is filesystem compressing. I have found DOUBLE > from sunsite archive disk sets. This also not tested. > I would greatly appreciate if anybody have experience and share me the > results :). > > George > From gropp@mcs.anl.gov Tue, 20 Oct 1998 15:18:58 -0400 Date: Tue, 20 Oct 1998 15:18:58 -0400 From: William Gropp gropp@mcs.anl.gov Subject: MFLOPS of different CPUs At 01:10 PM 10/19/98 -0400, Prachya Chalermwat wrote: >Hi all, > >Thanks for all your very useful information and discussions. Although the >peak MFLOPS is sometimes meaningless, it is still something that we all >want to be close to. > >Question! If we have N pipelines (FP units), 1 FLOPS/cycle, and C clock >speed, will we have a peak of > > N flop/cycle x C cycles/sec FLOPS (theoretically)? > >What are the limitations of increasing the number of FP units besides the >data dependency problem? For a lot of calculations, there is another simple number. If your calculation won't fit in cache, then the bandwidth to memory can be the controlling factor.
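[Editorial aside] The two numbers in this exchange can be put side by side: the peak rate from Prachya's question (N FP units times C cycles per second) and a memory-bandwidth ceiling of the kind Bill raises. This is a rough sketch only. The function names and the sample figures (2 FP units, 350 MHz, 400 MB/s to main memory) are hypothetical and not measurements from any machine on the list. The default of 8 bytes per flop assumes one double-precision operand fetched from main memory per floating-point operation.

```python
def peak_flops(fp_units, clock_hz, flops_per_unit_per_cycle=1):
    """Theoretical peak: N units x flops per cycle per unit x C cycles/sec."""
    return fp_units * flops_per_unit_per_cycle * clock_hz


def bandwidth_bound_flops(mem_bytes_per_sec, bytes_per_flop=8):
    """Ceiling when every flop needs bytes_per_flop bytes from main memory."""
    return mem_bytes_per_sec / bytes_per_flop


def attainable_flops(fp_units, clock_hz, mem_bytes_per_sec, bytes_per_flop=8):
    """The smaller of the compute peak and the memory-bandwidth ceiling."""
    return min(peak_flops(fp_units, clock_hz),
               bandwidth_bound_flops(mem_bytes_per_sec, bytes_per_flop))


# Hypothetical node: 2 FP units at 350 MHz, 400 MB/s memory bandwidth.
peak = peak_flops(2, 350e6)
bound = bandwidth_bound_flops(400e6)
print(f"peak {peak / 1e6:.0f} MFLOPS, memory-bound {bound / 1e6:.0f} MFLOPS")
# -> peak 700 MFLOPS, memory-bound 50 MFLOPS
```

With these made-up figures the memory ceiling is far below the compute peak, which is why adding FP units alone would not help an out-of-cache calculation.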
For things like sparse matrix-vector multiply, you very roughly top out at M bytes/second / 8 bytes/flop = M/8 FLOPS (theoretically) (This assumes a perfect cache and ignores the vectors and any non-floating point data in the sparse matrix data structure; M is the bandwidth of access to main memory). Even this is an overestimate :) Bill From kragen@pobox.com Tue, 20 Oct 1998 16:55:32 -0400 Date: Tue, 20 Oct 1998 16:55:32 -0400 From: Kragen kragen@pobox.com Subject: MFLOPS of different cpu's (fwd) On Tue, 20 Oct 1998, Robert G. Brown wrote: > ---------- Forwarded message ---------- > From: Bob Glamm > It also helps the discussion base out. There are three camps on this > mailing list: the "complete newbie but interested in parallel computing" > camp, the "scientist with a varying level of computer architecture > knowledge" camp, and the "advanced degree in comp. arch/comp. eng." camp. Hey, you forgot me, the "guy who reads a lot about computer architecture and likes to talk about it but has never designed or built a computer" camp. ;) Kragen -- Kragen Sitaker A well designed system must take people into account. . . . It's hard to build a system that provides strong authentication on top of systems that can be penetrated by knowing someone's mother's maiden name. -- Schneier From lindahl@cs.virginia.edu Tue, 20 Oct 1998 17:48:10 -0400 Date: Tue, 20 Oct 1998 17:48:10 -0400 From: Greg Lindahl lindahl@cs.virginia.edu Subject: MFLOPS of different cpu's (fwd) > It also helps the discussion base out. There are three camps on this > mailing list: the "complete newbie but interested in parallel computing" > camp, the "scientist with a varying level of computer architecture > knowledge" camp, and the "advanced degree in comp. arch/comp. eng." camp. Don't forget the camp of people who think that mailing lists should stick to a topic, and that the topic of this list is commodity cluster computing, not the architecture of the Tera. -- g p.s.
Sid mentioned today that it finally runs Unix. From shachar@vipe.technion.ac.il Wed, 21 Oct 1998 10:57:31 -0400 Date: Wed, 21 Oct 1998 10:57:31 -0400 From: Shachar Tal shachar@vipe.technion.ac.il Subject: small switches, also Tera Hi, On Mon, 19 Oct 1998, Mark Hahn wrote: > price ports vendor, model, comments > > 555 8 dlink des-1008 store-forward, 148,800 pps/port(hd) Our comm dept. has bought several of these. I asked them, and they say its performance is slightly below that of bigger switches. Its measured pps rate is about 130Kpps (2 computers (PII-350s) directly attached, in an FTP transfer of a 300MB file). It also slows down when there is a lot of traffic. > I'd very much appreciate anyone who has practical comments on these > switches, and/or pointers to real technical specs (rather than the > marketing crap most readily accessible...) I'll get their spec sheets (if we have them) and mail the list. Shachar Tal ------------- Taub Computer Center, Technion, Israel Institute of Technology KeyID 0481FEF1 fingerprint = 52 1B 97 6A F2 77 AE C6 64 B6 5A 5E 14 28 8E 7E From fraserf@dove.net.au Wed, 21 Oct 1998 11:16:04 -0400 Date: Wed, 21 Oct 1998 11:16:04 -0400 From: Fraser Farrell fraserf@dove.net.au Subject: minimal Beowulf? G'day all, Recently I've acquired four old 386's with network cards and I'm considering setting them up as a minimal Beowulf cluster. Being new to Linux in general I'm wondering if these machines are even capable of running a Beowulf? Here's the hardware: - 386 cpu, 4MB RAM - 80 or 100MB HDD (DR-DOS 7 + Personal Netware installed for testing purposes) - Hercules mono monitors (yes, the original 720x348 pixels) - 10Mbps NICs (no docs, but all behave as NE2000's) I guess the HD space is going to be a problem. The RedHat installation on my main computer ate about 200MB, although I've seen much smaller examples on other machines. Questions: 1 - is it possible on this hardware?
I know Linux itself will work, I'm wondering about all the extra components for "Beowulfing" 2 - suggestions for FAQs to read, absolutely essential files to install, whatever. I'm still learning. cheers, Fraser Farrell From hatridge@straubing.baynet.de Wed, 21 Oct 1998 12:01:14 -0400 Date: Wed, 21 Oct 1998 12:01:14 -0400 From: Jim Hatridge hatridge@straubing.baynet.de Subject: MFLOPS of different cpu's On Mon, 19 Oct 1998, Eugene Leitl wrote: > P.S. Aargh! Still no data about new fast $50/port switches. Are they > mythical, or what? Hi All, Well, don't know how fast they are, but I saw down at the local "PC World" computer shop two switches for sale. One has 8 ports and cost 135 DM (ca. 83U$) and the other has 16 ports and cost 399 DM (ca. 245 U$). Is this what you were writing about? JIM ----------------------------------------- Jim Hatridge Germany hatridge@straubing.baynet.de M$ -- Ghostdriver* on the road to the future! (*German Slang for the guy driving on the wrong side of the road!) From cbohn@afit.af.mil Wed, 21 Oct 1998 12:19:08 -0400 Date: Wed, 21 Oct 1998 12:19:08 -0400 From: Bohn, Christopher A. cbohn@afit.af.mil Subject: "perf" patch on Extreme Linux CD Good day, Now that I've got our system nominally operational (that is, operational enough that we can work on our thesis efforts), I'm starting into the "features" phase. The one that's stumping right now is the "perf" patch to the Linux kernel that's part of the installation on the Extreme Linux CD. I'm attempting to run the examples found in /usr/doc/libperf-0.3/examples directory. When I attempt to run the examples, the system replies "perf_reset: function not implemented". The two explanations I can think of are a) There's something else (undocumented) I need to do in the source code, at compile time, or at runtime. b) The patch wasn't compiled into the kernel on the Extreme Linux CD. Naturally, I really hope it's (a). If so, what is it that I'm overlooking? 
If it's (b), then before I start mucking around with recompiling the kernel, are there any other Beowulf additions that weren't included in the kernel on the CD (in particular, channel bonding), so I can hit them all in one stroke? Here's the supplied Makefile:

CFLAGS=-Wall -g -I..
LDFLAGS=-L.. -lperf
BINS=psys pfork pwait pflops

all: $(BINS)

clean:
	rm -f $(BINS)

psys: psys.c
	gcc $(CFLAGS) psys.c -o psys $(LDFLAGS)

pfork: pfork.c
	gcc $(CFLAGS) pfork.c -o pfork $(LDFLAGS)

pwait: pwait.c
	gcc $(CFLAGS) pwait.c -o pwait $(LDFLAGS)

pflops: pflops.c
	gcc $(CFLAGS) pflops.c -o pflops $(LDFLAGS)

and here's the first few lines from one of the examples:

#include
#include
#include
#include
#include

void main(int argc, char *argv[]) {
    int r, i, cnf;
    double a, b, c;
    unsigned long long ct[2];

    /* Run a bunch of these.. */
    for (i=0; i < 4; i++)
        if (fork() == 0) break;
        else sleep(1);

    r = perf_reset();
    if (r) {
        perror("perf_reset");
        exit(1);
    }
    r = perf_set_config(0, PERF_FLOPS);
    [...]

FWIW, same outcome when using egcs instead of gcc. Thanks for your time, cb *-*-*-*-*-*-*-* Capt Christopher A. Bohn Graduate Student, Electrical (digital) Engineering Air Force Institute of Technology Phone (937)255-3636 (DSN 785) AFIT/EN638 Lab x4606 Voicemail x6638 2950 P St, Box 4638 email cbohn@afit.af.mil Wright-Patterson AFB OH 45433-7765 EngrBohn@aol.com http://members.aol.com/EngrBohn/ *-*-*-*-*-*-*-* From prachya@science.gmu.edu Wed, 21 Oct 1998 14:39:31 -0400 Date: Wed, 21 Oct 1998 14:39:31 -0400 From: Prachya Chalermwat prachya@science.gmu.edu Subject: minimal Beowulf? Hi, On Thu, 22 Oct 1998, Fraser Farrell wrote: > G'day all, > > Recently I've acquired four old 386's with network cards and I'm considering > setting them up as a minimal Beowulf cluster. Being new to Linux in > general I'm wondering if these machines are even capable of running a > Beowulf?
Here's the hardware: > > - 386 cpu, 4MB RAM > - 80 or 100MB HDD (DR-DOS 7 + Personal Netware installed for testing purposes) > - Hercules mono monitors (yes, the original 720x348 pixels) > - 10Mbps NICs (no docs, but all behave as NE2000's) > > I guess the HD space is going to be a problem. The RedHat installation > on my main computer ate about 200MB, although I've seen much smaller > examples on other machines. The HD space should be OK. On my slave nodes I used only about 50MB (Slackware 3.4 base + net + PVM + MPI). I put them in a tar ball that you can download, untar to your 80MB drive, boot with the Slackware boot disk (mount root=/dev/hda1), and then rerun lilo. The configuration expects to have /home mounted via nfs from the master node. Let me know if you want to try my "Beowulf Seed". I'm sure other beowulfers might have smaller size than what I have for their slave nodes. This works with my RH5.0 master node. Or you might want to try diskless option as an alternative. I think your 4MB RAM will be the major problem when trying to run some applications. Cheers :) --Prachya --------------------------------------------------------------------------- Prachya Chalermwat George Mason University Graduate Research Assistant (703) 993-4322 Computational Sciences and Informatics (FAX) 993-1980 George Mason University Email: prachya@science.gmu.edu 4400 University Drv. MSN-5C3, Fairfax, VA 22030-4444 --------------------------------------------------------------------------- URL: http://spaceops.science.gmu.edu "Imagination is more important than knowledge." A. Einstein From josip@icase.edu Wed, 21 Oct 1998 15:06:38 -0400 Date: Wed, 21 Oct 1998 15:06:38 -0400 From: Josip Loncaric josip@icase.edu Subject: Batch processing Can anyone suggest a batch processing system to handle job scheduling on a Beowulf class compute server? Sincerely, Josip -- Dr.
Josip Loncaric, Senior Staff Scientist mailto:josip@icase.edu ICASE, Mail Stop 403 http://www.icase.edu/~josip/ NASA Langley Research Center mailto:j.loncaric@larc.nasa.gov Hampton, VA 23681-2199, USA Tel. +1 757 864-2192 Fax +1 757 864-6134 From Jim.Mamay@UCHSC.edu Wed, 21 Oct 1998 15:32:29 -0400 Date: Wed, 21 Oct 1998 15:32:29 -0400 From: Jim.Mamay@UCHSC.edu Jim.Mamay@UCHSC.edu Subject: Extreme Linux install on Dell Optiplex Good day, I'm attempting to install Extreme Linux on a new dell optiplex. This unit has two internal scsi drives (no IDE). The SCSI Controller is a aha2940. The bios is AHA-2940U/UW Dual SCSI Bios V1.33S2. During, the standard install from cdrom, it does not pick up the scsi controller. If I select AHA2940 from the list - Fails to find the device. Regardless of module options it still fails. adapter is scsi id 7 IRQ : 10,14 i/o Port address : D800h, D400h controller bus : 00h device : 0Eh Drives and controller are functional. Any idea as to a course of action? Thanks for the help. Sincerely, Jim From mjabbur@microtecvision.com.br Wed, 21 Oct 1998 17:17:43 -0400 Date: Wed, 21 Oct 1998 17:17:43 -0400 From: Marlon Jabbur mjabbur@microtecvision.com.br Subject: Newbie Questions ? Hi List, I'm newbie to beowulf and like to know about some features, my interest is in High Availability of Software and Hardware Resources, can beowulf do this to me ?????? Thanks Marlon Jabbur From cklempay@acm.jhu.edu Wed, 21 Oct 1998 17:25:28 -0400 Date: Wed, 21 Oct 1998 17:25:28 -0400 From: Corbett J. Klempay cklempay@acm.jhu.edu Subject: Peak performance Our ACM has to put a blurb about our new cluster in the JHU CS newsletter...what would be a good performance thing to say about our cluster? It's 8 PII-350's connected by switched fast ethernet...each node with 128 mb of ram (master of 384), 6.4 gb on the compute nodes, and 2x9.1 GB USCSI drives on the master. ------------------------------------------------------------------------------ Corbett J. 
Klempay Quote of the Week: http://www2.acm.jhu.edu/~cklempay "Keep your faith in all beautiful things; in the sun when it is hidden, in the Spring when it is gone." PGP Fingerprint: 7DA2 DB6E 7F5E 8973 A8E7 347B 2429 7728 76C2 BEA1 ------------------------------------------------------------------------------ From deadline@plogic.com Wed, 21 Oct 1998 17:53:29 -0400 Date: Wed, 21 Oct 1998 17:53:29 -0400 From: Douglas Eadline deadline@plogic.com Subject: Peak performance On Wed, 21 Oct 1998, Corbett J. Klempay wrote: > Our ACM has to put a blurb about our new cluster in the JHU CS > newsletter...what would be a good performance thing to say about our > cluster? It's 8 PII-350's connected by switched fast ethernet...each node > with 128 mb of ram (master of 384), 6.4 gb on the compute nodes, and 2x9.1 > GB USCSI drives on the master. Running the NASA parallel benchmark suite would be a good thing to report - it proves you are running something in parallel. I do not have the url for it at my fingertips. Note that some of the benchmarks need a power of 2 CPUs so they will only work on 4 CPUs. I have been collecting such data, but have not had a chance to publish it. Doug ------------------------------------------------------------------- Paralogic, Inc. | PEAK | Voice:+610.861.6960 115 Research Drive | PARALLEL | Fax:+610.861.8247 Bethlehem, PA 18017 USA | PERFORMANCE | http://www.plogic.com ------------------------------------------------------------------- From cklempay@acm.jhu.edu Wed, 21 Oct 1998 20:18:07 -0400 Date: Wed, 21 Oct 1998 20:18:07 -0400 From: Corbett J. Klempay cklempay@acm.jhu.edu Subject: Newbie Questions ? It sounds like (what you're looking for) != (Beowulf). Beowulf is a type of clustering, but it's not failover/fault tolerant/high availability. It's all about high computing throughput. ------------------------------------------------------------------------------ Corbett J.
Klempay Quote of the Week: http://www2.acm.jhu.edu/~cklempay "Keep your faith in all beautiful things; in the sun when it is hidden, in the Spring when it is gone." PGP Fingerprint: 7DA2 DB6E 7F5E 8973 A8E7 347B 2429 7728 76C2 BEA1 ------------------------------------------------------------------------------ On Wed, 21 Oct 1998, Marlon Jabbur wrote: > Hi List, > > I'm newbie to beowulf and like to know about some features, my interest is > in High Availability of Software and Hardware Resources, can beowulf do > this to me ?????? > > Thanks > > Marlon Jabbur > From deadline@plogic.com Wed, 21 Oct 1998 20:31:01 -0400 Date: Wed, 21 Oct 1998 20:31:01 -0400 From: Douglas Eadline deadline@plogic.com Subject: Extreme Linux install on Dell Optiplex On Wed, 21 Oct 1998 Jim.Mamay@UCHSC.edu wrote: > Good day, > > I'm attempting to install Extreme Linux on a new > dell optiplex. This unit has two internal scsi > drives (no IDE). The SCSI Controller is a aha2940. The > bios is AHA-2940U/UW Dual SCSI Bios V1.33S2. > During, the standard install from cdrom, it does > not pick up the scsi controller. If I select AHA2940 > from the list - Fails to find the device. Regardless > of module options it still fails. > adapter is scsi id 7 > IRQ : 10,14 > i/o Port address : D800h, D400h > controller bus : 00h > device : 0Eh > > Drives and controller are functional. Get RH 5.1. The EL install has some problems with SCSI. Besides, the EL kernel is now kind of old. If you need Channel Bonding, you can hand patch dev.c. There are some good things on the EL disk that you can use once you get a current RH install complete. Doug ------------------------------------------------------------------- Paralogic, Inc.
| PEAK | Voice:+610.861.6960 115 Research Drive | PARALLEL | Fax:+610.861.8247 Bethlehem, PA 18017 USA | PERFORMANCE | http://www.plogic.com ------------------------------------------------------------------- From jhpark@nurapt.kaist.ac.kr Wed, 21 Oct 1998 21:01:06 -0400 Date: Wed, 21 Oct 1998 21:01:06 -0400 From: jhpark@nurapt.kaist.ac.kr jhpark@nurapt.kaist.ac.kr Subject: Batch processing Visit, http://www.erc.msstate.edu/thrusts/cps/hector/main/Compare.html and Choose one On Wed, 21 Oct 1998, Josip Loncaric wrote: > Can anyone suggest a batch processing system to handle job scheduling on > a Beowulf class compute server? > > Sincerely, > Josip > > -- > Dr. Josip Loncaric, Senior Staff Scientist mailto:josip@icase.edu > ICASE, Mail Stop 403 http://www.icase.edu/~josip/ > NASA Langley Research Center mailto:j.loncaric@larc.nasa.gov > Hampton, VA 23681-2199, USA Tel. +1 757 864-2192 Fax +1 757 864-6134 > From cbohn@afit.af.mil Wed, 21 Oct 1998 22:06:02 -0400 Date: Wed, 21 Oct 1998 22:06:02 -0400 From: Capt Bohn, Christopher A. cbohn@afit.af.mil Subject: "perf" patch on Extreme Linux CD Good day, Thanks for the reply. I understand why the perf patch wasn't precompiled into the kernel. Do I understand correctly, though, that even though the Pentium II has the same core as the Pentium Pro, perf will not work with the Pentium II? Thanks again, cb *-*-*-*-*-*-*-* Capt Christopher A. Bohn Graduate Student, Electrical (digital) Engineering Air Force Institute of Technology     Phone (937)255-3636 (DSN 785) AFIT/EN638                              Lab x4606   Voicemail x6638 2950 P St, Box 4638                         email cbohn@afit.af.mil Wright-Patterson AFB OH 45433-7765                 EngrBohn@aol.com                http://members.aol.com/EngrBohn/ *-*-*-*-*-*-*-* -----Original Message----- From: Erik Arjan Hendriks [mailto:hendriks@cesdis.gsfc.nasa.gov] Sent: Wednesday, October 21, 1998 12:40 PM To: Bohn, Christopher A. 
Subject: Re: "perf" patch on Extreme Linux CD On Wed, 21 Oct 1998, Bohn, Christopher A. wrote: > Good day, > > Now that I've got our system nominally operational (that is, operational > enough that we can work on our thesis efforts), I'm starting into the > "features" phase. > > The one that's stumping right now is the "perf" patch to the Linux kernel > that's part of the installation on the Extreme Linux CD. I'm attempting to > run the examples found in /usr/doc/libperf-0.3/examples directory. When I > attempt to run the examples, the system replies "perf_reset: function not > implemented". > > The two explanations I can think of are > a) There's something else (undocumented) I need to do in the source code, at > compile time, or at runtime. > b) The patch wasn't compiled into the kernel on the Extreme Linux CD. > > Naturally, I really hope it's (a). If so, what is it that I'm overlooking? Sorry, it's (b). There's a good reason for it though. That patch is specific to the Pentium Pro. It adds steps to the context switching that save and restore the counter registers. I believe there are similar facilities on the Pentium and Pentium II but they're almost certainly not the same. Since I don't know how this will behave on those chips or non-intel chips, it's not built in by default. To build it, you'll have to select PPro (686) as your processor type and then the performance counter support will become available as an option. > If it's (b), then before I start mucking around with recompiling the kernel, > are there any other Beowulf additions that weren't included in the kernel on > the CD (in particular, channel bonding), so I can hit them all in one > stroke? Every other patch should be turned on. (Channel bonding is in there.) You'll need the ifenslave utility available from our web site though.
http://beowulf/software/bonding.html - Erik ------------------------------------------------------------ Erik Hendriks hendriks@cesdis.gsfc.nasa.gov From cklempay@acm.jhu.edu Wed, 21 Oct 1998 23:10:51 -0400 Date: Wed, 21 Oct 1998 23:10:51 -0400 From: Corbett J. Klempay cklempay@acm.jhu.edu Subject: Doing a new install When I get these new machines in the near future for our ACM cluster, how does this work? I have a copy of EL...but is it better to install RH 5.1? Can I install RH 5.1 and then just get a few RPMs for the Beowulf functionality? (off of the EL CD, or perhaps the Beowulf site?) ------------------------------------------------------------------------------ Corbett J. Klempay Quote of the Week: http://www2.acm.jhu.edu/~cklempay "Keep your faith in all beautiful things; in the sun when it is hidden, in the Spring when it is gone." PGP Fingerprint: 7DA2 DB6E 7F5E 8973 A8E7 347B 2429 7728 76C2 BEA1 ------------------------------------------------------------------------------ From wrankin@ee.duke.edu Thu, 22 Oct 1998 09:15:28 -0400 Date: Thu, 22 Oct 1998 09:15:28 -0400 From: William T. Rankin wrankin@ee.duke.edu Subject: Extreme Linux install on Dell Optiplex On Wed, 21 Oct 1998, Douglas Eadline wrote: > On Wed, 21 Oct 1998 Jim.Mamay@UCHSC.edu wrote: > > > Good day, > > > > I'm attempting to install Extreme Linux on a new > > dell optiplex. This unit has two internal scsi > > drives (no IDE). The SCSI Controller is a aha2940. > > Get Rh 5.1. The EL install has some probles with SCSI. Besides the EL > kernel is now kind of old. You will also most likely (I believe) need to download a new install disk with the new aic7xxx SCSI drivers. I believe that RedHat has it available on their ftp site (since Doug Ledford is now working for them) (let me bring up netscape and check here... ahhh, here we go!) 
ftp://ftp.redhat.com/pub/aic If this site is busy, you can also use the old site: ftp://ftp.dialnet.net/pub/linux/aic7xxx There are READMEs which explain the problems and fixes. Hope this helps, -bill From Jim.Mamay@UCHSC.edu Thu, 22 Oct 1998 10:27:41 -0400 Date: Thu, 22 Oct 1998 10:27:41 -0400 From: Jim.Mamay@UCHSC.edu Jim.Mamay@UCHSC.edu Subject: Extreme Linux install on Dell Optiplex - THANKS To all, thank you all for all your excellent responses. I'm online and operational. ;-) Thanks much sincerely, Jim From forrest@esd.ornl.gov Thu, 22 Oct 1998 11:17:20 -0400 Date: Thu, 22 Oct 1998 11:17:20 -0400 From: Forrest Hoffman forrest@esd.ornl.gov Subject: Commodity RAID pricing (Was Re: High availability?) John W. Cobb wrote: > What's been my biggest pain in bringing this system on line? > Well, the system is sold w/o disks. No problem, I just buy disks on the > open market. However, the disk carriers to hold the hot-swap disks in place > are a single source item. Only IBM sells it. Well, the reason the 704 is > currently so cheap is that IBM has discontinued it and resellers are > offering it marked down. IBM looked at the product as ending it's lifetime, > but they have currently been swamped with reseller's customers wanting a > dozen carriers per server --- a demand they did not anticipate (they were > discontinuing the servers right? ). Well, I have had to wait 4 weeks for > carriers to come in. Their price was only $22/carrier -- not too expensive, > although they are just a piece of plastic. However, the real hassle has > been the wait and not knowing when they would arrive. The situation was so > critical that I saw some carriers auctioned on Onsale for in excess of $100 > a piece -- sheesh.
You might try finding mounting rails and brackets from Arrowfield International at http://www.arrowfieldinc.com/ They seem to have almost everything imaginable, but I don't know how their pricing goes. -- Forrest Hoffman forrest@esd.ornl.gov Environmental Sciences Division http://www.esd.ornl.gov/~forrest Oak Ridge National Laboratory (423) 576-7680 MS 6036, Building 1505, Room 216 (423) 576-8543 fax P.O. Box 2008 36° 1' 35" N 84° 11' 55" W Oak Ridge TN 37831-6036 PGP fingerprint = 4F D4 F4 51 F4 C0 6C 10 01 58 01 84 10 B6 67 1E From prachya@science.gmu.edu Thu, 22 Oct 1998 11:54:59 -0400 Date: Thu, 22 Oct 1998 11:54:59 -0400 From: Prachya Chalermwat prachya@science.gmu.edu Subject: Extreme Linux install on Dell Optiplex On Wed, 21 Oct 1998, Douglas Eadline wrote: > Get Rh 5.1. The EL install has some probles with SCSI. Besides the EL > kernel is now kind of old. If you need Channel Bonding, you can > hand patch dev.c There are some good things on the EL disk that ^^^^^^^^^^^^^^^^ Could you explain more? I'm interested to understand more about channel bonding.. There is also patches (for 2.0.35) and a boot image for aic7xxx at ftp://ftp.dialnet.net/pub/linux/aic7xxx/ The RH5.1 does not recognize our 2940U2W adapter.
--Prachya --------------------------------------------------------------------------- Prachya Chalermwat George Mason University Graduate Research Assistant (703) 993-4322 Computational Sciences and Informatics (FAX) 993-1980 George Mason University Email: prachya@science.gmu.edu 4400 University Drv. MSN-5C3, Fairfax, VA 22030-4444 --------------------------------------------------------------------------- URL: http://spaceops.science.gmu.edu "Imagination is more important than knowledge." A. Einstein From prachya@science.gmu.edu Thu, 22 Oct 1998 12:15:01 -0400 Date: Thu, 22 Oct 1998 12:15:01 -0400 From: Prachya Chalermwat prachya@science.gmu.edu Subject: Doing a new install On Wed, 21 Oct 1998, Corbett J. Klempay wrote: > When I get these new machines in the near future for our ACM cluster, how > does this work? I have a copy of EL...but is it better to install RH 5.1? > Can I install RH 5.1 and then just get a few RPMs for the Beow