From jdmelo at leca.ufrn.br Fri Jun 2 06:10:54 2000 From: jdmelo at leca.ufrn.br (Jorge Dantas de Melo) Date: Sun Sep 7 01:00:20 2008 Subject: Mirinet and Beowulf Message-ID: <3937B25E.C556CD45@leca.ufrn.br> Hi, We are interested to build a Beowulf cluster using Mirinet. Could anyone give us some informations about research groups which have done the same? Thanks, Prof. Jorge Dantas de Melo Computing Engineering and Automation Laboratory Federal University of Rio Grande do Norte From fryman at lw.net Fri Jun 2 08:09:28 2000 From: fryman at lw.net (J. Fryman) Date: Sun Sep 7 01:00:20 2008 Subject: Jobs with Beowulf systems? Message-ID: <3937CE28.99FD99DB@lw.net> Hi all, Is there any location of jobs available working on/with/etc beowulf and similar cluster systems? I've check the obvious places, but haven't turned up anything positive. Tips and pointers would be welcome. Josh Fryman fryman@lw.net From glindahl at hpti.com Fri Jun 2 08:22:24 2000 From: glindahl at hpti.com (Greg Lindahl) Date: Sun Sep 7 01:00:20 2008 Subject: Jobs with Beowulf systems? In-Reply-To: <3937CE28.99FD99DB@lw.net> Message-ID: <003801bfcca6$5acd3aa0$0932fea9@hptilap.hpti.com> > Is there any location of jobs available working on/with/etc beowulf and > similar cluster systems? I've check the obvious places, but haven't > turned up anything positive. This would be a cool thing to have -- I'd like to be able to point my recruiting people at a website and say "advertise *here*, find candidates *here*..." -- greg From rgb at phy.duke.edu Fri Jun 2 08:59:16 2000 From: rgb at phy.duke.edu (Robert G. Brown) Date: Sun Sep 7 01:00:20 2008 Subject: Jobs with Beowulf systems? In-Reply-To: <003801bfcca6$5acd3aa0$0932fea9@hptilap.hpti.com> Message-ID: On Fri, 2 Jun 2000, Greg Lindahl wrote: > > Is there any location of jobs available working on/with/etc beowulf and > > similar cluster systems? I've check the obvious places, but haven't > > turned up anything positive. 
> > This would be a cool thing to have -- I'd like to be able to point my > recruiting people at a website and say "advertise *here*, find candidates > *here*..." Well, I personally certainly don't object to a limited amount of recruiting via the list (having used it myself to that effect in the past). Posting beowulf-specific job openings one time, or posting an announcement one time that you are beowulf-skilled and looking for work (see attached CV) should likely be tolerated, as the community in either direction is likely to be small and in many cases connected only via this list. However, I agree that a "beowulf bulletin board" would be a useful thing to have and is more appropriate in the long run. Dwight was planning to set one up on www.supercomputer.org; or it might be a good idea to add an associated pair of lists to the mailman server on scyld.com when this is all running. beowulf-jobs is a reasonable list to have, and because mailman archives the list in web-accessible form, it would be very simple for employers or jobseekers to post there or search for recent posts. rgb Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From feldy at myri.com Fri Jun 2 09:17:25 2000 From: feldy at myri.com (Bob Felderman) Date: Sun Sep 7 01:00:20 2008 Subject: Mirinet and Beowulf Message-ID: <200006021617.JAA21988@myri.com> => Hi, => We are interested to build a Beowulf cluster using Mirinet. Could anyone => give us some informations about research groups which have done the => same? => Thanks, => Prof. 
Jorge Dantas de Melo => Computing Engineering and Automation Laboratory => Federal University of Rio Grande do Norte Here's an old list of some projects http://www.myri.com/myrinet/customer_projects/index.html From vor+ at pitt.edu Fri Jun 2 16:09:44 2000 From: vor+ at pitt.edu (Victor Ortega) Date: Sun Sep 7 01:00:20 2008 Subject: Jobs with Beowulf systems? In-Reply-To: <003801bfcca6$5acd3aa0$0932fea9@hptilap.hpti.com> Message-ID: On Fri, 2 Jun 2000, Greg Lindahl wrote: > > Is there any location of jobs available working on/with/etc beowulf and > > similar cluster systems? I've check the obvious places, but haven't > > turned up anything positive. > > This would be a cool thing to have -- I'd like to be able to point my > recruiting people at a website and say "advertise *here*, find candidates > *here*..." I agree--sounds like a good idea. Perhaps in a corner on beowulf.org. Victor From jok707s at mail.smsu.edu Sat Jun 3 03:55:53 2000 From: jok707s at mail.smsu.edu (jok707s@mail.smsu.edu) Date: Sun Sep 7 01:00:20 2008 Subject: Stock Trading &c Message-ID: <39357206@caliber> Does anyone have info on the porting of stock trading software to clusters? For example, there is a list of financial/stock programs at: http://linux.com/links/Software/Financial/ How many of these programs are worth parallelizing? Who has actually tried it? Joel From deadline at plogic.com Sat Jun 3 07:53:16 2000 From: deadline at plogic.com (Douglas Eadline) Date: Sun Sep 7 01:00:20 2008 Subject: Jobs with Beowulf systems? In-Reply-To: Message-ID: On Fri, 2 Jun 2000, Victor Ortega wrote: > On Fri, 2 Jun 2000, Greg Lindahl wrote: > > > Is there any location of jobs available working on/with/etc beowulf and > > > similar cluster systems? I've check the obvious places, but haven't > > > turned up anything positive. > > > > This would be a cool thing to have -- I'd like to be able to point my > > recruiting people at a website and say "advertise *here*, find candidates > > *here*..." 
> > I agree--sounds like a good idea. Perhaps in a corner on beowulf.org. Or perhaps on Beowulf Underground Doug ------------------------------------------------------------------- Paralogic, Inc. | PEAK | Voice:+610.814.2800 130 Webster Street | PARALLEL | Fax:+610.814.5844 Bethlehem, PA 18015 USA | PERFORMANCE | http://www.plogic.com ------------------------------------------------------------------- From seth at hogg.org Sat Jun 3 09:46:07 2000 From: seth at hogg.org (Simon Hogg) Date: Sun Sep 7 01:00:20 2008 Subject: Mixed distros in one cluster Message-ID: <4.3.1.2.20000603174137.00b365d0@icex5.cc.ic.ac.uk> Is there an inherent drawback in using different distributions in one cluster (apart from more complicated maintenance)? They should all work together, anyway, right? Suse, Redhat and Debian is what I've got - are there any special considerations for this combination that anyone can think of? Of course, I will migrate everything to one distro at some time (probably Debian) but different people want to 'play' with different distros, and this is not a production cluster, so it might even make things more interesting! -- Simon Hogg From rbross at parl.ces.clemson.edu Sat Jun 3 11:07:18 2000 From: rbross at parl.ces.clemson.edu (Rob Ross) Date: Sun Sep 7 01:00:20 2008 Subject: Jobs with Beowulf systems? In-Reply-To: Message-ID: People are welcome to use the "Announcements" section of Beowulf Underground to announce job openings. Rob On Sat, 3 Jun 2000, Douglas Eadline wrote: > On Fri, 2 Jun 2000, Victor Ortega wrote: > > > On Fri, 2 Jun 2000, Greg Lindahl wrote: > > > > Is there any location of jobs available working on/with/etc beowulf and > > > > similar cluster systems? I've check the obvious places, but haven't > > > > turned up anything positive. > > > > > > This would be a cool thing to have -- I'd like to be able to point my > > > recruiting people at a website and say "advertise *here*, find candidates > > > *here*..." 
> > > > I agree--sounds like a good idea. Perhaps in a corner on beowulf.org. > > Or perhaps on Beowulf Underground > > Doug From rgb at phy.duke.edu Sat Jun 3 12:22:43 2000 From: rgb at phy.duke.edu (Robert G. Brown) Date: Sun Sep 7 01:00:20 2008 Subject: Mixed distros in one cluster In-Reply-To: <4.3.1.2.20000603174137.00b365d0@icex5.cc.ic.ac.uk> Message-ID: On Sat, 3 Jun 2000, Simon Hogg wrote: > Is there an inherent drawback in using different distributions in one > cluster (apart from more complicated maintenance)? > > They should all work together, anyway, right? > > Suse, Redhat and Debian is what I've got - are there any special > considerations for this combination that anyone can think of? Better you than me, is all I can say;-). Actually, I agree that it might be fun to play with and compare the different distros in a lab/beowulf setting -- if one had nothing else to do (like real work to do ON the clusters), and I've suggested to some Intel Brass that they consider funding such an effort at a public facility set up for that very purpose. However, I predict that you'll end up doing nearly three times as much work solving the same problems three different ways and building stuff for (possibly) three different library sets. Actually, RH and SuSE will probably coexist (both RPM based, similar libraries) but I think that Debian and RH/SuSE will fight in various ways that will require a lot of work, at least if you plan to make the software offerings and user environment identical on all the platforms. For truly large operations of any sort, heterogeneity is evil. The more that is different, the more that is nonstandard or custom, the more work you have to do to provide a degree of homogeneity to benighted and ignorant users. I fought this fight for years with different Unices (e.g. 
SunOS, Irix, AIX) in a single LAN and the distilled wisdom from the experience is summarized as: One person can do a pretty good job of installing, administering and maintaining one operating system on one LAN. If things are well set up (that is, set up scalably with a fair degree of automation and reasonably homogeneous hardware) the SIZE of the LAN can be pretty large (hundreds of hosts) and one person can still manage the hardware/software end of things. However, user support doesn't scale so well and a standalone systems person usually gets used up by users at the expense of hardware before getting to that many hosts (unless a lot of them are in a beowulf cluster so there are more machines than users). One person CAN usually do two OS's (or two LANs in different buildings/departments) but only if they do a less than perfect job on one. Too much to master, too much to duplicate, too much glue (or too far to go and one place/group of people that suits you better). One person can not generally do a good job with three. Usually, having three to keep running "acceptably" prevents one from having even one of them running "excellently well". Now with three Linuces you're not quite equivalent to three different general Unices. However, I'll bet that /etc is laid out differently, that startup scripts are different, that different variables are set and used, that they have different install tools, that different sets of things are provided in a "default" installation and that different packages are collected in different ways to support things like X, gnome, WM's in general, news and mail tools, and possibly even compilers and basic libraries. It won't do to have one version of Gnome running on RH and SuSE and a different one on Debian, or to have different compiler revisions or kernels or module sets. Just moving between Slackware and Red Hat, I had to learn a huge amount and make fundamental changes in the way I did various things. 
Mostly for the better, I might add, although there are certainly still things that irritate me about Red Hat. > Of course, I will migrate everything to one distro at some time (probably > Debian) but different people want to 'play' with different distros, and > this is not a production cluster, so it might even make things more > interesting! Remember the Chinese curse: "May you live in interesting times";-) I personally hope that your experience is interesting in only the best of ways... rgb Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From gordan at dcs.rhbnc.ac.uk Sat Jun 3 16:37:40 2000 From: gordan at dcs.rhbnc.ac.uk (Gordan Bobic) Date: Sun Sep 7 01:00:20 2008 Subject: Stock Trading &c Message-ID: > Does anyone have info on the porting of stock trading > software to clusters? > For example, there is a list of financial/stock programs at: > > http://linux.com/links/Software/Financial/ > > How many of these programs are worth parallelizing? Who > has actually tried > it? It depends on your exact needs, really. The software on the page you have mentioned is all for monitoring performance of stocks. As such, it requires very little processing power, so clusters are not really a terribly useful platform to be porting it to. There are other things you can do on a cluster, though. I am currently working on a stock market trading and signalling system, and when you think about it the right way, the parallelism is very obvious. If you consider that there are in excess of 10,000 companies being traded worldwide, then analysing the trends in those can be performed in parallel as 10,000 jobs running at the same time, each using whatever your method of choice is, be it ridge/least-squares regression, support vector machines, or neural networks.
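As a rough illustration of the "one job per company" idea, here is a small shell sketch. It assumes a hypothetical per-company analysis program and a file with one ticker symbol per line; neither comes from the post, and the function name run_per_symbol is invented for the example. It simply keeps a fixed number of independent processes running at once.

```shell
#!/bin/sh
# run_per_symbol CMD NJOBS
# Reads one symbol per line on stdin and runs "CMD symbol" for each one,
# keeping up to NJOBS processes alive at a time. The jobs share nothing,
# so they can run in any order.
# Note: xargs -P is a GNU/BSD extension, not strict POSIX.
run_per_symbol() {
    xargs -n 1 -P "$2" "$1"
}

# Usage (analyse_trend and symbols.txt are hypothetical names):
#   run_per_symbol ./analyse_trend 16 < symbols.txt
```

Because each job is an ordinary independent process, this is the kind of workload that a process-migrating setup such as the Mosix arrangement mentioned in the post could spread over several machines; whether it does so well in practice would need to be measured.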
The point is, if that is the sort of thing you are working on, then you could quite simply run all of these in parallel. The tasks involved in detailed analysis, such as the methods mentioned above, are extremely CPU intensive, but cause very little IO traffic to the disk, and hence to the network. This means that your spawning/migration times are going to be negligible compared to CPU time consumed. Seeing as that is the case, you might as well just slap a few machines together and use Mosix to load-balance the tasks. If you are comparing the performance of companies, and comparing each one of them with each of the others, then you again have the situation where you are running a bunch of identical tasks in parallel on different data. What you could potentially save on is using the same code section with varying data section in your program, and using this to minimize memory usage. This is often quite effective in conserving memory on a single CPU system, but when you start trying to spread the program over the entire cluster, you need the program code to be running on all machines, so you will either not save anything, or you will cause enough IO traffic between machines to make the whole exercise not worth your while due to horrendous overheads. As far as the stock trading problem goes, the explanation given here is rather trivial, but I hope that it does illustrate the kind of problem you are likely to be facing. Hope this helps. Gordan From lkchu at cs.ucsb.edu Sat Jun 3 22:05:28 2000 From: lkchu at cs.ucsb.edu (Lingkun Chu) Date: Sun Sep 7 01:00:20 2008 Subject: Multicast on channel-bonding Message-ID: <016801bfcde2$80b93e70$017610ac@sweeper> Hi all, Our beowulf cluster has recently got channel bonded on the latest kernel 2.2.15. Everything seems okay except the IP multicasting. Using "tcpdump -i bond0", I find that the multicast packets do reach the related nodes. But the application cannot always receive the corresponding packets. Most of the packets are dropped.
It happens when I connect a socket to a MC group, and then use send. When I use sendto and specify group/port, things work fine. Any comments are appreciated. Thank you. -Lingkun From covenant at dirac.org Sun Jun 4 12:24:09 2000 From: covenant at dirac.org (Peter Jay Salzman) Date: Sun Sep 7 01:00:20 2008 Subject: automating commands on nodes Message-ID: dear all, i'd like to: edit /etc/profile to include /sbin and /usr/sbin in PATH adduser jobrun on our 40 nodes. is there a way of doing this without telnetting 40 times? thanks! pete From jakob at ostenfeld.dk Sun Jun 4 16:23:44 2000 From: jakob at ostenfeld.dk (=?iso-8859-1?Q?Jakob_=D8stergaard?=) Date: Sun Sep 7 01:00:20 2008 Subject: automating commands on nodes In-Reply-To: References: Message-ID: <20000605012344.V770@ostenfeld.dk> On Sun, 04 Jun 2000, Peter Jay Salzman wrote: > dear all, > > i'd like to: > > edit /etc/profile to include /sbin and /usr/sbin in PATH > adduser jobrun > > on our 40 nodes. is there a way of doing this without telnetting 40 times? If you haven't already, you should setup SSH on all the nodes. Then you can put your public key in ~root/.ssh/authorized_keys to allow instant login from anywhere provided your passphrase is entered correctly at your workstation. If you don't know about SSH, Secure Shell, you should read about it (good pointers anyone ?) Once you've done that, it should be a simple matter to do what you asked: Provided the names of all your hosts are in the file /etc/hostfile: [start up a shell under ssh-agent, type in passphrase to ssh-add] for i in `cat /etc/hostfile`; do ssh -l root $i perl -pi -e 's/(PATH=\"[^"]+)\"/$1:\/usr\/sbin:\/sbin\"/' /etc/profile ssh -l root $i adduser jobrun done You might want to experiment with copies of /etc/profile when doing tricks like that....
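A minimal sketch of the "experiment with copies" advice, assuming the PATH line in /etc/profile has the quoted form PATH="..." that the perl pattern above expects. The function name preview_path_edit is made up for illustration, and the substitution is a sed approximation of the perl one-liner, not a copy of it. It only prints the edited text, so nothing is changed until you are satisfied with the result.

```shell
#!/bin/sh
# preview_path_edit FILE
# Prints FILE with ":/usr/sbin:/sbin" appended inside any PATH="..."
# assignment. FILE itself is never modified.
preview_path_edit() {
    sed 's|\(PATH="[^"]*\)"|\1:/usr/sbin:/sbin"|' "$1"
}

# Usage, on a scratch copy rather than the live file:
#   cp /etc/profile /tmp/profile.test
#   preview_path_edit /tmp/profile.test | diff /tmp/profile.test -
```

When moving from a local test to the real nodes, keep in mind that ssh hands the command string to a shell on the remote side, which parses it a second time, so quoting that works locally may need an extra layer once it is sent through ssh.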
This time I actually managed to get it right at first shot, but your mileage might vary ;) -- ................................................................ : jakob@ostenfeld.dtu.dk : And I see the elder races, : :.........................: putrid forms of man : : Jakob Østergaard : See him rise and claim the earth, : : OZ9ABN : his downfall is at hand. : :.........................:............{Konkhra}...............: From covenant at dirac.org Sun Jun 4 17:11:58 2000 From: covenant at dirac.org (Peter Jay Salzman) Date: Sun Sep 7 01:00:20 2008 Subject: automating commands on nodes In-Reply-To: <20000605012344.V770@ostenfeld.dk> Message-ID: hi jakob, we have ssh on the beowulf frontend, but not on the nodes. any ideas on automating installing ssh on the nodes? i haven't seen redhat 6.1 ssh rpms, i guess that's a remnant of the USA's moronic crypto export policy (which i understand was mostly lifted). i've tried to use mandrake's ssh packages on a redhat 6.1, but redhat balked at the mandrake rpms. btw, i didn't know ssh had the capability to run stuff non-interactively. that is a very cool thing to know! thank you very much! pete > Date: Mon, 5 Jun 2000 01:23:44 +0200 > From: "[iso-8859-1] Jakob Østergaard" > To: Peter Jay Salzman > Cc: Beowulf Mailing List > Subject: Re: automating commands on nodes > > On Sun, 04 Jun 2000, Peter Jay Salzman wrote: > > > dear all, > > > > i'd like to: > > > > edit /etc/profile to include /sbin and /usr/sbin in PATH > > adduser jobrun > > > > on our 40 nodes. is there a way of doing this without telnetting 40 times? > > If you haven't already, you should setup SSH on all the nodes. Then you can > put your public key in ~root/.ssh/authorized_keys to allow instant login from > anywhere provided your passphrase is entered correctly at your workstation. > > If you don't know about SSH, Secure Shell, you should read about it (good > pointers anyone ?)
> > Once you've done that, it should be a simple matter to do what you asked: > Provided the names of all your hosts are in the file /etc/hostfile: > > [start up a shell under ssh-agent, type in passphrase to ssh-add] > > for i in `cat /etc/hostfile`; do > ssh -l root $i perl -pi -e 's/(PATH=\"[^"]+)\"/$1:\/usr\/sbin:\/sbin\"/' /etc/profile > ssh -l root $i adduser jobrun > done > > You might want to experiment with copies of /etc/profile when doing tricks > like that.... This time I actually managed to get it right at first shot, > but your mileage might vary ;) From jakob at ostenfeld.dk Sun Jun 4 18:19:17 2000 From: jakob at ostenfeld.dk (=?iso-8859-1?Q?Jakob_=D8stergaard?=) Date: Sun Sep 7 01:00:20 2008 Subject: automating commands on nodes In-Reply-To: References: <20000605012344.V770@ostenfeld.dk> Message-ID: <20000605031917.W770@ostenfeld.dk> On Sun, 04 Jun 2000, Peter Jay Salzman wrote: > hi jakob, > > we have ssh on the beowulf frontend, but not on the nodes. any ideas on > automating installing ssh on the nodes? i haven't seen redhat 6.1 ssh rpms, > i guess that's a remnant of the USA's moronic crypto export policy (which i > understand was mostly lifted). I was wondering about the restrictions myself, as RH6.2 seems to ship with Kerberos5... Anyway, you can find RedHat-Crypto directories at your favourite FTP site holding ssh-1.2.27-7i as src.rpm which works nicely with RH6.1 and 6.2 at least. I was considering OpenSSH, now that it supports ssh-2 protocol. Never got around to migrate to ssh-2 because of the lame license. OpenSSH may well be worth investigating if you're about to install SSH on a lot of machines anyway. No I don't know how to automate SSH installation on a lot of nodes where you don't have remote access (except for telnet). Maybe you could write up an expect script to telnet into a node and run rpm -U /mnt/somewhere/ssh-... 
Actually, even if you haven't got the faintest idea about how to write an expect script, the autoexpect program should get you started. I managed to write an expect script for logging into a Cisco and pulling BGP tables in some 5 minutes or so, without _ever_ having used expect before. The autogenerated script will need light editing, but that should be fairly easy once you have the basic script written for you. Check out autoexpect. > > i've tried to use mandrake's ssh packages on a redhat 6.1, but redhat balked > at the mandrake rpms. > > btw, i didn't know ssh had the capability to run stuff non-interactively. > that is a very cool thing to know! thank you very much! They provide the same features as rsh (but in a secure manner!), and then some. Really nice. -- ................................................................ : jakob@ostenfeld.dtu.dk : And I see the elder races, : :.........................: putrid forms of man : : Jakob Østergaard : See him rise and claim the earth, : : OZ9ABN : his downfall is at hand. : :.........................:............{Konkhra}...............: From karsten.petersen at informatik.tu-chemnitz.de Sun Jun 4 23:32:15 2000 From: karsten.petersen at informatik.tu-chemnitz.de (Karsten Petersen) Date: Sun Sep 7 01:00:20 2008 Subject: automating commands on nodes In-Reply-To: Message-ID: On Sun, 4 Jun 2000, Peter Jay Salzman wrote: > i haven't seen redhat 6.1 ssh rpms, you can find crypto-related RPMS for RedHat 6.1 and 6.2 on the FTP of RedHat Germany: ftp.redhat.de besides ssh there are pgp, gpg, openssh, stunnel, openssl, mod_ssl, ...
Greets, Karsten -- ,-, Student of Computer Science at Chemnitz University of Technology ,-, | | EMail: Karsten@kapet.de WWW: http://www.kapet.de/ | | '-' Home: kapet@dollerup.csn V72 / 230 Phone: +49-177-82 35 136 '-' From c.best at fz-juelich.de Mon Jun 5 06:07:16 2000 From: c.best at fz-juelich.de (Christoph Best) Date: Sun Sep 7 01:00:20 2008 Subject: Benchmarking L2 cache on the Alpha 21264 Message-ID: <14651.41221.672255.829879@verne.local> Hi everybody, I am having a problem benchmarking the L2 cache performance on some Alpha 21264 systems from our clusters and wondering if anybody else has seen this. We use a benchmark that models the kernel of our main application (computational physics/lattice gauge theory). When running in L1 cache or beyond L2 cache, it gives perfectly consistent readings with deviations of 1% or less. But in L2 cache, the numbers from different runs may be off by as much as 20%, for which I cannot find a good explanation. If I plot performance vs. memory footprint, there is a clear shoulder from the L1 cache (64 KB), but then a kind of logarithmic behavior (double the memory use loses 30 MFlops). The benchmark consists of a completely deterministic set of floating-point operations, and I use a version that accesses memory completely consecutively. The systems are Compaq DS10 (466 MHz single proc.), ES40 (666 MHz 4-proc.), and API UP2000 (666 MHz dbl. proc.) under Linux. I did not see this effect under Tru64 on a XP1000 (666 MHz single proc.). The question is: Is there anything either in Linux or the 21264 that could account for such behavior? Could the cache be polluted by other processes that effectively? (The machines were basically idle during benchmarks). In particular, it seems that code running just inside the L2 cache (4 MB on the UP2000 and ES40) is not performing much better than code in main memory, which would be a pity. 
We expect cache performance to be a major determinant of total performance for our application: in L1 cache, the performance is about 600 MFlops, outside L2 cache it drops to about 200 MFlops. Inside L2 it varies between 300 and 450 MFlops. Thanks -Chris -- Christoph Best c.best@computer.org John von Neumann Institute for Computing/DESY http://www.oche.de/~cbest From RSchilling at affiliatedhealth.org Mon Jun 5 07:59:49 2000 From: RSchilling at affiliatedhealth.org (Schilling, Richard) Date: Sun Sep 7 01:00:20 2008 Subject: Beowulf metric postings. Message-ID: <51FCCCF0C130D211BE550008C724149E8FEC9A@mail1.affiliatedhealth.org> Some months ago, there was a discussion about hosting beowulf metrics. I offered to host them on my private web site, but in the midst of that discussion, I changed jobs (got a promotion!). So, I lost track. Have metrics been posted, and if not is there still an interest? I'd still be happy to host the list. Richard Schilling From glindahl at hpti.com Mon Jun 5 08:10:59 2000 From: glindahl at hpti.com (Greg Lindahl) Date: Sun Sep 7 01:00:20 2008 Subject: Benchmarking L2 cache on the Alpha 21264 In-Reply-To: <14651.41221.672255.829879@verne.local> Message-ID: <000601bfcf00$41706da0$f69cfea9@hptilap.hpti.com> > But in L2 cache, the numbers from > different runs may be off by as much as 20%, for which I cannot find a > good explanation. Page coloring? > In particular, it seems that code running just inside the L2 cache (4 > MB on the UP2000 and ES40) is not performing much better than code in > main memory, which would be a pity. Which would be a smoking gun.
-- greg From hjin at ceng.usc.edu Mon Jun 5 10:54:07 2000 From: hjin at ceng.usc.edu (Hai Jin) Date: Sun Sep 7 01:00:20 2008 Subject: CC-TEA'2000, Las Vegas - Online Proceedings Message-ID: <393BE93F.CF79A5FC@ceng.usc.edu> Dear All, The program and online proceedings of: The 2000 International Workshop on "Cluster Computing - Technologies, Environments, and Applications (CC-TEA'2000)" to be held in conjunction with PDPTA-2000 Las Vegas, Nevada, USA, June 26th-29th, 2000 In Co-operation with the "IEEE Task Force on Cluster Computing (TFCC)" can be found at: http://www.dgs.monash.edu.au/~rajkumar/CC-TEA2000/ http://www.dgs.monash.edu.au/~rajkumar/CC-TEA2000/program.html OR: http://ceng.usc.edu/~hjin/cc-tea2000.html Happy reading. -- Best wishes, CC-TEA'2000 organisers Raj, Hai, Toni From salim at ee.fit.edu Mon Jun 5 13:01:32 2000 From: salim at ee.fit.edu (Salim Mounir AlAoui) Date: Sun Sep 7 01:00:20 2008 Subject: automating commands on nodes In-Reply-To: Message-ID: For remote commands you can go to ftp.remotesensing.org, then to /pub/sadm/rpms. There you will find the cfm rpm which, once installed, lets you use the "scmd" command. You type scmd "" and it will run whatever command you want on any node of the beowulf. cfm is also very useful for managing and keeping track of your beowulf modifications. -------------------------------------------------------------------------- Salim Mounir Alaoui salim@ee.fit.edu Computer Science Dept. salaoui@cs.fit.edu Research Assistant. salim@ieee.org Florida Institute of Technology Melbourne, Florida Voice: (407) 537-8025.
-------------------------------------------------------------------------- From goebel at his.com Mon Jun 5 17:04:01 2000 From: goebel at his.com (John Goebel) Date: Sun Sep 7 01:00:20 2008 Subject: automating commands on nodes In-Reply-To: Message-ID: On Sun, 4 Jun 2000, Peter Jay Salzman wrote: > dear all, > > i'd like to: > > edit /etc/profile to include /sbin and /usr/sbin in PATH > adduser jobrun > > on our 40 nodes. is there a way of doing this without telnetting 40 times? > Take a look at cfengine. You can maintain system state better than just pushing out mistakes, you can have each node pull changes instead of pushing changes out to it, and the syntax (AlthoughRatherSelfDocumenting) is straightforward. You can also do it through a DES-encrypted transfer. Also, prsh is handy. It beats writing 'for' and 'foreach' shell scripts. Or you can use rsync (rsync -e ssh). The world is your oyster. John From joysarkar at jncasr.ac.in Tue Jun 6 10:04:29 2000 From: joysarkar at jncasr.ac.in (Mr.Joy Sarkar) Date: Sun Sep 7 01:00:20 2008 Subject: Announcing the existence of kamadhenu@jncasr, INDIA. Message-ID: Hi Folks, This is to announce the birth of kamadhenu, the first beowulf cluster at JNCASR, India. It's an 8-node Pentium III cluster built for molecular dynamics simulation. We have been successful with the project, for which we thank the Open Source Community! For more info, you can visit kamadhenu at: http://www.jncasr.ac.in/kamadhenu Expecting you! Joy and Bala. *********************************************************************** Joy Sarkar Currently: Summer Research Fellow, Beowulf Cluster Project and Brillouin Scattering Lab, Jawaharlal Nehru Centre for Advanced Scientific Research, Jakkur, Bangalore. INDIA. Also(!) : Student, Dept of Physics, Indian Institute of Technology, Kharagpur, INDIA-721302.
*********************************************************************** From covenant at dirac.org Tue Jun 6 00:17:49 2000 From: covenant at dirac.org (Peter Jay Salzman) Date: Sun Sep 7 01:00:20 2008 Subject: automating commands on nodes In-Reply-To: Message-ID: hi salim! this looks really good -- but i'm having trouble finding references to cfm on the net. before i install it, i'd like to take a look at some man pages and/or documentation. can't find it on freshmeat or gnu.org. can you give me its homepage? thanks! pete On Mon, 5 Jun 2000, Salim Mounir AlAoui wrote: > Date: Mon, 5 Jun 2000 16:01:32 -0400 (EDT) > From: Salim Mounir AlAoui > To: beowulf@beowulf.org > Subject: Re: automating commands on nodes > > > > for remote commands you can go to: > ftp.remotesensing.org then you go to /pub/sadm/rpms > there you will find cfm rpm which after installed will permit you to use > "scmd" command. You type scmd "" , it will run > whatever command you want on any node of the beowulf. cfm is also very > usefull to manage and keep track of your beowulf modifications. From wildfire at progsoc.uts.edu.au Tue Jun 6 00:41:41 2000 From: wildfire at progsoc.uts.edu.au (Anand Kumria) Date: Sun Sep 7 01:00:20 2008 Subject: automating commands on nodes In-Reply-To: ; from covenant@dirac.org on Sun, Jun 04, 2000 at 05:11:58PM -0700 References: <20000605012344.V770@ostenfeld.dk> Message-ID: <20000606174141.J8460@ftoomsh.progsoc.uts.edu.au> On Sun, Jun 04, 2000 at 05:11:58PM -0700, Peter Jay Salzman wrote: > hi jakob, > > we have ssh on the beowulf frontend, but not on the nodes. any ideas on > automating installing ssh on the nodes? i haven't seen redhat 6.1 ssh rpms, Unless all of your nodes are exposed on the public internet, do you need ssh on them? I wouldn't have thought so. > i guess that's a remnant of the USA's moronic crypto export policy (which i > understand was mostly lifted). For source code, mostly. Binaries are still troublesome. 
> i've tried to use mandrake's ssh packages on a redhat 6.1, but redhat balked > at the mandrake rpms. oh well. so much for a single packaging system. > btw, i didn't know ssh had the capability to run stuff non-interactively. > that is a very cool thing to know! thank you very much! Something the original poster hasn't taken into account is that sometimes some programs wil require a pty allocated and you'll need to use the -t switch to ssh. Anand From wildfire at progsoc.uts.edu.au Tue Jun 6 01:24:11 2000 From: wildfire at progsoc.uts.edu.au (Anand Kumria) Date: Sun Sep 7 01:00:20 2008 Subject: automating commands on nodes In-Reply-To: ; from covenant@dirac.org on Tue, Jun 06, 2000 at 12:17:49AM -0700 References: Message-ID: <20000606182411.L8460@ftoomsh.progsoc.uts.edu.au> On Tue, Jun 06, 2000 at 12:17:49AM -0700, Peter Jay Salzman wrote: > hi salim! > > this looks really good -- but i'm having trouble finding references to cfm > on the net. before i install it, i'd like to take a look at some man pages > and/or documentation. can't find it on freshmeat or gnu.org. > > can you give me its homepage? www.remotesensing.org; go the CVS repository, choose sadm then cfm. Anand From he at Physics.usyd.edu.au Tue Jun 6 02:23:23 2000 From: he at Physics.usyd.edu.au (Hao He) Date: Sun Sep 7 01:00:20 2008 Subject: Invited speaker request Message-ID: Dear Beowulf experts, There will be a Conference on Computational Physics (CCP2000) at the end of this year in Queensland, Australia. As the Chair of Open Source Session, I need to find an invited speaker from the open source community urgently. There will be a guy from MS talking about NT clustering. It would be great if we have someone from open source talking about Linux clustering or any similar open source projects. The conference will pay this person's return air ticket to Australia and accommodations. If you are interested to be this person or would like to recommend someone, please email me at he@physics.usyd.edu.au. 
For more information about CCP2000, visit www.physics.uq.edu.au/CCP2000. Thank you. Dr. Hao He From gerry at cs.tamu.edu Tue Jun 6 04:50:41 2000 From: gerry at cs.tamu.edu (Gerry Creager N5JXS) Date: Sun Sep 7 01:00:20 2008 Subject: Invited speaker request References: Message-ID: <393CE591.7BEF14EC@cs.tamu.edu> Hao He wrote: > > Dear Beowulf experts, > > There will be a Conference on Computational Physics (CCP2000) at the end > of this year in Queensland, Australia. As the Chair of Open Source > Session, I need to find an invited speaker from the open source community > urgently. There will be a guy from MS talking about NT clustering. It > would be great if we have someone from open source talking about Linux > clustering or any similar open source projects. The conference will pay > this person's return air ticket to Australia and accommodations. If you > are interested to be this person or would like to recommend someone, > please email me at he@physics.usyd.edu.au. > > For more information about CCP2000, visit www.physics.uq.edu.au/CCP2000. Greg? RGB? You guys sound like naturals. -- Gerry Creager gerry@cs.tamu.edu, gerry@page4.cs.tamu.edu Network Engineering |Geodesy Computer Science Department |Satellite Geodesy and Control Texas A&M University | 979.458.4020 From Tim.Tenhave at compaq.com Tue Jun 6 05:21:27 2000 From: Tim.Tenhave at compaq.com (Tenhave, Tim) Date: Sun Sep 7 01:00:20 2008 Subject: Benchmarking L2 cache on the Alpha 21264 Message-ID: <21ECC6E090DCD21180D20000F809A18B03B7C2BF@exctay-02.tay.dec.com> Hi Chris, I posed your question to some folks in Compaq. The resounding answer was lack of page coloring in Linux. There are some linker optimizations in Tru64 UNIX, but page coloring was the most possible reason. I was also told that Greg Lindahl and Joe Martin have posted patches to help fix this. Sorry I do not have a link right now. I could find one if you cannot. 
Hope this helps, Tim From rgb at phy.duke.edu Tue Jun 6 06:36:58 2000 From: rgb at phy.duke.edu (Robert G. Brown) Date: Sun Sep 7 01:00:20 2008 Subject: automating commands on nodes In-Reply-To: <20000606174141.J8460@ftoomsh.progsoc.uts.edu.au> Message-ID: On Tue, 6 Jun 2000, Anand Kumria wrote: > On Sun, Jun 04, 2000 at 05:11:58PM -0700, Peter Jay Salzman wrote: > > hi jakob, > > > > we have ssh on the beowulf frontend, but not on the nodes. any ideas on > > automating installing ssh on the nodes? i haven't seen redhat 6.1 ssh rpms, > > Unless all of your nodes are exposed on the public internet, do you need > ssh on them? I wouldn't have thought so. Goodness. We just had an extended discussion of this, and it should be in the archives from just last week or two weeks ago. The answer is "no, but it often won't matter and makes good sense". ssh provides certain services (notably forwarding of ports and a universally portable environment in /etc/environment) that can be very useful to a beowulf user at the expense of about 0.15 seconds per connection (plus any time spent encrypting traffic, which is usually negligible for small files). In terms of net load, bproc (being actively worked on by scyld.com) is by far the most efficient way to run remote shell commands and so forth on a beowulf (I haven't yet tested it personally but they report times of 0.01 seconds for a file copy, IIRC from last week), but it integrates deeply with the kernel to accomplish this and so isn't for everyone. rsh costs ~0.1[1-5] seconds for a (small) file copy (or any other kind of connection) but provides "no" security, no cross-network encryption, no forwarding of ports or preloading of environment. ssh costs ~0.2[5-9] seconds for a small file copy. If you are running ssh on the head node (presumably bundled into an RPM or ready-to-install tarball) then the effort required to install it on the nodes via e.g. kickstart, rsync, or whatever is essentially zero. 
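A minimal sketch, assuming Python 3 on the head node, of how per-connection costs like the ones quoted above could be measured. It is not the benchmark used for those numbers; the helper names time_command and remote_argv are made up for illustration.

```python
import subprocess
import time


def time_command(argv, runs=5):
    """Run argv `runs` times and return the wall-clock durations in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True raises CalledProcessError if the command exits non-zero.
        subprocess.run(argv, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return durations


def remote_argv(shell, host, command):
    """Build the argument list for `shell host command`, e.g. ssh node1 true."""
    return [shell, host, command]
```

Calling time_command(remote_argv("ssh", "node1", "true")) and then the same with "rsh" (node1 being a hypothetical node name) gives one list of timings per remote shell. Running a no-op such as true keeps the measurement close to pure connection overhead.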
If all you use remote shells for is to synchronize a few /etc files, enable MPI and PVM to (infrequently) spawn remote processes, allow login access to the nodes from hosts outside the gateway node and so forth there is really no reason to avoid using ssh and (strictly IMHO) there are several good reasons to use it. If you use remote shells a LOT for a LARGE true beowulf, you should almost certainly use bproc as it is likely to be on a track that will evolve into a true distributed beowulf kernel (peering into my crystal ball with a wink at the Scyld folks) and you can probably contribute to the development. Perhaps there is some ground in between for rsh, but I personally would like to see it killed dead as it is a brainless and obsolete security incident waiting to happen IN ADDITION TO having been designed back when issues like the passing of environments and forwarding of ports hadn't yet come to the foreground. Even if you configure ssh to use no encryption and not to verify connections at all (making it "just like" rsh) you still get /etc/environment and port forwarding. > > > i guess that's a remnant of the USA's moronic crypto export policy (which i > > understand was mostly lifted). > > For source code, mostly. Binaries are still troublesome. There are several issues associated with ssh distribution. One is the RSA patent that is due to expire in September. However, I've heard that they've applied for an extension and that extensions are usually knee-jerk granted. Hopefully this time sanity will prevail and the knee won't jerk. The RSA patent is NOT international because it is directly based on work published almost 100 years ago, and international patents are not granted for ideas based on published work. Finally yes, there was/is the USA's moronic crypto policy. 
For all of these reasons, many crypto-concerned software companies are finding it expedient to become multinational (even if they are totally home-grown) and to distribute their encryption software from a European office. IBM has just played this trick. Looks like Red Hat is right in there. It is perfectly legal for them to produce and distribute RSA-based software in Europe. I actually have no idea if one is breaking the law (nominally) if one purchases RH or SuSE linux "packaged in Germany" that contains ssh with all the RSA stuff included, or if one downloads it from a European site. I must say that I don't much care, either -- US software patents are often nonsense because the folks in the patent office are utterly ignorant of what is de facto in the public domain. At this moment I could do something like say: "Hmmm, perhaps neural networks can be used to identify clown faces in bank cameras". I can go find and build and train an utterly prosaic NN for that purpose. If I then file a patent for a "NN clown-face identification engine for use in the banking industry" there is an excellent chance that it will be granted. If suddenly the banking world realizes that nearly fifty percent of their customers in clown faces are there to rob the bank and not to make a deposit after working a kid's birthday party and my company "CF-ID Inc." takes off, I can then squash possible competitors when they go to the SAME books I went to to build my NN to duplicate the idea. It doesn't matter that the patent is stupid and indefensible. Unless a big player tries to get into the market and has the capital for a court fight, I'm pretty safe and can run my own little monopoly for many, many years. Think it can't happen? It has. The "idea" of using NNs in credit card fraud detection is patented this very day, in spite of it being an utterly prosaic application of the NN. 
Although it is indefensible, it worked long enough for the company that obtained the patent to build themselves a de facto monopoly that still has very few competitors. Probably oversimplified, but I assure you -- if Sterling, Becker et. al. had tried to PATENT the beowulf concept, the pre-existence of PVM and MPI and/or Gnu and Linux would very likely not have been enough to keep it from being granted. Companies like paralogic and alta tech would have to license the "technology" from S&B Inc. A software patent is much stronger protection, in its way, than a software copyright, as one can generally reverse engineer a copyrighted software product from an API, but one has to really fight to show that a patent, once granted, is invalid. > > > i've tried to use mandrake's ssh packages on a redhat 6.1, but redhat balked > > at the mandrake rpms. > > oh well. so much for a single packaging system. The issue is usually how they interface with e.g. pam. ssh is pretty complicated stuff. A "perfectly built" RPM would probably remain portable, but a sloppily built one might well fail simply because it has dependencies that weren't correctly established (by the builder) at build time. rgb Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From marini at pcmenelao.mi.infn.it Tue Jun 6 06:55:21 2000 From: marini at pcmenelao.mi.infn.it (Francesco Marini) Date: Sun Sep 7 01:00:20 2008 Subject: Problems with MPICH 1.2 and Beowulf/Linux Message-ID: <200006061355.PAA26685@pcmenelao.mi.infn.it> Hi all, I've got a really weird problem with MPICH 1.2. The system consists of a server and 16 computing nodes, all diskless, mounting root via NFS from the server. It works very well with pvm and LAM-MPI. 
Now, I'm trying to compile the latest source of MPICH, the make process goes well, but when I try to "make testing" I get this output (repeated for all tests using more than 1 machine) : *** Testing MPI_Test *** pcwalhalla : Mon May 29 16:27:09 CEST 2000 /work/staff/marini/mpich-1.2.0/bin/mpicc -DUSE_SOCKLEN_T -DUSE_U_INT_FOR_XDR -DFORTRANUNDERSCORE -DHAVE_MPICHCONF_H -DHAVE_STDLIB_H=1 -DUSE_STDARG=1 -DHAVE_LONG_DOUBLE=1 -DHAVE_LONG_LONG_INT=1 -DHAVE_PROTOTYPES=1 -DHAVE_SIGNAL_H=1 -DHAVE_SIGACTION=1 -c persistent.c /work/staff/marini/mpich-1.2.0/bin/mpicc -o persistent persistent.o *** Testing MPI_Recv_init *** Differences in persistent.out 2,5c2,8 < rm_3383: p4_error: rm_start: net_conn_to_listener failed: 3165 < p0_20161: p4_error: Timeout in making connection to remote process on node1: 0 < bm_list_20162: p4_error: interrupt SIGINT: 2 < rm_l_1_20168: p4_error: interrupt SIGINT: 2 --- > Receiving message 1 > Received message 1 > Receiving message 2 > Received message 2 > Receiving message 3 > Received message 3 > Completed all receives 7d9 < rm_20167: p4_error: interrupt SIGINT: 2 pcwalhalla : Mon May 29 16:32:12 CEST 2000 /work/staff/marini/mpich-1.2.0/bin/mpicc -DUSE_SOCKLEN_T -DUSE_U_INT_FOR_XDR -DFORTRANUNDERSCORE -DHAVE_MPICHCONF_H -DHAVE_STDLIB_H=1 -DUSE_STDARG=1 -DHAVE_LONG_DOUBLE=1 -DHAVE_LONG_LONG_INT=1 -DHAVE_PROTOTYPES=1 -DHAVE_SIGNAL_H=1 -DHAVE_SIGACTION=1 -c persist.c /work/staff/marini/mpich-1.2.0/bin/mpicc -o persist persist.o *** Testing MPI_Startall/Request_free *** Differences in persist.out 2,5c2 < rm_3388: p4_error: rm_start: net_conn_to_listener failed: 3171 < p0_20318: p4_error: Timeout in making connection to remote process on node1: 0 < bm_list_20319: p4_error: interrupt SIGINT: 2 < rm_l_1_20325: p4_error: interrupt SIGINT: 2 --- > No errors 7d3 < rm_20324: p4_error: interrupt SIGINT: 2 pcwalhalla : Mon May 29 16:37:14 CEST 2000 /work/staff/marini/mpich-1.2.0/bin/mpicc -DUSE_SOCKLEN_T -DUSE_U_INT_FOR_XDR -DFORTRANUNDERSCORE 
-DHAVE_MPICHCONF_H -DHAVE_STDLIB_H=1 -DUSE_STDARG=1 -DHAVE_LONG_DOUBLE=1 -DHAVE_LONG_LONG_INT=1 -DHAVE_PROTOTYPES=1 -DHAVE_SIGNAL_H=1 -DHAVE_SIGACTION=1 -c persist2.c /work/staff/marini/mpich-1.2.0/bin/mpicc -o persist2 persist2.o *** Testing MPI_Startall(Bsend)/Request_free *** Differences in persist2.out 2,5c2 < rm_3391: p4_error: rm_start: net_conn_to_listener failed: 3177 < p0_20473: p4_error: Timeout in making connection to remote process on node1: 0 < bm_list_20474: p4_error: interrupt SIGINT: 2 < rm_l_1_20480: p4_error: interrupt SIGINT: 2 --- Seems like MPICH cannot start the remote process or cannot establish the connection. The crazy thing is that with pvm and LAM-MPI all goes well. Any idea ? Second : I've got some prob compiling ScaLapack with LAM-MPI, gcc and pgf77 (f77 compiler from Portland Group), it gives a lot of unresolved symbols regarding MPI. Anyone succeded in compiling them under same configuration ? Thank you all in advance, Franz Marini --------------------------------------------- Franz Marini Sys Admin and Software Analyst, Dept. of Physics, University of Milan, Italy. email : marini@pcmenelao.mi.infn.it --------------------------------------------- From c.best at fz-juelich.de Tue Jun 6 07:27:52 2000 From: c.best at fz-juelich.de (Christoph Best) Date: Sun Sep 7 01:00:20 2008 Subject: Benchmarking L2 cache on the Alpha 21264 In-Reply-To: <200006052111.RAA10427@orourke.mclinux.com> References: <14651.41221.672255.829879@verne.local> <200006052111.RAA10427@orourke.mclinux.com> Message-ID: <14653.2007.88779.708786@verne.local> Hi everybody, thanks for all the help. I think the problem I am seeing is the lack of page coloring. I will try Joseph Martin's kernel patch asap - we are very interested in making efficient use of the L2 cache as it is so big (4 MB on some of our machines). 
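To make the page coloring idea concrete, here is a small sketch. It is not the kernel patch discussed in this thread. It assumes a physically indexed, direct-mapped cache and 8 KB pages; only the 4 MB cache size comes from the message above, and the function names are made up for illustration.

```python
def num_page_colors(cache_bytes, page_bytes, associativity=1):
    """Number of page colors: the size of one cache way divided by the page size."""
    way_bytes = cache_bytes // associativity
    return max(1, way_bytes // page_bytes)


def page_color(physical_page_number, colors):
    """Color of a physical page: which page-sized slot of the cache it maps to."""
    return physical_page_number % colors


def conflicting_pages(physical_pages, colors, associativity=1):
    """Count pages that share a color with more pages than the cache has ways."""
    per_color = {}
    for pfn in physical_pages:
        color = page_color(pfn, colors)
        per_color[color] = per_color.get(color, 0) + 1
    return sum(n - associativity for n in per_color.values() if n > associativity)
```

Under those assumptions num_page_colors(4 * 1024 * 1024, 8192) is 512. A working set of 512 pages with distinct colors fits in the cache without conflicts, while pages whose frame numbers are all multiples of 512 share one color and evict each other. An allocator without page coloring can hand a process either layout, which is one way to get run-to-run variation in L2 benchmarks.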
In particular, page coloring should be a very good idea for cluster nodes where we do not care about the actual performance of the kernel page allocator (just running one process a long time in a fixed page setup), but the penalties for cache misses are very high. We easily see a factor of three in MFlops numbers between L1 cache and memory. BTW, we use the Compaq compiler which gives about 20% more MFlops than the gnu compiler in L1 cache. Thanks again -Chris -- Christoph Best c.best@computer.org John von Neumann Institute for Computing/DESY http://www.oche.de/~cbest From glindahl at hpti.com Tue Jun 6 07:33:13 2000 From: glindahl at hpti.com (Greg Lindahl) Date: Sun Sep 7 01:00:20 2008 Subject: Benchmarking L2 cache on the Alpha 21264 In-Reply-To: <21ECC6E090DCD21180D20000F809A18B03B7C2BF@exctay-02.tay.dec.com> Message-ID: <001b01bfcfc4$255f3780$f69cfea9@hptilap.hpti.com> > I was also told that Greg Lindahl and Joe Martin have posted > patches to help > fix this. And btw, here is our status: My patch doesn't quite work right, but I think I know how to fix it. Joe's patch, different approach, doesn't work quite right either. He probably has some ideas... We know what the right answer is (from Tru64), we know some tests that reveal if it is working well or not. What we could use would be a volunteer to drive this thing home. I'm way too busy. -- g From c.best at fz-juelich.de Tue Jun 6 07:42:10 2000 From: c.best at fz-juelich.de (Christoph Best) Date: Sun Sep 7 01:00:20 2008 Subject: Benchmarking L2 cache on the Alpha 21264 In-Reply-To: <393D0BAD.E26F1301@quadrics.com> References: <14651.41221.672255.829879@verne.local> <200006052111.RAA10427@orourke.mclinux.com> <14653.2007.88779.708786@verne.local> <393D0BAD.E26F1301@quadrics.com> Message-ID: <14653.3273.414147.224625@verne.local> Hi, I have been asked to post where I found the patch. 
Joseph Martin posted this to the linux-kernel list on April 18: http://www.uwsg.indiana.edu/hypermail/linux/kernel/0004.2/0503.html But if he should be reading this, maybe he has a more recent version? -Chris -- Christoph Best c.best@computer.org John von Neumann Institute for Computing/DESY http://www.oche.de/~cbest From vor+ at pitt.edu Tue Jun 6 08:58:05 2000 From: vor+ at pitt.edu (Victor Ortega) Date: Sun Sep 7 01:00:20 2008 Subject: automating commands on nodes In-Reply-To: Message-ID: On Tue, 6 Jun 2000, Robert G. Brown wrote: > Even if you configure ssh to use no encryption and not to verify > connections at all (making it "just like" rsh) you still get > /etc/environment and port forwarding. I'm glad someone said it. I was going to say it myself otherwise. Although I'll admit I haven't done this, it should be possible to configure ssh such that outside connections to the head node are encrypted, but connections within the cluster are unencrypted (for the sake of those worried about performance degradation within the cluster due to ssh). Internal authentication need not be TOTALLY disabled; simply set up public and private keys on all the nodes and there'll still be a level of security--even some bad guy who brings in a computer and attaches it to the internal network will not be able to just log into the other nodes without at least having a public key. Also, the security and convenience features of ssh make it almost a must for those wishing to connect to a cluster from an external location; at that point, having just ssh (and not both ssh and rsh) will make administration and configuration of the cluster easier. I will give that those who absolutely refuse to have ssh on their system can still get away with using SRP for secure connections to the cluster and then use rsh within the cluster (and therefore still have both security and high performance), but again, that's still two packages that need to be maintained instead of just one. Victor p.s. 
check out http://srp.stanford.edu/srp/ for information on SRP, a backwards-compatible, secure replacement for telnet and ftp. From glindahl at hpti.com Tue Jun 6 09:20:03 2000 From: glindahl at hpti.com (Greg Lindahl) Date: Sun Sep 7 01:00:20 2008 Subject: automating commands on nodes In-Reply-To: Message-ID: <002101bfcfd3$11b6b960$f69cfea9@hptilap.hpti.com> > Although I'll admit I haven't done this, it should be possible to > configure ssh such that outside connections to the head node are > encrypted, but connections within the cluster are unencrypted This is a pain. You have to recompile sshd to allow unencrypted connections. Then there is no existing policy option to enforce external connections being encrypted. Gaah. -- greg From bnh at dimension6.com Tue Jun 6 10:32:26 2000 From: bnh at dimension6.com (brad) Date: Sun Sep 7 01:00:20 2008 Subject: alpha multia beowulf cluster -- ideas Message-ID: Hello, I was considering building a beowulf cluster based on alpha multia's. has anyone tried this? what kind of performance can i generally expect? does anyone know of any resources online regarding this? Thanks, Brad From rgb at phy.duke.edu Tue Jun 6 11:49:37 2000 From: rgb at phy.duke.edu (Robert G. Brown) Date: Sun Sep 7 01:00:20 2008 Subject: automating commands on nodes In-Reply-To: Message-ID: On Tue, 6 Jun 2000, Victor Ortega wrote: > On Tue, 6 Jun 2000, Robert G. Brown wrote: > > Even if you configure ssh to use no encryption and not to verify > > connections at all (making it "just like" rsh) you still get > > /etc/environment and port forwarding. > > I'm glad someone said it. I was going to say it myself otherwise. > > Although I'll admit I haven't done this, it should be possible to > configure ssh such that outside connections to the head node are > encrypted, but connections within the cluster are unencrypted (for the > sake of those worried about performance degradation within the cluster > due to ssh). 
Internal authentication need not be TOTALLY disabled; > simply set up public and private keys on all the nodes and there'll > still be a level of security--even some bad guy who brings in a > computer and attaches it to the internal network will not be able to > just log into the other nodes without at least having a public key. I agree, although my measurements (published last week on the list) do show that the bulk of the "cost" of ssh relative to rsh comes from the original RSA handshake, not from the encryption. If ssh is built with --with-none defined, one can call ssh as ssh -c none wherever whatever to skip encrypting the net traffic. I believe that you are right, though, in that ssh could be set up to do full RSA authentication on connections to the head node and then do basically no host authentication and no encryption between nodes on the private network (in)side. I'll see if I can work out the appropriate configuration files and/or wrappers and if I can I'll publish them back to the list and in the book under construction. I should probably do an rshbench of ssh when RSA host authentication is turned off anyway to see what fraction of the overhead is associated with reading /etc/environment and managing any forwarded ports. I agree with the rest of your note as well. Net snooping has been responsible for the bulk of the successful cracks into our department over the last fifteen years or so. It is easiest to maintain just one of ssh/rsh (and not both) and given this choice, ssh is the obvious one. rgb Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From rgb at phy.duke.edu Tue Jun 6 11:51:14 2000 From: rgb at phy.duke.edu (Robert G. 
Brown) Date: Sun Sep 7 01:00:20 2008 Subject: automating commands on nodes In-Reply-To: <002101bfcfd3$11b6b960$f69cfea9@hptilap.hpti.com> Message-ID: On Tue, 6 Jun 2000, Greg Lindahl wrote: > > Although I'll admit I haven't done this, it should be possible to > > configure ssh such that outside connections to the head node are > > encrypted, but connections within the cluster are unencrypted > > This is a pain. You have to recompile sshd to allow unencrypted connections. > Then there is no existing policy option to enforce external connections > being encrypted. Gaah. And the marginal gain in performance is very small unless you are regularly using ssh to send large files. Most of the cost is in the original RSA connection, not the encryption. rgb Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From vor+ at pitt.edu Tue Jun 6 12:06:26 2000 From: vor+ at pitt.edu (Victor Ortega) Date: Sun Sep 7 01:00:20 2008 Subject: automating commands on nodes In-Reply-To: Message-ID: On Tue, 6 Jun 2000, Robert G. Brown wrote: > I agree, although my measurements (published last week on the list) do > show that the bulk of the "cost" of ssh relative to rsh comes from the > original RSA handshake, not from the encryption. But I believe that your benchmarks were done with copying small files; I am worried that forwarding a full X connection, encrypted, over ssh from some internal node (ssh into head node, ssh into some internal node, load up some big GUI) will incur a big performance penalty. I tried this yesterday with a simple two-hop connection, and the GUI was twice as slow (it was slow enough already with just a single encrypted X connection going over our external 10base-T network). Unfortunately I have no benchmarks for this. Aside from that, I agree that the cost of encrypting communications within the internal network is probably negligible. 
Victor From rgb at phy.duke.edu Tue Jun 6 12:08:18 2000 From: rgb at phy.duke.edu (Robert G. Brown) Date: Sun Sep 7 01:00:20 2008 Subject: diskless alphalinux nodes In-Reply-To: <393D3B4C.32DD7025@okstate.edu> Message-ID: On Tue, 6 Jun 2000, Mathew Lee wrote: > ....additionally I searched the archives for diskless and found a reference from > Oct. 1998, where you talk about a diskless booting sequence ...I have attached it > to refresh your memory. I was wondering if the diskless.tar.gz is still available, > and/or if it has been updated.....also, is there a place that I could find > additional information on diskless booting...mounting root-nfs or by other > means....(ramdisk or coda possibly) I largely abandoned this particular approach because new kernels came out that supported much better methods. The most intriguing is Greg Warnes NFS hack that permits the normal installation of just one server to support N nodes without creating any host-specific export directories at all -- I think he posted it last week. However, I believe that there are other packages out there as well. There are three different levels of problems to solve setting up and running diskless systems. The first is getting a kernel to load (via the net with special proms on a NIC or from a boot floppy). The second is getting the kernel you boot to NFS mount your root (and other) file system(s). The third is efficiently laying out exports on a server so that you provide writeability where a given system really has to have it. Pretty much all unixoid systems will be unhappy unless they can write /var, /tmp, /etc and /dev, although one can often rig /etc with symlinks to writeable space in e.g. /var/etc to fake it. Greg's NFS hack allows a single fs to be exported but gives writability and remapped identity to files via an IP-based tag, so e.g. /etc/ld.so.cache as mounted on host xxxxxxxx is really exported as /etc/ld.so.cache_xxxxxxxx on the server. IIRC, that is (he may correct me). 
I'd expect that his changes are moderately portable since they are likely well above the machine hardware layer of the kernel. However, once you've figured out how to build a NFS lilo boot floppy (which isn't that difficult from the current howtos and e.g. mkinitrd) it is also pretty simple to go diskless by just giving each node e.g. /exports/[b1,b2,b3...] exported to each host as its root and then cloning everything BUT /usr into it (that is, make /usr a separate filesystem on the server, usually, and export it RO to all the hosts). This wastes a bit of space but space is cheap. You'll still need to periodically rsync the node roots with a carefully determined exclusion list, as otherwise e.g. RPM installs on the server won't properly propagate to the nodes. I might tackle an NFS/diskless installation again one day, but if I do I'm almost certainly going to work from Greg's or one of the others that have been posted/advertised on the list in the last few months. Use the search engine to find them. rgb Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From warnes at biostat.washington.edu Tue Jun 6 12:40:18 2000 From: warnes at biostat.washington.edu (Gregory R. Warnes) Date: Sun Sep 7 01:00:20 2008 Subject: diskless alphalinux nodes In-Reply-To: Message-ID: On Tue, 6 Jun 2000, Robert G. Brown wrote: RGB>> On Tue, 6 Jun 2000, Mathew Lee wrote: RGB>> RGB>> > ....additionally I searched the archives for diskless and found a reference from RGB>> > Oct. 1998, where you talk about a diskless booting sequence [snip] RGB>> RGB>> I largely abandoned this particular approach because new kernels came RGB>> out that supported much better methods. 
The most intriguing is Greg RGB>> Warnes NFS hack that permits the normal installation of just one server RGB>> to support N nodes without creating any host-specific export directories RGB>> at all -- I think he posted it last week. The NFS mod that Robert mentions is called ClusterNFS and has a home page at http://ClusterNFS.sourceforge.net RGB>> Greg's NFS hack allows a single fs to be exported but gives writability RGB>> and remapped identity to files via an IP-based tag, so e.g. RGB>> /etc/ld.so.cache as mounted on host xxxxxxxx is really exported as RGB>> /etc/ld.so.cache_xxxxxxxx on the server. IIRC, that is (he may correct RGB>> me). I'd expect that his changes are moderately portable since they are RGB>> likely well above the machine hardware layer of the kernel. Actually, ClusterNFS runs entirely in userspace, so that *no* kernel modifications are required on either the server or the client. Since ClusterNFS is a simple extension to the standard Universal-NFS server, which is reported to work on a wide variety of OS's and CPU's, I expect it will compile and work out-of-the-box with Alpha-Linux. (Of course, I haven't actually tried anything but Intel Linux. Let me know if something doesn't work.) RGB>> However, once you've figured out how to build a NFS lilo boot floppy RGB>> (which isn't that difficult from the current howtos and e.g. mkinitrd) RGB>> it is also pretty simple to go diskless by just giving each node e.g. RGB>> /exports/[b1,b2,b3...] exported to each host as its root and then RGB>> cloning everything BUT /usr into it (that is, make /usr a separate RGB>> filesystem on the server, usually, and export it RO to all the hosts). RGB>> This wastes a bit of space but space is cheap. You'll still need to RGB>> periodically rsync the node roots with a carefully determined exclusion RGB>> list, as otherwise e.g. RPM installs on the server won't properly RGB>> propagate to the nodes. 
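The per-node cloning described in the quoted text might be scripted roughly as follows. This is only a sketch: the exclusion list is a guess based on the directories mentioned earlier in the thread (plus /proc and /exports, which are assumptions), and the code only builds the rsync command line, with --dry-run by default, rather than running it.

```python
# Directories kept per node or exported separately. An illustrative guess,
# not a tested exclusion list.
DEFAULT_EXCLUDES = ["/var/", "/tmp/", "/etc/", "/dev/", "/usr/", "/proc/", "/exports/"]


def rsync_argv(server_root, node_root, excludes=DEFAULT_EXCLUDES, dry_run=True):
    """Build an rsync command line that mirrors server_root into node_root."""
    argv = ["rsync", "-a", "--delete"]
    if dry_run:
        argv.append("--dry-run")  # report what would change without changing it
    argv.extend("--exclude=" + pattern for pattern in excludes)
    # A trailing slash on the source makes rsync copy its contents
    # rather than the directory itself.
    argv.append(server_root.rstrip("/") + "/")
    argv.append(node_root.rstrip("/") + "/")
    return argv


def node_roots(count, base="/exports"):
    """Node root directories named b1, b2, ... under base."""
    return [f"{base}/b{i}" for i in range(1, count + 1)]
```

Looping over node_roots(16) and printing " ".join(rsync_argv("/", root)) gives one command per node to review before anything is actually copied. Getting the exclusion list right is the hard part, which is the maintenance burden the reply below describes.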
I created ClusterNFS explicitly to get away from the need to keep separate directories for each client (either on NFS or on the client itself). Keeping {track of, propagating} changes to all of the appropriate directories gets hairy fast. Even if you use rsync, it is quite difficult to figure out everything that should be {in,ex}cluded. I used to do this on our cluster, and every couple of months I'd discover that something else needed to be excluded that wasn't. In addition, particularly painful things happen when rsync tries to update the libraries it is using on the clients.... -Greg From rgb at phy.duke.edu Tue Jun 6 13:18:50 2000 From: rgb at phy.duke.edu (Robert G. Brown) Date: Sun Sep 7 01:00:20 2008 Subject: automating commands on nodes In-Reply-To: Message-ID: On Tue, 6 Jun 2000, Victor Ortega wrote: > On Tue, 6 Jun 2000, Robert G. Brown wrote: > > I agree, although my measurements (published last week on the list) do > > show that the bulk of the "cost" of ssh relative to rsh comes from the > > original RSA handshake, not from the encryption. > > But I believe that your benchmarks were done with copying small files; Big ones too. I tested a 1M file copy at 0.67 sec for ssh (using default idea encryption) vs 0.2 sec for rsh. I also tested e.g. blowfish and one can interpolate a bit and still get encryption. All this also depends strongly on the speed of the CPUs. A rough estimate of 1 (extra) second for each 2 MB sent is probably not unreasonable, although you might get 3 MB in a second on a good day or even four or five with blowfish. Beyond that you're approaching wirespeed. > I am worried that forwarding a full X connection, encrypted, over ssh > from some internal node (ssh into head node, ssh into some internal > node, load up some big GUI) will incur a big performance penalty. 
I > tried this yesterday with a simple two-hop connection, and the GUI was > twice as slow (it was slow enough already with just a single encrypted > X connection going over our external 10base-T network). Unfortunately > I have no benchmarks for this. Hmm, hadn't thought about this, as I try not to run graphics-heavy X apps over any kind of shell connection -- with linux one can usually run them locally -- although e.g. xterms and simple Tk-ish apps work fine. I'll have to see if I can set this up to measure it. However, at an extra 0.5 seconds per megabyte, I agree that you won't want to play a hi-res video game this way and that netscape should be significantly delayed. "Simple" X apps, though (e.g. xterm) should be ok, and I can't think of why one would need to run e.g. netscape on a node. I also have no idea if a double ssh doubles this overhead. It might be that a->b is encrypted and then b->c is REencrypted. Or it might be that the b->c transfer forwards the keys (so to speak). I'll try to test this as well. rgb Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From jakob at ostenfeld.dk Tue Jun 6 14:39:16 2000 From: jakob at ostenfeld.dk (=?iso-8859-1?Q?Jakob_=D8stergaard?=) Date: Sun Sep 7 01:00:20 2008 Subject: [Announce] The jobd Load Balancer Message-ID: <20000606233916.E770@ostenfeld.dk> Hi all! You may remember that I asked for a simple load balancing system for parallel make jobs, about a week (or two) ago. Some suggestions came up, and I was even offered the opportunity to beta-test a commercial queuing system. I decided to first play around with some of the queuing systems already freely available out there, as well as various parallel make variants. I was certain that parallel makes was something a lot of people did and therefore there would be mature and well functioning tools for the job - as usual. 
I need GNU Make, so BSD pmake is out. Customs GNU Make doesn't work properly and isn't maintained. PVM GNU Make may work, but didn't for me; I also have the feeling that it is too much of a hack to be relied upon. GNU Queue was close, except that it breaks under load. Generic NQS was too big (too slow for short jobs, too complex).

Fixing GNU Queue wasn't an option for me; with all due respect, that is by far the ugliest code I've seen in a long time.

So I did what I originally wanted to avoid: wrote up a new load balancing system from scratch.

It's very simple, providing an rsh-like command ``jsh'' which, instead of taking a hostname argument (as rsh does), takes a job-type argument. It communicates with the jobd daemon running on the local host, and finds the best host for the job-type given. The job is then executed on that host.

For example, running the hostname command as a gcc-type job:

[joe@eagle joe]$ jsh -t gcc hostname
eagle
[joe@eagle joe]$ jsh -t gcc hostname
albatros

It's simple, efficient, and even somewhat secure. (I believe it is secure if the network is physically secure and the nodes in the /etc/jobd.hosts file can be trusted.)

It is available at http://ostenfeld.dk/~jakob/jobd/

The current version is 0.1, which should indicate that there is still work to be done. However, the system seems to work for me, and I'll be using it at work the next few days to see how it fares. There will be one major update for the resource handling soon, but all in all I think the system is ready for some use and feedback. Hence this notice :)

So if anyone besides me is sick and tired of waiting for those half-hour C++ compilations, here's a chance to justify a Beowulf for your boss ;)

Cheers,
--
................................................................
: jakob@ostenfeld.dtu.dk  : And I see the elder races,         :
:.........................: putrid forms of man                :
: Jakob Østergaard        : See him rise and claim the earth,  :
: OZ9ABN                  : his downfall is at hand.           :
:.........................:............{Konkhra}...............:

From jgscribner at riversidepaper.com Wed Jun 7 07:48:43 2000
From: jgscribner at riversidepaper.com (Justin Scribner)
Date: Sun Sep 7 01:00:20 2008
Subject: Please help me unsubscribe
Message-ID: <11251BCC86FCD1118CFA00805F6F91B92C0CCF@CBC>

I truly apologize for posting this kind of message to the list but feel that I have no other recourse. I have been trying to unsubscribe for weeks by sending messages to both beowulf-request@beowulf.gsfc.nasa.gov and Majordomo@beowulf.gsfc.nasa.gov but get nothing but undeliverable messages (even when sent from other completely disparate addresses). All messages to other addresses work fine and I doubt there is a problem with the aforementioned addresses. I tried to unsubscribe from the web page, but haven't received a response from there either. I do appreciate the discussions thus far and learned that I need to consider Mosix rather than Beowulf. Thank you in advance and my sincerest apologies to those who have seen one-too-many unsubscribes posted to mailing-lists.

Justin G. Scribner
MIS - Technician
jgscribner@riversidepaper.com

From rgb at phy.duke.edu Wed Jun 7 08:23:05 2000
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Sun Sep 7 01:00:20 2008
Subject: Please help me unsubscribe
In-Reply-To: <11251BCC86FCD1118CFA00805F6F91B92C0CCF@CBC>
Message-ID:

On Wed, 7 Jun 2000, Justin Scribner wrote:

> I truly apologize for posting this kind of message to the list but feel that I
> have no other recourse. I have been trying to unsubscribe for weeks by
> sending messages to both beowulf-request@beowulf.gsfc.nasa.gov and
> Majordomo@beowulf.gsfc.nasa.gov but get nothing but undeliverable messages
> (even when sent from other completely disparate addresses).
> All messages to other addresses work fine and I doubt there is a problem
> with the aforementioned addresses. I tried to unsubscribe from the web page,
> but haven't received a response from there either. I do appreciate the
> discussions thus far and learned that I need to consider Mosix rather than
> Beowulf. Thank you in advance and my sincerest apologies to those who have
> seen one-too-many unsubscribes posted to mailing-lists.

It isn't working because the beowulf.gsfc.nasa.gov address is defunct and obsolete. This is one (of many) reasons to use the proper domain address: www.beowulf.org. This is a "portable" entity and has followed Don Becker, Erik Hendriks, and many of the rest of the NASA Goddard folks to Scyld. I believe that www.beowulf.org is currently actually at scyld.com, but I'm not sure, and the reason for using the domain name is that it won't matter. Wherever it really is, that's where you'll go.

So, try sending your unsubscribe message to majordomo@beowulf.org.

I believe that we are VERY close to having the beowulf list managed by mailman, which will be a very Good Thing (tm). If/when this finally occurs, you can subscribe and unsubscribe and generally control the flow of list traffic directly from a password protected web interface. This is a very desirable thing...

rgb

Robert G. Brown http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu

From josip at icase.edu Wed Jun 7 08:35:13 2000
From: josip at icase.edu (Josip Loncaric)
Date: Sun Sep 7 01:00:20 2008
Subject: Athlon + PC133: no ECC?
Message-ID: <393E6BB1.2557EF4@icase.edu>

Athlons do well on floating point, so we've been looking at building some Athlon nodes for our cluster, using PC133 memory of course. This requires VIA's KX133 chipset (now) or KT133 (near future) or AMD's 760 (more distant future).
On June 5th, AMD finally released Athlons with full speed on-chip cache (see http://www.amd.com/news/prodpr/20108.html). These will come in OEM 'Slot A' packaging for the existing Athlon motherboards (e.g. those based on VIA's KX133 chipset), but 'Socket A' packaging will be preferable. The 'Socket A' Athlons will require the KT133 chipset from VIA (see http://www.viatech.com/news/00kt133launch.htm), at least until AMD gets its 760 chipset out the door. So far so good. Unfortunately, while VIA's KX133 datasheet at least mentioned 'optional' ECC capability, the KT133 datasheet (VT8363 North Bridge Controller, see http://www.viatech.com/pdf/productinfo/kt133.pdf) makes no pretense of having any ECC features. Our applications require a lot of RAM (16-32GB or so), and we expect individual node uptimes of several months. Windows users who reboot their 128MB machines daily would not even see a problem, but we need ECC. It makes me very uneasy to even think about tracking down an intermittent memory problem in 32GB of RAM without ECC capability. Am I correct in concluding that the new 'Socket A' chipset KT133 will have *no* DRAM data integrity features? Does anyone know if the current motherboards based on the KX133 (the 'Slot A' chipset) actually *use* ECC? My reading of the Asus K7V manual is that while this motherboard will accept an ECC memory module, there is *no* way to tell BIOS to use DRAM ECC (only an L2 cache ECC mode is mentioned). Moreover, the datasheets talk about '64-bit system memory interface' in both cases, so it seems that the KX133 optional ECC feature is external to the VIA VT8371 chip. Do any KX133 motherboards actually implement ECC on DRAM? If ECC is indeed unavailable on VIA's chipsets, and AMD's 760 chipset remains unavailable, things do not look so good for Athlons at our end. How concerned should we be about the lack of ECC with fast Athlons? This issue may even force us to go back to Pentiums. 
BTW, some Linux compatibility issues with Athlons were also reported, such as the MTRR setup and even DMA problems with certain ATA drives, but unlike the ECC situation, those compatibility issues are presumably resolvable in software.

Sincerely,
Josip

--
Dr. Josip Loncaric, Senior Staff Scientist mailto:josip@icase.edu
ICASE, Mail Stop 132C PGP key at http://www.icase.edu./~josip/
NASA Langley Research Center mailto:j.loncaric@larc.nasa.gov
Hampton, VA 23681-2199, USA Tel. +1 757 864-2192 Fax +1 757 864-6134

From deadline at plogic.com Wed Jun 7 09:54:14 2000
From: deadline at plogic.com (Douglas Eadline)
Date: Sun Sep 7 01:00:20 2008
Subject: Athlon + PC133: no ECC?
In-Reply-To: <393E6BB1.2557EF4@icase.edu>
Message-ID:

On Wed, 7 Jun 2000, Josip Loncaric wrote:

> Athlons do well on floating point, so we've been looking at building
> some Athlon nodes for our cluster, using PC133 memory of course. This
> requires VIA's KX133 chipset (now) or KT133 (near future) or AMD's 760
> (more distant future).
>
> On June 5th, AMD finally released Athlons with full speed on-chip cache
> (see http://www.amd.com/news/prodpr/20108.html). These will come in
> OEM 'Slot A' packaging for the existing Athlon motherboards (e.g. those
> based on VIA's KX133 chipset), but 'Socket A' packaging will be
> preferable. The 'Socket A' Athlons will require the KT133 chipset from
> VIA (see http://www.viatech.com/news/00kt133launch.htm), at least until
> AMD gets its 760 chipset out the door.
>

As I understand it, the "new Athlons" will only be available as socket A parts to the general public. Slot A parts will only be sold to OEMs and will not work with the KX chipset in any case. (This was my understanding anyway, perhaps I am wrong)

-snip-

>
> If ECC is indeed unavailable on VIA's chipsets, and AMD's 760 chipset
> remains unavailable, things do not look so good for Athlons at our end.
> How concerned should we be about the lack of ECC with fast Athlons?
> This issue may even force us to go back to Pentiums. BTW, some
> Linux compatibility issues with Athlons were also reported, such as the
> MTRR setup and even DMA problems with certain ATA drives, but unlike the
> ECC situation, those compatibility issues are presumably resolvable in
> software.

ECC is nice. We are integrating ECC capabilities in our monitoring tool to detect possible problems.

Doug

-------------------------------------------------------------------
Paralogic, Inc.         | PEAK        | Voice:+610.814.2800
130 Webster Street      | PARALLEL    | Fax:+610.814.5844
Bethlehem, PA 18015 USA | PERFORMANCE | http://www.plogic.com
-------------------------------------------------------------------

From SeanWard at msn.com Wed Jun 7 10:15:54 2000
From: SeanWard at msn.com (Sean Ward)
Date: Sun Sep 7 01:00:20 2008
Subject: Athlon + PC133: no ECC?
References:
Message-ID: <003f01bfd0a4$09cff1e0$120010ac@alex1.va.home.com>

I'm currently using two Athlon 750 systems based on the VIA KX133 (abit KA7 mobos). It does indeed support ECC RAM, although finding PC133 ECC RAM is rather difficult, as the Athlons are very particular about the RAM they use. Several name-brand RAMs, including Mushkin and Crucial, turned out to be unstable. However, if you can find a good provider of PC133 ECC RAM, the ~450 mbps stream figures the KA7 and an Athlon 750 turn in are not bad for commodity parts. As for other items, once you get decent RAM in the systems, they are rock solid. My one box has been up for 30 days so far (that's when I built it). As far as the MTRR and DMA drive support, just grab the newest patches from www.linux-ide.org for ULTRA66/100 support, and use a recent kernel revision (such as 2.2.15) to have Athlon MTRR support.

-Sean

----- Original Message -----
From: Douglas Eadline
To: Josip Loncaric
Cc: Beowulf mailing list
Sent: Wednesday, June 07, 2000 12:54 PM
Subject: Re: Athlon + PC133: no ECC?
> On Wed, 7 Jun 2000, Josip Loncaric wrote: > > > Athlons do well on floating point, so we've been looking at building > > some Athlon nodes for our cluster, using PC133 memory of course. This > > requires VIA's KX133 chipset (now) or KT133 (near future) or AMD's 760 > > (more distant future). > > > > On June 5th, AMD finally released Athlons with full speed on-chip cache > > (see http://www.amd.com/news/prodpr/20108.html). These will come in > > OEM 'Slot A' packaging for the existing Athlon motherboards (e.g. those > > based on VIA's KX133 chipset), but 'Socket A' packaging will be > > preferable. The 'Socket A' Athlons will require the KT133 chipset from > > VIA (see http://www.viatech.com/news/00kt133launch.htm), at least until > > AMD gets its 760 chipset out the door. > > > > As I understand it, the "new Athlons" will only be available > as socket A parts to the general public. Slot A parts will > only be sold to OEMs and will not work with with the KX chipset > in any case. > (This was my understanding anyway, perhaps I am wrong) > > -snip- > > > > If ECC is indeed unavailable on VIA's chipsets, and AMD's 760 chipset > > remains unavailable, things do not look so good for Athlons at our end. > > How concerned should we be about the lack of ECC with fast Athlons? > > This issue may even force us to go back to Pentiums. BTW, some > > Linux compatibility issues with Athlons were also reported, such as the > > MTRR setup and even DMA problems with certain ATA drives, but unlike the > > ECC situation, those compatibility issues are presumably resolvable in > > software. > > ECC is nice. We are integrating ECC capabilities in our monitoring > tool to detect possible problems. > > Doug > > ------------------------------------------------------------------- > Paralogic, Inc. 
| PEAK | Voice:+610.814.2800 > 130 Webster Street | PARALLEL | Fax:+610.814.5844 > Bethlehem, PA 18015 USA | PERFORMANCE | http://www.plogic.com > ------------------------------------------------------------------- > > > _______________________________________________ > Beowulf mailing list > Beowulf@beowulf.org > http://www.beowulf.org/mailman/listinfo/beowulf > From rgb at phy.duke.edu Wed Jun 7 11:00:55 2000 From: rgb at phy.duke.edu (Robert G. Brown) Date: Sun Sep 7 01:00:20 2008 Subject: Easier said than done In-Reply-To: <200006071720.e57HKpm18031@axe1.med.upenn.edu> Message-ID: On Wed, 7 Jun 2000 axelsen@axe1.med.upenn.edu wrote: > > Dear Robert, > > I have also been trying to post, and failing that, to unsubscribe > and resubscribe so that I could post a message. I've sent messages > to beowulf-admin@beowulf.org about this, but no response. Ah, this is a good thing (really!). I went to the beowulf page at www.beowulf.org and lo, the beowulf list is ALREADY a mailman mediated list. This is infinitely better than it being a majordomo list. SO, here are the revised instructions for getting on (or off) the beowulf list. The "website" of the mailman-mediated beowulf mailing list (not the beowulf website per se, just that of the list) is now: http://www.beowulf.org/mailman/listinfo/beowulf Start by visiting this site with your favorite browser. To subscribe, well, subscribe. To unsubscribe (if you've already been on the list a while and want to get off) scroll down to the "subscribers" section (since you are already subscribed:-). Enter your email address EXACTLY AS IT WAS GIVEN IN YOUR ORIGINAL SUBSCRIPTION and click the "edit options" button. This puts you into your own personal configuration page. Right there before you is the unsubscribe option. HOWEVER, to unsubscribe you need a password. 
Generally speaking, there is a maintenance script that runs on the mailman server that mails all subscribed persons a reminder of their passwords once a month, but if it has been set up by the list administrators it obviously hasn't been run yet. No matter. Right there underneath the unsubscribe panel is a "Forgotten your Password?" panel. If you click the "Email my password to me" button, you will get the standard password/instruction set mailed to you in a second or so. Go retrieve it in your mail program and you can unsubscribe. BUT, think -- do you really want to? Note all the options below on this page. One of them is to disable mail (while remaining subscribed). You can stay on the list and turn it on and off like a spigot with this option. You can then re-enable list delivery, ask a question, stay online for a week to get all the responses, and when the thread plays out turn the list "off" again. It actually might be more efficient to work this way than to subscribe and unsubscribe over and over again to get a week's worth of traffic when you need it. Then there is digest mode. If you "like" getting the list traffic but just can't handle getting mail every twenty minutes (or whatever the MTBM is), you can try this for a while. In digest mode, you get the entire day's traffic in a single message, once a day, with a header/table of contents. If you see anything in the TOC that interests you, you can read it. Otherwise, hit the ol' "d" button and move on. I digest all the lists I'm not myself active on, which cuts their effective burden on me to near zero. You can even control whether or not you want to receive MIME messages or plaintext only. Stuff like this is what makes mailman a very nice thing indeed. 
Even those without procmail installed (which can simulate parts of this, poorly) can now control the delivery of list traffic very nicely, although you'll still need procmail to filter out certain prolific contributors (like rgb:-) or the occasional spammer that targets the list if they annoy you... Note that EVERYBODY on the list can (and at their convenience probably should) check out their subscription options and retrieve/save their password information (as well as bookmark the subscription page). The URL above is also the most direct route for subscribing to the beowulf list at this point. rgb > > > |>>> From beowulf-admin@beowulf.org Wed Jun 7 11:28:58 2000 > |>>> > |>>> It isn't working because the beowulf.gsfc.nasa.gov address is defunct > |>>> and obsolete. This is one (of many) reasons to use the proper domain > |>>> address: www.beowulf.org. This is a "portable" entity and has followed > |>>> Don Becker, Erik Hendriks, and many of the rest of the NASA Goddard > |>>> folks to Scyld. I believe that www.beowulf.org is currently actually at > |>>> scyld.com but I'm not sure and the reason for using the domain name is > |>>> that it won't matter. Whereever it really is, that's where you'll go. > |>>> > |>>> So, try sending your unsubscribe message to majordomo@beowulf.org. > > > > When I did this, I got the following back ... 
> > > > |>>> From MAILER-DAEMON@axe1.med.upenn.edu Wed Jun 7 13:13:13 2000 > |>>> Received: from localhost (localhost) > |>>> by axe1.med.upenn.edu (8.10.0/8.10.1) id e57HDDN17984; > |>>> Wed, 7 Jun 2000 13:13:13 -0400 (EDT) > |>>> Date: Wed, 7 Jun 2000 13:13:13 -0400 (EDT) > |>>> From: Mail Delivery Subsystem > |>>> Message-Id: <200006071713.e57HDDN17984@axe1.med.upenn.edu> > |>>> To: axelsen@axe1.med.upenn.edu > |>>> MIME-Version: 1.0 > |>>> Content-Type: multipart/report; report-type=delivery-status; > |>>> boundary="e57HDDN17984.960397993/axe1.med.upenn.edu" > |>>> Subject: Returned mail: see transcript for details > |>>> Auto-Submitted: auto-generated (failure) > |>>> > |>>> This is a MIME-encapsulated message > |>>> > |>>> --e57HDDN17984.960397993/axe1.med.upenn.edu > |>>> > |>>> The original message was received at Wed, 7 Jun 2000 13:13:13 -0400 (EDT) > |>>> from axelsen@localhost > |>>> > |>>> ----- The following addresses had permanent fatal errors ----- > |>>> majordomo@beowulf.org > |>>> (reason: 550 ... User unknown) > |>>> > |>>> ----- Transcript of session follows ----- > |>>> ... while talking to blueraja.scyld.com.: > |>>> >>> RCPT To: > |>>> <<< 550 ... User unknown > |>>> 550 5.1.1 majordomo@beowulf.org... User unknown > |>>> > |>>> --e57HDDN17984.960397993/axe1.med.upenn.edu > |>>> Content-Type: message/delivery-status > |>>> > |>>> Reporting-MTA: dns; axe1.med.upenn.edu > |>>> Arrival-Date: Wed, 7 Jun 2000 13:13:13 -0400 (EDT) > |>>> > |>>> Final-Recipient: RFC822; majordomo@beowulf.org > |>>> Action: failed > |>>> Status: 5.1.1 > |>>> Remote-MTA: DNS; blueraja.scyld.com > |>>> Diagnostic-Code: SMTP; 550 ... 
User unknown > |>>> Last-Attempt-Date: Wed, 7 Jun 2000 13:13:13 -0400 (EDT) > |>>> > |>>> --e57HDDN17984.960397993/axe1.med.upenn.edu > |>>> Content-Type: message/rfc822 > |>>> > |>>> Return-Path: > |>>> Received: (from axelsen@localhost) > |>>> by axe1.med.upenn.edu (8.10.0/8.10.1) id e57HDCO17982 > |>>> for majordomo@beowulf.org; Wed, 7 Jun 2000 13:13:13 -0400 (EDT) > |>>> Date: Wed, 7 Jun 2000 13:13:13 -0400 (EDT) > |>>> From: axelsen > |>>> Message-Id: <200006071713.e57HDCO17982@axe1.med.upenn.edu> > |>>> To: majordomo@beowulf.org > |>>> > |>>> > |>>> unsubscribe > |>>> > > ---------------------------------------------------------------------------- > > Paul H. Axelsen MD, Associate Professor .... ..... . . . . > Departments of Pharmacology and . . . .. . .. . > Medicine, Infectious Diseases Section .... .... . . . . . . > University of Pennsylvania School of Medicine . . . .. . .. > Rooms 130/131 John Morgan Bldg . ..... . . . . > 3620 Hamilton Walk > Philadelphia, PA 19104-6084 -------------------------- > 215-898-9238 (office) > Email: axe@pharm.med.upenn.edu 215-898-9766 (lab) > WWW: http://axe2.med.upenn.edu 215-573-2236 (fax) > > ---------------------------------------------------------------------------- > Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From axelsen at axe1.med.upenn.edu Wed Jun 7 11:39:55 2000 From: axelsen at axe1.med.upenn.edu (axelsen@axe1.med.upenn.edu) Date: Sun Sep 7 01:00:20 2008 Subject: Scalability of CHARMM on various architectures Message-ID: <200006071839.e57Idtx18234@axe1.med.upenn.edu> We are designing a cluster for which the most important code will be the computational chemistry program, CHARMM. In our preliminary tests on an existing cluster, we have confirmed our expectation that interthread communications will be our first bottleneck. 
A test run on a typical problem did not scale beyond 4 nodes on a cluster composed of P2-450 processors and fast ethernet interconnections. In contrast, the same code scales almost perfectly up to at least 12 processors on an SGI-Unix SMP machine. CHARMM uses PVM. I would appreciate contact from anyone who has run CHARMM on a cluster and has considered the best way to make this code scale better. From anyone, I would appreciate general guidance on several issues: * If we go with pentium II/III processors, how far is Myrinet likely to permit us to scale these calculations? * With Myrinet, would alpha nodes tend to scale any better than pentium nodes? (a single 500 MHz alpha processor is about 1.6-fold faster than a single 450 MHz P2 on a typical problem, but it is about 3-fold the cost. With this question, I am looking for any additional advantages of alpha to justify this cost) * Are there differences between different pentium chip sets that will impact this problem? * Is there any advantage to buying dual-processor machines, either alpha or pentium, with respect to scaling? Any advantage to dedicating one processor in each dual-box to communications? If we dedicate processors in this way, would both processors have to have the same clock speed? ------------- axe@pharm.med.upenn.edu ----------------- Paul H. Axelsen .... .... . . . . Department of Pharmacology . . . .. . .. . University of Pennsylvania .... ... . . . . . . 3620 Hamilton Walk . . . .. . .. Philadelphia, PA 19104-6084 . .... . . . . 
------------------------------------------------------- From glindahl at hpti.com Wed Jun 7 11:56:01 2000 From: glindahl at hpti.com (Greg Lindahl) Date: Sun Sep 7 01:00:20 2008 Subject: Scalability of CHARMM on various architectures In-Reply-To: <200006071839.e57Idtx18234@axe1.med.upenn.edu> Message-ID: <005701bfd0b2$0647fb40$f69cfea9@hptilap.hpti.com> > I would appreciate contact from anyone who has run CHARMM on a cluster > and has considered the best way to make this code scale better. From > anyone, I would appreciate general guidance on several issues: > > * If we go with pentium II/III processors, how far is Myrinet likely to > permit us to scale these calculations? > > * With Myrinet, would alpha nodes tend to scale any better than > pentium nodes? I only have *old* Alpha/Myrinet numbers, but they're considerably better on scaling than Intel ethernet, not that this is a surprise. Michael Crowley was supposed to send me a new copy of the code so I could do a complete comparison, but it hasn't happened yet. http://legion.virginia.edu/centurion/Applications.html -- greg From alan at dasher.wustl.edu Wed Jun 7 12:11:49 2000 From: alan at dasher.wustl.edu (Alan Grossfield) Date: Sun Sep 7 01:00:20 2008 Subject: Scalability of CHARMM on various architectures In-Reply-To: Your message of "Wed, 07 Jun 2000 14:39:55 EDT." <200006071839.e57Idtx18234@axe1.med.upenn.edu> Message-ID: <200006071911.OAA0000473920@dasher.wustl.edu> :We are designing a cluster for which the most important code will be the :computational chemistry program, CHARMM. In our preliminary tests on :an existing cluster, we have confirmed our expectation that interthread :communications will be our first bottleneck. A test run on a typical :problem did not scale beyond 4 nodes on a cluster composed of P2-450 :processors and fast ethernet interconnections. In contrast, the same :code scales almost perfectly up to at least 12 processors on an SGI-Unix :SMP machine. CHARMM uses PVM. 
Actually, all modern versions of CHARMM use MPI. The scaling behavior is more or less well-known, though, and quite frustrating. You can do a bit better if you use Josip Loncaric's TCP patches (8 processors is maybe 6x faster than 1 processor for MD using PME with ~10K atoms), but that's about it.

: * If we go with pentium II/III processors, how far is Myrinet likely to
: permit us to scale these calculations?

It should be better -- I know Bernie Brooks' lab is using gigabit ethernet to connect LoBoS (they're one of the primary sites for CHARMM development), but I haven't checked their benchmarks recently.

:
: * With Myrinet, would alpha nodes tend to scale any better than pentium nodes?
:

Probably, because of the better bus speeds, but I haven't seen data on this (maybe the paralogic guys can comment, if they've done benchmarks).

Alan Grossfield
-------------------------------------
| New email: alan@dasher.wustl.edu  |
| Update accordingly                |
-------------------------------------

From cgreer1 at midsouth.rr.com Wed Jun 7 21:12:36 2000
From: cgreer1 at midsouth.rr.com (Chris Greer)
Date: Sun Sep 7 01:00:20 2008
Subject: managing user accounts without NIS
References:
Message-ID: <393F1D34.6870A997@midsouth.rr.com>

We are in the process of migrating away from NIS to an rsync based system. We've got some scripts to help manage a centralized password system but each machine only gets the specific "political groups" of users that are assigned to it. You change password via a web interface. I know this has some people probably cringing, I was myself on the idea for a while, but the web interface allows us to take things a step or two further. We are working on scripts that will also integrate into the Novell/NT side of our Lan so that we truly have a single account system.
The PC side is still in the works, and obviously if you are just reading this group for the beowulf aspects this isn't important to you, but I deal not only with a beowulf type setup from an admin perspective, but we also have 100+ UNIX servers of varying flavors not including our 20 node cluster.

Chris G.

Another option we used at a previous site was a smart script that would gather the password files from all the nodes, figure out if you changed it on any of them, update the password map with the changed password, and then re-push out the new password map to all of the servers. It ran once an hour, so that changes weren't immediate, but were propagated in a reasonable time. Of course if you are using a beowulf for high end computing, you probably don't want to interrupt things every hour just to see if things changed and such.

I haven't had experience with kerberos, but it might help you. I don't know if it can be used in place of the password authentication for user accounts though.

Victor Ortega wrote:
>
> I have looked at the archives searching for a good way to manage user
> accounts on a beowulf cluster. Some people suggested using rsync, but
> my question is, how? rsync is nothing more than an efficient version
> of rcp; it doesn't really "synchronize" files--by that I mean that as
> soon as (or soon after) one file gets modified, the other files get
> updated. In particular, I want my users to be able to change their
> passwords or their login shells from any node and have the relevant
> files in /etc updated on all nodes, without the users having to do
> anything else on their part (like running some "update" script). I
> would really rather not write setuid-root wrappers to passwd and chsh,
> as I don't want to inadvertently introduce a security hole to my
> system. I have considered writing a PAM module, but I don't think
> this would cover the chsh case. I also don't want to hack the kernel
> or the file system to manage user accounts. Any suggestions?
> > Victor > > _______________________________________________ > Beowulf mailing list > Beowulf@beowulf.org > http://www.beowulf.org/mailman/listinfo/beowulf From cgreer1 at midsouth.rr.com Wed Jun 7 21:17:39 2000 From: cgreer1 at midsouth.rr.com (Chris Greer) Date: Sun Sep 7 01:00:20 2008 Subject: managing user accounts without NIS References: <39278C80.9CBEF081@supercomputer.org> Message-ID: <393F1E63.576C317B@midsouth.rr.com> rsync -e ssh is the option we use. It's rsync over ssh. dwight wrote: > > Victor Ortega wrote: > > > NIS and NFS are insecure and incur performance penalties. I'm looking > > for better alternatives. My idea of setuid-root wrappers (using rsync > > for distribution of relevant files) already provides a more secure, > > high-performance, high-availability alternative; I just want to make > > sure that there isn't something better out there already, and that I'm > > not overlooking some potential security hole. > > Just using rsync per se might well subject you to a man-in-the-middle > attack, or a spoofing attack. ssh/scp would be a better tool. > > Or just set up Kerberos and simply use it for authentication. > > Best Regards, > > -dwight- > > --------------------------------------------------------------------------- > The Beowulf Mailing list archives can now be searched by visiting: > http://www.supercomputer.org/Search/ > The Calendar of Events in supercomputering can be found at: > http://www.supercomputer.org/calendar/ > > _______________________________________________ > Beowulf mailing list > Beowulf@beowulf.org > http://www.beowulf.org/mailman/listinfo/beowulf From covenant at dirac.org Wed Jun 7 23:16:57 2000 From: covenant at dirac.org (Peter Jay Salzman) Date: Sun Sep 7 01:00:20 2008 Subject: managing user accounts without NIS In-Reply-To: <393F1D34.6870A997@midsouth.rr.com> Message-ID: chris, i'm about to configure NIS on our cluster. i'd be very interested in hearing why your group is moving away from NIS. 
we have a very homogeneous 40 node cluster which is pretty secure at the moment. before continuing with the NIS howto, i'd love to hear your comments. :) pete > Date: Wed, 07 Jun 2000 23:12:36 -0500 > From: Chris Greer > To: Victor Ortega > Cc: Beowulf mailing list > Subject: Re: managing user accounts without NIS > > We are in the process of migrating away from NIS to an rsync based > system. We've got some scripts to help manage a centralized password > system but each machine only gets the specific "political groups" of > users that are assigned to it. You change password via a web interface. > I know this has some people probably cringing, I was myself on the idea > for a while, but the web interface allows us to take things a step > or two further. We are working on scripts that will also integrate > into the Novell/NT side of our Lan so that we truly have a single > account system. The PC side is still in the works, and obviously > if you are just reading this group for the beowulf aspects this > isn't important to you, but I deal not only with a beowulf type > setup from an admin perspective, but we also have 100+ UNIX servers > of varying flavors not including our 20 node cluster. > > Chris G. > > Another option we used at a previous site was a smart script that would > gather the password files from all the nodes, figure out if you changed > it on any of them, update the password map with the changed password, > and then re-push out the new passowrd map to all of the servers. It > ran once an hour, so that changes weren't immediate, but were propagated > in a reasonable time. Of course if you are using a beowulf for high end > computing, you probably don't want to interrupt things every hour just > to see if things changed and such. > > I haven't had experience with kerberos, but it might help you. I don't > know if it can be used in place of the password authentication for user > accounts though. 
> > > Victor Ortega wrote: > > > > I have looked at the archives searching for a good way to manage user > > accounts on a beowulf cluster. Some people suggested using rsync, but > > my question is, how? rsync is nothing more than an efficient version > > of rcp; it doesn't really "synchronize" files--by that I mean that as > > soon as (or soon after) one file gets modified, the other files get > > updated. In particular, I want my users to be able to change their > > passwords or their login shells from any node and have the relevant > > files in /etc updated on all nodes, without the users having to do > > anything else on their part (like running some "update" script). I > > would really rather not write setuid-root wrappers to passwd and chsh, > > as I don't want to inadvertently introduce a security hole to my > > system. I have considered writing a PAM module, but I don't think > > this would cover the chsh case. I also don't want to hack the kernel > > or the file system to manage user accounts. Any suggestions? > > > > Victor From brua at paralline.com Thu Jun 8 04:29:50 2000 From: brua at paralline.com (Pierre Brua) Date: Sun Sep 7 01:00:20 2008 Subject: [Announce] The jobd Load Balancer References: <20000606233916.E770@ostenfeld.dk> Message-ID: <393F83AE.D23B9255@paralline.com> Jakob Østergaard wrote: > It is available at http://ostenfeld.dk/~jakob/jobd/ That URL won't work. The correct one is http://www.ostenfeld.dk/~jakob/jobd/ > So if anyone besides me is sick and tired of waiting for those half-hour C++ > compilations, here's a chance to justify a Beowulf for your boss ;) There may be a naming problem with the jobd at http://bond.imm.dtu.dk/jobd/. Maybe this other jobd already solves your problem and is more mature than yours? Hope it helps, -- Pierre Brua PARALLINE Sarl Parallelism & Linux Solutions 71,av.
des Vosges Phone:+33 388 141 740 mailto:brua@paralline.com F-67000 STRASBOURG Fax:+33 388 141 741 http://www.paralline.com From demeler at bioc09.v19.uthscsa.edu Thu Jun 8 07:45:58 2000 From: demeler at bioc09.v19.uthscsa.edu (Borries Demeler) Date: Sun Sep 7 01:00:20 2008 Subject: [Announce] The jobd Load Balancer In-Reply-To: <20000606233916.E770@ostenfeld.dk> from "Jakob Østergaard" at Jun 06, 2000 11:39:16 PM Message-ID: <200006081445.JAA25167@bioc09.v19.uthscsa.edu> > I need GNU Make, so BSD pmake is out. Customs GNU Make doesn't work properly > and isn't maintained. PVM GNU Make may work, but didn't for me, I also have the > feeling that this is too much of a hack to be relied upon. GNU Queue was > close, except that it breaks under load. Generic NQS was too big (too slow for > short jobs, too complex). > > Fixing GNU Queue wasn't an option for me, with all due respect that is by far > the ugliest code I've seen in a long time. > I haven't tried it, but maybe someone else has and can comment: Doesn't Mosix allow for automatic process migration such that you could invoke a compilation with make -j and have it compile in parallel? In any case, I would like to know if this is a feasible route for parallel compilation. Has anybody tried this, and how would it compare in speed to something like jobd? Thanks for all responses! -Borries From josip at icase.edu Thu Jun 8 14:58:45 2000 From: josip at icase.edu (Josip Loncaric) Date: Sun Sep 7 01:00:20 2008 Subject: TCP patch for Red Hat 6.2 kernel 2.2.14-12 Message-ID: <39401715.904DF2D@icase.edu> Hello, my TCP patch is now available for Red Hat 6.2 kernel 2.2.14-12: http://www.icase.edu./~josip/tcp-patch-for-2.2.14-12 The web page http://www.icase.edu/coral/LinuxTCP2.html explains what this patch does.
Patches for older Red Hat 6.2 kernels are:

http://www.icase.edu./~josip/tcp-patch-for-2.2.14-6.0.1
http://www.icase.edu./~josip/tcp-patch-for-2.2.14-5.0

Patches for Linux kernels 2.2.12 and 2.2.13 are:

http://www.icase.edu./~josip/tcp-patch-for-2.2.13
http://www.icase.edu./~josip/tcp-patch-for-2.2.12

Please verify the md5 checksum after downloading these files to make certain they did not get corrupted in transit. Here are the correct md5 checksums:

3fc16704ac99651a18e47b7a3eccc675 *tcp-patch-for-2.2.12
f72305c7800552b2449d8288bc63b975 *tcp-patch-for-2.2.13
4841c4c21a3bc10e5fa5d04cfd6288ac *tcp-patch-for-2.2.14-12
4a2d599a5b07676808fe3c6e1769efea *tcp-patch-for-2.2.14-5.0
4138e1c13fd6c3895e56ac6b97773e40 *tcp-patch-for-2.2.14-6.0.1

To apply the patch, download the patch file to /tmp then do the following:

(1) create a new kernel source tree from original kernel files, e.g.
    cp -a /usr/src/linux-2.2.14 /usr/src/linux-2.2.14-12tcp

(2) cd /usr/src/linux-2.2.14-12tcp
    patch -p1 [...]

    [...] /proc/sys/net/ipv4/tcp_delack_strategy
    fi
    if [ -f /proc/sys/net/ipv4/tcp_faster_timeouts ]; then
        echo 1 >/proc/sys/net/ipv4/tcp_faster_timeouts
    fi
    #
    # Some generally useful network features
    #
    if [ -f /proc/sys/net/core/netdev_max_backlog ]; then
        echo 1000 >/proc/sys/net/core/netdev_max_backlog
    fi
    if [ -f /proc/sys/net/ipv4/icmp_echo_ignore_broadcasts ]; then
        echo 1 >/proc/sys/net/ipv4/icmp_echo_ignore_broadcasts
    fi

(7) Re-check your work, run /sbin/lilo, then reboot with the new kernel

After reboot, files /proc/sys/net/ipv4/tcp_delack_strategy and /proc/sys/net/ipv4/tcp_faster_timeouts should exist and have the values you specified in your rc.local script. All TCP sockets which turn on the TCP_NODELAY socket option (e.g. MPI sockets) will activate the patch, while all other connections should remain unaffected.

BTW, tcp_delack_strategy=10 and tcp_faster_timeouts=0 turn off the patch completely. These are the defaults after boot, so the patch will not be active unless these values are changed.
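
For readers unsure what "turning on TCP_NODELAY" looks like from the application side, here is a minimal sketch in Python. It is only an illustration and not part of the patch: the helper names open_nodelay_socket and nodelay_enabled are made up for this example, and whether the patched behaviour actually kicks in still depends on the /proc settings described above.

```python
import socket


def open_nodelay_socket() -> socket.socket:
    """Create a TCP socket with TCP_NODELAY set (hypothetical helper)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # TCP_NODELAY disables Nagle's algorithm on this socket. According to
    # the message above, this is the option that patched kernels key on.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


def nodelay_enabled(sock: socket.socket) -> bool:
    """Report whether TCP_NODELAY is currently set on the socket."""
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0


if __name__ == "__main__":
    s = open_nodelay_socket()
    print("TCP_NODELAY set:", nodelay_enabled(s))
    s.close()
```

A socket created without the setsockopt call reports the option as off, which matches the statement that other connections should remain unaffected.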
Sincerely, Josip -- Dr. Josip Loncaric, Senior Staff Scientist mailto:josip@icase.edu ICASE, Mail Stop 132C PGP key at http://www.icase.edu./~josip/ NASA Langley Research Center mailto:j.loncaric@larc.nasa.gov Hampton, VA 23681-2199, USA Tel. +1 757 864-2192 Fax +1 757 864-6134 From jsquyres at lsc.nd.edu Thu Jun 8 15:12:57 2000 From: jsquyres at lsc.nd.edu (Jeff Squyres) Date: Sun Sep 7 01:00:20 2008 Subject: Problems with MPICH 1.2 and Beowulf/Linux In-Reply-To: <200006061355.PAA26685@pcmenelao.mi.infn.it> Message-ID: On Tue, 6 Jun 2000, Francesco Marini wrote: > Second : I've got some prob compiling ScaLapack with LAM-MPI, gcc > and pgf77 (f77 compiler from Portland Group), it gives a lot of > unresolved symbols regarding MPI. Anyone succeeded in compiling them > under same configuration ? I'm afraid that I can't help you with your MPICH problem, but for instructions for compiling ScaLAPACK with LAM, see: http://www.mpi.nd.edu/lam/3rd-party/scalapack.php3 {+} Jeff Squyres {+} squyres@cse.nd.edu {+} Perpetual Obsessive Notre Dame Student Craving Utter Madness {+} "I came to ND for 4 years and ended up staying for a decade" From alex at santafe.edu Thu Jun 8 17:56:16 2000 From: alex at santafe.edu (Alex Lancaster) Date: Sun Sep 7 01:00:20 2008 Subject: Please help me unsubscribe References: Message-ID: >>>>> "RB" == Robert G Brown writes: [...] RB> I believe that we are VERY close to having the beowulf list RB> managed by mailman, which will be a very Good Thing (tm). If/when RB> this finally occurs, you can subscribe and unsubscribe and RB> generally control the flow of list traffic directly from a RB> password protected web interface. This is a very desirable RB> thing... Yep, I agree it is a very desirable thing for most folks. *Provided* one thing: that you can still [un]subscribe via majordomo if you so desire. I'm loath to start up a web browser just to do mailing list management.
I know most people have trouble with majordomo, but when you've been using it for as long as I have, you get used to its quirks, and at least I can manage my mailing lists using `gnus' inside emacs slogged-in to a terminal over a modem line without having to fire up lynx or netscape... Here's hoping majordomo doesn't go away completely... My $0.02. A. -- Alex Lancaster * alex@santafe.edu * www.santafe.edu/~alex * 505 984-8800 x242 Santa Fe Institute (www.santafe.edu) & Swarm Development Group (www.swarm.org) From deadline at plogic.com Fri Jun 9 04:30:39 2000 From: deadline at plogic.com (Douglas Eadline) Date: Sun Sep 7 01:00:20 2008 Subject: Please help me unsubscribe In-Reply-To: Message-ID: FYI: http://www.crn.com/dailies/digest/breakingnews.asp?ArticleID=17350 Doug ------------------------------------------------------------------- Paralogic, Inc. | PEAK | Voice:+610.814.2800 130 Webster Street | PARALLEL | Fax:+610.814.5844 Bethlehem, PA 18015 USA | PERFORMANCE | http://www.plogic.com ------------------------------------------------------------------- From Gianluca.Cecchi at Italy.ACNielsen.com Fri Jun 9 04:59:20 2000 From: Gianluca.Cecchi at Italy.ACNielsen.com (Cecchi, Gianluca) Date: Sun Sep 7 01:00:20 2008 Subject: Please help me unsubscribe Message-ID: <67288CD5E8C0D211B1B30001FAD4F04AB31274@ACN039MILMSX01> FYI too: http://www.hptechcomp.com/index.asp?sessionid=560396771316524984562&navi=5&arnr=0600_042_linux Gianluca Cecchi -----Original Message----- From: Douglas Eadline [mailto:deadline@plogic.com] Sent: Friday, 9 June 2000 13:31 To: beowulf@beowulf.org Subject: Re: Please help me unsubscribe FYI: http://www.crn.com/dailies/digest/breakingnews.asp?ArticleID=17350 Doug ------------------------------------------------------------------- Paralogic, Inc.
| PEAK | Voice:+610.814.2800 130 Webster Street | PARALLEL | Fax:+610.814.5844 Bethlehem, PA 18015 USA | PERFORMANCE | http://www.plogic.com ------------------------------------------------------------------- _______________________________________________ Beowulf mailing list Beowulf@beowulf.org http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Fri Jun 9 05:19:21 2000 From: rgb at phy.duke.edu (Robert G. Brown) Date: Sun Sep 7 01:00:20 2008 Subject: Please help me unsubscribe In-Reply-To: Message-ID: On Fri, 9 Jun 2000, Douglas Eadline wrote: > > FYI: > > http://www.crn.com/dailies/digest/breakingnews.asp?ArticleID=17350 Wolff said he questions whether security issues will be totally solved as Linux scales upward. "I'm always afraid to jump on any fad until I see where it plays," he said. "Linux is a great platform for some things, but IBM jumping in will help." I would have written this as "IBM was once a great company in a lot of ways, and adopting linux across the board will help." rgb Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From Jose_Maria_Gonzalez at dell.com Fri Jun 9 07:04:46 2000 From: Jose_Maria_Gonzalez at dell.com (Jose_Maria_Gonzalez@dell.com) Date: Sun Sep 7 01:00:20 2008 Subject: network performance tool Message-ID: <06E1DE556A23D411825A0090273BF1C82AC49E@LIMXMMF204> Hi there, My sincere apologies for this silly question, but does anybody know of any tool, program, script, or native Red Hat utility to measure the real traffic on my network? I have set up a COW (8 nodes) and I am using MPICH 1.1.2 to run a parallel program. When I run the above program, the network traffic is very high, so I wonder whether the network could be a bottleneck. I have used netperf and the latest version of SCMS, which displays some network device information too, but it does not really help me much. Any ideas?
I would really appreciate any information. Thank you very much. Sincerely Yours, Jose _/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/ Jose Maria Gonzalez Martin, System Engineer, ASC Lab, Dell Computer Corporation, Castletroy, Limerick, Ireland. Phone: +353 61 502100 Jose_Maria_Gonzalez@dell.com http://www.dell.com/asc _/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/ From wsb at paralleldata.com Fri Jun 9 10:44:49 2000 From: wsb at paralleldata.com (W Bauske) Date: Sun Sep 7 01:00:20 2008 Subject: Please help me unsubscribe References: Message-ID: <39412D11.B5419830@paralleldata.com> "Robert G. Brown" wrote: > > > > I would have written this as "IBM was once a great company in a lot of > ways, and adopting linux across the board will help." > > I guess the fact businesses spend $60-70 billion on them each year makes them a has-been and Linux will add huge amounts more revenue to their pitiful bottom line. Get real... Wes Bauske From rgb at phy.duke.edu Fri Jun 9 11:02:32 2000 From: rgb at phy.duke.edu (Robert G. Brown) Date: Sun Sep 7 01:00:21 2008 Subject: network performance tool In-Reply-To: <06E1DE556A23D411825A0090273BF1C82AC49E@LIMXMMF204> Message-ID: On Fri, 9 Jun 2000 Jose_Maria_Gonzalez@dell.com wrote: > Hi there, > > My sincere apologise for this silly question, but does anybody know any > either tool,program,script, or native tool in RedHat to measure the real > network traffic on my network. I have set up a COW (8 nodes) and I am using > MPICH 1.1.2 to run a parallel program. When I run above program the network > traffic is very high so I just wonder if it could be a bottleneck. > > I have used netperf and the latest version of SCMS which displays some > network device information too, but it does not really help me much. I'm not sure whether you are asking for tools like netperf or netpipe for measuring your network's capacity or tools for monitoring the real network traffic while your program is running.
Netperf (or netpipe) is about as good as it gets for measuring raw performance -- setting up a socket connection and measuring what you can jam through it is basically what either one does (with various flags controlling this and that). I believe that both MPICH and PVM come with some examples and tools for measuring effective network performance, but I'm not certain if those tools (at least as described in the postscript guide) made it into the Red Hat powertools mpich RPM. One could presumably retrieve the sources (which is all that you need) from a full MPICH tarball easily enough. PVM's examples do include timing programs with the RH installation (on the main Red Hat CD these days). If you're looking for tools to measure the packet flow during dynamical operation, there are BOTH tracing tools (upshot/nupshot for MPICH and more, xpvm for PVM), some commercial tools (check the MPI/PVM websites) and a variety of tools for monitoring raw network loads on independent computers (e.g. procmeters of various sorts). There are remarkably few and poor load meters that come with RH, for whatever reason, even including the powertools cd. You can always try shopping on rufus http://rufus.w3.org/linux/RPM/ which is what I do when I want a program for some task. This server has 100+ GB of RPM's in a massive, cross-referenced database. You should likely shop the beowulf underground site as well, as they may have other tools that have been registered that are more beowulf specific. Finally, there is procstatd, which comes with a simple perl-tk tool and has a template web interface in either of its source packages (it's hard to package the web interface because webserver setups vary so widely). Eventually I hope to add a few other interfaces (or hope that somebody else does for me, being lazy). procstatd has a simple interface you can use to build your own GUI or tty tool(s) from any scripting or programming language.
As it is currently set up, it lets you monitor up to four ethernet interfaces (virtually every aspect of the interface as recorded in /proc) on a whole network of hosts simultaneously. The provided perlTk tool (watchman) is adequate for monitoring 8-32 hosts at once (depending on the resolution of your display) and of course you can have multiple instances of watchman running to do more. If I ever have time I'll build a scrolling display, probably abandoning perlTk for Gtk and C. You can find the current rpm, a source rpm, and a ready-to-make tarball, on http://www.phy.duke.edu/brahma (look for the procstatd links, which point to symlinks to the current release). You will need to get and install a perlTk rpm on top of your existing perl -- one that should work is provided on the brahma page but isn't guaranteed to be particularly current. Hope this helps. If it doesn't, be a bit more specific about what you are looking for. rgb Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From rgb at phy.duke.edu Fri Jun 9 11:39:36 2000 From: rgb at phy.duke.edu (Robert G. Brown) Date: Sun Sep 7 01:00:21 2008 Subject: Please help me unsubscribe In-Reply-To: <39412D11.B5419830@paralleldata.com> Message-ID: On Fri, 9 Jun 2000, W Bauske wrote: > "Robert G. Brown" wrote: > > > > > > > > I would have written this as "IBM was once a great company in a lot of > > ways, and adopting linux across the board will help." > > > > > > I guess the fact businesses spend $60-70 billion on them each year makes > them a has been and Linux will add huge amounts more revenue to their > pitiful bottom line. Get real... Whoa, I was just kidding, in a wry sort of way. That'll teach ME to be terse;-) To explain my remark further, I really do think that Linux is an essential part of IBM's strategy for maintaining those huge revenues. 
OS/2 tanked after they were basically betrayed by Microsoft, and it is difficult and expensive for IBM to maintain their own "private" systems group with non-mainstream operating systems that are not compatible or portable across all their various platforms, however lucrative the fish they have shot in these particular barrels have been in the past. Ask DEC, Honeywell, etc (long list) just how long a multibillion dollar mostly-hardware company lasts when their software is too nonstandard or their price point too non-competitive. I love IBM. I bought IBM stock back when I was nine or ten. Learned to type on an IBM Selectric typewriter. Learned to program on IBM mainframes with IBM fortran IV and HASP on IBM card punches and IBM card readers, programmed mastermind in APL on an IBM 5100 (still have the program on an archaic tape somewhere), owned a 64K motherboard IBM PC (and am still kicking myself for donating the aged husk of a chassis to a school as it would have been a kick to refill it with modern motherboards and use it as a desktop). I think of them as the huge, immensely rich and powerful, multinational monopoly with a heart (just kidding again!). Seriously, IBM has succeeded in reinventing themselves a number of times where their competitors have failed and fallen by various waysides. I'm very pleased that they've overwhelmingly adopted linux and that the adoption appears to be migrating quite aggressively from their small computer and netfinity business into their other small mainframe and supercomputing operations. I live not far from their Research Triangle operation, which used to be home to OS/2 development -- folks out there are often militant about linux, these days (as is a lot of their netfinity group). On the other hand, IBM does nothing except in the hope of making money (while providing good services, of course) and isn't moving to linux out of dreams of revenge on Microsoft or because their other OS's aren't currently profitable.
They're betting on the horse they think will win the race and preserve those lovely revenues, while (I'm sure) anticipating that in the long run they can eventually save a lot of money by NOT having multiple competing, incompatible mainline software operations. IBM's hardware has always been excellent, but their software over the years has not infrequently left something to be desired and some of it would never have sold at all if their hardware customers had had a choice. rgb Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From wsb at paralleldata.com Fri Jun 9 13:20:55 2000 From: wsb at paralleldata.com (W Bauske) Date: Sun Sep 7 01:00:21 2008 Subject: Please help me unsubscribe References: Message-ID: <394151A7.32FD3273@paralleldata.com> Erik Mullinix wrote: > > I read this post for educated information. > The real point you're looking for is that IBM indeed has the sense to adopt what is becoming commonplace before it loses its edge. Their AS400 line has really helped out in the data storage and processing arena. However, even they admit the minicomputers are a bit pricey, and the Beowulf is a great arena to step into, especially bringing their experience with parallel platform design into the mix. > My point was IBM is quite healthy as is. Linux is an interesting horse and I don't discount it doing well. I use Linux on Alphas myself and I see no reason not to have IBM get linux working on mainframes or whatever they like. I seriously doubt many people knew that an IBM mainframe could run vast numbers of Linux images. They have some really good technology and many good qualities that businesses appreciate and vote on using their pocketbooks. As to pricing, you just have to know how to get the deal done. Enough about IBM, now back to beowulfs please. Wes > Erik Mullinix > > >>> "W Bauske" 06/09/00 01:44PM >>> > "Robert G.
Brown" wrote: > > > > > > > > I would have written this as "IBM was once a great company in a lot of > > ways, and adopting linux across the board will help." > > > > > > I guess the fact businesses spend $60-70 billion on them each year makes > them a has been and Linux will add huge amounts more revenue to their > pitiful bottom line. Get real... > > Wes Bauske > > _______________________________________________ > Beowulf mailing list > Beowulf@beowulf.org > http://www.beowulf.org/mailman/listinfo/beowulf From david.lombard at mscsoftware.com Fri Jun 9 13:47:20 2000 From: david.lombard at mscsoftware.com (David Lombard) Date: Sun Sep 7 01:00:21 2008 Subject: Please help me unsubscribe References: Message-ID: <394157D8.C00B2EAF@mscsoftware.com> "Robert G. Brown" wrote: > > On the other hand, IBM does nothing except in the hope of making money > (while providing good services, of course) ... I'm thinking this is a common motivation among *all* non-profit orgs... ;^) -- David N. Lombard MSC.Software From wsb at paralleldata.com Fri Jun 9 13:55:31 2000 From: wsb at paralleldata.com (W Bauske) Date: Sun Sep 7 01:00:21 2008 Subject: IBM (was Re: Please help me unsubscribe) References: Message-ID: <394159C3.E9A61FEA@paralleldata.com> "Robert G. Brown" wrote: > > On Fri, 9 Jun 2000, W Bauske wrote: > > > "Robert G. Brown" wrote: > > > > > > > > > > > > I would have written this as "IBM was once a great company in a lot of > > > ways, and adopting linux across the board will help." > > > > > > > > > > I guess the fact businesses spend $60-70 billion on them each year makes > > them a has been and Linux will add huge amounts more revenue to their > > pitiful bottom line. Get real... > > Whoa, I was just kidding, in a wry sort of way. That'll teach ME to be > terse;-) > OK.
Just pointing out how your comment sounds to those folks who look at computing from a business perspective. > To explain my remark further, I really do think that Linux is an > essential part of IBM's strategy for maintaining those huge revenues. > OS/2 tanked after they were basically betrayed by Microsoft, and it is > difficult and expensive for I