From rgb at phy.duke.edu Mon Apr 1 05:27:43 2002
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Tue Mar 16 01:02:17 2010
Subject: DHCP Help
In-Reply-To:
Message-ID:

On Sat, 30 Mar 2002, Adrian Garcia Garcia wrote:

> Hello everybody, I'm a beginner and I have been having problems with my
> DHCP server: I can't assign IPs to the clients, and I don't know exactly
> whether it is the server or the client that is not working. I am working
> with Red Hat 7.1 and my DHCP client is dhcpcd, because I tried pump but it
> did not work. Please, please, can anybody give me some help? What can I do?
> Sorry for my poor English; in fact I speak Spanish. Please help. Thanks a lot.

Mr. Garcia,

Please find below an example of the configuration I use at home for my
private beowulf. It is for dhcpd, in /etc/dhcpd.conf, and it is for a
private internal network with IP numbers in 192.168. Note well the three
sections. This works well for computers that boot Windows or Linux or
anything else with a DHCP client; some of my computers at home boot both.

Note also the addresses:

  range 192.168.1.192 192.168.1.224;

Only these are used, and only for computers that the server does not
already know by a registered ethernet number and a static address.

I hope this helps a little. And forgive my bad Spanish; it is (I'm sure)
worse than your English, but I need the practice.
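The key convention in the configuration that follows is that the dynamic pool (range 192.168.1.192 192.168.1.224) serves only unknown machines, while each registered machine gets a fixed-address outside that pool. A small sanity check of that convention might look like the Python sketch below. It is not part of dhcpd and is only an assumption-laden sketch: its regular expressions understand just the range, host and fixed-address statements written the way they appear in this example, not the full dhcpd.conf grammar.

```python
import ipaddress
import re

# Patterns for the only three statements this sketch looks at.
RANGE_RE = re.compile(r"\brange\s+([\d.]+)\s+([\d.]+)\s*;")
HOST_RE = re.compile(r"\bhost\s+([\w.-]+)\s*\{([^}]*)\}")
FIXED_RE = re.compile(r"\bfixed-address\s+([\d.]+)\s*;")


def strip_comments(conf: str) -> str:
    """Drop '#' comments; assumes no '#' appears inside quoted strings."""
    return "\n".join(line.split("#", 1)[0] for line in conf.splitlines())


def find_collisions(conf: str) -> list:
    """Return (host, address) pairs whose fixed-address lies in a dynamic range."""
    text = strip_comments(conf)
    pools = [(ipaddress.IPv4Address(lo), ipaddress.IPv4Address(hi))
             for lo, hi in RANGE_RE.findall(text)]
    collisions = []
    for name, body in HOST_RE.findall(text):
        match = FIXED_RE.search(body)
        if match is None:
            continue  # host block without a static address
        addr = ipaddress.IPv4Address(match.group(1))
        if any(lo <= addr <= hi for lo, hi in pools):
            collisions.append((name, str(addr)))
    return collisions


sample = """
shared-network RGB {
  subnet 192.168.1.0 netmask 255.255.255.0 {
    range 192.168.1.192 192.168.1.224;
  }
}
#host hostname {
#  fixed-address 152.3.xxx.xxx;
#}
host adam { hardware ethernet 00:20:18:58:27:1a; fixed-address 192.168.1.1; }
host tyrial { hardware ethernet 00:a0:cc:59:45:9b; fixed-address 192.168.1.134; }
"""
print(find_collisions(sample))  # -> []
```

Commented-out template hosts are ignored because comments are stripped before matching; an empty result means every static host sits safely outside the pool.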
rgb

##############################################################################
#
# /etc/dhcpd.conf - configuration file for our DHCP/BOOTP server
#
###########################################################
# Global Parameters
###########################################################
option domain-name "rgb.private.net";
option domain-name-servers 152.3.250.1;
option subnet-mask 255.255.255.0;
option broadcast-address 192.168.1.255;
use-host-decl-names on;

###########################################################
# Subnets
###########################################################
shared-network RGB {
  subnet 192.168.1.0 netmask 255.255.255.0 {
    range 192.168.1.192 192.168.1.224;
    default-lease-time 43200;
    max-lease-time 86400;
    option routers 192.168.1.1;
    option domain-name "rgb.private.net";
    option domain-name-servers 152.3.250.1;
    option broadcast-address 192.168.1.255;
    option subnet-mask 255.255.255.0;
  }
}

###########################################################
# Static IP addresses managed by DHCP server
###########################################################
# Personal Computers (MSDOS/Win-3.x/WfW/Win-95/Win-NT/MacOS)
#host hostname {
#  hardware ethernet xx:xx:xx:xx:xx:xx;
#  fixed-address 152.3.xxx.xxx;
#  option host-name hostname;
#  option routers 152.3.xxx.250;
#}

# UNIX systems
#host hostname {
#  hardware ethernet xx:xx:xx:xx:xx:xx;
#  fixed-address 152.3.xxx.xxx;
#  option host-name hostname;
#  option routers 152.3.xxx.250;
#}

# adam future gateway redux? 300MHz Celeron
host adam {
  hardware ethernet 00:20:18:58:27:1a;
  fixed-address 192.168.1.1;
  next-server 192.168.1.131;
  option domain-name "rgb.private.net";
  option host-name "adam";
}

# caine (Linux/Windows workstation)
# (Linux/Windows workstation)
host tyrial {
  hardware ethernet 00:a0:cc:59:45:9b;
  fixed-address 192.168.1.134;
  next-server 192.168.1.131;
  option routers 192.168.1.1;
  option domain-name "rgb.private.net";
  option host-name "tyrial";
}

etc...

rgb

--
Robert G. Brown                        http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb@phy.duke.edu

From eugen at leitl.org Mon Apr 1 13:19:31 2002
From: eugen at leitl.org (Eugen Leitl)
Date: Tue Mar 16 01:02:17 2010
Subject: FY;) Google's secret clustering technology
Message-ID:

http://www.google.com/technology/pigeonrank.html

From emiller at techskills.com Mon Apr 1 17:41:18 2002
From: emiller at techskills.com (Eric Miller)
Date: Tue Mar 16 01:02:17 2010
Subject: Syntax for executing
Message-ID:

Hey all, got a five-node cluster up running 27-z9, preparing for a 30 node
cluster.

- What is the syntax to run an executable in the cluster environment? For
example, I run NP=5 mpi-mandel to run the test fractal program. How would I
execute say, SETI, using the cluster? Assume that the SETI executable is in
the PATH. Also, the older version of Scyld had some test code in
/usr/mpi-beowulf/*. Is that gone?

- What would cause all but one of the processors to show usage in
beostatus? The node shows "up" in every other way: hardware identical,
memory, swap, network, etc....just when I run something, only that one
processor on one node shows no % usage.

-ETM

 .~.
 /V\
// \\
/( )\
^'~'^

From hanzl at noel.feld.cvut.cz Tue Apr 2 00:28:38 2002
From: hanzl at noel.feld.cvut.cz (hanzl@noel.feld.cvut.cz)
Date: Tue Mar 16 01:02:17 2010
Subject: Newest RPM's?
In-Reply-To: <002c01c1d933$ec02a3c0$c31fa6ac@xp>
References: <1017612125.19271.20.camel@vhwalke.mathsci.usna.edu> <002c01c1d933$ec02a3c0$c31fa6ac@xp>
Message-ID: <20020402102838A.hanzl@unknown-domain>

> I am using RH7.2 on my master node and would like to RPM the latest stable
> version of Scyld, instead of using the CD (I have 27Bz-7, based on RH6.2)

I am not sure an RH7.2-based Scyld system is available yet, though it is
quite possible I missed something.
You may consider Clustermatic: it is similar to Scyld but smaller (and
therefore easier), an rpm install on top of RH7.2 works great, and you may
download ISO images if you want.

http://www.clustermatic.org

See my previous post "Clustermatic: smooth upgrade to new version" for an
rpm-install micro-howto:

http://www.beowulf.org/pipermail/beowulf/2002-March/002969.html

HTH

Vaclav Hanzl

From daniel.kidger at quadrics.com Tue Apr 2 01:10:20 2002
From: daniel.kidger at quadrics.com (Dan Kidger)
Date: Tue Mar 16 01:02:17 2010
Subject: FY;) Google's secret clustering technology
References:
Message-ID: <002601c1da27$10987090$0100a8c0@spot>

----- Original Message -----
"Eugen Leitl" wrote:
>To:
>Sent: Monday, April 01, 2002 10:19 PM
>Subject: FY;) Google's secret clustering technology
>
> http://www.google.com/technology/pigeonrank.html
>

This is a very interesting article. However, there is no mention of them
using the Quadrics Interconnect, nor for that matter Myrinet, Scali or even
plain ethernet. I can only assume the whole cluster is run by just using
cereal lines.

Yours,
Daniel.

--------------------------------------------------------------
Dr. Dan Kidger, Quadrics Ltd.      daniel.kidger@quadrics.com
One Bridewell St., Bristol, BS1 2AA, UK         0117 915 5505
----------------------- www.quadrics.com --------------------

From hanzl at noel.feld.cvut.cz Tue Apr 2 01:51:11 2002
From: hanzl at noel.feld.cvut.cz (hanzl@noel.feld.cvut.cz)
Date: Tue Mar 16 01:02:17 2010
Subject: 27z-9 ? (Was: Syntax for executing )
In-Reply-To:
References:
Message-ID: <20020402115111Y.hanzl@unknown-domain>

> Hey all, got a five-node cluster up running 27-z9

I can see just a few files at ftp://ftp.scyld.com/pub/beowulf/27z-9/

Can anybody please comment on the status of 27z-9?

Thanks

Vaclav

From rbw at ahpcrc.org Tue Apr 2 07:48:50 2002
From: rbw at ahpcrc.org (Richard Walsh)
Date: Tue Mar 16 01:02:17 2010
Subject: Uptime data/studies/anecdotes ... ?
Message-ID: <200204021548.g32Fmod13276@mycroft.ahpcrc.org>

All,

What information is available on typical uptimes of large-scale clusters,
say greater than 256 processors and running a multi-user workload? What
gains do single-point-of-administration tools like SCYLD provide? Clearly,
there are a great number of things one can do to maximize uptime/utilization
(not really the same thing). What are the essentials from the list's point
of view?

If a good figure is, say, 80% utilization over an 8760-hour year today, what
will this number be in three years? Annual utilization for the
1088-processor T3E we run here is about 95%. How long until a similarly
sized cluster typically yields the same value?

Regards,

rbw

#---------------------------------------------------
#
# Richard Walsh
# Project Manager, Cluster Computing, Computational
# Chemistry and Finance
# netASPx, Inc.
# 1200 Washington Ave. So.
# Minneapolis, MN 55415
# VOX: 612-337-3467
# FAX: 612-337-3400
# EMAIL: rbw@networkcs.com, richard.walsh@netaspx.com
#
#---------------------------------------------------
# "What you can do, or dream you can, begin it;
# Boldness has genius, power, and magic in it."
# -Goethe
#---------------------------------------------------
# "Without mystery, there can be no authority."
# -Charles DeGaulle
#---------------------------------------------------
# "Why waste time learning when ignorance is
# instantaneous?" -Thomas Hobbes
#---------------------------------------------------

From roger at ERC.MsState.Edu Tue Apr 2 08:15:00 2002
From: roger at ERC.MsState.Edu (Roger L. Smith)
Date: Tue Mar 16 01:02:17 2010
Subject: Uptime data/studies/anecdotes ... ?
In-Reply-To: <200204021548.g32Fmod13276@mycroft.ahpcrc.org>
Message-ID:

We currently run an average of about 75% utilization on our 586 processor
(293 node) cluster. We probably have about one node per week crash and
hang for various reasons.
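Because uptime and utilization are easy to conflate, it may help to turn the percentages in this thread (80% of an 8760-hour year, 95% on a 1088-processor T3E, 75% on 586 processors) into delivered processor-hours. A minimal sketch, assuming a 365-day year and a flat annual average; real, bursty workloads are not flat:

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, the figure used in the question


def delivered_cpu_hours(processors: int, utilization: float) -> float:
    """Processor-hours actually used in a year at a flat average utilization."""
    return processors * HOURS_PER_YEAR * utilization


def idle_cpu_hours(processors: int, utilization: float) -> float:
    """Processor-hours lost to idleness, downtime, or scheduling gaps."""
    return processors * HOURS_PER_YEAR * (1.0 - utilization)


# Figures quoted in this thread.
cluster = delivered_cpu_hours(586, 0.75)  # the 586-processor PC cluster
t3e = delivered_cpu_hours(1088, 0.95)     # the 1088-processor T3E
print(f"586 procs @ 75%: {cluster:,.0f} CPU-hours/year")
print(f"1088 procs @ 95%: {t3e:,.0f} CPU-hours/year")
# At 80%, each processor loses about 1,752 of its 8,760 hours per year.
print(f"1 proc @ 80%: {idle_cpu_hours(1, 0.80):,.0f} idle hours/year")
```

Framed this way, the gap between 75% and 95% on a 586-processor machine is roughly a million processor-hours a year, which is why the sources of cycle loss discussed below matter.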
We have occasional problems with memory leaks or PBS hangups which require
large scale reboots of the cluster. (Actually, PBS just died as I'm typing
this, but our pbs heartbeat script should restart it automatically in a
few minutes). I'd say we have to do a full reboot of the cluster about
every 3-4 months.

For a bunch of PC hardware running a free OS, this seems like a pretty
good number to me. It's not in the same class as our Sun servers (nor
even our SGIs!), but then, none of those systems are this large, either.

_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_
| Roger L. Smith                        Phone: 662-325-3625               |
| Systems Administrator                 FAX: 662-325-7692                 |
| roger@ERC.MsState.Edu       http://WWW.ERC.MsState.Edu/~roger           |
|                       Mississippi State University                      |
|_______________________Engineering Research Center_______________________|

From rbw at ahpcrc.org Tue Apr 2 10:24:22 2002
From: rbw at ahpcrc.org (Richard Walsh)
Date: Tue Mar 16 01:02:17 2010
Subject: Uptime data/studies/anecdotes ... ?
Message-ID: <200204021824.g32IOMa14409@mycroft.ahpcrc.org>

On Tue, 2 Apr 2002 10:15:00 Roger Smith wrote:
>We currently run an average of about 75% utilization on our 586 processor
>(293 node) cluster. We probably have about one node per week crash and
>hang for various reasons.
>
>We have occasional problems with memory leaks or PBS hangups which require
>large scale reboots of the cluster. (Actually, PBS just died as I'm typing
>this, but our pbs heartbeat script should restart it automatically in a
>few minutes). I'd say we have to do a full reboot of the cluster about
>every 3-4 months.
>For a bunch of PC hardware running a free OS, this seems like a pretty
>good number to me. It's not in the same class as our Sun servers (nor
>even our SGIs!), but then, none of those systems are this large, either.

Thanks for the estimate. Do you use SCYLD or another pseudo-single-system-image
tool? I assume that 75% is a steady-state number ... how long did it take
your group to reach that state?
If a full reboot is required only every 3-4 months, then is single node
failure your main source of cycle loss? Or are other things like
inefficient scheduling and lack of check-point/restart, etc. important?
75% does seem like a reasonably good number.

rbw

From roger at ERC.MsState.Edu Tue Apr 2 10:46:07 2002
From: roger at ERC.MsState.Edu (Roger L. Smith)
Date: Tue Mar 16 01:02:17 2010
Subject: Uptime data/studies/anecdotes ... ?
In-Reply-To: <200204021824.g32IOMa14409@mycroft.ahpcrc.org>
Message-ID:

On Tue, 2 Apr 2002, Richard Walsh wrote:

> Thanks for the estimate. Do you use SCYLD or another pseudo-single-system-
> image tool?

Nope. We use RH 7.2, PBS, and MPI/Pro, MPICH, and LAM MPI.

> I assume that 75% is a steady state number ... how long did
> it take your group to reach that state?

Our users are a bit "bursty". The cluster rarely drops below 50%. Looking
back through my records, it hasn't been below 140 processors in use in
several weeks, and has spent most of its time with 400+ in use. As we near
project deadlines, we often have jobs waiting in the queue. I've seen as
many as 1100 processors in use, or requested and waiting.

When we upgraded from 324 to 586 processors, the users were banging on my
door wanting to know when the new nodes were available. Within an hour of
releasing the new nodes (and without any notification to the users), they
were already using over 500 processors. I'm currently working on an
expansion to about 1036 processors, and I fully expect to see it slammed
within a few days of release.

> If a full reboot is required
> only every 3-4 months then is single node failure your main source of
> cycle loss? Or are other things like inefficient scheduling and lack of
> check-point/restart, etc. important?

PBS is our leading cause of cycle loss. We now run a cron job on the
headnode that checks every 15 minutes to see if the PBS daemons have died,
and if so, it automatically restarts them.
About 75% of the time that I have a node fail to accept jobs, it is because
its pbs_mom has died, not because there is anything wrong with the node.

From gabriel.weinstock at dnamerican.com Tue Apr 2 13:00:03 2002
From: gabriel.weinstock at dnamerican.com (Gabriel J. Weinstock)
Date: Tue Mar 16 01:02:17 2010
Subject: Maui scheduler error
Message-ID: <21193187508190@DNAMERICAN.COM>

Hi,

We are having trouble getting the Maui scheduler to work. We have no problem
starting the server/scheduler and drone programs. (For testing, we are not
starting the drone on every node in the cluster; is this a problem?)

The setup is 'mauictl start' on the head node, followed by 'nodectl start'
on 2 compute nodes. 'showq' works correctly. The log files show all three
nodes processing correctly, right up until a user submits a job, at which
point the server node spits out the following message to its log file and
exits:

- log file -
4/02 15:21:25 (Sched.java:299) iteration 36
04/02 15:21:25 (Wiki.java:392) Wiki loop event
04/02 15:21:25 (BackfillMod.java:147) backfill scheduling
04/02 15:21:25 (ReservationsMod.java:105) handling reservations
04/02 15:21:25 (JobChecker.java:220) checkpointing...
04/02 15:21:25 (Sched.java:311) scheduling interval took 0.016 seconds
04/02 15:21:29 (BasicWorker.java:430) mauisubmit
04/02 15:21:29 (MauiSubmit.java:96) mauisubmit
04/02 15:21:29 (MauiSubmit.java:128) LRM cmdfile
04/02 15:21:29 (CMD.java:280) Removing envvar HOSTNAME
04/02 15:21:29 (CMD.java:280) Removing envvar MACHTYPE
04/02 15:21:29 (CMD.java:280) Removing envvar HOSTTYPE
04/02 15:21:29 (CMD.java:280) Removing envvar OSTYPE
04/02 15:21:29 (CMD.java:280) Removing envvar _
04/02 15:21:29 (MauiMySQL.java:268) Changing romeda's job account to no-account
04/02 15:21:29 (MauiSubmit.java:199) checking job on RM=Node
04/02 15:21:29 (BasicPolicy.java:111) pre debiting bank for 7200 slotsecs for job=romeda:1017778889:0
04/02 15:21:29 (MauiXMLHandlerImpl.java:284) FATAL: org.xml.sax.SAXParseException: Illegal XML character: �.
04/02 15:21:29 (BasicWorker.java:244) Ignoring SAX freak-out: Illegal XML character: �.
04/02 15:21:30 (Sched.java:326) ----------------------------------------------------
04/02 15:21:30 (Sched.java:299) iteration 37
04/02 15:21:30 (Wiki.java:392) Wiki loop event
04/02 15:21:30 (BackfillMod.java:147) backfill scheduling
04/02 15:21:30 (BackfillMod.java:164) contemplating job romeda:1017778889:0
04/02 15:21:30 (Sched.java:330) java.lang.ArrayIndexOutOfBoundsException
java.lang.ArrayIndexOutOfBoundsException
        at unm.maui.rm.SimpleMatcher.getNodeAvailSlotIDs(SimpleMatcher.java:563)
        at unm.maui.rm.SimpleMatcher.getNodesSlots(SimpleMatcher.java:377)
        at unm.maui.rm.SimpleMatcher.getNodesSlots(SimpleMatcher.java:256)
        at unm.maui.rm.SimpleMatcher.findNodesSlots(SimpleMatcher.java:79)
        at unm.maui.sched.BackfillMod.makeReservation(BackfillMod.java:240)
        at unm.maui.sched.BackfillMod.event(BackfillMod.java:169)
        at unm.maui.sched.Sched.fireLoop(Sched.java:922)
        at unm.maui.sched.Sched.run(Sched.java:306)
        at java.lang.Thread.run(Thread.java:484)
04/02 15:21:30 (Sched.java:347) checkpointing scheduler.
04/02 15:21:30 (Wiki.java:385) shutting down RM=Node
04/02 15:21:30 (Sched.java:359) scheduler finished
- end -

If I try to restart the server daemon after this crash, it immediately exits
again with the message from iteration 37 (ArrayIndexOutOfBoundsException).
The only way to restart the daemon is to create the MySQL database again
(wiping whatever was in it).

Here is my .cmd file, which I run with 'mauisubmit maui_job.cmd':

- maui_job.cmd -
IWD == "/tmp"
WCLimit == 3600
Account == "WWGD190053X"
Tasks == 2
Nodes == 2
TaskPerNode == 1
Arch == x86
OS == Linux
JobType == "mpi.ch_gm"
Exec == "/export/mauisched-1.2/bin/runmpi_gm"
Args == "/export/home/romeda/cpi"
Output == "/tmp/$(MAUI_JOB_USER)2x3gm$(MAUI_JOB_ID).out"
Error == "/tmp/$(MAUI_JOB_USER)2x3gm$(MAUI_JOB_ID).err"
Log == "/tmp/$(MAUI_JOB_USER)2x3gm$(MAUI_JOB_ID).log"
Input == "/dev/null"
- end -

Is the XML error related to the out-of-bounds array exception? We compiled
with the Sun JDK 1.3.1-02 and JavaCC 2.1. There is no information about this
error on the web. Any help would be greatly appreciated.

Thanks,
Gabe

From emiller at techskills.com Tue Apr 2 13:34:27 2002
From: emiller at techskills.com (Eric Miller)
Date: Tue Mar 16 01:02:17 2010
Subject: Syntax for executing
In-Reply-To:
Message-ID:

Disregard. SETI is not available in an MPI-enabled format.

My apologies. Can anyone direct me to a URL that lists some available
programs that I can execute on the cluster? Preferably something with a
continuous (looping?) graphical output (e.g. SETI). This is a display for
students to visualize and promote educational programs for Linux, like a
museum piece.

> Hey all, got a five-node cluster up running 27-z9, preparing for a 30 node
> cluster.
>
> - What is the syntax to run an executable in the cluster environment? For
> example, I run NP=5 mpi-mandel to run the test fractal program. How would I
> execute say, SETI, using the cluster? Assume that the SETI executable is in
> the PATH.
> Also, the older version of Scyld had some test code in /usr/mpi-beowulf/*.
> Is that gone?
>
> - What would cause all but one of the processors to show usage in
> beostatus? The node shows "up" in every other way: hardware identical,
> memory, swap, network, etc....just when I run something, only that one
> processor on one node shows no % usage.
>
> -ETM

From gropp at mcs.anl.gov Tue Apr 2 13:48:19 2002
From: gropp at mcs.anl.gov (William Gropp)
Date: Tue Mar 16 01:02:17 2010
Subject: Syntax for executing
In-Reply-To:
References:
Message-ID: <5.1.0.14.2.20020402154730.01bdc3b8@localhost>

At 04:34 PM 4/2/2002 -0500, Eric Miller wrote:
>Disregard. SETI is not available in an MPI-enabled format.
>
>My apologies. Can anyone direct me to a URL that lists some available
>programs that I can execute on the cluster? Preferably something with a
>continuous (looping?) graphical output (e.g. SETI). This is a display for
>students to visualize and promote educational programs for Linux, like a
>museum piece.

pmandel in the MPICH distribution has a -loop option for just this purpose.
See the README in mpich/mpe/contrib/mandel.

Bill

From aby_sinha at yahoo.com Tue Apr 2 19:19:42 2002
From: aby_sinha at yahoo.com (Abhishek sinha)
Date: Tue Mar 16 01:02:17 2010
Subject: apic problems
Message-ID: <3CAA74CE.2070109@yahoo.com>

Hi all,

I am using dual processors with a Tyan Tiger 2505T board and am having
many problems with the APIC on the machine. I have looked around on the
newsgroups and mailing lists, with no hints.
Does the return code at the end of the message (00(02) and the like)

> APIC error on CPU0: 00(08)
> APIC error on CPU0: 08(08)
> APIC error on CPU0: 08(08)
> APIC error on CPU1: 02(02)
> APIC error on CPU1: 02(08)

mean that this particular board I'm using is crappy, or that the whole 2505T
series cannot handle these kinds of requests? I am pasting the dmesg from
the server below:

> Linux version 2.4.7-10smp (bhcompile@stripples.devel.redhat.com) (gcc version 2.96 20000731 (Red Hat Linux 7.1 2.96-98)) #1 SMP Thu Sep 6 17:09:31 EDT 2001
> BIOS-provided physical RAM map:
> BIOS-e820: 0000000000000000 - 00000000000a0000 (usable)
> BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
> BIOS-e820: 0000000000100000 - 000000007fff0000 (usable)
> BIOS-e820: 000000007fff0000 - 000000007fff3000 (ACPI NVS)
> BIOS-e820: 000000007fff3000 - 0000000080000000 (ACPI data)
> BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
> Scanning bios EBDA for MXT signature
> 1151MB HIGHMEM available.
> found SMP MP-table at 000f5660
> hm, page 000f5000 reserved twice.
> hm, page 000f6000 reserved twice.
> hm, page 000f1000 reserved twice.
> hm, page 000f2000 reserved twice.
> On node 0 totalpages: 524272
> zone(0): 4096 pages.
> zone(1): 225280 pages.
> zone(2): 294896 pages.
> Intel MultiProcessor Specification v1.4
> Virtual Wire compatibility mode.
> OEM ID: OEM00000 Product ID: PROD00000000 APIC at: 0xFEE00000
> Processor #0 Pentium(tm) Pro APIC version 17
> Processor #1 Pentium(tm) Pro APIC version 17
> I/O APIC #2 Version 17 at 0xFEC00000.
> Processors: 2
> Kernel command line: ro root=/dev/hda2
> Initializing CPU#0
> Detected 864.238 MHz processor.
> Console: colour VGA+ 80x25
> Calibrating delay loop... 1723.59 BogoMIPS
> Memory: 2056920k/2097088k available (1396k kernel code, 37736k reserved, 102k data, 240k init, 1179584k highmem)
> Dentry-cache hash table entries: 262144 (order: 9, 2097152 bytes)
> Inode-cache hash table entries: 131072 (order: 8, 1048576 bytes)
> Mount-cache hash table entries: 32768 (order: 6, 262144 bytes)
> Buffer-cache hash table entries: 131072 (order: 7, 524288 bytes)
> Page-cache hash table entries: 524288 (order: 10, 4194304 bytes)
> CPU: Before vendor init, caps: 0387fbff 00000000 00000000, vendor = 0
> CPU: L1 I cache: 16K, L1 D cache: 16K
> CPU: L2 cache: 256K
> Intel machine check architecture supported.
> Intel machine check reporting enabled on CPU#0.
> CPU: After vendor init, caps: 0387fbff 00000000 00000000 00000000
> CPU serial number disabled.
> CPU: After generic, caps: 0383fbff 00000000 00000000 00000000
> CPU: Common caps: 0383fbff 00000000 00000000 00000000
> Enabling fast FPU save and restore... done.
> Enabling unmasked SIMD FPU exception support... done.
> Checking 'hlt' instruction... OK.
> POSIX conformance testing by UNIFIX
> mtrr: v1.40 (20010327) Richard Gooch (rgooch@atnf.csiro.au)
> mtrr: detected mtrr type: Intel
> CPU: Before vendor init, caps: 0383fbff 00000000 00000000, vendor = 0
> CPU: L1 I cache: 16K, L1 D cache: 16K
> CPU: L2 cache: 256K
> Intel machine check reporting enabled on CPU#0.
> CPU: After vendor init, caps: 0383fbff 00000000 00000000 00000000
> CPU: After generic, caps: 0383fbff 00000000 00000000 00000000
> CPU: Common caps: 0383fbff 00000000 00000000 00000000
> CPU0: Intel Pentium III (Coppermine) stepping 0a
> per-CPU timeslice cutoff: 730.77 usecs.
> enabled ExtINT on CPU#0
> ESR value before enabling vector: 00000000
> ESR value after enabling vector: 00000000
> Booting processor 1/1 eip 2000
> Initializing CPU#1
> masked ExtINT on CPU#1
> ESR value before enabling vector: 00000000
> ESR value after enabling vector: 00000000
> Calibrating delay loop... 1723.59 BogoMIPS
> CPU: Before vendor init, caps: 0387fbff 00000000 00000000, vendor = 0
> CPU: L1 I cache: 16K, L1 D cache: 16K
> CPU: L2 cache: 256K
> Intel machine check reporting enabled on CPU#1.
> CPU: After vendor init, caps: 0387fbff 00000000 00000000 00000000
> CPU serial number disabled.
> CPU: After generic, caps: 0383fbff 00000000 00000000 00000000
> CPU: Common caps: 0383fbff 00000000 00000000 00000000
> CPU1: Intel Pentium III (Coppermine) stepping 0a
> Total of 2 processors activated (3447.19 BogoMIPS).
> ENABLING IO-APIC IRQs
> ...changing IO-APIC physical APIC ID to 2 ... ok.
> init IO_APIC IRQs
> IO-APIC (apicid-pin) 2-0, 2-16, 2-17, 2-18, 2-19, 2-20, 2-21, 2-22, 2-23 not connected.
> ..TIMER: vector=0x31 pin1=2 pin2=0
> number of MP IRQ sources: 19.
> number of IO-APIC #2 registers: 24.
> testing the IO APIC.......................
>
> IO APIC #2......
> .... register #00: 02000000
> ....... : physical APIC id: 02
> .... register #01: 00178011
> ....... : max redirection entries: 0017
> ....... : IO APIC version: 0011
> WARNING: unexpected IO-APIC, please mail to linux-smp@vger.kernel.org
> .... register #02: 00000000
> ....... : arbitration: 00
> .... IRQ redirection table:
> NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:
> 00 000 00  1    0    0   0   0    0    0    00
> 01 003 03  0    0    0   0   0    1    1    39
> 02 003 03  0    0    0   0   0    1    1    31
> 03 003 03  0    0    0   0   0    1    1    41
> 04 003 03  0    0    0   0   0    1    1    49
> 05 003 03  0    0    0   0   0    1    1    51
> 06 003 03  0    0    0   0   0    1    1    59
> 07 003 03  0    0    0   0   0    1    1    61
> 08 003 03  0    0    0   0   0    1    1    69
> 09 003 03  0    0    0   0   0    1    1    71
> 0a 003 03  1    1    0   1   0    1    1    79
> 0b 003 03  1    1    0   1   0    1    1    81
> 0c 003 03  1    1    0   1   0    1    1    89
> 0d 003 03  0    0    0   0   0    1    1    91
> 0e 003 03  0    0    0   0   0    1    1    99
> 0f 003 03  0    0    0   0   0    1    1    A1
> 10 000 00  1    0    0   0   0    0    0    00
> 11 000 00  1    0    0   0   0    0    0    00
> 12 000 00  1    0    0   0   0    0    0    00
> 13 000 00  1    0    0   0   0    0    0    00
> 14 000 00  1    0    0   0   0    0    0    00
> 15 000 00  1    0    0   0   0    0    0    00
> 16 000 00  1    0    0   0   0    0    0    00
> 17 000 00  1    0    0   0   0    0    0    00
> IRQ to pin mappings:
> IRQ0 -> 0:2
> IRQ1 -> 0:1
> IRQ3 -> 0:3
> IRQ4 -> 0:4
> IRQ5 -> 0:5
> IRQ6 -> 0:6
> IRQ7 -> 0:7
> IRQ8 -> 0:8
> IRQ9 -> 0:9
> IRQ10 -> 0:10
> IRQ11 -> 0:11
> IRQ12 -> 0:12
> IRQ13 -> 0:13
> IRQ14 -> 0:14
> IRQ15 -> 0:15
> .................................... done.
> Using local APIC timer interrupts.
> calibrating APIC timer ...
> ..... CPU clock speed is 864.2437 MHz.
> ..... host bus clock speed is 132.9603 MHz.
> cpu: 0, clocks: 1329603, slice: 443201
> CPU0
> cpu: 1, clocks: 1329603, slice: 443201
> CPU1
> checking TSC synchronization across CPUs: passed.
> mtrr: your CPUs had inconsistent variable MTRR settings
> mtrr: probably your BIOS does not setup all CPUs
> PCI: PCI BIOS revision 2.10 entry at 0xfb3e0, last bus=1
> PCI: Using configuration type 1
> PCI: Probing PCI hardware
> Unknown bridge resource 0: assuming transparent
> Unknown bridge resource 1: assuming transparent
> Unknown bridge resource 2: assuming transparent
> PCI: Using IRQ router VIA [1106/0686] at 00:07.0
> PCI->APIC IRQ transform: (B0,I6,P0) -> 12
> PCI->APIC IRQ transform: (B0,I7,P3) -> 12
> PCI->APIC IRQ transform: (B0,I7,P3) -> 12
> PCI->APIC IRQ transform: (B0,I13,P0) -> 10
> PCI->APIC IRQ transform: (B0,I14,P0) -> 11
> PCI: Enabling Via external APIC routing
> isapnp: Scanning for PnP cards...
> isapnp: No Plug & Play device found
> Linux NET4.0 for Linux 2.4
> Based upon Swansea University Computer Society NET3.039
> Initializing RT netlink socket
> apm: BIOS version 1.2 Flags 0x07 (Driver version 1.14)
> apm: disabled - APM is not SMP safe.
> mxt_scan_bios: enter
> Starting kswapd v1.8
> allocated 64 pages and 64 bhs reserved for the highmem bounces
> VFS: Diskquotas version dquot_6.5.0 initialized
> Detected PS/2 Mouse Port.
> pty: 2048 Unix98 ptys configured
> Serial driver version 5.05c (2001-07-08) with MANY_PORTS MULTIPORT SHARE_IRQ SERIAL_PCI ISAPNP enabled
> ttyS00 at 0x03f8 (irq = 4) is a 16550A
> Real Time Clock Driver v1.10d
> block: queued sectors max/low 1365629kB/1234557kB, 4032 slots per queue
> RAMDISK driver initialized: 16 RAM disks of 4096K size 1024 blocksize
> Uniform Multi-Platform E-IDE driver Revision: 6.31
> ide: Assuming 33MHz PCI bus speed for PIO modes; override with idebus=xx
> VP_IDE: IDE controller on PCI bus 00 dev 39
> VP_IDE: chipset revision 6
> VP_IDE: not 100% native mode: will probe irqs later
> ide: Assuming 33MHz PCI bus speed for PIO modes; override with idebus=xx
> VP_IDE: VIA vt82c686b (rev 40) IDE UDMA100 controller on pci00:07.1
> ide0: BM-DMA at 0xd400-0xd407, BIOS settings: hda:DMA, hdb:pio
> ide1: BM-DMA at 0xd408-0xd40f, BIOS settings: hdc:DMA, hdd:pio
> hda: QUANTUM FIREBALLlct20 40, ATA DISK drive
> hdc: CDU5211, ATAPI CD/DVD-ROM drive
> ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
> ide1 at 0x170-0x177,0x376 on irq 15
> hda: 78177792 sectors (40027 MB) w/418KiB Cache, CHS=4866/255/63, UDMA(33)
> ide-floppy driver 0.97
> Partition check:
> hda: hda1 hda2 hda3
> FDC 0 is a post-1991 82077
> ide-floppy driver 0.97
> md: md driver 0.90.0 MAX_MD_DEVS=256, MD_SB_DISKS=27
> md: Autodetecting RAID arrays.
> md: autorun ...
> md: ... autorun DONE.
> NET4: Linux TCP/IP 1.0 for NET4.0
> IP Protocols: ICMP, UDP, TCP, IGMP
> IP: routing cache hash table of 16384 buckets, 128Kbytes
> TCP: Hash tables configured (established 524288 bind 65536)
> Linux IP multicast router 0.06 plus PIM-SM
> NET4: Unix domain sockets 1.0/SMP for Linux NET4.0.
> RAMDISK: Compressed image found at block 0
> Freeing initrd memory: 324k freed
> VFS: Mounted root (ext2 filesystem).
> Journalled Block Device driver loaded
> EXT3-fs: INFO: recovery required on readonly filesystem.
> EXT3-fs: write access will be enabled during recovery.
> kjournald starting. Commit interval 5 seconds
> EXT3-fs: recovery complete.
> EXT3-fs: mounted filesystem with ordered data mode.
> Freeing unused kernel memory: 240k freed
> Adding Swap: 2040244k swap-space (priority -1)
> usb.c: registered new driver usbdevfs
> usb.c: registered new driver hub
> usb-uhci.c: $Revision: 1.259 $ time 17:18:11 Sep 6 2001
> usb-uhci.c: High bandwidth mode enabled
> usb-uhci.c: USB UHCI at I/O 0xd800, IRQ 12
> usb-uhci.c: Detected 2 ports
> usb.c: new USB bus registered, assigned bus number 1
> hub.c: USB hub found
> hub.c: 2 ports detected
> usb-uhci.c: USB UHCI at I/O 0xdc00, IRQ 12
> usb-uhci.c: Detected 2 ports
> usb.c: new USB bus registered, assigned bus number 2
> hub.c: USB hub found
> hub.c: 2 ports detected
> usb-uhci.c: v1.251:USB Universal Host Controller Interface driver
> EXT3 FS 2.4-0.9.8, 25 Aug 2001 on ide0(3,2), internal journal
> kjournald starting. Commit interval 5 seconds
> EXT3 FS 2.4-0.9.8, 25 Aug 2001 on ide0(3,1), internal journal
> EXT3-fs: mounted filesystem with ordered data mode.
> parport0: PC-style at 0x378 [PCSPP,EPP]
> parport0: cpp_daisy: aa5500ff(38)
> parport0: assign_addrs: aa5500ff(38)
> parport0: cpp_daisy: aa5500ff(38)
> parport0: assign_addrs: aa5500ff(38)
> parport_pc: Via 686A parallel port: io=0x378
> eepro100.c:v1.09j-t 9/29/99 Donald Becker http://cesdis.gsfc.nasa.gov/linux/drivers/eepro100.html
> eepro100.c: $Revision: 1.36 $ 2000/11/17 Modified by Andrey V. Savochkin and others
> eth0: Intel Corporation 82557 [Ethernet Pro 100], 00:E0:81:20:55:CC, IRQ 10.
> Board assembly 567812-052, Physical connectors present: RJ45
> Primary interface chip i82555 PHY #1.
> General self-test: passed.
> Serial sub-system self-test: passed.
> Internal registers self-test: passed.
> ROM checksum self-test: passed (0x04f4518b).
> eth1: Intel Corporation 82557 [Ethernet Pro 100] (#2), 00:E0:81:20:55:CD, IRQ 11.
> Board assembly 567812-052, Physical connectors present: RJ45
> Primary interface chip i82555 PHY #1.
> General self-test: passed. > Serial sub-system self-test: passed. > Internal registers self-test: passed. > ROM checksum self-test: passed (0x04f4518b). > APIC error on CPU1: 00(02) > APIC error on CPU0: 00(08) > APIC error on CPU0: 08(08) > APIC error on CPU0: 08(08) > APIC error on CPU1: 02(02) > APIC error on CPU1: 02(08)

PLEASE HELP

abby

From raysonlogin at yahoo.com Tue Apr 2 20:07:19 2002
From: raysonlogin at yahoo.com (Rayson Ho)
Date: Tue Mar 16 01:02:17 2010
Subject: Uptime data/studies/anecdotes ... ?
In-Reply-To:
Message-ID: <20020403040719.14849.qmail@web11408.mail.yahoo.com>

--- "Roger L. Smith" wrote:
> We currently run an average of about 75% utilization on our 586
> processor (293 node) cluster. We probably have about one node per
> week crash and hang for various reasons.

The OpenPBS backfilling algorithm is really bad. If you are running parallel jobs, you should use PBS+Maui.

> We have occasional problems with memory leaks or PBS hangups which
> require large scale reboots of the cluster. (Actually, PBS just died
> as I'm typing this, but our pbs heartbeat script should restart it
> automatically in a few minutes). I'd say we have to do a full reboot
> of the cluster about every 3-4 months.

A bigger problem is (or was; I haven't looked at the PBS code since last fall) that in each scheduling cycle the scheduler contacts every MOM in the cluster to get resource information, and if one of the MOMs dies, the scheduler hangs until it times out and restarts. You could try the "Cplant Fault Recovery Patch" and several other patches if you want to stay with PBS.

> For a bunch of PC hardware running a free OS, this seems like a
> pretty good number to me. It's not in the same class as our Sun
> servers (nor even our SGIs!), but then, none of those systems are
> this large, either.
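If the scheduler has to wait for every node's answer before it can finish a cycle, one dead MOM stalls the whole cycle, which matches the hang Rayson describes. A common alternative is to query all nodes concurrently under one shared deadline and treat non-responders as missing for that cycle. The Python sketch below illustrates that idea only; it is not PBS or SGE code, and `poll_nodes` and the `query` callable are hypothetical stand-ins for whatever actually talks to each node daemon.

```python
import concurrent.futures as cf


def poll_nodes(nodes, query, timeout):
    """Ask every node for its resource information in parallel.

    `query(node)` is a hypothetical callable that contacts one node daemon
    and returns its status. Nodes that have not answered within `timeout`
    seconds, or whose query raised an exception, are reported as None so
    the scheduling cycle can continue without them.
    """
    pool = cf.ThreadPoolExecutor(max_workers=max(1, len(nodes)))
    futures = {pool.submit(query, node): node for node in nodes}
    # One shared deadline for the whole cycle, not one timeout per node.
    done, _not_done = cf.wait(futures, timeout=timeout)
    results = {}
    for future, node in futures.items():
        if future in done and future.exception() is None:
            results[node] = future.result()
        else:
            results[node] = None  # dead, hung, or erroring node: skip this cycle
    # Do not block on stragglers; their threads finish (or fail) in the background.
    pool.shutdown(wait=False)
    return results
```

A thread stuck in a hung query keeps running until that call returns, so in practice the per-node query should still carry its own socket timeout; the shared deadline only keeps the scheduler itself from stalling.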
Another problem (at least in OpenPBS 2.3.12) is that some hard limits are defined in the source (for example "#define PBS_ACCT_MAX_RCD 4095" and "#define PBS_NET_MAX_CONNECTIONS 256"), which may not work in large clusters.

If you want something free, you could try SGE. It scales quite nicely (SGE improved a lot in 5.3), it's open source, and it integrates with Maui. I like SGE better than OpenPBS: at least when one (or more?) of your nodes dies, the cluster continues to operate, and SGE even re-runs the job for you. Another feature is the shadow master, which restarts the master daemon on another machine if your master node dies.

I think someone on this list is planning to tell us about his experience with SGE on his Beowulf?

Rayson

P.S. links:
OpenPBS public home: http://www-unix.mcs.anl.gov/openpbs/
SGE : http://gridengine.sunsource.net
Maui : http://www.supercluster.org

From Drake.Diedrich at anu.edu.au Tue Apr 2 23:11:26 2002
From: Drake.Diedrich at anu.edu.au (Drake Diedrich)
Date: Tue Mar 16 01:02:17 2010
Subject: pvm povray help
In-Reply-To: <3.0.6.32.20020321113924.00846be0@arrhenius.chem.kuleuven.ac.be>
References: <3.0.6.32.20020321113924.00846be0@arrhenius.chem.kuleuven.ac.be>
Message-ID: <20020403171126.B26086@duh.anu.edu.au>

On Thu, Mar 21, 2002 at 11:39:24AM +0100, Luc Vereecken wrote:
> >a very large project.
>
> That shouldn't be a very large project at all. Read the inputfile

The very large part would be in broadcasting the parsed object tree, so as to limit the serial overhead of parsing to just one node, rather than duplicate that effort on all nodes.
From opengeometry at yahoo.ca Tue Apr 2 23:31:52 2002
From: opengeometry at yahoo.ca (William Park)
Date: Tue Mar 16 01:02:17 2010
Subject: Hyperthreading in P4 Xeon (question)
Message-ID: <20020403023152.A2972@node0.opengeometry.ca>

What is the realistic effect of "hyperthreading" in P4 Xeon? I'm not versed in the latest CPU trends. Does it mean that dual-P4Xeon will behave like 4-way SMP?

--
William Park, Open Geometry Consulting,
8 CPU cluster, NAS, (Slackware) Linux, Python, LaTeX, Vim, Mutt, Tin

From didonato at bigpond.net.au Wed Apr 3 00:35:03 2002
From: didonato at bigpond.net.au (Christian Di Donato)
Date: Tue Mar 16 01:02:17 2010
Subject: Hyperthreading in P4 Xeon (question)
In-Reply-To: <20020403023152.A2972@node0.opengeometry.ca>
Message-ID: <000301c1daea$76b06a40$99ca8490@claptop>

There is a white paper on the Xeon processor covering Hyper-Threading on the Intel site:
http://www.intel.com/eBusiness/products/server/processor/xeon/wp020901_sum.htm

-----Original Message-----
From: beowulf-admin@beowulf.org [mailto:beowulf-admin@beowulf.org] On Behalf Of William Park
Sent: Wednesday, 3 April 2002 5:32 PM
To: beowulf@beowulf.org
Subject: Hyperthreading in P4 Xeon (question)

What is the realistic effect of "hyperthreading" in P4 Xeon? I'm not versed in the latest CPU trends. Does it mean that dual-P4Xeon will behave like 4-way SMP?
--
William Park, Open Geometry Consulting,
8 CPU cluster, NAS, (Slackware) Linux, Python, LaTeX, Vim, Mutt, Tin
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From Luc.Vereecken at chem.kuleuven.ac.be Wed Apr 3 03:32:39 2002
From: Luc.Vereecken at chem.kuleuven.ac.be (Luc Vereecken)
Date: Tue Mar 16 01:02:17 2010
Subject: pvm povray help
In-Reply-To: <20020403171126.B26086@duh.anu.edu.au>
References: <3.0.6.32.20020321113924.00846be0@arrhenius.chem.kuleuven.ac.be> <3.0.6.32.20020321113924.00846be0@arrhenius.chem.kuleuven.ac.be>
Message-ID: <3.0.6.32.20020403133239.008b8720@arrhenius.chem.kuleuven.ac.be>

At 17:11 3/04/02 +1000, Drake Diedrich wrote:
>On Thu, Mar 21, 2002 at 11:39:24AM +0100, Luc Vereecken wrote:
>> >a very large project.
>>
>> That shouldn't be a very large project at all. Read the inputfile
>
> The very large part would be in broadcasting the parsed object tree, so
>as to limit the serial overhead of parsing to just one node, rather than
>duplicate that effort on all nodes.

Would that duplication avoidance gain you anything?

Current case (IIRC; I haven't used pvm povray recently): every node reads the input file (possibly from an inefficient NFS-mounted volume) and parses it.

New case 1: read the input files on the master, broadcast these N bytes, parse for Q seconds on all nodes.
User gain: no need to have the input file on all nodes.
Developer gain: easy to implement.

New case 2: read the input file on the master, parse for Q' seconds on the master node, broadcast M bytes for the parsed object tree.
User gain: no need to have the input file on all nodes.

In the second case, you have NODES-1 nodes doing nothing, but you might not be able to do anything with that free time anyway (e.g. they are already allocated to that job), especially since parsing is fairly short compared to rendering.
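Luc's two new cases can be written as a small walltime model. As a sketch, assume the broadcast time is simply bytes divided by an effective bandwidth B (ignoring latency and the shape of the broadcast tree), and that rendering costs the same in both cases, so it drops out. Case 1 then costs N/B + Q (broadcast the input, then every node parses in parallel), and case 2 costs Q' + M/B (the master parses, then broadcasts the tree). The function and parameter names are illustrative only.

```python
def setup_walltime(case, n_bytes, m_bytes, q, q_prime, bandwidth):
    """Walltime (seconds) before rendering can start, under a simple model.

    case 1: broadcast the N-byte input files, then all nodes parse for Q s.
    case 2: the master parses for Q' s, then broadcasts the M-byte tree.
    Broadcast cost is modelled as bytes / bandwidth (bytes per second);
    latency and broadcast-tree depth are ignored.
    """
    if case == 1:
        return n_bytes / bandwidth + q
    if case == 2:
        return q_prime + m_bytes / bandwidth
    raise ValueError("case must be 1 or 2")


def faster_case(n_bytes, m_bytes, q, q_prime, bandwidth):
    """Return 1 or 2, whichever case starts rendering sooner (ties go to 1)."""
    t1 = setup_walltime(1, n_bytes, m_bytes, q, q_prime, bandwidth)
    t2 = setup_walltime(2, n_bytes, m_bytes, q, q_prime, bandwidth)
    return 1 if t1 <= t2 else 2
```

With identical nodes (Q = Q') the outcome depends only on whether M < N. With made-up numbers N = 1 MB, M = 2 MB, Q = Q' = 5 s and B = 1 MB/s, case 1 takes 6 s and case 2 takes 7 s; if the master parses in 3 s instead (Q' < Q), case 2 drops to 5 s and wins even though M > N.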
Assuming identical nodes, the walltime of the parsing is the same everywhere (Q = Q'), and duplicating that effort doesn't require extra walltime. (So it is irrelevant unless you're charged for CPU time used, or you run multiple jobs per processor, e.g. on SMP, so that the idle time can be reclaimed.) In that case, it comes down to whether the parsed object tree (M bytes) is larger or smaller than the text input files and other required files (N bytes): if M > N, it takes longer to broadcast the parsed tree; if N > M, it is quicker to broadcast the parsed tree.

If the master node is faster than the others, its parsing time might be shorter than that of the slowest of the other nodes (Q' < Q), and then, even with M > N, it might be faster to distribute the parsed tree rather than the input files.

The basic question is therefore: how large (typically) is the parsed tree compared to the original input file? Standard povray include files can be assumed to be predistributed, as they should be installed on each node together with the executable. To be honest, I have no idea about this ratio.

Luc

From didonato at bigpond.net.au Wed Apr 3 04:29:25 2002
From: didonato at bigpond.net.au (Christian Di Donato)
Date: Tue Mar 16 01:02:17 2010
Subject: Testing
Message-ID: <000401c1db0b$3438b7a0$99ca8490@claptop>

Can someone just reply to this list and confirm that they are indeed receiving this? Only one person needs to reply. I'm getting e-mails bouncing back every time I try to send something to beowulf@beowulf.org

Thanks in advance, and kind regards,

Christian Di Donato

From walke at usna.edu Wed Apr 3 04:46:46 2002
From: walke at usna.edu (LT V. H. Walke)
Date: Tue Mar 16 01:02:17 2010
Subject: Testing
In-Reply-To: <000401c1db0b$3438b7a0$99ca8490@claptop>
References: <000401c1db0b$3438b7a0$99ca8490@claptop>
Message-ID: <1017838087.30683.1.camel@vhwalke.mathsci.usna.edu>

I read you loud and clear.
Vann

--
----------------------------------------------------------------------
Vann H. Walke                     Office: Chauvenet 341
Computer Science Dept.            Ph: 410-293-6811
572 Holloway Road, Stop 9F        Fax: 410-293-2686
United States Naval Academy       email: walke@usna.edu
Annapolis, MD 21402-5002          http://www.cs.usna.edu/~walke
----------------------------------------------------------------------

From Daniel.Kidger at quadrics.com Wed Apr 3 05:56:28 2002
From: Daniel.Kidger at quadrics.com (Daniel Kidger)
Date: Tue Mar 16 01:02:17 2010
Subject: Testing
Message-ID: <010C86D15E4D1247B9A5DD312B7F5AA74D2D37@stegosaurus.bristol.quadrics.com>

Christian Di Donato [mailto:didonato@bigpond.net.au] wrote:

>Can someone just reply to this list and confirm that they are indeed
>receiving this. Only one person needs to reply. I'm getting e-mails
>bouncing back every time I try to send something to beowulf@beowulf.org

So why can't that someone reply just to you rather than to the whole list?

And, more importantly, how can anyone know that they are the said 'one person'!

Yours,
Daniel.

--------------------------------------------------------------
Dr. Dan Kidger, Quadrics Ltd.
daniel.kidger@quadrics.com
One Bridewell St., Bristol, BS1 2AA, UK         0117 915 5505
----------------------- www.quadrics.com --------------------

From math at velocet.ca Wed Apr 3 06:13:23 2002
From: math at velocet.ca (Velocet)
Date: Tue Mar 16 01:02:17 2010
Subject: Testing
In-Reply-To: <010C86D15E4D1247B9A5DD312B7F5AA74D2D37@stegosaurus.bristol.quadrics.com>; from Daniel.Kidger@quadrics.com on Wed, Apr 03, 2002 at 02:56:28PM +0100
References: <010C86D15E4D1247B9A5DD312B7F5AA74D2D37@stegosaurus.bristol.quadrics.com>
Message-ID: <20020403091323.J69845@velocet.ca>

On Wed, Apr 03, 2002 at 02:56:28PM +0100, Daniel Kidger's all...
>
> Christian Di Donato [mailto:didonato@bigpond.net.au] wrote:
>
> >Can someone just reply to this list and confirm that they are indeed
> >receiving this. Only one person needs to reply. I'm getting e-mails
> >bouncing back every time I try to send something to beowulf@beowulf.org
>
> So why cant that someone reply just to you rather than the whole list?
>
> and more importantly
> - how can anyone know that they are the said 'one person' !

Because when he asks for 'only one person', there's an implicit semaphore called in the operation. Didn't you heed it? Now look what you've done! :)

This would all be funnier if it were still Apr 1.

/kc

--
Ken Chase, math@velocet.ca * Velocet Communications Inc.
* Toronto, CANADA

From timm at fnal.gov Wed Apr 3 06:27:42 2002
From: timm at fnal.gov (Steven Timm)
Date: Tue Mar 16 01:02:17 2010
Subject: Hyperthreading in P4 Xeon (question)
In-Reply-To: <20020403023152.A2972@node0.opengeometry.ca>
Message-ID:

I have one such test machine that we are evaluating at the moment. It's a dual-CPU machine, but under Linux it shows up as having four CPUs. I haven't actually tried yet to see whether it really can run four loads just as well... the specimen we have has DDR SDRAM and already gets bogged down running two processes at once.

Steve Timm

------------------------------------------------------------------
Steven C. Timm (630) 840-8525 timm@fnal.gov http://home.fnal.gov/~timm/
Fermilab Computing Division/Operating Systems Support
Scientific Computing Support Group--Computing Farms Operations

From hahn at physics.mcmaster.ca Wed Apr 3 07:50:06 2002
From: hahn at physics.mcmaster.ca (Mark Hahn)
Date: Tue Mar 16 01:02:17 2010
Subject: Hyperthreading in P4 Xeon (question)
In-Reply-To: <20020403023152.A2972@node0.opengeometry.ca>
Message-ID:

> What is the realistic effect of "hyperthreading" in P4 Xeon? I'm not
> versed in the latest CPU trends. Does it mean that dual-P4Xeon will
> behave like 4-way SMP?

for some value of "behave like" ;)
that is, it will definitely NOT get twice as fast.
but it will appear to have 4 CPUs, and can run 4 threads/procs at once
(for values of "once" > 1 clock cycle ;)

we did a quick test on a dual-prestonia here, and saw a ~5% speedup
on a probably cache-friendly, compute-bound task.

From jurgen at botz.org Wed Apr 3 10:25:31 2002
From: jurgen at botz.org (Jurgen Botz)
Date: Tue Mar 16 01:02:17 2010
Subject: Linux Software RAID5 Performance
In-Reply-To: Message from mprinkey@aeolusresearch.com (Michael Prinkey) of "Sun, 31 Mar 2002 14:33:59 EST."
Message-ID: <18878.1017858331@localhost>

Michael Prinkey wrote:
> Again, performance (see below) is remarkably good, especially considering
> all of the strikes against this configuration: EIDE instead of SCSI, UDMA66
> instead of 100/133, 5400-RPM instead of 7200-RPM, and master/slave drives on
> each port instead of a single drive per port.

With regard to the master/slave config... I note that your performance test uses a single reader/writer. In this config with RAID5, I would expect the performance to be quite good even with 2 drives per IDE controller. But if you have several processes doing disk I/O simultaneously, you should see a rather more precipitous drop in performance than you would with a single drive per IDE controller. I'm testing a very similar config right now, and that's one of my findings (which I had expected), but our application for this is not very performance-sensitive, so it's not a big deal.

A more important issue for me is reliability, and I'm somewhat concerned about failure modes. For example, can an IDE drive fail in such a way that it disables the controller or the other drive on the same controller? If so, that would seriously limit the usefulness of RAID5 in this config. In general, how good is Linux software RAID's failure handling? Etc.

:j

--
Jürgen Botz       | While differing widely in the various
jurgen@botz.org   | little bits we know, in our infinite
                  | ignorance we are all equal.
                  |                          -Karl Popper

From ron_chen_123 at yahoo.com Wed Apr 3 11:02:19 2002
From: ron_chen_123 at yahoo.com (Ron Chen)
Date: Tue Mar 16 01:02:18 2010
Subject: Fwd: FreeBSD port of SGE
Message-ID: <20020403190219.24496.qmail@web14701.mail.yahoo.com>

FreeBSD hackers and Beowulf users,

I am porting SGE (software for compute farms, or so-called batch systems) to the *BSDs, and I am wondering if someone can take over some of the ports.

I just started porting the code to the *BSDs. Currently, I can get the code compiled on the *BSDs with "#ifdef BSD"s. I am starting on the system-specific part, mainly getting the load, CPU count, and things like that. I am not done yet, but I just want to tell you that it is getting there :-)

Someone else has also started an SGE port to FreeBSD (which means duplicated work), so if you are interested, or if you want to be the maintainer of the ports (currently we have FreeBSD, NetBSD, OpenBSD, and Darwin/Mac OS X), please contact me.

More info: gridengine.sunsource.net

Thanks,
-Ron

--- I wrote:
> Status of the port(s):
>
> - compiled on FreeBSD, NetBSD, OpenBSD.
> - coding routines to get the load:
>   load: getloadavg(3), kvm_getloadavg(3)
>   #cpu: sysctl(3) hw.ncpu
>   mem : sysctl(3) vm.stats_vm.*
>   proc info: kvm_getprocs(3)
>
> -Ron
>
> --- Andy Schwierskott
> wrote:
> > Ron,
> >
> > > OK, I think I should write a porting-HOWTO.
> > >
> > > Once I am done, can you also include in the
> > > "HowTo" page?
> >
> > Of course, we (and certainly many developers) would
> > be more than happy to add such a page;-)
> >
> > Andy

From hahn at physics.mcmaster.ca Wed Apr 3 11:16:28 2002
From: hahn at physics.mcmaster.ca (Mark Hahn)
Date: Tue Mar 16 01:02:18 2010
Subject: Hyperthreading in P4 Xeon (question)
In-Reply-To:
Message-ID:

> I can amplify that point. A commercial CFD application ran significantly
> slower using 4 threads vs 2 on a dual Prestonia system. Anything memory
> limited will probably behave the same way.

well, it's an interesting issue. afaict, the benefit of HT depends on the degree to which your app leaves resources idle. for instance, if everything you run is thrashing your dram bandwidth (big arrays, perhaps), then forget HT - it doesn't add extra dimms! similarly, if the CPU has just one fsqrt unit, and that's your bottleneck, HT doesn't add more units. there are other resource nonlinearities, like cache hit rate - the same effect that gives rise to superlinear SMP speedup will slaughter some apps run on HT...

but if there's other work to be done while one thread is spinning sqrt's, i.e., there are idle resources, then a thread that uses them will show an HT profit...

in some sense, HT works precisely when the system's resources *don't* match the optimal set your app wants. I wonder if/when Intel will start pouring in hordes of extra functional units, since another 50M transistors will only improve the cache hit rate a little bit... of course, it's also true that HT makes bigger TLBs and more associative caches attractive...

From garcia_garcia_adrian at hotmail.com Wed Apr 3 11:14:06 2002
From: garcia_garcia_adrian at hotmail.com (Adrian Garcia Garcia)
Date: Tue Mar 16 01:02:18 2010
Subject: DHCP Help Again
Message-ID:

An HTML attachment was scrubbed...
URL: http://www.scyld.com/pipermail/beowulf/attachments/20020403/4e0c70c2/attachment.html

From crhea at mayo.edu Wed Apr 3 13:04:12 2002
From: crhea at mayo.edu (Cris Rhea)
Date: Tue Mar 16 01:02:18 2010
Subject: How do you keep clusters running....
Message-ID: <200204032104.PAA23347@sijer.mayo.edu>

What are folks doing about keeping hardware running on large clusters?

Right now, I'm running 10 Racksaver RS-1200's (for a total of 20 nodes)...

Sure seems like every week or two, I notice dead fans (each RS-1200 has 6 case fans in addition to the 2 CPU fans and 2 power supply fans).

My last fan failure was a CPU fan that toasted the CPU and motherboard.

How are folks with significantly more nodes than mine dealing with constant maintenance on their nodes? Do you have whole spare nodes sitting around, ready to be installed if something fails, or do you have a pile of spare parts? Did you get the vendor (if you purchased prebuilt systems) to supply a stockpile of warranty parts?

One of the problems I'm facing is that every time something croaks, Racksaver is very good about replacing it under warranty, but getting the new parts delivered usually takes several days.

For some things like fans, they sent extras for me to keep on hand.

For my last fan/CPU/motherboard failure, the node pair will be down ~5 days waiting for parts.

Comments? Thoughts? Ideas?

Thanks-

--- Cris

----
Cristopher J. Rhea                     Mayo Foundation
Research Computing Facility             Pavilion 2-25
crhea@Mayo.EDU                        Rochester, MN 55905
Fax: (507) 266-4486                     (507) 284-0587

From fraser5 at cox.net Wed Apr 3 13:37:56 2002
From: fraser5 at cox.net (Jim Fraser)
Date: Tue Mar 16 01:02:18 2010
Subject: How do you keep clusters running....
In-Reply-To: <200204032104.PAA23347@sijer.mayo.edu>
Message-ID: <000901c1db57$d222ac90$0300005a@papabear>

Sounds to me like you have a heat problem. Dual ultra-thins generally run pretty hot. Good luck with it. There is just no room for any serious air to move through that case. The fan diameter is so small that the fans need ridiculous RPMs to move the required volume, which makes them noisy and prone to failure; add the high heat to that and you accelerate the MTBF to tomorrow. Most fans fail quickly in high-heat conditions.
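To put Cris's fan numbers in perspective, here is a back-of-the-envelope calculation. It assumes independent failures at a constant rate (the exponential model behind a quoted MTBF figure, which real fans follow only roughly, and less well when hot). Taking his description at face value, 10 RS-1200 chassis with 6 case, 2 CPU and 2 power-supply fans each give 100 fans; the 50,000-hour MTBF is the figure Jim Lux mentions in this thread. The Poisson estimate of how many spares to keep, and all function names, are illustrative additions rather than anything the posters proposed.

```python
import math

HOURS_PER_WEEK = 7 * 24


def expected_failures_per_week(n_fans, mtbf_hours):
    """Expected failures per week for n_fans with a constant failure rate."""
    return n_fans * HOURS_PER_WEEK / mtbf_hours


def implied_mtbf_hours(n_fans, failures_per_week):
    """Per-fan MTBF implied by an observed fleet-wide failure rate."""
    return n_fans * HOURS_PER_WEEK / failures_per_week


def spares_needed(failures_per_week, turnaround_days, max_shortfall_prob):
    """Smallest spare count s with P(more than s units out for repair) <= target.

    Units out for repair at a random moment are modelled as Poisson with
    mean = failure rate * replacement turnaround.
    """
    mean = failures_per_week * turnaround_days / 7.0
    s = 0
    term = math.exp(-mean)  # P(X = 0)
    cumulative = term       # P(X <= s)
    while 1.0 - cumulative > max_shortfall_prob:
        s += 1
        term *= mean / s    # P(X = s) from P(X = s - 1)
        cumulative += term
    return s
```

Under this model, 100 fans at 50,000 hours each would still lose about 0.336 per week, i.e. one roughly every three weeks. Cris's observed rate of one every one to two weeks implies a per-fan MTBF of only about 16,800 to 33,600 hours. If roughly one node-level failure per week has to wait about 5 days for parts, `spares_needed(1.0, 5, 0.05)` returns 2.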
I think the basic rack design concept, while rugged and strong, is fundamentally flawed and overpriced. I would invest in a serious rack fan that moves major air out of that case somehow.

Good luck with it.

jim

From alvin at Maggie.Linux-Consulting.com Wed Apr 3 13:44:14 2002
From: alvin at Maggie.Linux-Consulting.com (alvin@Maggie.Linux-Consulting.com)
Date: Tue Mar 16 01:02:18 2010
Subject: How do you keep clusters running....
In-Reply-To: <200204032104.PAA23347@sijer.mayo.edu>
Message-ID:

hi ya

buy better quality fans...
	- we use $15.00 fans (40x40x10mm), the kind used in 1U chassis
	- you can get fans as cheap as $4.00, but is a dead $1,000 server
	  worth the cost difference of cheap fans???
	  (not the place to save $$$)
	- similarly, get better quality (cooler running) power supplies too

fans should NOT die... at least not more than once a year...

c ya
alvin
http://www.linux-1U.net ... 11" deep 1U chassis w/ amd 1700+

From nordwall at pnl.gov Wed Apr 3 14:46:31 2002
From: nordwall at pnl.gov (Doug J Nordwall)
Date: Tue Mar 16 01:02:18 2010
Subject: How do you keep clusters running....
In-Reply-To: <200204032104.PAA23347@sijer.mayo.edu>
References: <200204032104.PAA23347@sijer.mayo.edu>
Message-ID: <1017873992.2054.42.camel@duke>

On Wed, 2002-04-03 at 13:04, Cris Rhea wrote:
> What are folks doing about keeping hardware running on large clusters?
>
> Right now, I'm running 10 Racksaver RS-1200's (for a total of 20 nodes)...
>
> Sure seems like every week or two, I notice dead fans (each RS-1200
> has 6 case fans in addition to the 2 CPU fans and 2 power supply fans).

Are you running lm_sensors on your nodes? That's a handy tool for paying attention to things like that. We use ours in combination with Ganglia and pump the data to a web page and to Big Brother to see when a CPU might be getting hot, or a fan might be too slow. We actually saved a dozen machines that way... we have 32 4-processor Racksaver boxes in a rack, and the rack was not designed to handle Racksaver's fan system. That is to say, there was a solid sidewall on the rack, and it kept in heat. I set up lm_sensors on all the nodes (they're homogeneous, so I configured one and pushed it out to all), then pumped the data into Ganglia (ganglia.sourceforge.net) and then to a web page. I noticed that the temperature on a dozen of the machines was extremely high, so I took off the side panel of the rack. The temperature dropped by 15 C on all the nodes, and everything was within normal parameters again.
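At its core, Doug's setup (lm_sensors readings fed to Ganglia and Big Brother) compares each node's readings against limits. The Python sketch below parses text in the general `label: value unit` style that the `sensors` command prints and flags hot temperatures and slow fans. Labels and formatting vary with the lm_sensors version and chip driver, so the regular expression, the sample output used to exercise it, and the 60 C / 3000 RPM default limits are assumptions to adapt, not values from the thread.

```python
import re

# Matches lines such as "temp1:  +45.0°C  (high = +60.0°C)" or
# "fan1:  4500 RPM  (min = 3000 RPM)". Only the first reading after the
# colon is used; voltage lines ("V") and adapter/header lines are skipped.
READING = re.compile(r"^\s*([^:]+?):\s*\+?(-?\d+(?:\.\d+)?)\s*(°?C|RPM)\b")


def check_sensors(text, max_temp_c=60.0, min_fan_rpm=3000.0):
    """Return a list of warning strings for hot temperatures and slow fans.

    `text` is assumed to look like `sensors` output; the default limits are
    illustrative and must be adapted to the actual hardware.
    """
    warnings = []
    for line in text.splitlines():
        match = READING.match(line)
        if not match:
            continue
        label, value, unit = match.group(1), float(match.group(2)), match.group(3)
        if unit == "RPM" and value < min_fan_rpm:
            warnings.append(f"{label}: fan at {value:.0f} RPM (< {min_fan_rpm:.0f})")
        elif unit.endswith("C") and value > max_temp_c:
            warnings.append(f"{label}: {value:.1f} C (> {max_temp_c:.1f})")
    return warnings
```

Doug also notes that a failing fan tends to show up as a slowly rising temperature; comparing each reading against that node's own recent history, rather than only against a fixed limit, would catch such a trend earlier.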
> My last fan failure was a CPU fan that toasted the CPU and motherboard.

Yeah, we would have seen this on ours earlier... excellent tool.

> How are folks with significantly more nodes than mine dealing with constant
> maintenance on their nodes? Do you have whole spare nodes sitting around-
> ready to be installed if something fails, or do you have a pile of
> spare parts?

No, we don't actually, but we've talked about it.

> Did you get the vendor (if you purchased prebuilt systems)
> to supply a stockpile of warranty parts?

We use Racksaver as well, so our experience is similar. We should probably talk to our people about getting some spare nodes.

> One of the problems I'm facing is that every time something croaks,
> Racksaver is very good about replacing it under warranty, but getting
> the new parts delivered usually takes several days.

Yeah... this is another area where just monitoring the data can be helpful... if a fan is failing, you can see it coming (the temperature slowly rises), so you can order the part beforehand and schedule downtime.

--
Douglas J Nordwall             http://rex.nmhu.edu/~musashi
System Administrator           Pacific Northwest National Labs

From tim.carlson at pnl.gov Wed Apr 3 14:59:56 2002
From: tim.carlson at pnl.gov (Tim Carlson)
Date: Tue Mar 16 01:02:18 2010
Subject: How do you keep clusters running....
In-Reply-To: <000901c1db57$d222ac90$0300005a@papabear>
Message-ID:

On Wed, 3 Apr 2002, Jim Fraser wrote:

> Sounds to me like you have a heat problem.
> dual ultra thin's generally run
> pretty hot.

If you are putting these boxes in a rack and are not using the Racksaver rack, you need to take the side off your rack (assuming you can do that). We've got 32 of these in a rack (4 CPUs per 1U), and they were running really hot until we took the side panel off. Five minutes later, the CPU temperatures had dropped 10 C.

Tim

Tim Carlson
Voice: (509) 376 3423
Email: Tim.Carlson@pnl.gov
EMSL UNIX System Support

From opengeometry at yahoo.ca Wed Apr 3 12:32:27 2002
From: opengeometry at yahoo.ca (William Park)
Date: Tue Mar 16 01:02:18 2010
Subject: Hyperthreading in P4 Xeon (question)
In-Reply-To: ; from hahn@physics.mcmaster.ca on Wed, Apr 03, 2002 at 10:50:06AM -0500
References: <20020403023152.A2972@node0.opengeometry.ca>
Message-ID: <20020403153227.A15201@node0.opengeometry.ca>

On Wed, Apr 03, 2002 at 10:50:06AM -0500, Mark Hahn wrote:
> > What is the realistic effect of "hyperthreading" in P4 Xeon? I'm not
> > versed in the latest CPU trends. Does it mean that dual-P4Xeon will
> > behave like 4-way SMP?
>
> for some value of "behave like" ;)
> that is, it will definitely NOT get twice as fast. but it will appear
> to have 4 CPUs, and can run 4 threads/procs at once (for values of
> "once" > 1 clock cycle ;)
>
> we did a quick test on a dual-prestonia here, and saw a ~5% speedup
> on a probably cache-friendly, compute-bound task.

Hi Mark, Steve, and Michael,

Can you try compiling your kernel, using:

    make clean; time make bzImage modules >& j1
    make clean; time make -j2 bzImage modules >& j2
    make clean; time make -j4 bzImage modules >& j4

--
William Park, Open Geometry Consulting,
8 CPU cluster, NAS, (Slackware) Linux, Python, LaTeX, Vim, Mutt, Tin

From James.P.Lux at jpl.nasa.gov Wed Apr 3 17:15:58 2002
From: James.P.Lux at jpl.nasa.gov (Jim Lux)
Date: Tue Mar 16 01:02:18 2010
Subject: How do you keep clusters running....
In-Reply-To: <200204032104.PAA23347@sijer.mayo.edu>
Message-ID: <5.1.0.14.2.20020403170033.0248fec0@mail1.jpl.nasa.gov>

You know, fans shouldn't fail...

There are fans available with 50,000-hour MTBFs. Sure, they cost a bit more than $5, but given the cost of the time to replace them (especially if you cook something), they might be a good investment.

You might cannibalize one of your failed fans to check the number and kind of bearings. I have heard that some "ball bearing" fans actually have sleeve bearings, a sure recipe for short life. It's not unheard of for fans to be mislabelled. Bear in mind that most fans have two bearings (one at each end of the shaft), and it is entirely possible to build a fan with one sleeve and one ball bearing.

At 03:04 PM 4/3/2002 -0600, Cris Rhea wrote:
>What are folks doing about keeping hardware running on large clusters?
>
>Right now, I'm running 10 Racksaver RS-1200's (for a total of 20 nodes)...
>
>Sure seems like every week or two, I notice dead fans (each RS-1200
>has 6 case fans in addition to the 2 CPU fans and 2 power supply fans).

Jim Lux
Spacecraft Telecommunications Equipment Section
Jet Propulsion Laboratory
4800 Oak Grove Road, Mail Stop 161-213
Pasadena CA 91109
818/354-2075, fax 818/393-6875

From emiller at techskills.com Wed Apr 3 18:12:46 2002
From: emiller at techskills.com (Eric Miller)
Date: Tue Mar 16 01:02:18 2010
Subject: Node boot disk to designate eth0
In-Reply-To: <20020403040719.14849.qmail@web11408.mail.yahoo.com>
Message-ID:

Is there a switch I can pass to the node floppy routine that will cause the node to boot using a designated Ethernet adapter? I have one onboard 10 Mb adapter and a PCI 100 Mb adapter (eth0), but the node tries to connect through the onboard eth1. I cannot disable the onboard adapter in the BIOS (Compaq :( ), so I need to pass a parameter at boot time to use eth0. Can this be done?
From becker at scyld.com Wed Apr 3 19:37:23 2002 From: becker at scyld.com (Donald Becker) Date: Tue Mar 16 01:02:18 2010 Subject: Node boot disk to designate eth0 In-Reply-To: Message-ID: On Wed, 3 Apr 2002, Eric Miller wrote: > Subject: Node boot disk to designate eth0 > > is there a switch I can pass to the node floppy routine that will cause the > node to boot using a designated ethernet adapter? I have one onboard 10mb > adapter and a PCI 100 mb adapter (eth0), but the node tries to connect > throught the onboard eth1. I cannot disable the onboard adapter in BIOS > (compaq :( ), so I need pass a pararmeter at boot time to use eth0. Can > this be done? The Scyld system tries all interfaces (using RARP) to find a master. That allows the system to work with all network topologies. To avoid finding the master on eth1, just don't connect that interface to a master. -- Donald Becker becker@scyld.com Scyld Computing Corporation http://www.scyld.com 410 Severn Ave. Suite 210 Second Generation Beowulf Clusters Annapolis MD 21403 410-990-9993 From emiller at techskills.com Wed Apr 3 19:50:21 2002 From: emiller at techskills.com (Eric Miller) Date: Tue Mar 16 01:02:18 2010 Subject: Fw: Node boot disk to designate eth0 Message-ID: <005e01c1db8b$da3a8590$c31fa6ac@xp> ----- Original Message ----- From: "Eric Miller" To: "Donald Becker" Sent: Wednesday, April 03, 2002 10:48 PM Subject: Re: Node boot disk to designate eth0 > > ----- Original Message ----- > From: "Donald Becker" > To: "Eric Miller" > Cc: > Sent: Wednesday, April 03, 2002 10:37 PM > Subject: Re: Node boot disk to designate eth0 > > > > On Wed, 3 Apr 2002, Eric Miller wrote: > > > > > Subject: Node boot disk to designate eth0 > > > > > > is there a switch I can pass to the node floppy routine that will cause > the > > > node to boot using a designated ethernet adapter? I have one onboard > 10mb > > > adapter and a PCI 100 mb adapter (eth0), but the node tries to connect > > > throught the onboard eth1.
I cannot disable the onboard adapter in BIOS > > > (compaq :( ), so I need pass a pararmeter at boot time to use eth0. Can > > > this be done? > > > > The Scyld system tries all interfaces (using RARP) to find a master. > > That allows the system to work with all network topologies. > > > > To avoid using finding the master on eth1, just don't connect that > > interface to a master. > > It's odd, I can see that the drivers for both interfaces are being loaded, > and they are both the correct drivers. It does not seem to be looking on > both interfaces, however. It clearly is looking on only eth1, as it > specifies it line by line during the RARP requests. I've tried all the > obvious, different NIC, different board, etc. > > Thanks, I guess I'll leave it alone. > > > > -- > > Donald Becker becker@scyld.com > > Scyld Computing Corporation http://www.scyld.com > > 410 Severn Ave. Suite 210 Second Generation Beowulf Clusters > > Annapolis MD 21403 410-990-9993 > > > From leandro at ep.petrobras.com.br Thu Apr 4 05:12:54 2002 From: leandro at ep.petrobras.com.br (Leandro Tavares Carneiro) Date: Tue Mar 16 01:02:18 2010 Subject: How do you keep clusters running.... In-Reply-To: <200204032104.PAA23347@sijer.mayo.edu> References: <200204032104.PAA23347@sijer.mayo.edu> Message-ID: <1017925975.30189.87.camel@linux60> We have a Beowulf cluster here with 64 production nodes and 128 processors, and we have problems like yours with fans. Our cluster hardware is very cheap, using motherboards and cases found easily in the local market, and the problem is critical. We have 5 spare nodes, and only 3 of them are ready to work. All our production nodes and the 3 spare nodes that are ready to start are dual PIII 1GHz; the other 2 spare nodes are dual PIII 800MHz, but these processors are Slot 1 (SECC2), and we have one node down because we can't find coolers for it!
The cooler vendors say they are no longer producing SECC2 coolers, and I am studying how to adapt other fans to those coolers... this is sad but true. We have a lot of problems with memory, hard disks and other parts. Three months ago our cluster nodes had one PIII 500 MHz per node, and after the upgrade to dual 1GHz we now have lots of memory and spare disks. I think this kind of problem is inevitable with cheap PC parts, and can be reduced with high-quality (and high-price) parts. We are doing a study to buy a new cluster for another application, and we called Compaq and IBM to see what they have in hardware and software, in the hope of a future with fewer problems... Regards, and sorry about my poor English; I am Brazilian and speak Portuguese... On Wed, 2002-04-03 at 18:04, Cris Rhea wrote: > > What are folks doing about keeping hardware running on large clusters? > > Right now, I'm running 10 Racksaver RS-1200's (for a total of 20 nodes)... > > Sure seems like every week or two, I notice dead fans (each RS-1200 > has 6 case fans in addition to the 2 CPU fans and 2 power supply fans). > > My last fan failure was a CPU fan that toasted the CPU and motherboard. > > How are folks with significantly more nodes than mine dealing with constant > maintenance on their nodes? Do you have whole spare nodes sitting around- > ready to be installed if something fails, or do you have a pile of > spare parts? Did you get the vendor (if you purchased prebuilt systems) > to supply a stockpile of warranty parts? > > One of the problems I'm facing is that every time something croaks, > Racksaver is very good about replacing it under warranty, but getting > the new parts delivered usually takes several days. > > For some things like fans, they sent extras for me to keep on-hand. > > For my last fan/CPU/motherboard failure, the node pair will be > down ~5 days waiting for parts. > > Comments? Thoughts? Ideas? > > Thanks- > > --- Cris > > > > ---- > Cristopher J.
Rhea Mayo Foundation > Research Computing Facility Pavilion 2-25 > crhea@Mayo.EDU Rochester, MN 55905 > Fax: (507) 266-4486 (507) 284-0587 > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Leandro Tavares Carneiro Support Analyst EP-CORP/TIDT/INFI Phone: 2534-1427 From rgb at phy.duke.edu Thu Apr 4 06:52:47 2002 From: rgb at phy.duke.edu (Robert G. Brown) Date: Tue Mar 16 01:02:18 2010 Subject: DHCP Help Again In-Reply-To: Message-ID: On Wed, 3 Apr 2002, Adrian Garcia Garcia wrote: For one thing don't use the range statement -- it tells dhcpd the range of IP numbers to assign to UNKNOWN ethernet numbers. You are statically assigning an IP number in your "free" range to a particular host with a KNOWN ethernet number below. I don't know what dhcpd would do in that case -- something sensible one would hope but then, maybe not. The range statement is really there so you can dynamically allocate addresses from the range to hosts you may never have seen before that you don't care to ever address by name (as they might well get a different IP number on the next boot). DHCP servers run by ISP's not infrequently use the range feature to conserve IP numbers -- they only need enough to cover the greatest number of connections they are likely to have at any one time, not one IP number per host that might ever connect. Departments might use it to give IP numbers to laptops brought in by visitors (with the extra benefit that they can assign a subnet block that isn't "trusted" by the usual department servers and/or is firewalled from the outside by an ip-forwarding/masquerading host). You want "only" static IP's in your cluster, as you'd like nodo1 to be the same machine and IP address every time. Be a bit careful about your use of domain names. As it happens, I don't find cluster.org registered yet (amazingly enough!)
but it is pretty easy to pick one that does exist in nameservice in the outside world. In that case you'll run a serious risk of routing or name resolution problems depending on things like the search order you use in /etc/nsswitch.conf. Even my previous example of rgb.private.net is a bit risky. You should run a nameserver (cache only is fine) on your 192.168.1.1 server, presuming it lives on an external network and you care to resolve global names. Similarly you may want:

    option routers 192.168.1.1;

if you want internal hosts to be able to get out through your (presumed gateway) server. Finally, if you want nodo1 to come up knowing its own name without hardwiring it in on the node itself, add

    option host-name "nodo1";

to its definition. I admit that I do tend to lay out my dhcpd.conf a bit differently than you have it below but I don't think that the differences are particularly significant, and you have a copy of the one I use anyway if you want to play with the pieces. You should find a log trace of dhcpd's activities in /var/log/messages, which should help with any further debugging. On your nodo1 host, make sure that:

    cat /etc/sysconfig/network-scripts/ifcfg-eth0
    DEVICE=eth0
    BOOTPROTO=dhcp
    ONBOOT=yes

and

    cat /etc/sysconfig/network
    NETWORKING=yes
    HOSTNAME=nodo1

and that in /etc/modules.conf there is something like:

    cat /etc/modules.conf
    alias parport_lowlevel parport_pc
    alias eth0 tulip

(or instead of tulip, whatever your network module is). If you then boot your e.g. RH client it SHOULD just come up, automatically try to start the network on device eth0 using dhcp as its protocol for obtaining an IP number, ask the dhcp server for an address and a route, and just "work" when they come back. Hope this helps.
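For more than a handful of nodes, writing host blocks by hand gets error-prone. A minimal Python sketch that emits a static-only subnet following the advice in this message: no range statement, option routers pointing at the gateway, and a quoted option host-name for each node. The function name, the placement of host blocks inside the subnet, the duplicate checks, and the placeholder domain cluster.example are assumptions, not part of rgb's setup; check the output against the dhcpd.conf man page for your dhcpd version:

```python
import ipaddress


def dhcpd_static_subnet(subnet, gateway, domain, hosts):
    """Render a dhcpd.conf subnet block containing only fixed-address hosts.

    There is deliberately no 'range' statement, so dhcpd hands out no
    dynamic leases and unknown MAC addresses get no address at all.
    `hosts` is a list of (name, mac, ip) tuples.
    """
    net = ipaddress.ip_network(subnet)
    seen_ips, seen_macs = set(), set()
    out = [
        f"subnet {net.network_address} netmask {net.netmask} {{",
        f"  option routers {gateway};",
        f"  option subnet-mask {net.netmask};",
        f"  option broadcast-address {net.broadcast_address};",
        f'  option domain-name "{domain}";',
    ]
    for name, mac, ip in hosts:
        addr, mac = ipaddress.ip_address(ip), mac.lower()
        if addr not in net:
            raise ValueError(f"{name}: {ip} is not in {net}")
        if addr in seen_ips or mac in seen_macs:
            raise ValueError(f"{name}: duplicate IP or MAC address")
        seen_ips.add(addr)
        seen_macs.add(mac)
        out += [
            f"  host {name} {{",
            f"    hardware ethernet {mac};",
            f"    fixed-address {ip};",
            f'    option host-name "{name}";',  # dhcpd expects a quoted string
            "  }",
        ]
    out.append("}")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # nodo1's MAC and address come from Adrian's message; the domain is a
    # placeholder (.example is reserved, so it cannot collide with a real one).
    print(dhcpd_static_subnet("192.168.1.0/24", "192.168.1.1", "cluster.example",
                              [("nodo1", "00:60:97:a1:ef:e0", "192.168.1.2")]), end="")
```

The generated block would replace the subnet declaration in /etc/dhcpd.conf; after restarting dhcpd, /var/log/messages shows whether the node's request was answered.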
rgb

> server-name "server.cluster.org"
>
> subnet 192.168.1.0 netmask 255.255.255.0
> {
> range 192.168.1.2 192.168.1.10 #my client has the ip 192.168.1.2
> #and my server the static ip 192.168.1.1
> option subnet-mask 255.255.255.0;
> option broadcast-address 192.168.1.255;
> option domain-name-server 192.168.1.1;
> option domain-name "cluster.org";
>
> host nodo1.cluster.org
> {
> hardware ethernet 00:60:97:a1:ef:e0; #here is the address of the client's card
> fixed-address 192.168.1.2;
> }
> }
>
> And finally some files on my server.
>
> NETWORK
> ------------------------------------------
> networking = yes
> hostname =server.cluster.org
> gatewaydev = eth0
> gatewaye=
> ------------------------------------------
>
> HOSTS ( In my server and in the client I have the same on this file )
> ------------------------------------------
> 127.0.0.1 localhost
> 192.168.1.1 server.cluster.org
> 192.168.1.2 nodo1.cluster.org
>
>
> Ok thats the information, I am a little confuse, could you help me please
> =). I can't detect the mistake, I dont know if is the server or some card
> =s. Thanks for all.
>
> ________________________________________________________________________________
> Get your FREE download of MSN Explorer at http://explorer.msn.com.
> _______________________________________________ Beowulf mailing list,
> Beowulf@beowulf.org To change your subscription (digest mode or
> unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
>
-- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C.
27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From jayne at sphynx.clara.co.uk Thu Apr 4 10:49:03 2002 From: jayne at sphynx.clara.co.uk (Jayne Heger) Date: Tue Mar 16 01:02:18 2010 Subject: commercial parallel libraries Message-ID: Hi, I know this is a beowulf list, but I could do with getting some info on any (if there are) commercial parallel libraries, the equivalent of pvm and mpi. Do any of you know the names of any? Thanks. Jayne From gropp at mcs.anl.gov Thu Apr 4 11:01:59 2002 From: gropp at mcs.anl.gov (William Gropp) Date: Tue Mar 16 01:02:18 2010 Subject: commercial parallel libraries In-Reply-To: Message-ID: <5.1.0.14.2.20020404125850.0197fb88@localhost> At 06:49 PM 4/4/2002 +0000, Jayne Heger wrote: >Hi, > >I know this is a beowulf list, but I could do with getting some info on any >(if there are) commercial parallel libraries, the equivalent of pvm and mpi. > >Do any of you know the names of any? MPI is a standard for which there are both freely available and commercial implementations. Bill From Daniel.Kidger at quadrics.com Thu Apr 4 11:25:29 2002 From: Daniel.Kidger at quadrics.com (Daniel Kidger) Date: Tue Mar 16 01:02:18 2010 Subject: commercial parallel libraries Message-ID: <010C86D15E4D1247B9A5DD312B7F5AA74D2D43@stegosaurus.bristol.quadrics.com> -----Original Message----- William Gropp [mailto:gropp@mcs.anl.gov] wrote: >At 06:49 PM 4/4/2002 +0000, Jayne Heger wrote: > >>Hi, >> >>I know this is a beowulf list, but I could do with getting some info on any >>(if there are) commercial parallel libraries, the equivalent of pvm and mpi. >> >>Do any of you know the names of any? > >MPI is a standard for which there are both freely available and commercial >implementations. or do you mean something that is 'the equivalent of mpi and pvm' but which isn't pvm or mpi (like perhaps ARMCI)? Yours, Daniel. -------------------------------------------------------------- Dr. Dan Kidger, Quadrics Ltd. 
daniel.kidger@quadrics.com One Bridewell St., Bristol, BS1 2AA, UK 0117 915 5505 ----------------------- www.quadrics.com -------------------- From Kim.Branson at csiro.au Thu Apr 4 07:38:50 2002 From: Kim.Branson at csiro.au (Kim Branson) Date: Tue Mar 16 01:02:18 2010 Subject: node problems Message-ID: <1017934730.20621.38.camel@paracelsus> Hi all i have a 64node athlon cluster, at the moment i have about 19 nodes that are flaky, they stay up for a bit and then fall over. one can still ping them but not telnet or ftp. I'm trying to keep as many up as possible (more nodes means i can get the final calculations done for my phd thesis faster....) this may be an unrelated problem but i see errors in the logs about telnet

node01 telnetd[16941]: ttloop: peer died: EOF
xinetd[17099]: warning: can't get client address: Connection reset by peer
Apr 5 00:32:21 node01 rlogind[17099]: Can't get peer name of remote host: Transport endpoint is not connected
Apr 5 00:32:21 node01 rshd[17098]: getpeername: Transport endpoint is not connected
Apr 5 00:32:21 node01 ftpd[17097]: getpeername (in.ftpd): Transport endpoint is not connected
Apr 5 00:32:31 node01 rlogind[17100]: Can't get peer name of remote host: Transport endpoint is not connected
Apr 5 00:32:31 node01 xinetd[17101]: warning: can't get client address: Connection reset by peer
Apr 5 00:32:31 node01 xinetd[17102]: warning: can't get client address: Connection reset by peer
Apr 5 00:32:31 node01 xinetd[17103]: warning: can't get client address: Connection reset by peer
Apr 5 00:32:31 node01 ftpd[17101]: getpeername (in.ftpd): Transport endpoint is not connected

i am using enfuzion to do job dispatch and collect. by looking at the packets i see the enfuzion director on the head node attempts to send a UDP packet to the node. all udp ports on the nodes are blocked; i checked this by scanning a node with nmap. older installs of redhat (i.e my workstation) seem to have udp ports enabled.
regardless of the ttloop error the machine appears to work for a while, i.e. enfuzion logs in, jobs run, etc., until suddenly it all stops. the machines remain up and can be pinged, but no other services (rsh, ssh, etc.) run. If i connect a monitor and keyboard to the node it is also unresponsive. this is a problem across many nodes. has anyone who uses enfuzion seen this error with nodes that are a rh7.1 install? On one node i have seen on 2 occasions

CPU 0: Machine Check Exception: 0000000000000004
Bank 2: d40040000000017a at 540040000000017a

decoding this using a utility i found on the net:

Status: (4) Machine Check in progress.
Restart IP invalid.
parsebank(2): f60020000000017a @ 760020000000017a
External tag parity error
Correctable ECC error
MISC register information valid
Memory heirarchy error
Request: Generic error
Transaction type : Generic
Memory/IO : I/O

can anyone tell me what the Restart IP invalid means? is this a dead cpu or a memory problem causing a mce? cheers Kim -- ______________________________________________________________________ Kim Branson Phd Student Structural Biology CSIRO Health Sciences and Nutrition Walter and Eliza Hall Institute Royal Parade, Parkville, Melbourne, Victoria Ph 61 03 9662 7136 Email kbranson@wehi.edu.au ______________________________________________________________________ From juari at provinet.com.br Thu Apr 4 14:20:47 2002 From: juari at provinet.com.br (JOELMIR RITTER MULLER ) Date: Tue Mar 16 01:02:18 2010 Subject: very high bandwidth, low latency manner? Message-ID: <200204041920.AA26476752@provinet.com.br> What is the best means of interconnecting several microcomputers in a very high bandwidth, low latency manner? Does anyone have some ideas about this subject? Cheers, Juari R. Müller From James.P.Lux at jpl.nasa.gov Thu Apr 4 16:05:30 2002 From: James.P.Lux at jpl.nasa.gov (Jim Lux) Date: Tue Mar 16 01:02:18 2010 Subject: very high bandwidth, low latency manner?
In-Reply-To: <200204041920.AA26476752@provinet.com.br> Message-ID: <5.1.0.14.2.20020404160051.00aff7a0@mail1.jpl.nasa.gov> What's high bandwidth? What's low latency? How much money do you want to spend? Ethernet is cheap, $100-$200/node for 100 Mbps or GBE (by the time you get switches, cables, adapters, etc.) Latency is kind of slow (compared to dedicated point to point links) At 07:20 PM 4/4/2002 -0300, JOELMIR RITTER MULLER wrote: >what the best mean of interconnecting several microcomputers >in a very high bandwidth, low latency manner? >does anyone have some ideas about this subject? > >Cheers, >Juari R. Müller > > >_______________________________________________ >Beowulf mailing list, Beowulf@beowulf.org >To change your subscription (digest mode or unsubscribe) visit >http://www.beowulf.org/mailman/listinfo/beowulf Jim Lux Spacecraft Telecommunications Equipment Section Jet Propulsion Laboratory 4800 Oak Grove Road, Mail Stop 161-213 Pasadena CA 91109 818/354-2075, fax 818/393-6875 From aby_sinha at yahoo.com Thu Apr 4 16:43:01 2002 From: aby_sinha at yahoo.com (Abhishek sinha) Date: Tue Mar 16 01:02:18 2010 Subject: console redirect issue References: <3CA244E0.5060602@yahoo.com> Message-ID: <3CACF315.2080704@yahoo.com> Hi list, Some time back I posted this problem on the list, and now that I have solved it I want to share my experience. I had console redirection enabled on a Tyan 2505 T BIOS, and every time I booted, the system went straight into the BIOS. On the other side I was using HyperTerminal (customer requirement). I checked the console redirection with an older version of HyperTerminal (Windows 2000) and found it to be working; that is, the system was not going into the BIOS every time. But when I used HyperTerminal version 5, which comes with Windows 2000 Professional, the system went into the BIOS every time it booted, without any key being pressed. So much for Microsoft technology: the newer version doesn't work and the older version does.
Finally we resorted to using CRT in Windows to do the console redirect, and it worked fine. I tried to convince the customer to use minicom, since we were selling Linux-based servers and knew it would work, but to no avail. Being a tech, I was amazed at what we can do with Linux since we have the code open. A lot of the time you don't realise it until you get an application that doesn't run and you can't do anything about it. It's ridiculous that the older version of HyperTerminal works and the newer one shows strange problems... Abhishek Sinha California Digital Abhishek sinha wrote: > hi list > > > This might be just out of the topic, but i couldnt find help anywhere. > I am using serial console redirect on the 2505 t Tyan board. now i am > getting strange things that i have never seen before. When i connect > the machines with the null modem cable , the machine (where the > console redirect is enabled ) goes into the BIOS. If u save and exit > again it goes into the BIOS without doing anything. When u disconnect > the cable then this does not happen . I tried using a cross over rj45 > cable. With this i cannot see the POST messages and i can only see the > messages when the kernel boots. Is this an issue with the BIOS or some > one has been in wonderland and seen this issue . > > Please advise > abhisek > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > From raysonlogin at yahoo.com Thu Apr 4 17:43:34 2002 From: raysonlogin at yahoo.com (Rayson Ho) Date: Tue Mar 16 01:02:18 2010 Subject: very high bandwidth, low latency manner? In-Reply-To: <5.1.0.14.2.20020404160051.00aff7a0@mail1.jpl.nasa.gov> Message-ID: <20020405014334.82902.qmail@web11402.mail.yahoo.com> You may consider Myrinet, VIA, SCI...
(don't have the money to try each of those, so I can't tell you which is the best ;-( ) http://grappew2k.imag.fr/evalRezo.html (just found this benchmark on the Net) Rayson --- Jim Lux wrote: > What's high bandwidth? > What's low latency? > How much money do you want to spend? > > Ethernet is cheap, $100-$200/node for 100 Mbps or GBE (by the time > you get > switches, cables, adapters, etc.) > Latency is kind of slow (compared to dedicated point to point links) > > > > > > > At 07:20 PM 4/4/2002 -0300, JOELMIR RITTER MULLER wrote: > > >what the best mean of interconnecting several microcomputers > >in a very high bandwidth, low latency manner? > >does anyone have some ideas about this subject? > > > >Cheers, > >Juari R. Müller > > > > > >_______________________________________________ > >Beowulf mailing list, Beowulf@beowulf.org > >To change your subscription (digest mode or unsubscribe) visit > >http://www.beowulf.org/mailman/listinfo/beowulf > > Jim Lux > Spacecraft Telecommunications Equipment Section > Jet Propulsion Laboratory > 4800 Oak Grove Road, Mail Stop 161-213 > Pasadena CA 91109 > > 818/354-2075, fax 818/393-6875 > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf __________________________________________________ Do You Yahoo!? Yahoo! Tax Center - online filing with TurboTax http://taxes.yahoo.com/ From ron_chen_123 at yahoo.com Thu Apr 4 20:23:51 2002 From: ron_chen_123 at yahoo.com (Ron Chen) Date: Tue Mar 16 01:02:18 2010 Subject: FreeBSD port of SGE (Compute farm system) Message-ID: <20020405042351.86759.qmail@web14706.mail.yahoo.com> Hi, I compiled the source, changed a few parameters, and SGE finally runs on FreeBSD. It is running in single-user mode, with only 1 host. I am doing a little clean up, and then I will need to make sure my changes do not affect others (by "#ifdef BSD").
It does not get the correct system information yet, but some of the job accounting info is there (at least run time is correct 8-) ). It has now been running for several hours and looks stable; it has run several tens of jobs. "qstat", "qhost", "qacct", "qconf" and "qdel" look fine, and their output makes sense (but I still need to implement the resource info collecting routines). I will post the patches tomorrow, together with some output of the commands. (I will be busy today) Also, I will move the discussion from the hackers list to the cluster@freebsd list. -Ron __________________________________________________ Do You Yahoo!? Yahoo! Tax Center - online filing with TurboTax http://taxes.yahoo.com/ From Karl.Bellve at umassmed.edu Fri Apr 5 07:08:53 2002 From: Karl.Bellve at umassmed.edu (Karl Bellve) Date: Tue Mar 16 01:02:18 2010 Subject: ISSPL Message-ID: <3CADBE05.1A608277@umassmed.edu> Is there an AMD or Intel optimized version of the ISSPL libraries? We have an application that I ported from a CSPI array processor to a Beowulf system, and it uses ISSPL. Right now, the ISSPL library I am using is just straight C code and doesn't contain any optimization for the Intel/AMD platform. Or is it better to switch to another library, like the Intel Math Kernel Library, or perhaps just use FFTW? It would be simpler if I could just find a standard ISSPL library for Intel/AMD. -- Cheers, Karl Bellve, Ph.D. ICQ # 13956200 Biomedical Imaging Group TLCA# 7938 University of Massachusetts Email: Karl.Bellve@umassmed.edu Phone: (508) 856-6514 Fax: (508) 856-1840 PGP Public key: finger kdb@molmed.umassmed.edu From math at velocet.ca Fri Apr 5 09:59:56 2002 From: math at velocet.ca (Velocet) Date: Tue Mar 16 01:02:18 2010 Subject: How do you keep clusters running....
In-Reply-To: <1017925975.30189.87.camel@linux60>; from leandro@ep.petrobras.com.br on Thu, Apr 04, 2002 at 10:12:54AM -0300 References: <200204032104.PAA23347@sijer.mayo.edu> <1017925975.30189.87.camel@linux60> Message-ID: <20020405125956.D69845@velocet.ca> On Thu, Apr 04, 2002 at 10:12:54AM -0300, Leandro Tavares Carneiro's all... > We have here an beowulf cluster with 64 production nodes and 128 > processors, and we have some problems like you, about fans. > Here, our cluster hardware is very cheap, using motherboards and cases > founds easily in the local market, and the problems is critical. > We have 5 spare nodes, and only 3 of that are ready to work. All our [..] > I think this kind of problem is inevitable with cheap PC parts, and can > be lower with high-quality (and price) parts. We are making an study to > by a new cluster, for another application and we call Compaq and IBM to > see what they have in hardware and software, with the hope of a future > with less problems... You can always employ the 'maximum tolerable failure rate' concept and buy for that rate. In pricing equipment, I find there is a definite non-linear (exponential?) relationship between MTBF and price: for a failure rate that's 3-5 times higher you can spend up to 40% less (or better) on equipment. This isn't a solid number, but it feels within the ballpark to me based on what I've priced out before on clusters. Others may dispute this, but I am talking about buying Dell 2U rackmount servers pre-assembled vs a bunch of boards and CPUs and RAM you slap together yourself. Using this concept, and setting your maximum tolerable failure rate at a specific level that suits your needs (e.g. 1 node per month), coupled with an aggressive RMA schedule with a good vendor, you can get the best price/performance out of a cluster. If you can withstand, using my example, a 3-5 times higher failure rate, which ends up being 1 node per month, you end up with 40% more gear.
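The expected-loss figures Ken gives in the following paragraphs can be checked with a small model. A minimal Python sketch, assuming node failures arrive as a Poisson process, a failure kills every job using that node and the job restarts from scratch (no checkpointing), and repair time is ignored. Under those assumptions a job of length T facing failure rate λ takes (e^{λT} − 1)/λ of wall time on average, so the useful fraction is λT/(e^{λT} − 1); for short jobs this is about 1 − λT/2, which is Ken's "lose half a job per failure" rule of thumb. The function names are illustrative, and the 1.4x gear and failure rates are Ken's ballpark figures, not measurements:

```python
import math


def efficiency(failures_per_month: float, job_days: float,
               share_of_cluster: float = 1.0, days_per_month: float = 30.0) -> float:
    """Fraction of node time that ends up as completed work.

    Assumptions: node failures form a Poisson process at the given
    cluster-wide rate; a job spread over `share_of_cluster` of the nodes
    sees that share of the failures; any failure kills the job, which
    restarts from scratch; repair time is ignored.  The expected wall time
    for a job of length T is then (exp(lam*T) - 1) / lam.
    """
    lam_t = failures_per_month * share_of_cluster / days_per_month * job_days
    if lam_t == 0:
        return 1.0
    return lam_t / math.expm1(lam_t)  # expm1(x) = exp(x) - 1, accurate for small x


def effective_capacity(relative_gear: float, failures_per_month: float,
                       job_days: float, share_of_cluster: float = 1.0) -> float:
    """Useful capacity relative to a baseline cluster of size 1.0."""
    return relative_gear * efficiency(failures_per_month, job_days, share_of_cluster)


if __name__ == "__main__":
    # HA gear: ~1 failure per 5 months.  Cheaper gear: ~1.4x the nodes for
    # the same money, ~1 failure per month.  Jobs span the whole cluster.
    for job_days in (1, 30):
        ha = effective_capacity(1.0, 0.2, job_days)
        cheap = effective_capacity(1.4, 1.0, job_days)
        print(f"{job_days:2d}-day jobs: HA {ha:.2f}, cheap {cheap:.2f}")
```

For one-day jobs the cheaper cluster comes out well ahead (about 1.38 vs 1.00). For month-long jobs without checkpointing the ranking flips (about 0.81 vs 0.90), which is the case Ken warns about next.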
If you require 100% of all nodes present to be in one mesh involved in parallel calculations, and a single node failure is catastrophic to the entire job running since startup, then it's obviously not worth it if your jobs run about as long as the mean time between failures (1 month). A failure rate of 1 node/5 months would work far better in that case, as the average failure would lose you only 10% of the work you do in 5 months, whereas with 40% more equipment and 5x the failure rate you may lose most of your work. (Note I am not considering that your jobs may run in [1 month / 1.4] instead due to the speedup from more gear - which will cause jobs to run in ~70% of the time (~3 weeks) - and therefore have a higher success rate in finishing in the 1 node/mo MTBF environment.) However, if your jobs run on all nodes for only a day, then a failure of a single node once per month costs you, on average, half a day of lost work per month. For this concession you get 40% more equipment (possibly meaning 40% more processing power, depending on your application). You also need to factor in how much personal time you have to deal with RMAing and swapping equipment. This may well make any efforts towards this kind of model impossible if extra time is not available. That notwithstanding, the cost of extra time can be easily factored into the equation (and knowledgeable work-study undergrads can be a REALLY cheap alternative here :) Of course with 40% more power, you may configure two sub-clusters, each with 70% of the power of the original HA design (HA = high availability ~ higher price). If this fits your needs, a failure of a single node once per month, with average jobs a day in length, will cost you the equivalent of a quarter-day of total possible work. The more you isolate sections of the cluster from each other, the less you will lose when a failure occurs. If you can manually segment your jobs to run one per node and still achieve near 100% (or more?)
of possible capacity vs a more parallelized system, then a single node failure is inconsequential. Considering the amount and types of failures discussed here, there is obviously no guarantee that a certain type of cluster setup will save you from having massive problems. Being able to plan for downtime and manage the costs associated with it is also obviously part of the design and operation of the cluster. It's a seesaw-type of balance - if you want more nodes for less money, be prepared to spend more time fixing them. Of course with any cluster, more nodes of any type will logically translate into more down/service time - so there will probably be a non-linear translation of amount of work when comparing fewer HA nodes vs more cheaper nodes. Of course by this logic, buying fewer bigger nodes would also result in less work. At some point this becomes too expensive because you're buying big Suns that are very expensive per GFLOPS (unless of course, it suits your needs best...). Another problem with this whole situation that makes it even more complex is that many cluster installations are subject to strange pricing/operation cost models. Various parts may actually lie outside your budget responsibility:

One time costs:
 - design costs (on paper)
 - equipment purchase
 - equipment construction/installation
 - equipment configuration
 - software installation & configuration

Long term/ongoing:
 - software maintenance/reconfiguration
 - upkeep/repair
 - equipment upgrades
 - power costs
 - cooling costs

There are probably subcategories these could be split into as well. The issue here is that, say in a university, power and cooling may be paid for by the university as well as manual labour for upkeep and repair. If that is the case, then getting very power-inefficient but fast CPUs may work well (AMD Thunderbirds, e.g. :).
If you have to pay for your own power and cooling and manual labour, then you may well just opt for spending more on cheaper gear (Athlon XPs) - and at that point may as well go for HA gear as well (depending on the cost model) to save expensive manual labour (at commercial rates >$50/hr you can quickly rack up a node's cost in a day of work). We have successfully employed the non-HA equipment design in building one of our clusters - and in fact there are added advantages. We have observed that most (for various values of 'most' - 50% to 80%?) failures occur within the first month of usage. Once you start swapping out bad nodes, you have a falling rate of failure (though the age of components slowly catches up over a long time period - things with moving parts, such as fans, especially). With all problems taken together (swapping over NFS included, as these are diskless nodes) we have about 1 node crash/fail in some way every 2 months. Of course, since jobs can be checkpointed, and a single node failing doesn't take down the whole cluster (as jobs are run on subsets of nodes), not much work is lost overall. For the increased throughput from more nodes for the money, and including about 15 minutes of work per month physically messing with the machines that's directly related to hardware problems and crashes (i.e. unrelated to the time spent maintaining the cluster as per normal operations), it's been an overall win on that particular cluster. (We have not had to RMA any equipment since the start of the 2nd month of operation - under our current service agreement, an RMA would take 1-3 days and about 20-30 min of labour, and in the meantime would not significantly impact the cluster's performance.) As always, designing your cluster for your own needs and limitations is the biggest win on price/performance.
Limitations to this are having very wide ranges of needs and not having any idea of what capabilities will be required in the future, along with expensive losses when there's downtime and expensive manual labour to get things working again. Barring these kinds of considerations, commodity equipment with a failure rate that you can deal with can net noticeable gains, and budgeting a planned failure cost based on that rate will save you from surprises. No matter what kind of cluster you build, you WILL have failures, and designing to mitigate their impact as far as possible is obviously good planning.

/kc

> On Wed, 2002-04-03 at 18:04, Cris Rhea wrote:
> >
> > What are folks doing about keeping hardware running on large clusters?
> >
> > Right now, I'm running 10 Racksaver RS-1200's (for a total of 20 nodes)...
> >
> > Sure seems like every week or two, I notice dead fans (each RS-1200
> > has 6 case fans in addition to the 2 CPU fans and 2 power supply fans).
> >
> > My last fan failure was a CPU fan that toasted the CPU and motherboard.
> >
> > How are folks with significantly more nodes than mine dealing with constant
> > maintenance on their nodes? Do you have whole spare nodes sitting around-
> > ready to be installed if something fails, or do you have a pile of
> > spare parts? Did you get the vendor (if you purchased prebuilt systems)
> > to supply a stockpile of warranty parts?
> >
> > One of the problems I'm facing is that every time something croaks,
> > Racksaver is very good about replacing it under warranty, but getting
> > the new parts delivered usually takes several days.
> >
> > For some things like fans, they sent extras for me to keep on-hand.
> >
> > For my last fan/CPU/motherboard failure, the node pair will be
> > down ~5 days waiting for parts.
> >
> > Comments? Thoughts? Ideas?
> >
> > Thanks-
> >
> > --- Cris
> >
> >
> >
> > ----
> > Cristopher J. Rhea                     Mayo Foundation
> > Research Computing Facility              Pavilion 2-25
> > crhea@Mayo.EDU                        Rochester, MN 55905
> > Fax: (507) 266-4486                     (507) 284-0587
> --
> Leandro Tavares Carneiro
> Support Analyst
> EP-CORP/TIDT/INFI
> Phone: 2534-1427

--
Ken Chase, math@velocet.ca  *  Velocet Communications Inc.  *  Toronto, CANADA

From cblack at EraGen.com Tue Apr 2 12:09:34 2002
From: cblack at EraGen.com (Chris Black)
Date: Tue Mar 16 01:02:18 2010
Subject: Lost cycles due to PBS (was Re: Uptime data/studies/anecdotes)
In-Reply-To: <"from roger"@ERC.MsState.Edu>
References: <200204021824.g32IOMa14409@mycroft.ahpcrc.org>
Message-ID: <20020402140934.A29446@getafix.EraGen.com>

On Tue, Apr 02, 2002 at 12:46:07PM -0600, Roger L. Smith wrote:
> On Tue, 2 Apr 2002, Richard Walsh wrote:
[stuff deleted]
> PBS is our leading cause of cycle loss. We now run a cron job on the
> headnode that checks every 15 minutes to see if the PBS daemons have died,
> and if so, it automatically restarts them. About 75% of the time that I
> have a node fail to accept jobs, it is because its pbs_mom has died, not
> because there is anything wrong with the node.
>

We used to have the same problem with PBS, especially when many jobs were in the queue. At that point sometimes the pbs master died as well. Since we've switched to SGE/GridEngine/CODINE I've been MUCH happier. Plus there are lots of nifty things you can do with the expandability of writing your own load monitors via shell scripts and such.
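Roger's approach, quoted above (a cron job that checks every 15 minutes whether the PBS daemons have died and restarts them), might look roughly like the Python sketch below. It is an assumption-laden illustration, not Roger's actual script: it checks only the machine it runs on, finds daemons by name through `/proc/<pid>/comm` (Linux 2.6.33 and later), and `RESTART_CMD` is a hypothetical init-script path. It only reports by default and restarts only when given `--restart`.

```python
"""Sketch of a cron-driven watchdog that notices dead batch-system daemons.

Assumptions (not from the post): daemons are found by name via
/proc/<pid>/comm, and RESTART_CMD is a hypothetical init-script path to
adapt to your PBS installation.
"""
import subprocess
import sys
from pathlib import Path

WATCHED = ("pbs_mom",)                      # daemons expected on this host
RESTART_CMD = ["/etc/init.d/pbs", "start"]  # hypothetical; adjust locally


def running_process_names(proc_root: Path = Path("/proc")) -> set[str]:
    """Command names of all processes currently visible under proc_root."""
    names = set()
    for comm in proc_root.glob("[0-9]*/comm"):
        try:
            names.add(comm.read_text().strip())
        except OSError:  # the process exited while we were scanning
            continue
    return names


def dead_daemons(expected, running: set[str]) -> list[str]:
    """Names from `expected` that do not appear among the running processes."""
    return [name for name in expected if name not in running]


def main(argv: list[str]) -> None:
    dead = dead_daemons(WATCHED, running_process_names())
    for name in dead:
        print(f"{name} is not running")
    # Dry run by default; only restart when explicitly asked to.
    if dead and "--restart" in argv:
        subprocess.run(RESTART_CMD, check=False)


if __name__ == "__main__":
    main(sys.argv[1:])
```

A crontab entry such as `*/15 * * * * python3 /usr/local/sbin/pbs_watchdog.py --restart` (path hypothetical) would reproduce the 15-minute schedule; cron mails any printed output to the job's owner.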
The whole point of this post is: GNQS < PBS < Sun Gridengine :)

Chris (who tried two other batch schedulers before settling on SGE)

From tekka99 at libero.it Tue Apr 2 13:29:55 2002
From: tekka99 at libero.it (Gianluca Cecchi)
Date: Tue Mar 16 01:02:18 2010
Subject: Linux Software RAID5 Performance
References:
Message-ID: <006b01c1da8d$89365dd0$44e01d97@emea.cpqcorp.net>

Which option did you use for the ext3 journal mechanism? It makes a difference, especially when using "writeback" vs the default "ordered" (see the excerpt from the ext3 "Changes" file below).
Which I/O benchmark did you use?
Thanks,
Gianluca Cecchi

New mount options:

  "mount -o journal=update"
      Mounts a filesystem with a Version 1 journal, upgrading the
      journal dynamically to Version 2.

  "mount -o data=journal"
      Journals all data and metadata, so data is written twice. This
      is the mode which all prior versions of ext3 used.

  "mount -o data=ordered"
      Only journals metadata changes, but data updates are flushed to
      disk before any transactions commit. Data writes are not atomic
      but this mode still guarantees that after a crash, files will
      never contain stale data blocks from old files.

  "mount -o data=writeback"
      Only journals metadata changes, and data updates are entirely
      left to the normal "sync" process. After a crash, files may
      contain stale data blocks from old files: this mode is exactly
      equivalent to running ext2 with a very fast fsck on reboot.

Ordered and Writeback data modes require a Version 2 journal: if you do
not update the journal format then only the Journaled data mode will be
allowed.

The default data mode is Journaled for a V1 journal, and Ordered for V2.
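The journal-version rules in the excerpt above are easy to misread, so here they are encoded as a small Python function. This is only an illustration of the rules as that 2002 changelog states them (journal versions and defaults), not of current kernel behaviour; the function name is hypothetical.

```python
"""Encode the ext3 data-mode rules quoted from the 2002 ext3 "Changes" file."""
from typing import Optional

DATA_MODES = ("journal", "ordered", "writeback")


def effective_data_mode(journal_version: int, requested: Optional[str] = None) -> str:
    """Return the data= mode in effect, raising ValueError if it is not allowed."""
    if journal_version not in (1, 2):
        raise ValueError(f"unknown journal version: {journal_version}")
    if requested is None:
        # Defaults: Journaled for a V1 journal, Ordered for V2.
        return "journal" if journal_version == 1 else "ordered"
    if requested not in DATA_MODES:
        raise ValueError(f"unknown data mode: {requested!r}")
    if requested != "journal" and journal_version == 1:
        # Ordered and writeback modes require a Version 2 journal.
        raise ValueError(f"data={requested} needs a Version 2 journal (mount -o journal=update)")
    return requested


print(effective_data_mode(2))               # ordered
print(effective_data_mode(1))               # journal
print(effective_data_mode(2, "writeback"))  # writeback
```

In these terms only `data=journal` writes file data twice; `ordered` and `writeback` journal metadata only and differ in whether data blocks are flushed before the metadata transaction commits.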
----- Original Message -----
From: "Michael Prinkey"
To:
Sent: Sunday, March 31, 2002 9:33 PM
Subject: Linux Software RAID5 Performance

> Some time ago, a thread discussed the relative performance and stability
> merits of different RAID solutions. At that time, I gave some results for
> 640-GB arrays that I had built using EIDE drives and Software RAID5. I just
> recently constructed and installed a 1.0-TB array and had some performance
> numbers to share for it as well. They are interesting for two reasons:
> First, the filesystem in use is ext3, rather than ext2. Second, the read
> performance is significantly better (almost 2x) than that of the 640-GB
> units.
>
> The system uses 11 120-GB Maxtor 5400-RPM drives, two Promise Ultra66
> controllers, a P4 1.6-GHz CPU, an Intel 850 motherboard, and 512 MB ECC
> RDRAM. Drives are configured in RAID5 (9 data, 1 parity, 1 hot spare).
> Four drives are on each Promise controller. Three are on the on-board EIDE
> controller (UDMA100). A small boot drive is also on the on-board
> controller. I had intended to use Ultra100 TX2 controllers, but the latest
> EIDE driver updates with TX2 support are not making it into the latest
> kernels (I'm using 2.4.18), so I opted for the older, slower controllers
> rather than patching. So, I am both cautious and lazy. 8)
>
> Again, performance (see below) is remarkably good, especially considering
> all of the strikes against this configuration: EIDE instead of SCSI, UDMA66
> instead of 100/133, 5400-RPM instead of 7200-RPM, and master/slave drives on
> each port instead of a single drive per port. With some hdparm tuning
> (-c 3 -u 1), the read performance went from 83 MB/sec to 93 MB/sec. Write
> performance remained essentially unchanged by tuning at 26 MB/sec. For
> comparison, the 640-GB arrays gave read performance of about 56 MB/sec,
> write performance of 28.5 MB/sec.
>
> Had I more time, I would have tested ext2 vs ext3 to ascertain how much that
> change affected performance. Likewise, I was considering the use of a raid1
> array as the ext3 journal device to perhaps improve write performance. Any
> thoughts?
>
> Regards,
>
> Mike Prinkey
> Aeolus Research, Inc.
>
> ----------------------
>
> [root@tera /root]# df; mount; cat /proc/mdstat; cat bonnie10.log
> Filesystem           1k-blocks      Used  Available Use% Mounted on
> /dev/hda6             38764268   2601128   34193976   8% /
> /dev/hda1               101089      4965      90905   6% /boot
> /dev/md0            1063591944  58195936 1005396008   6% /raid
> raid640:/raid/home   630296592 284066148  346230444  46% /mnt/tmp
> /dev/hda6 on / type ext2 (rw)
> none on /proc type proc (rw)
> /dev/hda1 on /boot type ext2 (rw)
> none on /dev/pts type devpts (rw,gid=5,mode=620)
> /dev/md0 on /raid type ext3 (rw)
> automount(pid580) on /misc type autofs (rw,fd=5,pgrp=580,minproto=2,maxproto=3)
> raid640:/raid/home on /mnt/tmp type nfs (rw,addr=192.168.0.123)
> Personalities : [raid5]
> read_ahead 1024 sectors
> md0 : active raid5 hdl1[10] hdk1[9] hdj1[8] hdi1[7] hdh1[6] hdg1[5] hdf1[4] hde1[3] hdd1[2] hdc1[1] hdb1[0]
>       1080546624 blocks level 5, 32k chunk, algorithm 2 [10/10] [UUUUUUUUUU]
>
> unused devices:
> Bonnie 1.2: File '/raid/Bonnie.1027', size: 1048576000, volumes: 10
> Writing with putc()...         done:  14810 kB/s  88.9 %CPU
> Rewriting...                   done:  22288 kB/s  13.4 %CPU
> Writing intelligently...       done:  26438 kB/s  21.7 %CPU
> Reading with getc()...         done:  17112 kB/s  97.9 %CPU
> Reading intelligently...       done:  93332 kB/s  32.2 %CPU
> Seek numbers calculated on first volume only
> Seeker 1...Seeker 2...Seeker 3...start 'em...done...done...done...
>               ---Sequential Output (nosync)--- ---Sequential Input-- --Rnd Seek-
>               -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --04k (03)-
> Machine     MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU   /sec %CPU
> raid05 10*1000 14810 88.9 26438 21.7 22288 13.4 17112 97.9 93332 32.2  206.3  2.1
>

From tekka99 at libero.it Tue Apr 2 13:44:28 2002
From: tekka99 at libero.it (Gianluca Cecchi)
Date: Tue Mar 16 01:02:18 2010
Subject: Syntax for executing
References:
Message-ID: <00b201c1da8f$91945340$44e01d97@emea.cpqcorp.net>

If using PVM as well is not a problem, you could use the PVM-enabled version of the POV-Ray 3D rendering engine:
http://www.povray.org/
http://pvmpov.sourceforge.net/

Or, there are also MPI patches to POV-Ray (but I have never used them):
http://www.ce.unipr.it/pardis/parma2/povray/povray.html
http://www.verrall.demon.co.uk/mpipov/

HIH.
Bye,
Gianluca Cecchi

----- Original Message -----
From: "Eric Miller"
To:
Sent: Tuesday, April 02, 2002 11:34 PM
Subject: RE: Syntax for executing

> disregard. SETI is not available in an MPI-enabled format.
>
> My apologies. Can anyone direct me to a URL that lists some available
> programs that I can execute on the cluster? Preferably something with a
> continuous (looping?) graphical output (e.g. SETI). This is a display for
> students to visualize and promote educational programs for Linux, like a
> museum piece.
>
>
> >>>>>>>>>>>>>>>>>>>>>>>>>>
>
> Hey all, got a five-node cluster up running 27-z9, preparing for a 30 node
> cluster.
>
> - What is the syntax to run an executable in the cluster environment?
> For example, I run
>
> NP=5 mpi-mandel
>
> to run the test fractal program. How would I execute, say, SETI using the
> cluster? Assume that the SETI executable is in the PATH. Also, the older
> version of Scyld had some test code in /usr/mpi-beowulf/*. Is that gone?
>
> - What would cause all but one of the processors to show usage in
> beostatus? The node shows "up" in every other way: hardware identical,
> memory, swap, network, etc....just when I run something, only that one
> processor on one node shows no % usage.
>
> -ETM
>
>   .~.
>   /V\
>  // \\
> /(   )\
>  ^'~'^

From roger at ERC.MsState.Edu Wed Apr 3 13:23:44 2002
From: roger at ERC.MsState.Edu (Roger L. Smith)
Date: Tue Mar 16 01:02:18 2010
Subject: How do you keep clusters running....
In-Reply-To: <200204032104.PAA23347@sijer.mayo.edu>
Message-ID:

I don't know how to say this without sounding condescending, but we resolved this problem by purchasing high-quality machines. We currently use IBM x330s (although I also had good luck with our SGI 1100's before SGI discontinued them). We have enough nodes on hand that IBM has stocked a couple of spare motherboards, power supplies, etc., but we don't need them that often. I've never had a fan failure.

In general, hardware problems are a very minor part of the care and feeding of our cluster.

On Wed, 3 Apr 2002, Cris Rhea wrote:

>
> What are folks doing about keeping hardware running on large clusters?
>
> Right now, I'm running 10 Racksaver RS-1200's (for a total of 20 nodes)...
>
> Sure seems like every week or two, I notice dead fans (each RS-1200
> has 6 case fans in addition to the 2 CPU fans and 2 power supply fans).
>
> My last fan failure was a CPU fan that toasted the CPU and motherboard.
>
> How are folks with significantly more nodes than mine dealing with constant
> maintenance on their nodes? Do you have whole spare nodes sitting around-
> ready to be installed if something fails, or do you have a pile of
> spare parts? Did you get the vendor (if you purchased prebuilt systems)
> to supply a stockpile of warranty parts?
>
> One of the problems I'm facing is that every time something croaks,
> Racksaver is very good about replacing it under warranty, but getting
> the new parts delivered usually takes several days.
>
> For some things like fans, they sent extras for me to keep on-hand.
>
> For my last fan/CPU/motherboard failure, the node pair will be
> down ~5 days waiting for parts.
>
> Comments? Thoughts? Ideas?
>
> Thanks-
>
> --- Cris
>
>
>
> ----
> Cristopher J. Rhea                     Mayo Foundation
> Research Computing Facility              Pavilion 2-25
> crhea@Mayo.EDU                        Rochester, MN 55905
> Fax: (507) 266-4486                     (507) 284-0587
>

_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_
|       Roger L. Smith                        Phone: 662-325-3625        |
|       Systems Administrator                 FAX:   662-325-7692        |
|       roger@ERC.MsState.Edu       http://WWW.ERC.MsState.Edu/~roger    |
|                       Mississippi State University                      |
|_______________________Engineering Research Center_______________________|

From SGaudet at turbotekcomputer.com Wed Apr 3 13:26:28 2002
From: SGaudet at turbotekcomputer.com (Steve Gaudet)
Date: Tue Mar 16 01:02:18 2010
Subject: How do you keep clusters running....
Message-ID: <3450CC8673CFD411A24700105A618BD61BF020@911TURBO>

Hello Chris,

> What are folks doing about keeping hardware running on large clusters?
>
> Right now, I'm running 10 Racksaver RS-1200's (for a total of 20 nodes)...
>
> Sure seems like every week or two, I notice dead fans (each RS-1200
> has 6 case fans in addition to the 2 CPU fans and 2 power supply fans).
>
> My last fan failure was a CPU fan that toasted the CPU and motherboard.
>
> How are folks with significantly more nodes than mine dealing with constant
> maintenance on their nodes? Do you have whole spare nodes sitting around-
> ready to be installed if something fails, or do you have a pile of
> spare parts? Did you get the vendor (if you purchased prebuilt systems)
> to supply a stockpile of warranty parts?
>
> One of the problems I'm facing is that every time something croaks,
> Racksaver is very good about replacing it under warranty, but getting
> the new parts delivered usually takes several days.
>
> For some things like fans, they sent extras for me to keep on-hand.
>
> For my last fan/CPU/motherboard failure, the node pair will be
> down ~5 days waiting for parts.
>
> Comments? Thoughts? Ideas?
------------------------------------------
The vendor of choice should be using quality parts. We don't see these issues here.

Steve Gaudet
Linux Solutions Engineer
===================================================================
| Turbotek Computer Corp.       tel:603-666-3062 ext. 21          |
| 8025 South Willow St.         fax:603-666-4519                  |
| Building 2, Unit 105          toll free:800-573-5393            |
| Manchester, NH 03103          e-mail:sgaudet@turbotekcomputer.com |
|                web: http://www.turbotekcomputer.com             |
===================================================================

From haohe at me1.eng.wayne.edu Wed Apr 3 14:48:14 2002
From: haohe at me1.eng.wayne.edu (Hao He)
Date: Tue Mar 16 01:02:18 2010
Subject: GbE Channel Bonding
Message-ID: <200204032258.RAA15974@me1.eng.wayne.edu>

Does anyone have experience bonding Gigabit Ethernet cards? How is the performance?

Thanks.
-HH

From mikeprinkey at hotmail.com Wed Apr 3 10:10:10 2002
From: mikeprinkey at hotmail.com (Michael Prinkey)
Date: Tue Mar 16 01:02:18 2010
Subject: Hyperthreading in P4 Xeon (question)
Message-ID:

I can amplify that point. A commercial CFD application ran significantly slower using 4 threads vs 2 on a dual Prestonia system. Anything memory-limited will probably behave the same way.

Mike Prinkey
Aeolus Research, Inc.

>From: Mark Hahn
>To: William Park
>CC:
>Subject: Re: Hyperthreading in P4 Xeon (question)
>Date: Wed, 3 Apr 2002 10:50:06 -0500 (EST)
>
> > What is the realistic effect of "hyperthreading" in P4 Xeon? I'm not
> > versed in the latest CPU trends. Does it mean that dual-P4Xeon will
> > behave like 4-way SMP?
>
>for some value of "behave like" ;)
>that is, it will definitely NOT get twice as fast. but it will appear
>to have 4 CPUs, and can run 4 threads/procs at once (for values of
>"once" > 1 clock cycle ;)
>
>we did a quick test on a dual-prestonia here, and saw a ~5% speedup
>on a probably cache-friendly, compute-bound task.

From alan at infogroup.it Wed Apr 3 15:13:39 2002
From: alan at infogroup.it (amedeo pimpini)
Date: Tue Mar 16 01:02:18 2010
Subject: my first diskless beowulf cluster.
Message-ID: <3CAB8CA3.9030809@infogroup.it>

I'm having trouble getting init to launch after mounting root over NFS. Can somebody help me? Details follow.

I compiled a 2.4.7-10 kernel with IP autoconfiguration and root on NFS, and placed it in /tftpboot. The kernel mounts /tftpboot but doesn't start init. On the console of the first workstation:

IP-Config: Got DHCPanswer from 10.1.1.1 my address is 10.0.0.2
...
VFS: Mounted root (nfs filesystem).
Freeing unused kernel memory: 180k freed
Kernel panic: No init found. Try passing init= option to kernel

I recompiled main.c with printk("%d", errno) and got 8; with perror I get "Error code 8: Exec format error". If I mv /sbin/init to /sbin/init.old, I get error 14 instead.

On the server, /var/log/messages shows:

Apr 3 00:17:14 nut1 dhcpd: Both dynamic and static leases present for 10.1.1.2.
Apr 3 00:17:14 nut1 dhcpd: Either remove host declaration nut2 or remove 10.1.1.2
Apr 3 00:17:14 nut1 dhcpd: from the dynamic address pool for 10.1.0.0
Apr 3 00:17:14 nut1 dhcpd: DHCPREQUEST for 10.1.1.2 from 00:e0:4c:20:6b:8f via eth0
Apr 3 00:17:14 nut1 dhcpd: DHCPACK on 10.1.1.2 to 00:e0:4c:20:6b:8f via eth0
Apr 3 00:17:14 nut1 mountd[1520]: mountproc_translate_mnt_1_svc(/tftpboot/10.1.1.2)
Apr 3 00:17:14 nut1 mountd[1520]: NFS mount of /tftpboot/10.1.1.2 attempted from 10.1.1.2
Apr 3 00:17:14 nut1 mountd[1520]: /tftpboot/10.1.1.2 has been mounted by 10.1.1.2

and tcpdump shows:

00:21:06.149970 arp who-has nut1 tell nut2
00:21:06.149970 arp reply nut1 is-at 0:e0:4c:f0:6d:fb
00:21:06.149970 nut2.800 > nut1.sunrpc: udp 56 (DF)
00:21:06.149970 nut1.sunrpc > nut2.800: udp 28 (DF)
00:21:06.149970 nut2.800 > nut1.sunrpc: udp 56 (DF)
00:21:06.149970 nut1.sunrpc > nut2.800: udp 28 (DF)
00:21:06.149970 nut2.800 > nut1.849: udp 64 (DF)
00:21:06.149970 nut1.849 > nut2.800: udp 60 (DF)
00:21:06.149970 nut2.56685225 > nut1.nfs: 100 getattr [|nfs] (DF)
00:21:06.149970 nut1.nfs > nut2.56685225: reply ok 96 getattr DIR 47777 ids 0/0 sz 4096 (DF)
00:21:06.149970 nut2.73462441 > nut1.nfs: 100 fsstat [|nfs] (DF)
00:21:06.159970 nut1.nfs > nut2.73462441: reply ok 48 fsstat [|nfs] (DF)
00:21:06.159970 nut2.90239657 > nut1.nfs: 108 lookup [|nfs] (DF)
00:21:06.159970 nut1.nfs > nut2.90239657: reply ok 128 lookup [|nfs] (DF)
00:21:06.159970 nut2.107016873 > nut1.nfs: 112 lookup [|nfs] (DF)
00:21:06.159970 nut1.nfs > nut2.107016873: reply ok 128 lookup [|nfs] (DF)
00:21:06.159970 nut2.123794089 > nut1.nfs: 108 lookup [|nfs] (DF)
00:21:06.159970 nut1.nfs > nut2.123794089: reply ok 128 lookup [|nfs] (DF)
00:21:06.159970 nut2.140571305 > nut1.nfs: 108 lookup [|nfs] (DF)
00:21:06.159970 nut1.nfs > nut2.140571305: reply ok 128 lookup [|nfs] (DF)
00:21:06.159970 nut2.157348521 > nut1.nfs: 112 read [|nfs] (DF)
00:21:06.159970 nut1 > nut2: (frag 23518:1244@2960)
00:21:06.159970 nut1 > nut2: (frag 23518:1480@1480+)
00:21:06.159970 nut1.nfs > nut2.157348521: reply ok 1472 read (frag 23518:1480@0+)
00:21:11.149970 arp who-has nut2 tell nut1
00:21:11.149970 arp reply nut2 is-at 0:e0:4c:20:6b:8f

I tagged my kernel with:

mknbi-linux --output=/tftpboot/vmlinux.3com /usr/src/linux-2.4.7-10/arch/i386/boot/bzImage --ip=":10.1.1.1:10.1.1.1:255.255.0.0:"

From rgb at phy.duke.edu Wed Apr 3 15:27:31 2002
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Tue Mar 16 01:02:18 2010
Subject: How do you keep clusters running....
In-Reply-To: <200204032104.PAA23347@sijer.mayo.edu>
Message-ID:

On Wed, 3 Apr 2002, Cris Rhea wrote:

> Comments? Thoughts? Ideas?

 a) Use onboard sensors (hoping your motherboards have them) to shut nodes down if the CPU temperature exceeds an alarm threshold. That way future fan failures shouldn't cause system failure, just node shutdown.

 b) Use the largest cases you can manage given your space requirements. Larger cases have a bit more thermal ballast and can tolerate poor cooling for a bit longer before catastrophically failing. That gives you (or your monitoring software) more time to react, if nothing else.

 c) With only ten boxes, it sounds like you're having plain old bad luck, possibly caused by a bad batch of fans. Relax, perhaps your luck will improve;-)

With all that said, it is still true that maintenance problems scale poorly with the number of nodes. This is one reason (of many) that I prefer not to get nodes from vendors in another state whom I never meet face to face.
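Suggestion (a) above can be prototyped as a small script run from cron every minute or so. The Python sketch below is a hedged illustration, not rgb's setup: it assumes the modern Linux sysfs thermal interface (`/sys/class/thermal/thermal_zone*/temp`, in millidegrees Celsius), which 2002-era boards did not have (they exposed lm_sensors data elsewhere), and 70 C is a placeholder threshold. It only reports by default and powers the node off only when given `--halt`.

```python
"""Sketch of suggestion (a): halt a node when any sensor reading runs too hot.

Assumptions (not from the post): readings come from the sysfs thermal
interface, and THRESHOLD_C is a placeholder; adapt read_temps_c() to
whatever sensor interface your boards actually expose.
"""
import subprocess
import sys
from pathlib import Path

THRESHOLD_C = 70.0  # placeholder alarm threshold; choose one for your CPUs


def read_temps_c(root: Path = Path("/sys/class/thermal")) -> list[float]:
    """Read every thermal zone under root, converting millidegrees to degrees C."""
    temps = []
    for zone in sorted(root.glob("thermal_zone*/temp")):
        try:
            temps.append(int(zone.read_text().strip()) / 1000.0)
        except (OSError, ValueError):  # unreadable or non-numeric sensor
            continue
    return temps


def should_halt(temps_c: list[float], threshold_c: float = THRESHOLD_C) -> bool:
    """True if any reading is at or above the alarm threshold."""
    return any(t >= threshold_c for t in temps_c)


def main(argv: list[str]) -> None:
    temps = read_temps_c()
    if should_halt(temps):
        print(f"temperature alarm (>= {THRESHOLD_C} C): {temps}")
        # Dry run by default; only power off when explicitly asked to.
        if "--halt" in argv:
            subprocess.run(["shutdown", "-h", "now"], check=False)


if __name__ == "__main__":
    main(sys.argv[1:])
```

Halting on the first hot reading is deliberately conservative. A dead fan then costs a node-hour of downtime instead of a CPU and motherboard, which is the trade rgb is pointing at.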
If your nodes are built by a local vendor (especially one with a decent local parts inventory and service department), then it is a bit easier to get good turnaround on node repairs and minimize downtime, especially since a local business rapidly learns that keeping you happy matters more to its bottom line than pleasing the next twenty or thirty customers who might walk through its door.

There is also the usual tradeoff between buying "insurance" (e.g. onsite, 24-hour service contracts) on everything and the number of nodes. There are plenty of companies that will sell you nodes and guarantee minimal downtime -- for a price. IBM and Dell come to mind, although there are many more. Only you can determine how mission critical it is to keep your nodes up and what the cost-benefit tradeoffs are between buying fewer nodes (but getting better quality nodes and arranging guarantees of minimal downtime) or buying more nodes (but risking having a node or two down pending repairs from time to time).

Cost-benefit analysis is at the heart of beowulf engineering, but you have to determine the "values" that enter into the analysis based on your local needs.

   rgb

>
> Thanks-
>
> --- Cris
>
>
>
> ----
> Cristopher J. Rhea                     Mayo Foundation
> Research Computing Facility              Pavilion 2-25
> crhea@Mayo.EDU                        Rochester, MN 55905
> Fax: (507) 266-4486                     (507) 284-0587
>

--
Robert G. Brown                        http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C.
27708-0305                  Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb@phy.duke.edu

From tim at dolphinics.com Tue Apr 2 10:05:48 2002
From: tim at dolphinics.com (Tim Wilcox)
Date: Tue Mar 16 01:02:18 2010
Subject: Call for Papers
Message-ID: <3CA9F2FC.7F39E715@dolphinics.com>

CALL FOR PAPERS

Workshop on High-Speed Local Networks (HSLN)
as part of the IEEE LCN conference
http://www.hcs.ufl.edu/hsln
http://www.ieeelcn.org
November 6 - 8, 2002
Embassy Suites USF, Tampa, Florida

Important dates and contact:
----------------------------
Paper submission: June 10, 2002
Notification of acceptance: July 15, 2002
Camera-ready copy due: August 16, 2002
General Chair: Alan D. George (george@hcs.ufl.edu)

General Information:
--------------------
The High-Speed Local Networks (HSLN) workshop, within the 27th IEEE Conference on Local Computer Networks (LCN), focuses on the design, analysis, implementation, and exploitation of new concepts, technologies, and applications related to high-performance networks on a local scale. This workshop will bring together networking researchers, engineers, and practitioners from across the spectrum of high-speed local networks, with participants from industry, academia, and government.

Original papers that present research results, case studies, technology development or deployment experience, work in progress, etc. are solicited, as are survey articles. Specific areas of interest include (but are not limited to):

- High-speed LANs (e.g. Gigabit Ethernet, 10 Gigabit Ethernet)
- System-area networks (e.g. SCI, Myrinet, ServerNet)
- Storage-area networks (e.g. Fibre Channel) and I/O interconnects
- High-speed networks in embedded systems (e.g. avionics, space systems)
- Protocols, services, and topologies for high-speed local networks
- Routing and switch architectures for high-speed local networks
- Quality of Service (QoS) in high-speed local networks
- Performance analysis of high-speed local networks and systems
- Modeling and simulation of high-speed local networks
- Middleware for high-speed local network communication
- Applications for high-speed local networks (e.g. video on demand)

Paper Submission Instructions:
------------------------------
Authors are invited to submit papers of up to ten camera-ready pages, in PDF or Postscript format, for presentation at the workshop and publication in the conference proceedings. Papers should be submitted by email to the workshop at hsln@hcs.ufl.edu on or before June 10, 2002. Alternatively, send five hard copies via postal mail to:

Dr. Alan D. George
HSLN General Chair
Department of Electrical and Computer Engineering
University of Florida
PO Box 116200, 327 Larsen Hall
Gainesville, FL 32611-6200

HSLN Organizing Committee:
--------------------------
Workshop Chair: A.D. George, ECE Department, Univ of Florida (george@hcs.ufl.edu)
Industry Chair: J.L. Meier, Advanced Technology Center, Rockwell Collins, Inc. (jlmeier@rockwellcollins.com)
Program Chair: K.J. Christensen, CSE Department, Univ of South Florida (christen@csee.usf.edu)

HSLN Program Committee:
-----------------------
Jay Bragg (awbragg@yahoo.com), Consultant
Ron Brightwell (bright@sandia.gov), Sandia National Labs, New Mexico
Wayne Chang (wchang@arl.army.mil), Army Research Laboratory
Helen Chen (hycsw@california.sandia.gov), Sandia National Labs, California
Patrick W. Dowd (dowd@lts.ncsc.mil), University of Maryland at College Park and U.S. Department of Defense, College Park, MD
Mike Foster (michael.s.foster@boeing.com), Boeing Corporation
Michael A. Hoard (hoardm@us.ibm.com), IBM, Beaverton, OR
Cynthia S. Hood (hood@iit.edu), Illinois Institute of Technology, Chicago, IL
Anestis Karasaridis (karasaridis@att.com), Network Design and Performance Analysis Dept., AT&T Labs, Middletown, NJ
Fred Kuhns (fredk@arl.wustl.edu), Washington University, St. Louis, MO
Michael McKee (mckee026@umn.edu), University of Minnesota, Rochester, Rochester, MN
Knut Omang (knuto@fast.no), University of Oslo, Oslo, Norway
Sarp Oral (oral@hcs.ufl.edu), University of Florida, Gainesville, FL
D. K. Panda (panda@cis.ohio-state.edu), Ohio State University, Columbus, Ohio
Anthony Skjellum (tony@MPI-SoftTech.Com), Mississippi State University, Starkville, MS
Norm Strole (ncstrole@us.ibm.com), IBM, Research Triangle Park, NC
Rollins Turner (rturner@paradyne.com), Paradyne Corporation, Largo, FL
William White (wwhite@siue.edu), Southern Illinois University, Edwardsville, IL
Tim Wilcox (tim.wilcox@dolphinics.com), Technical Director, Dolphin Interconnect

From mikeprinkey at hotmail.com Tue Apr 2 15:07:21 2002
From: mikeprinkey at hotmail.com (Michael Prinkey)
Date: Tue Mar 16 01:02:18 2010
Subject: Linux Software RAID5 Performance
Message-ID:

Hi Gianluca,

I used the default "ordered" journaling option. I haven't really looked into the different journaling options and their impact on performance. Does the ordered option require two writes? Also, any thoughts on performance tuning or using an external raid1 journal device?

The benchmark application is Bonnie 1.2.

Thanks,

Mike

>From: "Gianluca Cecchi"
>To: ,
>Subject: Re: Linux Software RAID5 Performance
>Date: Tue, 2 Apr 2002 23:29:55 +0200
>
>Which option did you use for ext3 journal mechanism?
>It makes a difference, especially
>when using "writeback" vs the default "ordered" (see below part of the
>"Changes" file for ext3)
>Which I/O benchmark did you use?
>Thanks,
>Gianluca Cecchi
>
>New mount options:
>
>  "mount -o journal=update"
>      Mounts a filesystem with a Version 1 journal, upgrading the
>      journal dynamically to Version 2.
>
>  "mount -o data=journal"
>      Journals all data and metadata, so data is written twice. This
>      is the mode which all prior versions of ext3 used.
>
>  "mount -o data=ordered"
>      Only journals metadata changes, but data updates are flushed to
>      disk before any transactions commit. Data writes are not atomic
>      but this mode still guarantees that after a crash, files will
>      never contain stale data blocks from old files.
>
>  "mount -o data=writeback"
>      Only journals metadata changes, and data updates are entirely
>      left to the normal "sync" process. After a crash, files may
>      contain stale data blocks from old files: this mode is exactly
>      equivalent to running ext2 with a very fast fsck on reboot.
>
>Ordered and Writeback data modes require a Version 2 journal: if you do
>not update the journal format then only the Journaled data mode will be
>allowed.
>
>The default data mode is Journaled for a V1 journal, and Ordered for V2.
>
>
>----- Original Message -----
>From: "Michael Prinkey"
>To:
>Sent: Sunday, March 31, 2002 9:33 PM
>Subject: Linux Software RAID5 Performance
>
>
> > Some time ago, a thread discussed the relative performance and stability
> > merits of different RAID solutions. At that time, I gave some results for
> > 640-GB arrays that I had built using EIDE drives and Software RAID5. I just
> > recently constructed and installed a 1.0-TB array and had some performance
> > numbers to share for it as well. They are interesting for two reasons:
> > First, the filesystem in use is ext3, rather than ext2. Second, the read
> > performance is significantly better (almost 2x) than that of the 640-GB
> > units.
> >
> > The system uses 11 120-GB Maxtor 5400-RPM drives, two Promise Ultra66
> > controllers, a P4 1.6-GHz CPU, an Intel 850 motherboard, and 512 MB ECC
> > RDRAM. Drives are configured in RAID5 (9 data, 1 parity, 1 hot spare).
> > Four drives are on each Promise controller. Three are on the on-board EIDE
> > controller (UDMA100). A small boot drive is also on the on-board
> > controller. I had intended to use Ultra100 TX2 controllers, but the latest
> > EIDE driver updates with TX2 support are not making it into the latest
> > kernels (I'm using 2.4.18), so I opted for the older, slower controllers
> > rather than patching. So, I am both cautious and lazy. 8)
> >
> > Again, performance (see below) is remarkably good, especially considering
> > all of the strikes against this configuration: EIDE instead of SCSI, UDMA66
> > instead of 100/133, 5400-RPM instead of 7200-RPM, and master/slave drives on
> > each port instead of a single drive per port. With some hdparm tuning
> > (-c 3 -u 1), the read performance went from 83 MB/sec to 93 MB/sec. Write
> > performance remained essentially unchanged by tuning at 26 MB/sec. For
> > comparison, the 640-GB arrays gave read performance of about 56 MB/sec,
> > write performance of 28.5 MB/sec.
> >
> > Had I more time, I would have tested ext2 vs ext3 to ascertain how much that
> > change affected performance. Likewise, I was considering the use of a raid1
> > array as the ext3 journal device to perhaps improve write performance. Any
> > thoughts?
> >
> > Regards,
> >
> > Mike Prinkey
> > Aeolus Research, Inc.
> >
> > ----------------------
> >
> > [root@tera /root]# df; mount; cat /proc/mdstat; cat bonnie10.log
> > Filesystem           1k-blocks      Used  Available Use% Mounted on
> > /dev/hda6             38764268   2601128   34193976   8% /
> > /dev/hda1               101089      4965      90905   6% /boot
> > /dev/md0            1063591944  58195936 1005396008   6% /raid
> > raid640:/raid/home   630296592 284066148  346230444  46% /mnt/tmp
> > /dev/hda6 on / type ext2 (rw)
> > none on /proc type proc (rw)
> > /dev/hda1 on /boot type ext2 (rw)
> > none on /dev/pts type devpts (rw,gid=5,mode=620)
> > /dev/md0 on /raid type ext3 (rw)
> > automount(pid580) on /misc type autofs (rw,fd=5,pgrp=580,minproto=2,maxproto=3)
> > raid640:/raid/home on /mnt/tmp type nfs (rw,addr=192.168.0.123)
> > Personalities : [raid5]
> > read_ahead 1024 sectors
> > md0 : active raid5 hdl1[10] hdk1[9] hdj1[8] hdi1[7] hdh1[6] hdg1[5] hdf1[4] hde1[3] hdd1[2] hdc1[1] hdb1[0]
> >       1080546624 blocks level 5, 32k chunk, algorithm 2 [10/10] [UUUUUUUUUU]
> >
> > unused devices:
> > Bonnie 1.2: File '/raid/Bonnie.1027', size: 1048576000, volumes: 10
> > Writing with putc()...    done:  14810 kB/s  88.9 %CPU
> > Rewriting...              done:  22288 kB/s  13.4 %CPU
> > Writing intelligently...  done:  26438 kB/s  21.7 %CPU
> > Reading with getc()...    done:  17112 kB/s  97.9 %CPU
> > Reading intelligently...  done:  93332 kB/s  32.2 %CPU
> > Seek numbers calculated on first volume only
> > Seeker 1...Seeker 2...Seeker 3...start 'em...done...done...done...
> >                     ---Sequential Output (nosync)---    ---Sequential Input--  --Rnd Seek-
> >                     -Per Char-  --Block---  -Rewrite--  -Per Char-  --Block--- --04k (03)-
> > Machine         MB  K/sec %CPU  K/sec %CPU  K/sec %CPU  K/sec %CPU  K/sec %CPU   /sec %CPU
> > raid05     10*1000  14810 88.9  26438 21.7  22288 13.4  17112 97.9  93332 32.2  206.3  2.1
> >
> >
> > _________________________________________________________________
> > Get your FREE download of MSN Explorer at http://explorer.msn.com/intl.asp.
> >
> > _______________________________________________
> > Beowulf mailing list, Beowulf@beowulf.org
> > To change your subscription (digest mode or unsubscribe) visit
> > http://www.beowulf.org/mailman/listinfo/beowulf
> >

_________________________________________________________________
Chat with friends online, try MSN Messenger: http://messenger.msn.com

From mikeprinkey at hotmail.com Wed Apr 3 11:49:00 2002
From: mikeprinkey at hotmail.com (Michael Prinkey)
Date: Tue Mar 16 01:02:18 2010
Subject: Linux Software RAID5 Performance
Message-ID:

Indeed, multiple processes accessing the device may significantly degrade performance. Fortunately for us, access speed is limited by NFS/SMB and the network, not by array performance. Unfortunately, the unit is online now and I can't fiddle around with the settings and test it further.

WRT reliability, we have seen the array drop to degraded mode because of a single drive failure. We have also seen a single drive take down the entire IDE port. This results in the md device disappearing until you swap out the offending drive and restart the array. No data is lost, though. Usually one drive goes and the array goes into degraded mode and starts reconstructing on the spare. Then the second goes and the array disappears. It is a bit disconcerting to do ls /raid and get nothing back. Changing out the drive and restarting pulls everything back.

I can honestly say that the only data loss that I have had on these arrays came when a maintenance person completely unplugged one of the arrays from the UPS. It caused low-level corruption on 5 of the 9 drives in the array. We ended up using a Windows 98 boot floppy with Maxtor's Powermax utility to patch them all back up. It took many hours. This is the WORST possible scenario, BTW. Even resetting the system gives the EIDE devices a chance to flush their caches and maintain low-level integrity.
Cutting the power can leave the array/drives inconsistent on the filesystem, device (/dev/md0), and hardware-format datagram levels. So, lock your arrays in a cabinet! 8)

Mike

>From: Jurgen Botz
>To: mprinkey@aeolusresearch.com (Michael Prinkey)
>CC: beowulf@beowulf.org
>Subject: Re: Linux Software RAID5 Performance
>Date: Wed, 03 Apr 2002 10:25:31 -0800
>
>Michael Prinkey wrote:
> > Again, performance (see below) is remarkably good, especially considering
> > all of the strikes against this configuration: EIDE instead of SCSI, UDMA66
> > instead of 100/133, 5400-RPM instead of 7200-RPM, and master/slave drives on
> > each port instead of a single drive per port.
>
>With regard to the master/slave config... I note that your performance
>test is a single reader/writer... in this config with RAID5 I would
>expect the performance to be quite good even with 2 drives per IDE
>controller. But if you have several processes doing disk I/O
>simultaneously, you should see a rather more precipitous drop in
>performance than you would with a single drive per IDE controller.
>I'm working on testing a very similar config right now and that's
>one of my findings (which I had expected), but our application for this
>is not very performance sensitive, so it's not a big deal.
>
>A more important issue for me is reliability, and I'm somewhat
>concerned about failure modes. For example, can an IDE drive fail
>in such a way that it will disable the controller or the other
>drive on the same controller? If so, that would seriously limit
>the usefulness of RAID5 in this config. In general, how good is
>Linux software RAID's failure handling? Etc.
>
>:j
>
>
>--
>Jürgen Botz      | While differing widely in the various
>jurgen@botz.org  | little bits we know, in our infinite
>                 | ignorance we are all equal.
>                 |                         -Karl Popper
>
>
>_______________________________________________
>Beowulf mailing list, Beowulf@beowulf.org
>To change your subscription (digest mode or unsubscribe) visit
>http://www.beowulf.org/mailman/listinfo/beowulf
>

_________________________________________________________________
MSN Photos is the easiest way to share and print your photos: http://photos.msn.com/support/worldwide.aspx

From emiller at techskills.com Wed Apr 3 17:06:32 2002
From: emiller at techskills.com (Eric Miller)
Date: Tue Mar 16 01:02:18 2010
Subject: another syntax question
In-Reply-To: <20020403153227.A15201@node0.opengeometry.ca>
Message-ID:

For non-parallel applications, is it possible to run individual instances on diskless nodes? For example, I want to execute a non-MPI program "A" that is located in the /bin directory of my master node, but I want to run one instance of "A" on each of my diskless nodes. What is the syntax that equates to:

#NP=1 "A" on node0 only
#NP=1 "A" on node1 only
#....
#....

From alvin at Maggie.Linux-Consulting.com Wed Apr 10 20:55:45 2002
From: alvin at Maggie.Linux-Consulting.com (alvin@Maggie.Linux-Consulting.com)
Date: Tue Mar 16 01:02:18 2010
Subject: Call for Papers
In-Reply-To: <3CA9F2FC.7F39E715@dolphinics.com>
Message-ID:

hi tim

am fairly sure St Louis in Missouri is MO for state initials

thanx
alvin
http://www.Linux-1U.net .... 8 Drives in 1U chassis ...

On Tue, 2 Apr 2002, Tim Wilcox wrote:

> CALL FOR PAPERS
> Workshop on High-Speed Local Networks (HSLN)
> as part of the IEEE LCN conference
> http://www.hcs.ufl.edu/hsln
> http://www.ieeelcn.org
>
> November 6 - 8, 2002
> Embassy Suites USF, Tampa, Florida
>
> Fred Kuhns (fredk@arl.wustl.edu)
> Washington University
.... [ snipped ]
> St. Louis, MI
  ^^^^^^^^^^^^^^^^^^^^^
> Michael McKee (mckee026@umn.edu)
> University of Minnesota, Rochester
> Rochester, MN
>
> Knut Omang (knuto@fast.no)
> University of Oslo
> Oslo, Norway
>
> Sarp Oral (oral@hcs.ufl.edu)
> University of Florida
> Gainesville, FL
>
> D. K. Panda (panda@cis.ohio-state.edu)
> Ohio State University
> Columbus, Ohio
>
> Anthony Skjellum (tony@MPI-SoftTech.Com)
> Mississippi State University
> Starkville, MS
>
....

From garcia_garcia_adrian at hotmail.com Thu Apr 4 09:48:29 2002
From: garcia_garcia_adrian at hotmail.com (Adrian Garcia Garcia)
Date: Tue Mar 16 01:02:18 2010
Subject: DHCP Help
Message-ID:

An HTML attachment was scrubbed...
URL: http://www.scyld.com/pipermail/beowulf/attachments/20020404/ca30ab16/attachment.html

From rocky at atipa.com Thu Apr 4 11:43:58 2002
From: rocky at atipa.com (Rocky McGaugh)
Date: Tue Mar 16 01:02:18 2010
Subject: commercial parallel libraries
In-Reply-To:
Message-ID:

On Thu, 4 Apr 2002, Jayne Heger wrote:

>
> Hi,
>
> I know this is a beowulf list, but I could do with getting some info on any
> (if there are) commercial parallel libraries, the equivalent of pvm and mpi.
>
> Do any of you know the names of any?
>
> Thanks.
>
> Jayne

MPIPro is a commercial implementation of MPI. I've heard a lot of good things about their Win/32 implementation, but not as much about their normal Unix MPI.

Linda is another commercial parallel API that provides very good support and services.

--
Rocky McGaugh
Atipa Technologies
rocky@atipatechnologies.com
rmcgaugh@atipa.com
1-785-841-9513 x3110
http://1087800222/
perl -e 'print unpack(u, ".=W=W+F%T:7\!A+F-O;0H`");'

From wewu at oscar.eecs.tufts.edu Thu Apr 4 12:56:20 2002
From: wewu at oscar.eecs.tufts.edu (wewu@oscar.eecs.tufts.edu)
Date: Tue Mar 16 01:02:18 2010
Subject: restrict node access
Message-ID:

We want to prevent regular users from accessing the cluster nodes via rlogin, rsh, or ssh, but still let PBS run their jobs. Does anybody have a better suggestion?
We installed OSCAR 1.2.1 on our cluster.

Thanks

From s02.sbecker at wittenberg.edu Thu Apr 4 15:47:53 2002
From: s02.sbecker at wittenberg.edu (s02.sbecker)
Date: Tue Mar 16 01:02:18 2010
Subject: Scyld node boot problem
Message-ID: <3C435198@smtp.wittenberg.edu>

I am using version 27bz-8 for the Scyld disk, with kernel version 2.2.19-12.beo. I have a 3c905b card in the slave and a 3c905 in the master. The slave gets to the third phase of the boot, where it outputs the log file, and then the node hangs. Here is the log file for node.0...

node_up: Setting system clock.
node_up: TODO set interface netmask.
node_up: Configuring loopback interface.
node_up: Configuring PCI devices.
setup_fs: Configuring node filesystems...
setup_fs: Using /etc/beowulf/fstab
setup_fs: Checking /dev/ram3 (type=ext2)...
setup_fs: Hmmm...This appears to be a ramdisk.
setup_fs: I'm going to try to try checking the filesystem (fsck) anyway.
setup_fs: If it is a RAM disk the following will fail harmlessly.
e2fsck 1.20, 25-May-2001 for EXT2 FS 0.5b, 95/08/09
Couldn't find ext2 superblock, trying backup blocks...
e2fsck
The superblock could not be read or does not describe a correct ext2
filesystem. If the device is valid and it really contains an ext2
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
e2fsck -b 8193 : Bad magic number in super-block while trying to open /dev/ram3
setup_fs: FSCK failure. (OK for RAM disks)
setup_fs: Creating ext2 on /dev/ram3...
mke2fs 1.20, 25-May-2001 for EXT2 FS 0.5b, 95/08/09
setup_fs: Mounting /dev/ram3 on /rootfs//... (type=ext2; options=defaults)
setup_fs: Checking 192.168.1.1:/home (type=nfs)...
setup_fs: Mounting 192.168.1.1:/home on /rootfs//home... (type=nfs; options=nolock)
mount: 192.168.1.1:/home failed, reason given by server: Permission denied
Failed to mount 192.168.1.1:/home on /home.

Can someone help? Thanks.
Shawn

From sp at scali.com Thu Apr 4 18:08:32 2002
From: sp at scali.com (Steffen Persvold)
Date: Tue Mar 16 01:02:18 2010
Subject: very high bandwidth, low latency manner?
In-Reply-To: <5.1.0.14.2.20020404160051.00aff7a0@mail1.jpl.nasa.gov>
Message-ID:

On Thu, 4 Apr 2002, Jim Lux wrote:

> What's high bandwidth?
> What's low latency?
> How much money do you want to spend?
>
> Ethernet is cheap, $100-$200/node for 100 Mbps or GBE (by the time you get
> switches, cables, adapters, etc.)
> Latency is kind of slow (compared to dedicated point to point links)
>
>

Well, this is a "touchy" topic, since different people have different opinions. There are also different ways of measuring bandwidth, mainly point-to-point (two machines talking to each other) and bisection (dividing your network in half and letting one half talk to the other, which shows how the network scales with more nodes). Also, some people like to talk about hardware bandwidth and hardware latency, while the thing that really matters (IMHO) is application-to-application bandwidth and latency.

I don't want to start a flamewar here, but I _think_ (not knowing real numbers for other high-speed interconnects) that SCI has at least the lowest latency and maybe also the highest point-to-point bandwidth:

SCI application-to-application latency   : 2.5 us
SCI application-to-application bandwidth : 325 MByte/sec

Note that these numbers are very chipset-specific (as most high-speed interconnect numbers are); these numbers are from IA64. Here are numbers from a popular IA32 platform, the AMD 760MPX:

SCI application-to-application latency   : 1.8 us
SCI application-to-application bandwidth : 283 MByte/sec

More "real" performance numbers using MPI over SCI (including collective and application benchmarks) can be found on Dolphin's homepage, http://www.dolphinics.com

Other popular high-speed interconnects I know of are Myrinet (considered the main competitor to SCI for cluster interconnects) and Giganet.
There are some performance numbers on Myricom's homepage (http://www.myricom.com), but I doubt they are for their latest hardware generation (correct me if I'm wrong).

Best regards,
--
Steffen Persvold     | Scalable Linux Systems | Try out the world's best
mailto:sp@scali.com  | http://www.scali.com   | performing MPI implementation:
Tel: (+47) 2262 8950 | Olaf Helsets vei 6     |      - ScaMPI 1.13.8 -
Fax: (+47) 2262 8951 | N0621 Oslo, NORWAY     | >320MBytes/s and <4uS latency

From alvin at Maggie.Linux-Consulting.com Wed Apr 10 21:07:40 2002
From: alvin at Maggie.Linux-Consulting.com (alvin@Maggie.Linux-Consulting.com)
Date: Tue Mar 16 01:02:18 2010
Subject: Call for Papers - oops
In-Reply-To: <20020411040416.0F7FE14093@smtp.x263.net>
Message-ID:

> hi tim
>
> am fairly sure St Louis in Missouri is MO for state initials

- was hoping to help catch the typo before (??) it goes to the hard copy printers

oops... didn't mean for that to go to the list... my apologies for bothering ya.... ( twice )...

thanx
alvin

From suraj_peri at yahoo.com Sat Apr 6 03:35:45 2002
From: suraj_peri at yahoo.com (Suraj Peri)
Date: Tue Mar 16 01:02:18 2010
Subject: What could be the performance of my cluster
In-Reply-To: <20020405125956.D69845@velocet.ca>
Message-ID: <20020406113545.91938.qmail@web10504.mail.yahoo.com>

Hi group,

I was calculating the performance of my cluster. The features are:

1. 8 nodes
2. Processor: AMD Athlon XP 1800+
3. 8 CPUs
4. 8 x 1.5 GB DDR RAM
5. 1 server with 2 AMD Athlon MP 1800+ processors and 2 GB DDR RAM

I calculated this to be 48 Mflops. Is this correct? If not, what is the correct performance of my cluster?

I also calculated that my cluster would be 3 times faster than an AlphaServer DS20E (833 MHz 64-bit Alpha processor, 4 GB max memory). Is my calculation correct or wrong? Please help me ASAP. Thanks in advance.

cheers
suraj.

=====
PIL/BMB/SDU/DK
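A common first sanity check for a figure like 48 Mflops is the theoretical-peak estimate: clock rate x floating-point operations per cycle x number of CPUs. The C sketch below is a back-of-the-envelope aid, not a benchmark, and its inputs are assumptions to verify against the actual hardware: an Athlon XP/MP 1800+ clock of about 1.533 GHz, 2 double-precision flops per cycle (one x87 add and one multiply issued together), and all 10 CPUs (the 8 nodes plus the 2-way server).

```c
#include <assert.h>

/* Theoretical peak in Gflops: clock (GHz) x flops per cycle x CPU count.
 * This is an upper bound; real codes sustain only a fraction of it. */
static double peak_gflops(double clock_ghz, int flops_per_cycle, int n_cpus)
{
    assert(clock_ghz > 0.0 && flops_per_cycle > 0 && n_cpus > 0);
    return clock_ghz * flops_per_cycle * n_cpus;
}
```

Under those assumptions, `peak_gflops(1.533, 2, 10)` comes to about 30.7 Gflops, several hundred times the 48 Mflops estimate. Sustained results from a real benchmark such as HPL/Linpack will be a fraction of that peak. Applying the same formula with the Alpha's own clock and per-cycle figures gives a like-for-like peak comparison with the DS20E.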
From tony at MPI-SoftTech.Com Sun Apr 7 09:36:53 2002 From: tony at MPI-SoftTech.Com (Tony Skjellum) Date: Tue Mar 16 01:02:18 2010 Subject: Experience with GigE Switches with Jumbo packet support Message-ID: Any Beowulf folks out there have specific experience with switches that allow Jumbo packets? It seems hard to tell from online specs on various company pages whether a switch does this or not? Adapters seem to be readily available...
Any clusters doing this right now? Thanks, Tony From s02.sbecker at wittenberg.edu Sun Apr 7 19:02:40 2002 From: s02.sbecker at wittenberg.edu (Shawn M Becker s02) Date: Tue Mar 16 01:02:18 2010 Subject: Scyld slave node boot problem Message-ID: <5.1.0.14.2.20020407220211.0196d080@mail.wittenberg.edu> I am using version 27bz-8 for the Scyld disk, with kernel version 2.2.19-12.beo. I have a 3c905b card in the slave and a 3c905 in the master. I am getting to the third phase of the boot for the slave to where it outputs the log file. Then the node hangs. Here is the log file for node.0... node_up: Setting system clock. node_up: TODO set interface netmask. node_up: Configuring loopback interface. node_up: Configuring PCI devices. setup_fs: Configuring node filesystems... setup_fs: Using /etc/beowulf/fstab setup_fs: Checking /dev/ram3 (type=ext2)... setup_fs: Hmmm...This appears to be a ramdisk. setup_fs: I'm going to try to try checking the filesystem (fsck) anyway. setup_fs: If it is a RAM disk the following will fail harmlessly. e2fsck 1.20, 25-May-2001 for EXT2 FS 0.5b, 95/08/09 Couldn't find ext2 superblock, trying backup blocks... e2fsck The superblock could not be read or does not describe a correct ext2 filesystem. If the device is valid and it really contains an ext2 filesystem (and not swap or ufs or something else), then the superblock is corrupt, and you might try running e2fsck with an alternate superblock: e2fsck -b 8193 : Bad magic number in super-block while trying to open /dev/ram3 setup_fs: FSCK failure. (OK for RAM disks) setup_fs: Creating ext2 on /dev/ram3... mke2fs 1.20, 25-May-2001 for EXT2 FS 0.5b, 95/08/09 setup_fs: Mounting /dev/ram3 on /rootfs//... (type=ext2; options=defaults) setup_fs: Checking 192.168.1.1:/home (type=nfs)... setup_fs: Mounting 192.168.1.1:/home on /rootfs//home... (type=nfs; options=nolock) mount: 192.168.1.1:/home failed, reason given by server: Permission denied Failed to mount 192.168.1.1:/home on /home. 
Can someone help? Thanks. Shawn ~~~~~~~~~~~~~~~~~~~ Shawn Becker Wittenberg University 930 N. Fountain Springfield, OH 45504 (937) 360-7562 ~~~~~~~~~~~~~~~~~~~ From wheeler.mark at ensco.com Mon Apr 8 04:56:07 2002 From: wheeler.mark at ensco.com (Wheeler.Mark) Date: Tue Mar 16 01:02:18 2010 Subject: PG Compilers Message-ID: <8986151694190742869D08450EE4DCDE0CF298@amu-exch.ensco.win> We are running pgf77 version 3.2-4 on a Linux cluster. For a standard FORTRAN WRITE statement with IOSTAT=IOS, I am getting a value of 5. In the section B.4 (runtime error messages) of the PGI User's Guide, I do not see values less than 201. Does anyone know what this error means? How can I determine what is causing this error? Mark -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20020408/13d77c93/attachment.html From ron_chen_123 at yahoo.com Mon Apr 8 08:01:27 2002 From: ron_chen_123 at yahoo.com (Ron Chen) Date: Tue Mar 16 01:02:19 2010 Subject: FreeBSD port of SGE (Compute farm system) In-Reply-To: <20020405042351.86759.qmail@web14706.mail.yahoo.com> Message-ID: <20020408150127.20897.qmail@web14708.mail.yahoo.com> Patch and output attached. Also, I already found 1 problem -- somewhere in execd. It affects the process' priority in SGEEE mode. However, I've not fixed it yet, I just want to release the current patch ASAP to let people try it out. -Ron --- Ron Chen wrote: > Hi, > > I compiled the source, changed a few parameters, and > SGE finally runs on FreeBSD. It is running in > single- > user mode, with only 1 host. I am doing a little > clean > up, and then I will need to make sure my changes do > not affect others (by "#ifdef BSD"). > > It still does not get the correct system information > yet, but some of the job accounting info is there > (at > least run time is correct 8-) ). > > It is now running for several hours, it looks > stable. It ran several tens of jobs. 
"qstat", > "qhost", "qacct", "qconf", "qdel" look fine, output > makes sense (but need to implement the resource info > collecting routines). > > I will post the patches tomorrow, together with some > output of the commands. (I will be busy today) > > Also, I will move the discussion from the hackers > list > to the cluster@freebsd list. __________________________________________________ Do You Yahoo!? Yahoo! Tax Center - online filing with TurboTax http://taxes.yahoo.com/ -------------- next part -------------- Index: aimk =================================================================== RCS file: /usr/local/tigris/data/helm/cvs/repository/gridengine/source/aimk,v retrieving revision 1.38 diff -u -6 -r1.38 aimk --- aimk 2002/02/22 13:23:59 1.38 +++ aimk 2002/04/05 17:54:10 @@ -1,7 +1,7 @@ -#!/bin/csh -fb +#!/bin/csh # # aimk # #___INFO__MARK_BEGIN__ ########################################################################## # @@ -78,12 +78,18 @@ case "crayts": set BUILDARCH = UNICOS_TS breaksw case "craytsieee": set BUILDARCH = UNICOS_TS_IEEE breaksw +case "darwin": + set BUILDARCH = DARWIN + breaksw +case "freebsd" + set BUILDARCH = FREEBSD + breaksw case "glinux": set BUILDARCH = LINUX6 breaksw case "hp10": set BUILDARCH = HP10 breaksw @@ -872,12 +878,97 @@ set GCC_NOERR_CXXFLAGS = "$CXXFLAGS" endif set SGE_NPROCS_CFLAGS = "$CFLAGS" breaksw + +case DARWIN: + set COMPILE_DC = 1 + if ( $USE_QMAKE == 0 ) then + set MAKE = make + endif + set OFLAG = "-O" + if ( "$CC" != insure ) then + set CC = cc + set CXX = c++ + else + set CFLAGS = "-Wno-error $CFLAGS" + set CXXFLAGS = "-Wno-error $CXXFLAGS" + set LIBS = "$LIBS" + endif + set DEPEND_FLAGS = "$CFLAGS $XMTINCD" + + set LD_LIBRARY_PATH = "/usr/lib" + + if ( $SHAREDLIBS == 1 ) then + set LIBEXT = ".dylib" + else + set LIBEXT = ".a" + endif + + set PTHRDSFLAGS = "-D_REENTRANT -D__USE_REENTRANT" + + if ( $DEBUGGED == 1) then + set DEBUG_FLAG = "-ggdb $INSURE_FLAG" + endif + if ( $GPROFFED == 1) then + set DEBUG_FLAG = 
"$DEBUG_FLAG -pg" + endif + + set ARFLAGS = rcv + set CFLAGS = "$OFLAG -Wall -Werror -D$BUILDARCH $DEBUG_FLAG $CFLAGS" + set CXXFLAGS = "$OFLAG -Werror -Wstrict-prototypes -D$BUILDARCH $DEBUG_FLAG $CXXFLAGS" + set NOERR_CFLAG = "-Wno-error" + set GCC_NOERR_CFLAGS = "$CFLAGS -Wno-error" + set GCC_NOERR_CXXFLAGS = "$CXXFLAGS -Wno-error" + set LFLAGS = "$DEBUG_FLAG $LFLAGS" + set LIBS = "$LIBS" + set RANLIB = "ranlib" + set XMTDEF = "" + set XINCD = "$XMTINCD $XINCD -I/usr/X11R6/include" + set XCFLAGS = "-Wno-strict-prototypes -Wno-error $XMTDEF $XINCD" + set XLIBD = "-L/usr/X11R6/lib" + set XLFLAGS = "$XLIBD" + set XLIBS = "-lXm -lXpm -lXt -lXext -lX11 -lSM -lICE -lXp" + + set SGE_NPROCS_CFLAGS = "$CFLAGS" + + breaksw + +case FREEBSD: + set COMPILE_DC = 1 + set MAKE = make + set OFLAG = "-O" + set ARFLAGS = rcv + if ( "$CC" != insure ) then + set CC = gcc + set CXX = g++ + else + set CFLAGS = "-Wno-error $CFLAGS" + set CXXFLAGS = "-Wno-error $CXXFLAGS" + set LIBS = "$LIBS" + endif + set DEPEND_FLAGS = "$CFLAGS $XMTINCD" + set PTHRDSFLAGS = "-D_REENTRANT -D__USE_REENTRANT" + set CFLAGS = "$OFLAG -Wall -D$BUILDARCH $DEBUG_FLAG $CFLAGS -I/usr/X11R6/include" + set CXXFLAGS = "$OFLAG -Wstrict-prototypes -D$BUILDARCH $DEBUG_FLAG $CXXFLAGS" + set NOERR_CFLAG = "-Wno-error" + set GCC_NOERR_CFLAGS = "$CFLAGS -Wno-error" + set GCC_NOERR_CXXFLAGS = "$CXXFLAGS -Wno-error" + set LFLAGS = "$DEBUG_FLAG $LFLAGS" + set LIBS = "$LIBS" + set XMTDEF = "" + set XINCD = "$XMTINCD $XINCD -I/usr/X11/include" + set XCFLAGS = "-Wno-strict-prototypes -Wno-error $XMTDEF $XINCD" + set XLIBD = "-L/usr/X11R6/lib" + set XLFLAGS = "$XLIBD" + set XLIBS = "-Xlinker -Bstatic -lXm -Xlinker -Bdynamic -lXpm -lXt -lXext -lX11 -lSM -lICE -lXp" + + set SGE_NPROCS_CFLAGS = "$CFLAGS" + breaksw case IRIX6*: set COMPILE_DC = 1 set ARCH = $IRIX_ARCHDEF #if (`hostname` != DWAIN) then # set MAKE = make Index: 3rdparty/sge_depend/Makefile =================================================================== RCS file: 
/usr/local/tigris/data/helm/cvs/repository/gridengine/source/3rdparty/sge_depend/Makefile,v retrieving revision 1.1.1.1 diff -u -6 -r1.1.1.1 Makefile --- 3rdparty/sge_depend/Makefile 2001/07/18 11:06:07 1.1.1.1 +++ 3rdparty/sge_depend/Makefile 2002/04/05 17:54:11 @@ -53,11 +53,14 @@ ifparser.o: $(DEP_DIR)/ifparser.c $(CC) -c $(CFLAGS) $(MAIN_DEFINES) $(DEP_DIR)/ifparser.c cppsetup.o: $(DEP_DIR)/cppsetup.c $(CC) -c $(CFLAGS) $(MAIN_DEFINES) $(DEP_DIR)/cppsetup.c -include.o: $(DEP_DIR)/include.c +include.o: $(DEP_DIR)/include.c + @echo "CFLAGS" : $(CFLAGS) + @echo "MAIN_DEFINES" : $(MAIN_DEFINES) + @echo "DEP_DIR" : $(DEP_DIR) $(CC) -c $(CFLAGS) $(MAIN_DEFINES) $(DEP_DIR)/include.c pr.o: $(DEP_DIR)/pr.c $(CC) -c $(CFLAGS) $(MAIN_DEFINES) $(DEP_DIR)/pr.c Index: daemons/common/pdc.c =================================================================== RCS file: /usr/local/tigris/data/helm/cvs/repository/gridengine/source/daemons/common/pdc.c,v retrieving revision 1.4 diff -u -6 -r1.4 pdc.c --- daemons/common/pdc.c 2002/02/24 13:41:30 1.4 +++ daemons/common/pdc.c 2002/04/05 17:54:11 @@ -114,13 +114,13 @@ #include #include #include #include "sge_stat.h" #endif -#if defined(LINUX) || defined(ALPHA) || defined(IRIX6) || defined(SOLARIS) +#if defined(LINUX) || defined(ALPHA) || defined(IRIX6) || defined(SOLARIS) || defined(FREEBSD) #include "sge_os.h" #endif #if defined(IRIX6) # define F64 "%lld" # define S64 "%lli" @@ -2041,13 +2041,13 @@ static time_t start_time; int psStartCollector(void) { static int initialized = 0; - int ncpus; + int ncpus = 0; #if defined(ALPHA) int start=0; #endif if (initialized) @@ -2069,13 +2069,13 @@ sysdata.sys_length = sizeof(sysdata); /* page size */ pagesize = getpagesize(); /* retrieve static parameters */ -#if defined(LINUX) || defined(ALINUX) || defined(IRIX6) || defined(SOLARIS) +#if defined(LINUX) || defined(ALINUX) || defined(IRIX6) || defined(SOLARIS) || defined(FREEBSD) ncpus = sge_nprocs(); #elif defined(ALPHA) { /* Number of CPUs */ 
    ncpus = sge_nprocs();
 #ifdef PDC_STANDALONE
Index: daemons/common/procfs.c
===================================================================
RCS file: /usr/local/tigris/data/helm/cvs/repository/gridengine/source/daemons/common/procfs.c,v
retrieving revision 1.3
diff -u -6 -r1.3 procfs.c
--- daemons/common/procfs.c 2002/02/24 13:41:30 1.3
+++ daemons/common/procfs.c 2002/04/05 17:54:11
@@ -47,13 +47,15 @@
 #include
 #endif
 #include
 #include
 #include
-#include
+#if 0
+ #include
+#endif
 #include
 #include
 #include
 #include
 #if defined(ALPHA)
Index: daemons/execd/exec_job.c
===================================================================
RCS file: /usr/local/tigris/data/helm/cvs/repository/gridengine/source/daemons/execd/exec_job.c,v
retrieving revision 1.20
diff -u -6 -r1.20 exec_job.c
--- daemons/execd/exec_job.c 2002/02/24 13:41:34 1.20
+++ daemons/execd/exec_job.c 2002/04/05 17:54:12
@@ -408,13 +408,13 @@
 static const char *get_sharedlib_path_name(void) {
 #if defined(AIX4)
    return "LIBPATH";
 #elif defined(HP10) || defined(HP11)
    return "SHLIB_PATH";
-#elif defined(ALPHA) || defined(IRIX6) || defined(IRIX65) || defined(LINUX) || defined(SOLARIS)
+#elif defined(ALPHA) || defined(IRIX6) || defined(IRIX65) || defined(LINUX) || defined(SOLARIS) ||defined(FREEBSD)
    return "LD_LIBRARY_PATH";
 #else
 #error "don't know how to set shared lib path on this architecture"
    return NULL; /* never reached */
 #endif
 }
Index: daemons/execd/ptf.c
===================================================================
RCS file: /usr/local/tigris/data/helm/cvs/repository/gridengine/source/daemons/execd/ptf.c,v
retrieving revision 1.15
diff -u -6 -r1.15 ptf.c
--- daemons/execd/ptf.c 2002/02/24 13:41:35 1.15
+++ daemons/execd/ptf.c 2002/04/05 17:54:12
@@ -272,13 +272,13 @@
 * static osjobid_t - os job id (job id / ash / supplementary gid)
 ******************************************************************************/
 static osjobid_t ptf_get_osjobid(lListElem *osjob)
 {
    osjobid_t osjobid;
-#if !defined(LINUX) && !defined(SOLARIS) && !defined(ALPHA5) && !defined(NECSX4) && !defined(NECSX5)
+#if !defined(LINUX) && !defined(SOLARIS) && !defined(ALPHA5) && !defined(NECSX4) && !defined(NECSX5) && !defined(FREEBSD)
    osjobid = lGetUlong(osjob, JO_OS_job_ID2);
    osjobid = (osjobid << 32) + lGetUlong(osjob, JO_OS_job_ID);
 #else
@@ -302,13 +302,13 @@
 * INPUTS
 * lListElem *osjob - element of type JO_Type
 * osjobid_t osjobid - os job id (job id / ash / supplementary gid)
 ******************************************************************************/
 static void ptf_set_osjobid(lListElem *osjob, osjobid_t osjobid)
 {
-#if !defined(LINUX) && !defined(SOLARIS) && !defined(ALPHA5) && !defined(NECSX4) && !defined(NECSX5)
+#if !defined(LINUX) && !defined(SOLARIS) && !defined(ALPHA5) && !defined(NECSX4) && !defined(NECSX5) && !defined(FREEBSD)
    lSetUlong(osjob, JO_OS_job_ID2, ((u_osjobid_t) osjobid) >> 32);
    lSetUlong(osjob, JO_OS_job_ID, osjobid & 0xffffffff);
 #else
@@ -907,13 +907,13 @@
 {
    lListElem *job, *osjob = NULL;
    lCondition *where;
    DENTER(TOP_LAYER, "ptf_get_job_os");
-#if defined(LINUX) || defined(SOLARIS) || defined(ALPHA5) || defined(NECSX4) || defined(NECSX5)
+#if defined(LINUX) || defined(SOLARIS) || defined(ALPHA5) || defined(NECSX4) || defined(NECSX5) || defined(FREEBSD)
    where = lWhere("%T(%I == %u)", JO_Type, JO_OS_job_ID, (u_long32) os_job_id);
 #else
    where = lWhere("%T(%I == %u && %I == %u)", JO_Type, JO_OS_job_ID, (u_long) (os_job_id & 0xffffffff), JO_OS_job_ID2, (u_long) (((u_osjobid_t) os_job_id) >> 32));
 #endif
Index: daemons/shepherd/setrlimits.c
===================================================================
RCS file: /usr/local/tigris/data/helm/cvs/repository/gridengine/source/daemons/shepherd/setrlimits.c,v
retrieving revision 1.5
diff -u -6 -r1.5 setrlimits.c
--- daemons/shepherd/setrlimits.c 2002/02/24 13:41:43 1.5
+++ daemons/shepherd/setrlimits.c 2002/04/05 17:54:12
@@ -45,14 +45,19 @@
 #endif
 #if defined(HP10_01) || defined(HPCONVEX)
 # define _KERNEL
 #endif
-#include
+#if defined(FREEBSD)
+#include
+#endif
+#if 0
+#include
+#endif
 #if defined(HP10_01) || defined(HPCONVEX)
 # undef _KERNEL
 #endif
 #if defined(IRIX6)
 # define RLIMIT_STRUCT_TAG rlimit64
@@ -403,13 +408,13 @@
    /* hard limit must be greater or equal to soft limit */
    if (rlp->rlim_max < rlp->rlim_cur)
       rlp->rlim_cur = rlp->rlim_max;
 #if defined(LINUX) || ( defined(SOLARIS) && !defined(SOLARIS64) ) || defined(NECSX4) || defined(NECSX5)
 # define limit_fmt "%ld"
-#elif defined(IRIX6) || defined(HP11) || defined(HP10)
+#elif defined(IRIX6) || defined(HP11) || defined(HP10) || defined(FREEBSD)
 # define limit_fmt "%lld"
 #elif defined(ALPHA) || defined(SOLARIS64)
 # define limit_fmt "%lu"
 #else
 # define limit_fmt "%d"
 #endif
Index: dist/util/arch
===================================================================
RCS file: /usr/local/tigris/data/helm/cvs/repository/gridengine/source/dist/util/arch,v
retrieving revision 1.7
diff -u -6 -r1.7 arch
--- dist/util/arch 2002/01/29 14:58:56 1.7
+++ dist/util/arch 2002/04/05 17:54:12
@@ -44,12 +44,32 @@
 #
 PATH=/bin:/usr/bin:/usr/sbin
 ARCH=UNKNOWN
+if [ -x /usr/bin/uname ]; then
+   os="`/usr/bin/uname -s`"
+   ht="`/usr/bin/uname -m`"
+   osht="$os,$ht"
+   case $osht in
+   Darwin,*)
+      ARCH=darwin
+      ;;
+   FreeBSD,*)
+      ARCH=freebsd
+      ;;
+   OpenBSD,*)
+      ARCH=freebsd
+      ;;
+   NetBSD,*)
+      ARCH=freebsd
+      ;;
+   esac
+fi
+
 if [ -x /bin/uname ]; then
    os="`/bin/uname -s`"
    ht="`/bin/uname -m`"
    osht="$os,$ht"
    case $osht in
    SUPER-UX,SX-4*)
Index: libs/comm/commlib.c
===================================================================
RCS file: /usr/local/tigris/data/helm/cvs/repository/gridengine/source/libs/comm/commlib.c,v
retrieving revision 1.7
diff -u -6 -r1.7 commlib.c
--- libs/comm/commlib.c 2002/02/27 08:14:45 1.7
+++ libs/comm/commlib.c 2002/04/05 17:54:13
@@ -2063,12 +2063,14 @@
    sigdelset(&mask, SIGILL);
    sigdelset(&mask, SIGQUIT);
    sigdelset(&mask, SIGURG);
    sigdelset(&mask, SIGIO);
    sigdelset(&mask, SIGSEGV);
    sigdelset(&mask, SIGFPE);
+
+#define SIGCLD SIGCHLD /* Same as SIGCHLD (System V). */
    sigaddset(&mask, SIGCLD);
    sigprocmask(SIG_SETMASK, &mask, NULL);
    return omask;
 }
 #endif
Index: libs/rmon/rmon_semaph.c
===================================================================
RCS file: /usr/local/tigris/data/helm/cvs/repository/gridengine/source/libs/rmon/rmon_semaph.c,v
retrieving revision 1.2
diff -u -6 -r1.2 rmon_semaph.c
--- libs/rmon/rmon_semaph.c 2001/07/20 08:21:38 1.2
+++ libs/rmon/rmon_semaph.c 2002/04/05 17:54:13
@@ -53,13 +53,13 @@
 #include "msg_rmon.h"
 #define BIGCOUNT 10000 /* initial value of process counter */
 /*
  * Define the semaphore operation arrays for the semop() calls.
  */
-#if defined(bsd4_2) || defined(MACH) || defined(__hpux) || defined(_AIX) || defined(SOLARIS) || defined(SINIX) || (defined(LINUX) && defined(_SEM_SEMUN_UNDEFINED))
+#if defined(bsd4_2) || defined(MACH) || defined(__hpux) || defined(_AIX) || defined(SOLARIS) || defined(SINIX) || (defined(LINUX) && defined(_SEM_SEMUN_UNDEFINED))
 union semun {
    int val; /* value for SETVAL */
    struct semid_ds *buf; /* buffer for IPC_STAT & IPC_SET */
    ushort *array; /* array for GETALL & SETALL */
 };
Index: libs/sched/sort_hosts.c
===================================================================
RCS file: /usr/local/tigris/data/helm/cvs/repository/gridengine/source/libs/sched/sort_hosts.c,v
retrieving revision 1.6
diff -u -6 -r1.6 sort_hosts.c
--- libs/sched/sort_hosts.c 2001/12/17 15:09:38 1.6
+++ libs/sched/sort_hosts.c 2002/04/05 17:54:13
@@ -31,16 +31,12 @@
 /*___INFO__MARK_END__*/
 #include
 #include
 #include
 #include
-#ifndef WIN32
-# include
-#endif
-
 #include "sgermon.h"
 #include "sge.h"
 #include "sge_gdi_intern.h"
 #include "cull.h"
 #include "sge_all_listsL.h"
 #include "sge_parse_num_par.h"
Index: libs/uti/sge_arch.c
===================================================================
RCS file: /usr/local/tigris/data/helm/cvs/repository/gridengine/source/libs/uti/sge_arch.c,v
retrieving revision 1.10
diff -u -6 -r1.10 sge_arch.c
--- libs/uti/sge_arch.c 2002/02/24 13:41:51 1.10
+++ libs/uti/sge_arch.c 2002/04/05 17:54:13
@@ -85,20 +85,22 @@
 #elif defined(ALINUX)
 # define ARCHBIN "alinux"
 #elif defined(LINUX5)
 # define ARCHBIN "linux"
 #elif defined(LINUX6)
 # define ARCHBIN "glinux"
+#elif defined(FREEBSD)
+# define ARCHBIN "freebsd"
 #elif defined(SLINUX)
 # define ARCHBIN "slinux"
 #elif defined(CRAY)
 # if defined(CRAYTSIEEE)
 # define ARCHBIN "craytsieee"
-# elif defined(CRAYTS)
+#elif defined(CRAYTS)
 # define ARCHBIN "crayts"
-# else
+#else
 # define ARCHBIN "cray"
 # endif
 #elif defined(NECSX4)
 # define ARCHBIN "necsx4"
 #elif defined(NECSX5)
 # define ARCHBIN "necsx5"
Index: libs/uti/sge_getloadavg.c
===================================================================
RCS file: /usr/local/tigris/data/helm/cvs/repository/gridengine/source/libs/uti/sge_getloadavg.c,v
retrieving revision 1.6
diff -u -6 -r1.6 sge_getloadavg.c
--- libs/uti/sge_getloadavg.c 2002/02/24 13:41:54 1.6
+++ libs/uti/sge_getloadavg.c 2002/04/05 17:54:13
@@ -600,12 +600,56 @@
    }
 #endif
    DEXIT;
    return cpu_load;
 }
+
+#elif defined(FREEBSD)
+
+double get_cpu_load()
+{
+   return 0.0;
+}
+
 #elif defined(LINUX)
 static char* skip_token(
 char *p
 ) {
    while (isspace(*p)) {
@@ -833,12 +877,38 @@
       loadavg[2] /= cpus;
       return 3;
    } else {
       return -1;
    }
 }
+#elif defined(FREEBSD)
+
+static int get_load_avg(
+double loadavg[],
+int nelem
+) {
+
+   return 0;
+
+}
 #elif defined(LINUX)
 static int get_load_avg(
 double loadv[],
 int nelem
@@ -1075,13 +1145,13 @@
 int nelem
 ) {
    int elem = 0;
 #if defined(SOLARIS64)
    elem = getloadavg(loadavg, nelem); /* <== library function */
-#elif (defined(SOLARIS) && !defined(SOLARIS64)) || defined(ALPHA4) || defined(ALPHA5) || defined(IRIX6) || defined(HP10) || defined(HP11) || defined(CRAY) || defined(NECSX4) || defined(NECSX5) || defined(LINUX)
+#elif (defined(SOLARIS) && !defined(SOLARIS64)) || defined(ALPHA4) || defined(ALPHA5) || defined(IRIX6) || defined(HP10) || defined(HP11) || defined(CRAY) || defined(NECSX4) || defined(NECSX5) || defined(LINUX) ||defined(FREEBSD)
    elem = get_load_avg(loadavg, nelem);
 #else
    elem = -1;
 #endif
    if (elem != -1) {
       elem = nelem;
Index: libs/uti/sge_getloadavg.h
===================================================================
RCS file: /usr/local/tigris/data/helm/cvs/repository/gridengine/source/libs/uti/sge_getloadavg.h,v
retrieving revision 1.3
diff -u -6 -r1.3 sge_getloadavg.h
--- libs/uti/sge_getloadavg.h 2001/10/20 14:47:28 1.3
+++ libs/uti/sge_getloadavg.h 2002/04/05 17:54:13
@@ -29,17 +29,17 @@
  *
  * All Rights Reserved.
  *
  ************************************************************************/
 /*___INFO__MARK_END__*/
-#if defined(LINUX) || defined(SOLARIS) || defined(SOLARIS64) || defined(CRAY) || defined(NEXSX4) || defined(NECSX5) || defined(ALPHA4) || defined(ALPHA5) || defined(IRIX6)
-# define SGE_LOADAVG
+#if defined(LINUX) || defined(SOLARIS) || defined(SOLARIS64) || defined(CRAY) || defined(NEXSX4) || defined(NECSX5) || defined(ALPHA4) || defined(ALPHA5) || defined(IRIX6) || defined(FREEBSD)
+#define SGE_LOADAVG
 #endif
-#if defined(LINUX) || defined(SOLARIS) || defined(SOLARIS64) || defined(ALPHA4) || defined(ALPHA5) || defined(IRIX6) || defined(HP10) || defined(HP11)
+#if defined(LINUX) || defined(SOLARIS) || defined(SOLARIS64) || defined(ALPHA4) || defined(ALPHA5) || defined(IRIX6) || defined(HP10) || defined(HP11) || defined(FREEBSD)
 # define SGE_LOADCPU
 #endif
 #ifdef SGE_LOADAVG
 int sge_getloadavg(double loadavg[], int nelem);
Index: scripts/distinst
===================================================================
RCS file: /usr/local/tigris/data/helm/cvs/repository/gridengine/source/scripts/distinst,v
retrieving revision 1.21
diff -u -6 -r1.21 distinst
--- scripts/distinst 2002/01/28 08:57:01 1.21
+++ scripts/distinst 2002/04/05 17:54:13
@@ -52,23 +52,23 @@
 HASDIR="ckpt doc examples/jobs locale mpi pvm qmon/PIXMAPS/big qmon/locale"
 HASARCHDIR="bin lib examples/jobsbin utilbin"
 DEFAULTPROG="sge_qmaster sge_execd sge_shadowd sge_commd sge_schedd \
    sge_shepherd sge_coshepherd qstat qsub qalter qconf qdel \
-   qacct qmod qsh commdcntl utilbin jobs qmon qhold qrls qhost \
-   qmake qtcsh"
+   qacct qmod qsh commdcntl utilbin jobs qhold qrls qhost \
+   "
 UTILITYBINARIES="uidgid gethostname gethostbyname gethostbyaddr \
    getservbyname filestat checkprog loadcheck now checkuser \
    adminrun qrsh_starter testsuidroot openssl"
 REMOTEBINARIES="rsh rshd rlogin"
 SUPPORTEDARCHS="aix42 aix43 alinux cray crayts craytsieee glinux hp10 \
-hp11 irix6 necsx4 necsx5 slinux solaris solaris64 solaris86 osf4 tru64"
+hp11 irix6 necsx4 necsx5 slinux solaris solaris64 solaris86 osf4 tru64 freebsd"
 #SGEEE_UTILITYBINARIES="sge_share_mon sge_host_mon"
 SGEEE_UTILITYBINARIES="sge_share_mon"
 JOBBINARIES="work"
@@ -161,12 +161,14 @@
       elif [ $i = hp11 ]; then
          ARCHBIN=HP11
       elif [ $i = irix6 ]; then
          ARCHBIN=IRIX6
       elif [ $i = glinux ]; then
          ARCHBIN=LINUX6
+      elif [ $i = freebsd ]; then
+         ARCHBIN=FREEBSD
       elif [ $i = alinux ]; then
          ARCHBIN=ALINUX
       elif [ $i = slinux ]; then
          ARCHBIN=SLINUX
       elif [ $i = osf4 ]; then
          ARCHBIN=ALPHA4
@@ -655,143 +657,13 @@
    if [ $instexamples = true ]; then
       echo Installing \"examples/jobs\"
       Execute rm -f $DEST_SGE_ROOT/examples/jobs/*
       Execute cp dist/examples/jobs/*.sh $DEST_SGE_ROOT/examples/jobs
    fi
-   if [ $instqmon = true ]; then
-      echo Copying Pixmaps and Qmon resource file
-
-      Execute rm -f $DEST_SGE_ROOT/qmon/PIXMAPS/*.xpm
-      Execute rm -f $DEST_SGE_ROOT/qmon/PIXMAPS/big/*.xpm
-      Execute cp dist/qmon/PIXMAPS/small/*.xpm $DEST_SGE_ROOT/qmon/PIXMAPS
-      Execute cp dist/qmon/PIXMAPS/big/toolbar*.xpm $DEST_SGE_ROOT/qmon/PIXMAPS/big
-
-      Execute chmod 644 $DEST_SGE_ROOT/qmon/PIXMAPS/*.xpm
-      Execute chmod 644 $DEST_SGE_ROOT/qmon/PIXMAPS/big/*.xpm
-
-      Execute cp dist/qmon/Qmon $DEST_SGE_ROOT/qmon/Qmon
-      Execute chmod 644 $DEST_SGE_ROOT/qmon/Qmon
-
-      Execute cp dist/qmon/qmon_help.ad $DEST_SGE_ROOT/qmon
-      Execute chmod 644 $DEST_SGE_ROOT/qmon/qmon_help.ad
-
-      ( echo changing to $DEST_SGE_ROOT/qmon/PIXMAPS ; \
-        cd $DEST_SGE_ROOT/qmon/PIXMAPS; \
-        echo ln -s intro-sge.xpm intro.xpm; \
-        ln -s intro-sge.xpm intro.xpm; \
-        echo ln -s logo-sge.xpm logo.xpm; \
-        ln -s logo-sge.xpm logo.xpm \
-      )
-   fi
-
-   if [ $instpvm = true ]; then
-      echo Installing \"pvm\"
-      Execute rm -rf $DEST_SGE_ROOT/pvm/*
-      Execute mkdir $DEST_SGE_ROOT/pvm/src
-
-      for f in $PVMSCRIPTS; do
-         Execute cp dist/pvm/$f $DEST_SGE_ROOT/pvm
-      done
-      chmod 755 $DEST_SGE_ROOT/pvm/*.sh
-
-      for f in $PVMSOURCES; do
-         Execute cp dist/pvm/src/$f $DEST_SGE_ROOT/pvm/src
-      done
-
-      for f in $PVMSRCSCRIPTS; do
-         Execute cp dist/pvm/src/$f $DEST_SGE_ROOT/pvm/src
-         chmod 755 $DEST_SGE_ROOT/pvm/src/$f
-      done
-   fi
-
-   if [ $instmpi = true ]; then
-      echo Installing \"mpi/\"
-      rm -rf $DEST_SGE_ROOT/mpi/*
-      for f in $MPIFILES; do
-         Execute cp dist/mpi/$f $DEST_SGE_ROOT/mpi
-      done
-      chmod 755 $DEST_SGE_ROOT/mpi/*.sh $DEST_SGE_ROOT/mpi/hostname $DEST_SGE_ROOT/mpi/rsh
-
-      HPCBASE=mpi/sunhpc/loose-integration
-      Execute mkdir -p $DEST_SGE_ROOT/$HPCBASE/accounting
-
-      for f in $SUNHPC_FILES; do
-         Execute cp dist/$HPCBASE/$f $DEST_SGE_ROOT/$HPCBASE
-         Execute chmod 644 $DEST_SGE_ROOT/$HPCBASE/$f
-      done
-
-      for f in $SUNHPC_SCRIPTS; do
-         Execute cp dist/$HPCBASE/$f $DEST_SGE_ROOT/$HPCBASE
-         Execute chmod 755 $DEST_SGE_ROOT/$HPCBASE/$f
-      done
-
-      for f in $SUNHPCACCT_FILES; do
-         Execute cp dist/$HPCBASE/accounting/$f $DEST_SGE_ROOT/$HPCBASE/accounting
-         Execute chmod 644 $DEST_SGE_ROOT/$HPCBASE/accounting/$f
-      done
-
-      for f in $SUNHPCACCT_SCRIPTS; do
-         Execute cp dist/$HPCBASE/accounting/$f $DEST_SGE_ROOT/$HPCBASE/accounting
-         Execute chmod 755 $DEST_SGE_ROOT/$HPCBASE/accounting/$f
-      done
-   fi
-
-   if [ $instman = true ]; then
-      echo Installing \"man/\" and \"catman/\"
-      Execute rm -rf $DEST_SGE_ROOT/man $DEST_SGE_ROOT/catman
-      Execute cp -r MANSBUILD_$SGE_PRODUCT_MODE/SEDMAN/man $DEST_SGE_ROOT
-      Execute cp -r MANSBUILD_$SGE_PRODUCT_MODE/ASCMAN/catman $DEST_SGE_ROOT
-   fi
-
-   if [ $instdoc = true ]; then
-      echo Installing \"doc/\"
-      echo " --> PS and PDF files"
-      Execute rm -rf $DEST_SGE_ROOT/doc
-      Execute mkdir $DEST_SGE_ROOT/doc
-      Execute cp $MANUALPDF $DEST_SGE_ROOT/doc/SGE53beta2_doc.pdf
-   fi
-   # this rule must come *after* the "instdoc" rule
-   #
-   if [ $insttxtdoc = true ]; then
-      echo "Installing README, INSTALL ... files"
-      Execute cp ../doc/*.asc $DEST_SGE_ROOT/doc
-      Execute cp ../doc/INSTALL $DEST_SGE_ROOT/doc
-      Execute cp ../doc/UPGRADE-2-53 $DEST_SGE_ROOT/doc/UPGRADE
-      Execute chmod 644 $DEST_SGE_ROOT/doc/*
-   fi
-
-   if [ $instckpt = true ]; then
-      echo Installing \"ckpt/\"
-      Execute rm -rf $DEST_SGE_ROOT/ckpt/*
-      cp dist/ckpt/* $DEST_SGE_ROOT/ckpt
-      chmod 755 $DEST_SGE_ROOT/ckpt/*_command
-   fi
-
-   if [ $instlocale = true ]; then
-      echo "Installing \"locale/\" and \"qmon/locale/\""
-      Execute cp -r locale/* $DEST_SGE_ROOT/locale
-      Execute rm -rf $DEST_SGE_ROOT/qmon/locale/*
-      Execute cp -r dist/qmon/locale/* $DEST_SGE_ROOT/qmon/locale
-   fi
-
-   if [ $instsec = true ]; then
-      echo Installing \"security\" modules
-      Execute mkdir -p $DEST_SGE_ROOT/security
-      for f in $SECFILES; do
-         Execute cp $f $DEST_SGE_ROOT/security
-         fb=`basename $f`
-         if [ -x $DEST_SGE_ROOT/security/$fb ]; then
-            chmod 755 $DEST_SGE_ROOT/security/$fb
-         else
-            chmod 644 $DEST_SGE_ROOT/security/$fb
-         fi
-      done
-      Execute ln -s gss_customer.html $DEST_SGE_ROOT/security/README.html
-   fi
    # Set file and directory permissions to 755/644 and owner to 0.0
    if [ $setfileperm = true ]; then
       echo Setting file permissions
       SetFilePerm $DEST_SGE_ROOT
    fi
@@ -820,13 +692,13 @@
       echo "Installing binaries for $i from `pwd` -->"
       echo " --> $DEST_SGE_ROOT/bin/$i"
       echo ------------------------------------------------------------------------
       for prog in $PROG; do
          case $prog in
-         jobs|ckpt|locale|doc|inst_sge|utiltree|examples|man|mpi|pvm|qmontree|common|distcommon|utilbin)
+         jobs|ckpt|locale|doc|inst_sge|utiltree|examples|man|mpi|pvm|common|distcommon|utilbin)
             :
             ;;
          qmake)
             echo Installing qmake
             Install 0.0 755 ../3rdparty/qmake/$ARCHBIN/make $DEST_SGE_ROOT/${UTILPREFIX}/$DSTARCH/qmake
             ;;
-------------- next part --------------
> qstat
> qhost
HOSTNAME  ARCH     NPROC  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
--------------------------------------------------------------------------------
global    -        -      -     -       -       -       -
host1     freebsd  1      0.00  -       -       -       -
> cat s
#!/bin/sh
sleep 10
echo "Hello"
exit 2
> qsub s
your job 11 ("s") has been submitted
> qstat
job-ID  prior  name  user  state  submit/start at      queue  master  ja-task-ID
---------------------------------------------------------------------------------------------
11      0      s     ron   qw     04/05/2002 12:04:07
> cat s.o11
Hello
> qacct -j 11
==============================================================
qname        host1.q
hostname     host1
group        UNKNOWN
owner        ron
jobname      s
jobnumber    11
taskid       undefined
account      sge
priority     0
qsub_time    Fri Apr 5 12:04:07 2002
start_time   Fri Apr 5 12:04:10 2002
end_time     Fri Apr 5 12:04:20 2002
granted_pe   none
slots        1
failed       0
exit_status  2
ru_wallclock 10
ru_utime     0
ru_stime     0
ru_maxrss    916
ru_ixrss     808
ru_ismrss    0
ru_idrss     488
ru_isrss     256
ru_minflt    361
ru_majflt    0
ru_nswap     0
ru_inblock   0
ru_oublock   1
ru_msgsnd    17
ru_msgrcv    17
ru_nsignals  5
ru_nvcsw     29
ru_nivcsw    5
cpu          0
mem          0.000
io           0.000
iow          0.000
maxvmem      0.000000
From christon at pluto.dsu.edu Tue Apr 9 07:40:15 2002 From: christon at pluto.dsu.edu (Christoffersen, Neils) Date: Tue Mar 16 01:02:19 2010 Subject: Beowulf -- confirmation of subscription -- request 851822 Message-ID: <0718ABB23368D2119FC200008362AF6816BDD7@pluto.dsu.edu> -----Original Message----- From: beowulf-request@beowulf.org To: christon@pluto.dsu.edu Sent: 4/9/02 9:35 AM Subject: Beowulf -- confirmation of subscription -- request 851822 Beowulf -- confirmation of subscription -- request 851822 We have received a request from 138.247.172.98 for subscription of your email address, , to the beowulf@beowulf.org mailing list.
To confirm the request, please send a message to beowulf-request@beowulf.org, and either: - maintain the subject line as is (the reply's additional "Re:" is ok), - or include the following line - and only the following line - in the message body: confirm 851822 (Simply sending a 'reply' to this message should work from most email interfaces, since that usually leaves the subject line in the right form.) If you do not wish to subscribe to this list, please simply disregard this message. Send questions to beowulf-admin@beowulf.org. From eugen at leitl.org Wed Apr 10 03:52:30 2002 From: eugen at leitl.org (Eugen Leitl) Date: Tue Mar 16 01:02:19 2010 Subject: CCL:parallel quantum solutions (fwd) Message-ID: ---------- Forwarded message ---------- Date: Tue, 09 Apr 2002 10:07:54 -0400 From: David J Giesen To: Dr. Bill Davis , CHEMISTRY@ccl.net Subject: CCL:parallel quantum solutions Bill - This is a long post, but I think vendors that do excellent jobs should get pats on the back. I am not a PQS employee nor do I receive new-customer kick-backs. Those not interested in a review of PQS products can hit delete now. I've been a very satisfied PQS customer since Aug. 2000. We purchased an 8-processor Linux cluster at that point, and several months later we were so pleased we bought a second. At the end of last year, we contracted with them to build a 34-processor cluster for general computational (not just chemistry) use. Hardware performance : The setup they use works well for running serial or parallel codes, and I have PQS, Jaguar and Gaussian (using LINDA) running in parallel on them. Based on timings against other machines/platforms, the PQS machines perform as well as could be expected. Our 1.2 GHz Athlon PQS machine runs G98 slightly (~10%) faster than the latest-and-greatest-just-off-the-design-sheet Sun hardware, and 2-3 times faster than our SGI 194 MHz R10000. It is ~10% slower than a 1.5 GHz P4. Both the Athlon and P4 machines used PIII-optimized BLAS libraries...
Software performance : the PQS software 'is what it is'. If you are mainly interested in HF, MP2 and DFT computations, it is very good. You can see its capabilities on their website. Speed-wise, it runs faster than other codes I use, although it is not faster than Jaguar's pseudo-spectral methods. The geometry optimizer is rock-solid dependable, as one would expect from code by Pulay and Baker. PQS uses PVM for parallel execution. Without getting into a debate about parallel paradigms, I'll say simply this: in our hands, I have never had a PVM job die because of inter-process communication problems, while MPICH/MPI is very flaky and tends to die on about 10-25% of chemistry jobs (even more on systems using automount), whether on Linux, Sun or SGI. Because PQS uses PVM to set up the parallel system only once per job, there is less parallel overhead using PQS than with other codes that set up LINDA parallel systems at every SCF and geometry optimization step -- although for large jobs, both overheads essentially go to zero. Support : PQS is a small company, and the support shows it. They have absolutely bent over backwards each time we have had an issue, and dealing with them is always a pleasure. We have not had a hardware or PQS software issue that they haven't resolved to our satisfaction. In fairness, you can't expect a 24-hour help line or technicians in suits to fly in and fix problems. You should be aware that they are not in the business of selling/supporting Linux or GNU software, so some problems you have on your machine if you veer off the PQS path might be technically out of their scope. However, in my experience, they make every honest effort to solve those as well (and usually do). Every machine they shipped us had been stress-tested by an expert for a number of days before delivery. Ease of use : The machines come set up to run the PQS chemistry code out of the box.
If you are planning on running one PQS parallel job at a time across the whole cluster or multiple serial jobs, the included DQS (not associated with PQS) queuing system works OK. Running multiple parallel codes/jobs on the same cluster through the queue does not work well. Running other codes in parallel through the queue takes some hard work. Setting up other parallel codes also takes some work. This is not really a function of PQS, however, and you'll find this is true no matter what machine you get. Disclaimer : This e-mail does not in any way imply an 'official Kodak' stance, it is merely the personal opinion of a Kodak employee who uses PQS products at work. Dave Dr. Bill Davis wrote: > > Hi! > > Does anyone have any experience with the PQS hardware/software > combination, more specifically the QS4-1800S? Any comments on ease of > use, support and any other important points would be greatly > appreciated...Thanks! > > Bill > > > -- > ********************************** > Dr. William M. Davis > Assistant Professor of Chemistry/ > Phi Theta Kappa Advisor > Dept. of Chemistry and Environmental Science > University of Texas at Brownsville > 80 Fort Brown > Brownsville, TX 78520 > Phone: (956) 574-6646 > Fax: (956) 574-6692 > WWW: unix.utb.edu/~bdavis > ********************************** > > -- Dr. David J. Giesen Eastman Kodak Company david.giesen@kodak.com 2/83/RL MC 02216 (ph) 1-585-58(8-0480) Rochester, NY 14650 (fax)1-585-588-1839 -= This is automatically added to each message by mailing script =- CHEMISTRY@ccl.net -- To Everybody | CHEMISTRY-REQUEST@ccl.net -- To Admins MAILSERV@ccl.net -- HELP CHEMISTRY or HELP SEARCH CHEMISTRY-SEARCH@ccl.net -- archive search | Gopher: gopher.ccl.net 70 Ftp: ftp.ccl.net | WWW: http://www.ccl.net/chemistry/ | Jan: jkl@osc.edu From fraser5 at cox.net Wed Apr 10 05:24:05 2002 From: fraser5 at cox.net (Jim Fraser) Date: Tue Mar 16 01:02:19 2010 Subject: MPICH: works for users not root? 
Message-ID: <003801c1e08a$9b6d8840$0300005a@papabear> I have had this pesky problem of running MPI using the bash shell and trying to figure out how to get it to work for root. It works fine for all the users but not root. As root I can rsh to any node OK, but if I do a rsh node2 -n true then I get "permission denied". Again, it works for normal users. I have gutted the .bashrc and /etc/bashrc scripts and the .rhosts seem OK. What could be the problem? (Linux 7.2) thanks jim From rastapoppolous at yahoo.com Wed Apr 10 22:29:24 2002 From: rastapoppolous at yahoo.com (k r) Date: Tue Mar 16 01:02:19 2010 Subject: Scyld and mpi fasta Makefile Problems Message-ID: <20020411052924.33098.qmail@web9009.mail.yahoo.com> hello all, I can't seem to get the included Makefile (Makefile.mpi4) for FASTA to compile. When I compile, I get the following error: mm_file.h:25: conflicting types for `int64_t' types.h:172: previous declaration of `int64_t' I did not make any changes to the Makefile. Any help is appreciated. Thanks, Kart __________________________________________________ Do You Yahoo!? Yahoo! Tax Center - online filing with TurboTax http://taxes.yahoo.com/ From siegert at sfu.ca Wed Apr 10 22:43:28 2002 From: siegert at sfu.ca (Martin Siegert) Date: Tue Mar 16 01:02:19 2010 Subject: MPICH: works for users not root? In-Reply-To: <003801c1e08a$9b6d8840$0300005a@papabear>; from fraser5@cox.net on Wed, Apr 10, 2002 at 08:24:05AM -0400 References: <003801c1e08a$9b6d8840$0300005a@papabear> Message-ID: <20020410224328.A19551@stikine.ucs.sfu.ca> On Wed, Apr 10, 2002 at 08:24:05AM -0400, Jim Fraser wrote: > I have had this pesky problem of running mpi using the bash shell and trying > to figure out how to get it to work for root. It works fine for all the > users but not root. As root I can rsh to any node ok but if I do a rsh > node2 -n true then I get a permission denied. Again, it works for normal > users.
I have gutted the .bashrc and /etc/bashrc scripts and the .rhosts > seem ok. what could be the problem? (Linux 7.2) > > thanks > jim The least that you need is a line "rsh" in /etc/securetty. Hope this helps. Cheers, Martin ======================================================================== Martin Siegert Academic Computing Services phone: (604) 291-4691 Simon Fraser University fax: (604) 291-4242 Burnaby, British Columbia email: siegert@sfu.ca Canada V5A 1S6 ======================================================================== From rastapoppolous at yahoo.com Wed Apr 10 22:44:14 2002 From: rastapoppolous at yahoo.com (k r) Date: Tue Mar 16 01:02:19 2010 Subject: mpi fasta Makefile Problems Message-ID: <20020411054414.34011.qmail@web9009.mail.yahoo.com> hello all, I can't seem to get the included Makefile (Makefile.mpi4) for FASTA to compile on a Beowulf cluster. When I compile, I get the following error: mm_file.h:25: conflicting types for `int64_t' types.h:172: previous declaration of `int64_t' I did not make any changes to the Makefile. Any help is appreciated. Thanks, Kart From math at velocet.ca Wed Apr 10 23:07:43 2002 From: math at velocet.ca (Velocet) Date: Tue Mar 16 01:02:19 2010 Subject: Linux Software RAID5 Performance In-Reply-To: ; from mikeprinkey@hotmail.com on Wed, Apr 03, 2002 at 02:49:00PM -0500 References: Message-ID: <20020411020739.W19272@velocet.ca> On Wed, Apr 03, 2002 at 02:49:00PM -0500, Michael Prinkey's all... > Indeed, the multiple processes accessing the device made significantly > degrade performance. Fortunately for us, as well, access speed is limited > by the NFS/SMB and the network, not by array performance. Unfortunately, > the unit is online now and I can't fiddle around with the settings and test > it further.
> > WRT reliability, we have seen the array drop to degraded mode because of a > single drive failure. We have also seen a single drive take down the entire IDE > port. This results in the md device disappearing until you swap out the > offending drive and restart the array. There is no data loss here. Usually one > drive goes and the array goes into degraded mode and starts reconstructing > on the spare. Then the second goes and the array disappears. It is a bit > disconcerting to do ls /raid and get nothing back. Changing out the drive > and restarting pulls everything back. > > I can honestly say that the only data loss that I have had on these arrays > came when a maintenance person completely unplugged one of the arrays from > the UPS. It caused low-level corruption on 5 of the 9 drives in the array. > We ended up using a Windows 98 boot floppy with Maxtor's Powermax utility to > patch them all back up. It took many hours. This is the WORST possible > scenario, BTW. Even resetting the system gives the EIDE devices a chance to > flush their caches and maintain low-level integrity. Cutting the power can > leave the array/drives inconsistent on the filesystem, device (/dev/md0), > and hardware-format datagram levels. So, lock your arrays in a cabinet! 8) OK, so get an EIDE RAID controller with battery-backed RAM onboard. We pulled a bunch of the SCSI equivalent of these from a Netfinity server a customer pawned off on us. Rather nice. (anyone want to buy?
:) /kc > > Mike > > >From: Jurgen Botz > >To: mprinkey@aeolusresearch.com (Michael Prinkey) > >CC: beowulf@beowulf.org > >Subject: Re: Linux Software RAID5 Performance > >Date: Wed, 03 Apr 2002 10:25:31 -0800 > > > >Michael Prinkey wrote: > > > Again, performance (see below) is remarkably good, especially > >considering > > > all of the strikes against this configuration: EIDE instead of SCSI, > >UDMA66 > > > instead of 100/133, 5400-RPM instead of 7200-RPM, and master/slave > >drives on > > > each port instead of a single drive per port. > > > >With regard to the master/slave config... I note that your performance > >test is a single reader/writer... in this config with RAID5 I would > >expect the performance to be quite good even with 2 drives per IDE > >controller. But if you have several processes doing disk I/O > >simultaneously you should see a rather more precipitous drop in > >performance than you would with a single drive per IDE controller. > >I'm working on testing a very similar config right now and that's > >one of my findings (which I had expected) but our application for this > >is not very performance sensitive so it's not a big deal. > > > >A more important issue for me is reliability, and I'm somewhat > >concerned about failure modes. For example, can an IDE drive fail > >in such a way that it will disable the controller or the other > >drive on the same controller? If so, that would seriously limit > >the usefulness of RAID5 in this config. In general how good is > >Linux software RAID's failure handling? Etc. > > > >:j > > > > > >-- > >Jürgen Botz | While differing widely in the various > >jurgen@botz.org | little bits we know, in our infinite > > | ignorance we are all equal.
-Karl > >Popper -- Ken Chase, math@velocet.ca * Velocet Communications Inc. * Toronto, CANADA From ron_chen_123 at yahoo.com Thu Apr 11 00:53:52 2002 From: ron_chen_123 at yahoo.com (Ron Chen) Date: Tue Mar 16 01:02:19 2010 Subject: Lost cycles due to PBS (was Re: Uptime data/studies/anecdotes) In-Reply-To: <20020402140934.A29446@getafix.EraGen.com> Message-ID: <20020411075352.26837.qmail@web14703.mail.yahoo.com> --- Chris Black wrote: > On Tue, Apr 02, 2002 at 12:46:07PM -0600, Roger L. > Smith wrote: > > On Tue, 2 Apr 2002, Richard Walsh wrote: > [stuff deleted] > > PBS is our leading cause of cycle loss. We now > run a cron job on the > > headnode that checks every 15 minutes to see if > the PBS daemons have died, > > and if so, it automatically restarts them. About > 75% of the time that I > > have a node fail to accept jobs, it is because its > pbs_mom has died, not > > because there is anything wrong with the node. > > > > We used to have the same problem with PBS, > especially when many jobs were > in the queue. At that point sometimes the pbs master > died as well. > Since we've switched to SGE/GridEngine/CODINE I've > been MUCH happier. > Plus there are lots of nifty things you can do with > the expandability of > writing your own load monitors via shell scripts and > such.
> The whole point of this post is: > GNQS < PBS < Sun Gridengine :) > > Chris (who tried two other batch schedulers until > settling on SGE) > I have had a similar experience -- I tried PBS; it is hard to install, and there are not many scheduling policies -- but it is hard to configure. Then I read the news about SGE, and since it does not require root access to install/run, I gave it a try. I did an experiment a few weeks ago -- submitting over 30,000 "sleep jobs" to SGE, and it did not die! If the master host is down, another machine takes over, so there is no loss of computing power. I think SGE 5.3 is better than anything available. I tried commercial DRM systems and other open source packages, but so far SGE is by far the best. BTW, Chris, how many nodes are there in your cluster? -Ron P.S. I'm doing a port of SGE to FreeBSD; I hope people find it useful. From keithu at parl.clemson.edu Thu Apr 11 05:11:23 2002 From: keithu at parl.clemson.edu (Keith Underwood) Date: Tue Mar 16 01:02:19 2010 Subject: Experience with GigE Switches with Jumbo packet support In-Reply-To: Message-ID: Most of the Extreme switches support Jumbo packets. The new line of products from Foundry Networks (JetCore is what I think they call it) is supposed to support Jumbo packets (even for Fast Ethernet from the way I read the spec, if you could find a card to do it). Keith On Sun, 7 Apr 2002, Tony Skjellum wrote: > Any Beowulf folks out there have specific experience with switches that allow > Jumbo packets? It seems hard to tell from online specs on various company > pages whether a switch does this or not? > > Adapters seem to be readily available... > > Any clusters doing this right now?
>
> Thanks,
> Tony

---------------------------------------------------------------------------
Keith Underwood                Parallel Architecture Research Lab (PARL)
keithu@parl.clemson.edu                              Clemson University

From wonglijie at yahoo.com Thu Apr 11 06:00:08 2002
From: wonglijie at yahoo.com (Li Jie)
Date: Tue Mar 16 01:02:19 2010
Subject: (no subject)
Message-ID: <20020411130008.63691.qmail@web9608.mail.yahoo.com>

Hi,

May I know if anyone here can provide detailed information on how to start a Beowulf? I have about 9 machines with Pentium MMX 233 MHz processors, 128 MB RAM, and 2 x 1.99 GB HDDs. I am also considering various designs; this is a school project. Thanks for your help!

lijie
[2002] 7540832
[2 Cor. 5:7] We live by faith and not by sight.

From rgb at phy.duke.edu Thu Apr 11 06:18:39 2002
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Tue Mar 16 01:02:19 2010
Subject: DHCP Help
In-Reply-To:
Message-ID:

On Thu, 4 Apr 2002, Adrian Garcia Garcia wrote:

> Ok, I understand most of the tips, but I have some doubts about the domain
> name. I used the domain name "cluster.org" because every document
> about DHCP had a domain name in the configuration, so...
> Is it necessary to have a domain name server (like BIND) working together
> with the dhcp server?

We're getting to where I don't know the answers -- just try it with and without.
At a guess, the answer is no: you don't need a domain name, and if you use one it can likely be made up. Mine always have been, and IIRC I've used names that didn't correspond to anything in hosts and didn't even have an approved ending. If you do make one up, I'd suggest you stay away from any name (unfortunately, like cluster.org or cluster.net) that MIGHT be registered in nameservice, so you avoid any possibility of name resolution confusion in the future.

You definitely don't need a nameserver -- my hosts are all on a private internal network anyway and not in nameservice. If you want them to resolve by name, you have to ensure that they are resolvable by one of the methods listed for hosts in /etc/nsswitch.conf, and the library calls will take care of the rest.

> One more thing...
>
> I don't have Internet in my LAN, and I don't know if the domain name is
> necessary.
>
> Thanks a lot. I'm a newbie and my English is not good =)

Probably not. It depends on what services you want to run elsewhere. Mail servers/clients will likely get unhappy without some sort of domain name defined, and maybe a few other things like this. It is also possible that some distribution-installed tools (which assume in their preconfiguration that they are on an open LAN) will bitch or break if no domain name is defined -- I've not tried it, so I can't tell you. /etc/hosts-based name resolution per se couldn't care less.

Domain names are used primarily for routing or domain administration. The correspondence between a domain name and a subnet block (or a union of subnet blocks) is often useful for both. If you have a private network, no routing except between hosts on the same wire/switch, and no need to differentiate subnet blocks for administrative purposes, you can probably live without one. If you think that there is any reasonable chance that your cluster might one day end up on a public network, it is reasonable to define one anyway.
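A crude pre-check for the advice above, before adopting a made-up internal domain, is to see whether the name already resolves in public DNS. The minimal Python sketch below assumes the machine running it has a working resolver; the function names are hypothetical. A name that fails to resolve may still be registered (a registered domain need not carry an address record), so this only catches the obvious collisions and is no substitute for a registry lookup.

```python
import socket


def name_resolves(name, resolve=socket.getaddrinfo):
    """Return True if `name` currently resolves to at least one address.

    `resolve` defaults to socket.getaddrinfo; it is a parameter only so the
    check can be exercised without network access.
    """
    try:
        resolve(name, None)
    except socket.gaierror:
        return False  # the resolver found no address for this name
    return True


def first_unused(candidates, resolve=socket.getaddrinfo):
    """Return the first candidate domain name that does not resolve, or None."""
    for name in candidates:
        if not name_resolves(name, resolve):
            return name
    return None
```

For example, `first_unused(["cluster.org", "cluster.net", "rgb.private.net"])` returns the first of those names that currently has no address records, or None if they all resolve.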
If any installed tools complain because there isn't one, it is certainly harmless enough to define one. I generally do, out of sheer habit and inertia, even within my private LAN at home.

rgb

>
> Adrián.
>

--
Robert G. Brown                        http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb@phy.duke.edu

From emiller at techskills.com Thu Apr 11 07:08:16 2002
From: emiller at techskills.com (Eric Miller)
Date: Tue Mar 16 01:02:19 2010
Subject: (no subject)
In-Reply-To: <20020411130008.63691.qmail@web9608.mail.yahoo.com>
Message-ID:

>hi
>may i know if anyone here can provide detailed information on how to start a beowulf?
>i have about 9 machines with Pentium MMX 233 MHz processors, 128 MB RAM, 2 x 1.99 GB HDD
>I am also considering various designs and this is a school project. Thanks for your help!

Li, this group is not very "newbie friendly" when you ask for detailed information; I am working on a similar project and have only gotten spotty assistance. I will tell you that the Scyld system is by far the easiest to set up, and it works well. If you are familiar with Linux, you should have no problems getting a Scyld Beowulf up and running. Just be sure to read the docs first: they explain the NIC requirements on the master and other important physical setup issues. After you get the network built, it is really a well-built distribution, with a GUI and all. See www.scyld.com.

I have tried others, but Scyld is far and away the best, with the most community support.
After you get the cluster up and running, that's where the help seems to drift off. Most of the people in this group are upper-level users who know how to get these MPI-enabled programs to run on their clusters. If you are like me, these topics are a little foreign. If you are looking for something to run continuously, like a display, they say the Mandelbrot renderer has a loop function, but I can't get it to work. Someone suggested SETI many months ago, which would be perfect, but SETI does not offer an MPI-enabled program.

Maybe you and I can work together: I'll help you get your cluster up and running, then together we can rattle our swords for some detailed assistance with the MPI programs (and programming). Contact me at emiller@techskills.com. Good luck!

From pzb at datastacks.com Wed Apr 10 20:56:53 2002
From: pzb at datastacks.com (Peter Bowen)
Date: Tue Mar 16 01:02:19 2010
Subject: Newest RPM's?
In-Reply-To: <1017637861.1772.39.camel@loiosh>
References: <1017612125.19271.20.camel@vhwalke.mathsci.usna.edu> <002c01c1d933$ec02a3c0$c31fa6ac@xp> <1017637861.1772.39.camel@loiosh>
Message-ID: <1018497414.18187.3.camel@gargleblaster.caffeinexchange.org>

On Mon, 2002-04-01 at 00:11, Sean Dilda wrote:
> On Sun, 2002-03-31 at 23:15, Eric Miller wrote:
> I must note, my above answer was given as if you were installing over
> RH6.2. I do *NOT* recommend installing the binary rpms from a
> RHL6.2-based Scyld Beowulf over a RHL7.2 system. This is by no means a
> supported method. I don't know if anything will or won't break in doing
> it, but I would assume that something will, considering how much has
> changed between RHL6.2 and RHL7.2. If you really want to try, you're
> free to try, but if you want something right now, I'd suggest going
> to the RHL6.2-based setup.

Are there beowulf packages available for RHL7.2? Thanks.
Peter

From Todd_Henderson at Raytheon.com Tue Apr 9 07:16:44 2002
From: Todd_Henderson at Raytheon.com (Todd Henderson)
Date: Tue Mar 16 01:02:19 2010
Subject: NASTRAN on cluster
Message-ID: <3CB2F7CC.82F09FF9@raytheon.com>

We're in the process of starting the search for a cluster to replace our 35 XP1000s that come off lease in September. We currently use the cluster for CFD only, but we have been instructed that it would be beneficial to all to ensure that NASTRAN can use the new cluster. Therefore, I was wondering if anyone out there is running NASTRAN on a cluster. If so, what OS and CPUs are you using, and do you have any suggestions?

Thanks,
Todd Henderson

From jayne at sphynx.clara.co.uk Tue Apr 9 09:28:09 2002
From: jayne at sphynx.clara.co.uk (Jayne Heger)
Date: Tue Mar 16 01:02:19 2010
Subject: Parallel povraying baby!!!!
Message-ID:

Right, I've now run a parallel application on my Beowulf cluster, and it's working well! ;)

I ran pvmpov, which is a parallel rendering farm application, and I get these results when I render skyvase.pov (a picture of a vase):

1 host  = 7 min 11 s
2 hosts = 3 min 30 s
3 hosts = 2 min 18 s

One other machine to add yet, though! These are all 486s. This is my final year project at university. What do you think??? kw1el huh????

Jayne

From rickey-co at mug.biglobe.ne.jp Wed Apr 10 22:15:40 2002
From: rickey-co at mug.biglobe.ne.jp (Iwao Makino)
Date: Tue Mar 16 01:02:19 2010
Subject: very high bandwidth, low latency manner?
In-Reply-To:
References:
Message-ID:

I think... Quadrics is another one. Here are the quick figures I have on hand (RH7.2, 2.4.9 kernel, for an i860 cluster). On their site, they claim 340 Mbytes/second in each direction after protocol overhead. The process-to-process latency for remote write operations is 2 us, and 5 us for MPI messages. But pricing is MUCH higher than SCI/Myrinet.

Best regards,

At 4:08 +0200 5.04.2002, Steffen Persvold wrote:
>On Thu, 4 Apr 2002, Jim Lux wrote:
>
>> What's high bandwidth?
>> What's low latency?
>> How much money do you want to spend?
>
>I don't want to start a flamewar here, but I _think_ (not knowing real
>numbers for other high speed interconnects) that SCI has at least the
>lowest latency and maybe also the highest point-to-point bandwidth:
>
>SCI application to application latency   : 2.5 us
>SCI application to application bandwidth : 325 MByte/sec
>
>Note that these numbers are very chipset specific (as most high speed
>interconnect numbers are); these numbers are from IA64. Here are numbers
>from a popular IA32 platform, the AMD 760MPX:
>
>SCI application to application latency   : 1.8 us
>SCI application to application bandwidth : 283 MByte/sec

--
Best regards,
Iwao Makino
Hard Data Ltd. Tokyo branch
mailto:iwao@harddata.com
http://www.harddata.com/
--> Now Shipping 1U Dual Athlon DDR <-
--> Ask me about the new Alpha DDR UP1500 Systems <-

From jobriant at MPI-SoftTech.Com Thu Apr 11 08:20:00 2002
From: jobriant at MPI-SoftTech.Com (Jennifer O'Briant)
Date: Tue Mar 16 01:02:19 2010
Subject: cluster of IBM Netfinity's
Message-ID:

I have a cluster of 10 IBM Netfinitys that I am upgrading with a second PIII 700 MHz Slot 1 processor. I am having a hard time finding a fan and heatsink that will fit in these 1U servers. Does anyone have any ideas where I can find a side-mount or side-flow fan that will work?

Jennifer O'Briant
Associate Systems Administrator
MPI Software Technology, Inc.

From rgb at phy.duke.edu Wed Apr 10 06:31:06 2002
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Tue Mar 16 01:02:19 2010
Subject: DHCP Help Again
In-Reply-To: <1018442200.3cb431d875b7d@mail1.nada.kth.se>
Message-ID:

On Wed, 10 Apr 2002 tegner@nada.kth.se wrote:

> Quoting "Robert G. Brown" :
> Is there a convenient way to obtain static ip-addresses using dhcp without
> having to explicitly write down the mac-addresses in dhcpd.conf?
>
> Regards,
>
> /jon

Static?
As in each machine gets a single IP number that remains its own "forever" through all reboots, and which can be identified by a fixed name in host tables?

Following the time-honored tradition of actually reading the man pages for dhcpd, we see that the answer is "sort of". As in: in principle yes, but only in a weird way, and would you really want to?

First of all, let us consider how it COULD do this. All dhcp knows of a host is its MAC address. The system needs an IP number, so it broadcasts a DHCP request. What can the daemon do?

It can assign an address out of a range without looking at the MAC address (beyond ensuring that it isn't one it already recognizes), or it can look up the MAC address in a table that maps MAC->IP and assign the IP address it finds there. This is pretty much what actually happens, and of course the lookup table CAN ensure a static MAC->IP matchup.

The only question is how the lookup table is constructed.

The obvious way is by making explicit per-host entries in the dhcpd.conf file. dhcpd reads the file and builds the table from what it finds there. You make the dhcpd.conf entries by hand or automagically by means of a clever script. In general this isn't a real problem. You have to make a per-host entry in e.g. /etc/hosts as well, or you won't know the NAME that corresponds to the IP number the daemon happened to give the host the first time it saw it. The same script can do both, given e.g. the MAC address and the hostname you wish to assign as arguments.

Now there is nothing to PREVENT the daemon from assigning IP numbers out of the free range, creating a MAC->IP mapping, and saving the mapping itself so that it is automagically reloaded after, say, a crash (which tends to wipe out the table it builds in memory). By strange chance, this is pretty much exactly what dhcpd does.
It views IPs assigned out of a given subnet range as "leases", to be given to hosts for a certain amount of time and then recovered for reuse. It saves its current lease table in /var/lib/dhcp/dhcpd.leases. Periodically it goes through this table and "grooms" it, cleaning out expired leases so the IP numbers can be reused.

In many/most cases where range addresses are used, this is just fine. Remember, dhcp was "invented" at least in part to simplify address assignment to rooms full of PCs running WinXX, a well-known stupid operating system that wouldn't know what to do with a remote login attempt if it saw one. Heck, it doesn't know what to do with a LOCAL login a lot of the time. The IP<->name map is pretty unimportant in this case, because you tend never to address the system by its internet name. So it's no big deal to let IP addresses for dumb WinXX clients recycle.

Of course this isn't always true even for WinXX, especially if XX is 2K or XP or NT. Sometimes systems people really like to know that log traces by IP number can be mapped to specific machines, just so they can go around with a sucker rod (see "man syslogd" and search for "sucker") to administer correction, for example, even if they cannot remotely log in to the host in question.

dhcpd lets you control pretty much totally the lease time used for any given subnet or range. You can set it anywhere from very short to "very large", probably 4 billion or so seconds, which is (practically) "infinity". Infinity would be your coveted static IP address assignment.

Once again I'd argue that although you CAN do this, you probably don't want to in just about any unixoid context, including LAN management and cluster engineering. There is something so satisfying, so USEFUL, about the hostname<->IP map, and in order for this map to correspond to some SPECIFIC box, you really are building the hostname<->IP<->MAC map, piecewise.
And of course you need to leave the NICs in the boxes, since yes, the map follows the NIC and not the actual box. Although it likely isn't the "only" way to control the complete chain, simultaneously and explicitly building /etc/hosts (or the NIS, LDAP, or rsync-exported versions thereof), the various hostname-related permissions (e.g. netgroups), and the /etc/dhcpd.conf static entries is arguably the best way.

To emphasize this last point, note that additional information can be specified in the dhcp static table entries, such as the name of a per-host kickstart file to be used in installing the host, and more. dhcp is at least an approximation to a centralized configuration data server and can perform lots of useful services in this arena, not just handing out IP addresses. Unfortunately (perhaps? as far as I know?) dhcp's options can only be passed from its own internal list, so one can't QUITE use it as a way of globally synchronizing whole tables of important data (like /etc/hosts, netgroups, passwd) across a subnet as systems automatically and periodically renew their leases. The list of options it supports as it stands now is quite large, though.

I also don't know how susceptible it is to spoofing. One problem with daemon-based services like this is that if they aren't uniquely bound at both ends to an authorized server, and somebody puts a faster server on the same physical network, that person can sometimes dynamically change a system's "identity" in real time and gain access privileges they otherwise might not have had. Obviously, sending files like /etc/passwd around in this way would be a very dangerous thing to do unless the daemon were re-engineered to use something like SSL to simultaneously certify the server and encrypt the traffic.

Hope this helps. BTW, in addition to the always useful man pages for dhcpd and dhcpd.conf,
you can and should look at the Linux Documentation Project site and the various RFCs that specify dhcp's behavior and option spread.

rgb

From rgb at phy.duke.edu Wed Apr 10 09:03:42 2002
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Tue Mar 16 01:02:19 2010
Subject: DHCP Help Again
In-Reply-To: <1018448717.3cb44b4d7ee2a@mail1.nada.kth.se>
Message-ID:

On Wed, 10 Apr 2002 tegner@nada.kth.se wrote:

> Very helpful! Thanks!
>
> But I'm still curious about how you make - automagically - the hardware ethernet
> line in dhcpd.conf initially. Say you have 100 machines. One way I would think
> of would be to use kickstart and:
>
> Install the machines and boot them up in sequence using the range statement
> in dhcpd.conf (so that the first machine gets 192.168.1.101, the second
> 192.168.1.102 ...)
>
> Once all nodes are up, use some script to extract the mac addresses for all the
> nodes and either modify dhcpd.conf - or - dispense with dhcp completely and
> hardwire the ip-addresses on each node.
>
> But I'm sure there are better ways to do this?

Not a whole lot better. Our installations tend to be O(10) systems at a time (10-30, not hundreds), and we've gotten our local vendor to label each node with its MAC address before delivery (they've gotta boot up and burn in each node anyway). So we just pop the nodes in a rack and use a script to insert a static entry for each one, in an order that corresponds to rack order. After all, even though yes, we label the nodes, it would be a bit silly to have g01 next to g22 next to g13 in rack order. And since we use the same dhcp server for the nodes that we use for the general department, we cannot guarantee that some other host won't request and be granted a floating IP number that breaks the ordered sequence.

The alternative (which would work fine for a cluster with a dedicated, isolated-local-network, and hence predictable, dhcpd server) is to write the scriptset you describe, which we've actually considered doing.
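The first approach rgb describes (a script that takes each node's MAC address in rack order and inserts a matching static entry, plus the corresponding /etc/hosts line) might look something like the minimal Python sketch below. The function names, the two-digit `g01`-style numbering, and the consecutive-IP scheme are illustrative assumptions, not rgb's actual scripts; the host block simply mirrors the `host` / `hardware ethernet` / `fixed-address` / `option host-name` layout of the dhcpd.conf posted earlier in the thread.

```python
import ipaddress
import re

# Six colon-separated hex octets, e.g. 00:a0:cc:59:45:9b (checked after lowercasing).
MAC_RE = re.compile(r"^[0-9a-f]{2}(:[0-9a-f]{2}){5}$")


def host_entries(hostname, mac, ip, domain):
    """Return (dhcpd.conf host block, /etc/hosts line) for one node."""
    mac = mac.lower()
    if not MAC_RE.match(mac):
        raise ValueError(f"bad MAC address: {mac!r}")
    ip = str(ipaddress.IPv4Address(ip))  # raises ValueError on a malformed address
    dhcp_block = (
        f"host {hostname} {{\n"
        f"  hardware ethernet {mac};\n"
        f"  fixed-address {ip};\n"
        f'  option host-name "{hostname}";\n'
        f"}}\n"
    )
    hosts_line = f"{ip}\t{hostname}.{domain} {hostname}"
    return dhcp_block, hosts_line


def rack_entries(macs_in_rack_order, prefix, first_ip, domain):
    """Name nodes prefix01, prefix02, ... in rack order, with consecutive IPs."""
    base = ipaddress.IPv4Address(first_ip)
    blocks, lines = [], []
    for i, mac in enumerate(macs_in_rack_order):
        name = f"{prefix}{i + 1:02d}"
        block, line = host_entries(name, mac, str(base + i), domain)
        blocks.append(block)
        lines.append(line)
    return "\n".join(blocks), "\n".join(lines) + "\n"


# Example: two vendor-labelled nodes, top of the rack first (hypothetical values).
dhcpd_conf, etc_hosts = rack_entries(
    ["00:20:18:58:27:1A", "00:a0:cc:59:45:9b"], "g", "192.168.1.101", "rgb.private.net"
)
print(dhcpd_conf)
print(etc_hosts)
```

The sketch only generates text; appending it to dhcpd.conf and /etc/hosts (and restarting dhcpd so it rereads its configuration) is left to the administrator.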
Boot the nodes in rack order, with floating addresses hopefully assigned in strict order from the address range, and let them install themselves. In the meantime, write a script that parses e.g. /var/log/messages for the DHCP request and offer messages, or /var/lib/dhcp/dhcpd.leases for the MAC and IP mapping, and creates the required host and dhcpd.conf tables.

We haven't gone this way partly out of laziness -- with tens of systems at a time to install, it will only save work (relative to the time required to write the scripts) after we've used the scriptset for years -- and partly because, in our direct observation, at least one node install in twenty or thirty will screw up and occur in the wrong order. This, of course, will screw up EVERYTHING: either one physically rearranges the rack or hand edits the tables, either of which costs far more than the labor saved in the first place.

There may be a better solution (probably smarter, more complex scripts that can perform e.g. node insert and delete operations and hence manage a reordering of the tables without hand editing everything), but more complex scripts require a significant investment of time, and one needs a very clear conceptualization of the design to have a good chance of ending up with something really usable. This in turn requires experience with the simpler scripts and some time living with their frustrations. We just don't have enough nodes to do all this except for the fun of it -- maybe a really big DOE site does, but we don't. So we'll likely continue to use simple building-block scripts that take the MAC address and desired hostname/IP mapping as parameters (possibly augmented by a script that extracts MAC addresses from the log files, since even with help from the vendor we often have nodes or workstations to install with unknown MAC addresses, and have to boot once, get the MAC address, and boot again to do the install).
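The alternative scriptset (recovering each node's MAC and IP pair from /var/lib/dhcp/dhcpd.leases after a rack-order boot) could be sketched in Python roughly as follows. It assumes the usual ISC dhcpd lease-file layout, `lease <ip> { starts <weekday> <date> <time>; ... hardware ethernet <mac>; }`, and it orders nodes by the earliest lease start time seen for each MAC, on the assumption that the nodes were booted in rack order; as rgb notes, that assumption fails for roughly one install in twenty or thirty. The function names are hypothetical, and the regular expressions are a simplification (a `}` inside a quoted string, for instance, would confuse them).

```python
import re
from datetime import datetime

# One "lease <ip> { ... }" block in the lease file.
LEASE_RE = re.compile(r"lease\s+(\d+\.\d+\.\d+\.\d+)\s*\{(.*?)\}", re.DOTALL)
MAC_RE = re.compile(r"hardware\s+ethernet\s+([0-9A-Fa-f:]+)\s*;")
# e.g. "starts 3 2002/04/10 13:00:00;" (weekday digit, then date and time)
STARTS_RE = re.compile(r"starts\s+\d\s+(\d{4}/\d{2}/\d{2})\s+(\d{2}:\d{2}:\d{2})\s*;")


def parse_leases(text):
    """Return [(mac, ip), ...] ordered by when each MAC first got a lease."""
    first_start = {}  # mac -> earliest lease start seen
    last_ip = {}      # mac -> IP from the last block in the file
    for ip, body in LEASE_RE.findall(text):
        mac_match = MAC_RE.search(body)
        if mac_match is None:
            continue  # skip blocks that carry no hardware address
        mac = mac_match.group(1).lower()
        start_match = STARTS_RE.search(body)
        if start_match:
            start = datetime.strptime(" ".join(start_match.groups()), "%Y/%m/%d %H:%M:%S")
        else:
            start = datetime.max  # unknown start time: sort it last
        if mac not in first_start or start < first_start[mac]:
            first_start[mac] = start
        last_ip[mac] = ip  # later blocks in the file supersede earlier ones
    return [(mac, last_ip[mac]) for mac in sorted(first_start, key=first_start.get)]


def read_lease_file(path="/var/lib/dhcp/dhcpd.leases"):
    """Parse the lease file at `path` (the location given above; adjust as needed)."""
    with open(path) as f:
        return parse_leases(f.read())
```

The resulting (MAC, IP) pairs, checked by eye against the rack labels, are exactly the parameters a simple building-block script needs to write static dhcpd.conf entries and /etc/hosts lines.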
Not to beat dead horses or anything, but (IMHO) a lot of this management scriptset development is held back by the fact that every single system tool has a configuration file with its own unique format and structure. I am well on the way to becoming downright religious about using XML as THE basis for formatting this sort of thing, at least where one can choose to do so in future applications. If dhcpd.conf and dhcpd.leases were written in an XML-compliant way, it would make much better logical sense, and it would be easier both to parse them and to write tools to manipulate them.

rgb

From rgb at phy.duke.edu Thu Apr 11 08:54:51 2002
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Tue Mar 16 01:02:19 2010
Subject: Newest RPM's?
In-Reply-To: <1018497414.18187.3.camel@gargleblaster.caffeinexchange.org>
Message-ID:

On 10 Apr 2002, Peter Bowen wrote:

> Are there beowulf packages available for RHL7.2?

Depends on what you mean. Some of the fundamental tools (PVM, flavors of MPI, more) are already packaged in 7.2. Ditto the full range of GPL compilers and programming support tools. Even commercial beowulf packages like Scyld or a turnkey vendor's arrangement often use RH as a base, although they aren't always current with the very latest release.

Pretty much all the truly open source beowulf tools are available either in rpm form that will install under 7.2 (or as source rpms that will rebuild and install under 7.2) or, at the very least and worst, in tarball form that will build and install under any unixoid/POSIX environment, including 7.2. In fact, it is almost tautological that this would be so -- beowulf tools were mostly developed on Linux/GNU boxes, and RH is at heart a generic Linux/GNU distribution.

Commercial packages (e.g.
Portland or Absoft compilers, PBS Pro) can almost always be obtained in a form that runs under 7.2. The only exceptions are likely to be ones with library issues that haven't yet been ported. libc has from time to time changed enough to break things, so tools developed on e.g. RH 5.2 don't always work on 7.2 without some porting effort (but I know of no mainstream tools in this category).

So I think the answer would have to be "yes" ;-)

rgb

From josip at icase.edu Thu Apr 11 09:00:04 2002
From: josip at icase.edu (Josip Loncaric)
Date: Tue Mar 16 01:02:19 2010
Subject: Naming etc. (Was: DHCP Help)
References:
Message-ID: <3CB5B304.49BCC793@icase.edu>

"Robert G. Brown" wrote:
>
> [...] my hosts are all on a private
> internal network anyway and not in nameservice.

Good policy! Private hostnames/addresses should remain private because they are not guaranteed to be unique across the entire Internet. The DNS server should contain only registered hostnames/addresses. The head node of a cluster is typically multi-homed, and its public interface should be DNS registered, but the internal private interface (and the client nodes on the internal private network) are best resolved via /etc/hosts, where the internal domain name is determined from the FQDN form of the name. If /etc/hosts on client1 contains:

192.168.1.1 client1.internal.domain client1

then 'dnsdomainname' on client1 returns 'internal.domain' (clearly not found in any Internet registry). This works fine internally, but NOT outside the cluster (e.g. sendmail may have problems, etc.).

The /etc/hosts tables should be consistent across the cluster, even if there are reasons to play tricks.
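Josip's consistency rule is easy to audit mechanically. The minimal Python sketch below (hypothetical function names; it assumes each node's /etc/hosts has already been collected as text, e.g. by copying it back to one machine) reports every name that maps to different addresses on different nodes, which is exactly what the name-level "routing" trick for gigabit-equipped hosts produces.

```python
from collections import defaultdict


def parse_hosts(text):
    """Map every name (canonical name and aliases) in an /etc/hosts file to its IP."""
    table = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            table.setdefault(name, ip)  # first matching line wins
    return table


def inconsistencies(hosts_by_node):
    """Return {name: {ip: [nodes]}} for names that resolve differently on different nodes."""
    seen = defaultdict(lambda: defaultdict(list))
    for node, text in hosts_by_node.items():
        for name, ip in parse_hosts(text).items():
            seen[name][ip].append(node)
    return {name: dict(ips) for name, ips in seen.items() if len(ips) > 1}


# Hypothetical example: the head node is listed under its GE address on a GE node.
fe_view = "192.168.1.1 head.internal.domain head\n192.168.1.2 client1.internal.domain client1\n"
ge_view = "192.168.2.1 head.internal.domain head  # GE address\n192.168.1.2 client1.internal.domain client1\n"
print(inconsistencies({"client1": fe_view, "gnode1": ge_view}))
```

An empty result means every node agrees on every name it lists; names that appear on only some nodes are not flagged, since that is a separate (coverage) question.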
For example, one typically has all machines on a fast ethernet (FE) subnet (say 192.168.1.x), but a few may also have gigabit ethernet (GE) interfaces (say 192.168.2.x). Using IP-level routing can result in complicated routing tables, because only specific FE hosts can also be reached via the GE interface. What about name-level "routing"? While /etc/hosts can be used to make the hostnames of GE machines resolve to their GE addresses on GE machines but to their FE addresses on the FE-only machines, this can lead to problems with software packages which assume a globally consistent hostname/address mapping.

For example, grid software (Globus) needs a globally consistent FQDN/IP mapping. The grid machine name is the fully-qualified domain name, or Internet name, of a grid machine. It should be the name returned by the "gethostbyname()" function (from libc) and the primary name retrieved from DNS via nslookup. The primary name should correspond to the host's primary interface (if there is more than one) and be fully accessible across the grid. The grid could involve private addresses, but those are visible only WITHIN an organization, because private addresses must not be routable outside an organization. This is a serious limitation -- so it is probably best to limit grids to publicly registered hosts only. Proxy processes on the head nodes may be needed to access internal machines.

Most clusters are built around a private subnet, sometimes with IP masquerading enabled on the head node so that the internal clients can 'call out'. This still means that internal clients are not visible externally, i.e. one cannot 'call in' from the outside. As a consequence, parallel jobs which assume global TCP connectivity of all participating machines (e.g. MPICH-G2) will have problems using two clusters (each with its own private internal subnet). At the moment, every node that you wish to use in an MPICH-G2 job must have a public IP address and must be fully accessible.
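The MPICH-G2 requirement above (every participating node needs a publicly routable address) can be screened before a job is launched. A minimal sketch using Python's standard `ipaddress` module, assuming you already have a node-name-to-address table; the function name and the example addresses are hypothetical (152.3.250.1 is just the public nameserver address from rgb's dhcpd.conf, reused as an example of a routable address). `is_global` is false for the RFC 1918 private blocks (10/8, 172.16/12, 192.168/16) as well as loopback and other special-purpose ranges; the check says nothing about firewalls, so "fully accessible" still has to be verified separately.

```python
import ipaddress


def nodes_needing_proxy(node_addrs):
    """Return, sorted, the nodes whose address is not globally routable.

    Such nodes cannot be reached directly from outside their cluster, so a job
    that assumes full TCP connectivity between all machines cannot use them as-is.
    """
    return sorted(
        node for node, addr in node_addrs.items()
        if not ipaddress.ip_address(addr).is_global
    )


# Hypothetical cluster: a public head node and two nodes on private subnets.
cluster = {"head": "152.3.250.1", "node01": "192.168.1.2", "node02": "10.0.0.7"}
print(nodes_needing_proxy(cluster))  # -> ['node01', 'node02']
```

Any node this reports would need either a public address or a proxy process on its head node, as Josip describes.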
To run jobs across several clusters with internal private networks, the MPI programmer would need to provide a proxy process on the head node to overcome this difficulty.

In summary, naming is a simple concept, but just under the surface is a can of worms created by established programming practices based on diverse assumptions. Multiply connected machines and/or public/private network mixtures need to be set up with great care. Tricky setups are fragile; simplicity and transparency work better.

Sincerely,
Josip

--
Dr. Josip Loncaric, Research Fellow               mailto:josip@icase.edu
ICASE, Mail Stop 132C           PGP key at http://www.icase.edu./~josip/
NASA Langley Research Center             mailto:j.loncaric@larc.nasa.gov
Hampton, VA 23681-2199, USA    Tel. +1 757 864-2192  Fax +1 757 864-6134

From tegner at nada.kth.se Wed Apr 10 05:36:40 2002
From: tegner at nada.kth.se (tegner@nada.kth.se)
Date: Tue Mar 16 01:02:19 2010
Subject: DHCP Help Again
In-Reply-To:
References:
Message-ID: <1018442200.3cb431d875b7d@mail1.nada.kth.se>

Quoting "Robert G. Brown" :

Is there a convenient way to obtain static ip-addresses using dhcp without having to explicitly write down the mac-addresses in dhcpd.conf?

Regards,

/jon

> On Wed, 3 Apr 2002, Adrian Garcia Garcia wrote:
>
> For one thing, don't use the range statement -- it tells dhcpd the range
> of IP numbers to assign to UNKNOWN ethernet numbers. You are statically
> assigning an IP number in your "free" range to a particular host with a
> KNOWN ethernet number below. I don't know what dhcpd would do in that
> case -- something sensible, one would hope, but then, maybe not. The
> range statement is really there so you can dynamically allocate
> addresses from the range to hosts you may never have seen before and
> that you don't care to ever address by name (as they might well get a
> different IP number on the next boot).
>
> DHCP servers run by ISPs not infrequently use the range feature to
> conserve IP numbers -- they only need enough to cover the greatest
> number of connections they are likely to have at any one time, not one
> IP number per host that might ever connect. Departments might use it to
> give IP numbers to laptops brought in by visitors (with the extra
> benefit that they can assign a subnet block that isn't "trusted" by the
> usual department servers and/or is firewalled from the outside by an
> ip-forwarding/masquerading host).
>
> You want "only" static IPs in your cluster, as you'd like nodo1 to be
> the same machine and IP address every time.
>
> Be a bit careful about your use of domain names. As it happens, I don't
> find cluster.org registered yet (amazingly enough!), but it is pretty
> easy to pick one that does exist in nameservice in the outside world.
> In that case you'll run a serious risk of routing or name resolution
> problems, depending on things like the search order you use in
> /etc/nsswitch.conf. Even my previous example of rgb.private.net is a
> bit risky.
>
> You should run a nameserver (cache only is fine) on your 192.168.1.1
> server, presuming it lives on an external network and you care to
> resolve global names.
>
> Similarly you may want:
>
>   option routers 192.168.1.1;
>
> if you want internal hosts to be able to get out through your (presumed
> gateway) server.
>
> Finally, if you want nodo1 to come up knowing its own name without
> hardwiring it in on the node itself, add
>
>   option host-name "nodo1";
>
> to its definition.
>
> I admit that I do tend to lay out my dhcpd.conf a bit differently than
> you have it below, but I don't think that the differences are
> particularly significant, and you have a copy of the one I use anyway if
> you want to play with the pieces. You should find a log trace of
> dhcpd's activities in /var/log/messages, which should help with any
> further debugging.
>
> On your nodo1 host, make sure that:
>
>   cat /etc/sysconfig/network-scripts/ifcfg-eth0
>   DEVICE=eth0
>   BOOTPROTO=dhcp
>   ONBOOT=yes
>
> and
>
>   cat /etc/sysconfig/network
>   NETWORKING=yes
>   HOSTNAME=nodo1
>
> and that in /etc/modules.conf there is something like:
>
>   cat /etc/modules.conf
>   alias parport_lowlevel parport_pc
>   alias eth0 tulip
>
> (or instead of tulip, whatever your network module is).
>
> If you then boot your (e.g. RH) client, it SHOULD just come up,
> automatically try to start the network on device eth0 using dhcp as its
> protocol for obtaining an IP number, ask the dhcp server for an address
> and a route, and just "work" when they come back.
>
> Hope this helps.
>
> rgb
>
> > server-name "server.cluster.org"
> >
> > subnet 192.168.1.0 netmask 255.255.255.0
> > {
> >   range 192.168.1.2 192.168.1.10   #my client has the ip 192.168.1.2 and
> >                                    #my server the static ip 192.168.1.1
> >   option subnet-mask 255.255.255.0;
> >   option broadcast-address 192.168.1.255;
> >   option domain-name-server 192.168.1.1;
> >   option domain-name "cluster.org";
> >
> >   host nodo1.cluster.org
> >   {
> >     hardware ethernet 00:60:97:a1:ef:e0;  #here is the address of the client's card
> >     fixed-address 192.168.1.2;
> >   }
> > }
> >
> > And finally some files on my server.
> >
> > NETWORK
> > ------------------------------------------
> > networking = yes
> > hostname =server.cluster.org
> > gatewaydev = eth0
> > gatewaye=
> > ------------------------------------------
> >
> > HOSTS (In my server and in the client I have the same in this file)
> > ------------------------------------------
> > 127.0.0.1      localhost
> > 192.168.1.1    server.cluster.org
> > 192.168.1.2    nodo1.cluster.org
> >
> > Ok, that's the information. I am a little confused; could you help me
> > please =). I can't detect the mistake; I don't know if it is the server
> > or some card =s. Thanks for all.
> -- > Robert G. Brown http://www.phy.duke.edu/~rgb/ > Duke University Dept. of Physics, Box 90305 > Durham, N.C. 27708-0305 > Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu > From tegner at nada.kth.se Wed Apr 10 07:25:17 2002 From: tegner at nada.kth.se (tegner@nada.kth.se) Date: Tue Mar 16 01:02:19 2010 Subject: DHCP Help Again In-Reply-To: References: Message-ID: <1018448717.3cb44b4d7ee2a@mail1.nada.kth.se> Very helpful! Thanks! But I'm still curious about how you make - automagically - the hardware ethernet line in dhcpd.conf initially. Say you have 100 machines. One way I would think of would be to use kickstart and: Install the machines and boot them up in sequence, using the range statement in dhcpd.conf (so that the first machine gets 192.168.1.101, the second 192.168.1.102 ...) Once all nodes are up, use some script to extract the mac addresses for all the nodes and either modify dhcpd.conf - or - discard dhcp completely and hardwire the ip-addresses to each node. But I'm sure there are better ways to do this? Thanks again, /jon Quoting "Robert G. Brown" : > On Wed, 10 Apr 2002 tegner@nada.kth.se wrote: > > > Quoting "Robert G. Brown" : > > Is there a convenient way to obtain static ip-addresses using dhcp without > > having to explicitly write down the mac-addresses in dhcpd.conf? > > > > Regards, > > > > /jon > > Static?
As in each machine gets a single IP number that remains its own > "forever" through all reboots and which can be identified by a fixed > name in host tables? > > Following the time-honored tradition of actually reading the man pages > for dhcpd, we see that the answer is "sort of". As in, in principle yes, > but only in a weird way, and would you really want to? > > First of all, let us consider: how COULD it do this? All dhcp knows of > a host is its MAC address. System needs IP number. System broadcasts a > DHCP request. What can the daemon do? > > It can assign the address out of a range without looking at the MAC > address (beyond ensuring that it isn't one that it recognizes already) or > it can look at the MAC address, do a table lookup and find it in the > table, and assign an IP address based on the table that maps MAC->IP. > This is pretty much what actually happens, and of course the lookup > table CAN ensure a static MAC->IP matchup. > > The only question is how the lookup table is constructed. > > The obvious way is by making explicit per-host entries in the dhcpd.conf > file. dhcpd reads the file and builds the table from what it finds > there. You make the dhcpd.conf entries by hand or automagically by > means of a clever script. In general this isn't a real problem. You > have to make a per-host entry into e.g. /etc/hosts as well, or you won't > know the NAME the host is going to have to correspond to the IP number > the daemon happened to give it the first time it saw it. The same > script can do both, given e.g. the MAC address and hostname you wish to > assign as arguments. > > Now there is nothing to PREVENT the daemon from assigning IP numbers out > of the free range, creating a MAC->IP mapping, and saving the mapping > itself so that it is automagically reloaded after, say, a crash (which > tends to wipe out the table it builds in memory). By strange chance, > this is pretty much exactly what dhcpd does.
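The "same script can do both" idea might look like the following minimal Python sketch. It assumes you already know each node's MAC, IP and hostname. The function name and output layout are illustrative, modeled on the host entries shown in this thread. Writing the results into /etc/dhcpd.conf and /etc/hosts and restarting dhcpd are left out.

```python
import re

# Six colon-separated two-digit hex octets, lower case after normalizing.
MAC_RE = re.compile(r"^([0-9a-f]{2}:){5}[0-9a-f]{2}$")


def host_entries(hostname, mac, ip, domain):
    """Return (dhcpd.conf host block, /etc/hosts line) for one node."""
    mac = mac.lower()
    if not MAC_RE.match(mac):
        raise ValueError(f"not a colon-separated MAC address: {mac}")
    stanza = (
        f"host {hostname} {{\n"
        f"  hardware ethernet {mac};\n"
        f"  fixed-address {ip};\n"
        f'  option host-name "{hostname}";\n'
        "}\n"
    )
    hosts_line = f"{ip}\t{hostname}.{domain} {hostname}\n"
    return stanza, hosts_line


if __name__ == "__main__":
    stanza, line = host_entries("nodo1", "00:60:97:A1:EF:E0", "192.168.1.2", "cluster.org")
    print(stanza + line, end="")
```

Generating both entries from one call keeps the hostname, IP and MAC for a node in one place, which is the consistency the post is after.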
It views IP's assigned out > of a given subnet range as "leases", to be given to hosts for a certain > amount of time and then recovered for reuse. It saves its current lease > table in /var/lib/dhcp/dhcpd.leases. Periodically it goes through this > table and "grooms" it, cleaning out expired leases so the IP numbers are > reused. In many/most cases where range addresses are used, this is just > fine. Remember, dhcp was "invented" at least in part to simplify > address assignment to rooms full of PC's running WinXX, a well-known > stupid operating system that wouldn't know what to do with a remote > login attempt if it saw one. Heck, it doesn't know what to do with a > LOCAL login a lot of the time. The IP<->name map is pretty unimportant > in this case, because you tend never to address the system by its > internet name. So it's no big deal to let IP addresses for dumb WinXX > clients recycle. > > Of course this isn't always true even for WinXX, especially if XX is 2K > or XP or NT. Sometimes systems people really like to know that log > traces by IP number can be mapped into specific machines just so they > can go around with a sucker rod (see "man syslogd" and do a search on > "sucker") to administer correction, for example, even if they cannot > remotely login to the host in question. > > dhcpd allows you to pretty much totally control the lease time used for > any given subnet or range. You can set it from very short to "very > large", probably 4 billion or so seconds, which is (practically) > "infinity". Infinity would be your coveted static IP address > assignment. > > Once again I'd argue that although you CAN do this, you probably don't > want to in just about any unixoid context including LAN management and > cluster engineering. There is something so satisfying, so USEFUL, about > the hostname<->IP map, and in order for this map to correspond to some > SPECIFIC box, you really are building the hostname<->IP<->MAC map, > piecewise. 
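The "4 billion or so seconds" ceiling is the largest 32-bit unsigned value, and RFC 2131 reserves exactly that value (0xffffffff) to mean an infinite lease. This short Python sketch checks the arithmetic for how long that is. Whether a particular dhcpd version accepts or special-cases this value in dhcpd.conf is worth confirming in its man page.

```python
# DHCP lease times travel as 32-bit unsigned counts of seconds, and
# RFC 2131 reserves the all-ones value 0xffffffff to mean "infinity".
MAX_LEASE_SECONDS = 2**32 - 1
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def seconds_to_years(seconds):
    """Convert a lease time in seconds to (Julian) years."""
    return seconds / SECONDS_PER_YEAR


if __name__ == "__main__":
    print(f"{MAX_LEASE_SECONDS} s = {seconds_to_years(MAX_LEASE_SECONDS):.1f} years")
    # max-lease-time 86400 from the dhcpd.conf earlier in the thread is one day
    print(f"86400 s = {86400 / 3600:.0f} hours")
```

This prints about 136.1 years for the maximum, so for practical purposes such a lease never expires.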
And of course you need to leave the NICs in the boxes, > since, yes, the map follows the NIC and not the actual box. Although it > likely isn't the "only" way to control the complete chain, > simultaneously and explicitly building /etc/hosts (or the NIS, LDAP, > rsync exported versions thereof), the various hostname-related > permissions (e.g. netgroups) and /etc/dhcpd.conf static entries is > arguably the best way. > > To emphasize this last point, note that there is additional information > that can be specified in the dhcp static table entries, such as the name > of a per-host kickstart file to be used in installing it and more. dhcp > is at least an approximation to a centralized configuration data server > and can perform lots of useful services in this arena, not just handing > out IP addresses. Unfortunately (perhaps? as far as I know?) dhcp's > options can only be passed from its own internal list, so one can't > QUITE use it as a way of globally synchronizing whole tables of > important data (like /etc/hosts, netgroups, passwd) across a subnet as > systems automatically and periodically renew their leases. The list of > options it supports as it stands now is quite large, though. > > I also don't know how susceptible it is to spoofing -- one problem with > daemon-based services like this is that if they aren't uniquely bound at > both ends to an authorized server and somebody puts a faster server on > the same physical network, one can sometimes do something like > dynamically change a system's "identity" in real time and gain access > privileges one otherwise might not have had. Obviously, sending files > like /etc/passwd around in this way would be a very dangerous thing to > do unless the daemon were re-engineered to use something like ssl to > simultaneously certify the server and encrypt the traffic. > > Hope this helps. BTW, in addition to the always useful man pages for > dhcpd and dhcpd.conf (e.g.)
you can and should look at the linux > documentation project site and the various RFCs that specify dhcp's > behavior and option spread. > > rgb > > From joelja at darkwing.uoregon.edu Thu Apr 11 09:27:47 2002 From: joelja at darkwing.uoregon.edu (Joel Jaeggli) Date: Tue Mar 16 01:02:19 2010 Subject: DHCP Help Again In-Reply-To: <1018442200.3cb431d875b7d@mail1.nada.kth.se> Message-ID: You have to have a host-specific value to key on... that would be the mac address... you can approach the problem a different way (dynamic dns) so that each machine gets the same hostname regardless of which ip it gets, but that's more trouble than it's worth for a cluster... On Wed, 10 Apr 2002 tegner@nada.kth.se wrote: > Quoting "Robert G. Brown" : > Is there a convenient way to obtain static ip-addresses using dhcp without > having to explicitly write down the mac-addresses in dhcpd.conf? > > Regards, > > /jon -- -------------------------------------------------------------------------- Joel Jaeggli Academic User Services joelja@darkwing.uoregon.edu -- PGP Key Fingerprint: 1DE9 8FCA 51FB 4195 B42A 9C32 A30D 121E -- The accumulation of all powers, legislative, executive, and judiciary, in the same hands, whether of one, a few, or many, and whether hereditary, selfappointed, or elective, may justly be pronounced the very definition of tyranny. - James Madison, Federalist Papers 47 - Feb 1, 1788 From rgb at phy.duke.edu Thu Apr 11 10:17:51 2002 From: rgb at phy.duke.edu (Robert G. Brown) Date: Tue Mar 16 01:02:19 2010 Subject: DHCP Help Again In-Reply-To: <1018448717.3cb44b4d7ee2a@mail1.nada.kth.se> Message-ID: On Wed, 10 Apr 2002 tegner@nada.kth.se wrote: > Very helpful! Thanks!
> > But I'm still curious about how you make - automagically - the hardware ethernet > line in dhcpd.conf initially. Say you have 100 machines. One way I would think > of would be to use kickstart and: > > Install the machines and boot them up in sequence and using the range statement > in dhcpd.conf (so that the first machine gets 192.168.1.101, the second > 192.168.1.102 ...) > > Once all nodes are up use some script to extract the mac addresses for all the > nodes and either modify dhcpd.conf - or - discard of dhcp completely and > hardwire the ip-addresses to each node. > > But I'm sure there are better ways to do this? Not that I know of. Maybe somebody else knows of one. I'd just use perl or bash (either would probably work, although parsing is generally easier in perl), parse e.g. Apr 11 08:18:09 lucifer dhcpd: DHCPREQUEST for 192.168.1.140 from 00:20:e0:6d:a0:05 via eth0 Apr 11 08:18:09 lucifer dhcpd: DHCPACK on 192.168.1.140 to 00:20:e0:6d:a0:05 via eth0 from /var/log/messages on the dhcp server, and write an output routine to generate # golem (Linux/Windows laptop lilith, second/100BT interface) host golem { hardware ethernet 00:20:e0:6d:a0:05; fixed-address 192.168.1.140; next-server 192.168.1.131; option routers 192.168.1.1; option domain-name "rgb.private.net"; option host-name "golem"; } and 192.168.1.140 golem.rgb.private.net golem and append them to /etc/dhcpd.conf and /etc/hosts respectively, and then distribute copies of the resulting /etc/hosts -- as Josip made eloquently clear, your private internal network should resolve consistently on all PIN hosts and probably should have SOME sort of domainname defined, so that software that might include a getdomainname() call, and might not adequately check for and handle a null value, can cope. It's hard to know what assumptions were made by the designer of every single piece of network software you might want to run... Of course you'll probably want to do the b01, b02, b03... hostname iteration -- I'm just pulling an example at random out of my own log tables. rgb > > Thanks again, > > /jon -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From shaeffer at neuralscape.com Thu Apr 11 03:01:55 2002 From: shaeffer at neuralscape.com (Karen Shaeffer) Date: Tue Mar 16 01:02:19 2010 Subject: DHCP Help Again In-Reply-To: ; from rgb@phy.duke.edu on Wed, Apr 10, 2002 at 09:31:06AM -0400 References: <1018442200.3cb431d875b7d@mail1.nada.kth.se> Message-ID: <20020411030155.A30307@synapse.neuralscape.com> On Wed, Apr 10, 2002 at 09:31:06AM -0400, Robert G. Brown wrote: > > Hope this helps. BTW, in addition to the always useful man pages for > dhcpd and dhcpd.conf (e.g.) you can and should look at the linux > documentation project site and the various RFCs that specify dhcp's > behavior and option spread.
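Robert Brown's recipe in this thread could be sketched in Python rather than the perl or bash he suggests. The recipe is to scrape the DHCPACK lines from /var/log/messages, then emit a host stanza plus an /etc/hosts line per node, with b01, b02, ... names. The regular expression assumes log lines shaped like the sample he quotes. The function names and the b-prefix naming are hypothetical. Names follow the order in which MACs were first ACKed, so boot the nodes in the order you want them numbered.

```python
import re

# Matches dhcpd syslog lines such as the one quoted in this thread:
#   "... dhcpd: DHCPACK on 192.168.1.140 to 00:20:e0:6d:a0:05 via eth0"
# Octets with a dropped leading zero (e.g. "0:20:e0:6d:a0:5") are accepted.
ACK_RE = re.compile(
    r"DHCPACK on (\d{1,3}(?:\.\d{1,3}){3}) to "
    r"((?:[0-9A-Fa-f]{1,2}:){5}[0-9A-Fa-f]{1,2})(?=\s)"
)


def normalize_mac(mac):
    """Lower-case the MAC and pad every octet to two hex digits."""
    return ":".join(f"{int(part, 16):02x}" for part in mac.split(":"))


def acked_leases(log_lines):
    """Return (mac, ip) pairs in the order each MAC was first ACKed."""
    first_seen = {}
    for line in log_lines:
        m = ACK_RE.search(line)
        if m:
            # Later renewals from the same MAC do not change its entry.
            first_seen.setdefault(normalize_mac(m.group(2)), m.group(1))
    return list(first_seen.items())


def render(pairs, domain, prefix="b"):
    """Name nodes b01, b02, ... and build dhcpd.conf and /etc/hosts text."""
    conf, hosts = [], []
    for n, (mac, ip) in enumerate(pairs, start=1):
        name = f"{prefix}{n:02d}"
        conf.append(
            f"host {name} {{\n"
            f"  hardware ethernet {mac};\n"
            f"  fixed-address {ip};\n"
            f'  option host-name "{name}";\n'
            "}\n"
        )
        hosts.append(f"{ip}\t{name}.{domain} {name}\n")
    return "".join(conf), "".join(hosts)


if __name__ == "__main__":
    sample = [
        "Apr 11 08:18:09 lucifer dhcpd: DHCPREQUEST for 192.168.1.140 "
        "from 00:20:e0:6d:a0:05 via eth0",
        "Apr 11 08:18:09 lucifer dhcpd: DHCPACK on 192.168.1.140 "
        "to 00:20:e0:6d:a0:05 via eth0",
    ]
    conf_text, hosts_text = render(acked_leases(sample), "rgb.private.net")
    print(conf_text + hosts_text, end="")
```

Only DHCPACK lines are used, since a DHCPREQUEST does not yet confirm that the server handed out the address.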
http://www.amazon.com/exec/obidos/search-handle-form/ref=s_sf_b_as/002-5123550-8208810 is a reasonably well-done book that folks interested in DHCP might consider acquiring. It provides a comprehensive overview of the subject. cheers, Karen -- Karen Shaeffer Neuralscape; Santa Cruz, Ca. 95060 shaeffer@neuralscape.com http://www.neuralscape.com From roger at ERC.MsState.Edu Thu Apr 11 11:00:04 2002 From: roger at ERC.MsState.Edu (Roger L. Smith) Date: Tue Mar 16 01:02:19 2010 Subject: DHCP Help Again In-Reply-To: <1018448717.3cb44b4d7ee2a@mail1.nada.kth.se> Message-ID: On Wed, 10 Apr 2002 tegner@nada.kth.se wrote: > Very helpful! Thanks! > > But I'm still curious about how you make - automagically - the hardware ethernet > line in dhcpd.conf initially. Say you have 100 machines. One way I would think > of would be to use kickstart and: > > Install the machines and boot them up in sequence and using the range statement > in dhcpd.conf (so that the first machine gets 192.168.1.101, the second > 192.168.1.102 ...) > > Once all nodes are up use some script to extract the mac addresses for all the > nodes and either modify dhcpd.conf - or - discard of dhcp completely and > hardwire the ip-addresses to each node. > > But I'm sure there are better ways to do this? That's exactly how I do it. Then, in the Kickstart configuration script, I have the node configure itself not to use DHCP anymore. It is a bit cumbersome when new nodes are added, but since the nodes that I will be installing in two weeks are the product of a purchase cycle that started in February, I don't have to worry about doing it too often. _\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_ | Roger L.
Smith Phone: 662-325-3625 | | Systems Administrator FAX: 662-325-7692 | | roger@ERC.MsState.Edu http://WWW.ERC.MsState.Edu/~roger | | Mississippi State University | |_______________________Engineering Research Center_______________________| From roger at ERC.MsState.Edu Thu Apr 11 11:02:46 2002 From: roger at ERC.MsState.Edu (Roger L. Smith) Date: Tue Mar 16 01:02:19 2010 Subject: DHCP Help Again In-Reply-To: Message-ID: On Thu, 11 Apr 2002, Robert G. Brown wrote: > > But I'm sure there are better ways to do this? > > Not that I know of. Maybe somebody else knows of one. I'd just use > perl or bash (either would probably work, although parsing is generally > easier in perl), parse e.g. > > Apr 11 08:18:09 lucifer dhcpd: DHCPREQUEST for 192.168.1.140 from 00:20:e0:6d:a0:05 via eth0 > Apr 11 08:18:09 lucifer dhcpd: DHCPACK on 192.168.1.140 to 00:20:e0:6d:a0:05 via eth0 > > from /var/log/messages on the dhcp server, and write an output routine > to generate It's actually easier to grab them out of /var/lib/dhcpd.leases, since some of the information that you're looking for is already in the format you need. _\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_ | Roger L. Smith Phone: 662-325-3625 | | Systems Administrator FAX: 662-325-7692 | | roger@ERC.MsState.Edu http://WWW.ERC.MsState.Edu/~roger | | Mississippi State University | |_______________________Engineering Research Center_______________________| From christon at pluto.dsu.edu Thu Apr 11 11:09:59 2002 From: christon at pluto.dsu.edu (Christoffersen, Neils) Date: Tue Mar 16 01:02:19 2010 Subject: scyld slave node problems Message-ID: <0718ABB23368D2119FC200008362AF6816BDDA@pluto.dsu.edu> Hello all, I'm setting up a small cluster for my university using the Scyld distro. The master is up and running and now I'm trying to get the nodes to operate. However, the node I'm currently working on is having some difficulties.
It seems to be communicating with the master just fine, but when copying the libraries from the master it starts spitting out "try_do_free_pages failed for init" and similar messages. It seems to me that maybe the hard drive is not being recognized and it's trying to run everything in RAM and just running out of memory. Does anyone know what could be causing this? I have the node log which I can attach if you wish (I just don't have it with me at the moment). Thanks for any help you can lend. Sincerely Neils Christoffersen From canon at nersc.gov Thu Apr 11 11:16:25 2002 From: canon at nersc.gov (canon@nersc.gov) Date: Tue Mar 16 01:02:19 2010 Subject: DHCP Help Again In-Reply-To: Message from tegner@nada.kth.se of "Wed, 10 Apr 2002 16:25:17 +0200." <1018448717.3cb44b4d7ee2a@mail1.nada.kth.se> Message-ID: <200204111816.g3BIGPw13292@pookie.nersc.gov> Jon, We install our machines in pretty much this fashion. I wrote a script that yanks out the mac address and builds a dhcp entry that I append to the dhcpd.conf file. It's not the most elegant solution but it works. Also, I think NPACI/ROCKS includes some utilities to streamline this process. --Shane Canon From RSchilling at affiliatedhealth.org Thu Apr 11 10:59:55 2002 From: RSchilling at affiliatedhealth.org (Schilling, Richard) Date: Tue Mar 16 01:02:19 2010 Subject: Parallel povraying baby!!!! Message-ID: <51FCCCF0C130D211BE550008C724149E01165AF7@mail1.affiliatedhealth.org> Very nice results! Would you be willing to discuss or document the steps you took to get set up? Thanks! --Richard Schiling -----Original Message----- From: Jayne Heger To: Davis, Robin J.; Penfold, Brian; webmaster@wisewolf.com; Clever TW; Roy Gudz; Stephen.Cooke@severntrent.co.uk; Symon Cook; Tasneem Sharif; beowulf-newbie@fecundswamp.net; beowulf@beowulf.org; chris Sent: 9/04/02 17:28 Subject: Parallel povraying baby!!!! Right, I've now run a parallel application on my Beowulf Cluster, and it's working well!
;) When running pvmpov, which is a parallel rendering farm application, I get these results when I render skyvase.pov (a picture of a vase): 1 host = 7 mins, 11 seconds; 2 hosts = 3 mins, 30 seconds; 3 hosts = 2 mins, 18 seconds. One other machine to add yet, though! These are all 486's. This is my final year project at university. What do you think??? kw1el huh???? Jayne From siegert at sfu.ca Thu Apr 11 11:34:38 2002 From: siegert at sfu.ca (Martin Siegert) Date: Tue Mar 16 01:02:19 2010 Subject: DHCP Help Again In-Reply-To: <1018448717.3cb44b4d7ee2a@mail1.nada.kth.se>; from tegner@nada.kth.se on Wed, Apr 10, 2002 at 04:25:17PM +0200 References: <1018448717.3cb44b4d7ee2a@mail1.nada.kth.se> Message-ID: <20020411113438.C20302@stikine.ucs.sfu.ca> On Wed, Apr 10, 2002 at 04:25:17PM +0200, tegner@nada.kth.se wrote: > Very helpful! Thanks! > > But I'm still curious about how you make - automagically - the hardware ethernet > line in dhcpd.conf initially. Say you have 100 machines. One way I would think > of would be to use kickstart and: > > Install the machines and boot them up in sequence and using the range statement > in dhcpd.conf (so that the first machine gets 192.168.1.101, the second > 192.168.1.102 ...) > > Once all nodes are up use some script to extract the mac addresses for all the > nodes and either modify dhcpd.conf - or - discard of dhcp completely and > hardwire the ip-addresses to each node. > > But I'm sure there are better ways to do this? If you want to use static ip addresses anyway (as I do), why do you use dhcp at all?
I use a kickstart file with something like

network --bootproto static --device eth3 --ip 172.17.254.1 --netmask 255.255.0.0 --gateway 172.17.0.1 --hostname ks1 --nameserver 172.17.0.1

and have on the master node a set of IP addresses reserved for kickstart installations:

172.17.254.1 ks1
172.17.254.2 ks2
172.17.254.3 ks3
172.17.254.4 ks4
172.17.254.5 ks5

In the %post section of the kickstart file I then run a script that increments a counter on the master node, returns that counter as the real IP address of the new node, and updates the /etc/hosts file on all other nodes. I have installed my cluster (96 nodes) that way all by myself without any (big) problems ... maybe I was just too lazy to learn how to deal with dhcp. Cheers, Martin ======================================================================== Martin Siegert Academic Computing Services phone: (604) 291-4691 Simon Fraser University fax: (604) 291-4242 Burnaby, British Columbia email: siegert@sfu.ca Canada V5A 1S6 ======================================================================== From joachim at lfbs.RWTH-Aachen.DE Thu Apr 11 11:46:28 2002 From: joachim at lfbs.RWTH-Aachen.DE (Joachim Worringen) Date: Tue Mar 16 01:02:19 2010 Subject: very high bandwidth, low latency manner? References: Message-ID: <3CB5DA04.AF2C609F@lfbs.rwth-aachen.de> ...Iwao Makino wrote: > > I think ... Quadrics is another one. [...] > But pricing is MUCH higher than SCI/Myrinet. Do you have any pricing information at all? AFAIK, they are only distributed with Compaq clusters.
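Martin's %post step described above (bump a counter on the master node and turn it into the new node's permanent address) might look roughly like the following Python sketch. It is not his script: the counter-file path, the pool start address 172.17.1.1 and the pool size of 254 are hypothetical, and the sketch assumes the allocator runs on the master (how the %post script reaches it, e.g. over rsh or ssh, is left out). An exclusive `flock` keeps two nodes that are installing at the same moment from being handed the same number.

```python
import fcntl
import ipaddress
import os


def allocate_ip(counter_path: str, pool_start: str = "172.17.1.1",
                pool_size: int = 254) -> str:
    """Bump the counter stored in counter_path and return the next address.

    pool_start and pool_size are hypothetical defaults; the counter file
    holds how many addresses have been handed out so far.
    """
    fd = os.open(counter_path, os.O_RDWR | os.O_CREAT, 0o644)
    with os.fdopen(fd, "r+") as f:
        # Exclusive lock so concurrent kickstart installs cannot race.
        fcntl.flock(f, fcntl.LOCK_EX)
        text = f.read().strip()
        used = int(text) if text else 0
        if used >= pool_size:
            raise RuntimeError("address pool exhausted")
        f.seek(0)
        f.truncate()
        f.write(f"{used + 1}\n")
    # Closing the file flushes the new count and releases the lock.
    return str(ipaddress.ip_address(pool_start) + used)


def hosts_line(ip: str, hostname: str) -> str:
    """Format one /etc/hosts line to push out to the other nodes."""
    return f"{ip}\t{hostname}"


if __name__ == "__main__":
    ip = allocate_ip("/tmp/node_counter")  # hypothetical path
    print(hosts_line(ip, "node-new"))       # hypothetical hostname
```

Keeping the count in a plain file mirrors the simplicity Martin describes; a real deployment would also need the step he mentions of pushing the new /etc/hosts line to every other node.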
Joachim -- | _ RWTH| Joachim Worringen |_|_`_ | Lehrstuhl fuer Betriebssysteme, RWTH Aachen | |_)(_`| http://www.lfbs.rwth-aachen.de/~joachim |_)._)| fon: ++49-241-80.27609 fax: ++49-241-80.22339 From emiller at techskills.com Thu Apr 11 11:52:06 2002 From: emiller at techskills.com (Eric Miller) Date: Tue Mar 16 01:02:19 2010 Subject: scyld slave node problems In-Reply-To: <0718ABB23368D2119FC200008362AF6816BDDA@pluto.dsu.edu> Message-ID: >I'm setting up a small cluster for my university using the Scyld distro. >The master is up and running and now I'm trying to get the nodes to operate. >However, the node I'm currently working on is having some difficulties. It >seems to be communicating with the master just fine, but when copying the >libraries from the master it starts spitting out "try_do_free_pages failed >for init" and similar messages. It seems to me that maybe the hard drive is There might be a more technical solution, but I had the same problem and was able to solve it by booting that node diskless. Just disconnect the hard drive and reboot with the boot disk. Like I said, there may be a better way or technical solution, but ".....free_pages" led me to believe it was an HDD problem. I booted diskless and had no problems. From rok at ucsd.edu Thu Apr 11 11:31:23 2002 From: rok at ucsd.edu (Robert Konecny) Date: Tue Mar 16 01:02:19 2010 Subject: DHCP Help Again In-Reply-To: ; from rgb@phy.duke.edu on Thu, Apr 11, 2002 at 01:17:51PM -0400 References: <1018448717.3cb44b4d7ee2a@mail1.nada.kth.se> Message-ID: <20020411113123.B26495@ucsd.edu> That's pretty much how insert-ethers from the Rocks clustering software works (rocks.npaci.edu). You fire it up on the frontend and it starts parsing /var/log/messages in real time. Then you kickstart a node, and when insert-ethers sees a lease request from an unknown MAC it updates the Rocks MySQL database, generates a new dhcpd.conf and restarts dhcpd. Works like a charm. robert On Thu, Apr 11, 2002 at 01:17:51PM -0400, Robert G.
Brown wrote: > > Not that I know of. Maybe somebody else knows of one. I'd just use > perl or bash (either would probably work, although parsing is generally > easier in perl), parse e.g. > > Apr 11 08:18:09 lucifer dhcpd: DHCPREQUEST for 192.168.1.140 from 00:20:e0:6d:a0:05 via eth0 > Apr 11 08:18:09 lucifer dhcpd: DHCPACK on 192.168.1.140 to 00:20:e0:6d:a0:05 via eth0 > > from /var/log/messages on the dhcp server, and write an output routine > to generate > > # golem (Linux/Windows laptop lilith, second/100BT interface) > host golem { > hardware ethernet 00:20:e0:6d:a0:05; > fixed-address 192.168.1.140; > next-server 192.168.1.131; > option routers 192.168.1.1; > option domain-name "rgb.private.net"; > option host-name "golem"; > } > > and > > 192.168.1.140 golem.rgb.private.net golem > > and append them to /etc/dhcpd.conf and /etc/hosts respectively, and then > distribute copies of the resulting /etc/hosts. As Josip made > eloquently clear, your private internal network should resolve > consistently on all PIN hosts and probably should have SOME sort of > domainname defined, so that software that might include a > getdomainbyname() call, and might not adequately check for and > handle a null value, can cope. It's hard to know what assumptions > were made by the designer of every single piece of network software you > might want to run... > > Of course you'll probably want to do the b01, b02, b03... hostname > iteration -- I'm just pulling an example at random out of my own log > tables. > > rgb > > > > > Thanks again, > > > > /jon > > > > Quoting "Robert G. Brown" : > > > > > On Wed, 10 Apr 2002 tegner@nada.kth.se wrote: > > > > > > > Quoting "Robert G. Brown" : > > > > Is there a convenient way to obtain static ip-addresses using dhcp > > > without > > > > having to explicitly write down the mac-addresses in dhcpd.conf? > > > > > > > > Regards, > > > > > > > > /jon > > > > > > Static?
As in each machine gets a single IP number that remains its own > > > "forever" through all reboots and which can be identified by a fixed > > > name in host tables? > > > > > > Following the time-honored tradition of actually reading the man pages > > > for dhcpd, we see that the answer is "sort of". As in, in principle yes, > > > but only in a weird way, and would you really want to? > > > > > > First of all, let us consider, how COULD it do this? All dhcp knows of > > > a host is its mac address. System needs IP number. System broadcasts a > > > DHCP request. What can the daemon do? > > > > > > It can assign the address out of a range without looking at the MAC > > > address (beyond ensuring that it isn't one that it recognizes already) or > > > it can look at the MAC address, do a table lookup and find it in the > > > table, and assign an IP address based on the table that maps MAC->IP. > > > This is pretty much what actually happens, and of course the lookup > > > table CAN ensure a static MAC->IP matchup. > > > > > > The only question is how the lookup table is constructed. > > > > > > The obvious way is by making explicit per-host entries in the dhcpd.conf > > > file. dhcpd reads the file and builds the table from what it finds > > > there. You make the dhcpd.conf entries by hand or automagically by > > > means of a clever script. In general this isn't a real problem. You > > > have to make a per-host entry into e.g. /etc/hosts as well, or you won't > > > know the NAME the host is going to have to correspond to the IP number > > > the daemon happened to give it the first time it saw it. The same > > > script can do both, given e.g. the MAC address and hostname you wish to > > > assign as arguments.
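The "clever script" rgb describes in this thread (parse the DHCPACK lines dhcpd writes to /var/log/messages, then emit both a dhcpd.conf host stanza and an /etc/hosts line for each MAC address) can be sketched in Python, although he suggests perl or bash. Assumptions: the log lines look exactly like the two samples he quotes, hosts are named b01, b02, ... in order of first appearance as he suggests, and the router, next-server and domain values are copied from his example stanza; all of these would need adjusting for a real network. Appending the output to /etc/dhcpd.conf and /etc/hosts and restarting dhcpd is left to the caller.

```python
import re
import sys
from typing import Dict, Iterable, Tuple

# Matches ISC dhcpd syslog lines such as
# "Apr 11 08:18:09 lucifer dhcpd: DHCPACK on 192.168.1.140 to 00:20:e0:6d:a0:05 via eth0"
ACK_RE = re.compile(
    r"dhcpd: DHCPACK on (?P<ip>\d{1,3}(?:\.\d{1,3}){3}) "
    r"to (?P<mac>[0-9A-Fa-f]{2}(?::[0-9A-Fa-f]{2}){5})"
)


def acks_from_log(lines: Iterable[str]) -> Dict[str, str]:
    """Map each MAC address seen in a DHCPACK to the IP it was given.

    Dicts keep insertion order, so MACs stay in order of first appearance;
    a later ACK for the same MAC overwrites the IP (last lease wins).
    """
    seen: Dict[str, str] = {}
    for line in lines:
        m = ACK_RE.search(line)
        if m:
            seen[m.group("mac").lower()] = m.group("ip")
    return seen


def dhcpd_host_entry(name: str, mac: str, ip: str,
                     domain: str = "rgb.private.net",
                     router: str = "192.168.1.1",
                     next_server: str = "192.168.1.131") -> str:
    """Render a static host stanza in the style of rgb's dhcpd.conf."""
    return (
        f"host {name} {{\n"
        f"  hardware ethernet {mac};\n"
        f"  fixed-address {ip};\n"
        f"  next-server {next_server};\n"
        f"  option routers {router};\n"
        f'  option domain-name "{domain}";\n'
        f'  option host-name "{name}";\n'
        "}\n"
    )


def hosts_entry(name: str, ip: str, domain: str = "rgb.private.net") -> str:
    """Render the matching /etc/hosts line: IP, FQDN, short name."""
    return f"{ip} {name}.{domain} {name}\n"


def generate(lines: Iterable[str], prefix: str = "b") -> Tuple[str, str]:
    """Return (dhcpd.conf text, /etc/hosts text), naming hosts b01, b02, ..."""
    dhcp_parts, hosts_parts = [], []
    for i, (mac, ip) in enumerate(acks_from_log(lines).items(), start=1):
        name = f"{prefix}{i:02d}"
        dhcp_parts.append(dhcpd_host_entry(name, mac, ip))
        hosts_parts.append(hosts_entry(name, ip))
    return "".join(dhcp_parts), "".join(hosts_parts)


if __name__ == "__main__":
    # Usage (hypothetical script name): python mkhosts.py < /var/log/messages
    conf, hosts = generate(sys.stdin)
    sys.stdout.write(conf + "\n" + hosts)
```

A production version would also skip MAC addresses that already have a host entry, so that rerunning it does not duplicate stanzas.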
> > > > > > Now there is nothing to PREVENT the daemon from assigning IP numbers out > > > of the free range, creating a MAC->IP mapping, and saving the mapping > > > itself so that it is automagically reloaded after, say, a crash (which > > > tends to wipe out the table it builds in memory). By strange chance, > > > this is pretty much exactly what dhcpd does. It views IPs assigned out > > > of a given subnet range as "leases", to be given to hosts for a certain > > > amount of time and then recovered for reuse. It saves its current lease > > > table in /var/lib/dhcp/dhcpd.leases. Periodically it goes through this > > > table and "grooms" it, cleaning out expired leases so the IP numbers are > > > reused. In many/most cases where range addresses are used, this is just > > > fine. Remember, dhcp was "invented" at least in part to simplify > > > address assignment to rooms full of PCs running WinXX, a well-known > > > stupid operating system that wouldn't know what to do with a remote > > > login attempt if it saw one. Heck, it doesn't know what to do with a > > > LOCAL login a lot of the time. The IP<->name map is pretty unimportant > > > in this case, because you tend never to address the system by its > > > internet name. So it's no big deal to let IP addresses for dumb WinXX > > > clients recycle. > > > > > > Of course this isn't always true even for WinXX, especially if XX is 2K > > > or XP or NT. Sometimes systems people really like to know that log > > > traces by IP number can be mapped into specific machines just so they > > > can go around with a sucker rod (see "man syslogd" and do a search on > > > "sucker") to administer correction, for example, even if they cannot > > > remotely login to the host in question. > > > > > > dhcpd allows you to pretty much totally control the lease time used for > > > any given subnet or range. You can set it from very short to "very > > > large", probably 4 billion or so seconds, which is (practically) > > > "infinity".
Infinity would be your coveted static IP address > > > assignment. > > > > > > Once again I'd argue that although you CAN do this, you probably don't > > > want to in just about any unixoid context including LAN management and > > > cluster engineering. There is something so satisfying, so USEFUL, about > > > the hostname<->IP map, and in order for this map to correspond to some > > > SPECIFIC box, you really are building the hostname<->IP<->MAC map, > > > piecewise. And of course you need to leave the NICs in the boxes, > > > since, yes, the map follows the NIC and not the actual box. Although it > > > likely isn't the "only" way to control the complete chain, > > > simultaneously and explicitly building /etc/hosts (or the NIS, LDAP, > > > rsync exported versions thereof), the various hostname-related > > > permissions (e.g. netgroups) and /etc/dhcpd.conf static entries is > > > arguably the best way. > > > > > > To emphasize this last point, note that there is additional information > > > that can be specified in the dhcp static table entries, such as the name > > > of a per-host kickstart file to be used in installing it and more. dhcp > > > is at least an approximation to a centralized configuration data server > > > and can perform lots of useful services in this arena, not just handing > > > out IP addresses. Unfortunately (perhaps? as far as I know?) dhcp's > > > options can only be passed from its own internal list, so one can't > > > QUITE use it as a way of globally synchronizing whole tables of > > > important data (like /etc/hosts, netgroups, passwd) across a subnet as > > > systems automatically and periodically renew their leases. The list of > > > options it supports as it stands now is quite large, though.
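To put rgb's "probably 4 billion or so seconds" in perspective: DHCP carries the lease time as a 32-bit unsigned count of seconds, and RFC 2131 reserves the all-ones value 0xffffffff to mean an infinite lease. The short Python check below shows what that maximum amounts to, next to the 43200 s and 86400 s default and maximum lease times in rgb's dhcpd.conf at the start of this thread. It uses a 365.25-day year.

```python
SECONDS_PER_HOUR = 3600
SECONDS_PER_YEAR = 365.25 * 24 * SECONDS_PER_HOUR  # Julian year


def lease_years(seconds: int) -> float:
    """Convert a DHCP lease time in seconds to years."""
    return seconds / SECONDS_PER_YEAR


MAX_LEASE = 2**32 - 1  # 0xffffffff, the largest 32-bit lease value

if __name__ == "__main__":
    print(f"default-lease-time 43200 = {43200 / SECONDS_PER_HOUR:.0f} h")
    print(f"max-lease-time 86400 = {86400 / SECONDS_PER_HOUR:.0f} h")
    print(f"2**32 - 1 s = {lease_years(MAX_LEASE):.1f} years")
```

This prints 12 h, 24 h and about 136.1 years; the last figure is "infinity" for any practical cluster.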
> > > > > > I also don't know how susceptible it is to spoofing -- one problem with > > > daemon-based services like this is that if they aren't uniquely bound at > > > both ends to an authorized server and somebody puts a faster server on > > > the same physical network, one can sometimes do something like > > > dynamically change a system's "identity" in real time and gain access > > > privileges you otherwise might not have had. Obviously, sending files > > > like /etc/passwd around in this way would be a very dangerous thing to > > > do unless the daemon were re-engineered to use something like SSL to > > > simultaneously certify the server and encrypt the traffic. > > > > > > Hope this helps. BTW, in addition to the always useful man pages for > > > dhcpd and dhcpd.conf (e.g.) you can and should look at the Linux > > > Documentation Project site and the various RFCs that specify dhcp's > > > behavior and option spread. > > > > > > rgb > > > > > > > > -- > Robert G. Brown http://www.phy.duke.edu/~rgb/ > Duke University Dept. of Physics, Box 90305 > Durham, N.C. 27708-0305 > Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From lightdee at netscape.net Thu Apr 11 11:34:33 2002 From: lightdee at netscape.net (lightdee@netscape.net) Date: Tue Mar 16 01:02:19 2010 Subject: How do you keep clusters running.... Message-ID: <1D889E10.452B89F1.009FF3AE@netscape.net> Doug J Nordwall wrote: >On Wed, 2002-04-03 at 13:04, Cris Rhea wrote: > > What are folks doing about keeping hardware running on large clusters?
> > Right now, I'm running 10 Racksaver RS-1200's (for a total of 20 >nodes)... > > Sure seems like every week or two, I notice dead fans (each RS-1200 > has 6 case fans in addition to the 2 CPU fans and 2 power supply > fans). > > >You running lm_sensors on your nodes? That's a handy tool for paying >attention to things like that. We use ours in combination with ganglia >and pump it to a web page and to big brother to see when a cpu might be >getting hot, or a fan might be too slow. We actually saved a dozen >machines that way...we have 32 4 processor racksaver boxes in a rack, >and they rack was not designed to handle racksaver's fan system. That is >to say, there was a solid sidewall on the rack, and it kept in heat. I >set up lm_sensors on all the nodes (homogenous, so configured on one and >pushed it out to all), then pumped the data into ganglia >(ganglia.sourceforge.net) and then to a web page. I noticed that the >temp on a dozen of the machines was extremely high. So, I took off the >side panel of the rack. The temp dropped by 15 C on all the nodes, and >everything was within normal parameters again. > > > My last fan failure was a CPU fan that toasted the CPU and motherboard. > > >Ya, we would have seen this on ours earlier...excellent tool [snip] We use Clusterworx (from Linux Networx), which isn't open source, but it goes a step further than Ganglia. It uses lm_sensors and a power control box (again from Linux Networx) to actually shut down a node if it is getting too hot, and the event parameters are all tweakable. It's always a good idea to have some kind of cluster monitoring software installed, but it's nice to be able to set up event triggers in your software in case something goes wrong and you're not around. ---- David Henry Synergy Software, Inc. lightdee@netscape.net From ctierney at hpti.com Thu Apr 11 11:59:20 2002 From: ctierney at hpti.com (Craig Tierney) Date: Tue Mar 16 01:02:19 2010 Subject: What could be the performance of my cluster In-Reply-To: <20020406113545.91938.qmail@web10504.mail.yahoo.com>; from suraj_peri@yahoo.com on Sat, Apr 06, 2002 at 03:35:45AM -0800 References: <20020405125956.D69845@velocet.ca> <20020406113545.91938.qmail@web10504.mail.yahoo.com> Message-ID: <20020411125920.D32605@hpti.com> It depends on what you are trying to do (doesn't everyone love that answer?). The theoretical peak number of flops your cluster can do is:

flops = (no. of CPUs) * (clock rate in Hz) * (flops per cycle)

So for your cluster, assuming that with SSE you can get 2 flops per cycle:

flops = 8 * 1.53 GHz * 2 = 24.48 Gflops

Now, there are some issues with this. First, you are never going to get the full 1.53 * 2 = 3.06 Gflops out of a single processor. Second, leveraging all 8 CPUs to get their maximum is going to be difficult if there is any communication between the nodes. Compilers play a big role in extracting the best performance out of the system. If you don't have a commercial compiler from the likes of Intel or Portland Group, I highly recommend getting one. You only have to purchase the compiler for where you compile, and not where you run, so you can get away with one copy of the compiler on your server. If you are trying to compare the AMD system to the DS20E system, it will depend on what you are actually trying to do. If you are running single precision floating point codes that do not require all the memory bandwidth a DS20E provides, I would think that one AMD processor will do the work of one 833 MHz Alpha CPU to within 10% (you didn't say whether you had 2 CPUs in your DS20E).
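Craig's peak-rate formula is easy to script. The Python sketch below reproduces his 24.48 Gflops figure for 8 CPUs at 1.53 GHz, using his assumption of 2 flops per cycle. It gives a theoretical peak only: as he says, real codes will land well below it, while Suraj's 48 Mflops estimate is off by a factor of roughly 500 in the other direction.

```python
def peak_flops(cpus: int, clock_hz: float, flops_per_cycle: float) -> float:
    """Theoretical peak = CPUs x clock rate (Hz) x flops per cycle."""
    return cpus * clock_hz * flops_per_cycle


if __name__ == "__main__":
    # Craig's numbers: 8 Athlon XP 1800+ CPUs at 1.53 GHz, 2 flops/cycle assumed.
    cluster = peak_flops(8, 1.53e9, 2)
    print(f"cluster peak: {cluster / 1e9:.2f} Gflops")
    print(f"per CPU:      {peak_flops(1, 1.53e9, 2) / 1e9:.2f} Gflops")
```

This prints 24.48 Gflops for the cluster and 3.06 Gflops per CPU, the single-processor ceiling Craig says you will never actually reach.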
At least this is what I am seeing for my codes when comparing Dual Xeon's, Dual AMD's, and dual API 833 boxes. Craig On Sat, Apr 06, 2002 at 03:35:45AM -0800, Suraj Peri wrote: > Hi group, > I was calculating the performance of my cluster. The > features are > > 1. 8 nodes > 2. Processor: AMD Athlon XP 1800+ > 3. 8 CPUs > 4. 8*1.5 GB DDR RAM > 5. 1 Server with 2 processorts with AMD MP 1800+ and > 2GB DDR RAM > > I calculated this to be 48 Mflops . Is this correct ? > if not, what is the correct performance of my cluster. > I also comparatively calculated that my cluster would > be 3 times faster than AlphaServer DS20E ( 833 MHz > alpha 64 bit processor, 4 GB max memory) > > Is my calculation correct or wrong? please help me > ASAP. thanks in advance. > > cheers > suraj. > > ===== > PIL/BMB/SDU/DK > > __________________________________________________ > Do You Yahoo!? > Yahoo! Tax Center - online filing with TurboTax > http://taxes.yahoo.com/ > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Craig Tierney (ctierney@hpti.com) From becker at scyld.com Thu Apr 11 12:16:47 2002 From: becker at scyld.com (Donald Becker) Date: Tue Mar 16 01:02:19 2010 Subject: scyld slave node problems In-Reply-To: Message-ID: On Thu, 11 Apr 2002, Eric Miller wrote: > >I'm setting up a small cluster for my university using the Scyld distro. > >The master is up and running and now I'm trying to get the nodes to > operate. > >However, the node I'm currently working on is having some difficulties. It > >seems to be communicating with the master just fine, but when copying the > >libraries from the master it starts spitting out "try_do_free_pages failed > >for init" and similar messages. My first guess is that you don't have enough memory (64MB+) on the slave node. But this might also be a memory or disk problem. 
> There might be a more technical solution, but I had the same problem and was > able to solve it by booting that node diskless. Just disconnect the hard > drive, and re-boot with the boot disk. Like I said, there may be a better > way or technical solution, but ".....free_pages" led me to believe it was a > HDD problem. I booted diskless, and had no problems. You should not need to physically disconnect the hard disk. Just remove any references to /dev/hda that you added in /etc/beowulf/fstab. However, if you do have a hardware problem, disconnecting the disk might avoid the symptoms. -- Donald Becker becker@scyld.com Scyld Computing Corporation http://www.scyld.com 410 Severn Ave. Suite 210 Second Generation Beowulf Clusters Annapolis MD 21403 410-990-9993 From skruglik at gmu.edu Thu Apr 11 12:35:28 2002 From: skruglik at gmu.edu (Stepan Kruglikov) Date: Tue Mar 16 01:02:19 2010 Subject: scyld slave node problems References: <0718ABB23368D2119FC200008362AF6816BDDA@pluto.dsu.edu> Message-ID: <00a901c1e190$0924c730$c932ae81@lyapunov> Hello, I solved this problem by increasing memory on each node to 64MB. You can also get the node up and running with 64MB, set up node partitions, and then delete the RAM drive and run with 32MB. Although it works, I recommend doing it only if you are interested in a proof-of-concept cluster. Stepan Kruglikov ----- Original Message ----- From: "Christoffersen, Neils" To: Sent: Thursday, April 11, 2002 2:09 PM Subject: scyld slave node problems > Hello all, > > I'm setting up a small cluster for my university using the Scyld distro. > The master is up and running and now I'm trying to get the nodes to operate. > However, the node I'm currently working on is having some difficulties. It > seems to be communicating with the master just fine, but when copying the > libraries from the master it starts spitting out "try_do_free_pages failed > for init" and similar messages.
> It seems to me that maybe the hard drive is > not being recognized and it's trying to run everything on ram and just > running out of memory. > > Does anyone know what could be causing this? I have the node log which I can > attach if you wish (I just don't have it with me at the moment). > > Thanks for any help you can lend. > > Sincerely > Neils Christoffersen From ctierney at hpti.com Thu Apr 11 12:26:50 2002 From: ctierney at hpti.com (Craig Tierney) Date: Tue Mar 16 01:02:19 2010 Subject: very high bandwidth, low latency manner? In-Reply-To: <3CB5DA04.AF2C609F@lfbs.rwth-aachen.de>; from joachim@lfbs.RWTH-Aachen.DE on Thu, Apr 11, 2002 at 08:46:28PM +0200 References: <3CB5DA04.AF2C609F@lfbs.rwth-aachen.de> Message-ID: <20020411132650.A32674@hpti.com> I talked to a guy from Quadrics at SC2002 and he said that list pricing on a Quadrics network was about $3500 per node when you are in the 100s of nodes and up. The price includes the cards, cables, switches, etc. This doesn't include any sort of discount that you might get. Myrinet is about $2000 for an equivalent network at list price. Dolphin/SCI falls around $2245 list per node (if the system is > 144 nodes and you have to get the 3D card). I heard that Quadrics had a customer that just had to have an Intel/Quadrics system, so either they or he was working on porting the drivers. The web page says they support Linux and Tru64. You could probably get the hardware without going through Compaq, but Compaq is most likely buying up most of the supply. Craig -- Craig Tierney (ctierney@hpti.com) On Thu, Apr 11, 2002 at 08:46:28PM +0200, Joachim Worringen wrote: > ...Iwao Makino wrote: > > > > I think ... Quadrics is another one. > [...] > > But pricing is MUCH higher than SCI/Myrinet.
> > Do you have any pricing information at all? AFAIK, they are only > distribute with Compaq clusters. > > Joachim > > -- > | _ RWTH| Joachim Worringen > |_|_`_ | Lehrstuhl fuer Betriebssysteme, RWTH Aachen > | |_)(_`| http://www.lfbs.rwth-aachen.de/~joachim > |_)._)| fon: ++49-241-80.27609 fax: ++49-241-80.22339 > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From epaulson at cs.wisc.edu Thu Apr 11 12:32:02 2002 From: epaulson at cs.wisc.edu (Erik Paulson) Date: Tue Mar 16 01:02:19 2010 Subject: (no subject) In-Reply-To: ; from emiller@techskills.com on Thu, Apr 11, 2002 at 10:08:16AM -0400 References: <20020411130008.63691.qmail@web9608.mail.yahoo.com> Message-ID: <20020411143202.C27111@perdita.cs.wisc.edu> On Thu, Apr 11, 2002 at 10:08:16AM -0400, Eric Miller wrote: > > After you get the cluster up and running, that's where the help seems to > drift off. Most of the people in this group are upper-level users who know > how to get these MPI enabled programs to run on thier clusters. If you are > like me, these topics are a little foreign. If you are looking for > something to run continuously, like a display, they say the MandelBrot > renderer has a loop function, but I can't get it to work. Someone suggested > SETI many months ago, which would be perfect, but SETI does not offer an MPI > enabled program. > What possible good would an MPI-enabled SETI@Home do? The whole point of SETI@Home is that it's already parallelized. If you've got N nodes, submit N copies of SETI@home to your queuing system, and your cluster will get an N times speedup over a single node. I don't see how you can hope to do better than that. 
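Erik's "submit N copies" recipe needs no message passing at all. Below is a minimal Python sketch that starts N independent copies of a command and waits for them all. The optional per-copy prefix is where a node-placement command would go; on Scyld, for instance, something like `bpsh <node>` (an assumption here; check your distribution's documentation) would place copy i on node i. Without a prefix the copies simply run on the local machine.

```python
import subprocess
import sys
from typing import Callable, List, Optional, Sequence


def launch_copies(cmd: Sequence[str], n: int,
                  prefix: Optional[Callable[[int], Sequence[str]]] = None) -> List[int]:
    """Start n independent copies of cmd, wait for all, return exit codes.

    prefix(i), if given, returns extra argv words placed in front of cmd
    for copy i (e.g. a hypothetical ["bpsh", str(i)] to pin it to node i).
    """
    procs = []
    for i in range(n):
        argv = list(prefix(i)) if prefix else []
        argv += list(cmd)
        procs.append(subprocess.Popen(argv))  # start without waiting
    return [p.wait() for p in procs]          # then collect every exit code


if __name__ == "__main__":
    codes = launch_copies([sys.executable, "-c", "print('copy done')"], 4)
    print(codes)
```

Starting every process before waiting on any of them is what lets the copies run concurrently; a queuing system such as Condor or PBS does the same job with scheduling, restarts and accounting on top.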
-Erik From becker at scyld.com Thu Apr 11 13:03:31 2002 From: becker at scyld.com (Donald Becker) Date: Tue Mar 16 01:02:19 2010 Subject: scyld slave node problems In-Reply-To: <00a901c1e190$0924c730$c932ae81@lyapunov> Message-ID: On Thu, 11 Apr 2002, Stepan Kruglikov wrote: > I solved this problem by increasing memory on each node to 64MB. You can > also get the node up and running with 64MB, setup node partitions, and then > delete ram drive and run with 32mB. Although it works, I recommend doing it > only in case if you are interested in proof of concept cluster. It's possible to trim the cached library list in /etc/beowulf/config and fit into 32MB. But only the most trivial application will run with 32MB and no local disk. -- Donald Becker becker@scyld.com Scyld Computing Corporation http://www.scyld.com 410 Severn Ave. Suite 210 Second Generation Beowulf Clusters Annapolis MD 21403 410-990-9993 From fraser5 at cox.net Thu Apr 11 13:38:44 2002 From: fraser5 at cox.net (Jim Fraser) Date: Tue Mar 16 01:02:19 2010 Subject: Will the dual Tyan board boot without a graphics card installed? Message-ID: <006a01c1e198$dff52c70$0300005a@papabear> I have had this problem on a couple of other boards and it can be annoying. Thanks, Jim From emiller at techskills.com Thu Apr 11 14:01:24 2002 From: emiller at techskills.com (Eric Miller) Date: Tue Mar 16 01:02:19 2010 Subject: (no subject) In-Reply-To: <20020411143202.C27111@perdita.cs.wisc.edu> Message-ID: >> >> Someone suggested >> SETI many months ago, which would be perfect, but SETI does not offer an MPI >> enabled program. > >What possible good would an MPI-enabled SETI@Home do? The whole point of >SETI@Home is that it's already parallelized. > My definition of parallelized is MPI- or PVM-enabled code, not _distributed_ applications like SETI. When demonstrating to students the capabilities of Linux, it's not nearly as convincing to just start N instances on N nodes.
The magic stuff that we newbie cluster builders seek is not found in that. It is found in having a bona-fide cluster with master and slave nodes, and a single instance of a program being managed and executed by a group of machines. Am I alone in this opinion? >If you've got N nodes, submit N copies of SETI@home to your queuing system, >and your cluster will get an N times speedup over a single node. I don't see >how you can hope to do better than that. I was aware of this possibility, but do not have the skills to implement it. Please see my post from weeks ago, March 11th. It was SETI that I was referring to: --For non-parallel applications, is it possible to run individual instances on --diskless nodes? For example, I want to execute a non-MPI program "A" that --is located in the /bin directory of my master node, but I want to run one --instance of "A" on each of my diskless nodes. --What is the syntax that equates to: --#NP=1 "A" on node0 only --#NP=1 "A" on node1 only --#.... --#.... From math at velocet.ca Thu Apr 11 14:25:02 2002 From: math at velocet.ca (Velocet) Date: Tue Mar 16 01:02:19 2010 Subject: Will the dual Tyan board boot without a graphics card installed? In-Reply-To: <006a01c1e198$dff52c70$0300005a@papabear>; from fraser5@cox.net on Thu, Apr 11, 2002 at 04:38:44PM -0400 References: <006a01c1e198$dff52c70$0300005a@papabear> Message-ID: <20020411172502.F19272@velocet.ca> On Thu, Apr 11, 2002 at 04:38:44PM -0400, Jim Fraser's all... > > I have had this problem on a couple other boards and it can be annoying. I have found that clearing the BIOS and setting everything back up can solve this problem sometimes. /kc > > Thanks, > > Jim > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Ken Chase, math@velocet.ca * Velocet Communications Inc. 
* Toronto, CANADA From epaulson at cs.wisc.edu Thu Apr 11 14:26:24 2002 From: epaulson at cs.wisc.edu (Erik Paulson) Date: Tue Mar 16 01:02:19 2010 Subject: (no subject) In-Reply-To: ; from emiller@techskills.com on Thu, Apr 11, 2002 at 05:01:24PM -0400 References: <20020411143202.C27111@perdita.cs.wisc.edu> Message-ID: <20020411162624.E27598@perdita.cs.wisc.edu> On Thu, Apr 11, 2002 at 05:01:24PM -0400, Eric Miller wrote: > >> > >> Someone suggested > >> SETI many months ago, which would be perfect, but SETI does not offer an > MPI > >> enabled program. > > > >What possible good would an MPI-enabled SETI@Home do? The whole point of > >SETI@Home is that it's already parallelized. > > > > My definition of parrellelized is MPI or PVM enabled code, not _distributed_ > applications like SETI. When demonstrating to students the capabilities of > Linux, its not nearly as convincing to just start N number of instances on N > nodes. The magic stuff that we newbie cluster builders seek is not found in > that. It is found in having a bona-fide cluster with master and slave > nodes, and a single instance of a program being managed and executed by a > group of machines. Am I alone in this opinion? > Yes. What you'll discover is that there is no magic to cluster building. If your problem can be solved in parallel just by running N unmodified copies of your code, then that's the way to do it. And there's tons of science to be done this way (in fact, I'd bet there's more to be done this way than with big MPI jobs). If the codes that solve your problem need to be parallelized with MPI or PVM for whatever reason (maybe you don't need to solve N instances of your code, just one instance and minimize the time, or you need more resources than any one machine can handle, e.g. 32 gigs of RAM or some such), then you don't really have a choice and you have to break down and do it. But again, there's no magic here.
There is not a single instance of your program on the cluster - if your code is using N nodes, then there are N copies of your program on the cluster. (Yes, maybe you're using some quasi-SSI thing like Scyld or MOSIX, but as far as I know both of them still transfer the entire memory image over to the machine, and don't page things over as needed.) You can write a program that works exactly like an MPI program with 0 MPI calls - wherever you'd write MPI_Send, just use BSD sockets and send things that way. Tons more to do (you have to locate all the other processes in the computation, you have to worry about buffering, failures, etc.) but none of it's unknown. > >If you've got N nodes, submit N copies of SETI@home to your queuing system, > >and your cluster will get an N times speedup over a single node. I don't > see > >how you can hope to do better than that. > > I was aware of this possibility, but do not have the skills to implement it. Yes you do. Download Condor, or PBS, or Sun Grid Engine, or buy Platform LSF, and: A. Install it on N nodes B. Submit N copies or, install Scyld or MOSIX. Type: my_program & N times. -Erik From laytonjb at bellsouth.net Thu Apr 11 13:33:58 2002 From: laytonjb at bellsouth.net (Jeff Layton) Date: Tue Mar 16 01:02:19 2010 Subject: How do you keep clusters running.... References: <1D889E10.452B89F1.009FF3AE@netscape.net> Message-ID: <3CB5F336.4AEFE8EA@bellsouth.net> lightdee@netscape.net wrote: > Doug J Nordwall wrote: > > >On Wed, 2002-04-03 at 13:04, Cris Rhea wrote: > > > > What are folks doing about keeping hardware running on large clusters?
We use ours in combination with ganglia > >and pump it to a web page and to Big Brother to see when a CPU might be > >getting hot, or a fan might be too slow. We actually saved a dozen > >machines that way... we have 32 4-processor Racksaver boxes in a rack, > >and the rack was not designed to handle Racksaver's fan system. That is > >to say, there was a solid sidewall on the rack, and it kept in heat. I > >set up lm_sensors on all the nodes (homogeneous, so configured on one and > >pushed it out to all), then pumped the data into ganglia > >(ganglia.sourceforge.net) and then to a web page. I noticed that the > >temperature on a dozen of the machines was extremely high. So, I took off the > >side panel of the rack. The temperature dropped by 15 C on all the nodes, and > >everything was within normal parameters again. > > > > > > My last fan failure was a CPU fan that toasted the CPU and motherboard. > > > > > >Yeah, we would have seen this on ours earlier... excellent tool. > > [snip] > > We use Clusterworx (from Linux Networx), which isn't open source, but it goes a step further than Ganglia. It uses lm_sensors and a power control > box (again from Linux Networx) to actually shut down a node if it is getting > too hot, and the event parameters are all tweakable. It's always a good > idea to have some kind of cluster monitoring software installed, but it's > nice to be able to set up event triggers in your software in case something goes wrong and you're not around. You can set a shutdown temperature via the BIOS on most decent motherboards. You can also easily script this up if you have some power control unit connected to a node that you can talk to (e.g., APC's stuff). All of the stuff you need is available as open source, and you can hook all of it together with Ganglia if you want. In fact, Matt has announced (or hinted at) a next version of Ganglia that will have a number of new features built in (but not nodal shutdown, if I remember correctly).
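Jeff notes that temperature-triggered shutdown is easy to script. A minimal Python sketch of the decision logic follows. It assumes `sensors`-style text lines of the form `label: +NN.N°C ...` (the exact format differs between lm_sensors versions and chips), a limit chosen by the caller, and a hypothetical `power_off` callback standing in for whatever command your power-control unit actually accepts. Collecting the readings from each node (via Ganglia, ssh, or similar) is left out.

```python
import re
from typing import Callable, Dict, Iterable, List

# Assumed lm_sensors-style lines, e.g. "temp1:  +45.0°C  (high = +80.0°C)"
# or "CPU Temp: +52 C". Adjust the pattern to your chips and sensors version.
TEMP_LINE = re.compile(r"^\s*([^:]+):\s*([+-]?\d+(?:\.\d+)?)\s*°?\s*C\b")


def parse_temps(lines: Iterable[str]) -> Dict[str, float]:
    """Return {sensor label: degrees C} for every line that looks like a temperature."""
    temps = {}
    for line in lines:
        match = TEMP_LINE.match(line)
        if match:
            temps[match.group(1).strip()] = float(match.group(2))
    return temps


def overheating(readings: Dict[str, Dict[str, float]], limit_c: float) -> List[str]:
    """Nodes whose hottest sensor is above limit_c (readings maps node -> {label: temp})."""
    return sorted(node for node, temps in readings.items()
                  if temps and max(temps.values()) > limit_c)


def enforce(readings: Dict[str, Dict[str, float]], limit_c: float,
            power_off: Callable[[str], None]) -> List[str]:
    """Call the (hypothetical) power_off hook for each overheating node; return their names."""
    hot = overheating(readings, limit_c)
    for node in hot:
        power_off(node)
    return hot
```

Keeping the power-off action as a callback means the same logic can log, page someone, or drive a real power controller without changes.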
Jeff Layton > > > ---- > David Henry > Synergy Software, Inc. > lightdee@netscape.net From rgb at phy.duke.edu Thu Apr 11 15:58:18 2002 From: rgb at phy.duke.edu (Robert G. Brown) Date: Tue Mar 16 01:02:19 2010 Subject: (no subject) In-Reply-To: <20020411162624.E27598@perdita.cs.wisc.edu> Message-ID: On Thu, 11 Apr 2002, Erik Paulson wrote: > > >If you've got N nodes, submit N copies of SETI@home to your queuing system, > > >and your cluster will get an N times speedup over a single node. I don't see > > >how you can hope to do better than that. > > > > I was aware of this possibility, but do not have the skills to implement it. > > Yes you do. Download Condor, or PBS, or Sun Grid Engine, or buy Platform LSF, > and: > A. Install it on N nodes > B. Submit N copies > > or, install Scyld or MOSIX. Type: > my_program & > > N times. And not even for SETI will you get an Nx speedup on N nodes. There is ALWAYS a serial fraction even for embarrassingly parallel applications, and the time required to send the jobs out to the nodes (relative to just looping N times on the node) is part of it. In Amdahl's Law, N-fold speedup is the upper bound, not the general, practical limit. This is the basis of Erik's observation about embarrassingly parallel jobs being ideal for clusters -- they're the ones that often get very close to N-fold speedup on N nodes for nearly arbitrary N.
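rgb's point that N-fold speedup is only an upper bound can be made numerical. The sketch below implements Amdahl's law, S(N) = 1 / (f + (1 - f)/N) for serial fraction f, plus one simple assumed extension in which communication cost grows linearly with node count, T(N) = T_s + T_p/N + c N. Under that model the run time is smallest near N = sqrt(T_p / c) and grows again beyond it. The parameter values are illustrative, not measurements.

```python
import math


def amdahl_speedup(n_nodes: int, serial_fraction: float) -> float:
    """Amdahl's law: speedup on n_nodes when serial_fraction of the work cannot be parallelized."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_nodes)


def run_time(n_nodes: int, t_serial: float, t_parallel: float, t_comm_per_node: float) -> float:
    """Assumed model: serial part + evenly divided parallel part + communication growing with N."""
    return t_serial + t_parallel / n_nodes + t_comm_per_node * n_nodes


def best_node_count(max_nodes: int, t_serial: float, t_parallel: float,
                    t_comm_per_node: float) -> int:
    """Node count in 1..max_nodes that minimizes run_time under the model above."""
    return min(range(1, max_nodes + 1),
               key=lambda n: run_time(n, t_serial, t_parallel, t_comm_per_node))


if __name__ == "__main__":
    # With 5% serial work the speedup can never exceed 1 / 0.05 = 20.
    for n in (8, 64, 1024):
        print(n, "nodes:", round(amdahl_speedup(n, 0.05), 2))
    # With communication cost, the optimum sits near sqrt(t_parallel / t_comm_per_node).
    n_best = best_node_count(64, t_serial=1.0, t_parallel=100.0, t_comm_per_node=1.0)
    print("best node count:", n_best, "continuous estimate:", math.sqrt(100.0 / 1.0))
```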
"Real" parallel jobs (ones with nontrivial communications built on MPI or PVM or raw sockets or even shared memory or some sort of specialized communications channel) almost never do this well, and more often than not will only speed up at all up to some maximum number of nodes and then actually run more slowly if further partitioned. It's also interesting that master-slave jobs were cited as being "real" parallel applications, as in many cases the master is nothing more than an intelligent front end for an embarrassingly parallel application core. What's the difference between using a script or MOSIX or even a bunch of rsh's as the "master" that distributes the jobs and collects the results, and using PVM to do exactly the same thing? Not much, really, but perhaps a small edge in network efficiency for that part of things. This may matter -- if the jobs run a short time and communicate with the master a long time, it will matter -- but in cases where this paradigm makes sense at all (where the ratio of run to communication is the other way around -- lots of computation, a little communication) it won't matter much. Most of this is in any decent book on parallel computing, including at least one that is freely available on the web. Then there is my online book (which I make no claim for being "decent", but it is free :-). Lots of these resources are on or linked to various cluster sites, including: http://www.phy.duke.edu/brahma rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From rickey-co at mug.biglobe.ne.jp Thu Apr 11 15:38:24 2002 From: rickey-co at mug.biglobe.ne.jp (Iwao Makino) Date: Tue Mar 16 01:02:19 2010 Subject: very high bandwidth, low latency manner?
In-Reply-To: <20020411132650.A32674@hpti.com> References: <3CB5DA04.AF2C609F@lfbs.rwth-aachen.de> <20020411132650.A32674@hpti.com> Message-ID: AFAIK, 16 nodes is the 'best buy' point for per-node cost. About $3,500 per node is a good guess for 16 nodes, including software licensing and hardware (switch, cards and cables). Actually, 16 nodes cost a little less, but 64/128 nodes are in about that range. For larger systems (256/512/1024 and beyond), the general idea is $5,000/node. These are their quoted prices, so I assume some volume discounts would apply at larger scales. And yes, we should be able to purchase from vendors other than Compaq, but I haven't had those details answered yet. At 13:26 -0600 11.04.2002, Craig Tierney wrote: >I talked to a guy at SC2002 from Quadrics and he said >that list pricing on a Quadrics network was about $3500 >per node when you are in the 100s of nodes and up. >The price includes the cards, cables, switches, >etc. This doesn't include any sort of discount that you >might get. Myrinet is about $2000 for an equivalent >network at list price. Dolphin/SCI falls around $2245 list >per node (if the system is > 144 nodes and you have to get >the 3d card). Dolphin/SCI for smaller systems (<144 nodes?) starts from $1,695 per node, and larger ones with the 3D chain from $2,245 list. I haven't tested this new 3D version yet. >I heard that Quadrics had a customer that just had to have >an Intel/Quadrics system so either they or he was working >on porting the drivers. The web page says they support >Linux and Tru64. You could probably get the hardware without >going through Compaq, but Compaq is most likely buying up >most of the supply. I know they work on ServerWorks HE and i860 Xeon; they are also working on Plumas and GC-LE. -- Best regards, Iwao Makino Hard Data Ltd.
Tokyo branch mailto:iwao@harddata.com http://www.harddata.com/ --> Now Shipping 1U Dual Athlon DDR <- --> Ask me about the new Alpha DDR UP1500 Systems <- From emiller at techskills.com Thu Apr 11 16:52:04 2002 From: emiller at techskills.com (Eric Miller) Date: Tue Mar 16 01:02:19 2010 Subject: (no subject) In-Reply-To: Message-ID: >And not even for SETI will you get an Nx speedup on N nodes. There is >ALWAYS a serial fraction even for embarrassingly parallel applications, >and the time required to send the jobs out to the nodes I guess what I am fundamentally saying is this: to see whether a cluster is "working its magic" in a student's eyes, consider two scenarios: 1- Running N iterations of a program, and seeing N times the work being done. It's like, um... well yeah, if I run SETI on 8 systems, then I will crunch 8 times as many units, but I will NOT crunch 1 unit in 1/8th the time as perceived on the front-end node. -OR- 2- Having an 8-node cluster running, say, a raytracer. Then, having a solo machine running the same application. Actually seeing ONE instance render an image (roughly) 8 times faster than a single system (esp. when all of the systems were pulled out of the trash can!!): THAT is the magic that newbies and students want to see. That's the "cool" factor that brings annoying #^$%s like me to this forum to post questions that are outside the arena of analyzing proteins and DNA molecules on 256-node AthlonXP rackmounts with Myrinet. We are not experts, we have A LOT of questions, and all we want to do is see Linux do something cool that we can show our friends/students/selves. Robert, thank you for your positive and informative reply. From sp at scali.com Thu Apr 11 20:29:07 2002 From: sp at scali.com (Steffen Persvold) Date: Tue Mar 16 01:02:19 2010 Subject: very high bandwidth, low latency manner?
In-Reply-To: <20020411132650.A32674@hpti.com> Message-ID: On Thu, 11 Apr 2002, Craig Tierney wrote: > > I talked to a guy at SC2002 from Quadrics and he said > that list pricing on a Quadrics network was about $3500 > per node when you are in the 100s of nodes and up. > The price includes the cards, cables, switches, > etc. This doesn't include any sort of discount that you > might get. Myrinet is about $2000 for an equivalent > network at list price. Dolphin/SCI falls around $2245 list > per node (if the system is > 144 nodes and you have to get > the 3d card). > > These are list prices for the cards only, right? What about the switches needed? AFAIK Quadrics and Myrinet both need switches and SCI doesn't (which makes the total system cost a bit lower, doesn't it?). Regards, -- Steffen Persvold | Scalable Linux Systems | Try out the world's best mailto:sp@scali.com | http://www.scali.com | performing MPI implementation: Tel: (+47) 2262 8950 | Olaf Helsets vei 6 | - ScaMPI 1.13.8 - Fax: (+47) 2262 8951 | N0621 Oslo, NORWAY | >320MBytes/s and <4uS latency From sp at scali.com Thu Apr 11 20:37:17 2002 From: sp at scali.com (Steffen Persvold) Date: Tue Mar 16 01:02:19 2010 Subject: very high bandwidth, low latency manner? In-Reply-To: Message-ID: On Thu, 11 Apr 2002, Iwao Makino wrote: > I think ... Quadrics is another one. > Yep, sorry, I forgot that one. > Here are quick figures I have on hand.... > > RH7.2, 2.4.9 kernel for i860 cluster. > On their site, they claim; > after protocol, of 340Mbytes/second in each direction. The > process-to-process latency for remote write operations is 2us, and 5us for > MPI messages. > But these 340MBytes/second and 2us latency figures are also chipset dependent, as I mentioned for SCI (in my examples latency was lowest on 760MPX but bandwidth was highest on IA64 460GX...). I can't imagine that the i860 can actually perform as well as 340MByte/sec, since the Hub-Link (between the MCH and the P64H) has a limit of 266MByte/sec (AFAIK).
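Steffen's argument is a bottleneck argument: a message cannot move faster than the slowest link on its path. A common first-order model (an assumption here, not something stated in the thread) is t(m) = latency + m / min(link bandwidths). The sketch plugs in the Quadrics figures quoted above and the roughly 266 MByte/s Hub-Link limit Steffen cites, taking 1 MByte = 10^6 bytes so that bytes divided by MByte/s comes out in microseconds.

```python
from typing import Sequence


def bottleneck_bandwidth(link_mb_per_s: Sequence[float]) -> float:
    """A transfer can go no faster than the slowest link on its path (MByte/s)."""
    return min(link_mb_per_s)


def transfer_time_us(message_bytes: int, latency_us: float,
                     link_mb_per_s: Sequence[float]) -> float:
    """First-order model: fixed latency plus message size over bottleneck bandwidth.

    With 1 MByte = 10**6 bytes, 1 MByte/s equals 1 byte per microsecond,
    so bytes / (MByte/s) comes out directly in microseconds.
    """
    return latency_us + message_bytes / bottleneck_bandwidth(link_mb_per_s)


if __name__ == "__main__":
    # Figures quoted in the thread: 5 us MPI latency and 340 MByte/s claimed for Quadrics,
    # with the i860 Hub-Link limit of about 266 MByte/s that Steffen mentions.
    path = [340.0, 266.0]
    for size in (8, 4096, 1_000_000):
        t = transfer_time_us(size, latency_us=5.0, link_mb_per_s=path)
        print(f"{size:>9} bytes: {t:10.1f} us, effective {size / t:6.1f} MByte/s")
```

Small messages are dominated by latency and large ones by the bottleneck bandwidth, which is why both numbers get quoted for every interconnect.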
> But pricing is MUCH higher than SCI/Myrinet. > Certainly. > Best regards, > > At 4:08 +0200 5.04.2002, Steffen Persvold wrote: > >On Thu, 4 Apr 2002, Jim Lux wrote: > > > >> What's high bandwidth? > >> What's low latency? > > > How much money do you want to spend? > >I don't want to start a flamewar here, but I _think_ (not knowing real > >numbers for other high-speed interconnects) that SCI has at least the > >lowest latency and maybe also the highest point-to-point bandwidth: > > > >SCI application-to-application latency: 2.5 us > >SCI application-to-application bandwidth: 325 MByte/sec > > > >Note that these numbers are very chipset specific (as most high-speed > >interconnect numbers are); these numbers are from IA64. Here are numbers > >from a popular IA32 platform, the AMD 760MPX: > > > >SCI application-to-application latency: 1.8 us > >SCI application-to-application bandwidth: 283 MByte/sec > > Regards, -- Steffen Persvold | Scalable Linux Systems | Try out the world's best mailto:sp@scali.com | http://www.scali.com | performing MPI implementation: Tel: (+47) 2262 8950 | Olaf Helsets vei 6 | - ScaMPI 1.13.8 - Fax: (+47) 2262 8951 | N0621 Oslo, NORWAY | >320MBytes/s and <4uS latency From justin at cs.duke.edu Thu Apr 11 20:54:16 2002 From: justin at cs.duke.edu (Justin Moore) Date: Tue Mar 16 01:02:19 2010 Subject: DHCP Help Again In-Reply-To: Message-ID: Hello all, Part of a project I've been working on deals with boot management, and with DHCP specifically. I'm not familiar with the NPACI solution/codebase, but I hacked a version of proxydhcp to work with a MySQL backend. It has some nice hooks in it which let you know whether the machine is booting via PXE or from dhclient/pump/whatever. I think having the DB backend is a little nicer than having to worry about the leases file (XML or not), since it gives you more fine-grained control over who has access to the information and how that information gets parsed.
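Justin's table of hardware and IP addresses and the log scraping rgb describes in the message quoted below both come down to turning (MAC, IP) pairs into dhcpd.conf `host` declarations and /etc/hosts lines. Here is a minimal Python sketch. It assumes syslog lines shaped like the `DHCPACK on <ip> to <mac> via eth0` example rgb quotes, the b01, b02, ... naming he suggests, and his rgb.private.net domain and 192.168.1.1 router as placeholder values; his `next-server` line is omitted, and all function names are hypothetical.

```python
import re

# Assumed syslog format, as in rgb's example:
# "Apr 11 08:18:09 lucifer dhcpd: DHCPACK on 192.168.1.140 to 00:20:e0:6d:a0:05 via eth0"
ACK = re.compile(r"DHCPACK on (\d{1,3}(?:\.\d{1,3}){3}) to ((?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2})")


def ip_key(ip: str):
    """Sort IPv4 addresses numerically rather than as strings."""
    return tuple(int(octet) for octet in ip.split("."))


def leases_from_log(lines):
    """Unique (mac, ip) pairs from DHCPACK lines, ordered by IP address."""
    first_ip = {}
    for line in lines:
        match = ACK.search(line)
        if match:
            ip, mac = match.group(1), match.group(2).lower()
            first_ip.setdefault(mac, ip)  # keep the first address each MAC was given
    return sorted(first_ip.items(), key=lambda pair: ip_key(pair[1]))


def host_stanza(name, mac, ip, domain, router):
    """One dhcpd.conf host declaration in the style of rgb's example."""
    return (f"host {name} {{\n"
            f"  hardware ethernet {mac};\n"
            f"  fixed-address {ip};\n"
            f"  option routers {router};\n"
            f'  option domain-name "{domain}";\n'
            f'  option host-name "{name}";\n'
            f"}}\n")


def generate(lines, prefix="b", domain="rgb.private.net", router="192.168.1.1"):
    """Return (dhcpd.conf fragment, /etc/hosts fragment), naming nodes b01, b02, ..."""
    conf, hosts = [], []
    for number, (mac, ip) in enumerate(leases_from_log(lines), start=1):
        name = f"{prefix}{number:02d}"
        conf.append(host_stanza(name, mac, ip, domain, router))
        hosts.append(f"{ip} {name}.{domain} {name}\n")
    return "".join(conf), "".join(hosts)


if __name__ == "__main__":
    import sys
    if len(sys.argv) > 1:  # e.g. python make_hosts.py /var/log/messages (hypothetical name)
        with open(sys.argv[1]) as log:
            conf_text, hosts_text = generate(log)
        print(conf_text)
        print(hosts_text)
```

Sorting by IP means that if the nodes were booted in order from a dhcpd `range`, b01 is the first machine powered on, matching the boot-in-sequence approach discussed in the thread.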
Plus it can detect when a new host is coming up and add a mapping to the DB without requiring you to parse through /var/log/messages. :) Obviously my code has some parts which are somewhat project-specific for me (I don't think everyone wants to boot off the same ramdisk I do by default :)), but I could post the code in a few weeks (deadlines coming up) if anyone's interested in such a beast. Another nice part of the DB backend is that generating a future dhcpd.conf file is pretty easy: mysql_query("SELECT HWaddr,IPaddr FROM nics ORDER BY IPaddr"); and then spew the output to a file as desired. :) -jdm Department of Computer Science, Duke University, Durham, NC 27708-0129 Email: justin@cs.duke.edu On Thu, 11 Apr 2002, Robert G. Brown wrote: > On Wed, 10 Apr 2002 tegner@nada.kth.se wrote: > > > Very helpful! Thanks! > > > > But I'm still curious about how you make - automagically - the hardware ethernet > > line in dhcpd.conf initially. Say you have 100 machines. One way I would think > > of would be to use kickstart and: > > > > Install the machines and boot them up in sequence, using the range statement > > in dhcpd.conf (so that the first machine gets 192.168.1.101, the second > > 192.168.1.102 ...). > > > > Once all nodes are up, use some script to extract the MAC addresses for all the > > nodes and either modify dhcpd.conf - or - discard DHCP completely and > > hardwire the IP addresses to each node. > > > > But I'm sure there are better ways to do this? > > Not that I know of. Maybe somebody else knows of one. I'd just use > perl or bash (either would probably work, although parsing is generally > easier in perl), parse e.g.
> > Apr 11 08:18:09 lucifer dhcpd: DHCPREQUEST for 192.168.1.140 from 00:20:e0:6d:a0:05 via eth0 > Apr 11 08:18:09 lucifer dhcpd: DHCPACK on 192.168.1.140 to 00:20:e0:6d:a0:05 via eth0 > > from /var/log/messages on the dhcp server, and write an output routine > to generate > > # golem (Linux/Windows laptop lilith, second/100BT interface) > host golem { > hardware ethernet 00:20:e0:6d:a0:05; > fixed-address 192.168.1.140; > next-server 192.168.1.131; > option routers 192.168.1.1; > option domain-name "rgb.private.net"; > option host-name "golem"; > } > > and > > 192.168.1.140 golem.rgb.private.net golem > > and append them to /etc/dhcpd.conf and /etc/hosts respectively, and then > distribute copies of the resulting /etc/hosts. As Josip made > eloquently clear, your private internal network should resolve > consistently on all PIN hosts and probably should have SOME sort of > domainname defined, so that software that might include a > getdomainbyname() call, and might not include an adequate check and > handling of a null value, can cope. It's hard to know what assumptions > were made by the designer of every single piece of network software you > might want to run... > > Of course you'll probably want to do the b01, b02, b03... hostname > iteration -- I'm just pulling an example at random out of my own log > tables. > > rgb > > -- > Robert G. Brown http://www.phy.duke.edu/~rgb/ > Duke University Dept. of Physics, Box 90305 > Durham, N.C.
27708-0305 > Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From hartner at cs.utah.edu Thu Apr 11 20:55:00 2002 From: hartner at cs.utah.edu (Mark Hartner) Date: Tue Mar 16 01:02:19 2010 Subject: (no subject) In-Reply-To: Message-ID: > analyzing proteins and DNA molecules on 256-node AthlonXP rackmounts with > Myrinet. We are not experts, we have A LOT of questions, and all we want to > do is see Linux do something cool that we can show our > friends/students/selves. How about encoding some MP3s? www.osl.ui.edu/~jsquyres/bladeenc/ Mark From patrick at myri.com Thu Apr 11 22:40:08 2002 From: patrick at myri.com (Patrick Geoffray) Date: Tue Mar 16 01:02:19 2010 Subject: very high bandwidth, low latency manner? References: Message-ID: <3CB67338.8080906@myri.com> Steffen Persvold wrote: >>I talked to a guy at SC2002 from Quadrics and he said >>that list pricing on a Quadrics network was about $3500 >>per node when you are in the 100s of nodes and up. >>The price includes the cards, cables, switches, >>etc. This doesn't include any sort of discount that you >>might get. Myrinet is about $2000 for an equivalent >>network at list price. Dolphin/SCI falls around $2245 list >>per node (if the system is > 144 nodes and you have to get >>the 3d card). > > These are list prices for the cards only, right? Not for Myrinet. Actually, $2000 per node is the total cost (NIC/cable/port/software) for the high-end products (with L9/200 MHz); it should be more like $1500 for low-end ones. Craig is spoiled; he only buys the top stuff :-) > What about the switches > needed? AFAIK Quadrics and Myrinet both need switches and SCI doesn't (which > makes the total system cost a bit lower, doesn't it?).
Dunno about QSW, but the NIC represents roughly 3/4 of the price per node for Myrinet. Sure, as the smallest switch has 8 ports (16-port chassis and one blade with 8 fibers), it is not interesting for very small configurations, i.e., fewer than 8 nodes, but I don't think that's Myricom's market. It's a common mistake to believe that switchless solutions are by definition cheaper. Patrick ---------------------------------------------------------- | Patrick Geoffray, Ph.D. patrick@myri.com | Myricom, Inc. http://www.myri.com | Cell: 865-389-8852 685 Emory Valley Rd (B) | Phone: 865-425-0978 Oak Ridge, TN 37830 ---------------------------------------------------------- From manel at labtie.mmt.upc.es Fri Apr 12 01:12:49 2002 From: manel at labtie.mmt.upc.es (Manel Soria) Date: Tue Mar 16 01:02:19 2010 Subject: power control Message-ID: <3CB69701.B9AC38C5@labtie.mmt.upc.es> We need a power control unit for our 72-node cluster. My first idea was to do it ourselves with a digital I/O card and a set of relays, but I can't find such a card for Linux. Actually, I have an ISA card that is perfect for this application, but for some reason it is more difficult with the PCI bus. Also, it seems that the "normal" solution is to buy a commercial APC system. Any experiences with in-house power controls? Would you recommend that we buy the APC product? -- =============================================== Dr. Manel Soria ETSEIT - Centre Tecnologic de Transferencia de Calor C/ Colom 11 08222 Terrassa (Barcelona) SPAIN Tf: +34 93 739 8287 ; Fax: +34 93 739 8101 E-Mail: manel@labtie.mmt.upc.es From manel at labtie.mmt.upc.es Fri Apr 12 01:53:09 2002 From: manel at labtie.mmt.upc.es (Manel Soria) Date: Tue Mar 16 01:02:20 2010 Subject: power control Message-ID: <3CB6A075.5AE6626@labtie.mmt.upc.es> We need a power control unit for our 72-node cluster. My first idea was to do it ourselves with a digital I/O card and a set of relays, but I can't find such a card for Linux. Actually, I have an ISA card that is perfect for this application, but for some reason it is more difficult with the PCI bus. Also, it seems that the "normal" solution is to buy a commercial APC system. Any experiences with in-house power controls? Would you recommend that we buy the APC product? -- =============================================== Dr. Manel Soria ETSEIT - Centre Tecnologic de Transferencia de Calor C/ Colom 11 08222 Terrassa (Barcelona) SPAIN Tf: +34 93 739 8287 ; Fax: +34 93 739 8101 E-Mail: manel@labtie.mmt.upc.es From suraj_peri at yahoo.com Fri Apr 12 03:16:15 2002 From: suraj_peri at yahoo.com (Suraj Peri) Date: Tue Mar 16 01:02:20 2010 Subject: What could be the performance of my cluster Message-ID: <20020412101615.77502.qmail@web10507.mail.yahoo.com> Hi group, I was calculating the performance of my cluster. The features are: 1. 8 nodes 2. Processor: AMD Athlon XP 1800+ 3. 8 CPUs 4. 8*1.5 GB DDR RAM 5. 1 server with 2 AMD MP 1800+ processors and 2GB DDR RAM. I calculated this to be 48 Mflops. Is this correct? If not, what is the correct performance of my cluster? I also calculated that my cluster would be 3 times faster than an AlphaServer DS20E (833 MHz 64-bit Alpha processor, 4 GB max memory). Is my calculation correct or wrong? Please help me ASAP. Thanks in advance. Cheers, Suraj. ===== PIL/BMB/SDU/DK ===== PIL/BMB/SDU/DK From suraj_peri at yahoo.com Fri Apr 12 03:32:34 2002 From: suraj_peri at yahoo.com (Suraj Peri) Date: Tue Mar 16 01:02:20 2010 Subject: What could be the performance of my cluster In-Reply-To: <20020411125920.D32605@hpti.com> Message-ID: <20020412103234.7580.qmail@web10508.mail.yahoo.com> Hi Craig, Many thanks for your mail. Please excuse me for asking a dumb question; I am a novice in this area. I am interested in using this cluster for BLAST. I want to store ESTs (Expressed Sequence Tags), GenBank (nucleotide sequence database), GenPept (protein sequence database) and the total predicted protein set of the human genome, and I will run BLAST (Basic Local Alignment Search Tool) on this cluster. Because these computations are intensive and time-consuming, I wanted to compare the computing abilities of the AlphaServer DS20E and my cluster; no one in my circle of friends knows about this. Please help me if you have used clusters for BLAST. Thanks, Suraj. --- Craig Tierney wrote: > It depends on what you are trying to do (doesn't everyone > love that answer?). > > The number of flops your cluster can do should be equal to: > > flops = (no. of CPUs) * (MHz) * (flops per Hz) > > So for your cluster: > > flops = 8 * 1.53 GHz * 2 > > I am assuming that with SSE you can get 2 flops per cycle. > > flops = 24.48 Gflops > > Now, there are some issues with this. First, you are never > going to get 1.53*2 Gflops out of a single processor. Second, > leveraging all 8 CPUs to get their maximum is going to be > difficult if there is any communication between the nodes. > > Compilers play a big role in extracting the best performance > out of the system. If you don't have a commercial compiler > from the likes of Intel or Portland Group, I highly recommend > getting one. You only have to purchase the compiler for where > you compile, and not where you run. You can get away with
You can get > away with > one copy of the compiler on your server. > > If you are trying to compare the AMD system to the > DS20E system, > it will depend on what you are actually trying to > do. If > you are running single precision floating point > codes that do > not require all the memory bandwidth a DS20E > provides, I would > think that within 10% that AMD processor will do the > work > of one 833 Mhz Alpha Cpu (You didn't say if you had > 2 cpus > in your DS20e). At least this is what I am seeing > for my codes when comparing Dual Xeon's, Dual AMD's, > and > dual API 833 boxes. > > Craig > > > > > > On Sat, Apr 06, 2002 at 03:35:45AM -0800, Suraj Peri > wrote: > > Hi group, > > I was calculating the performance of my cluster. > The > > features are > > > > 1. 8 nodes > > 2. Processor: AMD Athlon XP 1800+ > > 3. 8 CPUs > > 4. 8*1.5 GB DDR RAM > > 5. 1 Server with 2 processorts with AMD MP 1800+ > and > > 2GB DDR RAM > > > > I calculated this to be 48 Mflops . Is this > correct ? > > if not, what is the correct performance of my > cluster. > > I also comparatively calculated that my cluster > would > > be 3 times faster than AlphaServer DS20E ( 833 MHz > > alpha 64 bit processor, 4 GB max memory) > > > > Is my calculation correct or wrong? please help me > > ASAP. thanks in advance. > > > > cheers > > suraj. > > > > ===== > > PIL/BMB/SDU/DK > > > > __________________________________________________ > > Do You Yahoo!? > > Yahoo! Tax Center - online filing with TurboTax > > http://taxes.yahoo.com/ > > _______________________________________________ > > Beowulf mailing list, Beowulf@beowulf.org > > To change your subscription (digest mode or > unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > > -- > Craig Tierney (ctierney@hpti.com) ===== PIL/BMB/SDU/DK __________________________________________________ Do You Yahoo!? Yahoo! 
From rickey-co at mug.biglobe.ne.jp Thu Apr 11 20:54:59 2002 From: rickey-co at mug.biglobe.ne.jp (Iwao Makino) Date: Tue Mar 16 01:02:20 2010 Subject: very high bandwidth, low latency manner? In-Reply-To: References: Message-ID: At 5:29 +0200 12.04.2002, Steffen Persvold wrote: >On Thu, 11 Apr 2002, Craig Tierney wrote: > >> >> I talked to a guy at SC2002 from Quadrics and he said >> that list pricing on a Quadrics network was about $3500 >> per node when you are in the 100s of nodes and up. >> The price includes the cards, cables, switches, >> etc. This doesn't include any sort of discount that you >> might get. Myrinet is about $2000 for an equivalent >> network at list price. Dolphin/SCI falls around $2245 list >> per node (if the system is > 144 nodes and you have to get >> the 3d card). >> >> > >These are list prices for the cards only, right? What about the switches >needed? AFAIK Quadrics and Myrinet both need switches and SCI doesn't (which >makes the total system cost a bit lower, doesn't it?). As said above, the QsNet figure is a per-node price including everything. So is the Myrinet figure, so SCI/Dolphin and Myrinet are about equal. SCI has the good idea of not using switches, but on the other hand, it is a little more complex to connect. -- Best regards, Iwao Makino Hard Data Ltd. Tokyo branch mailto:iwao@harddata.com http://www.harddata.com/ --> Now Shipping 1U Dual Athlon DDR <- --> Ask me about the new Alpha DDR UP1500 Systems <- From sp at scali.com Fri Apr 12 06:10:34 2002 From: sp at scali.com (Steffen Persvold) Date: Tue Mar 16 01:02:20 2010 Subject: very high bandwidth, low latency manner?
In-Reply-To: Message-ID: On Fri, 12 Apr 2002, Iwao Makino wrote: > At 5:29 +0200 12.04.2002, Steffen Persvold wrote: > >On Thu, 11 Apr 2002, Craig Tierney wrote: > > > >> > >> I talked to a guy at SC2002 from Quadrics and he said > >> that list pricing on a Quadrics network was about $3500 > >> per node when you are in the 100s of nodes and up. > >> The price includes the cards, cables, switches, > >> etc. This doesn't include any sort of discount that you > >> might get. Myrinet is about $2000 for an equivalent > >> network at list price. Dolphin/SCI falls around $2245 list > >> per node (if the system is > 144 nodes and you have to get > >> the 3d card). > >> > >> > > > >These are list prices for the cards only, right? What about the switches > >needed? AFAIK Quadrics and Myrinet both need switches and SCI doesn't (which > >makes the total system cost a bit lower, doesn't it?). > > As said above, the QsNet figure is a per-node price including everything. > So is the Myrinet figure, so SCI/Dolphin and Myrinet are about equal. > Yes, sorry, I missed that statement :) > SCI has the good idea of not using switches, but on the other hand, it is > a little more complex to connect. > True, but if one of your Myrinet switches breaks down, you lose 64 nodes in a 256-node system (standard "CLOS" configuration). I don't know the MTBF for Myrinet switches, but I would expect it to be rather high (redundant power supplies?). Please don't misunderstand me: I find the Myrinet interconnect very interesting and also competitive with SCI, both from a technological point of view and with respect to pricing. The only thing this list is lacking is some head-to-head performance comparisons of the different interconnects, e.g., some NAS benchmarks and maybe also PMB.
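The latency and bandwidth figures traded in this thread are normally measured with a ping-pong test, as in the PingPong benchmark of PMB. Purely to show the method, here is a minimal Python sketch over a local socket pair. It measures the host's socket stack, not any interconnect, and the function names are hypothetical. It reports half the round-trip time as the one-way latency, and message size divided by that time as bandwidth. The `recv_exact` helper also illustrates the kind of buffering detail Erik mentions when replacing MPI_Send with plain sockets.

```python
import socket
import threading
import time


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes; a single recv() may legally return fewer."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf.extend(chunk)
    return bytes(buf)


def echo_server(sock: socket.socket, size: int, reps: int) -> None:
    """'Pong' side: bounce each message straight back."""
    for _ in range(reps):
        sock.sendall(recv_exact(sock, size))


def ping_pong(size: int, reps: int = 1000):
    """Return (one-way latency in us, bandwidth in MByte/s) for messages of `size` bytes."""
    a, b = socket.socketpair()
    server = threading.Thread(target=echo_server, args=(b, size, reps))
    server.start()
    payload = b"x" * size
    start = time.perf_counter()
    for _ in range(reps):
        a.sendall(payload)
        recv_exact(a, size)
    elapsed = time.perf_counter() - start
    server.join()
    a.close()
    b.close()
    one_way_s = elapsed / (2 * reps)  # half a round trip
    return one_way_s * 1e6, size / one_way_s / 1e6


if __name__ == "__main__":
    for size in (8, 65536):
        lat_us, bw = ping_pong(size, reps=200)
        print(f"{size:>6} bytes: {lat_us:8.1f} us one-way, {bw:8.1f} MByte/s")
```

A real head-to-head comparison would run an MPI benchmark on the actual hardware, over a range of message sizes, rather than anything like this.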
Regards, -- Steffen Persvold | Scalable Linux Systems | Try out the world's best mailto:sp@scali.com | http://www.scali.com | performing MPI implementation: Tel: (+47) 2262 8950 | Olaf Helsets vei 6 | - ScaMPI 1.13.8 - Fax: (+47) 2262 8951 | N0621 Oslo, NORWAY | >320MBytes/s and <4uS latency From sp at scali.com Fri Apr 12 06:15:37 2002 From: sp at scali.com (Steffen Persvold) Date: Tue Mar 16 01:02:20 2010 Subject: power control In-Reply-To: <3CB6A075.5AE6626@labtie.mmt.upc.es> Message-ID: On Fri, 12 Apr 2002, Manel Soria wrote: > > We need a power control unit for our 72-node cluster. My first > idea was to do it ourselves with a digital I/O card and a set of relays, but > I can't find such a card for Linux. Actually, I have an ISA card that > is perfect for this application, but for some reason it is more difficult > with the PCI bus. Also, it seems that the "normal" solution is to buy a > commercial APC system. > > Any experiences with in-house power controls? Would you > recommend that we buy the APC product? > We've had success with the Baytech (www.baytechdcd.com) RPC-3 units. The only disadvantage is that they only have 8 controllable ports (which means that you need 9 of them...). Regards, -- Steffen Persvold | Scalable Linux Systems | Try out the world's best mailto:sp@scali.com | http://www.scali.com | performing MPI implementation: Tel: (+47) 2262 8950 | Olaf Helsets vei 6 | - ScaMPI 1.13.8 - Fax: (+47) 2262 8951 | N0621 Oslo, NORWAY | >320MBytes/s and <4uS latency From rgb at phy.duke.edu Fri Apr 12 06:17:57 2002 From: rgb at phy.duke.edu (Robert G. Brown) Date: Tue Mar 16 01:02:20 2010 Subject: (no subject) In-Reply-To: Message-ID: On Thu, 11 Apr 2002, Eric Miller wrote: > Myrinet. We are not experts, we have A LOT of questions, and all we want to > do is see Linux do something cool that we can show our > friends/students/selves. > > Robert, thank you for your positive and informative reply.
I appreciate your interest and understand your goals, actually; whenever I present beowulfery to a new group (which I seem to do three or four times a year) I do exactly the same thing -- a bit of a dog and pony show. In addition to the PVM povray, I like the PVM Mandelbrot set demo (xep), which I've hacked so the colormap is effectively deeper and so that it doesn't run out of floating point room so rapidly. I've been using or playing with Mandelbrot set demo programs long enough that I can remember when it would take a LONG time to update a single rubberbanded section. Nowadays one can quickly enough get to the bottom of double precision resolution even on a single CPU -- 13 digits isn't really all that many when you rubberband down close to an order of magnitude at a time. Still, with even a small cluster you can get nearly linear speedup and actually "see" the nodes returning their independent strips -- if you have a mix of "slow" nodes and faster ones you can even learn some useful things about parallel programming just watching them come in and discussing what you see. The only point I was making is that your class should definitely take the time to go over at least Amdahl's law and one of the improved estimates that account for both the serial fraction and the communications time, and get some understanding of the embarrassingly parallel (SETI, distributed Monte Carlo) -> coarse grained, non-synchronous (pvmpov, xep) -> coarse grained, synchronous (lattice partitioned Monte Carlo) -> medium-to-fine grained, (non-)synchronous (galactic evolution, weather models) sequencing, where for each step up the chain one has to exercise additional care in engineering an effective cluster to deal with it. EP chores (as Erik pointed out) are "perfect" for a cluster because "any" cluster or parallel computer, including the simplest SMP boxes, will do.
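The strip decomposition behind Mandelbrot demos like the xep program rgb describes is a good illustration of a coarse-grained, non-synchronous task: every strip is computed independently, and the "master" only stitches the results together. Below is a minimal Python sketch, not xep itself, with a local process pool standing in for PVM-managed nodes; the image size, region and iteration limit are arbitrary illustrative values.

```python
from concurrent.futures import ProcessPoolExecutor

WIDTH, HEIGHT, MAX_ITER = 80, 40, 200
X0, X1, Y0, Y1 = -2.0, 1.0, -1.2, 1.2  # region of the complex plane (illustrative)


def escape_count(c: complex) -> int:
    """Iterations before z -> z*z + c leaves |z| <= 2 (MAX_ITER if it never does)."""
    z = 0j
    for i in range(MAX_ITER):
        z = z * z + c
        if abs(z) > 2.0:
            return i
    return MAX_ITER


def compute_strip(rows):
    """Worker task: compute a contiguous band of image rows; needs no data from other strips."""
    row_start, row_end = rows
    strip = []
    for row in range(row_start, row_end):
        y = Y0 + (Y1 - Y0) * row / (HEIGHT - 1)
        strip.append([escape_count(complex(X0 + (X1 - X0) * col / (WIDTH - 1), y))
                      for col in range(WIDTH)])
    return strip


def strips(n_strips: int):
    """Split HEIGHT rows into n_strips contiguous (start, end) bands."""
    bounds = [HEIGHT * k // n_strips for k in range(n_strips + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


def render(n_workers: int = 4, n_strips: int = 8):
    """'Master': farm strips out to workers and stitch the rows back together in order."""
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        parts = pool.map(compute_strip, strips(n_strips))  # map preserves strip order
        return [row for part in parts for row in part]


if __name__ == "__main__":
    for row in render():
        print("".join("#" if v == MAX_ITER else "." for v in row))
```

Splitting the image into more strips than workers lets a faster worker pick up extra strips while a slower one is still busy, which is the kind of behavior rgb suggests watching for on a mix of slow and fast nodes.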
Coarse grained tasks will also generally run well on a "standard" Linux cluster -- a bunch of boxes on a network, where the kind of network and whether the boxes are workstations, desktops in active use, or dedicated nodes doesn't much matter. When you hit synchronous tasks in general, but especially the finer grained synchronous tasks (tasks where all nodes have to complete a parallel computation sequence -- reach a "barrier" -- and then exchange information before beginning the next parallel computation sequence), then you really have to start paying attention to the network (latency and bandwidth both); it helps to have dedicated nodes that AREN'T doing double duty as workstations (since the rate of progress is determined by the slowest node); and most of these tasks have a strict upper bound on the number of nodes that one can assign to a task and still decrease the time of completion. This last point is a very important one. It is easy to see a coarse grained task speed up N-fold on N nodes and conclude that all problems can then be solved faster if we just add more nodes. Make sure that your students see that this is not so, so that if they ever DO engineer a compute cluster to accomplish some particular task, they don't just buy lots of nodes, but instead do the arithmetic first... rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From bob at drzyzgula.org Fri Apr 12 06:43:31 2002 From: bob at drzyzgula.org (Bob Drzyzgula) Date: Tue Mar 16 01:02:20 2010 Subject: power control In-Reply-To: <3CB6A075.5AE6626@labtie.mmt.upc.es> References: <3CB6A075.5AE6626@labtie.mmt.upc.es> Message-ID: <20020412094331.I20839@www2> We've had good luck with the Pulizzi "Z-line" controllers: http://www.pulizzi.com/ Their high-end units can be networked into an RS-485 chain, so that dozens of units can be controlled from a single serial interface.
--BOb On Fri, Apr 12, 2002 at 10:53:09AM +0200, Manel Soria wrote: > > We need a power control unit for our 72-node cluster. My first > idea was to do it ourselves with a digital I/O card and a set of relays, but > I can't find such a card for Linux. Actually, I have an ISA card that > is perfect for this application, but for some reason with the PCI bus it > is more difficult. Also, it seems that the "normal" solution is to buy a > commercial APC system. > > Any experiences with in-house power controls? Would you > recommend that we buy the APC product? > > -- > =============================================== > Dr. Manel Soria > ETSEIT - Centre Tecnologic de Transferencia de Calor > C/ Colom 11 08222 Terrassa (Barcelona) SPAIN > Tf: +34 93 739 8287 ; Fax: +34 93 739 8101 > E-Mail: manel@labtie.mmt.upc.es From timm at fnal.gov Fri Apr 12 07:19:07 2002 From: timm at fnal.gov (Steven Timm) Date: Tue Mar 16 01:02:20 2010 Subject: power control In-Reply-To: <3CB69701.B9AC38C5@labtie.mmt.upc.es> Message-ID: We run with the APC units on our cluster. We use the vertical-mount strips, which are good for 20 amps apiece. They provide three major benefits. First, you can sequence the power-up with a delay so that the power draw of all systems coming on at once doesn't saturate your circuit. Second, you can remotely reset a node that is hung and not responding without going into the computer room. Third, you get a real-time monitor of how much current your systems are drawing. They are somewhat expensive, but APC does have good discounts for educational institutions and non-profits. Steve Timm ------------------------------------------------------------------ Steven C.
Timm (630) 840-8525 timm@fnal.gov http://home.fnal.gov/~timm/ Fermilab Computing Division/Operating Systems Support Scientific Computing Support Group--Computing Farms Operations On Fri, 12 Apr 2002, Manel Soria wrote: > We need a power control unit for our 72-node cluster. My first > idea was to do it ourselves with a digital I/O card and a set of relays, but > I can't find such a card for Linux. Actually, I have an ISA card that > is perfect for this application, but for some reason with the PCI bus it > is more difficult. Also, it seems that the "normal" solution is to buy a > commercial APC system. > > Any experiences with in-house power controls? Would you > recommend that we buy the APC product? > > -- > =============================================== > Dr. Manel Soria > ETSEIT - Centre Tecnologic de Transferencia de Calor > C/ Colom 11 08222 Terrassa (Barcelona) SPAIN > Tf: +34 93 739 8287 ; Fax: +34 93 739 8101 > E-Mail: manel@labtie.mmt.upc.es > > > From muno at aem.umn.edu Fri Apr 12 07:54:05 2002 From: muno at aem.umn.edu (Ray Muno) Date: Tue Mar 16 01:02:20 2010 Subject: power control In-Reply-To: References: <3CB6A075.5AE6626@labtie.mmt.upc.es> Message-ID: <20020412095405.A10200@aem.umn.edu> We are using a variety of Baytech power control units in our 2 clusters. We have 3 RPC28 units powering 48 1U dual PIII machines in 2 racks. They have 30A inputs and 21 outlets (20 controlled, 1 always on). With the dual PIII boxes, I can only run 16 from each strip. Each strip is divided into a pair of 15A segments, 8 machines per segment. At full load, 10 machines were too much for a 15A segment. I was really surprised at the increase in power draw under load when we first started running these. In addition, there are 2 RPC4-20 units running the disk arrays, server boxes, and Ethernet and Myrinet switches in the 2 racks. All told, 130A is available in the 2 racks, all pretty well utilized. We also have a pair of RPC3-20 units powering 2 racks of Alpha machines.
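Ray's per-segment figures bound the loaded per-node draw: 10 machines were too much for a 15A segment (so more than 1.5 A each), while 8 fit (so at most 1.875 A each). The minimal Python sketch below does that arithmetic and sizes a segment for an assumed draw; the 1.6 A figure and the 80% derating factor are illustrative assumptions (some sites budget only part of a breaker's rating for continuous loads), not measurements from this thread, and the function names are hypothetical.

```python
def implied_draw_bounds(segment_amps: float, nodes_ok: int,
                        nodes_too_many: int) -> tuple[float, float]:
    """Per-node current bounds implied by 'nodes_ok fit on the segment, nodes_too_many did not'."""
    lower = segment_amps / nodes_too_many  # overloaded: each node draws more than this
    upper = segment_amps / nodes_ok        # fit: each node draws at most this
    return lower, upper


def max_nodes_per_segment(segment_amps: float, amps_per_node: float,
                          derate: float = 1.0) -> int:
    """Whole number of nodes whose combined draw stays within the derated segment rating."""
    return int((segment_amps * derate) // amps_per_node)


if __name__ == "__main__":
    print("implied per-node draw (A):", implied_draw_bounds(15.0, 8, 10))
    assumed_draw = 1.6  # ASSUMPTION: hypothetical loaded draw per dual-PIII node
    print("per 15 A segment, no derating:", max_nodes_per_segment(15.0, assumed_draw))
    print("per 15 A segment, 80% derating:", max_nodes_per_segment(15.0, assumed_draw, 0.8))
```

At the upper bound of 1.875 A per node, a 30A strip holds exactly 16 nodes, which matches the 16 machines per strip reported here. The number to plug in is the draw at full load, not the idle figure.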
The RPC3-20 units have Ethernet interfaces, but we decided later it was not worth the added cost. I could not be happier with these units. They can be configured to stage the startup of the machines in sequence so you do not try to power up 48 machines all at one time. On Fri, Apr 12, 2002 at 03:15:37PM +0200, Steffen Persvold wrote: > On Fri, 12 Apr 2002, Manel Soria wrote: > > > > > We need a power control unit for our 72-node cluster. My first > > idea was to do it ourselves with a digital I/O card and a set of relays, but > > I can't find such a card for Linux. Actually, I have an ISA card that > > is perfect for this application, but for some reason with the PCI bus it > > is more difficult. Also, it seems that the "normal" solution is to buy a > > commercial APC system. > > > > Any experiences with in-house power controls? Would you > > recommend that we buy the APC product? > > > > We've had success with the Baytech (www.baytechdcd.com) RPC-3 units. The > only disadvantage is that they have only 8 controllable ports (which > means that you need 9 of them...). > > Regards, > -- > Steffen Persvold | Scalable Linux Systems | Try out the world's best > mailto:sp@scali.com | http://www.scali.com | performing MPI implementation: > Tel: (+47) 2262 8950 | Olaf Helsets vei 6 | - ScaMPI 1.13.8 - > Fax: (+47) 2262 8951 | N0621 Oslo, NORWAY | >320MBytes/s and <4uS latency ============================================================================= Ray Muno http://www.aem.umn.edu/people/staff/muno University of Minnesota e-mail: muno@aem.umn.edu Aerospace Engineering and Mechanics Phone: (612) 625-9531 110 Union St. S.E.
FAX: (612) 626-1558 Minneapolis, Mn 55455 ============================================================================= From patrick at myri.com Fri Apr 12 08:17:08 2002 From: patrick at myri.com (Patrick Geoffray) Date: Tue Mar 16 01:02:20 2010 Subject: very high bandwidth, low latency manner? References: Message-ID: <3CB6FA74.6040704@myri.com> Steffen Persvold wrote: > True, but if one of your Myrinet switches breaks down you lose 64 nodes > in a 256 node system (standard "CLOS" configuration). I don't know the > MTBF for Myrinet switches, but I would expect it to be rather high > (redundant power supplies?). The calculated MTBF of the switches is 50+ years. Actually, if all 6 fans fail, it will still work; the switch will then drop more and more packets, and then the microcontroller will shut down the blades one by one if they reach the critical temperature limit. If there is a failure on a blade itself, it will affect only 8 ports. If there is a failure in a crossbar on the backplane, the mapper will use a redundant route (there are as many redundant routes as crossbars, so a failure in each of the 8 crossbars on the backplane is required to lose all ports). Chuck gave a very nice talk at Cluster2001 about Clos topology. It presents things very clearly; I like it a lot: http://www.cacr.caltech.edu/cluster2001/program/talks/seitz.pdf Regards. Patrick ---------------------------------------------------------- | Patrick Geoffray, Ph.D. patrick@myri.com | Myricom, Inc.
http://www.myri.com | Cell: 865-389-8852 685 Emory Valley Rd (B) | Phone: 865-425-0978 Oak Ridge, TN 37830 ---------------------------------------------------------- From eugen at leitl.org Fri Apr 12 08:55:02 2002 From: eugen at leitl.org (Eugen Leitl) Date: Tue Mar 16 01:02:20 2010 Subject: [gamma_sw] New release of GAMMA available (fwd) Message-ID: ---------- Forwarded message ---------- Date: Fri, 12 Apr 2002 17:34:47 +0200 (MET DST) From: Giuseppe Ciaccio To: GAMMA mailing list Subject: [gamma_sw] New release of GAMMA available A wonderful new release of GAMMA is available for download: http://www.disi.unige.it/project/gamma, section "How to install" Main features: 1) A driver for the Netgear GA621/GA622 Gigabit Ethernet adapter is now provided. The driver has been excellently implemented by Marco Ehlert (mehlert@cs.uni-potsdam.de), with the support of Prof. Bettina Schnor (schnor@cs.uni-potsdam.de) and in cooperation with myself (supported by Prof. Schnor during a nice research stay at Potsdam). I thank Marco and Bettina very much for this beautiful experience. The driver has been tested on a 16-node cluster using the GA621 adapters. We have not yet tested the GA622 (which should be backward-compatible). Performance numbers are impressive. The Potsdam testbed was a pair of back-to-back connected PCs, each with:
  CPU: Intel Pentium III 1 GHz
  motherboard: SuperMicro 370DE6 (chipset: ServerSet III HE-SL)
  133 MHz FSB
  PCI bus 66 MHz, 64 bit
  Netgear GA621 adapter, dedicated to GAMMA
  Linux 2.4.16 + GAMMA
On such a testbed, Marco got the following numbers:
  MTU size   Latency (usec)   Throughput (MByte/s)
  1500       8.5              118.5
  4116       8.5              122
2) Minor changes to the GAMMA user API: the family of set_port() routines has been slightly rearranged. This has implications for MPI/GAMMA, a new release of which is also available for download: http://www.disi.unige.it/project/gamma/mpigamma Older versions of MPI/GAMMA will no longer compile under the current version of GAMMA.
3) Documentation has been updated. The mysterious lock-up problems reported by someone on this mailing list might have been caused by the use of gcc 2.96. Still investigating... but I'm not yet able to reproduce the bug here (because I don't use gcc 2.96?). Enjoy! Giuseppe Ciaccio http://www.disi.unige.it/person/CiaccioG/ DISI - Universita' di Genova via Dodecaneso 35 16146 Genova, Italy phone +39 10 353 6638 fax +39 010 3536699 ciaccio@disi.unige.it ------------------------------------------------------------------------ _______________________________________________ gamma_sw mailing list gamma_sw@lists.dsi.uniroma1.it http://lists.dsi.uniroma1.it/mailman/listinfo/gamma_sw From hungjunglu at yahoo.com Fri Apr 12 08:56:35 2002 From: hungjunglu at yahoo.com (Hung Jung Lu) Date: Tue Mar 16 01:02:20 2010 Subject: BLAS-1, AMD, Pentium, gcc Message-ID: <20020412155635.92564.qmail@web12605.mail.yahoo.com> Hi, I am thinking of migrating some calculation programs from Windows to Linux, maybe eventually using a Beowulf cluster. However, I am kind of worried after reading in the mailing list archive about the lack of CPU-optimized BLAS-1 code in Linux systems. Currently I run on a Wintel (Windows+Pentium) machine, and I know it's substantially faster than an equivalent AMD machine, because I use Intel's BLAS (MKL) library. (I apologize for any misapprehensions in what follows... I am only starting to explore in this arena.) (1) Does anyone know when gcc will have memory prefetching features? Any time frame? I notice a very significant performance improvement on my Wintel machine, and I think it's due to memory prefetching. (2) I am a bit confused about the following issue: Intel does release MKL for Linux. So, does this mean that if I use a Pentium, I still get the full benefit of the CPU-optimized features in BLAS-1, even though gcc does not do memory prefetching? How is this possible?
(3) Related to the above: for general linear algebra operations, is the Pentium processor then better than AMD, since Intel has the machine-optimized BLAS library? I sometimes get contradictory information... I've seen somewhere that the Pentium 4 compares unfavorably with AMD chips in calculation speed... Any opinions? thanks, Hung Jung Lu From rochus.schmid at ch.tum.de Fri Apr 12 09:33:18 2002 From: rochus.schmid at ch.tum.de (Dr. Rochus Schmid) Date: Tue Mar 16 01:02:20 2010 Subject: k7s5a mobo based cluster Message-ID: <3CB70C4E.9D1E720E@ch.tum.de> Dear wulfers, I recently assembled a tiny and cheap (I know :-) cluster using the ECS K7S5A mobo (SIS 735 chipset). The board is very cheap (~75 €) and comes with Fast Ethernet (SIS900) onboard. I just want to tell about my experience (other hardware, netboot + STREAM / NetPIPE results). I would be happy to hear of others using this board. This mail also contains some questions ... maybe someone can answer or help me here? If this doesn't interest you, please skip it; apologies for the bandwidth. ############# HARDWARE (currently 4 nodes .. hope to get 4 more :-)
  mobo: K7S5A
  cpu: Athlon XP 1.4 GHz (1600+)
  ram: 256 MB DDR (266 MHz, CL2)
  graphics: various PCI/AGP graphics cards I could find
  floppy
  small tower with 250W PS
  nodes are diskless; the master has an additional 40GB IDE disk
  switch: D-Link DES 1008D 8-port switch
############## POWER / GRAPHICS The cluster has been up continuously for about 3 weeks now, with quite some load most of the time. As far as I can tell, the 250W supply seems to be OK for the board and the 1.4 GHz Athlon XP. The AMI BIOS does not allow booting without a graphics adapter. Someone on the net (using a lot of these boards for a SETI@home "farm") told me that he could not get around it even with tweaking tools for the AMI BIOS.
I am happy to have a console for maintenance, but one has to find a cheapo graphics card ... has anyone out there managed to avoid this? ############## NETBOOT / OS I run RH7.2 on it with a 2.4.17 kernel and NFS-root. The BIOS supports the RPL protocol for netbooting. I tried rpld for Linux, and the board seems to communicate with the RPL server and download something, but I didn't get it to boot. The rpld developers sent me a patch to "switch off a DMA channel" of the onboard NIC, but I have to admit that I didn't really understand what to do, nor did I try it. I currently boot from a syslinux floppy and use NFS-root. Has someone managed to netboot Linux with this hardware? ############### STREAM Because of the comments about gcc 2.96 versus 2.95 on the ATLAS web pages, I reinstalled gcc 2.95 and also found differences in the STREAM results, so I will post both here (both compiled with -O2 for comparison).
Array size = 2000000, Offset = 0

gcc version 2.96 20000731 (Red Hat Linux 7.1 2.96-98), -O2
  Function   Rate (MB/s)   RMS time   Min time   Max time
  Copy:      666.0980      0.0489     0.0480     0.0559
  Scale:     585.3939      0.0547     0.0547     0.0549
  Add:       726.1178      0.0662     0.0661     0.0663
  Triad:     679.6655      0.0707     0.0706     0.0707

gcc version 2.95.3 20010315 (release), -O2
  Function   Rate (MB/s)   RMS time   Min time   Max time
  Copy:      727.6031      0.0440     0.0440     0.0443
  Scale:     627.5864      0.0526     0.0510     0.0649
  Add:       798.1775      0.0602     0.0601     0.0603
  Triad:     727.6691      0.0660     0.0660     0.0661

I guess the obvious conclusion is to reinstall gcc-2.95 when using RH7.1 or RH7.2. These results are not as good as those reported recently for the nForce chipset. I tried setting the BIOS options for the DDR RAM to optimal, but I didn't test or experiment further. ########## NetPipe-2.4 The following results are NOT from a crossover cable but were measured through the D-Link switch!!
Kernel 2.4.17 / NIC driver SIS900; for MPI: LAM-MPI 6.5.1
  NPtcp: latency 33 us, bandwidth ~89.7 Mbit/s
  NPmpi: latency 41 us, bandwidth ~82 Mbit/s (maximum at about 85 Mbit/s)
The latency of around 40 microseconds seems to be very low, as far as I can tell from the information on the net (I am absolutely a beginner in this field). Is there anything one can seriously do wrong? I tried it a couple of times between different nodes, always with basically the same result. ################### I hope I did not annoy the pros on this list too much, and that this is helpful for comparison. Again: please contact me off-list if you also use this type of hardware. Thanks and best greetings from Munich, rochus -- Dr. Rochus Schmid Technische Universität München Lehrstuhl f. Anorganische Chemie Lichtenbergstrasse 4, 85747 Garching Tel. ++49 89 2891 3174 Fax. ++49 89 2891 3473 Email rochus.schmid@ch.tum.de From ctierney at hpti.com Fri Apr 12 10:01:51 2002 From: ctierney at hpti.com (Craig Tierney) Date: Tue Mar 16 01:02:20 2010 Subject: very high bandwidth, low latency manner? In-Reply-To: <3CB67338.8080906@myri.com>; from patrick@myri.com on Fri, Apr 12, 2002 at 01:40:08AM -0400 References: <3CB67338.8080906@myri.com> Message-ID: <20020412110151.A1491@hpti.com> On Fri, Apr 12, 2002 at 01:40:08AM -0400, Patrick Geoffray wrote: > Steffen Persvold wrote: > > >>I talked to a guy at SC2002 from Quadrics and he said > >>that list pricing on a Quadrics network was about $3500 > >>per node when you are in the 100s of nodes and up. > >>The price includes the cards, cables, switches, > >>etc. This doesn't include any sort of discount that you > >>might get. Myrinet is about $2000 for an equivalent > >>network at list price. Dolphin/SCI falls around $2245 list > >>per node (if the system is > 144 nodes and you have to get > >>the 3D card). > > > > These are list prices for the cards only, right? > > Not for Myrinet.
Actually $2000 per node is the total cost > (NIC/cable/port/software) for the high-end products (with L9/200 MHz); it > should be more like $1500 for low-end ones. Craig is spoiled, he only buys > the top stuff :-) Sorry Patrick. The problem with trying to state numbers is that if you get them wrong, the really knowledgeable ones can point it out. I had figured the list cost on a 256-node system at about $2000 per node for basic hardware. I was wrong. I reworked it and it is $1500 per node for 256 nodes (and would be the same for 512 and 1024). I thought I had decent information. What I was trying to provide was info on the three options. The SCI price is per node (cards, cables, software). It is $2245 list. This would be appropriate for systems over 144 nodes. The Quadrics number I got from the rep might have been the sales number, and not a perfect comparison covering all the required hardware for a system of a few hundred nodes. Craig From ctierney at hpti.com Fri Apr 12 10:15:52 2002 From: ctierney at hpti.com (Craig Tierney) Date: Tue Mar 16 01:02:20 2010 Subject: What could be the performance of my cluster In-Reply-To: <20020412103234.7580.qmail@web10508.mail.yahoo.com>; from suraj_peri@yahoo.com on Fri, Apr 12, 2002 at 03:32:34AM -0700 References: <20020411125920.D32605@hpti.com> <20020412103234.7580.qmail@web10508.mail.yahoo.com> Message-ID: <20020412111552.B1491@hpti.com> All my experience is with oceanography and atmospheric applications. Is the BLAST code something that spends lots of time doing lots of little calculations, or doing one big calculation? How important is the speed of access to the database? What is the memory footprint of the code when it runs on the DS20E? Craig On Fri, Apr 12, 2002 at 03:32:34AM -0700, Suraj Peri wrote: > Hi Craig, > Many thanks for your mail. Please excuse me for asking > a dumb question; I am a novice in this area. > I am interested in using this cluster for BLAST > purposes.
I want to store ESTs (Expressed Sequence > Tags), GenBank (nucleotide sequence database), > GenPept (protein sequence database), and the total > predicted protein sets of the human genome. > I will use BLAST (Basic Local Alignment Search Tool > algorithm) on this cluster, as the computations are > intensive and time consuming. > So I wanted to compare the AlphaServer DS20E and my > cluster in their computing abilities, > because no one in my circle of friends knows about > this. Please help me if you have used clusters for > BLAST purposes. > thanks > Suraj. > > --- Craig Tierney wrote: > > It depends on what you are trying to do (doesn't > > everyone > > love that answer). > > > > The number of flops your cluster can do should > > be equal to: > > > > flops = (no. of cpus) * (MHz) * (flops per Hz) > > > > So for your cluster > > > > flops = 8 * 1.53 GHz * 2 > > > > I am assuming that with SSE you can get 2 flops > > per cycle. > > > > flops = 24.48 Gflops > > > > Now, there are some issues with this. First, you > > are never > > going to get 1.53*2 Gflops out of a single > > processor. Second, > > leveraging all 8 cpus to get their maximum is going > > to be > > difficult if there is any communication between the > > nodes. > > > > Compilers play a big role in extracting the best > > performance > > out of the system. If you don't have a commercial > > compiler > > from the likes of Intel or Portland Group, I highly > > recommend > > getting one. You only have to purchase the compiler > > for where > > you compile, and not where you run. You can get > > away with > > one copy of the compiler on your server. > > > > If you are trying to compare the AMD system to the > > DS20E system, > > it will depend on what you are actually trying to > > do.
If > > you are running single precision floating point > > codes that do > > not require all the memory bandwidth a DS20E > > provides, I would > > think that, within 10%, that AMD processor will do the > > work > > of one 833 MHz Alpha CPU (you didn't say if you had > > 2 CPUs > > in your DS20E). At least this is what I am seeing > > for my codes when comparing dual Xeons, dual AMDs, > > and > > dual API 833 boxes. > > > > Craig > > > > > > > > > > > > On Sat, Apr 06, 2002 at 03:35:45AM -0800, Suraj Peri > > wrote: > > > Hi group, > > > I was calculating the performance of my cluster. > > The > > > features are > > > > > > 1. 8 nodes > > > 2. Processor: AMD Athlon XP 1800+ > > > 3. 8 CPUs > > > 4. 8*1.5 GB DDR RAM > > > 5. 1 server with 2 processors (AMD MP 1800+) > > and > > > 2GB DDR RAM > > > > > > I calculated this to be 48 Mflops. Is this > > correct? > > > If not, what is the correct performance of my > > cluster? > > > I also comparatively calculated that my cluster > > would > > > be 3 times faster than an AlphaServer DS20E (833 MHz > > > Alpha 64-bit processor, 4 GB max memory). > > > > > > Is my calculation correct or wrong? Please help me > > > ASAP. Thanks in advance. > > > > > > cheers > > > suraj. > > > > > > ===== > > > PIL/BMB/SDU/DK > > -- > > Craig Tierney (ctierney@hpti.com) > > > ===== > PIL/BMB/SDU/DK -- Craig Tierney (ctierney@hpti.com) From tim at dolphinics.com Fri Apr 12 10:02:04 2002 From: tim at dolphinics.com (Tim Wilcox) Date: Tue Mar 16 01:02:20 2010 Subject: very high bandwidth, low latency manner? References: <3CB5DA04.AF2C609F@lfbs.rwth-aachen.de> <20020411132650.A32674@hpti.com> Message-ID: <3CB7130C.60107@dolphinics.com> Craig Tierney wrote: >I talked to a guy at SC2002 from Quadrics and he said >that list pricing on a Quadrics network was about $3500 >per node when you are in the 100s of nodes and up. >The price includes the cards, cables, switches, >etc. This doesn't include any sort of discount that you >might get. Myrinet is about $2000 for an equivalent >network at list price. Dolphin/SCI falls around $2245 list >per node (if the system is > 144 nodes and you have to get >the 3D card). > A couple of corrections to this: the 2D card lists at $1695 per node and is suitable for up to 256 nodes. No one has built one that size, and it is correct that it is recommended to use 3D (lists at $2245/node) for more than 144 nodes. This is due to a potential saturation of a ring for certain communication patterns; it is not always the case. By going to 3D you shorten the rings and avoid this up to 1728 nodes. Anyone interested in building one? > > >I heard that Quadrics had a customer that just had to have >an Intel/Quadrics system, so either they or he was working >on porting the drivers. The web page says they support >Linux and Tru64. You could probably get the hardware without >going through Compaq, but Compaq is most likely buying up >most of the supply. > >Craig > Quadrics looks interesting, but I haven't the resources to afford the pleasure of playing with it.
The major issues with it are pricing and the lack of nodes out there using it. Myricom and Dolphin tend to come to about the same price per node; chalk it up to friendly competition. Regards, Tim Wilcox From djholm at fnal.gov Fri Apr 12 10:36:22 2002 From: djholm at fnal.gov (Don Holmgren) Date: Tue Mar 16 01:02:20 2010 Subject: BLAS-1, AMD, Pentium, gcc In-Reply-To: <20020412155635.92564.qmail@web12605.mail.yahoo.com> Message-ID: On Fri, 12 Apr 2002, Hung Jung Lu wrote: > Hi, > > I am thinking of migrating some calculation programs > from Windows to Linux, maybe eventually using a > Beowulf cluster. However, I am kind of worried after > reading in the mailing list archive about the lack of > CPU-optimized BLAS-1 code in Linux systems. Currently > I run on a Wintel (Windows+Pentium) machine, and I > know it's substantially faster than an equivalent AMD > machine, because I use Intel's BLAS (MKL) library. > (I apologize for any misapprehensions in what > follows... I am only starting to explore in this > arena.) > > (1) Does anyone know when gcc will have memory > prefetching features? Any time frame? I notice a > very significant performance improvement on my Wintel > machine, and I think it's due to memory prefetching. If you mean "when will gcc's optimizer do automatic prefetching?", I have no idea. But many programmers have been doing manual prefetching with gcc for quite a while. If you don't mind defining and using assembler macros, gcc handles it just fine now. Here's an example:
/* Prefetch the 128-byte-aligned block containing addr, using the
   non-temporal hint. Casting through unsigned long keeps the full
   pointer width on 64-bit builds as well as on 32-bit x86. */
#define prefetch_loc(addr) \
  __asm__ __volatile__ ("prefetchnta %0" \
                        : \
                        : \
                        "m" (*(((char*)(((unsigned long)(addr))&~0x7f)))))
> (2) I am a bit confused about the following issue: Intel > does release MKL for Linux. So, does this mean that if > I use a Pentium, I still get the full benefit of the > CPU-optimized features in BLAS-1, even though gcc does > not do memory prefetching? How is this possible? The Intel compiler produces object files compatible with gcc, and vice versa.
I would assume they implemented the library with the Intel compiler, which has full SSE/SSE2 support (including prefetching). They list the MKL for Linux as compatible with both the GNU and Intel compilers. > (3) Related to the above: for general linear algebra > operations, is the Pentium processor then better than AMD, > since Intel has the machine-optimized BLAS library? I > sometimes get contradictory information... I've seen > somewhere that the Pentium 4 compares unfavorably with AMD > chips in calculation speed... Any opinions? > > thanks, > > Hung Jung Lu For the very simple SU3 linear algebra (3x3 complex matrices and 3x1 complex vectors) used in our codes, the Pentium 4 outperforms the Athlon on most of our SSE-assisted routines. See the table near the bottom of http://qcdhome.fnal.gov/sse/inline.html for Mflops per gigahertz on various routines for the P-III, P4, and Athlon. Perhaps re-coding in 3DNow! would give the Athlon a boost. For our codes, which are bound by memory bandwidth, P4s do significantly better than Athlons because of the faster front side bus (400 MHz effective). See http://qcdhome.fnal.gov/qcdstream/compare.qcdstream for a table comparing memory bandwidth and SU3 linear algebra performance on a 1.2 GHz Athlon, 1.4 GHz P4, and 1.7 GHz P7 (see http://qcdhome.fnal.gov/qcdstream/ for information about this benchmark). Don Holmgren Fermilab From sp at scali.com Fri Apr 12 10:50:26 2002 From: sp at scali.com (Steffen Persvold) Date: Tue Mar 16 01:02:20 2010 Subject: very high bandwidth, low latency manner? In-Reply-To: <20020412110151.A1491@hpti.com> Message-ID: On Fri, 12 Apr 2002, Craig Tierney wrote: > On Fri, Apr 12, 2002 at 01:40:08AM -0400, Patrick Geoffray wrote: > > Steffen Persvold wrote: > > > > >>I talked to a guy at SC2002 from Quadrics and he said > > >>that list pricing on a Quadrics network was about $3500 > > >>per node when you are in the 100s of nodes and up. > > >>The price includes the cards, cables, switches, > > >>etc.
This doesn't include any sort of discount that you > > >>might get. Myrinet is about $2000 for an equivalent > > >>network at list price. Dolphin/SCI falls around $2245 list > > >>per node (if the system is > 144 nodes and you have to get > > >>the 3D card). > > > > > > These are list prices for the cards only, right? > > > > Not for Myrinet. Actually $2000 per node is the total cost > > (NIC/cable/port/software) for the high-end products (with L9/200 MHz); it > > should be more like $1500 for low-end ones. Craig is spoiled, he only buys > > the top stuff :-) > > Sorry Patrick. The problem with trying to state numbers is that > if you get them wrong, the really knowledgeable ones can point it out. > > I had figured the list cost on a 256-node system at about $2000 per node for > basic hardware. I was wrong. I reworked it and it is $1500 per node for > 256 nodes (and would be the same for 512 and 1024). So what is wrong with my calculations?
256-node L9/2MB/133MHz config:
  M3F-PCI64B-2 NICs           256 * $1,195  = $305,920
  M3-E128 Switch enclosures     6 * $12,800 =  $76,800
  M3-SW16-8F "Leaf" cards      32 * $2,400  =  $76,800
  M3-SPINE-8F "Spine" cards    64 * $1,600  = $102,400
  -----------------------------------------------------
  Total cost = $561,920
  Node cost  =   $2,195
And for an L9/2MB/200MHz config:
  M3F-PCI64B-2 NICs           256 * $1,495  = $382,720
  M3-E128 Switch enclosures     6 * $12,800 =  $76,800
  M3-SW16-8F "Leaf" cards      32 * $2,400  =  $76,800
  M3-SPINE-8F "Spine" cards    64 * $1,600  = $102,400
  -----------------------------------------------------
  Total cost = $638,720
  Node cost  =   $2,495
And this is without cable cost (since I don't quite know the cable requirements for the whole system, but it is at least approximately $100 per node). > > I thought I had decent information. What I was trying to provide was > info on the three options. The SCI price is per node (cards, cables, > software). It is $2245 list. This would be appropriate for systems over > 144 nodes.
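Steffen's list-price arithmetic can be reproduced with a short Python sketch. The part numbers, quantities, and unit prices are copied from his post; cables (which he estimates at roughly $100 per node) are left out, as in his totals, and the helper name `network_cost` is hypothetical.

```python
# (part, quantity, unit list price in USD) for the 256-node Clos fabric, per Steffen's post
SWITCH_PARTS = [
    ("M3-E128 switch enclosure", 6, 12_800),
    ("M3-SW16-8F leaf card", 32, 2_400),
    ("M3-SPINE-8F spine card", 64, 1_600),
]


def network_cost(nodes: int, nic_price: int,
                 switch_parts: list[tuple[str, int, int]] = SWITCH_PARTS) -> tuple[int, float]:
    """Return (total list cost, cost per node) for NICs plus switch hardware, excluding cables."""
    total = nodes * nic_price + sum(qty * price for _, qty, price in switch_parts)
    return total, total / nodes


if __name__ == "__main__":
    for label, nic_price in (("L9/2MB/133MHz", 1_195), ("L9/2MB/200MHz", 1_495)):
        total, per_node = network_cost(256, nic_price)
        print(f"{label}: total ${total:,}, per node ${per_node:,.0f}")
```

The switch fabric alone comes to $256,000, exactly $1,000 per node, so the NIC price is the only term that differs between the two configurations.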
> Actually, Wulfkit3 comes in two flavors, depending on whether you want 1U or can manage with a 2U (or higher) solution. The 1U is $2,445 and the 2U solution is $2,245 per node. > The Quadrics number I got from the rep might have been the sales number, > and not a perfect comparison to all the required hardware for a system of > a few hundred nodes. > Now we have price comparisons for the interconnects (SCI, Myrinet and Quadrics). What about performance? Does anyone have NAS/PMB numbers for ~144 node Myrinet/Quadrics clusters (I can provide some numbers from a 132 node Athlon 760MP based SCI cluster, and I guess also an 81 node PIII ServerWorks HE-SL based cluster). Regards, -- Steffen Persvold | Scalable Linux Systems | Try out the world's best mailto:sp@scali.com | http://www.scali.com | performing MPI implementation: Tel: (+47) 2262 8950 | Olaf Helsets vei 6 | - ScaMPI 1.13.8 - Fax: (+47) 2262 8951 | N0621 Oslo, NORWAY | >320MBytes/s and <4uS latency From lindahl at keyresearch.com Fri Apr 12 10:43:35 2002 From: lindahl at keyresearch.com (Greg Lindahl) Date: Tue Mar 16 01:02:20 2010 Subject: What could be the performance of my cluster In-Reply-To: <20020412111552.B1491@hpti.com>; from ctierney@hpti.com on Fri, Apr 12, 2002 at 11:15:52AM -0600 References: <20020411125920.D32605@hpti.com> <20020412103234.7580.qmail@web10508.mail.yahoo.com> <20020412111552.B1491@hpti.com> Message-ID: <20020412134335.B1810@wumpus.skymv.com> On Fri, Apr 12, 2002 at 11:15:52AM -0600, Craig Tierney wrote: > Is the BLAST code something that spends lots > of time trying doing lots of little calculations, > or doing one big calculation? How important is > the speed of access to the database? What is > the memory footprint of the code when it runs > on the DS20E? It depends. What BLAST does is compare a set of sequences against a big database of sequences.
The databases come in small, medium, and large (bigger than 2 GByte) sizes; the sequences can either be a single sequence (imagine a researcher looking up a single protein using a web interface) or a large set of them. If it's a large set, the problem is embarrassingly parallel. The BLAST implementation used by most people isn't parallel. It can be fairly easily parallelized to divide the big database up into pieces. People build fairly different clusters to run BLAST depending on their details. The guys at Celera Genomics didn't want to use a parallel version, and their database is bigger than 2 GBytes, so they bought Alphas. Most people have small enough databases to fit into 2 GBytes, but search against 1 sequence at a time, so they can't afford to read the entire database over NFS every time, so they keep it on a local disk. greg From ctierney at hpti.com Fri Apr 12 11:19:13 2002 From: ctierney at hpti.com (Craig Tierney) Date: Tue Mar 16 01:02:20 2010 Subject: very high bandwidth, low latency manner? In-Reply-To: ; from sp@scali.com on Fri, Apr 12, 2002 at 07:50:26PM +0200 References: <20020412110151.A1491@hpti.com> Message-ID: <20020412121913.B1508@hpti.com> On Fri, Apr 12, 2002 at 07:50:26PM +0200, Steffen Persvold wrote: > On Fri, 12 Apr 2002, Craig Tierney wrote: > > > On Fri, Apr 12, 2002 at 01:40:08AM -0400, Patrick Geoffray wrote: > > > Steffen Persvold wrote: > > > > > > >>I talked to a guy at SC2002 from Quadrics and he said > > > >>that list pricing on a Quadrics network was about $3500 > > > >>per node when you are in the 100s of nodes and up. > > > >>The price includes the cards, cables, switches, > > > >>etc. This doesn't include any sort of discount that you > > > >>might get. Myrinet is about $2000 for an equivalent > > > >>network at list price. Dolphin/SCI falls around $2245 list > > > >>per node (if the system is > 144 nodes and you have to get > > > >>the 3d card). > > > > > > > > These are list prices for the cards only, right?
> > > > > > Not for Myrinet. Actually $2000 per node is the total cost > > > (NIC/cable/port/software) for the high-end products (with L9/200 MHz), > > > should be more like $1500 for low-end ones. Craig is spoiled, only buys > > > the top stuff :-) > > > > Sorry Patrick. The problem with trying to state numbers is that > > if you get it wrong, then the real knowledgeable ones can point it out. > > > > I figured out the list cost on a 256 node system at about $2000 before for > > basic hardware. I was wrong. I reworked it and it is $1500 for > > 256 (and would be the same for 512 and 1024). Your calculations are fine. I shouldn't be allowed to add and multiply numbers. When I redid the numbers I redid them incorrectly. From list prices, cables are about $100 each, and you need two per card. So add about $200 to your prices.
>
> So what is wrong with my calculations?
>
> 256 node L9/2MB/133MHz config:
>
> M3F-PCI64B-2 NICs            256 * $1,195  = $305,920
> M3-E128 Switch enclosures      6 * $12,800 =  $76,800
> M3-SW16-8F "Leaf" cards       32 * $2,400  =  $76,800
> M3-SPINE-8F "Spine" cards     64 * $1,600  = $102,400
> -----------------------------------------------------
> Total cost                                 = $561,920
> Node cost                                  =   $2,195
>
> and for a L9/2MB/200MHz config:
>
> M3F-PCI64B-2 NICs            256 * $1,495  = $382,720
> M3-E128 Switch enclosures      6 * $12,800 =  $76,800
> M3-SW16-8F "Leaf" cards       32 * $2,400  =  $76,800
> M3-SPINE-8F "Spine" cards     64 * $1,600  = $102,400
> -----------------------------------------------------
> Total cost                                 = $638,720
> Node cost                                  =   $2,495
>
> And this is without cable cost (since I don't quite know the cable requirements for the total system, but at least it is approx. $100 per node). > > > > > I thought I had decent information. What I was trying to provide was > > info on the three options. The SCI price is per node (cards, cables, > > software). It is $2245 list. This would be appropriate for systems over > > 144 nodes.
> > > > Actually, Wulfkit3 comes in two flavors, depending on whether you want 1U or can manage > with a 2U (or higher) solution. The 1U is $2,445 and the 2U solution is > $2,245 per node. > > > The Quadrics number I got from the rep might have been the sales number, > > and not a perfect comparison to all the required hardware for a system of > > a few hundred nodes. > > > > Now we have price comparisons for the interconnects (SCI, Myrinet and > Quadrics). What about performance? Does anyone have NAS/PMB numbers for > ~144 node Myrinet/Quadrics clusters (I can provide some numbers from a 132 > node Athlon 760MP based SCI cluster, and I guess also an 81 node PIII ServerWorks > HE-SL based cluster). I don't think that anyone is going to have numbers on the same hardware. Too bad. It would be interesting to see the differences. However, that may end all the discussions and that would be no fun. Craig From patrick at myri.com Fri Apr 12 12:48:00 2002 From: patrick at myri.com (Patrick Geoffray) Date: Tue Mar 16 01:02:20 2010 Subject: very high bandwidth, low latency manner? References: Message-ID: <3CB739F0.6000109@myri.com> Steffen Persvold wrote: >>I figured out the list cost on a 256 node system at about $2000 before for >>basic hardware. I was wrong. I reworked it and it is $1500 for >>256 (and would be the same for 512 and 1024). >> > So what is wrong with my calculations? > > 256 node L9/2MB/133MHz config: > Node cost = $2,195 > and for a L9/2MB/200MHz config: > Node cost = $2,495 Nothing, it's right for 256 nodes. However:

128 nodes L9/133 MHz config: Node cost = $1,595
128 nodes L9/200 MHz config: Node cost = $1,895

For more than 128 ports, the number of switches increases to keep a guaranteed full bisection, which adds about $500 per node. However, up to 128 nodes, you need only one switch, and the numbers I gave are correct. The switchless cost model makes sense for configs larger than the biggest switch size for switched technologies, i.e. 128 ports for Quadrics and Myrinet.
Surprisingly, the largest SCI cluster is, AFAIK, 132 nodes ;-) > Now we have price comparisons for the interconnects (SCI, Myrinet and > Quadrics). What about performance? Does anyone have NAS/PMB numbers for > ~144 node Myrinet/Quadrics clusters (I can provide some numbers from a 132 > node Athlon 760MP based SCI cluster, and I guess also an 81 node PIII ServerWorks > HE-SL based cluster). Ok, I will say again what I think about these comparisons: it's already hard to compare dollars (what about discount, what about support, what about software, etc.) even though they are the same dollars; it's a waste of time to do it for micro-benchmarks. It's something you do when you want to publish something at a conference next to a beach. When a customer asks me about performance, I don't give him my NAS or PMB numbers; he doesn't care. He wants access to an XXX-node machine to play with and run his set of applications, or he gives a list of codes to the vendors for the bid and the vendors guarantee the results because they are used officially in the bid process. If someone buys a machine because the NAS numbers look pretty and his CFD code sucks, this guy will take his stuff and look for a new job. Do you spend time tuning NAS? I don't. People have already told me that the NAS LU test sucks on MPICH-GM. Well, the LU algorithm in HPL is much better. How many applications behave like NAS LU, and how many like HPL? If a customer comes to me because his code behaves like NAS LU, I will tell him what to tune in his code to be more efficient. The pitfall with benchmarks is that you want to tune your MPI implementation to look good on them. In the real world, you cannot expect to run a code efficiently on a machine without tuning it, especially with MPI. My 2 pennies Patrick ---------------------------------------------------------- | Patrick Geoffray, Ph.D. patrick@myri.com | Myricom, Inc.
http://www.myri.com | Cell: 865-389-8852 685 Emory Valley Rd (B) | Phone: 865-425-0978 Oak Ridge, TN 37830 ---------------------------------------------------------- From robert at bay13.de Fri Apr 12 12:25:06 2002 From: robert at bay13.de (Robert Depenbrock) Date: Tue Mar 16 01:02:20 2010 Subject: What could be the performance of my cluster References: <20020411125920.D32605@hpti.com> <20020412103234.7580.qmail@web10508.mail.yahoo.com> <20020412111552.B1491@hpti.com> <20020412134335.B1810@wumpus.skymv.com> Message-ID: <3CB73492.78900CF4@bay13.de> Greg Lindahl wrote: > Hi Greg, > On Fri, Apr 12, 2002 at 11:15:52AM -0600, Craig Tierney wrote: > > > Is the BLAST code something that spends lots > > of time trying doing lots of little calculations, > > or doing one big calculation? How important is > > the speed of access to the database? What is > > the memory footprint of the code when it runs > > on the DS20E? > > It depends. > > What BLAST does is compare a set of sequences against a big database of > sequences. The databases come in small, medium, and large (bigger than > 2 GByte) sizes; the sequences can either be a single sequence (imagine > a researcher looking up a single protein using a web interface) or a > large set of them. If it's a large set, the problem is embarrassingly > parallel. > > The BLAST implementation used by most people isn't parallel. It can be > fairly easily parallelized to divide the big database up into pieces. > > People build fairly different clusters to run BLAST depending on their > details. The guys at Celera Genomics didn't want to use a parallel > version, and their database is bigger than 2 GBytes, so they bought > Alphas. Most people have small enough databases to fit into 2 GBytes, > but search against 1 sequence at a time, so they can't afford to read > the entire database over NFS every time, so they keep it on a local disk. Do you have some sample proteins and databases?
I would like to test some machines I have available to mess around a little bit (HP PA-RISC Series, SUN Sparc Fire, Itanium, Power PC). I would like to build a little benchmark around these datasets. regards Robert Depenbrock -- nic-hdl RD-RIPE http://www.bay13.de/ e-mail: robert@bay13.de Fingerprint: 1CEF 67DC 52D7 252A 3BCD 9BC4 2C0E AC87 6830 F5DD From sp at scali.com Fri Apr 12 13:41:23 2002 From: sp at scali.com (Steffen Persvold) Date: Tue Mar 16 01:02:20 2010 Subject: very high bandwidth, low latency manner? In-Reply-To: <3CB739F0.6000109@myri.com> Message-ID: On Fri, 12 Apr 2002, Patrick Geoffray wrote: > Steffen Persvold wrote: > > >>I figured out the list cost on a 256 node system at about $2000 before for > >>basic hardware. I was wrong. I reworked it and it is $1500 for > >>256 (and would be the same for 512 and 1024). > >> > > > So what is wrong with my calculations? > > > > 256 node L9/2MB/133MHz config: > > Node cost = $2,195 > > and for a L9/2MB/200MHz config: > > Node cost = $2,495 > > Nothing, it's right for 256 nodes. However: > > 128 nodes L9/133 MHz config: > Node cost = $1,595 > 128 nodes L9/200 MHz config: > Node cost = $1,895 > > For more than 128 ports, the number of switches increases to keep a > guaranteed full bisection, which adds about $500 per node. However, up to > 128 nodes, you need only one switch, and the numbers I gave are correct. > Yes, I was just questioning Craig's numbers. I was actually surprised that the Myrinet node cost didn't increase more when going from 128 to 256 nodes, since it basically involves a lot more hardware (i.e. 4 additional switch enclosures and 64 additional "spine" cards). > The switchless cost model makes sense for configs larger than the biggest > switch size for switched technologies, i.e. 128 ports for Quadrics and > Myrinet. Surprisingly, the largest SCI cluster is, AFAIK, 132 nodes ;-) > The largest SCI cluster (at least switchless) is indeed 132 nodes.
> > Now we have price comparisons for the interconnects (SCI, Myrinet and > > Quadrics). What about performance? Does anyone have NAS/PMB numbers for > > ~144 node Myrinet/Quadrics clusters (I can provide some numbers from a 132 > > node Athlon 760MP based SCI cluster, and I guess also an 81 node PIII ServerWorks > > HE-SL based cluster). > > Ok, I will say again what I think about these comparisons: it's already > hard to compare dollars (what about discount, what about support, what > about software, etc.) even though they are the same dollars; it's a waste of time > to do it for micro-benchmarks. It's something you do when you want to > publish something at a conference next to a beach. > When a customer asks me about performance, I don't give him my NAS or > PMB numbers; he doesn't care. He wants access to an XXX-node machine to > play with and run his set of applications, or he gives a list of codes > to the vendors for the bid and the vendors guarantee the results because > they are used officially in the bid process. If someone buys a machine > because the NAS numbers look pretty and his CFD code sucks, this guy will take > his stuff and look for a new job. > > Do you spend time tuning NAS? I don't. People have already told me that the > NAS LU test sucks on MPICH-GM. Well, the LU algorithm in HPL is much > better. How many applications behave like NAS LU, and how many like HPL? > If a customer comes to me because his code behaves like NAS LU, I will > tell him what to tune in his code to be more efficient. > > The pitfall with benchmarks is that you want to tune your MPI > implementation to look good on them. In the real world, you cannot expect > to run a code efficiently on a machine without tuning it, especially with > MPI. > I think that most people on this list agree that it is really the customer's application that counts, not NAS or PMB numbers (and no, I don't spend much time tuning NAS; it was a bad example).
I also agree with most of your other statements; however, I still think that at least an MPI-specific benchmark such as PMB (I don't know if one is available for PVM...) will give customers an initial feeling for what interconnect they need (if they know how their application is architected). > My 2 pennies > Thanks, -- Steffen Persvold | Scalable Linux Systems | Try out the world's best mailto:sp@scali.com | http://www.scali.com | performing MPI implementation: Tel: (+47) 2262 8950 | Olaf Helsets vei 6 | - ScaMPI 1.13.8 - Fax: (+47) 2262 8951 | N0621 Oslo, NORWAY | >320MBytes/s and <4uS latency From joachim at sonne.lfbs.rwth-aachen.de Fri Apr 12 14:02:58 2002 From: joachim at sonne.lfbs.rwth-aachen.de (joachim) Date: Tue Mar 16 01:02:20 2010 Subject: very high bandwidth, low latency manner? In-Reply-To: <3CB739F0.6000109@myri.com> from Patrick Geoffray at "Apr 12, 2002 03:48:00 pm" Message-ID: <200204122102.XAA02537@wikkit.lfbs.rwth-aachen.de> Fully agreed, but: comparing applications like MM5, GROMACS, ... based on the interconnect and MPI library (on otherwise identical systems) *would* make sense. At least for interconnect and MPI designers, and also for marketing (after carefully choosing the right cases...). And maybe for some buying decisions for smaller "home built" systems. We can discuss this at CAC'02 on Monday.
;) regards, Joachim From lindahl at keyresearch.com Fri Apr 12 15:23:46 2002 From: lindahl at keyresearch.com (Greg Lindahl) Date: Tue Mar 16 01:02:20 2010 Subject: What could be the performance of my cluster In-Reply-To: <3CB73492.78900CF4@bay13.de>; from robert@bay13.de on Fri, Apr 12, 2002 at 09:25:06PM +0200 References: <20020411125920.D32605@hpti.com> <20020412103234.7580.qmail@web10508.mail.yahoo.com> <20020412111552.B1491@hpti.com> <20020412134335.B1810@wumpus.skymv.com> <3CB73492.78900CF4@bay13.de> Message-ID: <20020412182346.A1990@wumpus.skymv.com> On Fri, Apr 12, 2002 at 09:25:06PM +0200, Robert Depenbrock wrote: > Do you have some sample proteins and databases ? Robert, I almost got a benchmark suite for BLAST together, but got side-tracked before I had anything useful. Just like 10,000 other projects ;-) greg From mas at ucla.edu Fri Apr 12 15:37:50 2002 From: mas at ucla.edu (Michael Stein) Date: Tue Mar 16 01:02:20 2010 Subject: very high bandwidth, low latency manner? (i860) In-Reply-To: ; from sp@scali.com on Fri, Apr 12, 2002 at 05:37:17AM +0200 References: Message-ID: <20020412153750.A5315@mas1.ats.ucla.edu> > bandwidth was highest on IA64 460GX...). I can't imagine that the i860 can > actually perform as well as 340MByte/sec since the Hub-Link (between > the MCH and the P64H) has a limit of 266MByte/sec (AFAIK) .... The data sheet for the i860 shows 3 separate Hub-links A, B and C. A is 266 MByte/sec (and typically runs the 33 Mhz 32 bit stuff). B and C are 533 MByte/sec each and drive the P64Hs. (16 bits * 66 Mhz * 4x data xfers). http://developer.intel.com/design/chipsets/datashts/290713.htm The pdf is about 1.1 MB. From djholm at fnal.gov Fri Apr 12 16:11:41 2002 From: djholm at fnal.gov (Don Holmgren) Date: Tue Mar 16 01:02:20 2010 Subject: very high bandwidth, low latency manner? (i860) In-Reply-To: <20020412153750.A5315@mas1.ats.ucla.edu> Message-ID: Unfortunately the measured performance doesn't match the published specs. 
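For reference, the published Hub-link B/C figure that Michael quotes is link width times clock times transfers per clock. Worked out below, taking the nominal 66 MHz clock as 66.67 MHz (an assumption about the exact rate), with the theoretical peak of a 64-bit/66 MHz PCI slot shown for comparison:

```latex
\begin{aligned}
B_{\text{Hub-link B/C}} &= \frac{16\ \text{bit}}{8\ \text{bit/byte}} \times 66.67\ \text{MHz} \times 4 \approx 533\ \text{MB/s},\\
B_{\text{PCI 64/66}}    &= \frac{64\ \text{bit}}{8\ \text{bit/byte}} \times 66.67\ \text{MHz} \times 1 \approx 533\ \text{MB/s}.
\end{aligned}
```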
DMA rates reported by the Myrinet driver on 64/66 cards are about 315 MB/sec and 225 MB/sec, respectively, for bus writes and reads. See the reported measurements on a number of i860-based motherboards at Greg Lindahl's page, http://www.conservativecomputer.com/myrinet/perf.html This has been a sore point for lots of folks wanting to build clusters with i860-based machines. Don Holmgren Fermilab On Fri, 12 Apr 2002, Michael Stein wrote: > > bandwidth was highest on IA64 460GX...). I can't imagine that the i860 can > > actually perform as well as 340MByte/sec since the Hub-Link (between > > the MCH and the P64H) has a limit of 266MByte/sec (AFAIK) .... > > The data sheet for the i860 shows 3 separate Hub-links A, B and C. > > A is 266 MByte/sec (and typically runs the 33 Mhz 32 bit stuff). > > B and C are 533 MByte/sec each and drive the P64Hs. > (16 bits * 66 Mhz * 4x data xfers). > > http://developer.intel.com/design/chipsets/datashts/290713.htm > > The pdf is about 1.1 MB. > From fraser5 at cox.net Fri Apr 12 16:51:25 2002 From: fraser5 at cox.net (Jim Fraser) Date: Tue Mar 16 01:02:20 2010 Subject: BLAS-1, AMD, Pentium, gcc In-Reply-To: Message-ID: <001001c1e27c$f59f1470$0400005a@papabear> Sure, the optimized BLAS by Intel IS faster (on Intel). The data you present, while very impressive, are skewed towards Intel because the libs are optimized ONLY for SSE and Intel chips, while AMD does not really fully support SSE. BUT you should replace your stale BLAS code with optimized ATLAS for your AMD chips... it's a whole new world, my friend! AMD really kicks some butt when the libs are optimized for cache size. It blew me away. The libs optimize for a specific chip's cache, detect SSE or 3DNow!, and really exploit them, and the performance is very impressive. (The makefile also runs for quite some time to produce the libs.) Download the latest developer version, compile it, and sit back and smile. WELL WORTH THE EFFORT, no question.
I got into this to port a CFD code over from Intel/MKL/ScaLAPACK/MPI to AMD/ATLAS/ScaLAPACK/MPI. After you run with this package, AMD's bang for the buck is beyond comparison. BTW, the ATLAS libs also run on Intel (they run on ANY chip, for that matter) and improved performance over the Intel MKL package as well (better for some chips, equal on others). I don't have all the numbers offhand, but I would suggest you re-run your case with ATLAS; your conclusion may change. Try it. It's free. (PS: get the developer source and compile it instead of downloading the binary, the term) http://www.netlib.org/atlas/ Jim -----Original Message----- From: beowulf-admin@beowulf.org [mailto:beowulf-admin@beowulf.org]On Behalf Of Don Holmgren Sent: Friday, April 12, 2002 1:36 PM To: Hung Jung Lu Cc: beowulf@beowulf.org Subject: Re: BLAS-1, AMD, Pentium, gcc On Fri, 12 Apr 2002, Hung Jung Lu wrote: > Hi, > > I am thinking of migrating some calculation programs > from Windows to Linux, maybe eventually using a > Beowulf cluster. However, I am kind of worried after I > read in the mailing list archive about the lack of > CPU-optimized BLAS-1 code in Linux systems. Currently > I run on a Wintel (Windows+Pentium) machine, and I > know it's substantially faster than an equivalent AMD > machine, because I use Intel's BLAS (MKL) library. > (I apologize for any misapprehensions in what > follows... I am only starting to explore in this > arena.) > > (1) Does anyone know when gcc will have memory > prefetching features? Any time frame? I notice a > very significant performance improvement on my Wintel > machine, and I think it's due to memory prefetching. If you mean, "when will gcc's optimizer do automatic prefetching?", I have no idea. But, many programmers have been doing manual prefetching with gcc for quite a while. If you don't mind defining and using assembler macros, gcc handles it just fine now.
Here's an example:

#include <stdint.h>

/* Prefetch the 128-byte-aligned block containing addr with the SSE
   non-temporal hint. Casting through uintptr_t (rather than unsigned int)
   keeps the address intact on 64-bit builds as well. */
#define prefetch_loc(addr) \
    __asm__ __volatile__ ("prefetchnta %0" \
                          : \
                          : "m" (*((char *)(((uintptr_t)(addr)) & ~(uintptr_t)0x7f))))

> (2) I am a bit confused on the following issue: Intel > does release MKL for Linux. So, does this mean that if > I use a Pentium, I still get the full benefit of the > CPU-optimized features in BLAS-1, even though gcc does > not do memory prefetching? How is this possible? The Intel compiler produces object files compatible with gcc, and vice versa. I would assume they implemented the library with the Intel compiler, which has full SSE/SSE2 support (including prefetching). They list the MKL for Linux as compatible with both gnu and Intel compilers. > (3) Related to the above: for general linear algebra > operations, is a Pentium processor then better than AMD, > since Intel has the machine-optimized BLAS library? I > get contradictory information sometimes... I've seen > somewhere that the Pentium-4 compares unfavorably with AMD > chips in calculation speed... Any opinions? > > thanks, > > Hung Jung Lu For the very simple SU3 linear algebra (3X3 complex matrices and 3X1 complex vectors) used in our codes, the Pentium 4 outperforms the Athlon on most of our SSE-assisted routines. See the table near the bottom of http://qcdhome.fnal.gov/sse/inline.html for Mflops per gigahertz on various routines for P-III, P4, and Athlon. Perhaps re-coding in 3DNow! would give the Athlon a boost. For our codes, which are bound by memory bandwidth, P4's do significantly better than Athlons because of the faster front side bus (400 Mhz effective). See http://qcdhome.fnal.gov/qcdstream/compare.qcdstream for a table comparing memory bandwidth and SU3 linear algebra performance on a 1.2 GHz Athlon, 1.4 GHz P4, and 1.7 GHz P7 (see http://qcdhome.fnal.gov/qcdstream/ for information about this benchmark).
Don Holmgren Fermilab From suraj_peri at yahoo.com Sat Apr 13 03:21:52 2002 From: suraj_peri at yahoo.com (Suraj Peri) Date: Tue Mar 16 01:02:20 2010 Subject: What could be the performance of my cluster In-Reply-To: <3CB73492.78900CF4@bay13.de> Message-ID: <20020413102152.24030.qmail@web10506.mail.yahoo.com> BLAST (Basic Local Alignment Search Tool) takes the query sequence (either protein or DNA) and tries to match small patches of it (let's say it breaks your sequence into small pieces of 6 letters and then tries to match them in the database index file). Once the BLAST algorithm finds a small match, it tries to extend the query sequence for a further match in the database. If it finds more, it computes a score and reports that score. If it doesn't, it reports a low score, and we do not consider the low-scoring hits. Thus, in my opinion, it does many calculations and finally shows the scores (P-value). Interestingly, BLAST is considered a local alignment search tool because it tries to match bits of your query sequence and then extends them for more matches. In contrast, there is another algorithm called FASTA (Fast Alignment Search Tool); this is a global one (meaning it takes big chunks of sequences and then tries to thread them over the database). Bill Pearson (its creator) made a PVM version of FASTA, and his students at Virginia are using it on a Beowulf cluster. (You can access it at ftp://ftp.virginia.edu/pub/fasta/) In my case, my database would be ~80 GB (I hope to use this much data over NFS). I am planning to put this program on every node and then, using MPICH, have each node access the whole database over NFS. I am new to this area, and I wonder whether the ideas I have are practical or not.
We will start configuring our cluster some time in May. cheers suraj. --- Robert Depenbrock wrote: > Greg Lindahl wrote: > > > > Hi Greg, > > > On Fri, Apr 12, 2002 at 11:15:52AM -0600, Craig > Tierney wrote: > > > > > Is the BLAST code something that spends lots > > > of time trying doing lots of little > calculations, > > > or doing one big calculation? How important is > > > the speed of access to the database? What is > > > the memory footprint of the code when it runs > > > on the DS20E? > > > > It depends. > > > > What BLAST does is compare a set of sequences > against a big database of > > sequences. The databases come in small, medium, > and large (bigger than > > 2 GByte) sizes; the sequences can either be a > single sequence (imagine > > a researcher looking up a single protein using a > web interface) or a > > large set of them. If it's a large set, the > problem is embarrassingly > > parallel. > > > > The BLAST implementation used by most people isn't > parallel. It can be > > fairly easily parallelized to divide the big > database up into pieces. > > > > People build fairly different clusters to run > BLAST depending on their > > details. The guys at Celera Genomics didn't want > to use a parallel > > version, and their database is bigger than 2 > GBytes, so they bought > > Alphas. Most people have small enough databases to > fit into 2 GBytes, > > but search against 1 sequence at a time, so they > can't afford to read > > the entire database over NFS every time, so they keep > it on a local disk. > > Do you have some sample proteins and databases? > > I would like to test some machines I have available > to mess around a > little bit. > (HP PA-RISC Series, SUN Sparc Fire, Itanium, Power > PC). > > I would like to build a little benchmark around > these datasets.
> > regards > Robert Depenbrock > > -- > nic-hdl RD-RIPE > http://www.bay13.de/ > e-mail: robert@bay13.de > Fingerprint: 1CEF 67DC 52D7 252A 3BCD 9BC4 2C0E > AC87 6830 F5DD ===== PIL/BMB/SDU/DK From hahn at physics.mcmaster.ca Sat Apr 13 11:18:39 2002 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Tue Mar 16 01:02:20 2010 Subject: decent performance from G4 Macs? Message-ID: I'm doing some benchmarks to evaluate whether current Macs would make suitable nodes for a serial farm (lots of nodes, preferably fast CPU and dram, but no serious interconnect.) I've tried a variety of real codes and benchmarks, but can't seem to get something like a Mac G4/800 with PC133 to perform anywhere close to even a P4/1.7/i845/PC133. I'm using either the gcc 2.95 that comes with OSX or a recent 3.1 snapshot (which is MUCH better, but still bad). is it just that the performance Apple brags about is strictly in-cache, and/or when doing something ah specialized like single-precision SIMD (altivec/velocity engine)? I haven't really pushed to track down an account on the very latest dual G4/1000, but afaict it's got the same boring PC133. is anyone using Macs in clusters, and what kind of performance are you observing? thanks, mark hahn. From wrp at alpha0.bioch.virginia.edu Sat Apr 13 13:11:25 2002 From: wrp at alpha0.bioch.virginia.edu (William R. Pearson) Date: Tue Mar 16 01:02:20 2010 Subject: BLAST and FASTA benchmarks Message-ID: <200204132011.QAA22280@alpha0.bioch.virginia.edu> There was a bit of misinformation about the difference between the BLAST and FASTA programs for protein and DNA sequence comparison.
Both BLAST and FASTA search for local sequence similarity - indeed they have exactly the same goals, though they use somewhat different algorithms and statistical approaches. The advantage of an ES40 or other large shared-memory machine for BLAST is that BLAST has been optimized for searching databases that are large memory-mapped files, and it runs multithreaded. PVM and MPI versions of BLAST are not available, but it is important to remember that BLAST is extremely fast and highly optimized to go through a large amount of memory very quickly; it would be difficult to provide an equally efficient distributed version - but, of course, a distributed-memory machine would be much cheaper. PVM and MPI versions of FASTA are available. FASTA actually is a package of about a dozen programs that vary more than 100-fold in speed. It is easy to make efficient PVM/MPI versions of the slower algorithms (Smith-Waterman, TFASTY, TFASTX); parallel versions of the FASTA algorithm are less efficient. How to benchmark BLAST and FASTA - As Greg Lindahl pointed out, the appropriate platform for BLAST (less so for FASTA) depends on the size of the database. Very few databases are larger than 2 Gb. (I think the person who said he had an 80 Gb database was mistaken - the largest publicly available sequence database, Genbank, currently has 17 Gb of sequence data.) In contrast, protein sequence databases are much smaller, typically 50 - 500 Mb. If you would like to try searching some protein or DNA sequence databases, they are available from ftp.ncbi.nih.gov/blast/db. nr.Z and swissprot.Z are two representative protein sequence databases; nt.Z and est_mouse.Z are representative DNA databases. Simply select 10 - 100 sequences at random from these databases and run them against the full-size databases. Bill Pearson From ron_chen_123 at yahoo.com Sat Apr 13 16:39:29 2002 From: ron_chen_123 at yahoo.com (Ron Chen) Date: Tue Mar 16 01:02:20 2010 Subject: decent performance from G4 Macs?
In-Reply-To: Message-ID: <20020413233929.65399.qmail@web14706.mail.yahoo.com>

--- Mark Hahn wrote: > I'm doing some benchmarks to evaluate whether > current Macs would make suitable nodes for a serial > farm (lots of nodes, preferably fast CPU and dram, > but no serious interconnect.)

Physics or bioscience code?

> I've tried a variety of real codes and benchmarks, > but can't seem to get something like a Mac G4/800 > with PC133 to perform anywhere close to even a > P4/1.7/i845/PC133. > > I'm using either the gcc 2.95 that comes with OSX or > a recent 3.1 snapshot (which is MUCH better, but > still bad).

What compiler are you using for the P4?

> is it just that the performance Apple brags about is > strictly in-cache, and/or when doing something ah > specialized like single-precision SIMD >(altivec/velocity engine)?

Apple has some libraries that take advantage of the Altivec instructions. http://www.apple.com/downloads/macosx/math_science/applegenentechblast.html

> is anyone using Macs in clusters, and what kind of > performance > are you observing?

AFAIK, there are several people using Mac OS X in clusters; the SGE (Sun Grid Engine) project has a port for Mac OS X. Maybe you should ask on the SGE lists about their experience setting up Mac OS X compute farms; SGE is written specifically for that kind of environment.

SGE home: http://wwws.sun.com/software/gridware/
SGE Open source site: http://gridengine.sunsource.net

Search for "Mac OS" in the mailing list archive: http://gridengine.sunsource.net/servlets/SearchList?listName=dev&by=thread

-Ron

From steveb at aei-potsdam.mpg.de Sat Apr 13 17:37:42 2002 From: steveb at aei-potsdam.mpg.de (Steven Berukoff) Date: Tue Mar 16 01:02:20 2010 Subject: DMA difficulties Message-ID:

Hi all,

This question may be very slightly off-topic, so I apologize.
I'm in the process of setting up a network installation procedure using PXE/DHCP/NFS/Kickstart w/ RH7.2 for about 150 dual Athlon nodes. These nodes use a Maxtor 6L080J4 80.0GB HDD and an ASUS A7M266-D motherboard, among other things. One particular note is that I don't need/want CDROMs in these systems.

Now, a vendor provided me with a couple of test nodes basically to our specifications, except that they included CDROMs and floppies. To make a longish story shorter, I wanted to make sure that the nodes work fine without the CDROM.

So, I first looked into the BIOS. I disabled (set to "None") Primary Slave, Secondary Master/Slave (since my HDD is Primary Master), removed the CDROM from the list of boot devices, and disabled the Secondary IDE channel. Then, I passed the kernel args "ide0=dma hdb=none" to try to force the HDD to use DMA during the Kickstart installation.

Now, here is the kicker: regardless of the BIOS settings, if I have the CDROM plugged in (power+IDE, on the secondary channel) the installation runs ~5 times faster than if the thing isn't there. This installation includes installation of ~470 packages plus formatting the HDD. That's right: as long as the CDROM is plugged in, everything is peachy, but once it's gone, things slow down.

I think this is a problem with the DMA settings, because when I pass "ide=nodma" to the kernel, WITH the CD attached, performance is slow. However, I can't even force DMA to be used.

If anyone has any suggestions or similar experiences, please let me know.

Thanks a bunch! Steve

===== Steve Berukoff tel: 49-331-5677233 Albert-Einstein-Institute fax: 49-331-5677298 Am Muehlenberg 1, D14477 Golm, Germany email:steveb@aei.mpg.de

From alvin at Maggie.Linux-Consulting.com Sat Apr 13 18:09:18 2002 From: alvin at Maggie.Linux-Consulting.com (alvin@Maggie.Linux-Consulting.com) Date: Tue Mar 16 01:02:20 2010 Subject: DMA difficulties In-Reply-To: Message-ID:

hi ya

i notice that when the cable is attached... things go bonkers...
even with no power to the drive ( hd or cdrom )

remove the ide cable from the motherboard if it's not used

and tell the bios NOT to autodetect ide devices except those that are in fact present

150 nodes.... hummm .... one full cabinet..front and back.. :-)

c ya alvin http://www.Linux-1U.net

On Sun, 14 Apr 2002, Steven Berukoff wrote: > > Hi all, > > This question may be very slightly off-topic, so I apologize. > > I'm in the process of setting up a network installation procedure using > PXE/DHCP/NFS/Kickstart w/ RH7.2 for about 150 dual Athlon nodes. These > nodes use a Maxtor 6L080J4 80.0GB HDD and an ASUS A7M266-D motherboard, > among other things. One particular note is that I don't need/want CDROMs > in these systems. > > Now, a vendor provided me with a couple of test nodes basically to our > specifications, except that they included CDROMs and floppies. To make a > longish story shorter, I wanted to make sure that the nodes work fine > without the CDROM. > > So, I first looked into the BIOS. I disabled (set to "None") Primary > Slave, Secondary Master/Slave (since my HDD is Primary Master), removed > the CDROM from the list of boot devices, and disabled the Secondary IDE > channel. Then, I passed the kernel args "ide0=dma hdb=none" to try to > enforce the HDD to use DMA during the Kickstart installation. > > Now, here is the kicker: regardless of the BIOS settings, if I have the > CDROM plugged in (power+IDE, on the secondary channel) the installation > takes ~ 5 times faster than if the thing isn't there. This installation > includes installation of ~470 packages plus formatting the HDD. That's > right, as long as the CDROM is plugged in, everything is peachy, but once > gone, things slow down. > > I think this is a problem with the DMA settings, b/c when I pass > "ide=nodma" to the kernel, WITH the CD attached, performance is > slow. However, I can't even force DMA to be used. > > If anyone has any suggestions or similar experiences, please let me know.
> > Thanks a bunch! > Steve > > > ===== > Steve Berukoff tel: 49-331-5677233 > Albert-Einstein-Institute fax: 49-331-5677298 > Am Muehlenberg 1, D14477 Golm, Germany email:steveb@aei.mpg.de

From robl at mcs.anl.gov Sat Apr 13 18:29:56 2002 From: robl at mcs.anl.gov (Robert Latham) Date: Tue Mar 16 01:02:20 2010 Subject: decent performance from G4 Macs? In-Reply-To: References: Message-ID: <20020414012956.GA20390@mcs.anl.gov>

On Sat, Apr 13, 2002 at 02:18:39PM -0400, Mark Hahn wrote: > is it just that the performance Apple brags about is strictly > in-cache, and/or when doing something ah specialized like > single-precision SIMD (altivec/velocity engine)?

it's the altivec unit that makes G4s at all interesting. if you aren't using the vector unit, yeah, you won't even come close to x86.

gcc is multi-platform, sure, but its optimizer for x86 has received a lot of attention, while the powerpc optimizer has not. your observation that gcc 3.1 performance is better shows that focus on powerpc optimizations has grown, but yeah, it's going to get less attention than x86. too bad, really. register pressure on a powerpc is much less than on x86 ( register pressure on just about any arch not stack-based is less than that on x86 :> )

you are running on mac os x, yes? is there any chance you could put linux on it? if your application is making a significant number of system calls ( file i/o, network traffic... you know, system calls ) os x will hurt you. I'd be curious to hear if your application performs better under linux on powerpc (debian, suse, mandrake, yellowdog; there are many options) than it does under os x on the same hardware.

( if you use linux, you'll have to hand-code some assembly to use the G4. samples abound on the web.
but if you are compute-intensive anyway, you might not see gains running under linux) microbenchmarks don't always correlate well with application performance, but here are lmbench numbers. the hardware is constant while i varied the operating system: http://clustermonkey.org/~laz/pbook/lmbench.powerbook.txt (the numbers are nearly 8 months old, but the newer versions of os X do not show any remarkable improvement and in fact regress on some scores) rgb, do you know what the cputest curves look like for a G4 mac? also bear in mind that G4s run significantly cooler than their x86 counterparts, so you might still come out ahead on price/performance, where price takes into account initial purchase + cost of running the cluster. so there you go. there are lots of reasons why you'll have to actually spend a bit of effort to move to a new architecture. i hope no one on this list finds that idea surprising. ==rob -- Rob Latham A215 0178 EA2D B059 8CDF B29D F333 664A 4280 315B From echiu at imservice.com Sat Apr 13 18:57:36 2002 From: echiu at imservice.com (Eric Chiu) Date: Tue Mar 16 01:02:20 2010 Subject: CD from "Building Linux Cluster" References: Message-ID: <00a101c1e357$ece44f90$e3c0fea9@squaw> Has anyone set up a cluster using the CD from Spector's book "Building Linux Clusters" (O'Reilly)? Eric Chiu, author/consultant Imservice, Inc. www.imservice.com From jsmith at structbio.vanderbilt.edu Sat Apr 13 19:45:41 2002 From: jsmith at structbio.vanderbilt.edu (Jarrod Smith) Date: Tue Mar 16 01:02:20 2010 Subject: decent performance from G4 Macs? In-Reply-To: Message-ID: On Sat, 13 Apr 2002, Mark Hahn wrote: > is it just that the performance Apple brags about is strictly > in-cache, and/or when doing something ah specialized like > single-precision SIMD (altivec/velocity engine)? I've been making a foray into OS X on G4 hardware recently. After having compiled and benchmarked a couple of our compute-intensive codes, I have wondered the same thing... 
So far double-precision floating point has not impressed me in the least on the G4. Jarrod Smith From robl at mcs.anl.gov Sat Apr 13 19:58:49 2002 From: robl at mcs.anl.gov (Robert Latham) Date: Tue Mar 16 01:02:20 2010 Subject: CD from "Building Linux Cluster" In-Reply-To: <00a101c1e357$ece44f90$e3c0fea9@squaw> References: <00a101c1e357$ece44f90$e3c0fea9@squaw> Message-ID: <20020414025849.GB20390@mcs.anl.gov> On Sat, Apr 13, 2002 at 06:57:36PM -0700, Eric Chiu wrote: > Has anyone set up a cluster using the CD from Spector's book > "Building Linux Clusters" (O'Reilly)? http://www.oreilly.com/catalog/clusterlinux/ i'm guessing the answer is 'no' ==rob -- Rob Latham A215 0178 EA2D B059 8CDF B29D F333 664A 4280 315B From walke at usna.edu Sat Apr 13 19:58:54 2002 From: walke at usna.edu (Vann H. Walke) Date: Tue Mar 16 01:02:20 2010 Subject: CD from "Building Linux Cluster" In-Reply-To: <00a101c1e357$ece44f90$e3c0fea9@squaw> References: <00a101c1e357$ece44f90$e3c0fea9@squaw> Message-ID: <1018753136.25541.3.camel@walkeonline.com> I don't have the book, but suspect that the included software would be well out of date. If you're just getting into clustering, I would suggest trying the Scyld distribution. You can get it for $3 at linuxcentral.com. Good Luck, Vann On Sat, 2002-04-13 at 21:57, Eric Chiu wrote: > Has anyone set up a cluster using the CD from Spector's book > "Building Linux Clusters" (O'Reilly)? > > Eric Chiu, author/consultant > Imservice, Inc. > www.imservice.com > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From spoel at xray.bmc.uu.se Sun Apr 14 00:18:02 2002 From: spoel at xray.bmc.uu.se (David van der Spoel) Date: Tue Mar 16 01:02:20 2010 Subject: decent performance from G4 Macs? 
In-Reply-To: Message-ID:

On Sat, 13 Apr 2002, Jarrod Smith wrote: >> is it just that the performance Apple brags about is strictly >> in-cache, and/or when doing something ah specialized like >> single-precision SIMD (altivec/velocity engine)? > >I've been making a foray into OS X on G4 hardware recently. After having >compiled and benchmarked a couple of our compute-intensive codes, I have >wondered the same thing... > >So far double-precision floating point has not impressed me in the least >on the G4.

We have done some single precision (gcc with altivec code) tests using our molecular dynamics code GROMACS. The results are at http://www.gromacs.org/benchmarks/single.php; the numbers are simulation time/real time, i.e. higher is better. The G4 is slightly slower than an Athlon (w/ 3DNow)/P3 (w/ SSE) at the same clock. Haven't tested double precision yet.

Regards, David.

________________________________________________________________________ Dr. David van der Spoel, Biomedical center, Dept. of Biochemistry Husargatan 3, Box 576, 75123 Uppsala, Sweden phone: 46 18 471 4205 fax: 46 18 511 755 spoel@xray.bmc.uu.se spoel@gromacs.org http://zorn.bmc.uu.se/~spoel ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

From echiu at imservice.com Sat Apr 13 23:22:01 2002 From: echiu at imservice.com (Eric Chiu) Date: Tue Mar 16 01:02:21 2010 Subject: BladeFrame vs Beowulf References: <00a101c1e357$ece44f90$e3c0fea9@squaw> Message-ID: <012201c1e37c$f71fa2a0$e3c0fea9@squaw>

Has anyone worked on one of these BladeFrames? http://www.egenera.com/prod_spec_overview.php I'm wondering how this compares to a custom-built Beowulf. I like how they have consolidated the networking and hardware in this proprietary architecture. One of the biggest problems in a Beowulf is keeping track of the boxes and ethernet connections.

Eric Chiu, author/consultant Imservice, Inc.
www.imservice.com From emiller at techskills.com Sun Apr 14 05:28:21 2002 From: emiller at techskills.com (Eric Miller) Date: Tue Mar 16 01:02:21 2010 Subject: CD from "Building Linux Cluster" In-Reply-To: <00a101c1e357$ece44f90$e3c0fea9@squaw> Message-ID: >Has anyone set up a cluster using the CD from Spector's book >"Building Linux Clusters" (O'Reilly)? Eric, I tried to order that book about a year ago, it was taken out of print (at least the edition then was). An email response from the publisher stated that the book was such low quality that they had to take it off the shelves, too many returns/reader complaints. You may have a newer edition. From opengeometry at yahoo.ca Sun Apr 14 08:59:12 2002 From: opengeometry at yahoo.ca (William Park) Date: Tue Mar 16 01:02:21 2010 Subject: CD from "Building Linux Cluster" In-Reply-To: ; from emiller@techskills.com on Sun, Apr 14, 2002 at 08:28:21AM -0400 References: <00a101c1e357$ece44f90$e3c0fea9@squaw> Message-ID: <20020414115912.A13058@node0.opengeometry.ca> On Sun, Apr 14, 2002 at 08:28:21AM -0400, Eric Miller wrote: > >Has anyone set up a cluster using the CD from Spector's book > >"Building Linux Clusters" (O'Reilly)? > > Eric, > > I tried to order that book about a year ago, it was taken out of print (at > least the edition then was). An email response from the publisher stated > that the book was such low quality that they had to take it off the shelves, > too many returns/reader complaints. > > You may have a newer edition. I have it, but it's so out-of-date now. Try Mosix or Beowulf. -- William Park, Open Geometry Consulting, 8 CPU cluster, NAS, (Slackware) Linux, Python, LaTeX, Vim, Mutt, Tin From hahn at physics.mcmaster.ca Sun Apr 14 09:24:54 2002 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Tue Mar 16 01:02:21 2010 Subject: decent performance from G4 Macs? 
In-Reply-To: <20020413233929.65399.qmail@web14706.mail.yahoo.com> Message-ID: > --- Mark Hahn wrote: > > I'm doing some benchmarks to evaluate whether > > current Macs would make suitable nodes for a serial > > farm (lots of nodes, preferably fast CPU and dram, > > but no serious interconnect.) > > Physics or bioscience code? why does it matter? we're not trying specifically to run BLAST, if that's what you're asking. I don't see any reason why the department would matter, but it's a mixture of math, chem, physics, astro, biologists, and perhaps a few psychologists. > > I've tried a variety of real codes and benchmarks, > > but can't seem to get something like a Mac G4/800 > > with PC133 to perform anywhere close to even a > > P4/1.7/i845/PC133. > > > > I'm using either the gcc 2.95 that comes with OSX or > > a recent 3.1 snapshot (which is MUCH better, but > > still bad). > > What compiler are you using for the P4? I'm pretty happy with recent snapshots of gcc 3.1 (pre-release). (still mystified why gnu fortran people are stuck at F77, but...) > > is it just that the performance Apple brags about is > > strictly in-cache, and/or when doing something ah > > specialized like single-precision SIMD > >(altivec/velocity engine)? > > Apple has some libraries that take advantage of the > Altivec instructions. linpack/lapack/atlas/fftw? > AFAIK, there are several people using MacOS X in > clusters, the SGE (Sun Grid Engine) project has a port > for Mac OS X. which doesn't give me ANY data on performance. From opengeometry at yahoo.ca Sun Apr 14 09:23:27 2002 From: opengeometry at yahoo.ca (William Park) Date: Tue Mar 16 01:02:21 2010 Subject: [MAILER-DAEMON@x263.net: Undelivered Mail Returned to Sender] Message-ID: <20020414122327.A13288@node0.opengeometry.ca> To list maintainer: Please unsubscribe . Everytime I post to the list, I get rejected notice from . It should go to you! 
-- William Park, Open Geometry Consulting, 8 CPU cluster, NAS, (Slackware) Linux, Python, LaTeX, Vim, Mutt, Tin

-------------- next part -------------- An embedded message was scrubbed... From: MAILER-DAEMON@x263.net (Mail Delivery System) Subject: Undelivered Mail Returned to Sender Date: Mon, 15 Apr 2002 00:16:41 +0800 (CST) Size: 5390 Url: http://www.scyld.com/pipermail/beowulf/attachments/20020414/7bdf47bb/attachment.mht

From heckendo at cs.uidaho.edu Sun Apr 14 10:05:39 2002 From: heckendo at cs.uidaho.edu (Robert B Heckendorn) Date: Tue Mar 16 01:02:21 2010 Subject: MPI/PVM for BLAST and FASTA In-Reply-To: <200204141601.g3EG14G09306@blueraja.scyld.com> Message-ID: <200204141705.KAA16409@brownlee.cs.uidaho.edu>

Bill Pearson's paragraph raises so many great questions that maybe Bill or others can answer them.

> The advantage of an ES40 or other large shared memory machine for > BLAST is that it has been optimized for searching databases that are > large memory mapped files, and it runs multithreaded. PVM and MPI > versions of BLAST are not available but, it is important to remember > that BLAST is extremely fast, and highly optimized to go through a > large amount of memory very quickly; it would be difficult to provide > an equally efficient distributed version - but, of course, a > distributed memory machine would be much cheaper.

I think I could learn a lot by hearing the details of why this is not done. So here goes:

Why is it that BLAST is not available for MPI/PVM? I would think clusters would be the perfect host for such an application. Is it that there is no need, because BLAST is already so fast and no one wants to break the database out onto node-resident disks? Or is it that BLAST is kept running on single-processor or shared-memory machines so that the DB is always in memory, ready to roll without loading, and doing the same on a cluster is not worth it because the same trick is difficult to do on a node, given the way clusters are currently built?
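To make the "break the database out onto node-resident disks" option concrete, here is a minimal sketch, assuming a FASTA-format database small enough to read into memory. It is not how NCBI BLAST or the FASTA package distribute work; the names `read_fasta`, `partition` and `write_fasta` are hypothetical. It splits the sequences into one chunk per node, greedily balancing total residues so that each node's share of a search should take roughly the same time.

```python
# Hypothetical sketch: split a FASTA-format database into one chunk per
# cluster node, balancing total residues so each node's share of a search
# takes roughly the same time. Not how NCBI BLAST or FASTA distribute work.
import heapq
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header line without '>', sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-format lines."""
    header, parts = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(parts)
            header, parts = line[1:], []
        elif header is not None and line:
            parts.append(line)
    if header is not None:
        yield header, "".join(parts)


def partition(records: Iterable[Record], n_nodes: int) -> List[List[Record]]:
    """Greedy longest-first: give each sequence to the least-loaded node."""
    chunks: List[List[Record]] = [[] for _ in range(n_nodes)]
    heap = [(0, node) for node in range(n_nodes)]  # (residues so far, node)
    for rec in sorted(records, key=lambda r: len(r[1]), reverse=True):
        load, node = heapq.heappop(heap)
        chunks[node].append(rec)
        heapq.heappush(heap, (load + len(rec[1]), node))
    return chunks


def write_fasta(records: List[Record], width: int = 60) -> str:
    """Format records back into FASTA text, wrapping sequence lines."""
    out = []
    for header, seq in records:
        out.append(">" + header)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    toy_db = [">a", "MKV", ">b", "MKVLA", ">c", "MK", ">d", "MKVL"]
    chunks = partition(read_fasta(toy_db), n_nodes=2)
    print([sum(len(s) for _, s in c) for c in chunks])  # [7, 7]
```

Splitting is the easy half. Every query would still have to be searched against every chunk, and the per-node hit lists merged and re-ranked; for BLAST, the E-values reported by each node would also need to reflect the size of the whole database rather than the node's slice. That merge step is probably part of why an equally efficient distributed BLAST is harder than it first looks.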
I assume the same is true for FASTA? thanks for the clarification, -- | Robert Heckendorn | We may not be the only | heckendo@cs.uidaho.edu | species on the planet but | http://www.cs.uidaho.edu/~heckendo | we sure do act like it. | CS Dept, University of Idaho | | Moscow, Idaho, USA 83844-1010 | From steveb at aei-potsdam.mpg.de Sun Apr 14 10:24:24 2002 From: steveb at aei-potsdam.mpg.de (Steven Berukoff) Date: Tue Mar 16 01:02:21 2010 Subject: DMA difficulties In-Reply-To: Message-ID: Hi, Sorry, I should have given a bit more info. If the IDE cable is attached, but the power cable is not, the machine will not complete POST; it will hang. If the power cable is attached, but the IDE cable is not, the machine completes POST, and goes forward with the install. However, performance is slow. Only when both cables are attached to the CDROM does the installation run quickly. To address Alvin's comments, all settings in the BIOS relevant to the CDROM are disabled: the CDROM is not listed as a boot device, it's not a Master or Slave on either IDE channel, and the Secondary IDE channel is disabled. Further, no IDE cables are attached where they shouldn't be, i.e., only the HDD cable is plugged in. Finally, there is no option in the BIOS for enabling/disabling autodetection of IDE devices. To address Mark's comments, the kernel that I'm using is the 2.4.7-10 kernel that comes with RH7.2. In particular, I'm using the kernel found in images/pxeboot, which includes support for the network loopback device, initial ramdisk, etc. Also, the boot messages say that the HDD is DMA enabled, although, as I've said, I'm a bit wary of that pronouncement. I thought about compiling my own kernel for this, instead of using the RH distro version. However, going through some of the permutations of kernel configurations didn't produce a useful product. Anyone have insights as to the kernel config that will work for this, or the options in the stock RH kernel, or how to extract such options? 
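Steve's doubt about the boot message can be checked directly on a running node. On 2.4-series kernels the IDE driver lists per-drive settings, including `using_dma`, in `/proc/ide/hda/settings`. `hdparm -d /dev/hda` reports the same flag, and `hdparm -d1 /dev/hda` asks the driver to turn it on. The helper below is a minimal, hypothetical sketch that assumes the settings file uses a whitespace-separated `name value min max mode` layout. Running it with and without the CDROM attached would show whether DMA is really being dropped.

```python
# Hypothetical helper: report whether an IDE drive is using DMA by reading
# the 2.4-kernel /proc/ide/<drive>/settings table. Assumes each setting is
# one whitespace-separated line of the form "name value min max mode".
from typing import Optional


def using_dma(settings_text: str) -> Optional[bool]:
    """Return True/False for the using_dma setting, or None if it is absent."""
    for line in settings_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "using_dma":
            return fields[1] == "1"
    return None


def check_drive(drive: str = "hda") -> Optional[bool]:
    """Read the settings file for one drive on a live node."""
    with open("/proc/ide/%s/settings" % drive) as f:
        return using_dma(f.read())


if __name__ == "__main__":
    sample = (
        "name                    value           min             max             mode\n"
        "----                    -----           ---             ---             ----\n"
        "using_dma               0               0               1               rw\n"
    )
    print(using_dma(sample))  # False: this made-up sample has DMA switched off
```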
TIA again for your insights. Steve > hi ya > > i notice that when the cable is attached... things goes > bonkers... even if no power ot the drive ( hd or cdrom ) > > remove the ide cable from the motherboard if its not used > > and tell the bios NOT to autodetect ide devices > except those that is in fact present > > 150 nodes.... hummm .... one full cabinet..front and back.. :-) > > c ya > alvin > http://www.Linux-1U.net > > > On Sun, 14 Apr 2002, Steven Berukoff wrote: > > > > > Hi all, > > > > This question may be very slightly off-topic, so I apologize. > > > > I'm in the process of setting up a network installation procedure using > > PXE/DHCP/NFS/Kickstart w/ RH7.2 for about 150 dual Athlon nodes. These > > nodes use a Maxtor 6L080J4 80.0GB HDD and an ASUS A7M266-D motherboard, > > among other things. One particular note is that I don't need/want CDROMs > > in these systems. > > > > Now, a vendor provided me with a couple of test nodes basically to our > > specifications, except that they included CDROMs and floppies. To make a > > longish story shorter, I wanted to make sure that the nodes work fine > > without the CDROM. > > > > So, I first looked into the BIOS. I disabled (set to "None") Primary > > Slave, Secondary Master/Slave (since my HDD is Primary Master), removed > > the CDROM from the list of boot devices, and disabled the Secondary IDE > > channel. Then, I passed the kernel args "ide0=dma hdb=none" to try to > > enforce the HDD to use DMA during the Kickstart installation. > > > > Now, here is the kicker: regardless of the BIOS settings, if I have the > > CDROM plugged in (power+IDE, on the secondary channel) the installation > > takes ~ 5 times faster than if the thing isn't there. This installation > > includes installation of ~470 packages plus formatting the HDD. That's > > right, as long as the CDROM is plugged in, everything is peachy, but once > > gone, things slow down. 
> > > > I think this is a problem with the DMA settings, b/c when I pass > > "ide=nodma" to the kernel, WITH the CD attached, performance is > > slow. However, I can't even force DMA to be used. > > > > If anyone has any suggestions or similar experiences, please let me know. > > > > Thanks a bunch! > > Steve > > > > > > ===== > > Steve Berukoff tel: 49-331-5677233 > > Albert-Einstein-Institute fax: 49-331-5677298 > > Am Muehlenberg 1, D14477 Golm, Germany email:steveb@aei.mpg.de > > > > > > _______________________________________________ > > Beowulf mailing list, Beowulf@beowulf.org > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > > ===== Steve Berukoff tel: 49-331-5677233 Albert-Einstein-Institute fax: 49-331-5677298 Am Muehlenberg 1, D14477 Golm, Germany email:steveb@aei.mpg.de From hahn at physics.mcmaster.ca Sun Apr 14 10:43:43 2002 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Tue Mar 16 01:02:21 2010 Subject: decent performance from G4 Macs? In-Reply-To: <20020414012956.GA20390@mcs.anl.gov> Message-ID: > On Sat, Apr 13, 2002 at 02:18:39PM -0400, Mark Hahn wrote: > > is it just that the performance Apple brags about is strictly > > in-cache, and/or when doing something ah specialized like > > single-precision SIMD (altivec/velocity engine)? > > it's the altivec unit that makes G4s at all interesting. if you > aren't using the vector unit, yeah, you won't even come close to x86. as far as I can tell, the requirement to think highly of G4 is: hand-tuned altivec and a tiny working set which pretty much excludes any general-purpose scientific computing. > gcc is multi-platform, sure, but it's optimizer for x86 has received a > lot of attention, while the powerpc optimizer has not. your I'm not sure that's true: I read the gcc developers list and see significant efforts from Apple people. and remember that lots of code is not inherently vectorizable, so would never win big on SIMD. 
> observation that gcc 3.1 performance is better shows that focus on > powerpc optimizations has grown, but yeah, it's going to get less

afaict, 3.1 improvements are from improved infrastructure, nothing powerpc-specific.

> you are running on mac os x, yes? is there any chance you could put > linux on it? if your application is making a significant number of > system calls ( file i/o, network traffic... you know, system calls )

no, I'm really only interested in compute-bound performance.

> also bear in mind that G4s run significantly cooler than their x86 > counterparts, so you might still come out ahead on price/performance,

I've heard Apple/Moto's PR on that, too. but my recent benchmarking has made me "think different": the G4 appears to perform about the same as current Intel notebook PIIIs, which, of course, burn about the same power as G4s...

> where price takes into account initial purchase + cost of running the > cluster.

we're in the market for 1-200 CPUs. it's not obvious to me that it matters whether the CPU burns 20 or 50 W, since we've already got 30 kW of Alphas in the room ;)

    CPU            watts   basis
    G4e/1000         21    probably "design" power
    PIIIulv/700       8    "design" power
    PIIIt/1113       28    "design" power
    P4a/2200         55    "design" power
    athxp/1800       66    max power

> so there you go. there are lots of reasons why you'll have to actually > spend a bit of effort to move to a new architecture. i hope no one on > this list finds that idea surprising.

I certainly do. powerpc support in gcc is not immature, and the cpu is supposed to be a general-purpose one. if my observations are true, then it's the slowest shipping GP machine, and is only viable if you can afford to structure your program around its SIMD and cache.

regards, mark hahn.
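Mark's point about power can be put in numbers. The sketch below multiplies the per-CPU figures from his table by 200 CPUs, taken here as the top of the range he mentions. Most of those figures are vendor "design" power, not measured draw. It then compares the result with the ~30 kW already in the room. It counts CPUs only and ignores memory, disks, power-supply losses and cooling overhead, so real node-level figures would be higher.

```python
# Back-of-envelope sketch: extra heat load from a 200-CPU farm, using the
# per-CPU wattages quoted in the table above (CPU only; memory, disks,
# power-supply losses and cooling are deliberately ignored).
CPU_WATTS = {
    "G4e/1000": 21,
    "PIIIulv/700": 8,
    "PIIIt/1113": 28,
    "P4a/2200": 55,
    "athxp/1800": 66,
}
N_CPUS = 200          # assumed: upper end of "1-200 CPUs"
EXISTING_KW = 30.0    # Alphas already in the room


def added_kw(watts_per_cpu: float, n_cpus: int = N_CPUS) -> float:
    """Total CPU power for the farm, in kW."""
    return n_cpus * watts_per_cpu / 1000.0


if __name__ == "__main__":
    for name, watts in CPU_WATTS.items():
        extra = added_kw(watts)
        share = extra / (EXISTING_KW + extra)
        print(f"{name:12s} +{extra:5.1f} kW  ({share:.0%} of the new room total)")
    # The 20 W vs 50 W question from the text:
    print(f"difference: {added_kw(50) - added_kw(20):.1f} kW")  # 6.0 kW
```

On these figures, choosing a 20 W part over a 50 W part saves about 6 kW across 200 CPUs. That is roughly a fifth of what the existing Alphas already dissipate, which is consistent with Mark's view that per-CPU power is not the deciding factor for a farm of this size.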
From gotero at linuxprophet.com Sun Apr 14 16:54:59 2002 From: gotero at linuxprophet.com (Glen Otero) Date: Tue Mar 16 01:02:21 2010 Subject: CD from "Building Linux Cluster" In-Reply-To: <1018753136.25541.3.camel@walkeonline.com> References: <00a101c1e357$ece44f90$e3c0fea9@squaw> <1018753136.25541.3.camel@walkeonline.com> Message-ID: <1018828499.1838.175.camel@prophet> I tried to build a cluster with the CD when I reviewed that book for Linux Journal. Incredibly, the software was released unfinished, and so building a cluster with it wasn't possible. The book was pulled from circulation for this and other editorial reasons. I recommend Rocks, Scyld, and OSCAR for building clusters. Glen On Sat, 2002-04-13 at 19:58, Vann H. Walke wrote: > I don't have the book, but suspect that the included software would be > well out of date. If you're just getting into clustering, I would > suggest trying the Scyld distribution. You can get it for $3 at > linuxcentral.com. > > Good Luck, > Vann > > On Sat, 2002-04-13 at 21:57, Eric Chiu wrote: > > Has anyone set up a cluster using the CD from Spector's book > > "Building Linux Clusters" (O'Reilly)? > > > > Eric Chiu, author/consultant > > Imservice, Inc. > > www.imservice.com > > > > > > _______________________________________________ > > Beowulf mailing list, Beowulf@beowulf.org > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Glen Otero, Ph.D. 
Linux Prophet Office:858.792.5561 Mobile:619.917.1772 www.linuxprophet.com "The Beowulf is primarily a mental phenomenon" From alvin at Maggie.Linux-Consulting.com Sun Apr 14 17:37:32 2002 From: alvin at Maggie.Linux-Consulting.com (alvin@Maggie.Linux-Consulting.com) Date: Tue Mar 16 01:02:21 2010 Subject: DMA difficulties In-Reply-To: Message-ID: hi ya steven if you have the hdd cable plugged in ( am assuming into the motherboard ) but no ide drive .... you will get whacky results ... ( whether secondary ide is disabled on the bios or not... ) remove cables that dont go nowhere ( is what am trying to say ) remove um if the devices are disabled .. most bios does allow you to autodetect or user define the devices... but donno about your motherboard... the default rh-7.2 kernel should work fine... ( doesnt cough up erroneous messages on boot that i know of.. c ya alvin On Sun, 14 Apr 2002, Steven Berukoff wrote: > > Hi, > > Sorry, I should have given a bit more info. > > If the IDE cable is attached, but the power cable is not, the machine will > not complete POST; it will hang. If the power cable is attached, but the > IDE cable is not, the machine completes POST, and goes forward with the > install. However, performance is slow. Only when both cables are > attached to the CDROM does the installation run quickly. > > To address Alvin's comments, all settings in the BIOS relevant to the > CDROM are disabled: the CDROM is not listed as a boot device, it's not a > Master or Slave on either IDE channel, and the Secondary IDE channel is > disabled. Further, no IDE cables are attached where they shouldn't be, > i.e., only the HDD cable is plugged in. Finally, there is no option in > the BIOS for enabling/disabling autodetection of IDE devices. > > To address Mark's comments, the kernel that I'm using is the 2.4.7-10 > kernel that comes with RH7.2. 
In particular, I'm using the kernel found > in images/pxeboot, which includes support for the network loopback device, > initial ramdisk, etc. Also, the boot messages say that the HDD is > DMA enabled, although, as I've said, I'm a bit wary of that pronouncement. > > I thought about compiling my own kernel for this, instead of using the RH > distro version. However, going through some of the permutations of kernel > configurations didn't produce a useful product. Anyone have insights as > to the kernel config that will work for this, or the options in the stock > RH kernel, or how to extract such options? > > TIA again for your insights. > > Steve > > > > > hi ya > > > > i notice that when the cable is attached... things goes > > bonkers... even if no power ot the drive ( hd or cdrom ) > > > > remove the ide cable from the motherboard if its not used > > > > and tell the bios NOT to autodetect ide devices > > except those that is in fact present > > > > 150 nodes.... hummm .... one full cabinet..front and back.. :-) > > > > c ya > > alvin > > http://www.Linux-1U.net > > > > > > On Sun, 14 Apr 2002, Steven Berukoff wrote: > > > > > > > > Hi all, > > > > > > This question may be very slightly off-topic, so I apologize. > > > > > > I'm in the process of setting up a network installation procedure using > > > PXE/DHCP/NFS/Kickstart w/ RH7.2 for about 150 dual Athlon nodes. These > > > nodes use a Maxtor 6L080J4 80.0GB HDD and an ASUS A7M266-D motherboard, > > > among other things. One particular note is that I don't need/want CDROMs > > > in these systems. > > > > > > Now, a vendor provided me with a couple of test nodes basically to our > > > specifications, except that they included CDROMs and floppies. To make a > > > longish story shorter, I wanted to make sure that the nodes work fine > > > without the CDROM. > > > > > > So, I first looked into the BIOS. 
I disabled (set to "None") Primary > > > Slave, Secondary Master/Slave (since my HDD is Primary Master), removed > > > the CDROM from the list of boot devices, and disabled the Secondary IDE > > > channel. Then, I passed the kernel args "ide0=dma hdb=none" to try to > > > enforce the HDD to use DMA during the Kickstart installation. > > > > > > Now, here is the kicker: regardless of the BIOS settings, if I have the > > > CDROM plugged in (power+IDE, on the secondary channel) the installation > > > takes ~ 5 times faster than if the thing isn't there. This installation > > > includes installation of ~470 packages plus formatting the HDD. That's > > > right, as long as the CDROM is plugged in, everything is peachy, but once > > > gone, things slow down. > > > > > > I think this is a problem with the DMA settings, b/c when I pass > > > "ide=nodma" to the kernel, WITH the CD attached, performance is > > > slow. However, I can't even force DMA to be used. > > > > > > If anyone has any suggestions or similar experiences, please let me know. > > > > > > Thanks a bunch! > > > Steve > > > > > > > > > ===== > > > Steve Berukoff tel: 49-331-5677233 > > > Albert-Einstein-Institute fax: 49-331-5677298 > > > Am Muehlenberg 1, D14477 Golm, Germany email:steveb@aei.mpg.de > > > > > > > > > _______________________________________________ > > > Beowulf mailing list, Beowulf@beowulf.org > > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > > > > > > > > > ===== > Steve Berukoff tel: 49-331-5677233 > Albert-Einstein-Institute fax: 49-331-5677298 > Am Muehlenberg 1, D14477 Golm, Germany email:steveb@aei.mpg.de > > > From ron_chen_123 at yahoo.com Sun Apr 14 18:55:13 2002 From: ron_chen_123 at yahoo.com (Ron Chen) Date: Tue Mar 16 01:02:21 2010 Subject: decent performance from G4 Macs? 
In-Reply-To: Message-ID: <20020415015513.55770.qmail@web14702.mail.yahoo.com> --- Mark Hahn wrote: > > --- Mark Hahn wrote: > > > I'm doing some benchmarks to evaluate whether > > > current Macs would make suitable nodes for a > serial > > > farm (lots of nodes, preferably fast CPU and > dram, > > > but no serious interconnect.) > > > > Physics or bioscience code? > > why does it matter? we're not trying specifically > to run BLAST, > if that's what you're asking. I don't see any > reason why the > department would matter, but it's a mixture of math, > chem, > physics, astro, biologists, and perhaps a few > psychologists. > > > > I've tried a variety of real codes and > benchmarks, > > > but can't seem to get something like a Mac > G4/800 > > > with PC133 to perform anywhere close to even a > > > P4/1.7/i845/PC133. > > > > > > I'm using either the gcc 2.95 that comes with > OSX or > > > a recent 3.1 snapshot (which is MUCH better, but > > > still bad). > > > > What compiler are you using for the P4? > > I'm pretty happy with recent snapshots of gcc 3.1 > (pre-release). > (still mystified why gnu fortran people are stuck at > F77, but...) > > > > is it just that the performance Apple brags > about is > > > strictly in-cache, and/or when doing something > ah > > > specialized like single-precision SIMD > > >(altivec/velocity engine)? > > > > Apple has some libraries that take advantage of > the > > Altivec instructions. > > linpack/lapack/atlas/fftw? > > > AFAIK, there are several people using MacOS X in > > clusters, the SGE (Sun Grid Engine) project has a > port > > for Mac OS X. > > which doesn't give me ANY data on performance. > You can use a better compiler for the PPC: http://www.absoft.com/newproductpage.html Also, I did not say that SGE would provide you ANY data on performance -- all I said was that you could find people using Mac OS X/G4 machines in the cluster world. 
(or if you don't like SGE, you can choose PBS; they also have a Mac OS X port) -Ron From wrp at alpha0.bioch.virginia.edu Sun Apr 14 18:57:56 2002 From: wrp at alpha0.bioch.virginia.edu (William R. Pearson) Date: Tue Mar 16 01:02:21 2010 Subject: G4's for scientific computing Message-ID: <200204150157.VAA22514@alpha0.bioch.virginia.edu> One of the advantages of the MacOSX gcc compiler is that inline Altivec instructions are available at a high level. One can define vector arrays and do vector operations from 'C' code, e.g.:

    while (vec_any_gt(T2, NAUGHT)) {
        T2 = vec_sub(LSHIFT(T2), RR);
        FF = vec_max(FF, T2);
    }

We are testing an Altivec FASTA version; an Altivec BLAST was announced several months ago. We like Altivec because we can manipulate 8 16-bit integers or 16 8-bit integers at once; biological sequence comparison code is essentially all integer. We see 6-fold speedups when things are done 8-fold parallel. On our codes, a dual 533 MHz G4 running Altivec code is 6X faster than a dual 1 GHz PIII (we don't have a GHz G4 yet). Because of the high-level Altivec primitives in the Apple gcc compiler, vectorizing was very, very easy; we would have to be much more sophisticated to do the same thing on the PIII (and the potential speed-up would be half as large, since the vector is 64, not 128 bits). Four months ago I might have agreed with the statement that one must have hand-tuned Altivec code, which pretty much excludes general-purpose scientific computing, but our experience has been very positive: our programs are not specialized signal processing programs, but, in retrospect, it was easy to get a very dramatic speed-up. Bill Pearson From wrp at alpha0.bioch.virginia.edu Sun Apr 14 19:32:20 2002 From: wrp at alpha0.bioch.virginia.edu (William R.
Pearson) Date: Tue Mar 16 01:02:21 2010 Subject: Parallel BLAST Message-ID: <200204150232.WAA22617@alpha0.bioch.virginia.edu> > Why is it that BLAST is not available for MPI/PVM? I would think > clusters would be the perfect host for such an application. > Is it there is no need because BLAST is already so fast and > no one wants to break the database out onto node-resident disks? > Or is it that BLAST is kept running on single processor or shared memory > machines BLAST so that the DB is always in memory ready to roll without > loading and doing the same for a cluster is not worth it > because the same trick is difficult to do on a node given the current > way clusters are built? I assume the same is true for FASTA? I suspect that BLAST is not available for MPI/PVM because (1) it is too fast, and (2) there is not much demand for it. 95% of the time, BLAST is almost an in-memory grep (the other 5% of the time it is working on the things it is looking for). Sequence comparison is embarrassingly parallel, and very easily threaded. Distributing the sequence databases and collecting results adds overhead that is large relative to such a fast search (there probably aren't many distributed grep programs either). FASTA is 5-10X slower than BLAST, and Smith-Waterman is another 5-20X slower than FASTA. For these slower searches the communications overhead is relatively low, so distributed systems work OK for FASTA, and great for Smith-Waterman (where the overhead fraction is very small). Of course, it is a lot easier to compile a threaded program and just run it than it is to install and configure the MPI or PVM environment and the programs to run in it. Bioinformatics software is often run by computer-savvy biologists, not high-performance computing folks, and not having to install and configure PVM/MPI is a big advantage. The NCBI probably does not make a PVM/MPI parallel BLAST because there is very little demand for it, and it does not meet their computational needs.
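Pearson's point that sequence comparison is embarrassingly parallel rests on each worker being able to search its own slice of the database independently, with the hits merged afterward. A minimal sketch of just the splitting step follows, assuming the database is an indexed list of sequences and that balancing by entry count is good enough; `partition_db` is a hypothetical helper, not part of BLAST, FASTA, MPI or PVM.

```c
#include <stddef.h>

/* Hypothetical helper (not from BLAST/FASTA): split n_seqs database
 * entries into n_workers contiguous chunks whose sizes differ by at
 * most one. Worker w (0 <= w < n_workers) searches entries
 * [*first, *first + *count). Assumes n_workers > 0. */
void partition_db(size_t n_seqs, size_t n_workers, size_t w,
                  size_t *first, size_t *count)
{
    size_t base  = n_seqs / n_workers;  /* every worker gets this many */
    size_t extra = n_seqs % n_workers;  /* the first `extra` workers get one more */

    *count = base + (w < extra ? 1 : 0);
    /* workers before w each took `base`, plus one each for those below `extra` */
    *first = w * base + (w < extra ? w : extra);
}
```

For example, 10 sequences over 3 workers gives the ranges [0,4), [4,7) and [7,10). Real sequence databases vary widely in sequence length, so splitting by total residues rather than by entry count would likely balance the search time better; the count-based split is kept here only for brevity.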
Bill Pearson From lindahl at keyresearch.com Fri Apr 12 17:08:52 2002 From: lindahl at keyresearch.com (Greg Lindahl) Date: Tue Mar 16 01:02:21 2010 Subject: very high bandwidth, low latency manner? (i860) In-Reply-To: ; from djholm@fnal.gov on Fri, Apr 12, 2002 at 06:11:41PM -0500 References: <20020412153750.A5315@mas1.ats.ucla.edu> Message-ID: <20020412200852.B2381@wumpus.skymv.com> On Fri, Apr 12, 2002 at 06:11:41PM -0500, Don Holmgren wrote: > Unfortunately the measured performance doesn't match the published > specs. In fact, this is *always* true for every PCI and memory system out there. Measure, measure, measure. The myrinet perftest and STREAM memory benchmark are your friends. -- greg From rickey-co at mug.biglobe.ne.jp Sat Apr 13 11:57:27 2002 From: rickey-co at mug.biglobe.ne.jp (Iwao Makino) Date: Tue Mar 16 01:02:21 2010 Subject: decent performance from G4 Macs? In-Reply-To: References: Message-ID: Mark, At 14:18 -0400 13.04.2002, Mark Hahn wrote: >I'm doing some benchmarks to evaluate whether current Macs >would make suitable nodes for a serial farm (lots of nodes, >preferably fast CPU and dram, but no serious interconnect.) I agree about the first two, but for interconnect, there's Myrinet for MacOS X!!! Myricom has now released MPICH-GM for MacOS X as well. I personally haven't purchased a Mac to test, but since Apple says that G4s running BLAST are a lot faster than P4s, I'm quite interested in evaluating them soon. >I've tried a variety of real codes and benchmarks, but can't >seem to get something like a Mac G4/800 with PC133 to perform >anywhere close to even a P4/1.7/i845/PC133. > >I'm using either the gcc 2.95 that comes with OSX or a >recent 3.1 snapshot (which is MUCH better, but still bad). I think you have to MODIFY the code a bit to take advantage of the velocity engine with the MacOS X gcc. I thought there were some interesting posts on that, along with other bluff.
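Modifying code for the velocity engine mostly means rewriting inner loops as operations on whole 128-bit vectors, as in the loop from William Pearson's "G4's for scientific computing" post. The sketch below is a portable scalar stand-in, not AltiVec code: it models a `vector signed short` as eight 16-bit lanes and mimics lane by lane what `vec_sub` (wrap-around), `vec_max` and `vec_any_gt` do. The names `vec8s`, `v_sub`, `v_max`, `v_any_gt` and `decay_and_track_max` are hypothetical; Pearson's `LSHIFT` step is omitted because its definition isn't shown, and `NAUGHT` is assumed to be a zero vector.

```c
#include <stdbool.h>
#include <stdint.h>

/* Scalar model of one 128-bit AltiVec register holding eight signed
 * 16-bit lanes (a "vector signed short"). Not real AltiVec. */
#define LANES 8
typedef struct { int16_t v[LANES]; } vec8s;

/* Lane-wise wrap-around subtraction, modelling vec_sub. The cast back to
 * int16_t wraps on GCC/Clang (strictly, implementation-defined in C). */
vec8s v_sub(vec8s a, vec8s b)
{
    vec8s r;
    for (int i = 0; i < LANES; i++)
        r.v[i] = (int16_t)(a.v[i] - b.v[i]);
    return r;
}

/* Lane-wise maximum, modelling vec_max. */
vec8s v_max(vec8s a, vec8s b)
{
    vec8s r;
    for (int i = 0; i < LANES; i++)
        r.v[i] = a.v[i] > b.v[i] ? a.v[i] : b.v[i];
    return r;
}

/* True if any lane of a exceeds the same lane of b, modelling vec_any_gt. */
bool v_any_gt(vec8s a, vec8s b)
{
    for (int i = 0; i < LANES; i++)
        if (a.v[i] > b.v[i])
            return true;
    return false;
}

/* Control flow of the quoted loop without LSHIFT: subtract rr from all
 * eight lanes at once while any lane is still positive, recording the
 * per-lane maximum in ff. Assumes every lane of rr is > 0 and values stay
 * well inside the int16_t range, so the loop terminates without wrapping. */
vec8s decay_and_track_max(vec8s t2, vec8s rr, vec8s ff)
{
    const vec8s zero = {{0}};
    while (v_any_gt(t2, zero)) {
        t2 = v_sub(t2, rr);
        ff = v_max(ff, t2);
    }
    return ff;
}
```

On a real G4 each of the three primitives is a single instruction covering all eight lanes, which is where the 8-fold parallelism Pearson describes comes from; in this scalar model the loops only make the lane-wise semantics explicit.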
>is it just that the performance Apple brags about is strictly >in-cache, and/or when doing something ah specialized like >single-precision SIMD (altivec/velocity engine)? I too think that's a big part of it... -- Best regards, Iwao Makino Hard Data Ltd. Tokyo branch mailto:iwao@harddata.com http://www.harddata.com/ -> HPC cluster specialist<- -> Scientific Imaging/Life Science/Physical Science/Parallel Computing <- From bgb at itcnv.com Sun Apr 14 07:35:00 2002 From: bgb at itcnv.com (bgb@itcnv.com) Date: Tue Mar 16 01:02:21 2010 Subject: BladeFrame vs Beowulf In-Reply-To: <012201c1e37c$f71fa2a0$e3c0fea9@squaw> References: <00a101c1e357$ece44f90$e3c0fea9@squaw> <012201c1e37c$f71fa2a0$e3c0fea9@squaw> Message-ID: <20020414143501.29671.qmail@smtp.itcnv.com> There is also: http://www.rlxtechnologies.com/about/pr_blast.php Eric Chiu writes: > Has anyone worked on one of these BladeFrame? > http://www.egenera.com/prod_spec_overview.php > > I'm wondering how this compares to a custom-built Beowulf. > I like how they have consolidated the networking and > hardware in this proprietary architecture. One of the biggest > problems in a Beowulf is keeping track of the boxes and ethernet > connections. > > Eric Chiu, author/consultant B.G. Bruce Networking Technologies N.V / Internet Technologies (Curacao) N.V. Phone: +599 9 563-1836 Fax: +599 9 465-3594 Alternate Email: bgbruce@it-curacao.com, ancu321@attglobal.net From lindahl at keyresearch.com Sun Apr 14 17:50:47 2002 From: lindahl at keyresearch.com (Greg Lindahl) Date: Tue Mar 16 01:02:21 2010 Subject: CD from "Building Linux Cluster" In-Reply-To: <00a101c1e357$ece44f90$e3c0fea9@squaw>; from echiu@imservice.com on Sat, Apr 13, 2002 at 06:57:36PM -0700 References: <00a101c1e357$ece44f90$e3c0fea9@squaw> Message-ID: <20020414175047.A12785@wumpus.attbi.com> On Sat, Apr 13, 2002 at 06:57:36PM -0700, Eric Chiu wrote: > Has anyone set up a cluster using the CD from Spector's book > "Building Linux Clusters" (O'Reilly)?
I sat in an airport line for an hour once with a woman who knew Spector. So she asked me what I thought of the book, and you guys know me well enough to know how good of a spin I put on my answer: "Well, he seemed to have a clue about high availability, but the Beowulf section was pretty crappy." It turns out that he's well aware of that, and was egged on to write a "complete" book by the editors. Ah well, it's a shame no matter how it happened. greg From erayo at cs.bilkent.edu.tr Sun Apr 14 21:08:39 2002 From: erayo at cs.bilkent.edu.tr (Eray Ozkural) Date: Tue Mar 16 01:02:21 2010 Subject: G4's for scientific computing In-Reply-To: <200204150157.VAA22514@alpha0.bioch.virginia.edu> References: <200204150157.VAA22514@alpha0.bioch.virginia.edu> Message-ID: <200204150708.40331.erayo@cs.bilkent.edu.tr> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Monday 15 April 2002 04:57, William R. Pearson wrote: > > I might have agreed with the statement that one must have hand-tuned > Altivec code which pretty much excludes general purpose scientific > computing 4 months ago, but our experience has been very positive - > our programs are not specialized signal processing programs, but, in > retrospect, it was easy to get very dramatic speed up. > I imagine fake vector processing would only work for certain types of problems. That's not SIMD by any measure. Don't you really need multiple data streams for general-purpose HPC? Regards, - -- Eray Ozkural (exa) Comp. Sci.
Dept., Bilkent University, Ankara www: http://www.cs.bilkent.edu.tr/~erayo Malfunction: http://mp3.com/ariza GPG public key fingerprint: 360C 852F 88B0 A745 F31B EA0F 7C07 AE16 874D 539C -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.0.6 (GNU/Linux) Comment: For info see http://www.gnupg.org iD8DBQE8ulJHfAeuFodNU5wRAn7oAJ9n7oJC3nfBv29EBYOpypOjBGLUmACcCmPO kY+ZBvrh1ev4iQnFMkQV4IA= =YeV6 -----END PGP SIGNATURE----- From ssy at prg.cpe.ku.ac.th Sun Apr 14 23:09:30 2002 From: ssy at prg.cpe.ku.ac.th (Somsak Sriprayoonsakul) Date: Tue Mar 16 01:02:21 2010 Subject: Need many C/C++ MPI programs Message-ID: <000d01c1e444$2177e860$0100a8c0@yggdrasil> Hello, I need to test my cluster by running many, many MPI parallel programs. Is there an MPI program archive or something similar from which I could download program source? It would be better if the programs were written in C/C++ so I could tune their performance and see how they run on my cluster. Thanks Somsak From markus at markus-fischer.de Mon Apr 15 02:40:49 2002 From: markus at markus-fischer.de (Markus Fischer) Date: Tue Mar 16 01:02:21 2010 Subject: very high bandwidth, low latency manner? References: Message-ID: <3CBAA021.DB753C6F@markus-fischer.de> Steffen Persvold wrote: > > Now we have price comparisons for the interconnects (SCI,Myrinet and > Quadrics). What about performance ? Does anyone have NAS/PMB numbers for > ~144 node Myrinet/Quadrics clusters (I can provide some numbers from a 132 > node Athlon 760MP based SCI cluster, and I guess also a 81 node PIII ServerWorks > HE-SL based cluster). Yes, please. I would like to see some numbers. I have run tests with SCI for a nonlinear diffusion algorithm on a 96-node cluster with the 32-bit/33 MHz interface. I thought that the poor scalability was due to the older interface, so I switched to an SCI system with 32 nodes and the 64-bit/66 MHz interface. Still, the speedup values were behaving like a dog with more than 8 nodes.
In particular, the startup time reached minutes, which is probably due to the exporting and mapping of memory. Yes, the MPI library used was ScaMPI. Thus, I think the (marketing) numbers you provide below are not relevant except for applying for more VC. Even worse, we noticed that the SCI ring structure has an impact on the communication pattern/performance of other applications. This means we only got the same execution time if the other nodes were idle or were not running communication-intensive applications. How will you determine the performance of the algorithm you just invented in such a case? We then used a 512-node cluster with Myrinet2000. The algorithm scaled very well up to 512 nodes. Markus > > Regards, > -- > Steffen Persvold | Scalable Linux Systems | Try out the world's best > mailto:sp@scali.com | http://www.scali.com | performing MPI implementation: > Tel: (+47) 2262 8950 | Olaf Helsets vei 6 | - ScaMPI 1.13.8 - > Fax: (+47) 2262 8951 | N0621 Oslo, NORWAY | >320MBytes/s and <4uS latency > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jacobsgd21 at BrandonU.CA Mon Apr 15 09:04:42 2002 From: jacobsgd21 at BrandonU.CA (Geoffrey D. Jacobs) Date: Tue Mar 16 01:02:21 2010 Subject: David HM Spector, "Building Linux Clusters" Message-ID: <3CBAFA1A.9090802@brandonu.ca> A waste of ink and paper. This book has no depth, and the included software is incomplete. Look elsewhere for your reference needs. From tim at dolphinics.com Mon Apr 15 10:28:42 2002 From: tim at dolphinics.com (Tim Wilcox) Date: Tue Mar 16 01:02:21 2010 Subject: Need many C/C++ MPI programs References: <000d01c1e444$2177e860$0100a8c0@yggdrasil> Message-ID: <3CBB0DCA.3000908@dolphinics.com> Somsak Sriprayoonsakul wrote: >Hello, > I need to test my cluster by running many many MPI parallel >program.
Is there any MPI program archive or something which I could >download the program source? It would be better if the program are >written in C/C++ so I could tune its performance and see how is it going >in my cluster. > There are several benchmarks available with source; I commonly use these for testing machines. Try Linpack (HPL) at http://www.netlib.org/benchmark/hpl/; this is good for CPU performance. I also use PMB, http://www.pallas.com/e/products/pmb/download.htm; this is good for interconnect performance. Tim Wilcox > >Thanks >Somsak > > >_______________________________________________ >Beowulf mailing list, Beowulf@beowulf.org >To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > From SGaudet at turbotekcomputer.com Mon Apr 15 11:11:55 2002 From: SGaudet at turbotekcomputer.com (Steve Gaudet) Date: Tue Mar 16 01:02:21 2010 Subject: Parallel BLAST Message-ID: <3450CC8673CFD411A24700105A618BD6267DC4@911TURBO> > -----Original Message----- > From: William R. Pearson [mailto:wrp@alpha0.bioch.virginia.edu] > Sent: Sunday, April 14, 2002 10:32 PM > To: beowulf@beowulf.org > Subject: Parallel BLAST > > > > > Why is it that BLAST is not available for MPI/PVM? I would think > > clusters would be the perfect host for such an application. > > Is it there is no need because BLAST is already so fast and > > no one wants to break the database out onto node-resident disks? > > Or is it that BLAST is kept running on single processor or > shared memory > > machines BLAST so that the DB is always in memory ready to > roll without > > loading and doing the same for a cluster is not worth it > > because the same trick is difficult to do on a node given > the current > > way clusters are built? I assume the same is true for FASTA? > > I suspect that BLAST is not available for MPI/PVM because (1) it is > too fast, and (2) there is not much demand for it.
> > 95% of the time, BLAST is almost an in-memory grep (the other 5% of > the time it is working on the things it is looking for). Sequence > comparison is embarrassingly parallel, and very easily threaded. > Distributing the sequence databases and collecting results has more > overhead (there probably aren't many distributed grep programs > either). FASTA is 5 - 10X slower than BLAST, and Smith-Waterman is > another 5-20X slower than FASTA. Here, the communications overhead is > low, and distributed systems work OK for FASTA, and great for > Smith-Waterman (where the overhead fraction is very small). > > Of course, it is a lot easier to compile a threaded program, and just > run it, than it is to install and configure the MPI or PVM environment > and the programs to run in it. Bioinformatics software is often run > by computer savvy biologists, not high-performance computing folks, > and not having to install and configure PVM/MPI is a big advantage. > The NCBI probably does not make a PVM/MPI parallel BLAST because there > is very little demand for it, and it does not meet their computational > needs. -------------- There's also a commercial version from Turbogenomics. http://www.turbogenomics.com Offering:
1) Ready to go, plug-n-play solution for parallel BLAST
2) Expertise and 20+ years of experience in parallel computing
3) Dynamic database splitting feature to take advantage of computers that have less memory than the size of the database
4) Smart load balancing - achieve linear to superlinear speedup
5) No modification made to the NCBI BLAST algorithm to ensure identical results with the non-parallel version
6) Easy drop-in update whenever NCBI releases newer versions of their algorithm
7) Excellent support
8) 30-day money-back guarantee
Cheers, Steve Gaudet Linux Solutions Engineer ..... <(???)> =================================================================== | Turbotek Computer Corp. tel:603-666-3062 ext. 21 | | 8025 South Willow St.
fax:603-666-4519 | | Building 2, Unit 105 toll free:800-573-5393 | | Manchester, NH 03103 e-mail:sgaudet@turbotekcomputer.com | | web: http://www.turbotekcomputer.com | =================================================================== From Hakon.Bugge at scali.com Tue Apr 16 03:24:37 2002 From: Hakon.Bugge at scali.com (Håkon Bugge) Date: Tue Mar 16 01:02:21 2010 Subject: very high bandwidth, low latency manner? In-Reply-To: <3CBAA021.DB753C6F@markus-fischer.de> References: Message-ID: <5.1.0.14.0.20020416122156.05491530@62.70.89.10> Hi, I am sorry to hear that you were unable to achieve the expected performance on the SCI-based systems you mention. You raise a couple of issues, which I would like to address: 1) Performance. Performance transparency is always a goal. Nevertheless, sometimes an implementation will have a performance bug. The two organizations owning the systems you mention both have support agreements with Scali. I have checked the support requests, but cannot find any request where your incidents were reported. We find this strange if you truly were aiming at good performance. We are happy to look into your application and report our findings back to this newsgroup. 2) Startup time. You attribute the bad scalability to high startup time and mapping of memory. This is an interesting hypothesis, and it can easily be verified by using a switch when you start the program and measuring the difference between the elapsed time of the application and the time it uses after MPI_Init() has been called. However, the startup time measured on 64 nodes, two processors per node, where all processes have set up mappings to all other processes, is nn seconds. If this contributes to bad scalability, your application must have a very short runtime. 3) SCI ring structure. You state that in a multi-user, multi-process environment it is hard to get deterministic performance numbers. Indeed, that is true. True sharing of resources implies that.
Whether the resource is a file server, a memory controller, or a network component, you will probably always be subject to performance variations. Also, lack of page coloring will contribute to varying execution times, even for a sequential program. You further indicate that performance numbers reported by, e.g., the Pallas PMB benchmark can only be used for applying for more VC. I disagree for two reasons. First, you imply that venture capitalists are naive (and to some extent stupid). That is not my impression; rather the opposite. Second, such numbers are exactly what is needed to verify or refute your hypothesis that the SCI ring structure is sensitive to traffic generated by other applications. PMB's *multi* option is designed to investigate exactly the problem you mention: run, e.g., MPI_Alltoall() on N/2 of the machine, then measure how performance is affected when the other N/2 of the machine is also running Alltoall(). This is why we are interested in performance numbers comparable to those from SCI-based systems. It strikes me as strange that no Pallas PMB benchmark results have ever been published for a reasonably sized system based on alternative interconnect technologies. To quote Lord Kelvin: "If you haven't measured it, you don't know what you're talking about". As a bottom line, I think initiatives to compare cluster interconnect performance should be welcomed, rather than scrutinized and dismissed as "only usable to apply for more VC". H At 11:40 AM 4/15/02 +0200, Markus Fischer wrote: >Steffen Persvold wrote: > > > > Now we have price comparisons for the interconnects (SCI,Myrinet and > > Quadrics). What about performance ? Does anyone have NAS/PMB numbers for > > ~144 node Myrinet/Quadrics clusters (I can provide some numbers from a 132 > > node Athlon 760MP based SCI cluster, and I guess also a 81 node PIII > ServerWorks > > HE-SL based cluster). > >yes, please. > >I would like to get/see some numbers.
>I have run tests with SCI for a non linear diffusion algorithm on a 96 node >cluster with 32/33 interface. I thought that the poor >scalability was due to the older interface, so I switched to >a SCI system with 32 nodes and 64/66 interface. > >Still, the speedup values were behaving like a dog with more than 8 nodes. > >Especially, the startup time will reach minutes which is probably due to >the exporting and mapping of memory. > >Yes, the MPI library used was Scampi. Thus, I think the >(marketing) numbers you provide >below are not relevant except for applying for more VC. > >Even worse, we noticed, that the SCI ring structure has an impact on the >communication pattern/performance of other applications. >This means we only got the same execution time if other nodes were >I idle or did not have communication intensive applications. >How will you determine the performance of the algorithm you just invented >in such a case ? > >We then used a 512 node cluster with Myrinet2000. The algorithm scaled >very fine up to 512 nodes. 
> >Markus > > > > > Regards, > > -- > > Steffen Persvold | Scalable Linux Systems | Try out the world's best > > mailto:sp@scali.com | http://www.scali.com | performing MPI > implementation: > > Tel: (+47) 2262 8950 | Olaf Helsets vei 6 | - ScaMPI 1.13.8 - > > Fax: (+47) 2262 8951 | N0621 Oslo, NORWAY | >320MBytes/s and <4uS > latency > > > > _______________________________________________ > > Beowulf mailing list, Beowulf@beowulf.org > > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf >_______________________________________________ >Beowulf mailing list, Beowulf@beowulf.org >To change your subscription (digest mode or unsubscribe) visit >http://www.beowulf.org/mailman/listinfo/beowulf -- Håkon Bugge; VP Product Development; Scali AS; mailto:hob@scali.no; http://www.scali.com; fax: +47 22 62 89 51; Voice: +47 22 62 89 50; Cellular (Europe+US): +47 924 84 514; Visiting Addr: Olaf Helsets vei 6, Bogerud, N-0621 Oslo, Norway; Mail Addr: Scali AS, Postboks 150, Oppsal, N-0619 Oslo, Norway; From Hakon.Bugge at scali.com Tue Apr 16 03:33:55 2002 From: Hakon.Bugge at scali.com (Håkon Bugge) Date: Tue Mar 16 01:02:21 2010 Subject: very high bandwidth, low latency manner? Message-ID: <5.1.0.14.0.20020416123123.054aac70@62.70.89.10> I'm sorry. I forgot to fill in the startup time. It's 14.5 seconds for 128 processes on 64 nodes, when all processes have mapped the remote memory of all the other 127 processes. H From rauch at inf.ethz.ch Tue Apr 16 06:15:38 2002 From: rauch at inf.ethz.ch (Felix Rauch) Date: Tue Mar 16 01:02:21 2010 Subject: Memory benchmark (was Re: very high bandwidth, low latency manner? (i860)) In-Reply-To: <20020412200852.B2381@wumpus.skymv.com> Message-ID: On Fri, 12 Apr 2002, Greg Lindahl wrote: > The myrinet perftest and STREAM memory benchmark are your friends.
If you need more detailed information about the performance of your memory system than STREAM offers, then you might want to look at the ECT benchmarks (developed by colleagues of mine): Extended Copy Transfer Characterization http://www.cs.inf.ethz.ch/CoPs/ECT/ - Felix -- Felix Rauch | Email: rauch@inf.ethz.ch Institute for Computer Systems | Homepage: http://www.cs.inf.ethz.ch/~rauch/ ETH Zentrum / RZ H18 | Phone: ++41 1 632 7489 CH - 8092 Zuerich / Switzerland | Fax: ++41 1 632 1307 From jayne at sphynx.clara.co.uk Tue Apr 16 07:34:17 2002 From: jayne at sphynx.clara.co.uk (Jayne Heger) Date: Tue Mar 16 01:02:21 2010 Subject: what architecture was MPI and PVM 1st designed for? Message-ID: Hi, Could anyone tell me what computer architecture MPI and PVM were first designed for/written on? Thanks, Jayne Heger From eugen at leitl.org Tue Apr 16 06:40:49 2002 From: eugen at leitl.org (Eugen Leitl) Date: Tue Mar 16 01:02:21 2010 Subject: OpenMosix Message-ID: http://newsvac.newsforge.com/article.pl?sid=02/04/13/055227 Saturday April 13, 2002 - [ 05:00 AM GMT ] Bruce Knox writes "Tel Aviv (April 11, 2002) - Dr. Moshe Bar recently announced the creation of openMosix, a new OpenSource project. The project has quickly attracted a team of volunteer developers from around the globe and is off to a very fast start. openMosix is an extension of the Linux kernel. For thousands of users, MOSIX has been a reliable, fast and cost-efficient clustering platform with users in life sciences, finance, industry, high-tech, research and government environments. The goal of openMosix is to give these users continued support and an up-to-date, fully GPLv2 OpenSource platform. openMosix began as the last verifiable GPL version of MOSIX. All openMosix extensions are under the full GPLv2 license, the GNU General Public License (GPL) Version 2. The openMosix Copyright is held by Moshe Bar. openMosix is a Linux kernel extension for single-system image clustering.
openMosix is perfectly scalable and adaptive. Once you have installed openMosix, the nodes in the cluster start talking to one another and the cluster adapts itself to the workload. There is no need to program applications specifically for openMosix. Since all openMosix extensions are inside the kernel, every application automatically and transparently benefits from the distributed computing concept of openMosix. The cluster behaves much like an SMP, but this solution scales to well over a thousand nodes, which can themselves be SMPs. OpenSource is more than just free access to software source code. "The basic idea behind open source is very simple: When programmers can read, redistribute, and modify the source code for a piece of software, the software evolves. People improve it, people adapt it, people fix bugs. And this can happen at a speed that, if one is used to the slow pace of conventional software development, seems astonishing." (the Open Source Initiative) Moshe Bar is an Operating Systems researcher, writer of the Byte Magazine column Serving With Linux, author of numerous Linux books, and a frequent contributor to the Linux tree. Moshe lectures for universities, corporations, and international organizations. He holds a Bachelor's degree in mathematics, and an M.S. and a Ph.D. in computer science. Moshe runs moshebar.com with a mailing list of over 20,000 members, is Chief Technical Officer of Qlusters, Inc., and is the Project Manager for openMosix. Moshe was born in Israel, grew up in a kibbutz, and now lives in Tel Aviv. The development team of volunteers is truly international. The early team members reside in Chile, Spain, Italy, Norway, Germany, Israel, France and the United States. In addition, mailing list queries have come from Canada, Pakistan, Oman, Estonia, Finland, India, South Africa, Switzerland, Tonga, and Shanghai, China. Projects using openMosix already include astrophysics, medical research, and university laboratories.
The openMosix project is hosted on SourceForge.net, which provides collaborative development web tools for the project. Downloads, documentation, and additional information are available from www.openmosix.org. MOSIX is a very highly regarded, high performance, low cost, flexible, and scalable Cluster Computing System for Linux. MOSIX was a GPL OpenSource project until late 2001. MOSIX, operational since 1983, integrates independent computers into a cluster, providing the user with what appears to be a single-machine Linux environment. Both the MOSIX Copyright and the MOSIX Trademark are owned by Professor Amnon Barak. Amnon Barak is a Professor of Computer Science and the Director of the Distributed Computing Laboratory in the Institute of Computer Science at the Hebrew University of Jerusalem, currently on sabbatical leave for one year. openMosix is Copyright © 2002 by Moshe Bar. Linux is Copyright © 2002 by Linus Torvalds. Mosix is Copyright © 2002 by Amnon Barak. openMosix is licensed under the GNU General Public License (GPL) Version 2, June 1991 as published by the Free Software Foundation. All logos and trademarks are the property of their respective owners. Copyright © 2002 by Moshe Bar" From eugen at leitl.org Tue Apr 16 06:45:27 2002 From: eugen at leitl.org (Eugen Leitl) Date: Tue Mar 16 01:02:21 2010 Subject: GBit Ethernet over Cu evaluation Message-ID: http://www.cs.uni.edu/~gray/gig-over-copper/ Gigabit Over Copper Evaluation DRAFT Prepared by Anthony Betz and Paul Gray April 2, 2002 University of Northern Iowa Department of Computer Science Cedar Falls, IA 50614 Given the relatively low cost, backwards compatibility, and wide availability of gigabit-over-copper network interfaces, the migration to commodity gigabit networks has begun.
Copper-based gigabit solutions are now providing an alternative to the often more expensive fiber-based network solutions that are typically integrated in high-performance environments such as today's tightly-coupled cluster systems. But how do these cards compare with their fiber-based counterparts? Are the Linux-based drivers ready for prime time? The intent of this paper is to provide an extensive comparison of the various gigabit-over-copper network interface cards available. Since performance depends on numerous factors such as the bus architecture and the network protocol being used, these are the two main subjects of our investigation. Our bandwidth benchmarks look at sustained throughput using TCP. While other communication protocols are available, indeed preferred, for high-performance computing, TCP-based benchmarks provide an immediate insight into the expected performance of the cards. With PCI-X appearing on more and more motherboards, alongside the multitude of systems with more traditional 32-bit PCI subsystems, numerous cards are available for today's 64-bit and 32-bit computer systems. The 64-bit cards tested were as follows: Syskonnect SK9821, Syskonnect SK9D21, Asante Giganix, Ark Soho-GA2000T, 3Com 3c996BT and Intel's E1000 XT. The 32-bit cards were the Ark Soho-GA2500T and the D-Link DGE500T. The cards were compared in alternate bus configurations and at varied maximum transmission unit (MTU) sizes, including jumbo frames. Results were gathered using Netpipe 2.4, which reports the peak sustained throughput as well as the transfer rate for varying packet sizes. Note: All cards were tested at MTU sizes of 1500, 3000, 4000, and 6000 bytes. The drivers for the cards were not modified. Cards based upon the dp83820 chipset were limited to an MTU of 6000 due to driver defaults. All other cards were tested up to an MTU of 9000.
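Part of the case for jumbo frames can be estimated from header overhead alone. The Python sketch below is not from the report: it assumes standard Ethernet framing (8-byte preamble and SFD, 14-byte MAC header, 4-byte FCS, 12-byte inter-frame gap) and 40 bytes of IPv4 plus TCP headers with no options, and computes the best-case TCP payload rate at each MTU the authors tested. It is an upper bound, not a prediction of Netpipe numbers.

```python
# Best-case TCP goodput per MTU, counting only header/framing overhead.
# ASSUMPTIONS (not from the report): IPv4 + TCP headers without options,
# standard Ethernet framing, full-size frames sent back to back.

ETHERNET_OVERHEAD = 8 + 14 + 4 + 12  # preamble+SFD, MAC header, FCS, inter-frame gap (bytes)
TCP_IP_HEADERS = 20 + 20             # IPv4 header + TCP header (bytes)


def payload_efficiency(mtu: int) -> float:
    """Fraction of wire time that carries TCP payload for full-size frames."""
    payload = mtu - TCP_IP_HEADERS
    on_wire = mtu + ETHERNET_OVERHEAD
    return payload / on_wire


def max_goodput_mbps(mtu: int, line_rate_mbps: float = 1000.0) -> float:
    """Upper bound on the TCP payload rate for a given MTU and line rate."""
    return line_rate_mbps * payload_efficiency(mtu)


if __name__ == "__main__":
    for mtu in (1500, 3000, 4000, 6000, 9000):
        print(f"MTU {mtu:5d}: {payload_efficiency(mtu):.2%} of wire time, "
              f"<= {max_goodput_mbps(mtu):.0f} Mbit/s")
```

With these assumptions the bound is about 949 Mbit/s at an MTU of 1500 and about 991 Mbit/s at 9000, so header savings alone are worth only a few percent; if measured gains from jumbo frames are larger, reduced per-packet CPU and interrupt load is the more likely explanation.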
[results too voluminous to post]

From rgb at phy.duke.edu Tue Apr 16 07:18:51 2002 From: rgb at phy.duke.edu (Robert G. Brown) Date: Tue Mar 16 01:02:21 2010 Subject: what architecture was MPI and PVM 1st designed for? In-Reply-To: Message-ID: On Tue, 16 Apr 2002, Jayne Heger wrote: > > Hi, > > Could anyone tell me what computer architecture MPI and PVM were first > designed for/written on? See http://www.epm.ornl.gov/pvm/ and look under "documentation" for "PVM and MPI: A comparison of features". Read the "Background" section. It is one of many sources, but it is terse, probably adequate, and close to "horse's mouth" accurate for PVM (but "Project Overview" is also there and IS horse's mouth:-), and of course I'm sure that the primary MPI sites have similar historical stuff linked. In a very terse nutshell, PVM was written for the kitchen sink (whatever you happened to have handy and networked). MPI was written by a consortium of vendors and users to provide a common API for large, expensive massively parallel computers. As I understand it this wasn't really the vendors' idea -- they would've been happy to continue providing only their proprietary interfaces -- but the government finally put its foot down as it learned just how much money it was spending, first on the iron, then on porting code to run on the iron, and then on NEW iron and RE-porting their ported code to run on the NEW iron, etc. Moore's law demanding that they rebuy everything every few years or actually lose ground, of course... rgb > Thanks, > > Jayne Heger > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C.
27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu

From Sebastien.Cabaniols at Compaq.com Tue Apr 16 03:57:18 2002 From: Sebastien.Cabaniols at Compaq.com (Cabaniols, Sebastien) Date: Tue Mar 16 01:02:21 2010 Subject: decreasing #define HZ in the linux kernel for CPU/memory bound apps ? Message-ID: <11EB52F86530894F98FFB1E21F9972547EF918@aeoexc01.emea.cpqcorp.net> Hi beowulfs! Would it be interesting to decrease the #define HZ in the linux kernel for CPU/memory-bound computational farms? (I just posted the same question to lkml) I mean we very often have only one running process eating 99% of the CPU, but we (in fact I) don't know if we lose time doing context switches .... Has anyone experimented with that? Thanks in advance

From rgb at phy.duke.edu Tue Apr 16 08:29:03 2002 From: rgb at phy.duke.edu (Robert G. Brown) Date: Tue Mar 16 01:02:21 2010 Subject: decreasing #define HZ in the linux kernel for CPU/memory bound apps ? In-Reply-To: <11EB52F86530894F98FFB1E21F9972547EF918@aeoexc01.emea.cpqcorp.net> Message-ID: On Tue, 16 Apr 2002, Cabaniols, Sebastien wrote: > Hi beowulfs! > > Would it be interesting to decrease the #define HZ in the linux kernel > for CPU/memory-bound computational farms? > (I just posted the same question to lkml) > > I mean we very often have only one running process eating 99% of > the CPU, but we (in fact I) don't know if we lose time doing context > switches .... > > Has anyone experimented with that? > > Thanks in advance This was discussed a long time ago on kernel lists. IIRC (and it was a LONG time ago -- years -- so don't shoot me if I don't) the consensus was that Linus was comfortable keeping HZ where it provided very good interactive response time FIRST (primary design criterion) and efficiency for long-running tasks SECOND (secondary design criterion), so no, they weren't considering retuning anything anytime soon.
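To put the original question in numbers: each timer tick costs roughly a fixed amount of CPU time, so the share of a CPU spent servicing ticks grows linearly with HZ. The sketch below assumes a per-tick cost of about 1 microsecond (the order of magnitude Mark Hahn gives in this thread for a null system call); that figure is an assumption, not a measurement, and the 1 GHz clock is only an example.

```python
# Rough cost of the periodic timer interrupt as a function of HZ.
# ASSUMPTION: each tick (interrupt entry/exit plus scheduler check) costs ~1 microsecond.


def tick_overhead_fraction(hz: int, cost_per_tick_s: float = 1e-6) -> float:
    """Fraction of one CPU consumed by timer ticks."""
    return hz * cost_per_tick_s


def cycles_between_ticks(clock_hz: float, hz: int) -> float:
    """CPU cycles available between two consecutive timer ticks."""
    return clock_hz / hz


if __name__ == "__main__":
    for hz in (100, 1024):
        print(f"HZ={hz:5d}: {tick_overhead_fraction(hz):.3%} of a CPU, "
              f"{cycles_between_ticks(1e9, hz):,.0f} cycles per tick at 1 GHz")
```

Under this assumption even HZ=1024 costs only about a tenth of a percent of a CPU, which is in line with rgb's view that a single numerical job already runs at well over 99% efficiency.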
Altering HZ isn't by any means guaranteed to improve task granularity (the scheduler already does a damn good job there and is hard to improve). Also, because there are a LOT of things that use HZ, written by many people, some of whom may well not have used it RIGHT, altering it may cause odd side effects or break things. I wouldn't recommend it unless you are willing to live without, or work pretty hard to fix, whatever breaks. The context switch part of the question is a bit easier. By strange chance, I'm at this moment running a copy of xmlsysd and wulfstat (my current-project cluster monitoring toolset) on my home cluster, where (to help Jayne this morning) I also cranked up pvm and the xep mandelbrot set application. So it is easy for me to test this. During a panel update (with all my nodes whonking away on doing mandelbrot set iterations) the context switch rate is negligible -- 12-16/second -- on true nodes (ones doing nothing but computing or twiddling their metaphorical thumbs). The rate hardly changes relative to the idle load when the system is doing a computation -- the scheduler is quite efficient. Interrupt rates on true nodes similarly remain very close to a baseline of a bit more than 100/second even when doing the computations, which are of course quite coarse grained, with only a bit of network traffic per updated strip per node and strip times on the order of seconds. So for a coarse-grained, CPU-intensive task running on dedicated nodes I doubt you'd see so much as 1% improvement from monkeying with pretty much any "simple" kernel tuning parameter -- I think that single numerical jobs run at well OVER 99% efficiency as is. Note that on workstation-nodes (ones running a GUI and this and that) the story is quite different, although still good.
For example, I'm running X, xmms (can't work without music, can we:-), the xep GUI, wulfstat (the monitoring client), galeon, and a dozen other xterms and small apps on my desktop; my sons are running X and screensavers on their systems downstairs (grrr, have to talk to them about that, or just plain disable that:-) and on THESE nodes the context switch rates range closer to 1300-1800/sec (the latter for those MP3's). Interrupt rates are still just over 100/sec -- this tends to vary only when doing some sort of very intensive I/O. Note that even mp3 decoding only takes a few percent of my desktop's CPU. However, beauteously enough, when I do an xep rubberband update, I still get SIMULTANEOUSLY flawlessly decoded mp3's (not so much as a bobble of the music stream) AND the maximum possible amount of CPU diverted to the mandelbrot strip computations and their display. I view this delightful responsiveness of linux as a very important feature. I've never hesitated to distribute CPU-intensive work around on linux workstation nodes with an adequate amount of memory because I'm totally confident that unless the application fills memory or involves a very latency-bounded (e.g. small packet network) I/O stream, the workstation user will notice, basically, "nothing" -- their interactive response will change by less than the 0.1 second threshold at which they would be ABLE to notice. The one place I can recall where altering system timings has made a noticeable difference in performance for certain classes of parallel tasks is Josip Loncaric's tcp retuning, and I believe that he worked quite hard at that for a long time to get good results. Even that has a price -- the tunings that he makes (again, IIRC, don't shoot me if I'm wrong Josip:-) aren't really appropriate for use on a WAN, as some of the things that slow TCP down are the ones that make it robust and reliable across the routing perils of the open internet.
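The context-switch and interrupt rates quoted above come from xmlsysd/wulfstat. On any Linux node, similar numbers can be approximated by sampling the cumulative `ctxt` and `intr` counters in /proc/stat twice and dividing by the elapsed time. The Python sketch below is a minimal stand-in for that measurement, not part of xmlsysd or wulfstat; the 5-second default interval is an arbitrary choice.

```python
import time


def read_counters(stat_text: str) -> dict:
    """Pull cumulative context-switch and interrupt totals out of /proc/stat text."""
    counters = {}
    for line in stat_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        if fields[0] == "ctxt":
            counters["ctxt"] = int(fields[1])
        elif fields[0] == "intr":
            # The first number after "intr" is the total over all IRQ lines.
            counters["intr"] = int(fields[1])
    return counters


def rates(before: dict, after: dict, elapsed_s: float) -> dict:
    """Per-second rates for every counter present in both samples."""
    return {key: (after[key] - before[key]) / elapsed_s
            for key in before if key in after}


def sample(interval_s: float = 5.0, path: str = "/proc/stat") -> dict:
    """Measure context switches/s and interrupts/s over one interval (Linux only)."""
    start = time.monotonic()
    with open(path) as f:
        first = read_counters(f.read())
    time.sleep(interval_s)
    with open(path) as f:
        second = read_counters(f.read())
    elapsed = time.monotonic() - start
    return rates(first, second, elapsed)
```

On a node, `print(sample())` prints a dictionary with the `ctxt` and `intr` rates; on a dedicated compute node the figures described above would correspond to a `ctxt` rate in the tens and an `intr` rate a little above 100.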
rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu

From leunen.d at fsagx.ac.be Tue Apr 16 08:39:25 2002 From: leunen.d at fsagx.ac.be (David Leunen) Date: Tue Mar 16 01:02:21 2010 Subject: Cannot find -lpvfs Message-ID: <3CBC45AD.4060600@fsagx.ac.be> Hi all, We've installed Scyld Beowulf 27bz-8 on our cluster, but we cannot get the mpich examples to link the .o files. Here is the error we get: /usr/bin/ld: cannot find -lpvfs This error is thrown on every attempt to link an mpi program... any idea? Have a good day. David

From hahn at physics.mcmaster.ca Tue Apr 16 10:35:54 2002 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Tue Mar 16 01:02:21 2010 Subject: decreasing #define HZ in the linux kernel for CPU/memory bound apps ? In-Reply-To: <11EB52F86530894F98FFB1E21F9972547EF918@aeoexc01.emea.cpqcorp.net> Message-ID: > Would it be interesting to decrease the #define HZ in the linux kernel > for CPU/memory-bound computational farms? I'm guessing you're unaware that compute-bound processes actually get multiple 10ms slices (200ms or so, as I recall, but I'm remembering a discussion from 2.3.x days. Ingo's new scheduler probably preserves this limit.) > I mean we very often have only one running process eating 99% of > the CPU, but we (in fact I) don't know if we lose time doing context > switches .... think of the numbers a bit: it's basically impossible to buy a <1 GHz processor today, so you're getting at O(100M) instrs/HZ. if you're cache-friendly, you'll probably have >1 instr/cycle, so scale the number appropriately. perhaps you're worried about cache pollution?
the kernel's footprint is fairly small, probably <4K or so for timer-irq-scheduler-nopreempt. since a null syscall is ~1 us or ~1000 instrs, and the work is about the same, I really don't think there's anything to worry about. there are people who run HZ=1024 or higher on ia32; I don't personally think they know what the heck they're doing, but they like it, and don't report any serious problems.

From lindahl at keyresearch.com Tue Apr 16 08:31:46 2002 From: lindahl at keyresearch.com (Greg Lindahl) Date: Tue Mar 16 01:02:21 2010 Subject: decreasing #define HZ in the linux kernel for CPU/memory bound apps ? In-Reply-To: <11EB52F86530894F98FFB1E21F9972547EF918@aeoexc01.emea.cpqcorp.net>; from Sebastien.Cabaniols@compaq.com on Tue, Apr 16, 2002 at 12:57:18PM +0200 References: <11EB52F86530894F98FFB1E21F9972547EF918@aeoexc01.emea.cpqcorp.net> Message-ID: <20020416083146.B2918@wumpus.attbi.com> On Tue, Apr 16, 2002 at 12:57:18PM +0200, Cabaniols, Sebastien wrote: > Would it be interesting to decrease the #define HZ in the linux kernel > for CPU/memory-bound computational farms? > (I just posted the same question to lkml) All pre-compiled user programs would then have the wrong HZ. So /bin/time wouldn't work anymore. As for "what HZ would be a good value?", Alpha has always used 1000, and it isn't a significant performance hit. But x86 started life on much slower machines, and now we're stuck with 100, unless you want to rebuild ALL your packages. I suspect IA64 uses 100 for compatibility reasons. I wonder how the x86 emulator on AlphaLinux got around this... hm... greg

From lindahl at keyresearch.com Tue Apr 16 08:39:27 2002 From: lindahl at keyresearch.com (Greg Lindahl) Date: Tue Mar 16 01:02:21 2010 Subject: very high bandwidth, low latency manner?
In-Reply-To: <5.1.0.14.0.20020416122156.05491530@62.70.89.10>; from Hakon.Bugge@scali.com on Tue, Apr 16, 2002 at 12:24:37PM +0200 References: <3CBAA021.DB753C6F@markus-fischer.de> <5.1.0.14.0.20020416122156.05491530@62.70.89.10> Message-ID: <20020416083927.C2918@wumpus.attbi.com> On Tue, Apr 16, 2002 at 12:24:37PM +0200, Håkon Bugge wrote: > Also, lack of page coloring will contribute to > different execution times, even for a sequential program. Andrea's kernel patches now have page coloring in them. The code has lived a tortured life, originally written by the Real World Computing guys, rewritten by me, rewritten by Jason Popadopoulos at UMd, and then by Andrea. 3 continents. > I disagree for two reasons; > first, you imply that venture capitalists are naive (and to some extent > stupid). That's what the local Silicon Valley VC tell me about VC. I guess non-Silicon Valley VC are smarter, then ;-) > It is strange to me that no Pallas PMB > benchmark results have ever been published for a reasonably sized system > based on alternative interconnect technologies. To quote Lord Kelvin: "If > you haven't measured it, you don't know what you're talking about". Maybe that's because other people are measuring their applications, and not yet another synthetic benchmark? All-to-all isn't interesting to me. I have plenty of bisection measurements, though, as that's how I debug Myrinet. Typical variations are around 2%, by the way. Lord Kelvin engaged in a 10-year flamewar in the Letters of the RAS against people who thought the Sun was powered by nuclear fusion. He believed that it was only 10 million years old, and was powered by gravitational collapse. His mistake was ignoring geological evidence because he didn't understand it. He probably wrote that quote during that flamewar. It didn't make him right.
greg

From raysonlogin at yahoo.com Tue Apr 16 12:39:39 2002 From: raysonlogin at yahoo.com (Rayson Ho) Date: Tue Mar 16 01:02:21 2010 Subject: again OpenPBS vs SGE In-Reply-To: <200204161902.XAA04166@nocserv.free.net> Message-ID: <20020416193939.95169.qmail@web11403.mail.yahoo.com> If you are looking for _free_ batch systems, you should choose SGE. --- Mikhail Kuzminsky wrote: > III. Some SGE minuses > 1) Do not support "multiclustering" I believe you can set up multiple "SGE_CELL"s to partition your cluster (I've never played with that before). Or you can use Globus, or another 3rd-party scheduler on top of SGE. > 2) The scheduling algorithms are restricted to only one > default (this is inconsistent w/Chris Black message, as > I understand) Are you talking about SGE 5.2.x? Chris Black must be talking about SGE 5.3, which has several nice advanced scheduler features: http://www.hardi.se/products/literature/sun_grid_engine.pdf > > IV. Some SGE pluses > > 1) Reliable work Ron Chen has been talking about the new shadow master on the SGE mailing list, which he said will improve fault tolerance, but I haven't heard anything more yet... > 2) Globus Grid is integrated (?? is it correct ?) correct. > 3) There is support of job migration > Also, you may want to look at job arrays, which are not available in PBS. (the other batch system which has job arrays is LSF) You can download the SGE source from: http://gridengine.sunsource.net Rayson

From josip at icase.edu Tue Apr 16 12:58:40 2002 From: josip at icase.edu (Josip Loncaric) Date: Tue Mar 16 01:02:21 2010 Subject: decreasing #define HZ in the linux kernel for CPU/memory bound apps ?
References: <11EB52F86530894F98FFB1E21F9972547EF918@aeoexc01.emea.cpqcorp.net> <20020416083146.B2918@wumpus.attbi.com> Message-ID: <3CBC8270.814F9387@icase.edu> Greg Lindahl wrote: > > As for "what HZ would be a good value?", Alpha has always used 1000, > and it isn't a significant performance hit. But x86 started life on > much slower machines, and now we're stuck with 100, unless you want to > rebuild ALL your packages. > > I suspect IA64 uses 100 for compatibility reasons. I wonder how the > x86 emulator on AlphaLinux got around this... hm... A minor correction: HZ=1024 on Alphas and on ia64 (elsewhere HZ=100). HZ=1024 helps, e.g. it prevents certain kinds of timer-resolved TCP stalls in kernel 2.2 on Alphas. However, recompiling user programs which were built with HZ=100 would be a pain... and one might uncover new problems with the i386 hardware, which has not been tested much with HZ=1024. Sincerely, Josip -- Dr. Josip Loncaric, Research Fellow mailto:josip@icase.edu ICASE, Mail Stop 132C PGP key at http://www.icase.edu./~josip/ NASA Langley Research Center mailto:j.loncaric@larc.nasa.gov Hampton, VA 23681-2199, USA Tel. +1 757 864-2192 Fax +1 757 864-6134

From ting at fai.fujitsu.com Tue Apr 16 15:08:59 2002 From: ting at fai.fujitsu.com (Ting) Date: Tue Mar 16 01:02:21 2010 Subject: Parallel BLAST - help In-Reply-To: <3450CC8673CFD411A24700105A618BD6267DC4@911TURBO> Message-ID: Hello, All, I have a three-node Beowulf cluster with an MPI environment up and running now, and I have downloaded the FASTA data from NCBI to the master node. I successfully wrote code to split the data, but unfortunately I could not get working code to collect the results back from the nodes to the host (master). :-( Can anyone suggest some working code, or a web site where I can find it? It would help me a lot. Thank you very much. Ting -----Original Message----- From: Steve Gaudet Sent: Monday, April 15, 2002 11:12 AM To: 'William R.
Pearson'; beowulf@beowulf.org Subject: RE: Parallel BLAST > -----Original Message----- > From: William R. Pearson > Sent: Sunday, April 14, 2002 10:32 PM > To: beowulf@beowulf.org > Subject: Parallel BLAST > > > > > Why is it that BLAST is not available for MPI/PVM? I would think > > clusters would be the perfect host for such an application. > > Is it that there is no need because BLAST is already so fast and > > no one wants to break the database out onto node-resident disks? > > Or is it that BLAST is kept running on single-processor or shared-memory > > machines so that the DB is always in memory, ready to roll without > > loading, and doing the same for a cluster is not worth it > > because the same trick is difficult to do on a node given the current > > way clusters are built? I assume the same is true for FASTA? > > I suspect that BLAST is not available for MPI/PVM because (1) it is > too fast, and (2) there is not much demand for it. > > 95% of the time, BLAST is almost an in-memory grep (the other 5% of > the time it is working on the things it is looking for). Sequence > comparison is embarrassingly parallel, and very easily threaded. > Distributing the sequence databases and collecting results has more > overhead (there probably aren't many distributed grep programs > either). FASTA is 5-10X slower than BLAST, and Smith-Waterman is > another 5-20X slower than FASTA. For these, the communications overhead is > low, and distributed systems work OK for FASTA, and great for > Smith-Waterman (where the overhead fraction is very small). > > Of course, it is a lot easier to compile a threaded program, and just > run it, than it is to install and configure the MPI or PVM environment > and the programs to run in it. Bioinformatics software is often run > by computer-savvy biologists, not high-performance computing folks, > and not having to install and configure PVM/MPI is a big advantage.
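The split-then-collect pattern Ting asks about, which Pearson calls embarrassingly parallel, reduces to three steps: scatter the database, search each slice independently, and gather the hits back to the master and merge them by score. The Python sketch below shows only that pattern on one machine. Threads stand in for cluster nodes, and `score` is a hypothetical toy k-mer count, not BLAST or FASTA scoring. On a real cluster the same steps would map onto an MPI scatter and gather (or PVM sends and receives) with the merge done on the master.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

K = 3  # k-mer length for the toy score (hypothetical choice)


def split_database(records, n_chunks):
    """Scatter step: deal records round-robin into n_chunks slices."""
    return [records[i::n_chunks] for i in range(n_chunks)]


def kmers(seq, k=K):
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def score(query, seq):
    """Toy similarity: number of shared k-mers (stand-in for a real aligner)."""
    return len(kmers(query) & kmers(seq))


def search_chunk(query, chunk):
    """Work done on one 'node': score every (name, sequence) record in its slice."""
    return [(score(query, seq), name) for name, seq in chunk]


def parallel_search(query, records, n_workers=3, top=5):
    chunks = split_database(records, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_results = list(pool.map(lambda chunk: search_chunk(query, chunk), chunks))
    # Gather step: merge every worker's hits and keep the best-scoring ones.
    merged = [hit for hits in partial_results for hit in hits]
    return heapq.nlargest(top, merged)


if __name__ == "__main__":
    db = [("seq1", "ACGTACGT"), ("seq2", "TTTTTTTT"), ("seq3", "ACGTTTTT")]
    print(parallel_search("ACGTAC", db, n_workers=2, top=2))  # [(4, 'seq1'), (2, 'seq3')]
```

Threads keep the example self-contained but give no real speed-up for pure-Python scoring; the point is the shape of the gather and merge, which stays the same when the workers are separate processes or nodes.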
> The NCBI probably does not make a PVM/MPI parallel BLAST because there > is very little demand for it, and it does not meet their computational > needs. -------------- There's also a commercial version from Turbogenomics. http://www.turbogenomics.com Offering: 1) Ready to go, plug-n-play solution for parallel BLAST 2) Expertise and 20+ years of experience in parallel computing 3) Dynamic database splitting feature to take advantage of computers that have less memory than the size of the database 4) Smart load balancing - achieve linear to superlinear speedup 5) No modification made to the NCBI BLAST algorithm to ensure identical results with the non-parallel version 6) Easy drop-in update whenever NCBI releases newer versions of their algorithm 7) Excellent support 8) 30-day money-back guarantee Cheers, Steve Gaudet Linux Solutions Engineer =================================================================== | Turbotek Computer Corp. tel:603-666-3062 ext. 21 | | 8025 South Willow St. fax:603-666-4519 | | Building 2, Unit 105 toll free:800-573-5393 | | Manchester, NH 03103 e-mail:sgaudet@turbotekcomputer.com | | web: http://www.turbotekcomputer.com | ===================================================================

From aby_sinha at yahoo.com Tue Apr 16 18:18:16 2002 From: aby_sinha at yahoo.com (Abhishek sinha) Date: Tue Mar 16 01:02:21 2010 Subject: Dual Xeon Clusters Message-ID: <3CBCCD58.1080104@yahoo.com> Hi list members, I am building a dual Xeon 4-node cluster. My understanding of hyperthreading leads me to the conclusion that the benefit depends largely on the code. Otherwise, in many cases, performance can become worse than before using hyperthreaded Xeon processors.
My question is: are there any benchmarks available for Xeon processors in hyperthreaded mode? Will the normal benchmarks that we use work on these systems, and would they give a fair picture of the power of the Xeon? If not, how else can I measure the performance of Xeon processors in a clustered environment? I am using 2.2 GHz Xeon processors on an E7500 chipset. Thanks in advance to all Abhishek

From ron_chen_123 at yahoo.com Tue Apr 16 20:58:31 2002 From: ron_chen_123 at yahoo.com (Ron Chen) Date: Tue Mar 16 01:02:21 2010 Subject: Data management on Beowulf Clusters? Message-ID: <20020417035831.90871.qmail@web14703.mail.yahoo.com> Hi, Is data management a real issue on Beowulf clusters? Does anyone have problems moving data from one node to another, or find rcp inadequate? I recently discovered that the Globus project has released the Globus Toolkit 2.0, which has some components for data grids. Here are some of their nice features that we may be able to take advantage of: 1) data I/O accounting. 2) no dependence on a shared filesystem. 3) better security -- GridFTP is integrated with KRB5. 4) better performance in data transfer. I am wondering whether anyone knows if we can take advantage of GridFTP and other components to solve data management problems on Beowulf clusters. Any experience is welcome! And lastly, Globus is an open-source, non-profit project. -Ron

From sp at scali.com Wed Apr 17 00:46:44 2002 From: sp at scali.com (Steffen Persvold) Date: Tue Mar 16 01:02:21 2010 Subject: Dual Xeon Clusters In-Reply-To: <3CBCCD58.1080104@yahoo.com> Message-ID: On Tue, 16 Apr 2002, Abhishek sinha wrote: > Hi list members, > > I am building a dual Xeon 4-node cluster. My understanding of > hyperthreading leads me to the conclusion that the benefit depends largely on the > code.
Otherwise, in many cases, performance can > become worse than before using hyperthreaded Xeon processors. My > question is: are there any benchmarks available for Xeon processors > in hyperthreaded mode? Will the normal benchmarks > that we use work on these systems, and would they give a fair picture of > the power of the Xeon? If not, how else can I measure the > performance of Xeon processors in a clustered environment? > > I am using 2.2 GHz Xeon processors on an E7500 chipset. > Hi, First of all you will have to use a 2.4.18 kernel with these E7500 motherboards. Second, if you take a look at the linux-kernel mailing list you will find a patch (originally developed by Ingo Molnar, enhanced a bit by me) that will do some IRQ balancing on Xeon chipsets (with the stock 2.4.18 kernel, i860 and E7500 chipsets are only able to handle interrupts on CPU0). I don't know if this patch has made it into the 2.4.19-pre kernels yet, but you can check them out too. Finally, I have some bad news about HT. I haven't been able to get it to work stably enough with 2.4.18 (haven't tested 2.4.19-pre). The thing is that in the beginning all works fine, but after a random amount of time things start to slow down. Suddenly you find 'top' reporting 50% system time, which is not normal. Turning off HT in the BIOS solves this. As a side note I can tell you that the PCI architecture on this chipset is _much_ better than on i860 and you can expect it to perform well with high speed interconnects (Myrinet, SCI, GBE).
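Whether interrupts really all land on CPU0, as described above for the stock 2.4.18 kernel, can be checked in /proc/interrupts, which has one column of counts per CPU. The Python sketch below sums those columns; it is an illustration, not part of the IRQ-balancing patch mentioned, and it skips rows (such as ERR on SMP systems) that do not carry one numeric count per CPU.

```python
def irq_totals_per_cpu(text: str) -> list:
    """Sum interrupt counts per CPU column from /proc/interrupts text."""
    lines = [line for line in text.splitlines() if line.strip()]
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    totals = [0] * ncpu
    for line in lines[1:]:
        fields = line.split()
        if not fields[0].endswith(":"):
            continue
        counts = fields[1:1 + ncpu]
        # Skip rows that do not have one numeric count per CPU.
        if len(counts) < ncpu or not all(c.isdigit() for c in counts):
            continue
        for cpu, count in enumerate(counts):
            totals[cpu] += int(count)
    return totals


def share_on_cpu0(totals: list) -> float:
    """Fraction of all counted interrupts that were handled by CPU0."""
    total = sum(totals)
    return totals[0] / total if total else 0.0
```

On a node, passing `open("/proc/interrupts").read()` to `irq_totals_per_cpu` and then calling `share_on_cpu0` on the result gives a value close to 1.0 when all interrupts are being delivered to CPU0, and noticeably lower once interrupts are being spread across CPUs.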
Regards, -- Steffen Persvold | Scalable Linux Systems | Try out the world's best mailto:sp@scali.com | http://www.scali.com | performing MPI implementation: Tel: (+47) 2262 8950 | Olaf Helsets vei 6 | - ScaMPI 1.13.8 - Fax: (+47) 2262 8951 | N0621 Oslo, NORWAY | >320MBytes/s and <4uS latency

From shewa at inel.gov Wed Apr 17 06:45:05 2002 From: shewa at inel.gov (Andrew Shewmaker) Date: Tue Mar 16 01:02:21 2010 Subject: again OpenPBS vs SGE References: <20020416193939.95169.qmail@web11403.mail.yahoo.com> Message-ID: <3CBD7C61.4010108@inel.gov> Rayson Ho wrote: >You can download the SGE source from: > >http://gridengine.sunsource.net > I believe you must join the project (at least as an observer), or else you won't see the source in the download section. Andrew

From timm at fnal.gov Wed Apr 17 06:48:05 2002 From: timm at fnal.gov (Steven Timm) Date: Tue Mar 16 01:02:21 2010 Subject: Dual Xeon Clusters In-Reply-To: <3CBCCD58.1080104@yahoo.com> Message-ID: > Hi list members, > > I am building a dual Xeon 4-node cluster. My understanding of > hyperthreading leads me to the conclusion that the benefit depends largely on the > code. Otherwise, in many cases, performance can > become worse than before using hyperthreaded Xeon processors. My > question is: are there any benchmarks available for Xeon processors > in hyperthreaded mode? Will the normal benchmarks > that we use work on these systems, and would they give a fair picture of > the power of the Xeon? If not, how else can I measure the > performance of Xeon processors in a clustered environment? > > I am using 2.2 GHz Xeon processors on an E7500 chipset. > > Thanks in advance to all > > Abhishek The only way we found to do it in hyperthreading mode under Linux was just to keep on starting two instances of the process until we got one started on either 0 or 1 and the other on 2 or 3.
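Rather than restarting jobs until the scheduler happens to place them on different physical CPUs, a kernel that provides the sched_setaffinity system call (exposed in Python as `os.sched_setaffinity`) lets you pin each job explicitly. The sketch below assumes, as the message above implies, that logical CPUs 0 and 1 are the two hyperthreads of one package and 2 and 3 are those of the other. Real numbering varies, and on newer kernels it can be checked in /sys/devices/system/cpu/cpu*/topology/thread_siblings_list. `./myjob` is a hypothetical program name.

```python
import os

# ASSUMPTION (from the message above): logical CPUs {0, 1} share one physical
# package and {2, 3} share the other. Verify the real sibling layout first.
PACKAGES = [[0, 1], [2, 3]]


def spread_plan(n_jobs, packages=PACKAGES):
    """Pick one logical CPU per job, using every package before doubling up."""
    order = []
    for slot in range(max(len(p) for p in packages)):
        for pkg in packages:
            if slot < len(pkg):
                order.append(pkg[slot])
    if n_jobs > len(order):
        raise ValueError("more jobs than logical CPUs")
    return order[:n_jobs]


def run_pinned(cpu, argv):
    """Fork a child, pin it to `cpu`, and exec argv in it (Linux only)."""
    pid = os.fork()
    if pid == 0:
        try:
            os.sched_setaffinity(0, {cpu})
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # reached only if pinning or exec failed
    return pid
```

For example, `pids = [run_pinned(cpu, ["./myjob"]) for cpu in spread_plan(2)]` starts two copies on CPUs 0 and 2, one per physical package under the stated assumption, and `os.waitpid(pid, 0)` on each pid waits for them to finish.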
It would be interesting to see a comparison of SPEC rate benchmarks on the same machine with hyperthreading disabled (two processors, which is what we finally did) and with hyperthreading enabled. Steve Timm

From eugen at leitl.org Wed Apr 17 08:14:39 2002 From: eugen at leitl.org (Eugen Leitl) Date: Tue Mar 16 01:02:21 2010 Subject: /. US DOE gets a $24.5 Million Linux Supercomputer Message-ID: http://slashdot.org/articles/02/04/17/1324227.shtml?tid=162 An anonymous reader wrote in to say "Pacific Northwest National Laboratory (US DOE) signed a $24.5 million contract with HP for a Linux supercomputer. This will be one of the top ten fastest computers in the world. Some cool features: 8.3 Trillion Floating Point Operations per Second, 1.8 Terabytes of RAM, 170 Terabytes of disk (including a 53 TB SAN), and 1400 Intel McKinley and Madison Processors. Nice quote: 'Today's announcement shows how HP has worked to help accelerate the shift from proprietary platforms to open architectures, which provide increased scalability, speed and functionality at a lower cost,' said Rich DeMillo, vice president and chief technology officer at HP."

From robl at mcs.anl.gov Wed Apr 17 08:16:05 2002 From: robl at mcs.anl.gov (Robert Latham) Date: Tue Mar 16 01:02:21 2010 Subject: Cannot find -lpvfs In-Reply-To: <3CBC45AD.4060600@fsagx.ac.be> References: <3CBC45AD.4060600@fsagx.ac.be> Message-ID: <20020417151605.GG5243@mcs.anl.gov> On Tue, Apr 16, 2002 at 05:39:25PM +0200, David Leunen wrote: > We've installed Scyld Beowulf 27bz-8 on our cluster, but we cannot get > the mpich examples to link the .o files.
Here is the error we get: > > /usr/bin/ld: cannot find -lpvfs > > This error is thrown on every attempt to link an mpi program... > any idea? -lpvfs ... that's the PVFS library. Check whether you have the pvfs-devel rpm installed. If it *is* installed, your mpicc needs to specify where to find it in LDFLAGS. The Scyld guys are quite good at integrating all the software pieces, though, so I bet this is not the case :> ==rob -- Rob Latham A215 0178 EA2D B059 8CDF B29D F333 664A 4280 315B

From raysonlogin at yahoo.com Wed Apr 17 08:24:48 2002 From: raysonlogin at yahoo.com (Rayson Ho) Date: Tue Mar 16 01:02:21 2010 Subject: again OpenPBS vs SGE In-Reply-To: <3CBD7C61.4010108@inel.gov> Message-ID: <20020417152448.49905.qmail@web11406.mail.yahoo.com> I am an observer of the project, but I have never needed to log on to the server to download the source via cvs. http://gridengine.sunsource.net/servlets/ProjectSource But I think you need to log on to download the source code archives. Rayson --- Andrew Shewmaker wrote: > Rayson Ho wrote: > > >You can download the SGE source from: > > > >http://gridengine.sunsource.net > > > I believe you must join the project (at least as an observer), or else > you won't see the source > in the download section. > > Andrew

From hahn at physics.mcmaster.ca Wed Apr 17 10:14:15 2002 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Tue Mar 16 01:02:21 2010 Subject: Dual Xeon Clusters In-Reply-To: Message-ID: > The only way we found to do it in hyperthreading mode under Linux was > just to keep on starting two instances of the process until we got one > started on either 0 or 1 and the other on 2 or 3. It would be
It would be there are a number of cpu-affinity patches for 2.4 and 2.5 (I doubt anyone has bothered with 2.2) From leandro at ep.petrobras.com.br Wed Apr 17 10:17:57 2002 From: leandro at ep.petrobras.com.br (Leandro Tavares Carneiro) Date: Tue Mar 16 01:02:21 2010 Subject: HIGH MEM suport for up to 64GB Message-ID: <1019063877.1795.13.camel@linux60> Hi everyone, I am writing to ask to you all if anyone have tesed or used an machine with more than 4GB of RAM or paging in virtual memory on intel machines. He have an linux beowulf cluster and one of ours developers have asked us for how much memory an process can allocate to use. In the tests we have made, we cannot allocate much more than 3GB, using an dual PIII with 1GB of ram and 12Gb of swap area for testing. We can use 2 process alocating more or less 3Gb, but one process alone canot pass this test. We have using redhat linux 7.2, kernel 2.4.9-21, recompiled with High Mem suport. I have tested the same test aplication on an Itanium machine, with 1GB of ram and 16Gb of swap area, and they passed. The aplication can alocate more than 5GB of memory, using swap. In this machine, we are using turbolinux 7, with kernel version 2.4.4-010508-18smp. Thanks in advance for the help, Best regards, -- Leandro Tavares Carneiro Analista de Suporte EP-CORP/TIDT/INFI Telefone: 2534-1427 From leandro at ep.petrobras.com.br Wed Apr 17 10:30:54 2002 From: leandro at ep.petrobras.com.br (Leandro Tavares Carneiro) Date: Tue Mar 16 01:02:21 2010 Subject: HIGH MEM suport for up to 64GB Message-ID: <1019064654.1795.18.camel@linux60> Hi, Anyone have tesed or used an machine with more than 4GB of RAM or paging in virtual memory on intel machines? He have an linux beowulf cluster and one of ours developers have asked us for how much memory an process can allocate to use. In the tests we have made, we cannot allocate much more than 3GB, using an dual PIII with 1GB of ram and 12Gb of swap area for testing. 
We can run 2 processes allocating more or less 3GB, but one process alone cannot pass this test. We are using Red Hat Linux 7.2, kernel 2.4.9-21, recompiled with High Mem support. I have tested the same test application on an Itanium machine with 1GB of RAM and 16GB of swap area, and it passed: the application can allocate more than 5GB of memory, using swap. On this machine we are using TurboLinux 7, with kernel version 2.4.4-010508-18smp. If this works, we can improve our applications. Thanks in advance for the help, and sorry about my bad English. Best regards, -- Leandro Tavares Carneiro Support Analyst EP-CORP/TIDT/INFI Phone: 2534-1427 From sp at scali.com Wed Apr 17 11:44:10 2002 From: sp at scali.com (Steffen Persvold) Date: Tue Mar 16 01:02:21 2010 Subject: HIGH MEM suport for up to 64GB In-Reply-To: <1019064654.1795.18.camel@linux60> Message-ID: On 17 Apr 2002, Leandro Tavares Carneiro wrote: > Hi, > > Has anyone tested or used a machine with more than 4GB of RAM, or of paged > virtual memory, on Intel hardware? > We have a Linux Beowulf cluster, and one of our developers has asked > us how much memory a single process can allocate. In the tests we > have made, we cannot allocate much more than 3GB, using a dual PIII > with 1GB of RAM and 12GB of swap area for testing. > We can run 2 processes allocating more or less 3GB, but one process alone > cannot pass this test. > We are using Red Hat Linux 7.2, kernel 2.4.9-21, recompiled with High > Mem support. > I have tested the same test application on an Itanium machine with 1GB > of RAM and 16GB of swap area, and it passed. The application can > allocate more than 5GB of memory, using swap. On this machine we are > using TurboLinux 7, with kernel version 2.4.4-010508-18smp. > If this works, we can improve our applications. > There is simply no way a 32-bit machine can address more than 4GB of memory in a single application, because its memory pointers are only 32 bits wide (2^32 bytes = 4GB).
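To make the address-space arithmetic in this reply concrete, here is a small worked sketch. Python is an arbitrary choice for illustration (nothing in the thread uses it), and the helper names are hypothetical. It assumes "GB" means 2^30 bytes, takes the 1GB default kernel reservation and the trimmed 512MB variant from this reply, and adds one piece of background not stated in the message: the 64GB HIGHMEM option relies on PAE, which widens *physical* addresses to 36 bits (2^36 bytes = 64GB) while each process's virtual pointers stay 32 bits.

```python
import struct

GiB = 2**30  # assumption: "GB" in the message means 2**30 bytes


def virtual_address_space(pointer_bits):
    """Bytes reachable through a pointer of the given width."""
    return 2**pointer_bits


def user_space(pointer_bits, kernel_reserved):
    """Virtual address space left to one process after the kernel's reservation."""
    return virtual_address_space(pointer_bits) - kernel_reserved


if __name__ == "__main__":
    # 32-bit pointers: 4 GiB of virtual address space per process, however much RAM/swap exists
    print(virtual_address_space(32) // GiB)  # 4
    # Default split described in the reply: 1 GiB kernel leaves 3 GiB for the application
    print(user_space(32, 1 * GiB) / GiB)  # 3.0
    # Kernel area trimmed to 512 MiB leaves 3.5 GiB for the application
    print(user_space(32, GiB // 2) / GiB)  # 3.5
    # Physical memory reachable with 36-bit PAE addresses (background, not from the message)
    print(2**36 // GiB)  # 64
    # Pointer width of the interpreter running this script (32 or 64)
    print(struct.calcsize("P") * 8, "bit pointers")
```

The sketch shows why the two limits differ: the 64GB option raises how much physical memory the kernel can manage across all processes, but any single 32-bit process still sees at most 2^32 bytes of virtual address space, minus the kernel's share.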
The reason you can only address 3GB on Linux is that normally 1GB of the virtual address space is reserved for the kernel (this can be trimmed down to 512MB, which gives you 3.5GB accessible from user space). Sure, you can have several applications each using 3GB if you have the memory for it (either swap or real), provided that you use the 64GB High Memory option in Linux. Huge memory requirements are one of the reasons people choose a 64-bit platform (ppc, sparc, s390, alpha and ia64). Regards, -- Steffen Persvold | Scalable Linux Systems | Try out the world's best mailto:sp@scali.com | http://www.scali.com | performing MPI implementation: Tel: (+47) 2262 8950 |