From amir at wira1.cs.usm.my Thu Mar 1 02:32:33 2001 From: amir at wira1.cs.usm.my (A Hamzah Jaafar) Date: Thu, 1 Mar 2001 10:32:33 +0000 (MALAYSIA) Subject: Environmental Modelling Message-ID: Hi! Just wondering. Does any one of you have any pointers to work in environmental modelling on the cluster? p/s:- If none, this could be an area for masters research, right? Any suggestions? Thanks :) -amir- Research Officer Parallel and Distributed Computing Centre School of Computer Science USM, Malaysia From Sebastien.Cabaniols at compaq.com Thu Mar 1 00:22:57 2001 From: Sebastien.Cabaniols at compaq.com (Cabaniols, Sebastien) Date: Thu, 1 Mar 2001 09:22:57 +0100 Subject: Matlab & Beowulf Message-ID: <1FF17ADDAC64D0119A6E0000F830C9EA04B3CCA2@aeoexc1.aeo.cpqcorp.net> -----Original Message----- From: Ken [mailto:lowther at att.net] Sent: Wednesday, 28 February 2001 22:48 To: Cosmik Debris Cc: beowulf at beowulf.org Subject: Re: Matlab & Beowulf Cosmik Debris wrote: > > Yes we use it extensively. We run matlab on a 16 node dual processor system > which uses Mosix for load balancing. Works a treat for us. There are a few > issues with the new Java front end not migrating. > A few days back someone brought up a parallelized version of Octave. Might want to check it out. Search Google with "octave" + "mpi". Try also Scilab Parallel -- Ken Lowther Youngstown, Ohio http://www.atmsite.org _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From gunnar at linuxkonsult.com Thu Mar 1 01:20:03 2001 From: gunnar at linuxkonsult.com (Gunnar Lindholm) Date: Thu, 1 Mar 2001 10:20:03 +0100 Subject: Matlab & Beowulf In-Reply-To: <1FF17ADDAC64D0119A6E0000F830C9EA04B3CCA2@aeoexc1.aeo.cpqcorp.net> References: <1FF17ADDAC64D0119A6E0000F830C9EA04B3CCA2@aeoexc1.aeo.cpqcorp.net> Message-ID: <0103011022082K.00289@gunnar> > >A few days back someone brought up a parallelized version of > > Octave. > >Might want to check it out. > >Search Google with "octave" + "mpi". > > > >Try also Scilab Parallel Has anybody here been running Scilab Parallel and would like to make some comments on how well it works? Has anybody looked at the (upcoming) 2.6 release of Scilab? Gunnar. From yocum at linuxcare.com Thu Mar 1 06:46:11 2001 From: yocum at linuxcare.com (Dan Yocum) Date: Thu, 01 Mar 2001 08:46:11 -0600 Subject: Questions and Sanity Check References: Message-ID: <3A9E60B3.1B3DD1@linuxcare.com> Daniel Ridge wrote: > > On Tue, 27 Feb 2001, Keith Underwood wrote: > > > I would use larger hard drives. The incremental cost from 10GB to 30GB > > should be pretty small and you may one day appreciate that space if you > > use something like PVFS. I would also consider a gigabit uplink to the > > head node if you are going to use Scyld. It drastically improved our > > cluster booting time to have a faster link to the head. > > > > Keith Since I haven't built/booted a Scyld cluster yet, and have only seen Don talk about it at Fermi, please excuse my potentially naive comments. > For people who are spending a lot of time booting their Scyld slave nodes > -- I would suggest trimming the library list. > > This is the list of shared libraries which the nodes cache for improved > runtime migration performance. These libraries are transferred over to > the nodes at node boot time. Hm. 
Wouldn't it be better (i.e., more efficient) to cache these libs on a small, dedicated partition on the worker node (provided you have a disk available, of course) and simply check that they're up-to-date each time you boot and only update them when they change, say, via rsync? Cheers, Dan -- Dan Yocum, Sr. Linux Consultant Linuxcare, Inc. 630.697.8066 tel yocum at linuxcare.com, http://www.linuxcare.com Linuxcare. Putting open source to work. From yocum at linuxcare.com Thu Mar 1 07:42:48 2001 From: yocum at linuxcare.com (Dan Yocum) Date: Thu, 01 Mar 2001 09:42:48 -0600 Subject: Cluster via USB References: <3A93EEE6.C20A6454@brr.de> Message-ID: <3A9E6DF8.D9381D58@linuxcare.com> Sven, Check the linux-ha archives from early October last year for some discussions between Alan Robertson and Eric Ayers on providing communication over USB. The goal was to provide a serial heartbeat over PPP on USB and if I recall correctly, Eric was successful. Of course, with a heartbeat, bandwidth and speed aren't critical, but it's a starting place. The archives are here: http://community.tummy.com/pipermail/linux-ha-dev/ Hope that helps, Dan Sven Hiller wrote: > > Hi, > > did somebody tried to link some nodes together via USB? > The Pros are cheep (standard equipment), probably sufficient fast for > some > applications (depends on... -;) and easy to install without to open the > computer > (e.g., in the case of warranty or leasing contract). -- Dan Yocum, Sr. Linux Consultant Linuxcare, Inc. 630.697.8066 tel yocum at linuxcare.com, http://www.linuxcare.com Linuxcare. Putting open source to work. From yocum at linuxcare.com Thu Mar 1 08:23:57 2001 From: yocum at linuxcare.com (Dan Yocum) Date: Thu, 01 Mar 2001 10:23:57 -0600 Subject: Athlon mobo vendor recommendations? [was: Re: Athlon vs Pentium III] References: <3A92AA91.C6E6871F@lnxi.com> Message-ID: <3A9E779D.2E10EC74@linuxcare.com> Cameron Harr wrote: > > I agree with David in general. Asus positioned their A7V as the "best" > Athlon motherboard, but it is not OK, this is somewhat off topic, but since this is by far the best forum for asking about hardware vendors, seeing as that everyone here plays with lots and lots of hardware from lots and lots of OEM's, I'll ask anyway. Feel free to respond to me off the list if you wish. I'm about to purchase an Athlon 1.2GHz with the 266MHz FSB which is *finally* hitting the market and I'm wondering what mobo vendors have people had good experience with. Mainly I'm concerned with reliability, but if people have info on performance, I'd take that, too. So, who's had good/bad experiences with mobo's from the following vendors? Biostar Gigabyte FIC Asus Am I missing anyone? Thanks, Dan -- Dan Yocum, Sr. Linux Consultant Linuxcare, Inc. 630.697.8066 tel yocum at linuxcare.com, http://www.linuxcare.com Linuxcare. Putting open source to work. From epaulson at students.wisc.edu Thu Mar 1 09:40:07 2001 From: epaulson at students.wisc.edu (Erik Paulson) Date: 01 Mar 2001 11:40:07 -0600 Subject: Sun Grid Software In-Reply-To: Message-ID: <200103011740.LAA10892@mail5.doit.wisc.edu> On 28 Feb 2001 12:11:35 -0500, John Marquart wrote: > > Excuse me for being very naive on this subject matter, but how does Sun's > offering differ from other "grid" software such as globus > (http://www.globus.org/toolkit/), or CCAT > (http://www.extreme.indiana.edu/ccat/index.html) ? > > > Are there other grid configuration available? > How do these compare? 
> It's just namespace collision; Sun's GridEngine has nothing to do with the Grid work of Globus, NCSA, NPACI, IPG, Grid Forum, and everyone else. Grid Engine is much more like Condor, LSF and others. It'd be possible for them to use the Grid services from Globus - we're doing that with Condor, for example... -Erik > -j > > > > John "Jamie" Marquart | This message posted 100% MS free. > Digital Library SysAdmin | Work: 812-856-5174 Pager: 812-334-6018 > Indiana University Libraries | ICQ: 1131494 D'net Team: 6265 > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From fabian at ahuizotl.uam.mx Thu Mar 1 09:47:32 2001 From: fabian at ahuizotl.uam.mx (Peña Arellano Fabián Erasmo) Date: Thu, 1 Mar 2001 11:47:32 -0600 (CST) Subject: MPICH configuration with Absoft's compilers, gcc and g++. Message-ID: Hello, I am new to this list and I would like to know how to configure the MPICH software to work in a cluster of computers using Linux-Mandrake 7.2. I would like to use the Absoft fortran compilers and GNU's gcc and g++ that come with Linux. How should I configure MPICH? Is the following correct? ./configure -prefix=/usr/local/mpich-1.2.1-pgi --with-device=ch_p4 --with-arch=LINUX -cc=gcc -cflags="-O2" -c++=g++ -c++flags="-O2" -fc=/usr/absoft/bin/f77 -fflags="-N109" -f90=/usr/absoft/bin/f90 -f90flags= -fortnames=CAPS --without-devdebug -mpe -file_system=nfs+ufs
-prefix=/usr/local/mpich-1.2.1-pgi   directory to install the software.
--with-device=ch_p4   appropriate device for Linux clusters.
--with-arch=LINUX   OS used.
-cc=gcc   GNU's C compiler.
-cflags="-O2"   Options for the C compiler. What does the "-O2" option do?
-c++=g++   GNU's C++ compiler.
-c++flags="-O2"   Options for the C++ compiler. What does the "-O2" option do?
-fc=/usr/absoft/bin/f77   Full path to Absoft's f77 compiler.
-fflags="-N109"   For folding all symbolic names to upper case. What for?
-f90=/usr/absoft/bin/f90   Full path to Absoft's f90 compiler.
-f90flags=   Options for the f90 compiler. Is it correct to have it empty?
-fortnames=CAPS   What is this for? I didn't find this in the manual for installation.
--without-devdebug   Since I am not an MPI implementor.
-mpe   Extensions for logging and X graphics.
-file_system=nfs+ufs   What's ufs?
Thanks in advance for reading this e-mail. Fabian. From carlos at nernet.unex.es Thu Mar 1 09:48:45 2001 From: carlos at nernet.unex.es (Carlos J. García Orellana) Date: Thu, 1 Mar 2001 18:48:45 +0100 Subject: Scyld and PVM Message-ID: <019c01c0a277$dcfd2740$7c12319e@unex.es> Hello, Does anybody have a Scyld Beowulf running PVM? How do you do it? Carlos J. Garcia Orellana. Universidad de Extremadura carlos at nernet.unex.es From newt at scyld.com Thu Mar 1 09:54:28 2001 From: newt at scyld.com (Daniel Ridge) Date: Thu, 1 Mar 2001 12:54:28 -0500 (EST) Subject: Questions and Sanity Check In-Reply-To: <3A9E60B3.1B3DD1@linuxcare.com> Message-ID: On Thu, 1 Mar 2001, Dan Yocum wrote: > Daniel Ridge wrote: > Since I haven't built/booted a Scyld cluster yet, and have only seen Don > talk about it at Fermi, please excuse my potentially naive comments. > > > > For people who are spending a lot of time booting their Scyld slave nodes > > -- I would suggest trimming the library list. > > > > This is the list of shared libraries which the nodes cache for improved > > runtime migration performance. 
These libraries are transferred over to > > the nodes at node boot time. > > > Hm. Wouldn't it be better (i.e., more efficient) to cache these libs on > a small, dedicated partition on the worker node (provided you have a > disk available, of course) and simply check that they're up-to-date each > time you boot and only update them when they change, say, via rsync? Possibly. We're working on making available versions of our software that simultaneously host multiple pid spaces from different frontends. In this situation, you could wind up needing 1 magic partition per frontend -- as each master could have its own set of shared libraries. Also, I think Amdahl's law kicks in and tells us that the potential speedup is small in most cases (with respect to my trimming comment above) and that there might be other areas that are worth more attention in lowering boot times. On my VMware slave nodes, it costs me .5 seconds to transfer my libraries but still takes me the better part of a minute to get the damn BIOS out of the way. Regards, Dan Ridge Scyld Computing Corporation From yocum at linuxcare.com Thu Mar 1 11:47:28 2001 From: yocum at linuxcare.com (Dan Yocum) Date: Thu, 01 Mar 2001 13:47:28 -0600 Subject: Questions and Sanity Check References: Message-ID: <3A9EA750.CBF05560@linuxcare.com> Daniel Ridge wrote: > > On Thu, 1 Mar 2001, Dan Yocum wrote: > > > Daniel Ridge wrote: > > > Since I haven't built/booted a Scyld cluster yet, and have only seen Don > > talk about it at Fermi, please excuse my potentially naive comments. > > > > > > > For people who are spending a lot of time booting their Scyld slave nodes > > > -- I would suggest trimming the library list. > > > > > > This is the list of shared libraries which the nodes cache for improved > > > runtime migration performance. These libraries are transferred over to > > > the nodes at node boot time. > > > > > > Hm. Wouldn't it be better (i.e., more efficient) to cache these libs on > > a small, dedicated partition on the worker node (provided you have a > > disk available, of course) and simply check that they're up-to-date each > > time you boot and only update them when they change, say, via rsync? > > Possibly. We're working on making available versions of our software that > simultaneously host multiple pid spaces from different frontends. In this > situation, you could wind up needing 1 magic partition per frontend -- as > each master could have its own set of shared libraries. > > Also, I think Amdahl's law kicks in and tells us that the potential > speedup is small in most cases (with respect to my trimming comment > above) and that there might be other areas that are worth more attention > in lowering boot times. On my VMware slave nodes, it costs me .5 seconds You've got a "cluster" on a single machine running multiple versions of VMware, right? So, the transfer of the libs would be understandably faster on a virtual interface - it's not like you're sending them via a real NIC. Hold it. How big are the shared libs? If they're tiny, then yeah, ferget it. No big deal transferring them over (I don't know how big your libs are). What I'm concerned about is transferring 40MB, or more, to hundreds of nodes, hundreds of times. Then there would be a definite increase in bootup time to have big libs on the individual nodes. Unless you multicast the libs out to the worker nodes... ;-) > to transfer my libraries but still takes me the better part of a minute to > get the damn BIOS out of the way. Well, yeah, there is that. 
Have you tried running Beowulf2 on machines with Linux BIOS? Now that'd be cool to see - a Beowulf cluster come up in 3 seconds. :) Cheers, Dan -- Dan Yocum, Sr. Linux Consultant Linuxcare, Inc. 630.697.8066 tel yocum at linuxcare.com, http://www.linuxcare.com Linuxcare. Putting open source to work. From zarquon at zarq.dhs.org Thu Mar 1 17:01:20 2001 From: zarquon at zarq.dhs.org (R C) Date: Thu, 1 Mar 2001 20:01:20 -0500 Subject: Channel bonding / HP switch question Message-ID: <20010301200120.B848@zarq.dhs.org> Hello, I have searched the list archives for an answer to my problem, and had no luck, so I decided to ask directly. I'm working on a small scale beowulf cluster, and am trying to get channel bonding functional, and am running into problems with the switch. The problem seems to be that the HP switch we are using, a Procurve 4000M, does not support duplicate MAC addresses, even on separate VLANs. It does support several trunking techniques (Cisco Fast EtherChannel, Source Address Trunking, and Source Address / Destination Address trunking). This of course means no channel bonding as is, and the other trunking techniques above limit the peak throughput between individual nodes (they do allow more bandwidth between multiple nodes, but all traffic between a specific pair uses the same link). So my question is, how difficult would it be to change channel bonding to function with multiple MAC addresses? Has anyone looked into this before? I realize this vastly increases the configuration required for channel bonding (MAC address/IP/Interface mapping). Thanks, Robert Cicconetti From clpoh at pl.jaring.my Thu Mar 1 17:13:36 2001 From: clpoh at pl.jaring.my (clpoh) Date: Fri, 02 Mar 2001 09:13:36 +0800 Subject: question Message-ID: <3A9EF3BF.E22B8078@pl.jaring.my> To everyone in the Beowulf Mailing List, I am a college student from Malaysia. I am researching the Beowulf Project for my Honours Project. Below I have a series of questions; if you have the time, please answer them and send them back to clpoh at pl.jaring.my. Your help is very much appreciated. Thank you. From, Poh Chean Leng clpoh at pl.jaring.my Interview Questions
1. How long have you been involved in research and development of clustering (parallel processing) technology?
2. Have you been involved in any clustering project? If you have, please briefly explain the project (for example hardware and software configuration).
3. How would you define the meaning of supercomputer?
4. What do you think about the conventional way of implementing a supercomputer? Please state the main barrier to implementing a supercomputer in a small organisation.
5. Do you know about the Beowulf Project? If so, please briefly explain what you understand about the project. If not, please refer to the background studies of the project attached to the questions.
6. Do you think that the Beowulf Project has had any impact on the latest clustering technology? Please briefly explain your opinion.
7. Can you state the pros and cons of the Beowulf Project?
8. In your opinion, what are the main factors behind the success of the Beowulf Project?
From kragen at pobox.com Thu Mar 1 23:22:28 2001 From: kragen at pobox.com (kragen at pobox.com) Date: Fri, 2 Mar 2001 02:22:28 -0500 (EST) Subject: Unisys Message-ID: <200103020722.CAA15209@kirk.dnaco.net> Eugene Leitl writes: > . . . but there is no reason why distributed memory could not be > emulated by hardware on an efficient message-passing infrastructure. There's a difference between "could not be" and "is not". 
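As background to the channel bonding question a few messages above: with the stock Linux bonding driver of this era, a bonded interface is typically brought up roughly as sketched below, and by default every enslaved NIC takes on the bond's single MAC address, which is exactly what the ProCurve objects to. The module alias, interface names and addresses here are only illustrative assumptions.

    # /etc/modules.conf entry, assuming a kernel built with the bonding driver
    alias bond0 bonding

    # load the driver, configure the virtual interface, then enslave both NICs;
    # ifenslave copies bond0's MAC address onto eth0 and eth1
    /sbin/modprobe bonding
    /sbin/ifconfig bond0 192.168.1.10 netmask 255.255.255.0 up
    /sbin/ifenslave bond0 eth0 eth1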
From kragen at pobox.com Thu Mar 1 23:22:44 2001 From: kragen at pobox.com (kragen at pobox.com) Date: Fri, 2 Mar 2001 02:22:44 -0500 (EST) Subject: QNX Message-ID: <200103020722.CAA15280@kirk.dnaco.net> Alan Grimes writes: > I've recieved some intel that indicates that not only is the QNX > operating system vastly superior to linux in almost every way, it also > directly supports computational clusters! =) Good. Go use QNX and stop posting here. Feel free to come back if you think you're ready to post things related to Beowulfs on the Beowulf mailing list. From makmorbi at hotmail.com Fri Mar 2 08:27:31 2001 From: makmorbi at hotmail.com (mehul kochar) Date: Fri, 02 Mar 2001 08:27:31 Subject: 8 node cluster help! Message-ID: Hello Guys, I have an 8 node cluster and am running Redhat Linux 6.2. I have installed the HPLinpack benchmark and am running it. What kind of performance should I expect? The machines are Pentium III 550 MHz. I am getting a performance of 0.79 GHz. That seems less. What can be the reasons? What should I do to improve my performance. I would really appreciate it. Thanks Mehul _________________________________________________________________________ Get Your Private, Free E-mail from MSN Hotmail at http://www.hotmail.com. From Eugene.Leitl at lrz.uni-muenchen.de Fri Mar 2 01:28:30 2001 From: Eugene.Leitl at lrz.uni-muenchen.de (Eugene Leitl) Date: Fri, 2 Mar 2001 10:28:30 +0100 (MET) Subject: Unisys In-Reply-To: <200103020722.CAA15209@kirk.dnaco.net> Message-ID: On Fri, 2 Mar 2001 kragen at pobox.com wrote: > Eugene Leitl writes: > > . . . but there is no reason why distributed memory could not be > > emulated by hardware on an efficient message-passing infrastructure. > > There's a difference between "could not be" and "is not". Yes, so hardware designers should take heed. HyperTransport could use some distributed memory emulation. From admin at madtimes.com Fri Mar 2 06:22:17 2001 From: admin at madtimes.com (Timm Murray) Date: Fri, 2 Mar 2001 09:22:17 -0500 Subject: Cluster via USB Message-ID: <002501c0a335$63041520$0353aad0@classified> The new standard looks good (512 Gb/s, IIRC), but current USB just isn't fast enough for a lot of things (like you said, it all depeneds). However, the new standard may give a better price/performance ratio then gigabit ethernet or Myrinet, depending on how well IP-over-USB scales. On a somewhat related note, what about FireWire? Sven Hiller wrote on 2/28/01 7:26 am: >Hi, > >did somebody tried to link >some nodes together via >USB? The Pros are cheep >(standard equipment), >probably sufficient fast for >some >applications (depends on... >-;) and easy to install >without to open the >computer >(e.g., in the case of warranty >or leasing contract). > >Any comments are very >welcome. > >Sven > >-- >----------------------------- >------ >Dr. 
Sven Hiller >Turbine Aerodynamics/CFD >Rolls-Royce Deutschland Ltd >& Co KG >Eschenweg 11 >D-15827 >Dahlewitz/Germany >Tel: +49-33708-6-1142 >Fax: +49-33708-6-3292 >e-mail: >sven.hiller at rolls-royce.com >----------------------------- >------ > > > > >_____________________________ >__________________ Beowulf >mailing list, >Beowulf at beowulf.org >To change your subscription >(digest mode or >unsubscribe) visit >http://www.beowulf.org/mai >lman/listinfo/beowulf Timm Murray ----------- Great spirits have allways encountered violent opposition from mediocre minds --Albert Einstein From becker at scyld.com Fri Mar 2 06:57:13 2001 From: becker at scyld.com (Donald Becker) Date: Fri, 2 Mar 2001 09:57:13 -0500 (EST) Subject: Questions and Sanity Check In-Reply-To: <3A9EA750.CBF05560@linuxcare.com> Message-ID: On Thu, 1 Mar 2001, Dan Yocum wrote: > Daniel Ridge wrote: > > On Thu, 1 Mar 2001, Dan Yocum wrote: > > > Daniel Ridge wrote: > > > > For people who are spending a lot of time booting their Scyld slave nodes > > > > -- I would suggest trimming the library list. > > > > > > > > This is the list of shared libraries which the nodes cache for improved > > > > runtime migration performance. These libraries are transferred over to > > > > the nodes at node boot time. > > > > > > Hm. Wouldn't it be better (i.e., more efficient) to cache these libs on > > > a small, dedicated partition... .. > > Also, I think Amdahl's law kicks in and tells us that the potential > > speedup is small in most cases (with respect to my trimming comment > > above) and that there might be other areas that are worth more attention > > in lowering boot times. On my VMware slave nodes, it costs me .5 seconds > > Hold it. How big are the shared libs? If they're tiny, then yeah, > ferget it. No big deal tranferring them over... The cached libraries on the slave nodes are 10-40MB uncompressed. That's on the order of 1 second of Fast Ethernet time to transfer the compressed version. The boot time isn't a significant issue. A project that's on the "to do" list but not yet scheduled(*) is to dynamically adjust the shared library list. The Scyld Beowulf system could be booted with just a few cached elements on the slaves, with frequently referenced libraries slowly added to the cached list. The existing caching technique isn't limited to libraries. A subtle aspect of the current ld.so design is that there is very little difference between a library and an executable. Full programs, say a frequently-run 10MB simulation engine, could be cached on the slave nodes without changing the code. It's a larger step extending that concept to a persistent disk-based cache. We want to avoid that for philosophical reason: unless done carefully, it reintroduces the risk of version skew, and there is a slippery slope returning to the old full-node-install model. (*) Yes, that's a hint to anyone looking for a project. > > to transfer my libraries but still takes me the better part of a minute to > > get the damn BIOS out of the way. > > Well, yeah, there is that. Have you tried running Beowulf2 on machines > with Linux BIOS? Now that'd be cool to see - a Beowulf cluster come up > in 3 seconds. :) Ron Minnich uses Scyld Beowulf with his LinuxBIOS work. He was demoing the resulting "instant boot" clusters at SC2000 and the Extreme Linux Developers Forum last week. Some tuning must be done to reach a 3 second boot time -- some device drivers have needless delays and IDE disks might take long time to respond after a reset. 
Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 410 Severn Ave. Suite 210 Second Generation Beowulf Clusters Annapolis MD 21403 410-990-9993 From becker at scyld.com Fri Mar 2 07:07:26 2001 From: becker at scyld.com (Donald Becker) Date: Fri, 2 Mar 2001 10:07:26 -0500 (EST) Subject: Cluster via USB In-Reply-To: <002501c0a335$63041520$0353aad0@classified> Message-ID: On Fri, 2 Mar 2001, Timm Murray wrote: > The new standard looks good (512 Gb/s, IIRC), 480Mb/sec, with larger block sizes so the overhead isn't as bad as with the current version. Expect to see the first chipset implementations no sooner than a year from now, with the hardware showing up before then being developer prototypes. > but current USB just isn't fast enough for a lot of things... Yes, it's way too slow. The effective transfer rate is 6-10Mb/s, with 7Mb/s the typical real-life result. That's slower than old Ethernet, which was too slow for typical applications in 1994. > However, the new standard may give a better price/performance > ratio than gigabit ethernet or Myrinet, depending on how well > IP-over-USB scales. > > On a somewhat related note, what about FireWire? Both USB 2.0 and Firewire have the same problem: the availability of switches to build a large system. Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 410 Severn Ave. Suite 210 Second Generation Beowulf Clusters Annapolis MD 21403 410-990-9993 From nfuhriman at lnxi.com Fri Mar 2 07:09:04 2001 From: nfuhriman at lnxi.com (nate fuhriman) Date: Fri, 02 Mar 2001 08:09:04 -0700 Subject: [Fwd: 8 node cluster help!] Message-ID: <3A9FB790.7020605@lnxi.com> Two things to look at. First, how large a chunk of data did you try to process? The first question in this FAQ http://www.netlib.org/benchmark/hpl/faqs.html talks about setting the chunk too large, hence swapping and lower numbers. Second, the default 6.2 kernel does have some problems. You might want to try upgrading to a newer kernel. Other than that I would need more information about the setup of your cluster to be of any help. (i.e. hard drive type and size. Amount of memory and type. Network connection etc......) Nate nfuhriman at lnxi.com Linux NetworX -------- Original Message -------- Subject: 8 node cluster help! Date: Fri, 02 Mar 2001 08:27:31 From: "mehul kochar" To: beowulf at beowulf.org Hello Guys, I have an 8 node cluster and am running Redhat Linux 6.2. I have installed the HPLinpack benchmark and am running it. What kind of performance should I expect? The machines are Pentium III 550 MHz. I am getting a performance of 0.79 GHz. That seems low. What can be the reasons? What should I do to improve my performance. I would really appreciate it. Thanks Mehul _________________________________________________________________________ Get Your Private, Free E-mail from MSN Hotmail at http://www.hotmail.com. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jim at ks.uiuc.edu Fri Mar 2 07:59:30 2001 From: jim at ks.uiuc.edu (Jim Phillips) Date: Fri, 2 Mar 2001 09:59:30 -0600 (CST) Subject: Questions and Sanity Check In-Reply-To: Message-ID: On Fri, 2 Mar 2001, Donald Becker wrote: > > The cached libraries on the slave nodes are 10-40MB uncompressed. 
> That's on the order of 1 second of Fast Ethernet time to transfer the > compressed version. The boot time isn't a significant issue. > Of course, if you reboot 64 nodes at once and they all try to download from the front end node at the same time, then five seconds to download one node turns into five minutes to start up the entire cluster. However, I agree that this isn't really significant. People who really care can put gigabit on the master node and cut that down to 30 seconds. -Jim From becker at scyld.com Fri Mar 2 08:26:40 2001 From: becker at scyld.com (Donald Becker) Date: Fri, 2 Mar 2001 11:26:40 -0500 (EST) Subject: Questions and Sanity Check In-Reply-To: Message-ID: On Fri, 2 Mar 2001, Jim Phillips wrote: > On Fri, 2 Mar 2001, Donald Becker wrote: > > > > The cached libraries on the slave nodes are 10-40MB uncompressed. > > That's on the order of 1 second of Fast Ethernet time to transfer the > > compressed version. The boot time isn't a significant issue. > > Of course, if you reboot 64 nodes at once and they all try to download > from the front end node at the same time, then five seconds to download > one node turns into five minutes to start up the entire cluster. The "initial ramdisk" (a slight misnomer) is compressed, typically 3:1. It's transferred efficiently over TCP, not with slower NFS. Two or three minutes to boot isn't very long compared to how long some machines take to count 512MB of memory. Does anyone have the number for booting a 64 node SP/2? I've heard some pretty horrible numbers. > However, I agree that this isn't really significant. A dynamic library caching system is interesting mostly for run-time efficiency and to reduce system administration effort. The reduction in time to boot would mostly be useful for demos and benchmarks. Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 410 Severn Ave. Suite 210 Second Generation Beowulf Clusters Annapolis MD 21403 410-990-9993 From hahn at coffee.psychology.mcmaster.ca Fri Mar 2 08:41:54 2001 From: hahn at coffee.psychology.mcmaster.ca (Mark Hahn) Date: Fri, 2 Mar 2001 11:41:54 -0500 (EST) Subject: Cluster via USB In-Reply-To: Message-ID: > the current version. Expect to see the first chipset implementations no > sooner than a year from now, with the hardware showing up before then > being developer prototypes. PCI adapters, though, seem to be available RSN: http://www.orangemicro.com/pr010501.html From James.P.Lux at jpl.nasa.gov Fri Mar 2 10:02:41 2001 From: James.P.Lux at jpl.nasa.gov (Jim Lux) Date: Fri, 2 Mar 2001 10:02:41 -0800 Subject: Fw: Cluster via USB Message-ID: <001601c0a342$f9873290$61064f89@cerulean.jpl.nasa.gov> -----Original Message----- From: Jim Lux To: Mark Hahn Date: Friday, March 02, 2001 10:02 AM Subject: Re: Cluster via USB >The big problem with USB is that it is a "master/slave" kind of connection. >The design model is of a PC connected to an array of peripherals. In fact, >if you look at USB cables, the two ends of the cable are different. > >There are USB "hubs" but they are really more of "distribution points" where >they have one slave in and several masters out. > >And, of course, USB is pretty darn slow in real life (fast compared to >CDROMs, Connectix CCD cameras, mice, keyboards, and the like, but slow >compared to ethernet (even 10 Mbps)). > >I don't know that USB 2.0 would greatly change the "one master/many slave" >design model, inasmuch as it has to be USB 1.0 compatible. 
> >-----Original Message----- >From: Mark Hahn >To: beowulf at beowulf.org >Date: Friday, March 02, 2001 8:53 AM >Subject: Re: Cluster via USB > > >>> the current version. Expect to see the first chipset implementations no >>> sooner than a year from now, with the hardware showing up before then >>> being developer prototypes. >> >>PCI adapters, though, seem to be available RSN: >>http://www.orangemicro.com/pr010501.html >> >> >>_______________________________________________ >>Beowulf mailing list, Beowulf at beowulf.org >>To change your subscription (digest mode or unsubscribe) visit >http://www.beowulf.org/mailman/listinfo/beowulf >> > From agrajag at linuxpower.org Fri Mar 2 10:38:43 2001 From: agrajag at linuxpower.org (Jag) Date: Fri, 2 Mar 2001 10:38:43 -0800 Subject: Questions and Sanity Check In-Reply-To: ; from becker@scyld.com on Fri, Mar 02, 2001 at 09:57:13AM -0500 References: <3A9EA750.CBF05560@linuxcare.com> Message-ID: <20010302103843.B1547@kotako.analogself.com> On Fri, 02 Mar 2001, Donald Becker wrote: > A project that's on the "to do" list but not yet scheduled(*) is to > dynamically adjust the shared library list. > > The Scyld Beowulf system could be booted with just a few cached elements > on the slaves, with frequently referenced libraries slowly added to the > cached list. > > The existing caching technique isn't limited to libraries. A subtle > aspect of the current ld.so design is that there is very little > difference between a library and an executable. Full programs, say > a frequently-run 10MB simulation engine, could be cached on the slave > nodes without changing the code. > > It's a larger step extending that concept to a persistent disk-based > cache. We want to avoid that for philosophical reason: unless done > carefully, it reintroduces the risk of version skew, and there is a > slippery slope returning to the old full-node-install model. I'm not sure how the caching of actual programs would work, but the dynamic caching of libraries sounds like a really good idea, especially for the people running diskless nodes. This way they only need a very small ram disk for the library caching, and if they run out of space on the ramdisk, the caching system should helpfully be able to remove the less used libraries in favor of the new ones. However, I can see the full-node-install problem that you run into if the slave nodes have a local hd for caching, as they will have enough space that they'll probably never have to remove libraries to save space. A possible solution is to have them wipe the harddrives every boot, however that's still similar to a system where every time a slave node booted it dd'ed a full install image onto the hd. There is another problem that I'm not really sure is covered even by the current method. What happens when you update a cached library on the master node? Should you have to reboot the slave nodes to clear their cache, or run a program that simply recaches all the libraries on the slave nodes? or run a program that just updates the cache for the libraries you specify? (this last one can be dangerous if a sysadmin does rpm -Uhv libfoo.rpm but doesn't check to see if the rpm actually had more than one library in it) Jag From lindahl at conservativecomputer.com Fri Mar 2 11:38:25 2001 From: lindahl at conservativecomputer.com (Greg Lindahl) Date: Fri, 2 Mar 2001 14:38:25 -0500 Subject: Questions and Sanity Check In-Reply-To: <20010302103843.B1547@kotako.analogself.com>; from agrajag@linuxpower.org on Fri, Mar 02, 2001 at 10:38:43AM -0800 References: <3A9EA750.CBF05560@linuxcare.com> <20010302103843.B1547@kotako.analogself.com> Message-ID: <20010302143825.B1595@wumpus> On Fri, Mar 02, 2001 at 10:38:43AM -0800, Jag wrote: > I'm not sure how the caching of actual programs would work, but the > dynamic caching of libraries sounds like a really good idea, You can cache programs the same way as libraries. The trick is that you need to know when a program has changed, and so if you're too cheap to do an MD5 hash over it, you might go with just size and date, which might lead you astray. Legion caches binaries like that, but in the Legion case, you have to explicitly tell the system "I'm giving you a binary for AlphaLinux and x86Linux now". So Legion knows it hasn't changed without having to MD5 it. In Linux you can change the binary at any time without the system noticing. -- g From kragen at pobox.com Fri Mar 2 11:44:04 2001 From: kragen at pobox.com (kragen at pobox.com) Date: Fri, 2 Mar 2001 14:44:04 -0500 (EST) Subject: Channel bonding / HP switch question Message-ID: <200103021944.OAA23815@kirk.dnaco.net> R C writes: > The problem seems to be that the HP switch we are using, a Procurve 4000M, > does not support duplicate MAC addresses, even on separate VLANs. > . . . > So my question is, how difficult would it be to change channel bonding > to function with multiple MAC addresses? Has anyone looked into this > before? I realize this vastly increases the configuration required for > channel bonding (MAC address/IP/Interface mapping). I don't know, but I'm willing to bet it would cost you more than buying a second switch for the second channel. From kragen at pobox.com Fri Mar 2 11:44:11 2001 From: kragen at pobox.com (kragen at pobox.com) Date: Fri, 2 Mar 2001 14:44:11 -0500 (EST) Subject: 8 node cluster help! Message-ID: <200103021944.OAA23844@kirk.dnaco.net> "mehul kochar" writes: > I have an 8 node cluster and am running Redhat Linux 6.2. I have installed > the HPLinpack benchmark and am running it. What kind of performance should I > expect? The machines are Pentium III 550 MHz. I am getting a performance of > 0.79 GHz. That seems less. What can be the reasons? What should I do to > improve my performance. What do you mean by '0.79 GHz'? From hahn at coffee.psychology.mcmaster.ca Sat Mar 3 11:07:00 2001 From: hahn at coffee.psychology.mcmaster.ca (Mark Hahn) Date: Sat, 3 Mar 2001 14:07:00 -0500 (EST) Subject: Fw: Cluster via USB In-Reply-To: <001601c0a342$f9873290$61064f89@cerulean.jpl.nasa.gov> Message-ID: On Fri, 2 Mar 2001, Jim Lux wrote: >The big problem with USB is that it is a "master/slave" kind of connection. as I understand it, the direct-connect devices that exist now for ptp usb networking have a small chunk of memory that both masters can access. I don't see any reason that a USB2 device with a pair of significant-sized fifos wouldn't work fine and remain cheap. these days, quite large amounts of ram (32M or so) can be put on a single chip. 
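A minimal sketch of the kind of staleness check discussed in the caching thread above, comparing checksums before re-copying a cached library or binary to a slave node. The node name, file path and use of rsh/rcp are illustrative assumptions, not part of any existing Scyld tool; rsync's --checksum option performs a similar comparison internally.

    #!/bin/sh
    # push_if_changed: copy FILE to NODE only when the md5sums differ
    NODE=$1
    FILE=$2
    LOCAL=`md5sum "$FILE" | awk '{print $1}'`
    REMOTE=`rsh "$NODE" md5sum "$FILE" 2>/dev/null | awk '{print $1}'`
    if [ "$LOCAL" != "$REMOTE" ]; then
        rcp "$FILE" "$NODE:$FILE"
    fi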
From rjones at merl.com Sat Mar 3 11:08:41 2001 From: rjones at merl.com (Ray Jones) Date: 03 Mar 2001 14:08:41 -0500 Subject: Size of pipe from beowulf to world Message-ID: <1d3dcullfq.fsf@jitter.merl.com> Thanks for everyone's advice on our planned Beowulf (under the thread, "Questions and Sanity Check"). I have another question that came up yesterday, as we were hashing out more details: how large a pipe should there be from the Beowulf to the outside world? We were planning on having a single 100bT connection from the head node out to the world, and 100bT from the head to the switch, but realized that this could easily end up as a choke point. We could replace one of the 16-port modules in our planned switch with a 2-port Gbit ethernet module (one to the head, one to a fileserver, most likely), but if that's not necessary, we'd rather not lose the compute power. Thanks for any advice, Ray Jones MERL From becker at scyld.com Sat Mar 3 15:52:44 2001 From: becker at scyld.com (Donald Becker) Date: Sat, 3 Mar 2001 18:52:44 -0500 (EST) Subject: Fw: Cluster via USB In-Reply-To: Message-ID: On Sat, 3 Mar 2001, Mark Hahn wrote: > On Fri, 2 Mar 2001, Jim Lux wrote: > >The big problem with USB is that it is a "master/slave" kind of connection. > > as I understand it, the direct-connect devices that exist now for ptp usb > networking have a small chunk of memory that both masters can access. That's not the interface they present. They have a Tx stream and an Rx stream. A common way to use them is to have the driver packetize the data and make them look like a point-to-point Ethernet connection. > don't see any reason that a USB2 device with a pair of significant-sized fifos > wouldn't work fine and remain cheap. these days, quite large amounts of ram > (32M or so) can be put on a single chip. USB 2.0 will be high overhead and only about twice the speed of Fast Ethernet. We haven't seen USB 2.0 controllers, but USB v1 controllers use the PCI bus inefficiently, especially in "bandwidth reclamation" mode. I don't see the market force that will cause USB 2.0 switches to appear, let alone grow to support 500 (or even 16) devices. Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 410 Severn Ave. Suite 210 Second Generation Beowulf Clusters Annapolis MD 21403 410-990-9993 From mkochar at utsi.edu Thu Mar 1 09:05:29 2001 From: mkochar at utsi.edu (Mehul Kochar) Date: Thu, 1 Mar 2001 11:05:29 -0600 (CST) Subject: 8 node cluster help! Message-ID: Can somebody help me with this? I have an 8 node cluster and have run the Linpack benchmark. I want to know if somebody else has an 8 node cluster with Pentium III 550 MHz, what kind of results they got, and what the expectations were. What could have been the reasons for the cluster not meeting the expectations? I would really appreciate it. Thanks Mehul Kochar mkochar at utsi.edu From hack at nt-nv.com Sat Mar 3 14:11:01 2001 From: hack at nt-nv.com (hack at nt-nv.com) Date: Sat, 3 Mar 2001 18:11:01 -0400 Subject: Scyld and SCI Message-ID: <3AA16102.228B1C69@nt-nv.com> Does anyone know if Scyld will work with Dolphin's ICS SCI adapters? Is it necessary to purchase the Wulfkit to make this work? If so, how easy is it to integrate Scali's SSP with Scyld? Thanks, Brian. 
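As a rough, back-of-the-envelope sanity check on the Linpack questions above (hedged estimates only, not measured figures): a Pentium III retires on the order of one double-precision floating-point result per clock, so eight 550 MHz CPUs have a theoretical peak of roughly 8 x 0.55 GFLOPS = 4.4 GFLOPS. With a tuned BLAS (e.g. ATLAS) and a problem size large enough to fill memory without swapping, HPL over Fast Ethernet commonly delivers a substantial fraction of that peak, so 0.79 GFLOPS (about 18% of peak) does look low and points at an untuned BLAS or a problem size that is too small. The HPL FAQ's usual rule of thumb is to pick N so the matrix fills most of the aggregate RAM, i.e. 8*N*N bytes of roughly 80% of total memory, so N is about sqrt(0.8 * total_memory_in_bytes / 8).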
From squinlan at modulargenetics.com Sat Mar 3 11:03:51 2001 From: squinlan at modulargenetics.com (root) Date: Sat, 03 Mar 2001 14:03:51 -0500 Subject: 3ware Escalade problems during Scyld install Message-ID: <3AA14017.706308F@modulargenetics.com> Hopefully someone has already resolved this issue. When trying to install Scyld from CD, I enter expert mode to add the driver for the RAID controller, using the driver disk provided by 3ware, as usual. I have successfully installed RedHat 7 on two other machines with the 3ware controllers in this manner. When I go to add device, scsi, the 3ware controller is there at the top of the list. When I select it, it appears to start reading the floppy for the drivers as normal, but seems to finish too fast. Afterwards, it does not get added to the list of drivers included, and the installation fails because it can not find a drive to install to. I am installing v2 from a CD purchased through a link on Scyld's website. The CD arrived only a couple weeks ago. It is the 3ware Escalade 6200 (two channel) RAID controller with two 10GB drives using RAID 1 (mirror). The system is a 1.2GHz Athlon on a Microstar motherboard with 512MB RAM. TIA! Sean Quinlan From leo.magallon at grantgeo.com Fri Mar 2 23:33:43 2001 From: leo.magallon at grantgeo.com (Leonardo Magallon) Date: Sat, 03 Mar 2001 01:33:43 -0600 Subject: Tale of 3 Intel 510T switches and the network that wouldn't work without CRC and alignment errors. References: <200103021944.OAA23815@kirk.dnaco.net> Message-ID: <3AA09E57.3EB308F7@grantgeo.com> Greetings to all beowulfers, My last week hasn't been fun at all. In preparation for adding 29 new boxes, or 58 new processors, to our cluster, I had to move our present cluster to give way to shelves (thanks a lot to all of those who gave me their comments about the shelves) and to the relocation of our Oracle systems, and I encountered problems with our networking generating CRC and code alignment errors that took a long while to figure out. First of all, we are using 3 Intel 510T switches that are connected together in a stack. We have 8 VA Linux dual 440BX based computers and 16 white boxes (all 550MHz) I put together with Gigabyte motherboards and Intel EtherExpress cards. The problem came up when I moved the current group of computers to the opposite wall of the computer room. For some reason or another all of the white boxes started hitting errors when I tried to move any file across nfs. So I tried ftp. It didn't fix it. I then switched cat5 cables, no luck. I then moved the cables to another switch. Still no luck. I then said: well, it's not the cable, it's not the protocol. It's the card. I then placed a 3com card in the box and took my Intel one out. It didn't work either. After that I said, well, then maybe my driver needs to be updated. We are running kernel 2.2.14-6.0.1smp. I then upgraded to 2.2.18 and it still didn't work. So I then went to scyld.com and downloaded the latest drivers for the Intel card. By the way, I did put that card back in the box because I knew it wasn't the card. I generated the rpm and installed it. After a reboot the same errors were there. Hmmmmm. So I said, hey, I'm going to force this. I downloaded the .c drivers, compiled them and inserted them as a module. After another reboot, nothing. So I then knew:
1) It is not the kernel.
2) It is not the driver.
3) It is not the cable.
4) The problem is with the white boxes (the VA Linux machines were in a rack and did not have any errors).
5) It is not the NIC.
6) It is not the switch, because if I move one of the cables from one of the VA Linux computers to a port that I know is taking errors, it works just fine.
To make this LONG story short, I then thought: "What do these white boxes have in common that is not common among the VA Linux computers?" I looked and the only thing I could find was that they shared the same power supply. It was a Toshiba 1400 SE Series giving 2.3Kv. Not all 16 clones were connected to this single power supply. So I swapped it for one of the new Toshiba 3000Net units I had purchased for the new computers. And guess what? It worked! Apparently that power supply was generating noise on the power line that was propagating to the NICs on the computers and/or continuing on to the switch. Not to mention that before this I moved the rack to another place thinking that there was some kind of RF interfering with the proper functioning of the switches. We even brought in an RF meter, which let us know that all power supplies generate a big non-oscillating RF field that spans between 6 and 12 MHz but is stronger at 8 MHz. Go figure. I had placed calls to Intel on three occasions, and finally Jeff (I actually didn't get his last name, but if someone asks I can call him back and get it from him; he gave me his direct line) from Intel support and I were able to deduce that the common thing among all the clones (or drones as we call them -- The Borg ring a bell?) was that power supply. Sorry about the long email, but I thought that this would help anyone in the future who might run into the same kind of problems I faced this week. All is good now, Regards, Leo Magallon Grant Geophysical Inc. Houston, Texas. From bgbruce at it-curacao.com Sat Mar 3 04:49:11 2001 From: bgbruce at it-curacao.com (B.G. Bruce) Date: Sat, 3 Mar 2001 08:49:11 -0400 Subject: Scyld and SCI Message-ID: <01030308535706.10978@core.it-curacao.com> Does anyone know if Scyld will work with Dolphin's ICS SCI adapters? Do you have to purchase the Wulfkit to make this work? If so, will Scali's SSP integrate with Scyld, and how many hoops do you have to jump through to make this happen? Thanks, Brian. From Robert.Land at t-online.de Fri Mar 2 09:06:54 2001 From: Robert.Land at t-online.de (robert_wilhelm_land) Date: Fri, 02 Mar 2001 18:06:54 +0100 Subject: CAD applications ? References: <200102271356_MC2-C704-2656@compuserve.com> Message-ID: <3A9FD32E.E0FE35F3@csi.com> "Schilling, Richard" wrote: > Not sure if any exist, but one of the things I've started looking at is > porting Aero, a cad/modeling application for X windows to be a Beowulf > application. > > Richard Schilling Are you talking about http://www.aero-simulation.de/? This doesn't look like a CAD system to me. Robert From bogdant at hercules.ro Fri Mar 2 08:51:33 2001 From: bogdant at hercules.ro (Bogdan Taru) Date: Fri, 2 Mar 2001 18:51:33 +0200 (EET) Subject: parallelcrunchers.net (fwd) Message-ID: Hi, everyone, I'm sorry for the broken link. I forgot an "l"... 
So, you can find the site at http://www.parallelcrunchers.net Enjoy, bogdan ---------- Forwarded message ---------- Date: Tue, 27 Feb 2001 18:48:43 +0200 (EET) From: Bogdan Taru To: beowulf at beowulf.org Cc: mosix-list at cs.huji.ac.il Subject: parallelcrunchers.net Hi everyone, I've built a Beowulf portal site at http://www.parallecrunchers.net Please come visit & enjoy, bogdan From mail at thomas-boehme.de Sun Mar 4 05:58:20 2001 From: mail at thomas-boehme.de (Thomas R Boehme) Date: Sun, 4 Mar 2001 08:58:20 -0500 Subject: Reaccuring mails Message-ID: <37E1E2BB9C28D311AB390008C707D2A60C59BD83@nycexis01.mi8.com> Hi, is there something wrong with the mailing list processor? Some older mails seem to be sent to the list more than once. Could you please fix whatever causes the problem? From ole at scali.no Mon Mar 5 01:53:50 2001 From: ole at scali.no (Ole W. Saastad) Date: Mon, 05 Mar 2001 10:53:50 +0100 Subject: myrinet vs gigabit Message-ID: <3AA3622E.8F7A487C@scali.no> > Woo Chat Ming wrote: > > > I am going to set up a 100-nodes beowulf cluster > > to do scientific simulation. Is Gigabit ethernet or > > Myrinet better ? Does anyone has performance comparation > > of them ? When making judgements about what interconnect to make, the interesting parameters depend on your appication requirements. Are your applications communication intensive, and if they are, what is the mix between long and short messages, the amount of collective operations and simultaneous traffic etc. etc. These are all important parameters and they vary with the appplication and the way the algorithms are designed. If you do not know enough of these parameters and their importance for your system, a sound way to act is to optimize bandwidth and minimize latency to build a system that performs well for most possible applications. Then there are other arguments like fault scalability (systems with switches do not scale well above certain limits due to the complexity of cross-bar switches) fault resilience (systems with switches have a single point of failure) and bisection bandwidth. It is also important to point out that the performance numbers that you find for the different networks are not necessarily true for all combinations of processors and PCI bridge interface chip-sets. The performance of the bridge chip set can vary significantly and you have to select the right combination to get the optimum performance from your interconnect. Last but not least, what kind of software is available for the network and cluster management and what kind of knowledge and experience do you have to put it all together and to tune it to work well with your combination of hardware and software. -- Ole W. Saastad, Dr.Scient. | Scali AS | Scalable Linux Systems mailto:ole at scali.no | http://www.scali.com | subscribe to our Tel:+47 22 62 89 68 (dir) | P.O.Box 70 Bogerud | mailing lists at Tel:+47 22 95 21 45 (home) | 0621 Oslo NORWAY | www.scali.com/support From rajkumar at csse.monash.edu.au Mon Mar 5 02:45:04 2001 From: rajkumar at csse.monash.edu.au (Rajkumar Buyya) Date: Mon, 05 Mar 2001 21:45:04 +1100 Subject: CCGrid 2001 Advance Program & Call for Participation Message-ID: <3AA36E30.B3380AA9@csse.monash.edu.au> Dear Friends, Please find enclosed advance program and call for participation for the: CCGrid 2001: First ACM/IEEE International Symposium on Cluster Computing & the Grid to be held in Brisbane, Australia (15-18 May 2001). We would like to take this opportunity to invite you to participate in this upcoming meeting. 
The highlights of the conference program is enclosed for your kind consideration. The deadline for early registration is: 31 March, 2001. The conference also hosts poster and research exhibition sessions and the submissions for such poster papers is still open. Please forward the enclosed Call for Participation to interested colleagues. We are looking forward to welcome and see you in Brisbane! Thank you very much. Sincerely Yours, CCGrid 2001 Team htt://www.ccgrid.org ------------------------------------------------------------------------------------- ######################################################################## # # # ### ### #### ##### ### #### #### ### ### ## # # # # # # # # # # # # # # # # # # # # # ## #### # # # # # # # # # # # # # # # # # # # # # # # # # # # # ### ### #### # # ### #### ##### ### ### ### # # # ######################################################################## First ACM/IEEE International Symposium on Cluster Computing & the Grid (CCGrid 2001) http://www.ccgrid.org | www.ccgrid2001.qut.edu.au 15-18 May 2001, Rydges Hotel, South Bank, Brisbane, Australia CALL FOR PARTICIPATION ---------------------- *** Early bird registration 31 March *** Keynotes ******** * Ian Foster The Anatomy of the Grid: Enabling Scalable Virtual Organizations * Andrzej Goscinski Making Parallel Processing on Clusters Efficient, Transparent and Easy for Programmers * Domenico Laforenza Programming High Performance Applications in Grid Environments * Bruce Maggs Challenges in Scalable Content Distribution over the Internet * Satoshi Matsuoka Grid RPC meets Data Grid: Network Enabled Services for Data Farming on the Grid * Greg Pfister The Promise of InfiniBand for Cluster Computing Invited Plenary Talks ********************* * The World Wide Computer: Prospects for Parallel and Distributed Computing on the Web Gul A. Agha, University of Illinois, Urbana-Champaign (UIUC) * Terraforming Cyberspace Jeffrey M. Bradshaw, University of West Florida Industry Plenary Talks ********************** * High Performance Computing at Intel: The OSCAR software solution stack for cluster computing Tim Mattson, Intel Corporation, USA * MPI/FT: Architecture and Taxonomies for Fault-Tolerant, Massage-Passing Middleware for Performance-Portable Parallel Computing Tony Skjellum, MPISoft Technology, USA * Effective Internet Grid Computing for Industrial Users Ming Xu, Platform Corporation, Canada * Sun Grid Engine: Towards Creating a Compute Power Grid Wolfgang Gentzsch, Sun Microsystems, USA FREE Tutorials ************** * The Globus Toolkit for Grid Computing Ian Foster, Argonne National Laboratory, USA * An Introduction to OpenMP Tim Mattson, Intel Corporation, USA Panel ***** * The Grid: Moving it to Prime Time Moderator: David Abramson, Monash University, Australia. 
Symposium Mainstream Sessions ***************************** (Features 45 papers selected out of 126 submissions by peer review) * Component and Agent Approaches * Distributed Shared Memory * Grid Computing * Input/Output and Databases * Message Passing and Communication * Performance Evaluation * Scheduling and Load balancing * Tools for Management, Monitoring and Debugging Workshops ********* (Features 37 peer-reviewed papers selected by workshop organisers) * Agent based Cluster and Grid Computing * Cluster Computing Education * Distributed Shared Memory on Clusters * Global Computing on Personal Devices * Internet QoS for Global Computing * Object & Component Technologies for Cluster Computing * Scheduling and Load Balancing on Clusters Important Dates *************** * Early bird registration 31 March (register online, check out web site) * Tutorials & workshops 15 May * Symposium main stream & workshops 16-18 May Call for Poster/Research Exhibits: ********************************** Those interested in exhibiting poster papers, please contact Poster Chair Hai Jin (hjin at hust.edu.cn) or browse conference website for details. Sponsors ******** * IEEE Computer Society (www.computer.org) * IEEE Task Force on Cluster Computing (www.ieeetfcc.org) * Association for Computing Machinery (ACM) and SIGARCH (www.acm.org) * IEEE Technical Committee on Parallel Processing (TCPP) * Queensland Uni. of Technology (QUT), Australia (www.qut.edu.au) * Platform Computing, Canada (www.platform.com) * Australian Partnership for Advanced Computing (APAC) (www.apac.edu.au) * Society for Industrial and Applied Mathematics (SIAM, USA) (www.siam.org) * MPI Software Technology Inc., USA (www.mpi-softtech.com) * International Business Machines (IBM) (www.ibm.com) * Akamai Technologies, Inc., USA (www.akamai.com) * Sun Microsystems, USA (www.sun.com) * Intel Corporation, USA (www.intel.com) Further Information ******************* Please browse the symposium web site: http://www.ccgrid.org | www.ccgrid2001.qut.edu.au For specific clarifications, please contact one of the following: Conference Chairs: R. Buyya (rajkumar at buyya.com) or G. Mohay (mohay at fit.qut.edu.au) PC Chair: Paul Roe (ccgrid2001 at qut.edu.au) ------------------------------------------------------------------------------------ From john.hearns at framestore.co.uk Mon Mar 5 03:33:50 2001 From: john.hearns at framestore.co.uk (John Hearns) Date: Mon, 05 Mar 2001 11:33:50 +0000 Subject: Global Grid Forum webcast Message-ID: <3AA3799E.83A16CB8@framestore.co.uk> For those people interested in grid computing, the Global Grid Forum in the Netherlands is being webcast http://www.globalgridforum.nl John Hearns From jared_hodge at iat.utexas.edu Mon Mar 5 05:09:33 2001 From: jared_hodge at iat.utexas.edu (Jared Hodge) Date: Mon, 05 Mar 2001 07:09:33 -0600 Subject: Size of pipe from beowulf to world References: <1d3dcullfq.fsf@jitter.merl.com> Message-ID: <3AA3900D.BB7B07B9@iat.utexas.edu> I would say the biggest choke point you have to worry about it from your server to the nodes. Assuming you'll launch all of your jobs from the server (where you compiled them), it needs to have a pretty good link to send the executables out to the nodes, update their information, and collect the data. As always, this depends on the applications you'll be running, their communications overhead, and weather they're synchronous or asynchronous (asynchronous don't all ask for information at the same time usually). 
A single 100bT connection to the world has been plenty for us. You just want to have a way for developers to log onto the system and write their code there, or ftp files in, etc. Good luck.

Ray Jones wrote:
>
> Thanks for everyone's advice on our planned Beowulf (under the thread,
> "Questions and Sanity Check").
>
> I have another question that came up yesterday, as we were hashing out
> more details: how large a pipe should there be from the Beowulf to the
> outside world?  We were planning on having a single 100bT connection
> from the head node out to the world, and 100bT from the head to the
> switch, but realized that this could easily end up as a choke point.
>
> We could replace one of the 16-port modules in our planned switch with
> a 2-port Gbit ethernet module (one to the head, one to a fileserver,
> most likely), but if that's not necessary, we'd rather not lose the
> compute power.
>
> Thanks for any advice,
> Ray Jones
> MERL
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

--
Jared Hodge
Institute for Advanced Technology
The University of Texas at Austin
3925 W. Braker Lane, Suite 400
Austin, Texas 78759

Phone: 512-232-4460
FAX: 512-471-9096
Email: Jared_Hodge at iat.utexas.edu

From nfuhriman at lnxi.com Mon Mar 5 07:53:20 2001
From: nfuhriman at lnxi.com (nate fuhriman)
Date: Mon, 05 Mar 2001 08:53:20 -0700
Subject: [Fwd: 8 node cluster help!]
References:
Message-ID: <3AA3B670.1060703@lnxi.com>

This is the result with a data size of 4000. 8000 crashed the machine. Remember this is with HEAVY swapping because it was on a single machine. (hd light was constant)

============================================================================
T/V        N      NB   P   Q        Time       Gflops
----------------------------------------------------------------------------
W00L2L2    4000   1    2   2    52143.23    8.187e-04
----------------------------------------------------------------------------
||Ax-b||_oo / ( eps * ||A||_1  * N        ) = 0.0335992 ...... PASSED
||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) = 0.0310282 ...... PASSED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 0.0070523 ...... PASSED
============================================================================

Nate

mehul kochar wrote:
> Thanks Nate. The GHz was a typo error on my part. I apologize for it.
> It should be GFlops.
> Regarding your cluster results, you should run them with a problem size
> of 8000, or one suited to your machine. The HPL tuning notes talk more about
> it. I can compare it then.
>
> Thanks
> Mehul
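The swapping is easy to predict: HPL factors a dense N x N double-precision matrix, so the data alone needs roughly 8*N^2 bytes -- about 128 MB at N=4000 and 512 MB at N=8000, which is why the larger run went to disk. A small Perl sketch of that sizing arithmetic (the 80% figure below is just a common rule of thumb for how much RAM to give HPL, not a number from this thread):

------------------------ hplsize.pl --------------------------------
#!/usr/bin/perl
# hplsize.pl -- relate HPL problem size N to memory use (8 bytes/element).
# Usage: hplsize.pl <N>          estimate memory for a given N
#        hplsize.pl --ram <MB>   suggest the largest N for that much memory
use strict;

if (($ARGV[0] || '') eq '--ram') {
    my $mb = $ARGV[1] or die "usage: $0 <N> | --ram <MB>\n";
    my $n  = int(sqrt($mb * 1024 * 1024 * 0.80 / 8));   # leave ~20% for OS/MPI
    printf "with %d MB of RAM, keep N at or below about %d\n", $mb, $n;
} else {
    my $n = $ARGV[0] or die "usage: $0 <N> | --ram <MB>\n";
    printf "N = %d needs roughly %.0f MB just for the matrix\n",
           $n, 8 * $n * $n / (1024 * 1024);
}
-------------------------End hplsize.pl -----------------------------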
From Dean.Carpenter at pharma.com Mon Mar 5 08:53:47 2001
From: Dean.Carpenter at pharma.com (Carpenter, Dean)
Date: Mon, 5 Mar 2001 11:53:47 -0500
Subject: Typical hardware
Message-ID: <759FC8B57540D311B14E00902727A0C002EC475C@a1mbx01.pharma.com>

We're just now beginning to mess around with clustering - initial proof-of-concept for the local code and so on. So far so good, using spare equipment we have lying around, or on eval.

Next step is to use some "real" hardware, so we can get a sense of the throughput benefit. For example, right now it's a mishmosh of hardware running on a 3Com Switch 1000, 100m to the head node, and 10m to the slaves. The throughput one will be with 100m switched all around, possibly with a gig uplink to the head node. Based on this, we hunt for money for the production cluster(s) ...

What hardware are people using ? I've done a lot of poking around at the various clusters linked to off beowulf.org, and seen mainly two types :

1. Commodity white boxes, perhaps commercial ones - typical desktop type cases. These take up a chunk of real estate, and give no more than 2 cpus per box. Lots of power supplies, shelf space, noise, etc etc.

2. 1U or 2U rackmount boxes. Better space utilization, still 2 cpus per box, but costing a whole lot more $$$.

We, like most out there I'm sure, are constrained, by money and by space. We need to get lots of cpus in as small a space as possible. Lots of 1U VA-Linux or SGI boxes would be very cool, but would drain the coffers way too quickly. Generic motherboards in clone cases is cheap, but takes up too much room.

So, a colleague and I are working on a cheap and high-density 1U node. So far it looks like we'll be able to get two dual-CPU (P3) motherboards per 1U chassis, with associated dual-10/100, floppy, CD and one hard drive. And one PCI slot. Although it would be nice to have several Ultra160 scsi drives in raid, a generic cluster node (for our uses) will work fine with a single large UDMA-100 ide drive. That's 240 cpus per 60U rack. We're still working on condensed power for the rack, to simplify things.

Note that I said "for our uses" above. Our design goals here are density and $$$. Hence some of the niceties are being foresworn - things like hot-swap U160 scsi raid drives, das blinken lights up front, etc.

So, what do you think ? If there's interest, I'll keep you posted on our progress. If there's LOTS of interest, we may make a larger production run to make these available to others.

--
Dean Carpenter
deano at areyes.com
dean.carpenter at pharma.com
dean.carpenter at purduepharma.com
94TT :)

From john.ziriax at navy.brooks.af.mil Mon Mar 5 09:21:01 2001
From: john.ziriax at navy.brooks.af.mil (Ziriax John M Civ NHRC/DET)
Date: Mon, 5 Mar 2001 11:21:01 -0600
Subject: Eth0 Errors that don't stop MPICH program
Message-ID:

We cranked up a small Beowulf cluster consisting of ten 200 MHz PCs. Each node has one SMB eznet-10/100 NIC connected to a 24 port SMC 10/100 switch. Mandrake 7.1 is installed on the slaves and Red Hat 6.2 on the master. The master is exporting /home/export and /usr/local to the slaves. MPICH 1.2.0 is installed on /usr/local.

The system ran a job over the weekend and appears to be functioning just fine. The code was compiled with Lahey/Fujitsu Fortran 95 Linux Express version 5.5. However, on at least 2 of the slave nodes (the others lack monitors) there was a series of error messages which repeated except for the number at the end of the line:

eth0: RTL8139 Interrupt Line Blocked Status 1

The series of numbers associated with the messages was 1,5,5,4,4,4,4,4,1,4. Any clues what this might mean? The program is still running.

Thanks
John

From canon at nersc.gov Mon Mar 5 09:23:35 2001
From: canon at nersc.gov (Shane Canon)
Date: Mon, 05 Mar 2001 09:23:35 -0800
Subject: Typical hardware
In-Reply-To: Message from "Carpenter, Dean" of "Mon, 05 Mar 2001 11:53:47 EST." <759FC8B57540D311B14E00902727A0C002EC475C@a1mbx01.pharma.com>
Message-ID: <200103051723.JAA11761@pookie.nersc.gov>

Dean,

I would nix the CD-ROM drives unless there is some real need for it (beowulf mp3 ripper). Don't forget some type of serial console and real estate for it.

I haven't seen a case designed to accommodate two motherboards. Is this a commercial product? Or are you doing a back and front side arrangement?
--Shane Canon Dean.Carpenter at pharma.com said: > We, like most out there I'm sure, are constrained, by money and by > space. We need to get lots of cpus in as small a space as possible. > Lots of 1U VA-Linux or SGI boxes would be very cool, but would drain > the coffers way too quickly. Generic motherboards in clone cases is > cheap, but takes up too much room. > So, a colleague and I are working on a cheap and high-density 1U node. > So far it looks like we'll be able to get two dual-CPU (P3) > motherboards per 1U chassis, with associated dual-10/100, floppy, CD > and one hard drive. And one PCI slot. Although it would be nice to > have several Ultra160 scsi drives in raid, a generic cluster node (for > our uses) will work fine with a single large UDMA-100 ide drive. > That's 240 cpus per 60U rack. We're still working on condensed power > for the rack, to simplify things. Note that I said "for our uses" > above. Our design goals here are density and $$$. Hence some of the > niceties are being foresworn - things like hot-swap U160 scsi raid > drives, das blinken lights up front, etc. > So, what do you think ? If there's interest, I'll keep you posted on > our progress. If there's LOTS of interest, we may make a larger > production run to make these available to others. > -- Dean Carpenter deano at areyes.com dean.carpenter at pharma.com > dean.carpenter at purduepharma.com 94TT :) -- ------------------------------------------------------------------------ Shane Canon voice: 510-486-6981 National Energy Research Scientific fax: 510-486-7520 Computing Center 1 Cyclotron Road Mailstop 50D-106 Berkeley, CA 94720 canon at nersc.gov ------------------------------------------------------------------------ From lindahl at conservativecomputer.com Mon Mar 5 09:33:15 2001 From: lindahl at conservativecomputer.com (Greg Lindahl) Date: Mon, 5 Mar 2001 12:33:15 -0500 Subject: Typical hardware In-Reply-To: <759FC8B57540D311B14E00902727A0C002EC475C@a1mbx01.pharma.com>; from Dean.Carpenter@pharma.com on Mon, Mar 05, 2001 at 11:53:47AM -0500 References: <759FC8B57540D311B14E00902727A0C002EC475C@a1mbx01.pharma.com> Message-ID: <20010305123315.A3298@wumpus> On Mon, Mar 05, 2001 at 11:53:47AM -0500, Carpenter, Dean wrote: > We, like most out there I'm sure, are constrained, by money and by space. > We need to get lots of cpus in as small a space as possible. Lots of 1U > VA-Linux or SGI boxes would be very cool, but would drain the coffers way > too quickly. Before you rush out to build your own 1U box, you might want to check out vendors other than VA and SGI. Racksaver, Rackable, I'm sure there are others. Even if you can build it cheaper than the rest, you might also want to consider the risk: what do you do when all your cpus die due to overheating? -- g From Dean.Carpenter at pharma.com Mon Mar 5 09:55:41 2001 From: Dean.Carpenter at pharma.com (Carpenter, Dean) Date: Mon, 5 Mar 2001 12:55:41 -0500 Subject: Typical hardware Message-ID: <759FC8B57540D311B14E00902727A0C002EC475D@a1mbx01.pharma.com> I tend to agree about the CDs, but we'll leave room for them if needed. There's not enough room to be able to squeeze in a 3rd motherboard, so would just be wasted real estate in the chassis. There are no commercial products out there (that I've been able to find that is) that will do this. Essentially this will be an open 1U case, holding two motherboards, drives, floppies and power supplies. The plan is to use a closed rack, with plenty of fan airflow through the entire rack to handle the cooling. 
All the connections will be to a custom rear panel with pigtails to the actual motherboards. Any electrical engineers out there ? Would like to find a mongo power supply with up to 120 individual ATX connectors on it. This would go in the bottom of the rack a-la UPS style, one connector per motherboard. Be nice if each one was individually powerable too :) -- Dean Carpenter deano at areyes.com dean.carpenter at pharma.com dean.carpenter at purduepharma.com 94TT :) -----Original Message----- From: Shane Canon [mailto:canon at nersc.gov] Sent: Monday, March 05, 2001 12:24 PM To: Carpenter, Dean Cc: beowulf at beowulf.org; 'gaborb at athomepc.net' Subject: Re: Typical hardware Dean, I would nix the CD-ROM drives unless there is some real need for it (beowulf mp3 ripper). Don't forget some type of serial console and real estate for it. I haven't seen a case designed to accommodate two motherboards. Is this a commercial product? Or are you doing a back and front side arrangement? --Shane Canon From becker at scyld.com Mon Mar 5 09:58:56 2001 From: becker at scyld.com (Donald Becker) Date: Mon, 5 Mar 2001 12:58:56 -0500 (EST) Subject: Eth0 Errors that don't stop MPICH program In-Reply-To: Message-ID: On Mon, 5 Mar 2001, Ziriax John M Civ NHRC/DET wrote: > We cranked up a small Beowulf cluster consisting of 10 200 MHz PCs. > Each node has one SMB eznet-10/100 NICs connected to a 24 port SMC > 10/100 switch. ... > However, on at least 2 of the slave nodes (the others lack monitors) > there was a series of errror messages which repeated except for the > number at the end of the line: > eth0: RTL8139 Interrupt Line Blocked Status 1 What version of the driver were you using? Were these SMP systems? This is a sanity check intended to detect physically blocked interrupt lines. Blocked interrupt lines are usually caused by a bug in the Linux APIC handling code that permanently disables the interrupt line. The check may be falsely triggered by certain questionable but still valid SMP options that temporarily block interrupts from being handled. Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 410 Severn Ave. Suite 210 Second Generation Beowulf Clusters Annapolis MD 21403 410-990-9993 From lindahl at conservativecomputer.com Mon Mar 5 10:01:40 2001 From: lindahl at conservativecomputer.com (Greg Lindahl) Date: Mon, 5 Mar 2001 13:01:40 -0500 Subject: Typical hardware In-Reply-To: <200103051723.JAA11761@pookie.nersc.gov>; from canon@nersc.gov on Mon, Mar 05, 2001 at 09:23:35AM -0800 References: <200103051723.JAA11761@pookie.nersc.gov> Message-ID: <20010305130140.C3298@wumpus> On Mon, Mar 05, 2001 at 09:23:35AM -0800, Shane Canon wrote: > I haven't seen a case designed to accommodate two motherboards. > Is this a commercial product? Or are you doing a back and > front side arrangement? rackable.com sells 1/2 depth 1U systems. racksaver.com has one 1U case that holds 2 2-cpu motherboards. -- g From JParker at coinstar.com Mon Mar 5 10:05:55 2001 From: JParker at coinstar.com (JParker at coinstar.com) Date: Mon, 5 Mar 2001 10:05:55 -0800 Subject: Typical hardware Message-ID: G'Day ! I work for a vending machine type company. Our solution was too build aluminum cases out of sheet metal (13" w x 6-1/2" h x 11" d). No thrills (no floppy, cd, etc), just bare minimum of what you need (size was determined so we can use generic expansion boards). Still have to deal with a power supply for each node, but is is reasonably compact package (hd screwed into sheet metal wall). 
It is also easy to build with ~.100" aluminum sheet, a bender and rivets. An added benefit is that you can standardize on screw sizes and threads, so your spare-parts kit becomes very minimal. You can also build larger cases to handle multiple hard drives etc.

Our latest version has a cd. We increased the height slightly and screwed a shelf in. It would be very easy to change dimensions for any hardware configuration you can think of.

Think home built rack-mount cases ....

cheers,
Jim Parker

Sailboat racing is not a matter of life and death .... It is far more important than that !!!

"Carpenter, Dean" wrote on 03/05/01 08:53 AM (Subject: Typical hardware; cc: 'gaborb at athomepc.net'; sent by beowulf-admin at beowulf.org):

[...]

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From lindahl at conservativecomputer.com Mon Mar 5 10:56:28 2001
From: lindahl at conservativecomputer.com (Greg Lindahl)
Date: Mon, 5 Mar 2001 13:56:28 -0500
Subject: Typical hardware
In-Reply-To: <759FC8B57540D311B14E00902727A0C002EC475D@a1mbx01.pharma.com>; from Dean.Carpenter@pharma.com on Mon, Mar 05, 2001 at 12:55:41PM -0500
References: <759FC8B57540D311B14E00902727A0C002EC475D@a1mbx01.pharma.com>
Message-ID: <20010305135628.B3470@wumpus>

On Mon, Mar 05, 2001 at 12:55:41PM -0500, Carpenter, Dean wrote:
>
> There are no commercial products out there (that I've been able to find that
> is) that will do this. Essentially this will be an open 1U case, holding
> two motherboards, drives, floppies and power supplies.

Ah, if you're wanting an *open* case, then why not just screw the motherboards onto standard rack shelves? That's how the Centurion II cluster was done at the University of Virginia. The vendor who thought it up was a small company called Atlantek. That was 2U/motherboard because we needed a couple of PCI slots and benders weren't cheap yet.

At 2 motherboards per U, getting the cooling right is a bit more difficult. That's why I advise pricing cases from the vendors who sell cases.

-- g

From carlos at nernet.unex.es Mon Mar 5 11:30:57 2001
From: carlos at nernet.unex.es (=?iso-8859-1?Q?Carlos_J._Garc=EDa_Orellana?=)
Date: Mon, 5 Mar 2001 20:30:57 +0100
Subject: Starting PVM on Scyld
Message-ID: <004401c0a5aa$cdce8120$b461319e@casa>

Hello,

I think I have got PVM to run under Scyld. The steps that I have followed are:

1.- I've written a small perl script to use as an 'rsh' replacement. (See below, pvmrsh).
2.- I've set the variable PVM_RSH to run that script.
3.- In the hostfile, I've created entries of the form:
        node0 ip=.0
        node1 ip=.1
        .....
4.- Next, in /etc/beowulf/fstab I've added these lines:
        $(MASTER):/bin /bin nfs 0 0
        $(MASTER):/usr /usr nfs 0 0
    With this, we can run shell scripts on the nodes and we have the whole PVM directory tree.
5.- We must not copy the libraries in /usr to the nodes, so I've changed this line in setup_libs (/usr/lib/beoboot/setup_libs):
        if ! vmadlib -l | sed -e 's!^/!!' | tar cf - -T - | \
    to
        if ! vmadlib -l | grep -v /usr/ | sed -e 's!^/!!' | tar cf - -T - | \
    With this, the ramdisk comes out around 15 Mbytes smaller.

And that's all. Please, if anybody has a better solution, tell me.

Other thing, thanks to Keith McDonald and Andreas Boklund for their interest.

Carlos J. Garc?a Orellana
Universidad de Extremadura
Badajoz - SPAIN

------------------------ pvmrsh --------------------------------
#!/usr/bin/perl
# pvmrsh: rsh stand-in for PVM on a Scyld cluster.  PVM invokes
# $PVM_RSH as "pvmrsh <node> <command...>"; this simply hands the
# same arguments to bpsh, which runs the command on the named node.
$r="";
foreach $i (@ARGV)
{
    $r=$r . " " . $i;          # rebuild the argument list as one string
}
#print $r . "\n";              # uncomment to see what PVM passes in
$r="/usr/bin/bpsh " . $r;      # note: no quoting, so args with spaces will break
system($r);
-------------------------End pvmrsh ----------------------------

From mathboy at velocet.ca Mon Mar 5 22:13:13 2001
From: mathboy at velocet.ca (Velocet)
Date: Tue, 6 Mar 2001 01:13:13 -0500
Subject: high physical density cluster design - power/heat/rf questions
Message-ID: <20010306011313.A99898@velocet.ca>

I have some questions about a cluster we're designing. We really need a relatively high density configuration here, in terms of floor space.
To be able to do this I have found out pricing on some socket A boards with onboard NICs and video (don't need video though). We aren't doing anything massively parallel right now (just running Gaussian/Jaguar/MPQC calculations) so we don't need major bandwidth.* We're booting with root filesystem over NFS on these boards. Haven't decided on FreeBSD or Linux yet. (This email isn't about software config, but feel free to ask questions).

(* even with NFS disk we're looking at using MFS on freebsd (or possibly the new md system) or the new nbd on linux or equivalent for gaussian's scratch files - oodles faster than disk, and in our case, with no disk, it writes across the network only when required. Various tricks we can do here.)

The boards we're using are PC Chip M810 boards (www.pcchips.com). Linux seems fine with the NIC on board (SiS chip of some kind - Ben LaHaise of redhat is working with me on some of the design and has been testing it for Linux, I have yet to play with freebsd on it).

The configuration we're looking at to achieve high physical density is something like this:

                  NIC and Video connectors
                 /
    ------------=--------------   board upside down
    | cpu |  =  |   RAM   |
    |-----|     |_________|
    |hsync|
    |     |                --fan--
     --fan--               |     |
     _________             |hsync|
    |         |            |-----|
    |   RAM   |     =      | cpu |
    -------------=-------------   board right side up

as you can see the boards kind of mesh together to take up less space. At micro ATX factor (9.25" I think per side) and about 2.5 or 3" high for the CPU+sync+fan (tallest) and 1" tall for the ram or less, I can stack two of these into 7" (4U). At 9.25" per side, 2 wide inside a cabinet gives me 4 boards per 4U in a standard 24" rack footprint. If I go 2 deep as well (ie 2x2 config), then for every 4U I can get 16 boards in.

The cost for this is amazing, some $405 CDN right now for Duron 800s with 128Mb of RAM each without the power supply (see below; standard ATX power is $30 CDN/machine). For $30000 you can get a large ass-load of machines ;)

Obviously this is pretty ambitious. I heard talk of some people doing something like this, with the same physical configuration and cabinet construction, on the list. Wondering what your experiences have been.

Problem 1
"""""""""
The problem is in the diagram above: the upside down board has another board .5" above it - are these two boards going to leak RF like mad and interfere with each other's operations? I assume there's not much to do there but to put a layer of grounded (to the cabinet) metal in between. This will drive up the cabinet construction costs. I'd rather avoid this if possible.

Our original construction was going to be copper pipe and plexiglass sheeting, but we're not sure that this will be viable for something that could be rather tall in our future revisions of our model. Then again, copper pipe can be bolted to our (cement) ceiling and floor for support.

For a small model that Ben LaHaise built, check the pix at http://trooper.velocet.ca/~mathboy/giocomms/images

It's quite a hack, try not to laugh. It does engender the 'do it damn cheap' mentality we're operating with here.

The boards are designed to slide out the front once the power and network are disconnected.

An alternate construction we're considering is sheet metal cutting and folding, but at much higher cost.

Problem 2 - Heat Dissipation
""""""""""""""""""""""""""""
The other problem we're going to have is heat. We're going to need to build our cabinet such that it's relatively sealed, except at front, so we can get some coherent airflow in between boards. I am thinking we're going to need to mount extra fans on the back (this is going to make the 2x2 design a bit more tricky, but at only 64 odd machines we can go with 2x1 config instead, 2 stacks of 32, just 16U high). I don't know what you can suggest here, it's all going to depend on physical configuration. The machine is housed in a proper environment (Datavaults.com's facilities, where I work :) that's climate controlled, but the inside of the cabinet will still need massive airflow, even with the room at 68F.

Problem 3 - Power
"""""""""""""""""
The power density here is going to be high. I need to mount 64 power supplies in close proximity to the boards, another reason I might need to maintain the 2x1 instead of 2x2 design. (2x1 allows easier access too).

We don't really wanna pull that many power outlets into the room - I don't know what a diskless Duron800 board with 256Mb or 512Mb ram will use, though I guess around .75 to 1 A. I'm gonna need 3 or 4 full circuits in the room (not too bad actually). However that's a lot of weight on the cabinet to hold 60 odd power supplies, not to mention the weight of the cables themselves weighing down on it, and a huge mess of them to boot.

I am wondering if someone has a reliable way of wiring together multiple boards per power supply? What's the max density per supply? Can we go with redundant power supplies, like N+1? We don't need that much reliability (jobs are short, run on one machine and can be restarted elsewhere), but I am really looking for something that's going to reduce the cabling.

As well, I am hoping there is some economy of power converted here - a big supply will hopefully convert power for multiple boards more efficiently than a single supply per board. However, as always, the main concern is cost.

Any help or ideas are appreciated.

/kc
--
Ken Chase, math at velocet.ca * Velocet Communications Inc. * Toronto, CANADA
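The circuit count above is easy to sanity-check with a few lines of arithmetic. In the Perl sketch below, the 1 A per board is the guess from the posting; the 15 A breaker size and the 80% continuous-load derating are assumptions to be replaced with the real panel numbers.

------------------------ circuits.pl --------------------------------
#!/usr/bin/perl
# circuits.pl -- rough mains-circuit budget for a rack of diskless nodes.
# All three assumed numbers below are examples; measure a real board if you can.
use strict;
use POSIX qw(ceil);

my $boards        = 64;
my $amps_per_node = 1.0;    # wall-side draw per board + PSU (guesstimate)
my $breaker       = 15;     # amps per branch circuit
my $derate        = 0.80;   # common continuous-load derating

my $total    = $boards * $amps_per_node;
my $usable   = $breaker * $derate;
my $circuits = ceil($total / $usable);

printf "%d boards x %.2f A = %.1f A total -> %d x %d A circuits at %d%% load\n",
       $boards, $amps_per_node, $total, $circuits, $breaker, $derate * 100;
-------------------------End circuits.pl -----------------------------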
From wyy at cersa.admu.edu.ph Mon Mar 5 22:23:02 2001
From: wyy at cersa.admu.edu.ph (Horatio B. Bogbindero)
Date: Tue, 6 Mar 2001 14:23:02 +0800 (PHT)
Subject: Octave on MPI
Message-ID:

i have just recently installed octave on a LAM/MPI cluster. the examples are working as expected. however, i was wondering if there is any other parallelization of octave that is a wee bit more transparent than the one provided by octave-mpi. also, who is developing octave-mpi? why isn't the octave-mpi patch merged into the official distribution?

--------------------------------------
William Emmanuel S. Yu
Ateneo Cervini-Eliazo Networks (ACENT)
email : william.s.yu at ieee.org
web   : http://cersa.admu.edu.ph/
phone : 63(2)4266001-5925/5904

Charity begins at home.
		-- Publius Terentius Afer (Terence)

From lowther at att.net Mon Mar 5 23:02:30 2001
From: lowther at att.net (Ken)
Date: Tue, 06 Mar 2001 02:02:30 -0500
Subject: high physical density cluster design - power/heat/rf questions
References: <20010306011313.A99898@velocet.ca>
Message-ID: <3AA48B86.8D2418C2@att.net>

Velocet wrote:
>
> Problem 1
> """""""""
> The problem is in the diagram above, the upside down board has another board
> .5" above it - are these two boards going to leak RF like mad and interefere
> with eachothers' operations? I assume there's not much to do there but to put
> a layer of grounded (to the cabinet) metal in between. This will drive up the
> cabinet construction costs. I'd rather avoid this if possible.

I keep promising myself I won't post after midnight. They never look the same in the morning.
;) My first inclination would be to put two of them together and see what happens. You might think about aluminum screen if you can keep it stretched tight enough. Would also help with air flow. Maybe something more like aluminum gridding. > > We dont really wanna pull that many power outlets into the room - I dont know > what a diskless Duron800 board with 256Mb or 512Mb ram will use, though I > guess around .75 to 1 A. I I've measured a dual Celeron board at about 1 amp at 125 volts. Thats going to be a lot of heat in a small space. Since you are going to build a tight fitting case, you may want to stick an actual cooling unit on top. -- Ken Lowther Youngstown, Ohio http://www.atmsite.org From szii at sziisoft.com Mon Mar 5 23:29:19 2001 From: szii at sziisoft.com (szii at sziisoft.com) Date: Mon, 5 Mar 2001 23:29:19 -0800 Subject: high physical density cluster design - power/heat/rf questions References: <20010306011313.A99898@velocet.ca> Message-ID: <009a01c0a60f$2d345ae0$fd02a8c0@surfmetro.com> We were pondering the exact same questions ourselves about a month ago and while our project is on hold, here's what we came up with... Mounting: Plexiglass/plastic was our choice as well. Strong, cheap, and can be metal-reinforced if needed. We were going to orient the boards on their sides, stacked 2 deep. At a height of 3" per board, you can get about 5 comfortably (6 if you try) into a stock 19" rack. They can also slide out this way. Theoretically you can get 10-12 boards in 5-6U (not counting powersupplies or hard drives) and depending on board orientation. We were looking at ABIT VP6 boards. They're cheap, they're DUAL CPU boards, and they're FC-PGA so they're thin. 20-24 CPUs in 5-6U. *drool* If AMD ever gets around to their dual boards, those will rock as well. For powersupplies and HA, we were going to use "lab" power supplies and run a diode array to keep them from fighting too much. Instead of x-smaller supplies, you can use 4-5 larger supplies and run them into a common harness to supply power. You'll need 3.3v, 5v, 12v supplies, but it beats running 24 serparate supplies (IMHO) and if one dies, you don't lose the board, you just take a drop in supply until you replace it. For heat dissapation, we're in a CoLo facility. Since getting to/from the individual video/network/mouse/keyboard/etc stuff is very rare (hopefully) once it's up, we were going to put a pair of box-fans (wind tunnel style) in front and behind the box. =) In a CoLo, noise is not an issue. Depending on exact design, you might even get away with dropping the fans off of the individual boards and letting the windtunnel do that part, but that's got problems if the tunnel dies and affects every processor in the box. I'm not an EE guy, so the power-supply issue is being handled by someone else. I'll field whatever questions I can, and pass on what I cannot. If you even wander down an isle and see a semi-transparent blue piece of plexiglass with a bunch of surfboards on it, you'll know what it is - the Surfmetro "Box O' Boards." Does anyone have a better way to do it? Always room for improvement... -Mike ----- Original Message ----- From: Velocet To: Sent: Monday, March 05, 2001 10:13 PM Subject: high physical density cluster design - power/heat/rf questions > I have some questions about a cluster we're designing. We really need > a relatively high density configuration here, in terms of floor space. 
> > To be able to do this I have found out pricing on some socket A boards with > onboard NICs and video (dont need video though). We arent doing anything > massively parallel right now (just running Gaussian/Jaguar/MPQC calculations) > so we dont need major bandwidth.* We're booting with root filesystem over > NFS on these boards. Havent decided on FreeBSD or Linux yet. (This email > isnt about software config, but feel free to ask questions). > > (* even with NFS disk we're looking at using MFS on freebsd (or possibly > the new md system) or the new nbd on linux or equivalent for gaussian's > scratch files - oodles faster than disk, and in our case, with no > disk, it writes across the network only when required. Various tricks > we can do here.) > > The boards we're using are PC Chip M810 boards (www.pcchips.com). Linux seems > fine with the NIC on board (SiS chip of some kind - Ben LaHaise of redhat is > working with me on some of the design and has been testing it for Linux, I > have yet to play with freebsd on it). > > The configuration we're looking at to achieve high physical density is > something like this: > > NIC and Video connectors > / > ------------=-------------- board upside down > | cpu | = | RAM | > |-----| |_________| > |hsync| > | | --fan-- > --fan-- | | > _________ |hsync| > | | |-----| > | RAM | = | cpu | > -------------=------------- board right side up > > as you can see the boards kind of mesh together to take up less space. At > micro ATX factor (9.25" I think per side) and about 2.5 or 3" high for the > CPU+Sync+fan (tallest) and 1" tall for the ram or less, I can stack two of > these into 7" (4U). At 9.25" per side, 2 wide inside a cabinet gives me 4 > boards per 4U in a standard 24" rack footprint. If I go 2 deep as well (ie 2x2 > config), then for every 4U I can get 16 boards in. > > The cost for this is amazing, some $405 CDN right now for Duron 800s with > 128Mb of RAM each without the power supply (see below; standard ATX power is > $30 CDN/machine). For $30000 you can get a large ass-load of machines ;) > > Obviously this is pretty ambitious. I heard talk of some people doing > something like this, with the same physical confirguration and cabinet > construction, on the list. Wondering what your experiences have been. > > > Problem 1 > """"""""" > The problem is in the diagram above, the upside down board has another board > .5" above it - are these two boards going to leak RF like mad and interefere > with eachothers' operations? I assume there's not much to do there but to put > a layer of grounded (to the cabinet) metal in between. This will drive up the > cabinet construction costs. I'd rather avoid this if possible. > > Our original construction was going to be copper pipe and plexiglass sheeting, > but we're not sure that this will be viable for something that could be rather > tall in our future revisions of our model. Then again, copper pipe can be > bolted to our (cement) ceiling and floor for support. > > For a small model that Ben LaHaise built, check the pix at > http://trooper.velocet.ca/~mathboy/giocomms/images > > Its quick a hack, try not to laugh. It does engender the 'do it damn cheap' > mentality we're operating with here. > > The boards are designed to slide out the front once the power and network > are disconnected. > > An alternate construction we're considering is sheet metal cutting and > folding, but at much higher cost. 
> > > Problem 2 - Heat Dissipation > """""""""""""""""""""""""""" > The other problem we're going to have is heat. We're going to need to build > our cabinet such that its relatively sealed, except at front, so we can get > some coherent airflow in between boards. I am thinking we're going to need to > mount extra fans on the back (this is going to make the 2x2 design a bit more > tricky, but at only 64 odd machines we can go with 2x1 config instead, 2 > stacks of 32, just 16U high). I dont know what you can suggest here, its all > going to depend on physical configuration. The machine is housed in a proper > environment (Datavaults.com's facilities, where I work :) thats climate > controlled, but the inside of the cabinet will still need massive airflow, > even with the room at 68F. > > > Problem 3 - Power > """"""""""""""""" > The power density here is going to be high. I need to mount 64 power supplies > in close proximity to the boards, another reason I might need to maintain > the 2x1 instead of 2x2 design. (2x1 allows easier access too). > > We dont really wanna pull that many power outlets into the room - I dont know > what a diskless Duron800 board with 256Mb or 512Mb ram will use, though I > guess around .75 to 1 A. Im gonna need 3 or 4 full circuits in the room (not > too bad actually). However thats alot of weight on the cabinet to hold 60 odd > power supplies, not to mention the weight of the cables themselves weighing > down on it, and a huge mess of them to boot. > > I am wondering if someone has a reliable way of wiring together multiple > boards per power supply? Whats the max density per supply? Can we > go with redundant power supplies, like N+1? We dont need that much > reliability (jobs are short, run on one machine and can be restarted > elsewhere), but I am really looking for something thats going to > reduce the cabling. > > As well, I am hoping there is some economy of power converted here - > a big supply will hopefully convert power for multiple boards more > efficiently than a single supply per board. However, as always, the > main concern is cost. > > Any help or ideas are appreciated. > > /kc > -- > Ken Chase, math at velocet.ca * Velocet Communications Inc. * Toronto, CANADA > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From Eugene.Leitl at lrz.uni-muenchen.de Tue Mar 6 01:44:26 2001 From: Eugene.Leitl at lrz.uni-muenchen.de (Eugene Leitl) Date: Tue, 6 Mar 2001 10:44:26 +0100 (MET) Subject: LinuxBIOS Message-ID: I don't recall it being mentioned here, and even if, it can use some repetition: http://www.acl.lanl.gov/linuxbios/index.html Very interesting work: http://www.acl.lanl.gov/linuxbios/status/index.html [...] SiS SiS630 This is now my desktop machine, with full X11 support, and our standard cluster node. We boot directly into Scyld from FLASH. Now booting Linux to multiuser from power on! Now using /dev/fb. See also the note on Millenium Disk-On-Chip 2001 -- we're no longer constrained to 512Kbytes! [...] 
______________________________________________________________
ICBMTO : N48 10'07'' E011 33'53'' http://www.lrz.de/~ui22204
57F9CFD3: ED90 0433 EB74 E4A9 537F CFF5 86E7 629B 57F9 CFD3

From Marcin.Kolbuszewski at nrc.ca Tue Mar 6 07:00:47 2001
From: Marcin.Kolbuszewski at nrc.ca (Kolbuszewski, Marcin)
Date: Tue, 6 Mar 2001 10:00:47 -0500
Subject: Typical hardware
Message-ID: <9258C238472FD411AA860004AC369AF9064DD9ED@nrcmrdex1.imsb.nrc.ca>

For a truly minimalist approach to building beowulfs, see http://www.clustercompute.com

One could make them diskless with several boards per power supply.

Marcin

------------------------------------------------------------------------------
High Performance Computing Group          Coordination Office
Institute for Information Technology      C3.ca Association
National Research Council of Canada       Rm 286, M-50, 1500 Montreal Road
tel 613-998-7749                          Ottawa, Canada
fax 613-998-5400                          K1A 0R6
e-mail Marcin.Kolbuszewski at nrc.ca

-----Original Message-----
From: JParker at coinstar.com [mailto:JParker at coinstar.com]
Sent: Monday, March 05, 2001 1:06 PM
To: Carpenter, Dean
Cc: beowulf at beowulf.org; beowulf-admin at beowulf.org; 'gaborb at athomepc.net'
Subject: Re: Typical hardware

[...]

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From joishi at amnh.org Tue Mar 6 08:28:34 2001
From: joishi at amnh.org (Jeffrey Oishi)
Date: Tue, 6 Mar 2001 11:28:34 -0500 (EST)
Subject: redhat 7.0 upgrade woes
Message-ID:

Hi--

I'm trying to upgrade a 130 node cluster of machines with no video cards from RH6.1 to RH7.0. I have created an nfs-method kickstart system that works--if a video card is in the machine. If not, and I add console=ttyS0,9600n8 to the SYSLINUX.CFG, then the installer runs ok and starts spitting stuff out the serial port. However, the installer then crashes right before it starts upgrading the packages. It happily works up until the standard redhat screen showing each of the packages zipping by comes up. There it hangs. This has happened on a number of boxes.

Does anyone have any idea if the install program will even work with the console on a serial port? If this doesn't work soon, I'm just going to reclone all the drives...

thanks,

j

-----
jeff oishi
Rose Center for Earth and Space
American Museum of Natural History
joishi at amnh.org

From mathboy at velocet.ca Tue Mar 6 09:01:50 2001
From: mathboy at velocet.ca (Velocet)
Date: Tue, 6 Mar 2001 12:01:50 -0500
Subject: high physical density cluster design - power/heat/rf questions
In-Reply-To: <009a01c0a60f$2d345ae0$fd02a8c0@surfmetro.com>; from szii@sziisoft.com on Mon, Mar 05, 2001 at 11:29:19PM -0800
References: <20010306011313.A99898@velocet.ca> <009a01c0a60f$2d345ae0$fd02a8c0@surfmetro.com>
Message-ID: <20010306120150.T84763@velocet.ca>

On Mon, Mar 05, 2001 at 11:29:19PM -0800, szii at sziisoft.com's all...
> We were pondering the exact same questions ourselves about a month
> ago and while our project is on hold, here's what we came up with...
>
> Mounting: Plexiglass/plastic was our choice as well. Strong, cheap,
> and can be metal-reinforced if needed.
>
> We were going to orient the boards on their sides, stacked 2 deep.
At > a height of 3" per board, you can get about 5 comfortably (6 if you try) > into a stock 19" rack. They can also slide out this way. Theoretically > you can get 10-12 boards in 5-6U (not counting powersupplies or hard drives) > and depending on board orientation. We were looking at ABIT VP6 boards. > They're cheap, they're DUAL CPU boards, and they're FC-PGA so they're thin. > 20-24 CPUs in 5-6U. *drool* If AMD ever gets around to their dual boards, > those will rock as well. The price/performance for intel is just so much lower than with Athlons. We've run a number of tests and the Athlons are at least 90% the speed of a similarly clocked Intel chip for Gaussian and MPQC calculations. I say at least as it may be more. Considering the price is usually 50-70% depending on the speed of an intel chip, theres just a huge win. Prices for me, in Toronto, are around $240 CDN for the VP6, and $300 per 800Mhz P3 (granted its 133Mhz FSB, where the Duron is 100). Thats $840 for the board and CPUs not including ram. I have no savings with ram either, I basically have to double it on a dual board. Compare this to $260 * 2 = $560 for two boards + CPUs for the M810 with Durons. The VP6 may well take up a bit less space, it being standard ATX factor where as the M810 is 2x9.5" wide. For this increased cost, I think that even the $560 vs $840 will more than makeup for the 100 vs 133Mhz bus as well. (Actually, the M810 runs up to 2x133 FSB for Tbirds, and the TBirds are about $50CDN more expensive. I am running some tests to see the performance difference between Duron and TBird with Gaussian - anyone got any stats?). The problem with dual CPU boards with gaussian is that it thrashes the cache. I dont know much behind the math (yet) of gaussian style calculations (quantum computational chemistry) but I gather there are large matrices involved. These often do not fit in L1 or L2 cache and the memory bus gets a nice hard workout. On top of that, main memory often isnt big enough depending on the calculation, and disk can be thrashed heavily as well with its scratch files. You get two CPUs that wanna hammer both disk and ram and things start slowing down. With the current state of both freeBSD and Linux resource locking, we're finding that for some jobs there are huge bottlenecks at the memory and PCI (disk) bus. Certain jobs run SLOWER on two CPUs on one board than on a single CPU (this was from my tests of scan jobs with gaussian on a BP6 with 2xC400s as well as 2xC366s O/C'd to 550Mhz). Usually performance isnt that pathalogically bad, but most jobs run at a loss of efficiency (ie its not twice as fast). Seeing as we need this cluster now and cant wait for dual Athlon boards to do some real tests, I am relatively confident that the M810 single Duron/Tbird boards sporting a D800 are going to do us quite well for the price/performance ratio. (Considering the cost of DDR boards and ram, thats not a possibility either.) > For powersupplies and HA, we were going to use "lab" power supplies > and run a diode array to keep them from fighting too much. Saw someone post that most diodes will steal too much voltage to be able to maintain a steady ~+3V supply to the finicky CPUs unless you are very careful. I do have an electrical engineering friend tho... ;) > Instead of x-smaller supplies, you can use 4-5 larger supplies and run them > into a common harness to supply power. 
You'll need 3.3v, 5v, 12v supplies, > but it beats running 24 serparate supplies (IMHO) and if one dies, you don't > lose the board, you just take a drop in supply until you replace it. > > For heat dissapation, we're in a CoLo facility. Since getting to/from the > individual video/network/mouse/keyboard/etc stuff is very rare (hopefully) > once it's up, we were going to put a pair of box-fans (wind tunnel style) > in front and behind the box. =) In a CoLo, noise is not an issue. > Depending > on exact design, you might even get away with dropping the fans off of the > individual boards and letting the windtunnel do that part, but that's got > problems if the tunnel dies and affects every processor in the box. True. Without the fans on the boards we can actually get the boards closer - which means more heat though :) Its too bad we cant get the vanes of the heatsinks 90 degrees from what they traditionally are so that the air will flow through them (since we're mounting the sides of the boards its hard to flow air in from the side). Like I said we may just take some accordion airduct hose from the ceiling and latch it onto the whole array. The Liebert is far more reliable than any box fan and is alarmed up the yingyang. Its possible it would stop, however, so we'll have poweroff protection for when things get rather warm. Better to lose 6 hours of calculations than fry the boards. I spose we're lucky in that way, that we're running mainly jobs that are no longer than 1-2 days of calculations in most cases. GIves us alot of flexibility. /kc > I'm not an EE guy, so the power-supply issue is being handled by someone > else. I'll field whatever questions I can, and pass on what I cannot. > > If you even wander down an isle and see a semi-transparent blue piece > of plexiglass with a bunch of surfboards on it, you'll know what > it is - the Surfmetro "Box O' Boards." > > Does anyone have a better way to do it? Always room for improvement... > > -Mike > > ----- Original Message ----- > From: Velocet > To: > Sent: Monday, March 05, 2001 10:13 PM > Subject: high physical density cluster design - power/heat/rf questions > > > > I have some questions about a cluster we're designing. We really need > > a relatively high density configuration here, in terms of floor space. > > > > To be able to do this I have found out pricing on some socket A boards > with > > onboard NICs and video (dont need video though). We arent doing anything > > massively parallel right now (just running Gaussian/Jaguar/MPQC > calculations) > > so we dont need major bandwidth.* We're booting with root filesystem over > > NFS on these boards. Havent decided on FreeBSD or Linux yet. (This email > > isnt about software config, but feel free to ask questions). > > > > (* even with NFS disk we're looking at using MFS on freebsd (or possibly > > the new md system) or the new nbd on linux or equivalent for gaussian's > > scratch files - oodles faster than disk, and in our case, with no > > disk, it writes across the network only when required. Various tricks > > we can do here.) > > > > The boards we're using are PC Chip M810 boards (www.pcchips.com). Linux > seems > > fine with the NIC on board (SiS chip of some kind - Ben LaHaise of redhat > is > > working with me on some of the design and has been testing it for Linux, I > > have yet to play with freebsd on it). 
> > > > The configuration we're looking at to achieve high physical density is > > something like this: > > > > NIC and Video connectors > > / > > ------------=-------------- board upside down > > | cpu | = | RAM | > > |-----| |_________| > > |hsync| > > | | --fan-- > > --fan-- | | > > _________ |hsync| > > | | |-----| > > | RAM | = | cpu | > > -------------=------------- board right side up > > > > as you can see the boards kind of mesh together to take up less space. At > > micro ATX factor (9.25" I think per side) and about 2.5 or 3" high for the > > CPU+Sync+fan (tallest) and 1" tall for the ram or less, I can stack two of > > these into 7" (4U). At 9.25" per side, 2 wide inside a cabinet gives me 4 > > boards per 4U in a standard 24" rack footprint. If I go 2 deep as well (ie > 2x2 > > config), then for every 4U I can get 16 boards in. > > > > The cost for this is amazing, some $405 CDN right now for Duron 800s with > > 128Mb of RAM each without the power supply (see below; standard ATX power > is > > $30 CDN/machine). For $30000 you can get a large ass-load of machines ;) > > > > Obviously this is pretty ambitious. I heard talk of some people doing > > something like this, with the same physical confirguration and cabinet > > construction, on the list. Wondering what your experiences have been. > > > > > > Problem 1 > > """"""""" > > The problem is in the diagram above, the upside down board has another > board > > .5" above it - are these two boards going to leak RF like mad and > interefere > > with eachothers' operations? I assume there's not much to do there but to > put > > a layer of grounded (to the cabinet) metal in between. This will drive up > the > > cabinet construction costs. I'd rather avoid this if possible. > > > > Our original construction was going to be copper pipe and plexiglass > sheeting, > > but we're not sure that this will be viable for something that could be > rather > > tall in our future revisions of our model. Then again, copper pipe can be > > bolted to our (cement) ceiling and floor for support. > > > > For a small model that Ben LaHaise built, check the pix at > > http://trooper.velocet.ca/~mathboy/giocomms/images > > > > Its quick a hack, try not to laugh. It does engender the 'do it damn > cheap' > > mentality we're operating with here. > > > > The boards are designed to slide out the front once the power and network > > are disconnected. > > > > An alternate construction we're considering is sheet metal cutting and > > folding, but at much higher cost. > > > > > > Problem 2 - Heat Dissipation > > """""""""""""""""""""""""""" > > The other problem we're going to have is heat. We're going to need to > build > > our cabinet such that its relatively sealed, except at front, so we can > get > > some coherent airflow in between boards. I am thinking we're going to need > to > > mount extra fans on the back (this is going to make the 2x2 design a bit > more > > tricky, but at only 64 odd machines we can go with 2x1 config instead, 2 > > stacks of 32, just 16U high). I dont know what you can suggest here, its > all > > going to depend on physical configuration. The machine is housed in a > proper > > environment (Datavaults.com's facilities, where I work :) thats climate > > controlled, but the inside of the cabinet will still need massive airflow, > > even with the room at 68F. > > > > > > Problem 3 - Power > > """"""""""""""""" > > The power density here is going to be high. 
I need to mount 64 power > supplies > > in close proximity to the boards, another reason I might need to maintain > > the 2x1 instead of 2x2 design. (2x1 allows easier access too). > > > > We dont really wanna pull that many power outlets into the room - I dont > know > > what a diskless Duron800 board with 256Mb or 512Mb ram will use, though I > > guess around .75 to 1 A. Im gonna need 3 or 4 full circuits in the room > (not > > too bad actually). However thats alot of weight on the cabinet to hold 60 > odd > > power supplies, not to mention the weight of the cables themselves > weighing > > down on it, and a huge mess of them to boot. > > > > I am wondering if someone has a reliable way of wiring together multiple > > boards per power supply? Whats the max density per supply? Can we > > go with redundant power supplies, like N+1? We dont need that much > > reliability (jobs are short, run on one machine and can be restarted > > elsewhere), but I am really looking for something thats going to > > reduce the cabling. > > > > As well, I am hoping there is some economy of power converted here - > > a big supply will hopefully convert power for multiple boards more > > efficiently than a single supply per board. However, as always, the > > main concern is cost. > > > > Any help or ideas are appreciated. > > > > /kc > > -- > > Ken Chase, math at velocet.ca * Velocet Communications Inc. * Toronto, > CANADA > > > > _______________________________________________ > > Beowulf mailing list, Beowulf at beowulf.org > > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Ken Chase, math at velocet.ca * Velocet Communications Inc. * Toronto, CANADA From marsden at scripps.edu Tue Mar 6 09:39:42 2001 From: marsden at scripps.edu (Brian Marsden) Date: Tue, 6 Mar 2001 09:39:42 -0800 (PST) Subject: Power-managment of slave nodes Message-ID: Dear all, I look after a 96 processor, 48 node, Pentium III linux cluster. The owners have just received their first electricity bill for the machine and unsuprisingly have had a nasty shock! They are now desperate to find ways to keep the bill as low as possible. One solution put forward has been to have nodes shut themselves down using APM when not in use. Then when a node is needed for a job, it could be switched back on via wake-on-LAN on the ethernet card. I see a number of problems associated with this: 1) APM is not supported under Linux 2.2 for SMP. However I believe that it is for 2.4 - can anyone comment on this? 2) Wake-on-LAN - I'm not 100% clear on whether this listens for a specific packet or whether it will just fire the machine up if a packet comes along with the NICs MAC address. If the later is the case I think we are snookered since we use PBS as the queueing system which I believe sends out packets to query nodes every now and then. Before I spend more time delving deeper into these problems, has anyone ever attempted to try to do all of this? If so, what are the perils and pitfalls? Is this a completely crazy idea? Thanks Brian. -- --------------------------------------------------------------------- Brian Marsden Email: marsden at scripps.edu TSRI, San Diego, USA. 
Phone: +1 858 784 8698 Fax: +1 858 784 8299 --------------------------------------------------------------------- From James.P.Lux at jpl.nasa.gov Tue Mar 6 10:36:47 2001 From: James.P.Lux at jpl.nasa.gov (Jim Lux) Date: Tue, 6 Mar 2001 10:36:47 -0800 Subject: high physical density cluster design - power/heat/rf questions Message-ID: <002e01c0a66c$66d99c30$61064f89@cerulean.jpl.nasa.gov> >> For powersupplies and HA, we were going to use "lab" power supplies >> and run a diode array to keep them from fighting too much. > >Saw someone post that most diodes will steal too much voltage to be >able to maintain a steady ~+3V supply to the finicky CPUs unless you >are very careful. I do have an electrical engineering friend tho... ;) Power supply needs to be regulated to 5%... A typical Schottky diode might have a forward voltage drop of 0.5 Volt (10%), and don't forget the loss in the wiring, as well. If you are using lab supplies, you could carefully boost the voltage to compensate, or use the sense input. (paralleling gets tricky though). Your real problem is finding a big enough supply. 100W (nominal power consumption for an ATX mobo), is 20A @ 5V. Granted, the power is actually spread across several voltages, and the supplies aren't all that efficient (say 70%), but you're probably still looking at 10A on each board. Gang up 10 boards and you're at 100A.. Gang up 24, and you're looking at 240A. I'll bet that the lab supply that puts out 240A @ 5V isn't going to be very cheap. Even if you divide it up among 4 supplies (@ 60A each) it is still a chore. > >> Instead of x-smaller supplies, you can use 4-5 larger supplies and run them >> into a common harness to supply power. You'll need 3.3v, 5v, 12v supplies, >> but it beats running 24 serparate supplies (IMHO) and if one dies, you don't >> lose the board, you just take a drop in supply until you replace it. >> >> For heat dissapation, we're in a CoLo facility. Since getting to/from the >> individual video/network/mouse/keyboard/etc stuff is very rare (hopefully) >> once it's up, we were going to put a pair of box-fans (wind tunnel style) >> in front and behind the box. =) In a CoLo, noise is not an issue. >> Depending >> on exact design, you might even get away with dropping the fans off of the >> individual boards and letting the windtunnel do that part, but that's got >> problems if the tunnel dies and affects every processor in the box. Better check your airflow requirements.. If a typical PC needs 30-40 CFM to keep the temperatures reasonable (21 CFM for a micro ATX), a gang of 24 will need 600-1000 CFM. This is substantially more than the typical "box fan" can push at any reasonable pressure drop. A typical 1/4HP, 18"diameter fan has a free air flow of about 2500CFM, dropping down to 1000 cfm at 0.3" water column pressure drop. Your idea of hooking right to the chiller might be a good one. Typical HVAC systems are designed to push against substantial pressure drop, and, of course, the air will be coldest at that point. > > >Like I said we may just take some accordion airduct hose from the >ceiling and latch it onto the whole array. The Liebert is far more >reliable than any box fan and is alarmed up the yingyang. Its possible >> > Problem 1 >> > """"""""" >> > The problem is in the diagram above, the upside down board has another >> board >> > .5" above it - are these two boards going to leak RF like mad and >> interefere >> > with eachothers' operations? 
I assume there's not much to do there but to >> put >> > a layer of grounded (to the cabinet) metal in between. This will drive up >> the >> > cabinet construction costs. I'd rather avoid this if possible. >> > I don't think that "interboard" EMI would be a big problem. >> > Problem 3 - Power >> > """"""""""""""""" >> > The power density here is going to be high. I need to mount 64 power >> supplies >> > in close proximity to the boards, another reason I might need to maintain >> > the 2x1 instead of 2x2 design. (2x1 allows easier access too). >> > >> > >> > >> > I am wondering if someone has a reliable way of wiring together multiple >> > boards per power supply? Whats the max density per supply? Can we >> > go with redundant power supplies, like N+1? We dont need that much >> > reliability (jobs are short, run on one machine and can be restarted >> > elsewhere), but I am really looking for something thats going to >> > reduce the cabling. Standard PC power supplies may not parallel well, even with diodes (it is something you would definitely want to test). It will almost certainly be model specific (i.e. what works with Brand X PS may not work with Brand Y). However, running two boards off a 300W PS might be reasonably feasible. From James.P.Lux at jpl.nasa.gov Tue Mar 6 11:23:37 2001 From: James.P.Lux at jpl.nasa.gov (Jim Lux) Date: Tue, 6 Mar 2001 11:23:37 -0800 Subject: high physical density cluster design -structural... Message-ID: <008801c0a672$f1570b30$61064f89@cerulean.jpl.nasa.gov> Rather than the copper pipe and fittings (which isn't very structural, and will be a pretty significant problem as it gets bigger), you might want to look at some alternatives: 1) UniStrut (available in aluminum and in galv steel) is much stronger, has nice 90 degree connectors, etc. There are a variety of similar products made from aluminum extrusions of one kind or another with longitudinal slots that make very nice rigid boxes. You assemble it with captive nuts and bolts. The best thing about these products is that they are rectangular, not round, which makes attaching stuff much easier. 2) Speedrail - a brand of cast aluminum fittings that works with aluminum tubing to make structures, etc. (and hand and safety railings...) There are other brands, as well. There are versions for 2" and 1" tubing, at least. The tubing fits into the socket on the fitting, and you tighten set screws to hold it together. (Or you can epoxy it....). For a given $$, the aluminum tubing will be much stronger and more rigid than the copper tubing. As far as design guidelines go, a 0.6 g side load, or so, would be an appropriate number. For instance, you should build it strong enough so that you can (gently) tip it over on it's side and not have it fall apart during the move. In even a small earthquake, poorly braced sheet metal racks loaded with many pounds of equipment just crumple. Especially on less expensive racking, a lot of the strength depends on the sides not buckling, and once it bends even a little bit, it just caves in. After all, some day, you WILL have to move the rack a bit, even if only a few feet to let them take up the tile underneath it. >> > >> > Problem 1 >> > """"""""" >> > The problem is in the diagram above, the upside down board has another >> board >> > .5" above it - are these two boards going to leak RF like mad and >> interefere >> > with eachothers' operations? I assume there's not much to do there but to >> put >> > a layer of grounded (to the cabinet) metal in between. 
This will drive up >> the >> > cabinet construction costs. I'd rather avoid this if possible. >> > >> > Our original construction was going to be copper pipe and plexiglass >> sheeting, >> > but we're not sure that this will be viable for something that could be >> rather >> > tall in our future revisions of our model. Then again, copper pipe can be >> > bolted to our (cement) ceiling and floor for support. >> > From lindahl at conservativecomputer.com Tue Mar 6 11:11:50 2001 From: lindahl at conservativecomputer.com (Greg Lindahl) Date: Tue, 6 Mar 2001 14:11:50 -0500 Subject: Power-managment of slave nodes In-Reply-To: ; from marsden@scripps.edu on Tue, Mar 06, 2001 at 09:39:42AM -0800 References: Message-ID: <20010306141150.C5629@wumpus> On Tue, Mar 06, 2001 at 09:39:42AM -0800, Brian Marsden wrote: > Before I spend more time delving deeper into these problems, has anyone > ever attempted to try to do all of this? If so, what are the perils and > pitfalls? Is this a completely crazy idea? I haven't done it, but I know that one pitfall is that PBS doesn't like down nodes. You need to have a daemon mark them "offline" when they're going to be off. Another way to power a node down is using an external power switch, some of which talk to ethernet. That's more expensive than wake-on-lan these days. -- g From lowther at att.net Tue Mar 6 11:25:25 2001 From: lowther at att.net (Ken) Date: Tue, 06 Mar 2001 14:25:25 -0500 Subject: Power-managment of slave nodes References: Message-ID: <3AA539A5.D698E3F6@att.net> Brian Marsden wrote: > > 1) APM is not supported under Linux 2.2 for SMP. However I believe that it > is for 2.4 - can anyone comment on this? Check the help when configuring. "Note that the APM support is almost completely disabled for machines with more than one CPU." -- Ken Lowther Youngstown, Ohio http://www.atmsite.org From parkw at better.net Tue Mar 6 11:24:56 2001 From: parkw at better.net (William Park) Date: Tue, 6 Mar 2001 14:24:56 -0500 Subject: high physical density cluster design - power/heat/rf questions In-Reply-To: <20010306102859.R84763@velocet.ca>; from mathboy@velocet.ca on Tue, Mar 06, 2001 at 10:28:59AM -0500 References: <20010306011313.A99898@velocet.ca> <20010306023010.A27427@better.net> <20010306102859.R84763@velocet.ca> Message-ID: <20010306142456.A27961@better.net> On Tue, Mar 06, 2001 at 10:28:59AM -0500, Velocet wrote: > On Tue, Mar 06, 2001 at 02:30:10AM -0500, William Park's all... > > Wouldn't regular computer cases be cheaper? You wouldn't save that much > > space anyways. > Welding will take more time and be more expensive. If we use the copper > pipe we're looking to just solder the t joints. Not sure yet, we may > use sheet metal. Oh... wood. You can screw the motherboards onto the plywood or panel; then, these can be shelved like in bookcases or nailed to long beam or panel. > > I'm putting together a Linux cluster using ABit VP6 dual-cpu > > motherboard. But, 64-cpu is outside my price. ;-) > > For the density and price, 64 cpus is within range with only $30000 CDN > that we have to play with. My mainboard is $130. The VP6 is about > $190 or $200 CDN IIRC, then you gotta add a NIC ($20 minimum). Already > a cost savings there (one limitation of the M810 we're using however > is its only got 2 DIMM slots and 512Mb ram is expensive. So we're > maxing at 512Mb of ram per node for this config.) All my nodes are fully loaded machines (ie. floppy, disk, ethernet, video). 
I chose SMP because - adding another CPU is cheaper than adding another node - my electrical wiring is standard 15A circuit - I don't have dedicated Air Conditioner. Your electrical and a/c problem is big. Assuming 100W per motherboard, you need 7kW just to power the boards. Can your circuit handle that? Is PC-Chip a good motherboard? I don't have personal experience, but I've heard it's crap. I am assuming that you're in graduate level, so your time is free. But, if you factor in time, effort, and lost sleep, then perhaps a quality brand might be better choice. ---William Park, Open Geometry Consulting, Linux/Python, 8 CPUs. From mathboy at velocet.ca Tue Mar 6 11:49:04 2001 From: mathboy at velocet.ca (Velocet) Date: Tue, 6 Mar 2001 14:49:04 -0500 Subject: high physical density cluster design -structural... In-Reply-To: <008801c0a672$f1570b30$61064f89@cerulean.jpl.nasa.gov>; from James.P.Lux@jpl.nasa.gov on Tue, Mar 06, 2001 at 11:23:37AM -0800 References: <008801c0a672$f1570b30$61064f89@cerulean.jpl.nasa.gov> Message-ID: <20010306144904.F1196@velocet.ca> On Tue, Mar 06, 2001 at 11:23:37AM -0800, Jim Lux's all... > Rather than the copper pipe and fittings (which isn't very structural, and > will be a pretty significant problem as it gets bigger), you might want to > look at some alternatives: Ya, were convening with a few people who've done some work with metal as well as piping this week to go over a few other cheap options. The cluster will be 48 to 64 nodes depending on pricing of other materials, network switches, etc. 48 nodes will be 2 stacks of 24, and at 1U per, thats only 3.5' tall. So we dont need something thats bombproof, just sturdy. > 1) UniStrut (available in aluminum and in galv steel) is much stronger, has > nice 90 degree connectors, etc. There are a variety of similar products > made from aluminum extrusions of one kind or another with longitudinal slots > that make very nice rigid boxes. You assemble it with captive nuts and > bolts. The best thing about these products is that they are rectangular, not > round, which makes attaching stuff much easier. Hmm, this stuff looks really great - and they seem to be somewhat local to me. :) Looks like it might not be that cheap however, even if it is 'cheap' for industrial applications. Wonder if I can find prices online somewhere here... > 2) Speedrail - a brand of cast aluminum fittings that works with aluminum > tubing to make structures, etc. (and hand and safety railings...) There > are other brands, as well. There are versions for 2" and 1" tubing, at > least. The tubing fits into the socket on the fitting, and you tighten set > screws to hold it together. (Or you can epoxy it....). For a given $$, the > aluminum tubing will be much stronger and more rigid than the copper tubing. > > > As far as design guidelines go, a 0.6 g side load, or so, would be an > appropriate number. For instance, you should build it strong enough so that > you can (gently) tip it over on it's side and not have it fall apart during > the move. In even a small earthquake, poorly braced sheet metal racks > loaded with many pounds of equipment just crumple. Especially on less > expensive racking, a lot of the strength depends on the sides not buckling, > and once it bends even a little bit, it just caves in. > > After all, some day, you WILL have to move the rack a bit, even if only a > few feet to let them take up the tile underneath it. True. I dont have a scale, but the board with CPU and ram is about 1.5 or 2lbs, and the power supply is 2-3lbs. 
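To put rough numbers on the power, cooling, and weight estimates being traded in this thread, here is a quick back-of-envelope sketch in Python. Every input is an assumption lifted from figures quoted above (roughly 100 W per diskless board, a ~70% efficient ATX supply, ~30 CFM per box, a few pounds per node), not a measurement, so treat the output as a sanity check only.

#!/usr/bin/env python
# Back-of-envelope power/cooling/weight estimate for a diskless cluster.
# All inputs are assumptions taken from numbers quoted in this thread;
# measure a real board with a clamp meter before sizing circuits.

NODES           = 64      # boards in the rack
WATTS_PER_BOARD = 100.0   # DC load per diskless board (assumed)
PSU_EFFICIENCY  = 0.70    # typical cheap ATX supply (assumed)
LINE_VOLTAGE    = 120.0   # volts at the wall
CIRCUIT_AMPS    = 15.0    # breaker rating
DERATE          = 0.80    # load each circuit to only 80% continuously
LBS_PER_NODE    = 4.5     # board + RAM (~2 lb) plus power supply (~2.5 lb)

wall_watts   = NODES * WATTS_PER_BOARD / PSU_EFFICIENCY
wall_amps    = wall_watts / LINE_VOLTAGE
circuits     = wall_amps / (CIRCUIT_AMPS * DERATE)
btu_per_hour = wall_watts * 3.413         # every watt ends up as heat
tons_cooling = btu_per_hour / 12000.0     # 1 ton of A/C = 12000 BTU/hr
cfm          = NODES * 30                 # ~30 CFM per board
weight_lbs   = NODES * LBS_PER_NODE

print("%d nodes:" % NODES)
print("  %.0f W at the wall, %.1f A at %.0f V" % (wall_watts, wall_amps, LINE_VOLTAGE))
print("  %.1f x %.0f A circuits, loading each to %.0f%%" % (circuits, CIRCUIT_AMPS, DERATE * 100))
print("  %.0f BTU/hr, about %.1f tons of cooling" % (btu_per_hour, tons_cooling))
print("  roughly %d CFM of airflow and %d lb of hardware" % (cfm, weight_lbs))

For 64 nodes these assumptions work out to roughly 9 kW at the wall, six or seven derated 15 A circuits, a bit under three tons of cooling, and close to 300 lb of hardware before cables and framing are counted.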
That adds up with 48 or 64 odd boards. (Need to figure out if I am going to double up the mainboards per powersupply, would save alot of weight). Thanks for the pointers! /kc > > >> > > >> > Problem 1 > >> > """"""""" > >> > The problem is in the diagram above, the upside down board has another > >> board > >> > .5" above it - are these two boards going to leak RF like mad and > >> interefere > >> > with eachothers' operations? I assume there's not much to do there but > to > >> put > >> > a layer of grounded (to the cabinet) metal in between. This will drive > up > >> the > >> > cabinet construction costs. I'd rather avoid this if possible. > >> > > >> > Our original construction was going to be copper pipe and plexiglass > >> sheeting, > >> > but we're not sure that this will be viable for something that could be > >> rather > >> > tall in our future revisions of our model. Then again, copper pipe can > be > >> > bolted to our (cement) ceiling and floor for support. > >> > > > -- Ken Chase, math at velocet.ca * Velocet Communications Inc. * Toronto, CANADA From ksheumaker at advancedclustering.com Tue Mar 6 12:09:53 2001 From: ksheumaker at advancedclustering.com (Kyle Sheumaker) Date: Tue, 6 Mar 2001 14:09:53 -0600 Subject: Power-managment of slave nodes References: Message-ID: <001901c0a679$688871c0$3201a8c0@act> I have played with wake-on-lan and not had a lot of luck. It is a special packet (http://www.scyld.com/expert/wake-on-lan.html), but some motherboards / nics just don't seem to want to "wake." They aren't cheap but I would suggest something like the APC masterswitch, it's a network controllable power switch (http://www.apc.com/products/masterswitch/index.cfm). I've used them before their pretty cool, web, telnet, and SNMP controllable. -- Kyle ----- Original Message ----- From: "Brian Marsden" To: Sent: Tuesday, March 06, 2001 11:39 AM Subject: Power-managment of slave nodes > Dear all, > > I look after a 96 processor, 48 node, Pentium III linux cluster. The > owners have just received their first electricity bill for the machine and > unsuprisingly have had a nasty shock! They are now desperate to find ways > to keep the bill as low as possible. > > One solution put forward has been to have nodes shut themselves down > using APM when not in use. Then when a node is needed for a job, it could > be switched back on via wake-on-LAN on the ethernet card. I see a number of > problems associated with this: > > 1) APM is not supported under Linux 2.2 for SMP. However I believe that it > is for 2.4 - can anyone comment on this? > 2) Wake-on-LAN - I'm not 100% clear on whether this listens for a specific > packet or whether it will just fire the machine up if a packet comes > along with the NICs MAC address. If the later is the case I think we > are snookered since we use PBS as the queueing system which I believe > sends out packets to query nodes every now and then. > > Before I spend more time delving deeper into these problems, has anyone > ever attempted to try to do all of this? If so, what are the perils and > pitfalls? Is this a completely crazy idea? > > Thanks > > Brian. > > -- > --------------------------------------------------------------------- > Brian Marsden Email: marsden at scripps.edu > TSRI, San Diego, USA. 
Phone: +1 858 784 8698 Fax: +1 858 784 8299 > --------------------------------------------------------------------- > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > From James.P.Lux at jpl.nasa.gov Tue Mar 6 12:19:07 2001 From: James.P.Lux at jpl.nasa.gov (Jim Lux) Date: Tue, 6 Mar 2001 12:19:07 -0800 Subject: high physical density cluster design -structural... Message-ID: <00a001c0a67a$b2744880$61064f89@cerulean.jpl.nasa.gov> > >> 1) UniStrut (available in aluminum and in galv steel) is much stronger, has >> nice 90 degree connectors, etc. There are a variety of similar products >> made from aluminum extrusions of one kind or another with longitudinal slots >> that make very nice rigid boxes. You assemble it with captive nuts and >> bolts. The best thing about these products is that they are rectangular, not >> round, which makes attaching stuff much easier. > >Hmm, this stuff looks really great - and they seem to be somewhat local >to me. :) Looks like it might not be that cheap however, even if it is >'cheap' for industrial applications. Wonder if I can find prices online >somewhere here... Home Depot carries it (sometimes). As for pricing, galvanized or painted steel is VERY cheap compared to anything copper. Any decent electrical contractor supply place will carry it, for sure in galv steel, often in aluminum. You CAN also get it scrap if you are into scrounging at large construction sites. Find the lead electrician... > > >True. I dont have a scale, but the board with CPU and ram is about 1.5 or >2lbs, and the power supply is 2-3lbs. That adds up with 48 or 64 odd >boards. (Need to figure out if I am going to double up the mainboards >per powersupply, would save alot of weight). Don't forget the weight of all the power cords, and the mounting hardware for everything. As you say, when you've got 50 widgets, a few ounces here or there really add up. From davidgrant at mediaone.net Tue Mar 6 12:12:42 2001 From: davidgrant at mediaone.net (David Grant) Date: Tue, 6 Mar 2001 15:12:42 -0500 Subject: high physical density cluster design -structural... References: <008801c0a672$f1570b30$61064f89@cerulean.jpl.nasa.gov> <20010306144904.F1196@velocet.ca> Message-ID: <011701c0a679$d5c53a20$954f1e42@ne.mediaone.net> This has been an interesting thread, but I do have a concern about appropriate cooling with "homegrown" 1U chassis. Yes, you can build a box the will physically support the hardware in a 1U form factor. My concern would be long term, and not so long term heat related failures on CPU's and/or disk drives..... just my .02 David A. Grant, V.P. Cluster Technologies GSH Intelligent Integrated Systems 95 Fairmount St. Fitchburg Ma 01450 Phone 603.898.9717 Fax 603.898.9719 Email: davidg at gshiis.com Web: www.gshiis.com "Providing High Performance Computing Solutions for Over a Decade" ----- Original Message ----- From: "Velocet" To: "Jim Lux" ; ; Cc: Sent: Tuesday, March 06, 2001 2:49 PM Subject: Re: high physical density cluster design -structural... > On Tue, Mar 06, 2001 at 11:23:37AM -0800, Jim Lux's all... 
> > Rather than the copper pipe and fittings (which isn't very structural, and > > will be a pretty significant problem as it gets bigger), you might want to > > look at some alternatives: > > Ya, were convening with a few people who've done some work with metal > as well as piping this week to go over a few other cheap options. The > cluster will be 48 to 64 nodes depending on pricing of other materials, > network switches, etc. 48 nodes will be 2 stacks of 24, and at 1U per, > thats only 3.5' tall. So we dont need something thats bombproof, > just sturdy. > > > 1) UniStrut (available in aluminum and in galv steel) is much stronger, has > > nice 90 degree connectors, etc. There are a variety of similar products > > made from aluminum extrusions of one kind or another with longitudinal slots > > that make very nice rigid boxes. You assemble it with captive nuts and > > bolts. The best thing about these products is that they are rectangular, not > > round, which makes attaching stuff much easier. > > Hmm, this stuff looks really great - and they seem to be somewhat local > to me. :) Looks like it might not be that cheap however, even if it is > 'cheap' for industrial applications. Wonder if I can find prices online > somewhere here... > > > 2) Speedrail - a brand of cast aluminum fittings that works with aluminum > > tubing to make structures, etc. (and hand and safety railings...) There > > are other brands, as well. There are versions for 2" and 1" tubing, at > > least. The tubing fits into the socket on the fitting, and you tighten set > > screws to hold it together. (Or you can epoxy it....). For a given $$, the > > aluminum tubing will be much stronger and more rigid than the copper tubing. > > > > > > As far as design guidelines go, a 0.6 g side load, or so, would be an > > appropriate number. For instance, you should build it strong enough so that > > you can (gently) tip it over on it's side and not have it fall apart during > > the move. In even a small earthquake, poorly braced sheet metal racks > > loaded with many pounds of equipment just crumple. Especially on less > > expensive racking, a lot of the strength depends on the sides not buckling, > > and once it bends even a little bit, it just caves in. > > > > After all, some day, you WILL have to move the rack a bit, even if only a > > few feet to let them take up the tile underneath it. > > True. I dont have a scale, but the board with CPU and ram is about 1.5 or > 2lbs, and the power supply is 2-3lbs. That adds up with 48 or 64 odd > boards. (Need to figure out if I am going to double up the mainboards > per powersupply, would save alot of weight). > > Thanks for the pointers! > > /kc > > > > > >> > > > >> > Problem 1 > > >> > """"""""" > > >> > The problem is in the diagram above, the upside down board has another > > >> board > > >> > .5" above it - are these two boards going to leak RF like mad and > > >> interefere > > >> > with eachothers' operations? I assume there's not much to do there but > > to > > >> put > > >> > a layer of grounded (to the cabinet) metal in between. This will drive > > up > > >> the > > >> > cabinet construction costs. I'd rather avoid this if possible. > > >> > > > >> > Our original construction was going to be copper pipe and plexiglass > > >> sheeting, > > >> > but we're not sure that this will be viable for something that could be > > >> rather > > >> > tall in our future revisions of our model. Then again, copper pipe can > > be > > >> > bolted to our (cement) ceiling and floor for support. 
> > >> > > > > > > > -- > Ken Chase, math at velocet.ca * Velocet Communications Inc. * Toronto, CANADA > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From James.P.Lux at jpl.nasa.gov Tue Mar 6 12:39:15 2001 From: James.P.Lux at jpl.nasa.gov (Jim Lux) Date: Tue, 6 Mar 2001 12:39:15 -0800 Subject: high physical density cluster design -structural... Message-ID: <00b501c0a67d$82c909b0$61064f89@cerulean.jpl.nasa.gov> I fully agree with David's comments about thermal issues probably being the big problem here. The structural issue really isn't a big problem. Almost any material can work for actually holding the boards (wood, plastic, copper, steel, fiberglass, titanium, depleted uranium, etc.). Likewise, you're probably going to be stuck with N/2 power supplies, at best. Getting the heat out, reliably, is going to be the BIG problem with any homebrew tight packaging. Commercial manufacturers do actually spend a fair amount of money doing testing and going through lots of revisions (or they copy a proven design from someone else...) to keep temperatures reasonable with very limited air flow (lower flow fans are quieter and cheaper, so there is an economic incentive for low flow). I have had very bad luck with operating consumer gear in even mildly elevated room temperatures (say, 35-40C). For a grin, check out the specified operating temperatures on a laptop (typ, 30C max), and then compare that to the temperature in, for instance, a car sitting in the sun (40-50C): clearly, they only expect laptops to be used on a desk in an office). Consumer gear has very small design margin. They realize that most people will be operating it in a reasonably air conditioned office or house, and that if it gets too hot to sit in the room, they'll turn the PC off. And, if they get a few failures.. oh well, you're used to rebooting from BSOD anyway. Remember, this is an industry that is positively PROUD of getting their DOA rates down below 3-5%. Hmm.. 64 computers, 5% DOA means 3 dead computers, out of the box.... I have been working out designs for a compact transportable cluster for field use, and getting the heat out is the single problem causing the most trouble. in this context, I am only looking at 6-8 mobos in a package. Vibe, shock, and power all have fairly straightforward solutions. Heat does not, especially if the air temp you are working in is 40C and 85% RH -----Original Message----- From: David Grant To: Velocet ; Jim Lux ; rutile at fixy.org ; bcrl at kvack.org Cc: beowulf at beowulf.org Date: Tuesday, March 06, 2001 12:18 PM Subject: Re: high physical density cluster design -structural... >This has been an interesting thread, but I do have a concern about >appropriate cooling with "homegrown" 1U chassis. Yes, you can build a box >the will physically support the hardware in a 1U form factor. My concern >would be long term, and not so long term heat related failures on CPU's >and/or disk drives..... > >just my .02 > > From mathboy at velocet.ca Tue Mar 6 12:33:02 2001 From: mathboy at velocet.ca (Velocet) Date: Tue, 6 Mar 2001 15:33:02 -0500 Subject: high physical density cluster design -structural... 
In-Reply-To: <011701c0a679$d5c53a20$954f1e42@ne.mediaone.net>; from davidgrant@mediaone.net on Tue, Mar 06, 2001 at 03:12:42PM -0500 References: <008801c0a672$f1570b30$61064f89@cerulean.jpl.nasa.gov> <20010306144904.F1196@velocet.ca> <011701c0a679$d5c53a20$954f1e42@ne.mediaone.net> Message-ID: <20010306153302.K1196@velocet.ca> On Tue, Mar 06, 2001 at 03:12:42PM -0500, David Grant's all... > This has been an interesting thread, but I do have a concern about > appropriate cooling with "homegrown" 1U chassis. Yes, you can build a box > the will physically support the hardware in a 1U form factor. My concern > would be long term, and not so long term heat related failures on CPU's > and/or disk drives..... > > just my .02 agreed, totally. We have no hardrives, which are far more stressed by hot environments considering the moving parts and what not. The cpu I am not so worried about, since electron migration path burning takes a long time to occur at high temperatures. Nonetheless, we are going to keep things very cool. I intend to remain several (10? 15?) degrees C lower than the max operational temp (where we start seeing machines crash). With Durons I've seen this around 45C. We have full access to a VERY large airflow direct from the Liebert (as I said, an 8" pipe with a 20 or 30 mph (my guess) cold airflow coming out of it - which comes to something like 2500 to 3500 CFM of air at 67F) so I think with careful construction we'll be able to dissipate this amount of heat. The liebert system was designed to provide 20 tons of cooling over 4000 sq feet, but the density of machines installed is lower than we planned for (customers put in fewer boxes than we expected, its up to them). We can therefore divert more airflow to this room if we really need to (a few of the rooms are still completely empty as well). If anyone thinks that even with these considerations this is foolish, let me know. :) I mean, if we have problems we can always just seperate the boards by larger amounts and move the stacks apart. I am sure 48 or 64 boards in the one room will be fine - other customers have this density are fine. Its not even a full degree C warmer in that customers room than the others (after some adjustment during their installation). /kc > > > David A. Grant, V.P. Cluster Technologies > GSH Intelligent Integrated Systems > 95 Fairmount St. Fitchburg Ma 01450 > Phone 603.898.9717 Fax 603.898.9719 > Email: davidg at gshiis.com Web: www.gshiis.com > "Providing High Performance Computing Solutions for Over a Decade" > ----- Original Message ----- > From: "Velocet" > To: "Jim Lux" ; ; > > Cc: > Sent: Tuesday, March 06, 2001 2:49 PM > Subject: Re: high physical density cluster design -structural... > > > > On Tue, Mar 06, 2001 at 11:23:37AM -0800, Jim Lux's all... > > > Rather than the copper pipe and fittings (which isn't very structural, > and > > > will be a pretty significant problem as it gets bigger), you might want > to > > > look at some alternatives: > > > > Ya, were convening with a few people who've done some work with metal > > as well as piping this week to go over a few other cheap options. The > > cluster will be 48 to 64 nodes depending on pricing of other materials, > > network switches, etc. 48 nodes will be 2 stacks of 24, and at 1U per, > > thats only 3.5' tall. So we dont need something thats bombproof, > > just sturdy. > > > > > 1) UniStrut (available in aluminum and in galv steel) is much stronger, > has > > > nice 90 degree connectors, etc. 
There are a variety of similar products > > > made from aluminum extrusions of one kind or another with longitudinal > slots > > > that make very nice rigid boxes. You assemble it with captive nuts and > > > bolts. The best thing about these products is that they are rectangular, > not > > > round, which makes attaching stuff much easier. > > > > Hmm, this stuff looks really great - and they seem to be somewhat local > > to me. :) Looks like it might not be that cheap however, even if it is > > 'cheap' for industrial applications. Wonder if I can find prices online > > somewhere here... > > > > > 2) Speedrail - a brand of cast aluminum fittings that works with > aluminum > > > tubing to make structures, etc. (and hand and safety railings...) > There > > > are other brands, as well. There are versions for 2" and 1" tubing, at > > > least. The tubing fits into the socket on the fitting, and you tighten > set > > > screws to hold it together. (Or you can epoxy it....). For a given $$, > the > > > aluminum tubing will be much stronger and more rigid than the copper > tubing. > > > > > > > > > As far as design guidelines go, a 0.6 g side load, or so, would be an > > > appropriate number. For instance, you should build it strong enough so > that > > > you can (gently) tip it over on it's side and not have it fall apart > during > > > the move. In even a small earthquake, poorly braced sheet metal racks > > > loaded with many pounds of equipment just crumple. Especially on less > > > expensive racking, a lot of the strength depends on the sides not > buckling, > > > and once it bends even a little bit, it just caves in. > > > > > > After all, some day, you WILL have to move the rack a bit, even if only > a > > > few feet to let them take up the tile underneath it. > > > > True. I dont have a scale, but the board with CPU and ram is about 1.5 or > > 2lbs, and the power supply is 2-3lbs. That adds up with 48 or 64 odd > > boards. (Need to figure out if I am going to double up the mainboards > > per powersupply, would save alot of weight). > > > > Thanks for the pointers! > > > > /kc > > > > > > > > >> > > > > >> > Problem 1 > > > >> > """"""""" > > > >> > The problem is in the diagram above, the upside down board has > another > > > >> board > > > >> > .5" above it - are these two boards going to leak RF like mad and > > > >> interefere > > > >> > with eachothers' operations? I assume there's not much to do there > but > > > to > > > >> put > > > >> > a layer of grounded (to the cabinet) metal in between. This will > drive > > > up > > > >> the > > > >> > cabinet construction costs. I'd rather avoid this if possible. > > > >> > > > > >> > Our original construction was going to be copper pipe and > plexiglass > > > >> sheeting, > > > >> > but we're not sure that this will be viable for something that > could be > > > >> rather > > > >> > tall in our future revisions of our model. Then again, copper pipe > can > > > be > > > >> > bolted to our (cement) ceiling and floor for support. > > > >> > > > > > > > > > > > -- > > Ken Chase, math at velocet.ca * Velocet Communications Inc. * Toronto, > CANADA > > > > _______________________________________________ > > Beowulf mailing list, Beowulf at beowulf.org > > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > -- Ken Chase, math at velocet.ca * Velocet Communications Inc. 
* Toronto, CANADA From Lechner at drs-esg.com Tue Mar 6 12:33:50 2001 From: Lechner at drs-esg.com (Lechner, David) Date: Tue, 6 Mar 2001 15:33:50 -0500 Subject: high physical density cluster design -structural... Message-ID: I'll throw in that there are some valid liability concerns and Underwriters Lab issues and insurance policy issues that people building their own hardware should think about - There are greater risks of electrical fires when dealing with plexiglass and plywood casing for electrical products - esp. ones that get hot spots and PC processors and such - I would hate to have to explain to a corporate or university financial manager after-the-fact when 50 PCs in plywood or plastic cases were mounted in a high density home-grown arrangement caught on fire or worse-yet that fire injured someone or electricuted someone. That bunch of UL and NEBS tests are what the insurance industry falls back onto, and is why the products are slightly more expensive than spin-your-own boxes. I think that university teams in particular are given some flexibility (in some cases not!) but since the market for cluster products has evolved now it seems worth considering. This also does not count the personal or project time factors involved - are you in the hardware business or do you want to get code running - there are qualified hardware vendors that can provide hardware to specification or in vanilla flavor at reasonably quick turnarounds - Best of luck in the project though - Rave Lechner -----Original Message----- From: David Grant [mailto:davidgrant at mediaone.net] Sent: Tuesday, March 06, 2001 3:13 PM To: Velocet; Jim Lux; rutile at fixy.org; bcrl at kvack.org Cc: beowulf at beowulf.org Subject: Re: high physical density cluster design -structural... This has been an interesting thread, but I do have a concern about appropriate cooling with "homegrown" 1U chassis. Yes, you can build a box the will physically support the hardware in a 1U form factor. My concern would be long term, and not so long term heat related failures on CPU's and/or disk drives..... just my .02 David A. Grant, V.P. Cluster Technologies GSH Intelligent Integrated Systems 95 Fairmount St. Fitchburg Ma 01450 Phone 603.898.9717 Fax 603.898.9719 Email: davidg at gshiis.com Web: www.gshiis.com "Providing High Performance Computing Solutions for Over a Decade" ----- Original Message ----- From: "Velocet" To: "Jim Lux" ; ; Cc: Sent: Tuesday, March 06, 2001 2:49 PM Subject: Re: high physical density cluster design -structural... > On Tue, Mar 06, 2001 at 11:23:37AM -0800, Jim Lux's all... > > Rather than the copper pipe and fittings (which isn't very structural, and > > will be a pretty significant problem as it gets bigger), you might want to > > look at some alternatives: > > Ya, were convening with a few people who've done some work with metal > as well as piping this week to go over a few other cheap options. The > cluster will be 48 to 64 nodes depending on pricing of other materials, > network switches, etc. 48 nodes will be 2 stacks of 24, and at 1U per, > thats only 3.5' tall. So we dont need something thats bombproof, > just sturdy. > > > 1) UniStrut (available in aluminum and in galv steel) is much stronger, has > > nice 90 degree connectors, etc. There are a variety of similar products > > made from aluminum extrusions of one kind or another with longitudinal slots > > that make very nice rigid boxes. You assemble it with captive nuts and > > bolts. 
The best thing about these products is that they are rectangular, not > > round, which makes attaching stuff much easier. > > Hmm, this stuff looks really great - and they seem to be somewhat local > to me. :) Looks like it might not be that cheap however, even if it is > 'cheap' for industrial applications. Wonder if I can find prices online > somewhere here... > > > 2) Speedrail - a brand of cast aluminum fittings that works with aluminum > > tubing to make structures, etc. (and hand and safety railings...) There > > are other brands, as well. There are versions for 2" and 1" tubing, at > > least. The tubing fits into the socket on the fitting, and you tighten set > > screws to hold it together. (Or you can epoxy it....). For a given $$, the > > aluminum tubing will be much stronger and more rigid than the copper tubing. > > > > > > As far as design guidelines go, a 0.6 g side load, or so, would be an > > appropriate number. For instance, you should build it strong enough so that > > you can (gently) tip it over on it's side and not have it fall apart during > > the move. In even a small earthquake, poorly braced sheet metal racks > > loaded with many pounds of equipment just crumple. Especially on less > > expensive racking, a lot of the strength depends on the sides not buckling, > > and once it bends even a little bit, it just caves in. > > > > After all, some day, you WILL have to move the rack a bit, even if only a > > few feet to let them take up the tile underneath it. > > True. I dont have a scale, but the board with CPU and ram is about 1.5 or > 2lbs, and the power supply is 2-3lbs. That adds up with 48 or 64 odd > boards. (Need to figure out if I am going to double up the mainboards > per powersupply, would save alot of weight). > > Thanks for the pointers! > > /kc > > > > > >> > > > >> > Problem 1 > > >> > """"""""" > > >> > The problem is in the diagram above, the upside down board has another > > >> board > > >> > .5" above it - are these two boards going to leak RF like mad and > > >> interefere > > >> > with eachothers' operations? I assume there's not much to do there but > > to > > >> put > > >> > a layer of grounded (to the cabinet) metal in between. This will drive > > up > > >> the > > >> > cabinet construction costs. I'd rather avoid this if possible. > > >> > > > >> > Our original construction was going to be copper pipe and plexiglass > > >> sheeting, > > >> > but we're not sure that this will be viable for something that could be > > >> rather > > >> > tall in our future revisions of our model. Then again, copper pipe can > > be > > >> > bolted to our (cement) ceiling and floor for support. > > >> > > > > > > > -- > Ken Chase, math at velocet.ca * Velocet Communications Inc. * Toronto, CANADA > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mathboy at velocet.ca Tue Mar 6 12:45:30 2001 From: mathboy at velocet.ca (Velocet) Date: Tue, 6 Mar 2001 15:45:30 -0500 Subject: high physical density cluster design -structural... 
In-Reply-To: ; from Lechner@drs-esg.com on Tue, Mar 06, 2001 at 03:33:50PM -0500 References: Message-ID: <20010306154530.M1196@velocet.ca> On Tue, Mar 06, 2001 at 03:33:50PM -0500, Lechner, David's all... > I'll throw in that there are some valid liability concerns and Underwriters > Lab issues and insurance policy issues that people building their own > hardware should think about - > There are greater risks of electrical fires when dealing with plexiglass and > plywood casing for electrical products - esp. ones that get hot spots and PC > processors and such - > I would hate to have to explain to a corporate or university financial > manager after-the-fact when 50 PCs in plywood or plastic cases were mounted > in a high density home-grown arrangement caught on fire or worse-yet that > fire injured someone or electricuted someone. > That bunch of UL and NEBS tests are what the insurance industry falls back > onto, and is why the products are slightly more expensive than spin-your-own > boxes. I think that university teams in particular are given some > flexibility (in some cases not!) but since the market for cluster products > has evolved now it seems worth considering. > This also does not count the personal or project time factors involved - are > you in the hardware business or do you want to get code running - there are > qualified hardware vendors that can provide hardware to specification or in > vanilla flavor at reasonably quick turnarounds - Good points. The plexiglass is something we've considered here also for static electricity issues. Thats part of the reason we're looking to go all metal. We WONT be using plywood at all - basically for the reason you state above, but also because in our colo facility wood isnt allowed at all (we even limit the amount of paper manuals per suite). To kill both the static electricity and fire safety issues with the plexiglass we will probably just not bother with it at all. All we have to worry about is shorting out the boards on the metal mounts. :) /kc > Best of luck in the project though - > Rave Lechner > > -----Original Message----- > From: David Grant [mailto:davidgrant at mediaone.net] > Sent: Tuesday, March 06, 2001 3:13 PM > To: Velocet; Jim Lux; rutile at fixy.org; bcrl at kvack.org > Cc: beowulf at beowulf.org > Subject: Re: high physical density cluster design -structural... > > > This has been an interesting thread, but I do have a concern about > appropriate cooling with "homegrown" 1U chassis. Yes, you can build a box > the will physically support the hardware in a 1U form factor. My concern > would be long term, and not so long term heat related failures on CPU's > and/or disk drives..... > > just my .02 > > > David A. Grant, V.P. Cluster Technologies > GSH Intelligent Integrated Systems > 95 Fairmount St. Fitchburg Ma 01450 > Phone 603.898.9717 Fax 603.898.9719 > Email: davidg at gshiis.com Web: www.gshiis.com > "Providing High Performance Computing Solutions for Over a Decade" > ----- Original Message ----- > From: "Velocet" > To: "Jim Lux" ; ; > > Cc: > Sent: Tuesday, March 06, 2001 2:49 PM > Subject: Re: high physical density cluster design -structural... > > > > On Tue, Mar 06, 2001 at 11:23:37AM -0800, Jim Lux's all... 
> > > Rather than the copper pipe and fittings (which isn't very structural, > and > > > will be a pretty significant problem as it gets bigger), you might want > to > > > look at some alternatives: > > > > Ya, were convening with a few people who've done some work with metal > > as well as piping this week to go over a few other cheap options. The > > cluster will be 48 to 64 nodes depending on pricing of other materials, > > network switches, etc. 48 nodes will be 2 stacks of 24, and at 1U per, > > thats only 3.5' tall. So we dont need something thats bombproof, > > just sturdy. > > > > > 1) UniStrut (available in aluminum and in galv steel) is much stronger, > has > > > nice 90 degree connectors, etc. There are a variety of similar products > > > made from aluminum extrusions of one kind or another with longitudinal > slots > > > that make very nice rigid boxes. You assemble it with captive nuts and > > > bolts. The best thing about these products is that they are rectangular, > not > > > round, which makes attaching stuff much easier. > > > > Hmm, this stuff looks really great - and they seem to be somewhat local > > to me. :) Looks like it might not be that cheap however, even if it is > > 'cheap' for industrial applications. Wonder if I can find prices online > > somewhere here... > > > > > 2) Speedrail - a brand of cast aluminum fittings that works with > aluminum > > > tubing to make structures, etc. (and hand and safety railings...) > There > > > are other brands, as well. There are versions for 2" and 1" tubing, at > > > least. The tubing fits into the socket on the fitting, and you tighten > set > > > screws to hold it together. (Or you can epoxy it....). For a given $$, > the > > > aluminum tubing will be much stronger and more rigid than the copper > tubing. > > > > > > > > > As far as design guidelines go, a 0.6 g side load, or so, would be an > > > appropriate number. For instance, you should build it strong enough so > that > > > you can (gently) tip it over on it's side and not have it fall apart > during > > > the move. In even a small earthquake, poorly braced sheet metal racks > > > loaded with many pounds of equipment just crumple. Especially on less > > > expensive racking, a lot of the strength depends on the sides not > buckling, > > > and once it bends even a little bit, it just caves in. > > > > > > After all, some day, you WILL have to move the rack a bit, even if only > a > > > few feet to let them take up the tile underneath it. > > > > True. I dont have a scale, but the board with CPU and ram is about 1.5 or > > 2lbs, and the power supply is 2-3lbs. That adds up with 48 or 64 odd > > boards. (Need to figure out if I am going to double up the mainboards > > per powersupply, would save alot of weight). > > > > Thanks for the pointers! > > > > /kc > > > > > > > > >> > > > > >> > Problem 1 > > > >> > """"""""" > > > >> > The problem is in the diagram above, the upside down board has > another > > > >> board > > > >> > .5" above it - are these two boards going to leak RF like mad and > > > >> interefere > > > >> > with eachothers' operations? I assume there's not much to do there > but > > > to > > > >> put > > > >> > a layer of grounded (to the cabinet) metal in between. This will > drive > > > up > > > >> the > > > >> > cabinet construction costs. I'd rather avoid this if possible. 
> > > >> > > > > >> > Our original construction was going to be copper pipe and > plexiglass > > > >> sheeting, > > > >> > but we're not sure that this will be viable for something that > could be > > > >> rather > > > >> > tall in our future revisions of our model. Then again, copper pipe > can > > > be > > > >> > bolted to our (cement) ceiling and floor for support. > > > >> > > > > > > > > > > > -- > > Ken Chase, math at velocet.ca * Velocet Communications Inc. * Toronto, > CANADA > > > > _______________________________________________ > > Beowulf mailing list, Beowulf at beowulf.org > > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf -- Ken Chase, math at velocet.ca * Velocet Communications Inc. * Toronto, CANADA From mathboy at velocet.ca Tue Mar 6 12:51:10 2001 From: mathboy at velocet.ca (Velocet) Date: Tue, 6 Mar 2001 15:51:10 -0500 Subject: Power-managment of slave nodes In-Reply-To: <001901c0a679$688871c0$3201a8c0@act>; from ksheumaker@advancedclustering.com on Tue, Mar 06, 2001 at 02:09:53PM -0600 References: <001901c0a679$688871c0$3201a8c0@act> Message-ID: <20010306155110.P1196@velocet.ca> On Tue, Mar 06, 2001 at 02:09:53PM -0600, Kyle Sheumaker's all... > I have played with wake-on-lan and not had a lot of luck. It is a special > packet (http://www.scyld.com/expert/wake-on-lan.html), but some motherboards > / nics just don't seem to want to "wake." > > They aren't cheap but I would suggest something like the APC masterswitch, > it's a network controllable power switch > (http://www.apc.com/products/masterswitch/index.cfm). I've used them before > their pretty cool, web, telnet, and SNMP controllable. Could also try wake on MODEM as well - just write up a DC9 connector for the serial port, and tweak whatever pin it is that it listens on (could be RING or CD pins or both). Alot of BIOSes support wake on modem. Could also set it to wake on serial mouse, same idea. Just tweak the RX pin - my Windoze box at home wakes when I hit the mouse, so I know it must be doing something like that. The mouse still has power enough to send a signal back to the board and the system wakes. If its not in full 'power off' but just in standby, then all the better - standby mode might be low power enough for what you are looking for, if it actually ramps down the MHZ of the CPU and all that. (I know my power supply fan isnt on when its in standby mode, so it must be running cool enough to not need it). /kc > > -- Kyle > > > ----- Original Message ----- > From: "Brian Marsden" > To: > Sent: Tuesday, March 06, 2001 11:39 AM > Subject: Power-managment of slave nodes > > > > Dear all, > > > > I look after a 96 processor, 48 node, Pentium III linux cluster. The > > owners have just received their first electricity bill for the machine and > > unsuprisingly have had a nasty shock! They are now desperate to find ways > > to keep the bill as low as possible. > > > > One solution put forward has been to have nodes shut themselves down > > using APM when not in use. Then when a node is needed for a job, it could > > be switched back on via wake-on-LAN on the ethernet card. I see a number > of > > problems associated with this: > > > > 1) APM is not supported under Linux 2.2 for SMP. 
However I believe that it > > is for 2.4 - can anyone comment on this? > > 2) Wake-on-LAN - I'm not 100% clear on whether this listens for a specific > > packet or whether it will just fire the machine up if a packet comes > > along with the NICs MAC address. If the later is the case I think we > > are snookered since we use PBS as the queueing system which I believe > > sends out packets to query nodes every now and then. > > > > Before I spend more time delving deeper into these problems, has anyone > > ever attempted to try to do all of this? If so, what are the perils and > > pitfalls? Is this a completely crazy idea? > > > > Thanks > > > > Brian. > > > > -- > > --------------------------------------------------------------------- > > Brian Marsden Email: marsden at scripps.edu > > TSRI, San Diego, USA. Phone: +1 858 784 8698 Fax: +1 858 784 8299 > > --------------------------------------------------------------------- > > > > > > _______________________________________________ > > Beowulf mailing list, Beowulf at beowulf.org > > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > > > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Ken Chase, math at velocet.ca * Velocet Communications Inc. * Toronto, CANADA From hahn at coffee.psychology.mcmaster.ca Tue Mar 6 12:52:29 2001 From: hahn at coffee.psychology.mcmaster.ca (Mark Hahn) Date: Tue, 6 Mar 2001 15:52:29 -0500 (EST) Subject: Power-managment of slave nodes In-Reply-To: <3AA539A5.D698E3F6@att.net> Message-ID: > > 1) APM is not supported under Linux 2.2 for SMP. However I believe that it > > is for 2.4 - can anyone comment on this? > > Check the help when configuring. "Note that the APM support is almost > completely disabled for machines with more than one CPU." indeed APM is non-SMP-friendly. but ACPI should work. From jerosejr at cajunbro.com Tue Mar 6 12:58:07 2001 From: jerosejr at cajunbro.com (Joseph E. Rose, Jr.) Date: Tue, 6 Mar 2001 14:58:07 -0600 Subject: high physical density cluster design -structural... In-Reply-To: <00a001c0a67a$b2744880$61064f89@cerulean.jpl.nasa.gov> Message-ID: <000901c0a680$25ab9380$c015a8c0@cajunbro.int> At the Home Depot near us, you can get a 1 1/2" x 96" 90d angle aluminum for 11.00. then I used a piece of 1" x 3/16" aluminum bar for center support. -----Original Message----- From: beowulf-admin at beowulf.org [mailto:beowulf-admin at beowulf.org]On Behalf Of Jim Lux Sent: Tuesday, March 06, 2001 2:19 PM To: Velocet; Beowulf (E-mail) Subject: Re: high physical density cluster design -structural... > >> 1) UniStrut (available in aluminum and in galv steel) is much stronger, has >> nice 90 degree connectors, etc. There are a variety of similar products >> made from aluminum extrusions of one kind or another with longitudinal slots >> that make very nice rigid boxes. You assemble it with captive nuts and >> bolts. The best thing about these products is that they are rectangular, not >> round, which makes attaching stuff much easier. > >Hmm, this stuff looks really great - and they seem to be somewhat local >to me. :) Looks like it might not be that cheap however, even if it is >'cheap' for industrial applications. Wonder if I can find prices online >somewhere here... Home Depot carries it (sometimes). 
As for pricing, galvanized or painted steel is VERY cheap compared to anything copper. Any decent electrical contractor supply place will carry it, for sure in galv steel, often in aluminum. You CAN also get it scrap if you are into scrounging at large construction sites. Find the lead electrician... > > >True. I dont have a scale, but the board with CPU and ram is about 1.5 or >2lbs, and the power supply is 2-3lbs. That adds up with 48 or 64 odd >boards. (Need to figure out if I am going to double up the mainboards >per powersupply, would save alot of weight). Don't forget the weight of all the power cords, and the mounting hardware for everything. As you say, when you've got 50 widgets, a few ounces here or there really add up. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mas at ucla.edu Tue Mar 6 13:03:21 2001 From: mas at ucla.edu (Michael Stein) Date: Tue, 6 Mar 2001 13:03:21 -0800 Subject: Power-managment of slave nodes In-Reply-To: <001901c0a679$688871c0$3201a8c0@act>; from Kyle Sheumaker on Tue, Mar 06, 2001 at 02:09:53PM -0600 References: <001901c0a679$688871c0$3201a8c0@act> Message-ID: <20010306130321.A2151@mas1.oac.ucla.edu> On Tue, Mar 06, 2001 at 02:09:53PM -0600, Kyle Sheumaker wrote: > I have played with wake-on-lan and not had a lot of luck. It is a special > packet (http://www.scyld.com/expert/wake-on-lan.html), but some motherboards > / nics just don't seem to want to "wake." > > They aren't cheap but I would suggest something like the APC masterswitch, > it's a network controllable power switch > (http://www.apc.com/products/masterswitch/index.cfm). I've used them before > their pretty cool, web, telnet, and SNMP controllable. I haven't tried this but have had the idea to connect a optical isolator output directly to the "power" button wires. Then the isolator could be driven by any logic signal I wanted with a simple series resistor. The neighboring nodes printer ports come to mind. A few diodes for or logic could allow more than one node to control each node's power. Minimum hardware, all the rest would be software (A TCL display of nodes with click to power up?). A few more isolators would allow sensing power status and reseting (without powering off) nodes. This probably requires normal ATX type power supplies.... From larry at pssclabs.com Tue Mar 6 13:23:16 2001 From: larry at pssclabs.com (Larry Lesser) Date: Tue, 06 Mar 2001 13:23:16 -0800 Subject: Real Time Message-ID: <4.3.2.20010306132041.00b87390@pop.pssclabs.com> Hello: I am trying to find out if anyone has built a Beowulf on Mac G4s with a real time operating system, MPI (any flavor) and Myrinet? Thanks, Larry Lesser ===================================== Larry Lesser PSSC Labs voice: (949) 380-7288 fax: (949) 380-9788 larry at pssclabs.com http://www.pssclabs.com ===================================== From JParker at coinstar.com Tue Mar 6 13:44:00 2001 From: JParker at coinstar.com (JParker at coinstar.com) Date: Tue, 6 Mar 2001 13:44:00 -0800 Subject: high physical density cluster design -structural... Message-ID: G'Day ! Cost of aluminum at a retail stores adds up fast .... and welding is tricky as you need an inert atmosphere. If you use mechanical fastners, you need to machine holes (punch, drill press, etc.). OTOH, we have made some very nice cabinets with riveted aluminum and angle extrusions .... 
surplus steel (3/4" box is ideal) can be very cheap and you used to be able to get a High School shop class to do the welding for free ... They might have the stuff need for sheetmetal work also. Do make sure your finished set-up is properly grounded. We used to have a ship board electrician named "3 finger Harry" ... cheers, Jim Parker Sailboat racing is not a matter of life and death .... It is far more important than that !!! "Joseph E. Rose, Jr." To: "'Jim Lux'" , "'Velocet'" , bro.com> cc: Sent by: Subject: RE: high physical density cluster design -structural... beowulf-admin at b eowulf.org 03/06/01 12:58 PM Please respond to jerosejr At the Home Depot near us, you can get a 1 1/2" x 96" 90d angle aluminum for 11.00. then I used a piece of 1" x 3/16" aluminum bar for center support. -----Original Message----- From: beowulf-admin at beowulf.org [mailto:beowulf-admin at beowulf.org]On Behalf Of Jim Lux Sent: Tuesday, March 06, 2001 2:19 PM To: Velocet; Beowulf (E-mail) Subject: Re: high physical density cluster design -structural... > >> 1) UniStrut (available in aluminum and in galv steel) is much stronger, has >> nice 90 degree connectors, etc. There are a variety of similar products >> made from aluminum extrusions of one kind or another with longitudinal slots >> that make very nice rigid boxes. You assemble it with captive nuts and >> bolts. The best thing about these products is that they are rectangular, not >> round, which makes attaching stuff much easier. > >Hmm, this stuff looks really great - and they seem to be somewhat local >to me. :) Looks like it might not be that cheap however, even if it is >'cheap' for industrial applications. Wonder if I can find prices online >somewhere here... Home Depot carries it (sometimes). As for pricing, galvanized or painted steel is VERY cheap compared to anything copper. Any decent electrical contractor supply place will carry it, for sure in galv steel, often in aluminum. You CAN also get it scrap if you are into scrounging at large construction sites. Find the lead electrician... > > >True. I dont have a scale, but the board with CPU and ram is about 1.5 or >2lbs, and the power supply is 2-3lbs. That adds up with 48 or 64 odd >boards. (Need to figure out if I am going to double up the mainboards >per powersupply, would save alot of weight). Don't forget the weight of all the power cords, and the mounting hardware for everything. As you say, when you've got 50 widgets, a few ounces here or there really add up. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Tue Mar 6 14:46:52 2001 From: rgb at phy.duke.edu (Robert G. Brown) Date: Tue, 6 Mar 2001 17:46:52 -0500 (EST) Subject: redhat 7.0 upgrade woes In-Reply-To: Message-ID: On Tue, 6 Mar 2001, Jeffrey Oishi wrote: > Hi-- > > I'm trying to upgrade a 130 node cluster of machines with no video cards > from RH6.1 to RH7.0. I have created an nfs-method kickstart system that > works--if a video card is in the machine. If not, and I add > console=ttyS0,9600n8 to the SYSLINUX.CFG, then the installer runs ok and > starts spitting stuff out the serial port. 
However, the installer then > crashes right before it starts upgrading the packages. It happily works up > until the standard redhat screen showing each of the packages zipping by > comes up. There it hangs. This has happened on a number of boxes. Check the age/date of the actual release of RH7 you are using. I had somewhat similar problems trying to install an early version of 7 on a slightly flaky system, although it did have a VGA card (it was still using the non-X install). Once the release was updated to current, it worked fine. I'm pretty sure kickstart should work, BTW, on a system with no video and no console at all. I haven't actually tried it, though, so I won't swear to it. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu From Chester.Fitch at mdx.com Tue Mar 6 15:50:13 2001 From: Chester.Fitch at mdx.com (Fitch, Chester) Date: Tue, 6 Mar 2001 16:50:13 -0700 Subject: high physical density cluster design -structural... Message-ID: <19E8BE159FECD4118FE700508BEE12D20C8822@mdx-email1.den.mdx.com> > Cost of aluminum at a retail stores adds up fast .... and > welding is tricky as > you need an inert atmosphere. If you use mechanical > fastners, you need to > machine holes (punch, drill press, etc.). OTOH, we have made > some very nice > cabinets with riveted aluminum and angle extrusions .... > > surplus steel (3/4" box is ideal) can be very cheap and you > used to be able to > get a High School shop class to do the welding for free ... > They might have the > stuff need for sheetmetal work also. Check around - you can usually find a local surplus/scrap/odd-piece dealer in aluminum (or steel) who is MUCH cheaper than retail.. Several months back I was able to obtain ~40 square feet of scrap (but unused) 3/16 inch aluminum plate for about $80.00 US. (They sell it by weight) He even cut it for me to my rough measurements for $20 -- saved me a day with a saber saw, that did... well worth the cost. Ended-up with about 12 shelves for a cabinet rack - MUCH cheaper than buying standard, prefabricated shelves. Had I gone retail, I figure the aluminum itself would have cost me about 10x as much (even if I'd even been able to locate/order it)... He had lots of other extruded metal parts cheap, as well... Pays to look around.. I would stick with aluminum - steel is much harder to work with, and you shouldn't be using the case to ground the units anyway... Strength is not really an issue either, if you use aluminum that is reasonably thick, or fabricate the case properly. (I assume we're not talking several hundred pounds here) No need to weld the aluminum - easy enough (most times) to just drill holes and use extruded shapes and fasteners (bolts and/or rivets) to fabricate any shape (case) you need.. Get a little creative, and you wouldn't even need the prefabricated stuff (like UniStrut).. Depends on how fancy you want to be, and the weight involved, of course. I shudder when I hear people talking about using some kind of wood (or even plastic) for the case: having overloaded my fair share of circuits over the years, I want NOTHING combustible near such a beast - A system melt-down is one thing, but what you DON'T want is a case that could catch fire at the same time -- you could loose the building its' in as well... Any competent electrician looking a such a beast would have a fit. While such a posibility is admittedly unlikely, why take the chance? 
Metal may melt, but at least it won't catch fire! (Sounds a bit paranoid, I know, but... just my $.02) Good luck! Chet Fitch ---- The superior programmer uses his superior judgment to avoid situations requiring superior skill. (Or the fire department) From roger at ERC.MsState.Edu Tue Mar 6 15:59:19 2001 From: roger at ERC.MsState.Edu (Roger L. Smith) Date: Tue, 6 Mar 2001 17:59:19 -0600 Subject: redhat 7.0 upgrade woes In-Reply-To: Message-ID: On Tue, 6 Mar 2001, Robert G. Brown wrote: > I'm pretty sure kickstart should work, BTW, on a system with no video > and no console at all. I haven't actually tried it, though, so I won't > swear to it. I just kickstarted 128 nodes with RedHat 7 without so much as a mouse, keyboard, or monitor running (there was a video card, but no X was configured). I had no problems at all. I just added the "skipx" option to my kickstart file. _\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_ | Roger L. Smith Phone:662-325-3625 roger at ERC.MsState.Edu | | Systems Administrator FAX: 662-325-7692 WWW.ERC.MsState.Edu/~roger | |-------------------------------------------------------------------------| | Mississippi State University/National Science Foundation | |______Engineering Research Center for Computational Field Simulation_____| From newt at scyld.com Tue Mar 6 16:03:22 2001 From: newt at scyld.com (Daniel Ridge) Date: Tue, 6 Mar 2001 19:03:22 -0500 (EST) Subject: Power-managment of slave nodes In-Reply-To: <20010306130321.A2151@mas1.oac.ucla.edu> Message-ID: On Tue, 6 Mar 2001, Michael Stein wrote: > On Tue, Mar 06, 2001 at 02:09:53PM -0600, Kyle Sheumaker wrote: > Minimum hardware, all the rest would be software (A TCL display of nodes > with click to power up?). 'beosetup' -- the Beowulf cluster configuration tool distributed with the Scyld Beowulf release actually features a 'wake up' button when you go for the right click menu on a node. This sends a WOL magic packet to the mac address for that node. This functionality will also be added into Scyld-aware schedulers so that nodes can be woken up for jobs. Regards, Dan Ridge Scyld Computing Corporation From lindahl at conservativecomputer.com Tue Mar 6 14:44:47 2001 From: lindahl at conservativecomputer.com (Greg Lindahl) Date: Tue, 6 Mar 2001 17:44:47 -0500 Subject: Real Time In-Reply-To: <4.3.2.20010306132041.00b87390@pop.pssclabs.com>; from larry@pssclabs.com on Tue, Mar 06, 2001 at 01:23:16PM -0800 References: <4.3.2.20010306132041.00b87390@pop.pssclabs.com> Message-ID: <20010306174447.A6421@wumpus> On Tue, Mar 06, 2001 at 01:23:16PM -0800, Larry Lesser wrote: > I am trying to find out if anyone has built a Beowulf on Mac G4s with a > real time operating system, MPI (any flavor) and Myrinet? I believe that several of the vendors in the embedded systems market sell systems like that. I'm not so sure that a real time operating system coupled with MPI is going to do that much good compared to Linux with MPI, since use of MPI is going to blow away your hard real time guarantees. (Oh, darn, bad CRC, we have to retransmit that packet...) -- g From pbn2au at qwest.net Tue Mar 6 18:37:18 2001 From: pbn2au at qwest.net (pbn2au) Date: Tue, 06 Mar 2001 19:37:18 -0700 Subject: Beowulf digest, Vol 1 #304 - 13 msgs References: <200103060737.CAA06486@blueraja.scyld.com> Message-ID: <3AA59EDE.E69265A0@qwest.net> > Dean.Carpenter at pharma.com said: > > We, like most out there I'm sure, are constrained, by money and by > > space. We need to get lots of cpus in as small a space as possible. 
> > Lots of 1U VA-Linux or SGI boxes would be very cool, but would drain > > the coffers way too quickly. Generic motherboards in clone cases is > > cheap, but takes up too much room. > > > So, a colleague and I are working on a cheap and high-density 1U node. > > So far it looks like we'll be able to get two dual-CPU (P3) > > motherboards per 1U chassis, with associated dual-10/100, floppy, CD > > and one hard drive. And one PCI slot. Although it would be nice to > > have several Ultra160 scsi drives in raid, a generic cluster node (for > > our uses) will work fine with a single large UDMA-100 ide drive. > > > That's 240 cpus per 60U rack. We're still working on condensed power > > for the rack, to simplify things. Note that I said "for our uses" > > above. Our design goals here are density and $$$. Hence some of the > > niceties are being foresworn - things like hot-swap U160 scsi raid > > drives, das blinken lights up front, etc. > > > So, what do you think ? If there's interest, I'll keep you posted on > > our progress. If there's LOTS of interest, we may make a larger > > production run to make these available to others. > > > -- Dean Carpenter deano at areyes.com dean.carpenter at pharma.com > > dean.carpenter at purduepharma.com 94TT :) Dean, Get rid of the cases!!!! You can put the motherboards together using all- threads. There are a couple of companies selling 90 degree pci slot adapters, for the nics. By running 2 motherboards on a regular power supply, using just the nic card, processor and ram, (use boot proms on the nics) you can get 40 boards in a 5 foot Rack mount. use a shelf every 4 boards to attach the power supply top and bottom. With a fully enclosed case 8 100 mm fans are sufficient to cool the entire setup. Conversely if you use 32 boards and a 32 port router/switch you can have nodes on wheels!! It may sound nuts, but mine has a truncated version of this setup. using 4 boards I was able to calculate the needed power for fans and by filling my tower with 36 naked m\boards running full steam, I calculated the air flow. Yes it sounds rinky-dink but under smoked glass it looks awesome!! Dave Campbell Campbell Consulting Middleton ID 83644 pbn2au at qwest.net From rauch at inf.ethz.ch Wed Mar 7 01:18:49 2001 From: rauch at inf.ethz.ch (Felix Rauch) Date: Wed, 7 Mar 2001 10:18:49 +0100 (CET) Subject: Power-managment of slave nodes In-Reply-To: <001901c0a679$688871c0$3201a8c0@act> Message-ID: [adjusted quoting] On Tue, 6 Mar 2001, Kyle Sheumaker wrote: > On March 6, Brian Marsden wrote: > > 2) Wake-on-LAN - I'm not 100% clear on whether this listens for a specific > > packet or whether it will just fire the machine up if a packet comes > > along with the NICs MAC address. If the later is the case I think we > > are snookered since we use PBS as the queueing system which I believe > > sends out packets to query nodes every now and then. AFAIR Wake-on-LAN waits for a broadcast packet with a magic-pattern and the destination machines MAC address in the middle. Normal packets do not wake the machine. > I have played with wake-on-lan and not had a lot of luck. It is a > special packet (http://www.scyld.com/expert/wake-on-lan.html), but > some motherboards / nics just don't seem to want to "wake." When we played with our Cluster of Dell-machines, we found ot that we have to power off the machines at the right time. 
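For the curious, the "magic packet" that Kyle's Scyld link and Felix describe above is just 6 bytes of 0xFF followed by the target NIC's MAC address repeated 16 times; the NIC pattern-matches the payload, so the UDP port and broadcast address below are conventions, not requirements. A minimal sender, with the MAC and subnet purely as placeholders:

#!/usr/bin/env python
# Minimal Wake-on-LAN sender: the magic packet is 6 bytes of 0xFF followed by
# the target MAC repeated 16 times, broadcast on the local subnet.  The NIC
# only inspects the payload, so UDP port 9 is convention rather than a rule.
import socket, struct

def wake(mac, broadcast="192.168.1.255", port=9):
    mac_bytes = struct.pack("6B", *[int(x, 16) for x in mac.split(":")])
    payload = struct.pack("B", 0xff) * 6 + mac_bytes * 16
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
    s.sendto(payload, (broadcast, port))
    s.close()

if __name__ == "__main__":
    wake("00:02:e3:03:da:87")   # placeholder MAC -- substitute the sleeping node's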
If we pressed CTRL-ALT-DEL to reboot the machine and powered them off when de BIOS screen appeared, Wake-on-Lan most likely didn't work, as the machines were completely off. It seemed to work however, if we did a "shutdown -h" and switched the machines off when they reached runlevel 0. So a combination of APM and "halt -p" would probably do the trick without having to press the power-button. - Felix -- Felix Rauch | Email: rauch at inf.ethz.ch Institute for Computer Systems | Homepage: http://www.cs.inf.ethz.ch/~rauch/ ETH Zentrum / RZ H18 | Phone: ++41 1 632 7489 CH - 8092 Zuerich / Switzerland | Fax: ++41 1 632 1307 From anders.lennartsson at foi.se Wed Mar 7 03:56:03 2001 From: anders.lennartsson at foi.se (Anders Lennartsson) Date: Wed, 07 Mar 2001 12:56:03 +0100 Subject: problems with etherchannel and NatSemi DP83815 cards Message-ID: <3AA621D3.1C003659@foi.se> Hi BACKGROUND: I'm setting up a Debian GNU/Linux based cluster, currently with 4 nodes, each a PPro 200 :( but there may be more/other stuff coming :). Considering the costs, we settled for Netgear 311 ethernet cards, for which there is support in 2.4.x kernels. Patches are available for kernels 2.2.x, but since 2.4 is here... I have checked and the driver is a slightly modified version derived from natsemi.c available on www.scyld.com. There are some additions in the later not included in the one provided in the kernel source though. Initially I put one card in each machine and verified that everything worked. I tested with NTtcp (netperf derivative?) and the the throughput asymptotically went up to about 90Mbits per second when two cards were connected through a 100Mbps switch (where are the last 10?). Then I set out for etherchannel bonding. It was a bit tricky to find a working ifenslave.c, the one on www.beowulf.org seemed old and I found a newer at pdsf.nersc.gov/linux/ Then it seemed to work after doing: ifconfig bond0 192.168.1.x netmask 255.255.255.0 up ./ifenslave bond0 eth0 (bond0 gets the MAC adress from eth0) ./ifenslave bond0 eth1 When testing the setup by ftping a large file between two nodes messages of the following type was output repeatedly on the console: ethX ... Something wicked happened! 0YYY where X was 0 or 1 and YYY was one of 500, 700, 740, 749, 749, see below. Same thing happened when running NPtcp as package size came above a few kbytes, speeds approx 50MBits per second. QUESTIONS: Anyone got ideas as to the nature/solution of this problem? I suppose the PCI interface on these particular motherboards may play a significant role. Maybe the driver itself? Or is just the processor too slow? Does anyone have experience of this with for instance 3c905? Otherwise a very stable card IMHO. It is about three times more expensive which isn't that much for one or two, although I could imagine substantial savings for a large cluster. But if my hours are included ... Regards, Anders SOME DETAILED INFO: >From syslog, kernel identifying network cards: (eth2 is for accessing from outside the dedicated networks) Mar 1 21:30:53 beo101 kernel: http://www.scyld.com/network/natsemi.html Mar 1 21:30:53 beo101 kernel: (unofficial 2.4.x kernel port, version 1.0.3, January 21, 2001 Jeff Garzik, Tjeerd Mulder) Mar 1 21:30:53 beo101 kernel: eth0: NatSemi DP83815 at 0xc4800000, 00:02:e3:03:da:87, IRQ 12. Mar 1 21:30:53 beo101 kernel: eth0: Transceiver status 0x7869 advertising 05e1. Mar 1 21:30:53 beo101 kernel: eth1: NatSemi DP83815 at 0xc4802000, 00:02:e3:03:de:43, IRQ 10. 
Mar 1 21:30:53 beo101 kernel: eth1: Transceiver status 0x7869 advertising 05e1. Mar 1 21:30:53 beo101 kernel: eth2: NatSemi DP83815 at 0xc4804000, 00:02:e3:03:dc:2c, IRQ 11. Mar 1 21:30:53 beo101 kernel: eth2: Transceiver status 0x7869 advertising 05e1. some lines of the wicked message: (above those are the two lines where eth0 and eth1 are reported when ifenslave is run) Mar 1 21:30:56 beo101 /usr/sbin/cron[189]: (CRON) STARTUP (fork ok) Mar 1 21:35:26 beo101 kernel: eth0: Setting full-duplex based on negotiated link capability. Mar 1 21:35:32 beo101 ntpd[182]: time reset -0.474569 s Mar 1 21:35:32 beo101 ntpd[182]: kernel pll status change 41 Mar 1 21:35:32 beo101 ntpd[182]: synchronisation lost Mar 1 21:35:37 beo101 kernel: eth1: Setting full-duplex based on negotiated link capability. Mar 1 21:38:01 beo101 /USR/SBIN/CRON[211]: (mail) CMD ( if [ -x /usr/sbin/exim -a -f /etc/exim.conf ]; then /usr/sbin/exim -q >/dev/null 2>&1; fi) Mar 1 21:39:49 beo101 kernel: eth1: Something Wicked happened! 0700. Mar 1 21:40:04 beo101 kernel: eth0: Something Wicked happened! 0700. Mar 1 21:40:08 beo101 kernel: eth1: Something Wicked happened! 0700. Mar 1 21:40:08 beo101 kernel: eth0: Something Wicked happened! 0700. Mar 1 21:40:12 beo101 last message repeated 2 times Mar 1 21:40:12 beo101 kernel: eth1: Something Wicked happened! 0700. Mar 1 21:40:13 beo101 last message repeated 2 times Mar 1 21:40:15 beo101 kernel: eth0: Something Wicked happened! 0700. Mar 1 21:40:16 beo101 kernel: eth0: Something Wicked happened! 0700. Mar 1 21:40:18 beo101 kernel: eth1: Something Wicked happened! 0700. Mar 1 21:40:19 beo101 kernel: eth1: Something Wicked happened! 0700. Mar 1 21:40:19 beo101 kernel: eth0: Something Wicked happened! 0700. Mar 1 21:40:20 beo101 kernel: eth1: Something Wicked happened! 0700. Mar 1 21:40:20 beo101 kernel: eth1: Something Wicked happened! 0700. Mar 1 21:40:21 beo101 kernel: eth0: Something Wicked happened! 0700. Mar 1 21:40:22 beo101 last message repeated 3 times Mar 1 21:40:22 beo101 kernel: eth1: Something Wicked happened! 0700. Mar 1 21:40:22 beo101 kernel: eth1: Something Wicked happened! 0700. Mar 1 21:40:22 beo101 kernel: eth0: Something Wicked happened! 0700. Mar 1 21:40:22 beo101 kernel: eth0: Something Wicked happened! 0700. Mar 1 21:40:22 beo101 kernel: eth1: Something Wicked happened! 0700. Mar 1 21:40:22 beo101 kernel: eth0: Something Wicked happened! 0500. Mar 1 21:40:22 beo101 kernel: eth0: Something Wicked happened! 0740. Mar 1 21:40:22 beo101 kernel: eth0: Something Wicked happened! 0740. Mar 1 21:40:23 beo101 kernel: eth0: Something Wicked happened! 0700. Mar 1 21:40:23 beo101 kernel: eth1: Something Wicked happened! 0700. Mar 1 21:40:23 beo101 kernel: eth1: Something Wicked happened! 0700. Mar 1 21:40:23 beo101 kernel: eth0: Something Wicked happened! 0740. Mar 1 21:40:23 beo101 kernel: eth1: Something Wicked happened! 0740. Mar 1 21:40:23 beo101 kernel: eth1: Something Wicked happened! 0700. Mar 1 21:40:23 beo101 kernel: eth1: Something Wicked happened! 0700. Mar 1 21:40:23 beo101 kernel: eth1: Something Wicked happened! 0500. Mar 1 21:40:23 beo101 kernel: eth0: Something Wicked happened! 0500. 
The result of ifconfig: bond0 Link encap:Ethernet HWaddr 00:02:E3:03:DA:87 inet addr:192.168.1.101 Bcast:192.168.1.255 Mask:255.255.255.0 UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1 RX packets:0 errors:0 dropped:0 overruns:0 frame:0 TX packets:1834429 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:0 (0.0 b) TX bytes:986886789 (941.1 Mb) eth0 Link encap:Ethernet HWaddr 00:02:E3:03:DA:87 inet addr:192.168.1.101 Bcast:192.168.1.255 Mask:255.255.255.0 UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1 RX packets:907798 errors:0 dropped:0 overruns:0 frame:0 TX packets:915439 errors:1776 dropped:0 overruns:1776 carrier:1776 collisions:0 txqueuelen:100 RX bytes:435552233 (415.3 Mb) TX bytes:491795214 (469.0 Mb) Interrupt:12 eth1 Link encap:Ethernet HWaddr 00:02:E3:03:DA:87 inet addr:192.168.1.101 Bcast:192.168.1.255 Mask:255.255.255.0 UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1 RX packets:907768 errors:0 dropped:0 overruns:0 frame:0 TX packets:915466 errors:1748 dropped:0 overruns:1748 carrier:1748 collisions:0 txqueuelen:100 RX bytes:434992308 (414.8 Mb) TX bytes:489766183 (467.0 Mb) Interrupt:10 Base address:0x2000 eth2 Link encap:Ethernet HWaddr 00:02:E3:03:DC:2C inet addr:150.227.64.210 Bcast:150.227.64.255 Mask:255.255.255.0 UP BROADCAST RUNNING MTU:1500 Metric:1 RX packets:13122 errors:0 dropped:0 overruns:0 frame:0 TX packets:1182 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:100 RX bytes:1032660 (1008.4 Kb) TX bytes:943713 (921.5 Kb) Interrupt:11 Base address:0x4000 lo Link encap:Local Loopback inet addr:127.0.0.1 Mask:255.0.0.0 UP LOOPBACK RUNNING MTU:3904 Metric:1 RX packets:8 errors:0 dropped:0 overruns:0 frame:0 TX packets:8 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:552 (552.0 b) TX bytes:552 (552.0 b) From Kian_Chang_Low at vdgc.com.sg Wed Mar 7 05:53:45 2001 From: Kian_Chang_Low at vdgc.com.sg (Kian_Chang_Low at vdgc.com.sg) Date: Wed, 7 Mar 2001 21:53:45 +0800 Subject: Power-managment of slave nodes Message-ID: Hi, Not directly related. But I was wondering with a similar APC master switch, I can actually powered off (then on) a "dead" slave node when it is found to have hung. After recycling the power of that node, it can rejoin the cluster without any intervention from the user. Has anyone used it for such purpose, or is there another way of recycling a dead node with manual intervention? Or a cheaper way? (This is of course assuming the program is able to handle the interrupt in the number of available slave nodes.) Regards, Kian Chang. "Kyle Sheumaker" , ering.com> Sent by: cc: beowulf-admin at beowulf.org Subject: Re: Power-managment of slave nodes 03/07/01 04:09 AM I have played with wake-on-lan and not had a lot of luck. It is a special packet (http://www.scyld.com/expert/wake-on-lan.html), but some motherboards / nics just don't seem to want to "wake." They aren't cheap but I would suggest something like the APC masterswitch, it's a network controllable power switch (http://www.apc.com/products/masterswitch/index.cfm). I've used them before their pretty cool, web, telnet, and SNMP controllable. -- Kyle ----- Original Message ----- From: "Brian Marsden" To: Sent: Tuesday, March 06, 2001 11:39 AM Subject: Power-managment of slave nodes > Dear all, > > I look after a 96 processor, 48 node, Pentium III linux cluster. The > owners have just received their first electricity bill for the machine and > unsuprisingly have had a nasty shock! 
They are now desperate to find ways > to keep the bill as low as possible. > > One solution put forward has been to have nodes shut themselves down > using APM when not in use. Then when a node is needed for a job, it could > be switched back on via wake-on-LAN on the ethernet card. I see a number of > problems associated with this: > > 1) APM is not supported under Linux 2.2 for SMP. However I believe that it > is for 2.4 - can anyone comment on this? > 2) Wake-on-LAN - I'm not 100% clear on whether this listens for a specific > packet or whether it will just fire the machine up if a packet comes > along with the NICs MAC address. If the later is the case I think we > are snookered since we use PBS as the queueing system which I believe > sends out packets to query nodes every now and then. > > Before I spend more time delving deeper into these problems, has anyone > ever attempted to try to do all of this? If so, what are the perils and > pitfalls? Is this a completely crazy idea? > > Thanks > > Brian. > > -- > --------------------------------------------------------------------- > Brian Marsden Email: marsden at scripps.edu > TSRI, San Diego, USA. Phone: +1 858 784 8698 Fax: +1 858 784 8299 > --------------------------------------------------------------------- > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From alangrimes at starpower.net Wed Mar 7 07:08:15 2001 From: alangrimes at starpower.net (Alan Grimes) Date: Wed, 07 Mar 2001 10:08:15 -0500 Subject: Need pointers to info... Message-ID: <3AA64EDF.8B043ACC@starpower.net> Hey, This seems to be a good place to ask a few questions about the linux packet drivers... ( pppX/ethX/??? ) How exactly do these drivers work? what are the interface standards? and how would a moronic programmer go about trying to establish a home-brew protocol over them? I would assume the interface to pppX would be very similar to ethX, but I know horribly little about either of them... =\ -- om If I meditate long enough a new sig will come to me. http://users.erols.com/alangrimes/ Message-ID: <3AA66A03.2E4E3171@icase.edu> One problem I've seen in upgrading 6.2->6.2+updates->7.0->7.0+updates is that Red Hat messed up version numbering on about a dozen packages, which then did not get updated to 7.0 versions. The most obvious problem was gnorpm. Since the updated 6.2 version appeared newer than the updated 7.0 version, gnorpm did not get replaced (but the underlying libc did) so afterwards gnorpm refused to work (complaining about a missing shared library). The fix for this is to install the correct gnorpm (and other misnumbered packages) using the rpm -Uvh --force ... command, at least until Red Hat addresses these version numbering problems. Sincerely, Josip -- Dr. Josip Loncaric, Senior Staff Scientist mailto:josip at icase.edu ICASE, Mail Stop 132C PGP key at http://www.icase.edu./~josip/ NASA Langley Research Center mailto:j.loncaric at larc.nasa.gov Hampton, VA 23681-2199, USA Tel. 
+1 757 864-2192 Fax +1 757 864-6134 From lindahl at conservativecomputer.com Wed Mar 7 10:01:22 2001 From: lindahl at conservativecomputer.com (Greg Lindahl) Date: Wed, 7 Mar 2001 13:01:22 -0500 Subject: Power-managment of slave nodes In-Reply-To: ; from Kian_Chang_Low@vdgc.com.sg on Wed, Mar 07, 2001 at 09:53:45PM +0800 References: Message-ID: <20010307130122.D8541@wumpus> On Wed, Mar 07, 2001 at 09:53:45PM +0800, Kian_Chang_Low at vdgc.com.sg wrote: > But I was wondering with a similar APC master switch, I can actually > powered off (then on) a "dead" slave node when it is found to have hung. > After recycling the power of that node, it can rejoin the cluster without > any intervention from the user. Has anyone used it for such purpose, or is > there another way of recycling a dead node with manual intervention? Or a > cheaper way? It's extremely rare that a node hangs -- it's more common that nodes die due to hardware failures. So I've never had an automatic way of recycling dead nodes. Instead, I view the APC as an administrator convenience: a good way to reboot a node that you're testing with, a fast way to power down the entire cluster when there's an AC failure, etc. -- g From kragen at pobox.com Wed Mar 7 11:56:22 2001 From: kragen at pobox.com (kragen at pobox.com) Date: Wed, 7 Mar 2001 14:56:22 -0500 (EST) Subject: Need pointers to info... Message-ID: <200103071956.OAA13394@kirk.dnaco.net> Alan Grimes writes: > Hey, This seems to be a good place to ask a few questions about the > linux packet drivers... It isn't, unless you're running them in a Beowulf or planning to. There are books about this from Coriolis, and there are mailing lists related to driver development. This is not among them. From kragen at pobox.com Wed Mar 7 11:57:43 2001 From: kragen at pobox.com (kragen at pobox.com) Date: Wed, 7 Mar 2001 14:57:43 -0500 (EST) Subject: Need pointers to info... Message-ID: <200103071957.OAA13512@kirk.dnaco.net> Alan Grimes writes: > Hey, This seems to be a good place to ask a few questions about the > linux packet drivers... It isn't, unless you're running them in a Beowulf or planning to. There are books about this from Coriolis, and there are mailing lists related to driver development. This is not among them. From Eugene.Leitl at lrz.uni-muenchen.de Wed Mar 7 12:54:30 2001 From: Eugene.Leitl at lrz.uni-muenchen.de (Eugene.Leitl at lrz.uni-muenchen.de) Date: Wed, 07 Mar 2001 21:54:30 +0100 Subject: gaussian98/rh 6.2 trouble Message-ID: <3AA6A006.C85B9E64@lrz.uni-muenchen.de> Please reply to chemistry at ccl.net as well as original requesters, not to me. ------------------forwarded message--------------------------------------- Subject: No Subject Given By The Author To: chemistry at ccl.net Dear CCL members, We have two clusters of intel based machines. Cluster 1 * Intel Pentium III 600 MHz CPU with fan * ASUS P3B-F Motheroard * 512MB SDRAM PC100MHz (2x256MB), Cas III Type and Low Profile. * 22GB IBM IDE HDD 7200RPM * 4MB AGP ATI Display Card cluster 2 * (1)Intel Pentium III 800MHz CPU with fan, 256K cache * Intel L440GX+ Dual Pentium III Motherboard * SCSI, Lan and VGA on Board * 1024MB SDRAM PC100MHz (4x256MB), ECC Register * 40GB IBM IDE HDD 7200RPM We ran gaussian98 in parallel using linda on cluster 1 using redhat 6.1 for several months and everything appeared to function perfectly. Cluster 2 came with redhat 6.2, and initially everything seemed to be fine, so cluster 1 was upgraded to 6.2. 
Under 6.2 gaussian jobs fail on cluster 1 with a relatively short mean time between failure. We have now discovered that cluster 2 also seems to have problems under 6.2, but the mean time between failures is much longer. The failures come in two types: 1) the calculation appears to run but values are all garbage (the values print as "nan") 2) one of the processes dies, most commonly, the message is process 0 failed to complete. Before restoring 6.1 to the machine, does anyone know what this problem is and how to fix it? Thanks, Alessandra --------------------------------------------------------------------- Alessandra Ricca Mail: NASA Ames Research Center Senior Research Scientist Mail Stop 230-3 ELORET Corporation Moffett Field, CA 94035-1000 http://www.eloret.com Ph: +1-650-604-5410 Email: ricca at pegasus.arc.nasa.gov Fax: +1-650-604-0350 -= This is automatically added to each message by mailing script =- CHEMISTRY at ccl.net -- To Everybody | CHEMISTRY-REQUEST at ccl.net -- To Admins MAILSERV at ccl.net -- HELP CHEMISTRY or HELP SEARCH CHEMISTRY-SEARCH at ccl.net -- archive search | Gopher: gopher.ccl.net 70 Ftp: ftp.ccl.net | WWW: http://www.ccl.net/chemistry/ | Jan: jkl at osc.edu --------------------------------------------------------------------------------------------------------------- From: Gerardo Andres Cisneros 19:59 Subject: CCL:No Subject Given By The Author To: Alessandra Ricca CC: chemistry at ccl.net Hello We have a similar problem, we've just built an 8 node PIII cluster running RH6.2 (2.2.16-3 kernel) and I'm testing gaussian using Linda for this cluster. However, if the calculation takes more than 1 hour, invariably, one of the slave nodes will run out of memory because gaussian will fail to kill the processes so there will be a good number of phantom processes on the nodes just sitting there occupying memory. I would really appretiate if you could post a summary from any reply you might get. Thanks in advance Andres -- G. Andres Cisneros Department of Chemistry Duke University andres at chem.duke.edu -= This is automatically added to each message by mailing script =- CHEMISTRY at ccl.net -- To Everybody | CHEMISTRY-REQUEST at ccl.net -- To Admins MAILSERV at ccl.net -- HELP CHEMISTRY or HELP SEARCH CHEMISTRY-SEARCH at ccl.net -- archive search | Gopher: gopher.ccl.net 70 Ftp: ftp.ccl.net | WWW: http://www.ccl.net/chemistry/ | Jan: jkl at osc.edu From rajkumar at csse.monash.edu.au Wed Mar 7 21:26:12 2001 From: rajkumar at csse.monash.edu.au (Rajkumar Buyya) Date: Thu, 08 Mar 2001 16:26:12 +1100 Subject: IEEE Cluster Computing 2001: Call for papers Message-ID: <3AA717F4.68B7D32E@csse.monash.edu.au> Call for Papers Third IEEE International Conference on Cluster Computing http://andy.usc.edu/cluster2001/ Sutton Place Hotel, Newport Beach, California, USA Oct. 8-11, 2001 Sponsored by the IEEE Computer Society, through the Task Force on Cluster Computing (TFCC) Organized by the University of Southern California, University of California at Irvine, and California Institute of Technology --------------------------------------------------------------------------- Call For Participation The rapid emergence of COTS Cluster Computing as a major strategy for delivering high performance to technical and commercial applications is driven by the superior cost effectiveness and flexibility achievable through ensembles of PCs, workstations, and servers. 
Cluster computing, such as Beowulf class, SMP clusters, ASCI machines, and metacomputing grids, is redefining the manner in which parallel and distributed computing is being accomplished today and is the focus of important research in hardware, software, and application development. The Third IEEE International Conference on Cluster Computing will be held in the beautiful Pacific coastal city, Newport Beach in Southern California, from October 8 to 11, 2001. For the first time, the Cluster 2001 merges four popular professional conferences or workshops: IWCC, PC-NOW, CCC, JPC and German CC into an integrated, large-scale, international forum to be held in Northern America. The conference series was previously held in Australia (1999) and Germany (2000). For details and updated information, visit the Cluster 2001 official web site: http://andy.usc.edu/cluster2001/. The conference series information can be found at: http://www.clustercomp.org/. We encourage submission of high quality papers reporting original work in theoretical, experimental, and industrial research and development in the following topics, which are not exclusive to cluster architecture, software, protocols, and applications : * Hardware Technology for Clustering * High-speed System Interconnects * Light Weight Communication Protocols * Fast Message Passing Libraries * Single System Image Services * File Systems and Distributed RAID * Internet Security and Reliability * Cluster Job and Resource Management * Data Distribution and Load Balancing * Tools for Operating and Managing Clusters * Cluster Middleware, Groupware, and Infoware * Highly Available Cluster Solutions * Problem Solving Environments for Cluster * Scientific and E-Commerce Applications * Collaborative Work and Multimedia Clusters * Performance Evaluation and Modeling * Clusters of Clusters/Computational Grids * Software Tools for Metacomputing * Novel Cluster Systems Architectures * Network-based Distributed Computing * Mobile Agents and Java for Cluster Computing * Massively Parallel Processing * Software Environments for Clusters * Clusters for Bioinformatics * Innovative Cluster Applications Paper Submission The review process will be based on papers not exceeding 6000 words on at most 20 pages. Deadline for Web-based electronic submission is March 12, 2001 in Postscript (*.ps) or Adobe Acrobat v3.0 (*.pdf) format. The submitted file must be viewable with Aladdin GhostScript 5.10 and printable on a standard PostScript Laser printer. No constraint on the format of the submitted draft except the length. However, double-space format is encouraged, in order to provide the referees the convenience of marking comments and corrections on the paper copy. The web site for the paper submission is: http://www.cacr.caltech.edu/cluster2001/papers Proceedings The proceedings of CLUSTER 2001 will be published by IEEE Computer Society. The proceedings will also be made available online through the IEEE digital library following the conference. Panels/Tutorials/Exhibitions Proposals are solicited for special topics and panel sessions. These proposals must be submitted to the Program Chair: Thomas Sterling. Proposals for a half-day or a full-day tutorial related to the conference topics are encouraged and submit the same to tutorial chair: Ira Pramanick. For exhibitions, contact exbition chair: Rawn Shah. 
Conference Organization General Chairs: * Kai Hwang (University of Southern California, USA) * Mark Baker (Portsmouth University, UK) Vice General Chairs: * Rick Stevens (Argonne National Laboratory, USA) * Nalini Venkatasubramanian (University of California at Irvine, USA) Steering Committee: * Mark Baker (University of Portsmouth, UK) * Pete Beckman (Turbolinux, Inc., USA) * Bill Blake (Compaq, USA) * Rajkumar Buyya (Monash University, Australia) * Giovanni Chiola (DISI - Universita di Genova, Italy) * Jack Dongarra (University of Tennessee and ORNL, USA) * Geoffrey Fox (NPAC, Syracuse, USA) * Al Geist (ORNL, USA) * Kai Hwang (University of Southern California, USA) * Rusty Lusk (Argonne National Laboratory, USA) * Paul Messina (Caltech, USA) * Greg Pfister (IBM, USA) * Wolfgang Rehm (Technische Universit?t Chemnitz, Germany) * Thomas Sterling (JPL and Caltech, USA) * Rick Stevens (Argonne National Laboratory, USA) * Thomas Stricker (ETH Z?rich, Switzerland) * Barry Wilkinson (UNCC, USA) Technical Program Chair: Thomas Sterling (Caltech & NASA JPL, USA) Deputy Program Chair: Daniel S. Katz (NASA JPL, USA) Vice Program Chairs: * Gordon Bell (Microsoft Research, USA) * Dave Culler (University of California, Berkeley, USA) * Jack Dongarra (University of Tennessee, USA) * Jim Gray (Microsoft, USA) * Bill Gropp (Argonne National Laboratory, USA) * Ken Kennedy (Rice University, USA) * Dan Reed (UIUC, USA) * Chuck Seitz (Myricom Inc., USA) * Burton Smith (Cray Inc., USA) Program Committee Tutorial Chair: Ira Pramanick (Sun Microsystems, USA) Publications/Proceedings Co-Chairs: * Marcin Paprzycki (University of Southern Mississippi, USA) * Rajkumar Buyya (Monash University, Australia) Exhibition Chair: Rawn Shah (Sun World Journal, USA) Publicity Chair: Hai Jin (Huazhong University of Science and Technology, China) Poster Chair: Phil Merkey (Michigan Technical University, USA) Conference Venue: Sutton Place Hotel 4500 MacArthur Blvd. Newport Beach, California, 92660 USA Tel: 949-476-2001 Fax: 949-250-7191 Important Deadlines: Paper Submission March 12, 2001 Notification of Acceptance June 18, 2001 Camera Ready Papers July 9, 2001 Early Registration August 31, 2001 Tutorial/Exhibition/Panel Proposals June 11, 2001 Cluster2001 is in cooperation with the IEEE TC on Distributed Processing, IEEE TC on Parallel Processing, ACM SIG on Computer Architecture, Univ. of Portmouth, UK, Univ. of California, Berkeley, Rice Univ., Univ. of Illinois, Urbana-Champaaign, Univ. of Tennessee, Monash Univ., Australia,Technical University of Chemnitz, Germany, Huazhong University of Science and Technology, China, Argonne National Lab., NASA Jet Propulsion Lab., National Center for High-Performance Computing, Taiwan, Sun Microsystems, Cray, Compaq, IBM, Microsoft, and Myricom, etc. --------------------------------------------------------------------------- From bill at math.ucdavis.edu Thu Mar 8 04:13:44 2001 From: bill at math.ucdavis.edu (Bill Broadley) Date: Thu, 8 Mar 2001 04:13:44 -0800 Subject: AMD SMP/AMD interest in beowulf Message-ID: <20010308041344.A4411@sphere.math.ucdavis.edu> I just noticed at: http://www.theregister.co.uk/content/3/17389.html I thought it was interesting that they mention: AMD is taking a particular interest in the Beowulf clustering project ... and wish to recruit system integrators capable of giving this type of platform a go. I've been watching/hoping for athlon SMP for quite some time. 
-- Bill From carlos at nernet.unex.es Thu Mar 8 05:17:53 2001 From: carlos at nernet.unex.es (=?Windows-1252?Q?Carlos_J._Garc=EDa_Orellana?=) Date: Thu, 8 Mar 2001 14:17:53 +0100 Subject: Scyld and resolv library Message-ID: <005401c0a7d2$2ecd4870$7c12319e@unex.es> Hello, I'm trying to start DNS resolver in nodes. I've created a hosts, host.conf and resolv.conf files in nodes, but, DNS doesn't work. Not even, names in hosts file can be resolutes. Thanks. Carlos. From agrajag at linuxpower.org Thu Mar 8 05:27:13 2001 From: agrajag at linuxpower.org (Jag) Date: Thu, 8 Mar 2001 05:27:13 -0800 Subject: Scyld and resolv library In-Reply-To: <005401c0a7d2$2ecd4870$7c12319e@unex.es>; from carlos@nernet.unex.es on Thu, Mar 08, 2001 at 02:17:53PM +0100 References: <005401c0a7d2$2ecd4870$7c12319e@unex.es> Message-ID: <20010308052713.L7935@kotako.analogself.com> On Thu, 08 Mar 2001, Carlos J. Garc?a Orellana wrote: > Hello, > > I'm trying to start DNS resolver in nodes. > I've created a hosts, host.conf and resolv.conf files in nodes, but, DNS > doesn't work. > > Not even, names in hosts file can be resolutes. It's debatable wether letting nodes do this resolving is the 'right thing', however I will assume that you have an application that's messed up enough that it needs it. By default, Scyld slave nodes have their nsswitch.conf setup so that they only do host lookups through beonss (referenced as 'bproc' in the nsswitch.conf). This will resolve 'master' to the internal ip of the master node, 'self' to get your own IP, '.-1' to get the master node again, '.0' to get node zero, '.1' to get node one, and so on. It also reverses these so that the ip of a node (or the master node's internal IP address) will resolve to '.' like above. If you're just wanting smiliar functionality to this, then you don't have to worry about setting up dns and the hosts file. If you're wanting more than this, you're talking about things that theorectically shouldn't be on a beowulf cluster (like ipmasq on the head node to get out). However, for the /etc/hosts and dns resolution, I'll tell you that the file you need to also look at is /etc/nsswitch.conf Jag -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 232 bytes Desc: not available URL: From wasshub at ti.com Thu Mar 8 06:01:58 2001 From: wasshub at ti.com (Christoph Wasshuber) Date: Thu, 08 Mar 2001 08:01:58 -0600 Subject: job migration References: <008801c0a672$f1570b30$61064f89@cerulean.jpl.nasa.gov> Message-ID: <3AA790D6.20F25812@ti.com> Is it possible to do a crude job migration with some simple scripts? Or do I need to get one of the job migration packages like MOSIX, Scyld, ... chris.... From ddj at cascv.brown.edu Thu Mar 8 06:28:30 2001 From: ddj at cascv.brown.edu (Dave Johnson) Date: Thu, 8 Mar 2001 09:28:30 -0500 Subject: Scyld and resolv library In-Reply-To: <20010308052713.L7935@kotako.analogself.com>; from agrajag@linuxpower.org on Thu, Mar 08, 2001 at 05:27:13AM -0800 References: <005401c0a7d2$2ecd4870$7c12319e@unex.es> <20010308052713.L7935@kotako.analogself.com> Message-ID: <20010308092830.A3646@mookie.cis.brown.edu> One more thing that may bite bproc users/admins is the need for /etc/protocols if an application uses getproto{ent,byname,bynumber}, and possibly /etc/services to support getserv{ent,byname,bynumber}. I ran into this when running some of the netpipe benchmarks, which uses getprotobyname. 
Since the default behavior of nsswitch is to look in nis, then check the appropriate file in /etc if nis fails to respond, you don't have to change /etc/nsswitch.conf to get it to look at the /etc/protocols, services, networks, ethers, etc. -- ddj Dave Johnson Brown University TCASCV ddj at cascv.brown.edu On Thu, Mar 08, 2001 at 05:27:13AM -0800, Jag wrote: > On Thu, 08 Mar 2001, Carlos J. Garc?a Orellana wrote: > > > Hello, > > > > I'm trying to start DNS resolver in nodes. > > I've created a hosts, host.conf and resolv.conf files in nodes, but, DNS > > doesn't work. > > > > Not even, names in hosts file can be resolutes. > > It's debatable wether letting nodes do this resolving is the 'right > thing', however I will assume that you have an application that's messed > up enough that it needs it. > > By default, Scyld slave nodes have their nsswitch.conf setup so that > they only do host lookups through beonss (referenced as 'bproc' in the > nsswitch.conf). This will resolve 'master' to the internal ip of the > master node, 'self' to get your own IP, '.-1' to get the master node > again, '.0' to get node zero, '.1' to get node one, and so on. It also > reverses these so that the ip of a node (or the master node's internal > IP address) will resolve to '.' like above. If you're > just wanting smiliar functionality to this, then you don't have to worry > about setting up dns and the hosts file. > > If you're wanting more than this, you're talking about things that > theorectically shouldn't be on a beowulf cluster (like ipmasq on the > head node to get out). However, for the /etc/hosts and dns resolution, > I'll tell you that the file you need to also look at is > /etc/nsswitch.conf > > > Jag From newt at scyld.com Thu Mar 8 08:41:00 2001 From: newt at scyld.com (Daniel Ridge) Date: Thu, 8 Mar 2001 11:41:00 -0500 (EST) Subject: Scyld and resolv library In-Reply-To: <20010308052713.L7935@kotako.analogself.com> Message-ID: On Thu, 8 Mar 2001, Jag wrote: > On Thu, 08 Mar 2001, Carlos J. Garc?a Orellana wrote: > > Hello, > > I'm trying to start DNS resolver in nodes. > I've created > a hosts, host.conf and resolv.conf files in nodes, but, DNS > doesn't > work. > > Not even, names in hosts file can be resolutes. Jag's discussion about local name service on Scyld was exactly right. If you wish to create a permanently different nsswitch.conf, edit /usr/lib/beoboot/bin/node_up to generate the contents that you want. I believe this also means that the DNS resolver library has to be installed in /lib on each node (as the NSS libraries are usually linked in by dlopen() rather than by the dynamic linker). Regards, Dan Ridge Scyld Computing Corporation From newt at scyld.com Thu Mar 8 08:47:36 2001 From: newt at scyld.com (Daniel Ridge) Date: Thu, 8 Mar 2001 11:47:36 -0500 (EST) Subject: Scyld and resolv library In-Reply-To: <20010308092830.A3646@mookie.cis.brown.edu> Message-ID: On Thu, 8 Mar 2001, Dave Johnson wrote: > One more thing that may bite bproc users/admins is the need for > /etc/protocols if an application uses getproto{ent,byname,bynumber}, > and possibly /etc/services to support getserv{ent,byname,bynumber}. > I ran into this when running some of the netpipe benchmarks, which > uses getprotobyname. I may decide eventually to stub out more of these calls to provide answers to common questions. I worry about the slippery slope of /etc -- I think that the answer is 5 or 400. 
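To make Jag's and Dave's points concrete, here is a tiny check that can be run on a slave node (e.g. via bpsh) to see what beonss will resolve and whether getprotobyname() has an /etc/protocols to read. It uses only the stock socket module; treat it as a sketch of the lookups described above, not anything Scyld ships.

#!/usr/bin/env python
# Run on a slave node: show what beonss resolves ("master", "self", ".0", ...)
# and whether getprotobyname() works, which needs /etc/protocols on the node.
import socket

for name in ("master", "self", ".-1", ".0", ".1"):
    try:
        print("%s -> %s" % (name, socket.gethostbyname(name)))
    except socket.error:
        print("%s -> lookup failed" % name)

try:
    print("tcp -> protocol number %d" % socket.getprotobyname("tcp"))
except socket.error:
    print("tcp -> getprotobyname failed (no /etc/protocols on this node?)")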
Regards, Dan Ridge Scyld Computing Corporation From cosmik.debris at elec.canterbury.ac.nz Thu Mar 8 11:47:17 2001 From: cosmik.debris at elec.canterbury.ac.nz (Cosmik Debris) Date: Fri, 09 Mar 2001 08:47:17 +1300 Subject: job migration Message-ID: <74C3DBA1ACA54844B781615F22D0DB180C1A@claude.elec.canterbury.ac.nz> I don't think you can do process migration with scripts. For process migration one has to be able to "pick up" a running job and move it along with it's open file handles etc. Not an easy task. > -----Original Message----- > From: Christoph Wasshuber [mailto:wasshub at ti.com] > Sent: Friday, 9 March 2001 3:02 a.m. > To: Beowulf (E-mail) > Subject: job migration > > > Is it possible to do a crude job migration with some > simple scripts? Or do I need to get one of > the job migration packages like MOSIX, Scyld, ... > > chris.... > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) > visit http://www.beowulf.org/mailman/listinfo/beowulf > From newt at scyld.com Thu Mar 8 12:26:45 2001 From: newt at scyld.com (Daniel Ridge) Date: Thu, 8 Mar 2001 15:26:45 -0500 (EST) Subject: job migration In-Reply-To: <74C3DBA1ACA54844B781615F22D0DB180C1A@claude.elec.canterbury.ac.nz> Message-ID: On Fri, 9 Mar 2001, Cosmik Debris wrote: > I don't think you can do process migration with scripts. For process > migration one has to be able to "pick up" a running job and move it along > with it's open file handles etc. Not an easy task. Funny you should mention this.... We've got a little library we use internally. It's an LD_PRELOAD for bash that uses it's library constructor to set up a signal handler on SIGPWR. If you put the node you want to go to in the environment and send yourself SIGPWR, the current process moves to that node. This was done just for bash so that I could create bash bindings for bproc. :) I now have shell scripts that just wander around the cluster like worms doing various things. The even sicker part is that we wrapper execve() to do bproc_move()/bproc_execmove() so that bash can run subprograms even though none are installed locally. Regards, Dan Ridge Scyld Computing Corporation From agrajag at linuxpower.org Thu Mar 8 13:47:42 2001 From: agrajag at linuxpower.org (Jag) Date: Thu, 8 Mar 2001 13:47:42 -0800 Subject: job migration In-Reply-To: <3AA790D6.20F25812@ti.com>; from wasshub@ti.com on Thu, Mar 08, 2001 at 08:01:58AM -0600 References: <008801c0a672$f1570b30$61064f89@cerulean.jpl.nasa.gov> <3AA790D6.20F25812@ti.com> Message-ID: <20010308134742.M7935@kotako.analogself.com> On Thu, 08 Mar 2001, Christoph Wasshuber wrote: > Is it possible to do a crude job migration with some > simple scripts? Or do I need to get one of > the job migration packages like MOSIX, Scyld, ... Actually, I've written a script in Python that does just that for a Scyld system, crude job migration. It just rforks() to the remote node then does an exec() of the binary on that node (after twiddling with stdin/stdout some). This assumes that the program you want to run is on the remote node as well as any libraries it needs (I assumed the binary was on an nfs share, and that the libraries were already cached on the remote node). Of course to do this, you need BProc bindings for whatever scripting language you're using. So far I've seen bindings for Perl and Python released. Jag -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 232 bytes Desc: not available URL: From beowulf at chaka.net Thu Mar 8 14:08:32 2001 From: beowulf at chaka.net (Todd Chapman) Date: Thu, 8 Mar 2001 17:08:32 -0500 (EST) Subject: Question about BProc process migration/ Scyld. Message-ID: Hi, I have worked with Beowulf systems and MPI in the past, but am considering diving into Scyld's tools to help build a cluster for a large company. I'm a systems administrator, not a C programmer. After reading the BProc documentation I am a bit confused. It mentions that the process migration process is not transparent. Questions I have are: 1. Dou you need access the the program's source code to make modifications for process migration? 2. Or, can a wrapper be written that is migrated and spawns the real application? If there is any documentation that explains this more plainly than the Scyld documentation I would really be interested. Thanks. -Todd From newt at scyld.com Thu Mar 8 15:13:21 2001 From: newt at scyld.com (Daniel Ridge) Date: Thu, 8 Mar 2001 18:13:21 -0500 (EST) Subject: job migration In-Reply-To: <20010308134742.M7935@kotako.analogself.com> Message-ID: On Thu, 8 Mar 2001, Jag wrote: > On Thu, 08 Mar 2001, Christoph Wasshuber wrote: > > > Is it possible to do a crude job migration with some > > simple scripts? Or do I need to get one of > > the job migration packages like MOSIX, Scyld, ... > > Actually, I've written a script in Python that does just that for a > Scyld system, crude job migration. It just rforks() to the remote node > then does an exec() of the binary on that node (after twiddling with > stdin/stdout some). This assumes that the program you want to run is on > the remote node as well as any libraries it needs (I assumed the binary > was on an nfs share, and that the libraries were already cached on the > remote node). Why not just use fork()/bproc_execmove() ? You don't need then to have any of the binaries installed remotely. You can (if you're clever) even manage to get the dynamic link step to happen on the frontend. > Of course to do this, you need BProc bindings for > whatever scripting language you're using. So far I've seen bindings for > Perl and Python released. I'm actually looking for a responsible maintainer for the Perl bindings. I want basically nothing to do with Perl. :) Regards, Dan Ridge Scyld Computing From rgb at phy.duke.edu Thu Mar 8 15:20:37 2001 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu, 8 Mar 2001 18:20:37 -0500 (EST) Subject: job migration In-Reply-To: <74C3DBA1ACA54844B781615F22D0DB180C1A@claude.elec.canterbury.ac.nz> Message-ID: On Fri, 9 Mar 2001, Cosmik Debris wrote: > I don't think you can do process migration with scripts. For process > migration one has to be able to "pick up" a running job and move it along > with it's open file handles etc. Not an easy task. OTOH, if you're willing to write the jobs and scripts as a tuned pair, you can e.g. kill -USR1 pid from the script, trap the signal in the executable and e.g. write out a restartable checkpoint to an NFS shared file, and the rerun the command on a lightly loaded remote host with a flag that causes it to restart the process from the checkpoint file. Or you can get more sophisticated and have the process itself (upon receipt of the kill signal) start another copy of itself on a remote host, open a socket connection, transmit its current state and a begin command, and die. There are always ways to do it. 
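A bare-bones sketch of that "tuned pair": the worker below traps SIGUSR1, dumps a restartable checkpoint to a shared path, and exits, so a wrapper script can re-launch it on a quieter node with a restart flag. The checkpoint path, flag name and pickle format are purely illustrative -- a real code would write out whatever state it actually needs to resume.

#!/usr/bin/env python
# Worker half of the tuned pair described above: on SIGUSR1 it writes a
# restartable checkpoint to a shared (e.g. NFS) path and exits; the wrapper
# then re-launches it elsewhere with --restart.  Names here are illustrative.
import os, pickle, signal, sys

CKPT = "/nfs/scratch/worker.ckpt"        # assumed shared path
state = {"iteration": 0, "accum": 0.0}
dump_requested = []

def request_dump(signum, frame):
    dump_requested.append(signum)        # defer real work out of the handler

def main():
    if "--restart" in sys.argv and os.path.exists(CKPT):
        state.update(pickle.load(open(CKPT, "rb")))
    signal.signal(signal.SIGUSR1, request_dump)
    while state["iteration"] < 10 ** 9:  # stand-in for the real work loop
        state["iteration"] += 1
        state["accum"] += 1e-9
        if dump_requested:
            pickle.dump(state, open(CKPT, "wb"))
            sys.exit(0)                  # wrapper restarts us on another node

if __name__ == "__main__":
    main()

The script half is then just a kill -USR1 on the loaded host followed by re-running "worker --restart" on the idle one, which is exactly the sort of clunkiness described below.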
Whether one SHOULD do it rather than install MOSIX (which was born and bred to make this specific task utterly painless) is a question of how checkpointable your task is -- how easily it can write out a restartable state and restart from that state. If you are e.g. doing Monte Carlo and each thread just generates independent samples without any initial thermalization time, you may need NO initial data to migrate a task -- just kill one on the loaded host with a signal that forces it to write out any unflushed samples first and start a new one on the unloaded host. Then there are tasks with open sockets, with many megabytes of internal state data, with open files, that would be a nightmare to checkpoint restartably. Somewhere in between is a point of no (sane) return, especially with clear upper bounds on the work required to install MOSIX. Only you know if your job is pretty darn easy to migrate in this way would it be worth it. I've used this sort of method a few times about six or seven years ago so I know it works, but it is a bit clunky. rgb > > > -----Original Message----- > > From: Christoph Wasshuber [mailto:wasshub at ti.com] > > Sent: Friday, 9 March 2001 3:02 a.m. > > To: Beowulf (E-mail) > > Subject: job migration > > > > > > Is it possible to do a crude job migration with some > > simple scripts? Or do I need to get one of > > the job migration packages like MOSIX, Scyld, ... > > > > chris.... > > > > _______________________________________________ > > Beowulf mailing list, Beowulf at beowulf.org > > To change your subscription (digest mode or unsubscribe) > > visit http://www.beowulf.org/mailman/listinfo/beowulf > > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu From agrajag at linuxpower.org Thu Mar 8 16:17:48 2001 From: agrajag at linuxpower.org (Jag) Date: Thu, 8 Mar 2001 16:17:48 -0800 Subject: job migration In-Reply-To: ; from newt@scyld.com on Thu, Mar 08, 2001 at 06:13:21PM -0500 References: <20010308134742.M7935@kotako.analogself.com> Message-ID: <20010308161748.N7935@kotako.analogself.com> On Thu, 08 Mar 2001, Daniel Ridge wrote: > > Actually, I've written a script in Python that does just that for a > > Scyld system, crude job migration. It just rforks() to the remote node > > then does an exec() of the binary on that node (after twiddling with > > stdin/stdout some). This assumes that the program you want to run is on > > the remote node as well as any libraries it needs (I assumed the binary > > was on an nfs share, and that the libraries were already cached on the > > remote node). > > Why not just use fork()/bproc_execmove() ? You don't need then to have any > of the binaries installed remotely. You can (if you're clever) even > manage to get the dynamic link step to happen on the frontend. I didn't do that because I need to be able to redirect stdin/stdout/stderr. BProc can't forward open file descriptors, so I have to rfork, open the new files, use the dup2 magic to redirect stdin and stdout, then exec() the process. See my recent post on the bproc list if you actually want to see some code for that. > > > Of course to do this, you need BProc bindings for > > whatever scripting language you're using. 
So far I've seen bindings for > > Perl and Python released. > > I'm actually looking for a responsible maintainer for the Perl bindings. I > want basically nothing to do with Perl. :) I don't blame you. I'll just stick to my Python bindings :) Jag -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 232 bytes Desc: not available URL: From rgb at phy.duke.edu Thu Mar 8 19:01:27 2001 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu, 8 Mar 2001 22:01:27 -0500 (EST) Subject: Dual Athlon Message-ID: Dear Listvolken, I am very likely to have access to one of the very few dual Athlons in existence over the weekend, with the latest stepping even. I'm planning to run benchmarks. I'm taking orders/suggestions for Benchmarks You Would Like To See on Dual Athlons. Send 'em to me, with benchmark URL's if you know 'em, and I'll try to run them and will publish all results to the list when I'm done. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu From nashif at suse.de Fri Mar 9 00:34:38 2001 From: nashif at suse.de (nashif at suse.de) Date: Fri, 9 Mar 2001 09:34:38 +0100 (CET) Subject: job migration In-Reply-To: Message-ID: On Thu, 8 Mar 2001, Daniel Ridge wrote: > > Of course to do this, you need BProc bindings for > > whatever scripting language you're using. So far I've seen bindings for > > Perl and Python released. > > I'm actually looking for a responsible maintainer for the Perl bindings. I > want basically nothing to do with Perl. :) Maybe you should put it on sourceforge as a project:-) Anas > > Regards, > Dan Ridge > Scyld Computing -- Anas Nashif SuSE GmbH, Nuremberg, Germany Fone: +1 450 978 2382 Fax: +1 507 242 9604 From Eugene.Leitl at lrz.uni-muenchen.de Fri Mar 9 06:53:40 2001 From: Eugene.Leitl at lrz.uni-muenchen.de (Eugene Leitl) Date: Fri, 9 Mar 2001 15:53:40 +0100 (MET) Subject: Microsoft co-opts open source approach? (fwd) Message-ID: Sure, it will be a cold day in hell, but still worth watching. ---------- Forwarded message ---------- Date: Fri, 9 Mar 2001 09:19:13 -0500 From: Gary Pupurs To: FoRK at xent.com Subject: Microsoft co-opts open source approach? [Can't believe this didn't get FoRKed yet, but maybe it did and I missed it. if (wasForkedAlready) {gary.express.apologies;}] This is nothing short of earth-shattering (IMHO). Very interesting things going on within Microsoft these days. (More to the point, what in the world is going on in there?) I've heard several of the major developers are pushing internally to open source as much of the .NET Framework as possible when it's released, but this blows that idea out of the water by comparison... -g http://news.cnet.com/news/0-1003-201-5067896-0.html Commentary: Microsoft co-opts open source approach By Meta Group Special to CNET News.com March 8, 2001, 11:50 a.m. PT In a major extension of corporate policy, Microsoft has quietly started a program to provide selected large enterprise customers with copies of the source code for Windows 2000 (Professional, Server, Advanced Server and Data Center), Windows XP (released betas) and all related service packs. 
The standard agreement, which resembles those under which IBM has traditionally made source code of its operating systems available, allows customers to consult the Windows source code when debugging their own applications and to better integrate Windows with individual corporate environments. However, the agreement does not allow customers to modify or customize the code, and Microsoft anticipates that problems or bugs that customers may find in Windows will be reported to Microsoft for resolution through normal support channels. Microsoft lists the main benefits of the program to customers as follows: one, augmenting the ability to debug and optimize customers' internal applications; two, improving troubleshooting of deployed Windows environments; and three, increasing understanding of Windows to promote long-term success of the customer's organization. Microsoft says it has already released copies of source code for Windows 2000 and Windows XP to a few large clients as well as to academic institutions and large original equipment manufacturers (OEMs). Now it is formalizing this process, extending it to Global 2000 customers and making it routine. The company expects to offer source code to approximately 1,000 large users with enterprise-level agreements with Microsoft. This program will initially be available only to U.S.-based users, but it will eventually be available worldwide. A first step toward open source? We believe this is an important change in Microsoft policy. It may be the first step toward a new software development, distribution and business model similar to open source and designed to support the Internet-based environment of Microsoft's .Net platform. The greatest initial benefit to Microsoft from this move will be building trust with its largest customers. We tend to trust those who trust us, and releasing source code on the basis of mutual trust will encourage those clients to trust Microsoft. However, the long-term importance of this change is its impact on enabling the software industry (in particular, Microsoft) to leverage more Internet-style business and distribution models. The biggest problem the software industry currently has in leveraging the Internet is not technology--it's developing a viable business model. Software makers need to find ways to develop and test their products--particularly operating systems--that can work in the multitude of different environments that exist in an Internet-based infrastructure, while retaining ownership and thereby making money. For instance, the heavily hyped ASP (application service provider) model is now faltering because it had no way to provide the integration and services that companies need. The open-source movement harnessed thousands of users to do distributed development and testing over the huge number of components that this model demands, but software vendors did not control the products, and now they are struggling to find ways to make money. The power of distribution Microsoft is in the best position to succeed in this new software environment, and this change in its corporate policy is a good step in that direction. Although Microsoft is certainly a technology developer, its real power comes from its position as the best software distributor in the industry. The advantage of providing Windows source code is that Microsoft enlists tens of thousands of software professionals in 1,000 or more of its biggest and best customers to help it test its key operating systems in their unique environments. 
This will create a flood of bug fixes, improvements and extensions that will flow back to Microsoft to improve those products. In our opinion, the Windows source code will inevitably end up on the Web--within six months or less--where thousands more hackers will start working on it, exposing weaknesses. This will help Microsoft improve its products further until they are bulletproof. In effect, Microsoft is co-opting the open-source approach. It is essentially recruiting the technical staff of its largest customers (and potentially even the entire hacker community) to help it create improved versions of its software that only it will have the right to distribute. This becomes the vehicle that will drive the technical community to its new model for software development and distribution. While harnessing the power of an open-source-like strategy for Microsoft, the access agreements specifically do not permit customers to make any changes to the operating system source code themselves, so Microsoft retains full legal ownership of its products. This enables it to continue charging licensing fees and making money--something the open-source community has not been able to do. This gives Microsoft both the distribution channel and business model it needs to succeed with .Net. Evangelism, viral marketing and response time By making this policy change, Microsoft is facilitating an "evangelical" community of die-hard software engineers within global 2000 companies that value being more involved in contributing to future modifications and enhancements to Windows. Microsoft has exploited this technique for years across the developer community. Extending this type of "viral marketing" to its operating system is an effective tactic to counterbalance the community aspects of Linux. However, all this is true only if Microsoft formalizes the process of integrating into Windows suggestions supplied by third parties. At present, nothing public indicates it is prepared to do that. IBM, for instance, never did that for MVS. To produce open-source types of processes, Microsoft must do much more than just give away source code. It must give people a reason to contribute to Windows, and those people must have access to a process whereby their contributions are quickly and obviously included in Windows. A key measurement that will determine the success of this program is how fast Microsoft responds to the suggestions its customers send. If it takes Microsoft 18 months to implement them in a new version, this will only frustrate the people that identified the issues in the first place. Large corporate users that want access to the Microsoft source code should contact Microsoft about this program. It will immediately enable their internal developers and integrators to better understand how the Microsoft operating systems work, so they can optimize their systems accordingly. It also gives these users reassurance that nothing in the code is working against them. And it gives users the chance to identify extensions and fixes that they can pass back to Microsoft that ultimately will help them. Meta Group analysts William Zachmann, Peter Burris, David Cearley, Daniel Sholler, David Yockelson, Dale Kutnick, Jack Gold, Steve Kleynhans, Mike Gotta and Val Sribar contributed to this article. Visit Metagroup.com for more analysis of key IT and e-business issues. Entire contents, Copyright ? 2001 Meta Group, Inc. All rights reserved. 
From Eugene.Leitl at lrz.uni-muenchen.de Fri Mar 9 07:50:41 2001 From: Eugene.Leitl at lrz.uni-muenchen.de (Eugene Leitl) Date: Fri, 9 Mar 2001 16:50:41 +0100 (MET) Subject: DBases in very large RAMDisks Message-ID: In my current application, I have a purely static ~700 MBytes dbase, indices and all. It appeared to me that even without partitioning across machines, this would fit into a 1 GByte machine's RAMDisk (much cheaper and noticeably faster than a solid-state disk, I would imagine), and offer much better response times without changing a single line of code. A single machine could thus serve one or two orders of magnitude more queries, or far more complex (and hence forbiddingly expensive) queries. I'm sure somebody here has experience with such a setup; are there any gotchas? What is the end-user speedup to expect? What further speedup is typical if one bypasses the filesystem entirely and (the logical next step) operates on stuff loaded directly into memory? TIA, -- Eugene From JParker at coinstar.com Fri Mar 9 08:26:34 2001 From: JParker at coinstar.com (JParker at coinstar.com) Date: Fri, 9 Mar 2001 08:26:34 -0800 Subject: Microsoft co-opts open source approach? (fwd) Message-ID: G'Day ! The reports I saw said that they are releasing source to INTERNAL APPLICATIONS development teams within major customers, so that they may optimize their code for high availability. With all the NDAs in place. cheers, Jim Parker Sailboat racing is not a matter of life and death .... It is far more important than that !!! Eugene Leitl wrote: > Sure, it will be a cold day in hell, but still worth watching.
_______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From sjarczyk at wist.net.pl Fri Mar 9 08:35:50 2001 From: sjarczyk at wist.net.pl (Sergiusz Jarczyk) Date: Fri, 9 Mar 2001 17:35:50 +0100 (CET) Subject: DBases in very large RAMDisks In-Reply-To: Message-ID: Welcome. This topic was discussed on many lists several times, and the idea was always shot down by one simple question - what happens when the server crashes, or the power simply goes down? You can sync data from memory with data on disks, but if you do this over a short period, overall performance won't differ so much from "classical" implementations. Sergiusz From JParker at coinstar.com Fri Mar 9 08:38:47 2001 From: JParker at coinstar.com (JParker at coinstar.com) Date: Fri, 9 Mar 2001 08:38:47 -0800 Subject: DBases in very large RAMDisks Message-ID: G'Day ! I believe M$ used the concept in some large internal SQL databases using SAP around 5 years ago when I was contracting there. I do not know the exact details, or how successful they were, but it was much larger than the database you are talking about. cheers, Jim Parker Sailboat racing is not a matter of life and death .... It is far more important than that !!! Eugene Leitl wrote: > In my current application, I have a purely static ~700 MBytes dbase, indices and all. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From Eugene.Leitl at lrz.uni-muenchen.de Fri Mar 9 08:47:24 2001 From: Eugene.Leitl at lrz.uni-muenchen.de (Eugene Leitl) Date: Fri, 9 Mar 2001 17:47:24 +0100 (MET) Subject: DBases in very large RAMDisks In-Reply-To: Message-ID: On Fri, 9 Mar 2001, Sergiusz Jarczyk wrote: > This topic was discussed on many lists several times, and the idea was always shot down by one simple question - what happens when the server crashes, or the power simply goes down? Simple servers, especially ones written in high-level scripting languages with garbage collection (Python or Scheme), don't crash so easily. If the power goes down and no UPS is available, the server goes offline. It will boot up fast enough (LinuxBIOS can boot in ~10 s, and fsck-less file systems can be used to fill the RAMdisks). Nothing can happen to the database itself, as in my case the database is entirely static. > You can sync data from memory with data on disks, but if you do this over a short period, overall performance won't differ so much from "classical" implementations.
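[Editorial sketch, to make the "operate on stuff loaded directly into memory" option concrete: a minimal C program that maps a static, read-only data file with mmap(2) and touches every page once so the whole file sits in the page cache; after that, lookups are plain pointer arithmetic and never wait on the disk. The file path is a hypothetical placeholder and error handling is kept to a bare minimum; this shows the general technique only, not any particular database engine.]

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    const char *path = "/data/static.db";       /* hypothetical data file */
    int fd = open(path, O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }

    /* Map the whole file read-only; the kernel serves the pages from RAM
       (the page cache) once they have been faulted in. */
    unsigned char *base = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    if (base == MAP_FAILED) { perror("mmap"); return 1; }
    close(fd);                                   /* the mapping stays valid */

    /* Touch one byte per page to pre-fault the entire file into memory,
       so later queries never block on disk reads. */
    volatile unsigned long sum = 0;
    long pagesize = sysconf(_SC_PAGESIZE);
    for (off_t i = 0; i < st.st_size; i += pagesize)
        sum += base[i];

    /* From here on, lookups are pointer arithmetic into 'base'. */
    printf("mapped and pre-faulted %ld bytes at %p\n",
           (long)st.st_size, (void *)base);

    munmap(base, st.st_size);
    return 0;
}

[The same effect can be had without changing any code by simply reading the file once at boot so it lands in the page cache, which is essentially the preloading approach discussed in the replies that follow.]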
From zarquon at zarq.dhs.org Fri Mar 9 09:11:25 2001 From: zarquon at zarq.dhs.org (R C) Date: Fri, 9 Mar 2001 12:11:25 -0500 Subject: Microsoft co-opts open source approach? (fwd) In-Reply-To: ; from JParker@coinstar.com on Fri, Mar 09, 2001 at 08:26:34AM -0800 References: Message-ID: <20010309121125.A1417@zarq.dhs.org> On Fri, Mar 09, 2001 at 08:26:34AM -0800, JParker at coinstar.com wrote: > > G'Day ! > > The reports I saw said that they are releaseing source to INTERNAL APPLICATIONS > developement teams, within major customers, so that they may optimize thier code > for high avability. With all the NDA's in place. There was a largish thread on this in linux-kernel. MS has been doing this for a while, they're just opening it up a bit more (customers with 1500+ licenses). You don't get any special support treatment on support (well, aside from any special treatment 1500+ licenses gets you.) No modifications, strict NDA. The idea, as stated above, is to let app developers see the internals and figure out what's really happening (because most windows programmers have been bit by undocumented or wrongly documented behavior at some point.) One person claimed it wouldn't help much, as the code is extremely difficult to read, with a very confusing exceptions model. R C From newt at scyld.com Fri Mar 9 09:34:09 2001 From: newt at scyld.com (Daniel Ridge) Date: Fri, 9 Mar 2001 12:34:09 -0500 (EST) Subject: Question about BProc process migration/ Scyld. In-Reply-To: Message-ID: On Thu, 8 Mar 2001, Todd Chapman wrote: > 1. Dou you need access the the program's source code to make modifications > for process migration? Yes, no, and maybe. If you want for a regular program to be able to spawn a remote child where it would normally spawn a local child, you need to be able to either mess with the dynamic libraries it sees (LD_PRELOAD or similar) or alter the source code. If you want to be able to spawn an unmodified program on a remote node, the Scyld distribution make this easy. You can use the existing program (bpsh) which acts like rsh, but uses our system. You can also program to the BProc library directly and use the bproc_execmove() call. > 2. Or, can a wrapper be written that is migrated and spawns the real > application? Yes. This is the 'bpsh' wrapper described above. > If there is any documentation that explains this more plainly than the > Scyld documentation I would really be interested. Which documentation do you mean? The documentation under /usr/doc/beowulf-doc-XXX contains everything I have just mentioned. Regards, Dan Ridge Scyld Computing Corporation From rgb at phy.duke.edu Fri Mar 9 09:53:00 2001 From: rgb at phy.duke.edu (Robert G. Brown) Date: Fri, 9 Mar 2001 12:53:00 -0500 (EST) Subject: DBases in very large RAMDisks In-Reply-To: Message-ID: On Fri, 9 Mar 2001, Sergiusz Jarczyk wrote: > Welcome > This topic was discussed on many lists several times, and always ideas was > crashed by one simple question - what happen when server crashes, or > simply power goes down ? You can syncing data from memory with data on > disks, but if you'll doing this in short period, overall performance won't > be differ so much from "classical" implementations. The same problem exists even if you leave the data on disk. A lot of disk I/O on Unixoid systems is buffered and cached in both directions anyway by the operating system, and only is guaranteed to be written to disk by e.g. the sync command or read from the disk if the file has been altered or touched after its last read. 
If some particular part of the DB is most commonly accessed, it will likely find its way into cache on any sufficiently large-memory system and queries will be answered out of cache instead of off the disk anyway. I think that cache functions like a FIFO and that pages persist until the space is recovered for other purposes (e.g. mallocs or to run processes or to load active pages from running processes), but I'm not absolutely certain. One interesting question is therefore how much performance benefit you'd actually gain by running from a ramdisk vs letting the OS cache the disk (and running effectively from a ramdisk) but also letting the OS handle items such as sync'ing the VFS with the actual file on disk. You might do just as well by buying/building a system with 1-2 GB of memory (lots of room for cache and buffers) and running only the DB application plus the OS plus (perhaps) an application that at boot time "reads" the entire DB as a file image, effectively preloading the disk cache. There might also be some way of tuning the OS to cache the file pages more aggressively or to increase the size of its tables of pages so that it can cache the entire DB at once, but I don't really know what those limits are in linux and it may not be necessary. rgb > > Sergiusz > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu From newt at scyld.com Fri Mar 9 10:21:47 2001 From: newt at scyld.com (Daniel Ridge) Date: Fri, 9 Mar 2001 13:21:47 -0500 (EST) Subject: job migration In-Reply-To: <20010308161748.N7935@kotako.analogself.com> Message-ID: On Thu, 8 Mar 2001, Jag wrote: > On Thu, 08 Mar 2001, Daniel Ridge wrote: > > Why not just use fork()/bproc_execmove() ? You don't need then to have any > > of the binaries installed remotely. You can (if you're clever) even > > manage to get the dynamic link step to happen on the frontend. > > I didn't do that because I need to be able to redirect > stdin/stdout/stderr. BProc can't forward open file descriptors, so I > have to rfork, open the new files, use the dup2 magic to redirect stdin > and stdout, then exec() the process. See my recent post on the bproc > list if you actually want to see some code for that. bpsh knows how to do this already. Would it be more helpful if I made bpsh available as libbpsh? Regards, Dan Ridge Scyld Computing Corporation From whitney at math.berkeley.edu Fri Mar 9 10:30:16 2001 From: whitney at math.berkeley.edu (Wayne Whitney) Date: Fri, 9 Mar 2001 10:30:16 -0800 (PST) Subject: 8 DIMM Slot PIII/Athlon Motherboard ? Message-ID: Hi All, I'm looking for a PIII/Athlon motherboard that has 8 DIMM slots and handles 32x8 (high-density) 512MB DIMMs. Does anyone know of one? On the PIII side, dual SMP is preferable, but single PIII would be OK. My goal is to add a 4GB main RAM machine to my little cluster on the cheap. 8 $160 32x8 512MB DIMMs cost $1280, while 4 $650 16x16 1GB DIMMs cost $2600. I believe the Serverworks HE chipset can handle 8 DIMM slots, as it does 2-way interleaving, so it can do 4 DIMM slots on each channel. However, the motherboards I've seen with this chipset typically have just 4 DIMM slots. Moreover, I don't know if this chipset handles 32x8 512MB DIMMs. 
Any pointers would be appreciated. Cheers, Wayne From simen-tt at online.no Fri Mar 9 12:43:52 2001 From: simen-tt at online.no (Simen Thoresen) Date: Fri, 09 Mar 2001 21:43:52 +0100 Subject: 8 DIMM Slot PIII/Athlon Motherboard ? In-Reply-To: References: Message-ID: <200103092143520078.1951531D@scispor.dolphinics.no> >Hi All, > >I'm looking for a PIII/Athlon motherboard that has 8 DIMM slots and >handles 32x8 (high-density) 512MB DIMMs. Does anyone know of one? On the >PIII side, dual SMP is preferable, but single PIII would be OK. > >My goal is to add a 4GB main RAM machine to my little cluster on the >cheap. 8 $160 32x8 512MB DIMMs cost $1280, while 4 $650 16x16 1GB DIMMs >cost $2600. > >I believe the Serverworks HE chipset can handle 8 DIMM slots, as it does >2-way interleaving, so it can do 4 DIMM slots on each channel. However, >the motherboards I've seen with this chipset typically have just 4 DIMM >slots. Moreover, I don't know if this chipset handles 32x8 512MB DIMMs. > The Tyan Thunder 2500 is the only ServerWorks HE (not HE-Sl) board that I know of that is 'easily' available. It features 8 DIMM slots (for up to 8GB ram), You'll need registered ECC memory for this one, tho. http://www.tyan.com/products/html/thunder2500_p.html -S -- Simen Thoresen, Beowulf-cleaner and random artist - close and personal. Er det ikke rart? The gnu RART-project on http://valinor.dolphinics.no:1080/~simentt/rart From sam at venturatech.com Fri Mar 9 13:21:33 2001 From: sam at venturatech.com (sam at venturatech.com) Date: Fri, 09 Mar 2001 13:21:33 -0800 Subject: Beowulf digest, Vol 1 #312 - 3 msgs Message-ID: Thank you for your email. I'll be out of the office on 03/09 and 03/12 with no access to email. If you need a response today please send an email to Daryl Newton at - daryl at venturatech.com or call him at 760-597-9800 X10. Thank you for your continued business. Sam Lewis From JParker at coinstar.com Fri Mar 9 16:48:51 2001 From: JParker at coinstar.com (JParker at coinstar.com) Date: Fri, 9 Mar 2001 16:48:51 -0800 Subject: channel bonding Message-ID: G'Day ! I have been following the discussion(s) with some interest. Now I want to know more, but I don't know which manual to read ... any suggestions, especially as applied to a cluster ? cheers, Jim Parker Sailboat racing is not a matter of life and death .... It is far more important than that !!! From rlatham at plogic.com Fri Mar 9 17:29:56 2001 From: rlatham at plogic.com (Rob Latham) Date: Fri, 9 Mar 2001 20:29:56 -0500 Subject: channel bonding In-Reply-To: ; from JParker@coinstar.com on Fri, Mar 09, 2001 at 04:48:51PM -0800 References: Message-ID: <20010309202956.D8545@otto.plogic.internal> On Fri, Mar 09, 2001 at 04:48:51PM -0800, JParker at coinstar.com wrote: > G'Day ! > > I have been following the discussion(s) with some interest. Now I > want to know more, but I don't know which manual to read ... any > suggestions, especially as applied to a cluster ? you won't find much better discussion about the merrits and implementation details of channel bonding than this list. as for 'the manual', i guess that'd be bonding.txt in the kernel source: http://lxr.linux.no/source/Documentation/networking/bonding.txt?v=2.2.18 ==rob -- [ Rob Latham Developer, Admin, Alchemist ] [ Paralogic Inc. 
- www.plogic.com ] [ ] [ EAE8 DE90 85BB 526F 3181 1FCF 51C4 B6CB 08CC 0897 ] From rlatham at plogic.com Fri Mar 9 17:31:38 2001 From: rlatham at plogic.com (Rob Latham) Date: Fri, 9 Mar 2001 20:31:38 -0500 Subject: Dual Athlon In-Reply-To: ; from rgb@phy.duke.edu on Thu, Mar 08, 2001 at 10:01:27PM -0500 References: Message-ID: <20010309203138.E8545@otto.plogic.internal> On Thu, Mar 08, 2001 at 10:01:27PM -0500, Robert G. Brown wrote: > Dear Listvolken, > > I'm taking orders/suggestions for Benchmarks You > Would Like To See on Dual Athlons. Send 'em to me, with benchmark URL's > if you know 'em, and I'll try to run them and will publish all results > to the list when I'm done. Since the only /true/ benchmark is "your code", does that mean you just offered free cpu time to the entire beowulf community? ==rob -- [ Rob Latham Developer, Admin, Alchemist ] [ Paralogic Inc. - www.plogic.com ] [ ] [ EAE8 DE90 85BB 526F 3181 1FCF 51C4 B6CB 08CC 0897 ] From RSchilling at affiliatedhealth.org Fri Mar 9 17:44:37 2001 From: RSchilling at affiliatedhealth.org (Schilling, Richard) Date: Fri, 9 Mar 2001 17:44:37 -0800 Subject: Sequent 2000 Message-ID: <51FCCCF0C130D211BE550008C724149EBE1102@mail1.affiliatedhealth.org> I just picked up a Sequent 2000/290 for a pretty good price, and the documentation speaks of clustering. Does anyone have any experience clustering with the Sequent boxes? Happy to share experiences. Thanks. Richard Schilling Webmaster / Web Integration Programmer Affiliated Health Services Mount Vernon, WA http://www.affiliatedhealth.org From tlovie at pokey.mine.nu Fri Mar 9 18:31:49 2001 From: tlovie at pokey.mine.nu (Thomas Lovie) Date: Fri, 9 Mar 2001 21:31:49 -0500 Subject: DBases in very large RAMDisks In-Reply-To: Message-ID: Many of the commercial databases already effectively do this. The database engine knows what pages to read and write to disk, and it will cache information in RAM up to the total resources it is allowed to use. Initial reads will be from disk, but subsequent ones will use the information in RAM. In addition the OS will provide another level of cache, but it will generally not be as good as the one inside the database. I can't comment on any of the low cost databases available for Linux, but commercial solutions like Sybase and Oracle would definitely have this level of sophistication built in. As usual, the performance gains that you could see would depend on the application. If your database was a few large tables, that couldn't all fit into RAM, perhaps you would see significant gains just by making more resources available. However, if your database was many small tables which couldn't all fit into RAM either, but your queries only operated on a subset of the tables that could fit into RAM, then the performance gains may not be as significant. One other point to make is that if your database has full transactional support, then there could potentially be a bottleneck. The transaction log is a write-ahead log, as in the database writes what it intends to do, all the way to disk, then does that operation on the database (usually in RAM) then writes that it was successful to the transaction log, then sync's the database to disk when it has free time. So if your application has *alot* of small transactions, there may be a performace issue here. RAM is cheap, why don't you try it? Tom Lovie. 
-----Original Message----- From: beowulf-admin at beowulf.org [mailto:beowulf-admin at beowulf.org]On Behalf Of Eugene Leitl Sent: Friday, March 09, 2001 10:51 AM To: beowulf at beowulf.org Subject: DBases in very large RAMDisks In my current application, I have a purely static ~700 MBytes dbase, indices and all. It appeared to me, that even without partitioning across machines, this would fit into a 1 GByte machine's RAMDisk (much cheaper and noticeably faster than a solid-state disk, I would imagine), and offer much better reponse times without changing a single line of code. A single machine could thus serve one or two orders of magnitude more queries, or far more complex (and hence forbiddingly expensive) queries. I'm sure somebody here has experiences with such a setup, are there any gotchas? What is the end-user speedup to expect? What is further speedup typically, if one bypasses the filesystem entirely, and (the logical next step) operates on stuff loaded directly into memory? TIA, -- Eugene _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From agrajag at linuxpower.org Fri Mar 9 20:57:40 2001 From: agrajag at linuxpower.org (Jag) Date: Fri, 9 Mar 2001 20:57:40 -0800 Subject: job migration In-Reply-To: ; from newt@scyld.com on Fri, Mar 09, 2001 at 01:21:47PM -0500 References: <20010308161748.N7935@kotako.analogself.com> Message-ID: <20010309205740.O7935@kotako.analogself.com> On Fri, 09 Mar 2001, Daniel Ridge wrote: > On Thu, 8 Mar 2001, Jag wrote: > > > On Thu, 08 Mar 2001, Daniel Ridge wrote: > > > > Why not just use fork()/bproc_execmove() ? You don't need then to have any > > > of the binaries installed remotely. You can (if you're clever) even > > > manage to get the dynamic link step to happen on the frontend. > > > > I didn't do that because I need to be able to redirect > > stdin/stdout/stderr. BProc can't forward open file descriptors, so I > > have to rfork, open the new files, use the dup2 magic to redirect stdin > > and stdout, then exec() the process. See my recent post on the bproc > > list if you actually want to see some code for that. > > bpsh knows how to do this already. Would it be more helpful if I made bpsh > available as libbpsh? I looked over bpsh, but couldn't find much that it does in the way of IO redirection except forwarding stdin and stdout over the network. Would libbpsh do something like bproc_execmove(), except you also give it fd's for stdin, stdout, and stderr, where the fd's you give it are fd's on the master node that the exec()'ed process uses transparently over the network? If so, then I think its something that would be quite useful for any kind of program that wants to propagate jobs to the slave nodes. Out of curiosity, what overhead is there for this? Is it just an extra process on the master node used for processing all the IO requests? Jag -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 232 bytes Desc: not available URL: From rgb at phy.duke.edu Sat Mar 10 07:32:58 2001 From: rgb at phy.duke.edu (Robert G. Brown) Date: Sat, 10 Mar 2001 10:32:58 -0500 (EST) Subject: Dual Athlon In-Reply-To: <20010309203138.E8545@otto.plogic.internal> Message-ID: On Fri, 9 Mar 2001, Rob Latham wrote: > On Thu, Mar 08, 2001 at 10:01:27PM -0500, Robert G. 
Brown wrote: > > Dear Listvolken, > > > > I'm taking orders/suggestions for Benchmarks You > > Would Like To See on Dual Athlons. Send 'em to me, with benchmark URL's > > if you know 'em, and I'll try to run them and will publish all results > > to the list when I'm done. > > Since the only /true/ benchmark is "your code", does that mean you > just offered free cpu time to the entire beowulf community? :-) Very funny;-) If only I could. But it's Not My Box, just a short-term loaner. So I'm afraid that however useless they are, traditional benchmarks will have to do for the moment. Now, if I can just convince AMD to give me a few of those pre-release dual suckers to "test" in a leedle beowulf...;-) rgb > > ==rob > > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu From sam at venturatech.com Sat Mar 10 09:03:55 2001 From: sam at venturatech.com (sam at venturatech.com) Date: Sat, 10 Mar 2001 09:03:55 -0800 Subject: Beowulf digest, Vol 1 #313 - 14 msgs Message-ID: Thank you for your email. I'll be out of the office on 03/09 and 03/12 with no access to email. If you need a response today please send an email to Daryl Newton at - daryl at venturatech.com or call him at 760-597-9800 X10. Thank you for your continued business. Sam Lewis From edwards at icantbelieveimdoingthis.com Sat Mar 10 23:38:47 2001 From: edwards at icantbelieveimdoingthis.com (Arthur H. Edwards,1,505-853-6042,505-256-0834) Date: Sun, 11 Mar 2001 00:38:47 -0700 Subject: First errors Message-ID: <3AAB2B87.3000105@icantbelieveimdoingthis.com> I've just installed scyld on a small cluster. I started trying to run the test problems. pi3.f will run if I use mpirun -np 1 pi3 but any larger value for np generates the following error. p0_20404: p4_error: net_create_slave: bproc_rfork: -1 p4_error: latest msg from perror: Invalid argument bm_list_20405: p4_error: interrupt SIGINT: 2 Many of the standard checks seem to work on the cluster: bpsh -a uptime gives me 11:38pm up 1 day, 12:40, 0 users, load average: 0.00, 0.00, 0.00 11:38pm up 1 day, 12:37, 0 users, load average: 0.00, 0.00, 0.00 11:38pm up 1 day, 12:37, 0 users, load average: 0.00, 0.00, 0.00 11:38pm up 1 day, 12:37, 0 users, load average: 0.00, 0.00, 0.00 11:38pm up 1 day, 12:33, 0 users, load average: 0.00, 0.00, 0.00 11:38pm up 1 day, 12:35, 0 users, load average: 0.00, 0.00, 0.00 11:38pm up 1 day, 12:34, 0 users, load average: 0.00, 0.00, 0.00 Any insight would be greatly appreciated. Art Edwards From tekka99 at libero.it Sun Mar 11 02:54:07 2001 From: tekka99 at libero.it (Gianluca Cecchi) Date: Sun, 11 Mar 2001 11:54:07 +0100 Subject: 8 DIMM Slot PIII/Athlon Motherboard ? References: <200103092143520078.1951531D@scispor.dolphinics.no> Message-ID: <001e01c0aa19$99576600$5df31d97@W2KCECCHI1> > >8 $160 32x8 512MB DIMMs cost $1280, while 4 $650 16x16 1GB DIMMs > >cost $2600. Any pointer to where to buy for these proces true good memories? Thanks Bye, Gianluca Cecchi From mathboy at velocet.ca Mon Mar 5 21:35:51 2001 From: mathboy at velocet.ca (Velocet) Date: Tue, 6 Mar 2001 00:35:51 -0500 Subject: high physical density cluster design - power/heat/rf questions Message-ID: <20010306003551.K84763@velocet.ca> I have some questions about a cluster we're designing. We really need a relatively high density configuration here, in terms of floor space. 
To be able to do this I have found out pricing on some socket A boards with onboard NICs and video (dont need video though). We arent doing anything massively parallel right now (just running Gaussian/Jaguar/MPQC calculations) so we dont need major bandwidth.* We're booting with root filesystem over NFS on these boards. Havent decided on FreeBSD or Linux yet. (This email isnt about software config, but feel free to ask questions). (* even with NFS disk we're looking at using MFS on freebsd (or possibly the new md system) or the new nbd on linux or equivalent for gaussian's scratch files - oodles faster than disk, and in our case, with no disk, it writes across the network only when required. Various tricks we can do here.) The boards we're using are PC Chip M810 boards (www.pcchips.com). Linux seems fine with the NIC on board (SiS chip of some kind - Ben LaHaise of redhat is working with me on some of the design and has been testing it for Linux, I have yet to play with freebsd on it). The configuration we're looking at to achieve high physical density is something like this: NIC and Video connectors / ------------=-------------- board upside down | cpu | = | RAM | |-----| |_________| |hsync| | | --fan-- --fan-- | | _________ |hsync| | | |-----| | RAM | = | cpu | -------------=------------- board right side up as you can see the boards kind of mesh together to take up less space. At micro ATX factor (9.25" I think per side) and about 2.5 or 3" high for the CPU+Sync+fan (tallest) and 1" tall for the ram or less, I can stack two of these into 7" (4U). At 9.25" per side, 2 wide inside a cabinet gives me 4 boards per 4U in a standard 24" rack footprint. If I go 2 deep as well (ie 2x2 config), then for every 4U I can get 16 boards in. The cost for this is amazing, some $405 CDN right now for Duron 800s with 128Mb of RAM each without the power supply (see below; standard ATX power is $30 CDN/machine). For $30000 you can get a large ass-load of machines ;) Obviously this is pretty ambitious. I heard talk of some people doing something like this, with the same physical confirguration and cabinet construction, on the list. Wondering what your experiences have been. Problem 1 """"""""" The problem is in the diagram above, the upside down board has another board .5" above it - are these two boards going to leak RF like mad and interefere with eachothers' operations? I assume there's not much to do there but to put a layer of grounded (to the cabinet) metal in between. This will drive up the cabinet construction costs. I'd rather avoid this if possible. Our original construction was going to be copper pipe and plexiglass sheeting, but we're not sure that this will be viable for something that could be rather tall in our future revisions of our model. Then again, copper pipe can be bolted to our (cement) ceiling and floor for support. For a small model that Ben LaHaise built, check the pix at http://trooper.velocet.ca/~mathboy/giocomms/images Its quick a hack, try not to laugh. It does engender the 'do it damn cheap' mentality we're operating with here. The boards are designed to slide out the front once the power and network are disconnected. An alternate construction we're considering is sheet metal cutting and folding, but at much higher cost. Problem 2 - Heat Dissipation """""""""""""""""""""""""""" The other problem we're going to have is heat. We're going to need to build our cabinet such that its relatively sealed, except at front, so we can get some coherent airflow in between boards. 
I am thinking we're going to need to mount extra fans on the back (this is going to make the 2x2 design a bit more tricky, but at only 64 odd machines we can go with 2x1 config instead, 2 stacks of 32, just 16U high). I dont know what you can suggest here, its all going to depend on physical configuration. The machine is housed in a proper environment (Datavaults.com's facilities, where I work :) thats climate controlled, but the inside of the cabinet will still need massive airflow, even with the room at 68F. Problem 3 - Power """"""""""""""""" The power density here is going to be high. I need to mount 64 power supplies in close proximity to the boards, another reason I might need to maintain the 2x1 instead of 2x2 design. (2x1 allows easier access too). We dont really wanna pull that many power outlets into the room - I dont know what a diskless Duron800 board with 256Mb or 512Mb ram will use, though I guess around .75 to 1 A. Im gonna need 3 or 4 full circuits in the room (not too bad actually). However thats alot of weight on the cabinet to hold 60 odd power supplies, not to mention the weight of the cables themselves weighing down on it, and a huge mess of them to boot. I am wondering if someone has a reliable way of wiring together multiple boards per power supply? Whats the max density per supply? Can we go with redundant power supplies, like N+1? We dont need that much reliability (jobs are short, run on one machine and can be restarted elsewhere), but I am really looking for something thats going to reduce the cabling. As well, I am hoping there is some economy of power converted here - a big supply will hopefully convert power for multiple boards more efficiently than a single supply per board. However, as always, the main concern is cost. Any help or ideas is appreciated. /kc -- Ken Chase, math at velocet.ca * Velocet Communications Inc. * Toronto, CANADA From drh at niptron.com Tue Mar 6 14:05:45 2001 From: drh at niptron.com (D. R. Holsbeck) Date: Tue, 06 Mar 2001 16:05:45 -0600 Subject: sk98lin gigabit driver Message-ID: <3AA55F39.BCA9C8E@niptron.com> Is anyone using a sysconnect card? I built the module and all loads fine. But when I bring up the interface keeps going up and down. Stateing that the network is not connected. Im using a 2.2.16 kernel with the stock module. Any suggestions would be greatly appreciated. -- drh at niptron.com Laugh at your problems; everybody else does. From drh at niptron.com Tue Mar 6 09:03:48 2001 From: drh at niptron.com (D. R. Holsbeck) Date: Tue, 06 Mar 2001 11:03:48 -0600 Subject: redhat 7.0 upgrade woes References: Message-ID: <3AA51874.488827EF@niptron.com> We use the same system(kickstart nfs). But we have only done new installs. But the output is on the serial port. Never had any problems with it freezing up, that is it has froze in both video and serial mode, the same way. So I dont consider it a serial issue. I think that kickstart itself freezes up from time to time. Jeffrey Oishi wrote: > > Hi-- > > I'm trying to upgrade a 130 node cluster of machines with no video cards > from RH6.1 to RH7.0. I have created an nfs-method kickstart system that > works--if a video card is in the machine. If not, and I add > console=ttyS0,9600n8 to the SYSLINUX.CFG, then the installer runs ok and > starts spitting stuff out the serial port. However, the installer then > crashes right before it starts upgrading the packages. It happily works up > until the standard redhat screen showing each of the packages zipping by > comes up. There it hangs. 
This has happened on a number of boxes. > > Does anyone have any idea if the install program will even work with the > console on a serial port? > > If this doesn't work soon, I'm just going to reclone all the drives... > > thanks, > > j -- drh at niptron.com Laugh at your problems; everybody else does. From hoyte at hemlock.colorado.edu Fri Mar 9 16:29:48 2001 From: hoyte at hemlock.colorado.edu (Eric Hoyt) Date: Fri, 9 Mar 2001 17:29:48 -0700 (MST) Subject: ECC memory In-Reply-To: <200103091632.LAA24401@blueraja.scyld.com> Message-ID: Hi everyone, I posted a message a few weeks back as to my dilemma of choosing between Intel and AMD chips. A belated thanks to all those who replied. After deciding to go with Athlons and the ABIT KT7A motherboard, I came across a new problem. We'd like to use ECC memory, but VIA 133/133A based motherboards don't seem to have ECC support. I've seen all sorts of conflicting accounts, but it seems to be the case that there is no ECC support with these boards. The odd thing is, I've seen several companies selling their commercial Beowulf systems with KT7(A) mobos and ECC memory. Since non-ECC motherboards can sometimes use ECC memory with the ECC functionality disabled, are these companies just fooling people by selling ECC memory along with boards that can't use ECC? For me, this also brings up the question of how ECC works. Does the DRAM module perform the error correction, or does the chipset perform error correction? From some research and common sense, it seems the chipset does it - otherwise, what's the point of bringing the extra 8 ECC lines off of the memory chip? Finally, have people found ECC to even be necessary? From some of the stats we've seen, it seems like ECC is the way to go. But it also seems like there are a lot of systems out there running reliably without ECC memory. Any experiences people have had with and without ECC would be helpful. Thanks again for any help and ideas. Eric Hoyt From alvin at Iplink.net Sat Mar 10 20:13:52 2001 From: alvin at Iplink.net (alvin) Date: Sat, 10 Mar 2001 23:13:52 -0500 Subject: DBases in very large RAMDisks References: Message-ID: <3AAAFB80.42C884F7@Iplink.net> Sergiusz Jarczyk wrote: > Welcome. This topic was discussed on many lists several times, and the idea was always shot down by one simple question - what happens when the server crashes, or the power simply goes down? > You can sync data from memory with data on disks, but if you do this over a short period, overall performance won't differ so much from "classical" implementations. This is a generalization and as such misses many useful places for databases using a ramdisk. A little while ago I was working on a project where a database was being used for session management. The problem we had was that the database, although very small, was extremely active. The database was kept small by moving the information for closed sessions out to a session history database. In this case, if the server crashed, all the sessions were lost and the database would get reinitialized in either case. By using a ramdisk the system performance was greatly improved. The issue is persistence. If the database does not have to be persistent, then keeping it in a ramdisk can provide a serious performance improvement.
-- Alvin Starr || voice: (416)585-9971 Interlink Connectivity || fax: (416)585-9974 alvin at iplink.net || From whitney at math.berkeley.edu Thu Mar 8 15:13:26 2001 From: whitney at math.berkeley.edu (Wayne Whitney) Date: Thu, 8 Mar 2001 15:13:26 -0800 (PST) Subject: 8 DIMM Slot PIII Motherboard? Message-ID: Hi All, I'm looking for a PIII motherboard that has 8 DIMM slots and handles 32x8 (high-density) 512MB DIMMs. Does anyone know of one? Dual PIII is prefereable, but single PIII would be OK. My goal is to assemble a 4GB main RAM machine on the cheap, using 8 $160 32x8 512MB DIMMs, rather than 4 $600-$700 16x16 1GB DIMMs. According to the specifications of the Serverworks HE chipset, it can handle 8 DIMM slots, as it does interleaving, so it can do 4 DIMM slots on each channel. However, the motherboards I've seen with this chipset typically have just 4 DIMM slots. Moreover, I don't know if this chipset handles 32x8 512MB DIMMs. Any pointers would be appreciated. Cheers, Wayne From jcandy1 at san.rr.com Thu Mar 8 22:29:30 2001 From: jcandy1 at san.rr.com (Jeff Candy) Date: Thu, 08 Mar 2001 22:29:30 -0800 Subject: Plasma physics code Message-ID: <3AA8784A.22DA5A40@san.rr.com> Gang, Often I read messages from list members asking about the availability of applications which can make use of the computing resources of a contemporary Beowulf cluster. We have a plasma turbulence code which is by now the state- of-the-art in the MFE (magnetic fusion energy) program. It was developed solely on Intel-Beowulf clusters, and runs around the clock on these clusters mostly in an attempt to break new "physics ground". However, in this area of physics, the equations are sufficiently complex that even areas that are viewed by the community-at-large as "explored" are badly understood (IMO). To this end, I wonder if anyone wants to devote spare cycles to a limited version of the solver (really, a very robust version of the solver with abiabatic electron physics). The equations are the so-called "gyro-kinetic Maxwell" equations. The solver is Eulerian. The grid is five- dimensional. MPI is the MP library. Meaningful runs will take on the order of days on a 32-64 processor cluster. Results are made into MPEG-1 movies, presented at conferences, etc. I won't go into further detail -- if anyone is interested, please send email to: jeff.candy at gat.com and we can discuss the conditions. Some simulation MPEGs can be found at: http://fusion.gat.com/comp/parallel/GYRO_Gallery.html Ciao, J. From gerry at cs.tamu.edu Thu Mar 8 05:40:37 2001 From: gerry at cs.tamu.edu (Gerry Creager n5jxs) Date: Thu, 08 Mar 2001 07:40:37 -0600 Subject: redhat 7.0 upgrade woes References: <3AA66A03.2E4E3171@icase.edu> Message-ID: <3AA78BD5.5591C867@cs.tamu.edu> Josip Loncaric wrote: > > One problem I've seen in upgrading 6.2->6.2+updates->7.0->7.0+updates is > that Red Hat messed up version numbering on about a dozen packages, > which then did not get updated to 7.0 versions. The most obvious > problem was gnorpm. Since the updated 6.2 version appeared newer than > the updated 7.0 version, gnorpm did not get replaced (but the underlying > libc did) so afterwards gnorpm refused to work (complaining about a > missing shared library). > > The fix for this is to install the correct gnorpm (and other misnumbered > packages) using the rpm -Uvh --force ... command, at least until Red Hat > addresses these version numbering problems. Hopefully, you had not already installed the RPM updates! 
If youhad, or at least, since I had, even --force didn't help! -- Gerry Creager -- gerry at cs.tamu.edu Network Engineering |Research focusing on Academy for Advanced Telecommunications |Satellite Geodesy and and Learning Technologies |Geodetic Control Texas A&M University 979.458.4020 (Phone) -- 979.847.8578 (Fax) From patrick at myri.com Wed Mar 7 06:30:18 2001 From: patrick at myri.com (Patrick Geoffray) Date: Wed, 07 Mar 2001 09:30:18 -0500 Subject: Real Time References: <4.3.2.20010306132041.00b87390@pop.pssclabs.com> <20010306174447.A6421@wumpus> Message-ID: <3AA645FA.A4096BCC@myri.com> Greg Lindahl wrote: > > On Tue, Mar 06, 2001 at 01:23:16PM -0800, Larry Lesser wrote: > > > I am trying to find out if anyone has built a Beowulf on Mac G4s with a > > real time operating system, MPI (any flavor) and Myrinet? > > I believe that several of the vendors in the embedded systems market > sell systems like that. CSPI (http://www.cspi.com) is one of them. They run G4s, with Myrinet and MPI, for real time applications. Well MPI is not well suited for real time. However, MPI-RT (www.mpirt.org) may be a solution. MPI SoftTech (www.mpi-softtech.com) made some work recently on MPI-RT but I don't know if they have any implementation for Myrinet available. Regards. -- Patrick Geoffray --------------------------------------------------------------- | Myricom Inc | University of Tennessee - CS Dept | | 325 N Santa Anita Ave. | Suite 203, 1122 Volunteer Blvd. | | Arcadia, CA 91006 | Knoxville, TN 37996-3450 | | (626) 821-5555 | Tel/Fax : (865) 974-0482 | --------------------------------------------------------------- From chendric at qssmeds.com Wed Mar 7 09:39:10 2001 From: chendric at qssmeds.com (Chris Hendrickson) Date: Wed, 07 Mar 2001 12:39:10 -0500 Subject: 32Meg RAM on Nodes? Message-ID: <3AA6723E.8C3B7D92@qssmeds.com> we just recently recieved out LinuxCentral copy of Scyld Beowulf 2, and are merging our current cluster over to it, problem is, we have several nodes that currently have only 32M of RAM., and the motherboards seem to be very particular about RAM type. We have yet to be able to find anything other than the Original OEM RAM that works (granted we've only ben looking for a few days) anyone know of a quick and easy way to build a bootfloppy that does not use the 40M RAMdisk? but will instead partition the hard drive and put all needed files there? any ideas? thanks, Chris -- "The box said requires Windows 95 or better... So I installed Linux" Chris Hendrickson QSS Group. Inc - MEDS NASA/Goddard Space Flight Center Voice: (301) 867-0081 Fax: (301) 867-0089 -------------- next part -------------- An HTML attachment was scrubbed... URL: From jimlux at jpl.nasa.gov Tue Mar 6 17:43:39 2001 From: jimlux at jpl.nasa.gov (Jim Lux) Date: Tue, 6 Mar 2001 17:43:39 -0800 Subject: Real Time References: <4.3.2.20010306132041.00b87390@pop.pssclabs.com> <20010306174447.A6421@wumpus> Message-ID: <001601c0a6a8$0b704ec0$04a8a8c0@office1> While not a beowulf, I am currently working on a very hard real time (<1 microsecond) system (a radar) using an MPI like interprocessor interface between DSPs. It is entirely possible to have hard real time systems with nondeterministic communications. ----- Original Message ----- From: "Greg Lindahl" To: Sent: Tuesday, March 06, 2001 2:44 PM Subject: Re: Real Time > On Tue, Mar 06, 2001 at 01:23:16PM -0800, Larry Lesser wrote: > > > I am trying to find out if anyone has built a Beowulf on Mac G4s with a > > real time operating system, MPI (any flavor) and Myrinet? 
> > I believe that several of the vendors in the embedded systems market > sell systems like that. > > I'm not so sure that a real time operating system coupled with MPI is > going to do that much good compared to Linux with MPI, since use of > MPI is going to blow away your hard real time guarantees. (Oh, darn, > bad CRC, we have to retransmit that packet...) > > -- g > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > From Niels.Walet at umist.ac.uk Wed Mar 7 01:58:37 2001 From: Niels.Walet at umist.ac.uk (Niels Walet) Date: Wed, 07 Mar 2001 09:58:37 +0000 Subject: Scyld/random reboots Message-ID: <3AA6064D.F4D2AD0E@umist.ac.uk> I have reconfigured my cluster to use Scyld, but now I see a large number of random reboots on the nodes (sometimes failures as well). Is there any way to capture info about why this is happening? The machines were quite stable before! Niels -- Dr Niels R. Walet http://www.phy.umist.ac.uk/Theory/people/walet.html Dept. of Physics, UMIST, P.O. Box 88, Manchester, M60 1QD, U.K. Phone: +44(0)161-2003693 Fax: +44(0)161-2004303 Niels.Walet at umist.ac.uk -------------- next part -------------- An HTML attachment was scrubbed... URL: From brian at chpc.utah.edu Sun Mar 11 08:58:01 2001 From: brian at chpc.utah.edu (Brian Haymore) Date: Sun, 11 Mar 2001 09:58:01 -0700 (MST) Subject: sk98lin gigabit driver In-Reply-To: <3AA55F39.BCA9C8E@niptron.com> Message-ID: I have a few of these and at times they seem to not like some auto-negotiation from switches. Make sure your switch port it hard set to what you want. On Tue, 6 Mar 2001, D. R. Holsbeck wrote: > Is anyone using a sysconnect card? I built the > module and all loads fine. But when I bring up > the interface keeps going up and down. Stateing > that the network is not connected. Im using a > 2.2.16 kernel with the stock module. Any suggestions > would be greatly appreciated. > > > -- Brian D. Haymore University of Utah Center for High Performance Computing 155 South 1452 East RM 405 Salt Lake City, Ut 84112-0190 Email: brian at chpc.utah.edu - Phone: (801) 585-1755 - Fax: (801) 585-5366 From sam at venturatech.com Sun Mar 11 09:10:05 2001 From: sam at venturatech.com (sam at venturatech.com) Date: Sun, 11 Mar 2001 09:10:05 -0800 Subject: Beowulf digest, Vol 1 #314 - 17 msgs Message-ID: Thank you for your email. I'll be out of the office on 03/09 and 03/12 with no access to email. If you need a response today please send an email to Daryl Newton at - daryl at venturatech.com or call him at 760-597-9800 X10. Thank you for your continued business. Sam Lewis From hahn at coffee.psychology.mcmaster.ca Sun Mar 11 11:00:25 2001 From: hahn at coffee.psychology.mcmaster.ca (Mark Hahn) Date: Sun, 11 Mar 2001 14:00:25 -0500 (EST) Subject: ECC memory In-Reply-To: Message-ID: > motherboards don't seem to have ECC support. I've seen all sorts of > conflicting accounts, but it seems to be the case that there is no ECC > support with these boards. I just looked at the kt133a datasheet, and it doesn't mention ecc support. the kt266 blurb *does* claim ECC support, though. > The odd thing is, I've seen several companies > selling their commercial Beowulf systems with KT7(A) mobos and ECC > memory. 
Since non-ECC motherboards can sometimes use ECC memory with the > ECC functionality disabled, are these companies just fooling people by > selling ECC memory along with boards that can't use ECC? looks like it. > For me, this also brings up the question of how ECC works. Does the DRAM > module perform the error correction, or does the chipset perform error > correction? From some research and common sense, it seems the chipset > does it - otherwise, what's the point of bringing the extra 8 ECC lines > off of the memory chip? the chipset does it. > Finally, have people found ECC to even be necessary? From some of the > stats we've seen, it seems like ECC is the way to go. But it also seems what stats are those? it's difficult to find the relevant data, namely the FIT (failures in time) statistics for a given dram chip. the last time I saw any was a few years ago when Intel introduced a chipset with no ECC support. at the time, they circulated a whitepaper containing FIT numbers to prove that with expected ram size and use, ECC was not overboard. you must remember that ECC costs at least 12.5% more (usually 20-25%), *and* consumes a clock cycle of latency. whether ECC is right for you depends mainly on the amount of ram and how hard you use it. possibly also environmental factors (altitude, etc). most people *do*not* argue for skipping ECC by saying "oh, my data is junk, I don't care about a few flipped bits". regards, mark hahn. From Eugene.Leitl at lrz.uni-muenchen.de Sun Mar 11 14:06:07 2001 From: Eugene.Leitl at lrz.uni-muenchen.de (Eugene.Leitl at lrz.uni-muenchen.de) Date: Sun, 11 Mar 2001 23:06:07 +0100 Subject: Real Time References: <4.3.2.20010306132041.00b87390@pop.pssclabs.com> <20010306174447.A6421@wumpus> <001601c0a6a8$0b704ec0$04a8a8c0@office1> Message-ID: <3AABF6CF.40F120ED@lrz.uni-muenchen.de> Jim Lux wrote: > > While not a beowulf, I am currently working on a very hard real time (<1 > microsecond) system (a radar) using an MPI like interprocessor interface > between DSPs. It is entirely possible to have hard real time systems with > nondeterministic communications. Can you tell us more? (preferably, without having to kill us afterwards, of course). From pbn2au at qwest.net Sun Mar 11 15:20:07 2001 From: pbn2au at qwest.net (pbn2au) Date: Sun, 11 Mar 2001 16:20:07 -0700 Subject: high physical density cluster design References: <200103111700.MAA12264@blueraja.scyld.com> Message-ID: <3AAC0827.6195B22D@qwest.net> > > Problem 2 - Heat Dissipation > """""""""""""""""""""""""""" > The other problem we're going to have is heat. We're going to need to build > our cabinet such that its relatively sealed, except at front, so we can get > some coherent airflow in between boards. I am thinking we're going to need to > mount extra fans on the back (this is going to make the 2x2 design a bit more > tricky, but at only 64 odd machines we can go with 2x1 config instead, 2 > stacks of 32, just 16U high). I dont know what you can suggest here, its all > going to depend on physical configuration. The machine is housed in a proper > environment (Datavaults.com's facilities, where I work :) thats climate > controlled, but the inside of the cabinet will still need massive airflow, > even with the room at 68F. For this much heat, I am not sure that you should not rethink this whole Idea. Some suggestions( not tongue in cheek) : Have you considered a refrigeration unit? Put it in a walk-in freezer and go from there. 
Another option will be to build a sealed water tight box and encase the boards in chilled mineral oil. The conductivity of the oil is legendary, and nonexistent. use a small unit to chill one end, and a couple of stirring units to keep it circulating. The biggest issue is not heat within the unit but the effect on other units in the same room. From Kian_Chang_Low at vdgc.com.sg Mon Mar 12 02:01:58 2001 From: Kian_Chang_Low at vdgc.com.sg (Kian_Chang_Low at vdgc.com.sg) Date: Mon, 12 Mar 2001 18:01:58 +0800 Subject: (Stress-)Testing of nodes in a beowulf cluster Message-ID: Hi, I have been playing with beowulf cluster for quite a while and have put together a small cluster as a test to show that it can be done. Now I am faced with a question about the reliability of the nodes (slave or/and master). Is there any tests (or stress-tests) that we can run to check the reliability of the following, 1) CPU 2) memory 3) network interface card 4) disk 5) motherboard 6) any other?! I heard of using memtest to test the memory. But what about tests for the other components? I thought it will be great if there is a suite of tests that the node has to undergo before being added to the cluster. Rather than trying to determine the cause of failure after putting the cluster together, we at least know that a node is downright faulty from the beginning. Thanks, Kian Chang. From timm at fnal.gov Mon Mar 12 06:05:40 2001 From: timm at fnal.gov (Steven Timm) Date: Mon, 12 Mar 2001 08:05:40 -0600 (CST) Subject: (Stress-)Testing of nodes in a beowulf cluster In-Reply-To: Message-ID: At Fermilab in our PC Farms our cluster is not a true Beowulf, but we do an extensive stress test of 30 days. Our test consists of continuously running seti at home for 30 days on both cpu's, then every hour on the hour using "bonnie" to write a 1 GB test file to each disk and "nettest" to simultaneously push 400 MB over the net. "Streams" could be added to this as well. In addition, there are starting to be utilities out there that can read the event logs in the BIOS, which track if you have any memory faults or any power supply faults. In our experience, power supplies are the most likely thing to go bad in the first 30 days, and sometimes you get a bad batch of memory too. The stress test above makes the machine draw almost the highest current it will draw, and if the power supply is going to die, it will do so quickly. ------------------------------------------------------------------ Steven C. Timm (630) 840-8525 timm at fnal.gov http://home.fnal.gov/~timm/ Fermilab Computing Division/Operating Systems Support Scientific Computing Support Group--Computing Farms Operations On Mon, 12 Mar 2001 Kian_Chang_Low at vdgc.com.sg wrote: > Hi, > > I have been playing with beowulf cluster for quite a while and have put > together a small cluster as a test to show that it can be done. > > Now I am faced with a question about the reliability of the nodes (slave > or/and master). Is there any tests (or stress-tests) that we can run to > check the reliability of the following, > 1) CPU > 2) memory > 3) network interface card > 4) disk > 5) motherboard > 6) any other?! > > I heard of using memtest to test the memory. But what about tests for the > other components? > > I thought it will be great if there is a suite of tests that the node has > to undergo before being added to the cluster. 
Rather than trying to > determine the cause of failure after putting the cluster together, we at > least know that a node is downright faulty from the beginning. > > Thanks, > Kian Chang. > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > From davidgrant at mediaone.net Mon Mar 12 06:11:23 2001 From: davidgrant at mediaone.net (David Grant) Date: Mon, 12 Mar 2001 09:11:23 -0500 Subject: (no subject) References: <3AA51874.488827EF@niptron.com> Message-ID: <004501c0aafe$51e437e0$954f1e42@ne.mediaone.net> here's an interesting spin on clustering.... FROM: Smart Partner Magazine, a ZD Net Company Date: MARCH 5, 2001 WE WANT YOUR CYCLES By David Hakala Free ISP Juno Online Services wants it's users to contribute their idle clock cycles to a distributed "supercomputer" which the struggling service would rent to research organizations and corporations that need massive computation power. The plan would require users to install client software and leave their PC's on 24 hours a day. The client would start processing data when the PC's screensaver kicks in, and upload the results when the user connects to the Internet. Currently, the Juno Virtual Supercomputer Network consists of a few volunteers. But CEO Charles Ardai says that participation may be required. So far, however, there are no takers. Maybe researchers are leery of outsourcing critical apps to freeloaders. -David Hakala, Smart Partner Magazine David A. Grant, V.P. Cluster Technologies GSH Intelligent Integrated Systems 95 Fairmount St. Fitchburg Ma 01450 Phone 603.898.9717 Fax 603.898.9719 Email: davidg at gshiis.com Web: www.gshiis.com "Providing High Performance Computing Solutions for Over a Decade" From lowther at att.net Mon Mar 12 08:22:07 2001 From: lowther at att.net (Ken) Date: Mon, 12 Mar 2001 11:22:07 -0500 Subject: (Stress-)Testing of nodes in a beowulf cluster References: Message-ID: <3AACF7AF.E2C371A5@att.net> Steven Timm wrote: > > At Fermilab in our PC Farms our cluster is not a true Beowulf, > but we do an extensive stress test of 30 days. Our test > consists of continuously running seti at home This has a build in 'torture' test: http://www.mersenne.org/links.htm It has been optimized several times and typically uses close to 100% of clock cyles available. Ken From JParker at coinstar.com Mon Mar 12 08:28:13 2001 From: JParker at coinstar.com (JParker at coinstar.com) Date: Mon, 12 Mar 2001 08:28:13 -0800 Subject: Sequent 2000 Message-ID: G'Day ! I have worked at places that used Sequent databases servers in high availabilty clusters. Basically multiple replicated servers with fall-over and load balancing software. Never heard of them being used in parallel processing, but that doesn't mean they can't ... cheers, Jim Parker Sailboat racing is not a matter of life and death .... It is far more important than that !!! "Schilling, Richard" ealth.org> cc: Sent by: Subject: Sequent 2000 beowulf-admin at beowulf.o rg 03/09/01 05:44 PM I just picked up a Sequent 2000/290 for a pretty good price, and the documentation speaks of clustering. Does anyone have any experience clustering with the Sequent boxes? Happy to share experiences. Thanks. 
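Going back to the 30-day burn-in Steven Timm describes above (seti at home running continuously, plus an hourly 1 GB bonnie write and a network push), a rough sketch of the hourly part as a cron job. The bonnie -d/-s options are real, but the scratch path, the partner node name, and the substitution of netperf for Fermilab's in-house nettest are all assumptions:

    #!/bin/sh
    # hourly-burnin.sh -- run once an hour from cron on each node under test;
    # seti@home (or mprime's torture test, per Ken's pointer) keeps the CPUs
    # loaded around the clock, this only adds the disk and network load
    SCRATCH=/scratch                # assumed local scratch partition
    PEER=node01                     # assumed partner node running netserver
    bonnie -d $SCRATCH -s 1024 >> $SCRATCH/bonnie.log 2>&1      # ~1 GB test file
    netperf -H $PEER -t TCP_STREAM -l 60 >> /var/log/burnin.log 2>&1

Driven by a crontab entry such as "0 * * * * /usr/local/sbin/hourly-burnin.sh" for the full 30 days, with the BIOS event log checked afterwards for memory or power-supply complaints.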
Richard Schilling Webmaster / Web Integration Programmer Affiliated Health Services Mount Vernon, WA http://www.affiliatedhealth.org _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From sam at venturatech.com Mon Mar 12 09:04:18 2001 From: sam at venturatech.com (sam at venturatech.com) Date: Mon, 12 Mar 2001 09:04:18 -0800 Subject: Beowulf digest, Vol 1 #315 - 9 msgs Message-ID: Thank you for your email. I'll be out of the office on 03/09 and 03/12 with no access to email. If you need a response today please send an email to Daryl Newton at - daryl at venturatech.com or call him at 760-597-9800 X10. Thank you for your continued business. Sam Lewis From RSchilling at affiliatedhealth.org Mon Mar 12 09:26:37 2001 From: RSchilling at affiliatedhealth.org (Schilling, Richard) Date: Mon, 12 Mar 2001 09:26:37 -0800 Subject: high physical density cluster design - power/heat/rf question s Message-ID: <51FCCCF0C130D211BE550008C724149EBE1104@mail1.affiliatedhealth.org> I took a look at the website. These boards are full size PC boards, and might not perform well with the compact space, due to the problems you've outlined. But, on the other hand, FreeBSD should work fine on these. I'm using FreeBSD for clustering right now, and the operating system is pretty stable. Check out http://www.emjembedded.com/products/products.html for single board computers that may give you much more of a dense setup than with these boards. Richard Schilling Mount Vernon, WA > -----Original Message----- > From: Velocet [mailto:mathboy at velocet.ca] > Sent: Monday, March 05, 2001 9:36 PM > To: beowulf at beowulf.org > Subject: high physical density cluster design - power/heat/rf > questions > > > I have some questions about a cluster we're designing. We really need > a relatively high density configuration here, in terms of floor space. > > To be able to do this I have found out pricing on some socket > A boards with > onboard NICs and video (dont need video though). We arent > doing anything > massively parallel right now (just running > Gaussian/Jaguar/MPQC calculations) > so we dont need major bandwidth.* We're booting with root > filesystem over > NFS on these boards. Havent decided on FreeBSD or Linux yet. > (This email > isnt about software config, but feel free to ask questions). > > (* even with NFS disk we're looking at using MFS on freebsd > (or possibly > the new md system) or the new nbd on linux or equivalent for > gaussian's > scratch files - oodles faster than disk, and in our case, with no > disk, it writes across the network only when required. Various tricks > we can do here.) > > The boards we're using are PC Chip M810 boards > (www.pcchips.com). Linux seems > fine with the NIC on board (SiS chip of some kind - Ben > LaHaise of redhat is > working with me on some of the design and has been testing it > for Linux, I > have yet to play with freebsd on it). > > The configuration we're looking at to achieve high physical density is > something like this: > > NIC and Video connectors > / > ------------=-------------- board upside down > | cpu | = | RAM | > |-----| |_________| > |hsync| > | | --fan-- > --fan-- | | > _________ |hsync| > | | |-----| > | RAM | = | cpu | > -------------=------------- board right side up > > as you can see the boards kind of mesh together to take up > less space. 
At > micro ATX factor (9.25" I think per side) and about 2.5 or 3" > high for the > CPU+Sync+fan (tallest) and 1" tall for the ram or less, I can > stack two of > these into 7" (4U). At 9.25" per side, 2 wide inside a > cabinet gives me 4 > boards per 4U in a standard 24" rack footprint. If I go 2 > deep as well (ie 2x2 > config), then for every 4U I can get 16 boards in. > > The cost for this is amazing, some $405 CDN right now for > Duron 800s with > 128Mb of RAM each without the power supply (see below; > standard ATX power is > $30 CDN/machine). For $30000 you can get a large ass-load of > machines ;) > > Obviously this is pretty ambitious. I heard talk of some people doing > something like this, with the same physical confirguration and cabinet > construction, on the list. Wondering what your experiences have been. > > > Problem 1 > """"""""" > The problem is in the diagram above, the upside down board > has another board > .5" above it - are these two boards going to leak RF like mad > and interefere > with eachothers' operations? I assume there's not much to do > there but to put > a layer of grounded (to the cabinet) metal in between. This > will drive up the > cabinet construction costs. I'd rather avoid this if possible. > > Our original construction was going to be copper pipe and > plexiglass sheeting, > but we're not sure that this will be viable for something > that could be rather > tall in our future revisions of our model. Then again, copper > pipe can be > bolted to our (cement) ceiling and floor for support. > > For a small model that Ben LaHaise built, check the pix at > http://trooper.velocet.ca/~mathboy/giocomms/images > > Its quick a hack, try not to laugh. It does engender the 'do > it damn cheap' > mentality we're operating with here. > > The boards are designed to slide out the front once the power > and network > are disconnected. > > An alternate construction we're considering is sheet metal cutting and > folding, but at much higher cost. > > > Problem 2 - Heat Dissipation > """""""""""""""""""""""""""" > The other problem we're going to have is heat. We're going to > need to build > our cabinet such that its relatively sealed, except at front, > so we can get > some coherent airflow in between boards. I am thinking we're > going to need to > mount extra fans on the back (this is going to make the 2x2 > design a bit more > tricky, but at only 64 odd machines we can go with 2x1 config > instead, 2 > stacks of 32, just 16U high). I dont know what you can > suggest here, its all > going to depend on physical configuration. The machine is > housed in a proper > environment (Datavaults.com's facilities, where I work :) > thats climate > controlled, but the inside of the cabinet will still need > massive airflow, > even with the room at 68F. > > > Problem 3 - Power > """"""""""""""""" > The power density here is going to be high. I need to mount > 64 power supplies > in close proximity to the boards, another reason I might need > to maintain > the 2x1 instead of 2x2 design. (2x1 allows easier access too). > > We dont really wanna pull that many power outlets into the > room - I dont know > what a diskless Duron800 board with 256Mb or 512Mb ram will > use, though I > guess around .75 to 1 A. Im gonna need 3 or 4 full circuits > in the room (not > too bad actually). However thats alot of weight on the > cabinet to hold 60 odd > power supplies, not to mention the weight of the cables > themselves weighing > down on it, and a huge mess of them to boot. 
> > I am wondering if someone has a reliable way of wiring > together multiple > boards per power supply? Whats the max density per supply? Can we > go with redundant power supplies, like N+1? We dont need that much > reliability (jobs are short, run on one machine and can be restarted > elsewhere), but I am really looking for something thats going to > reduce the cabling. > > As well, I am hoping there is some economy of power converted here - > a big supply will hopefully convert power for multiple boards more > efficiently than a single supply per board. However, as always, the > main concern is cost. > > Any help or ideas is appreciated. > > /kc > -- > Ken Chase, math at velocet.ca * Velocet Communications Inc. * > Toronto, CANADA > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) > visit http://www.beowulf.org/mailman/listinfo/beowulf > -------------- next part -------------- An HTML attachment was scrubbed... URL: From nashif at suse.de Mon Mar 12 09:27:16 2001 From: nashif at suse.de (nashif at suse.de) Date: Mon, 12 Mar 2001 18:27:16 +0100 (CET) Subject: Beowulf digest, Vol 1 #315 - 9 msgs In-Reply-To: Message-ID: Thank God this guy only gets the Beowulf digest... Anas On Mon, 12 Mar 2001 sam at venturatech.com wrote: > Thank you for your email. I'll be out of the office on 03/09 and 03/12 with no access to email. If you need a response today > please send an email to Daryl Newton at - daryl at venturatech.com or call him at 760-597-9800 X10. > > Thank you for your continued business. > > Sam Lewis > > > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Anas Nashif SuSE GmbH, Nuremberg, Germany Fone: +1 450 978 2382 Fax: +1 507 242 9604 From mathboy at velocet.ca Mon Mar 12 10:35:11 2001 From: mathboy at velocet.ca (Velocet) Date: Mon, 12 Mar 2001 13:35:11 -0500 Subject: Beowulf digest, Vol 1 #315 - 9 msgs In-Reply-To: ; from nashif@suse.de on Mon, Mar 12, 2001 at 06:27:16PM +0100 References: Message-ID: <20010312133511.F1579@velocet.ca> On Mon, Mar 12, 2001 at 06:27:16PM +0100, nashif at suse.de's all... > Thank God this guy only gets the Beowulf digest... > makes no diff, apparently every poster in the digest gets a special meaningless reply from him. Isnt this list sent as X-Priority: bulk? /kc > Anas > > On Mon, 12 Mar 2001 sam at venturatech.com wrote: > > > Thank you for your email. I'll be out of the office on 03/09 and 03/12 with no access to email. If you need a response today > > please send an email to Daryl Newton at - daryl at venturatech.com or call him at 760-597-9800 X10. > > > > Thank you for your continued business. > > > > Sam Lewis > > > > > > > > > > _______________________________________________ > > Beowulf mailing list, Beowulf at beowulf.org > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > > -- > Anas Nashif > SuSE GmbH, Nuremberg, Germany > > Fone: +1 450 978 2382 > Fax: +1 507 242 9604 > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Ken Chase, math at velocet.ca * Velocet Communications Inc. 
* Toronto, CANADA From andreas at amy.udd.htu.se Mon Mar 12 11:48:47 2001 From: andreas at amy.udd.htu.se (Andreas Boklund) Date: Mon, 12 Mar 2001 20:48:47 +0100 (CET) Subject: Beowulf digest, Vol 1 #315 - 9 msgs In-Reply-To: <20010312133511.F1579@velocet.ca> Message-ID: Spam and fun! Well someone should teach that guy how to write a proper filter before they give him the possibility to create this kind of spam. I actually used to use the "auto on vecation reply" mails as an example of very poor judegement when i was teaching out personell how to use their (new)email clients. It took 3 days before one of my collegues (head of computer security) accidentify "forwarded, doubled and looped back" all mail that was recieved by the postmaster account and created 400 000 (approx) mails in one night :) I have already started to filter him out so i hope he wont send enything usefull to this list in the future :( //Andreas > > On Mon, 12 Mar 2001 sam at venturatech.com wrote: > > > > > Thank you for your email. I'll be out of the office on 03/09 and 03/12 with no access to email. If you need a response today > > > please send an email to Daryl Newton at - daryl at venturatech.com or call him at 760-597-9800 X10. > > > > > > Thank you for your continued business. > > > > > > Sam Lewis > > > > > > > > > > > > > > > _______________________________________________ > > > Beowulf mailing list, Beowulf at beowulf.org > > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > ********************************************************* * Administator of Amy and Sfinx(Iris23) * * * * Voice: 070-7294401 * * ICQ: 12030399 * * Email: andreas at shtu.htu.se, boklund at linux.nu * * * * That is how you find me, How do -I- find you ? * ********************************************************* From joelja at darkwing.uoregon.edu Mon Mar 12 11:54:27 2001 From: joelja at darkwing.uoregon.edu (Joel Jaeggli) Date: Mon, 12 Mar 2001 11:54:27 -0800 (PST) Subject: high physical density cluster design In-Reply-To: <3AAC0827.6195B22D@qwest.net> Message-ID: On Sun, 11 Mar 2001, pbn2au wrote: > > > > Problem 2 - Heat Dissipation > > """""""""""""""""""""""""""" > > The other problem we're going to have is heat. We're going to need to build > > our cabinet such that its relatively sealed, except at front, so we can get > > some coherent airflow in between boards. I am thinking we're going to need to > > mount extra fans on the back (this is going to make the 2x2 design a bit more > > tricky, but at only 64 odd machines we can go with 2x1 config instead, 2 > > stacks of 32, just 16U high). I dont know what you can suggest here, its all > > going to depend on physical configuration. The machine is housed in a proper > > environment (Datavaults.com's facilities, where I work :) thats climate > > controlled, but the inside of the cabinet will still need massive airflow, > > even with the room at 68F. > > For this much heat, I am not sure that you should not rethink this > whole Idea. Some suggestions( not tongue in cheek) : Have you > considered a refrigeration unit? Put it in a walk-in freezer and go > from there. Another option will be to build a sealed water tight box router vendors have boxes that need to dissapate in excess of 5Kw in one 42u box... These are air cooled... > and encase the boards in chilled mineral oil. The conductivity of the > oil is legendary, and nonexistent. 
use a small unit to chill one end, > and a couple of stirring units to keep it circulating. liquid cooling is typically used only as a last resort by hardware vendors when the individual components need to have more heat dissapated than the thermal conductivity of air will allow. > The biggest > issue is not heat within the unit but the effect on other units in the > same room. no matter what you have to exhaust the heat from the room. > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- -------------------------------------------------------------------------- Joel Jaeggli joelja at darkwing.uoregon.edu Academic User Services consult at gladstone.uoregon.edu PGP Key Fingerprint: 1DE9 8FCA 51FB 4195 B42A 9C32 A30D 121E -------------------------------------------------------------------------- It is clear that the arm of criticism cannot replace the criticism of arms. Karl Marx -- Introduction to the critique of Hegel's Philosophy of the right, 1843. From sam at venturatech.com Mon Mar 12 12:01:54 2001 From: sam at venturatech.com (sam at venturatech.com) Date: Mon, 12 Mar 2001 12:01:54 -0800 Subject: Beowulf digest, Vol 1 #316 - 7 msgs Message-ID: Thank you for your email. I'll be out of the office on 03/09 and 03/12 with no access to email. If you need a response today please send an email to Daryl Newton at - daryl at venturatech.com or call him at 760-597-9800 X10. Thank you for your continued business. Sam Lewis From joelja at darkwing.uoregon.edu Mon Mar 12 12:10:56 2001 From: joelja at darkwing.uoregon.edu (Joel Jaeggli) Date: Mon, 12 Mar 2001 12:10:56 -0800 (PST) Subject: 8 DIMM Slot PIII Motherboard? In-Reply-To: Message-ID: try the tyan thunderbolt 2500 s1867 http://www.tyan.com/products/html/thunder2500_p.html it is slot-1 not fcpga... joelja On Thu, 8 Mar 2001, Wayne Whitney wrote: > > Hi All, > > I'm looking for a PIII motherboard that has 8 DIMM slots and handles 32x8 > (high-density) 512MB DIMMs. Does anyone know of one? Dual PIII is > prefereable, but single PIII would be OK. My goal is to assemble a 4GB > main RAM machine on the cheap, using 8 $160 32x8 512MB DIMMs, rather than > 4 $600-$700 16x16 1GB DIMMs. > > According to the specifications of the Serverworks HE chipset, it can > handle 8 DIMM slots, as it does interleaving, so it can do 4 DIMM slots on > each channel. However, the motherboards I've seen with this chipset > typically have just 4 DIMM slots. Moreover, I don't know if this chipset > handles 32x8 512MB DIMMs. > > Any pointers would be appreciated. > > Cheers, > Wayne > > > > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- -------------------------------------------------------------------------- Joel Jaeggli joelja at darkwing.uoregon.edu Academic User Services consult at gladstone.uoregon.edu PGP Key Fingerprint: 1DE9 8FCA 51FB 4195 B42A 9C32 A30D 121E -------------------------------------------------------------------------- It is clear that the arm of criticism cannot replace the criticism of arms. Karl Marx -- Introduction to the critique of Hegel's Philosophy of the right, 1843. 
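To put a number on "exhaust the heat from the room" for the 64-board cabinet in this thread: a back-of-the-envelope airflow estimate, assuming roughly 75 W per diskless Duron node (a guess, not a measured figure) and the usual 1.08 BTU/hr per CFM per degree F rule of thumb:

    # airflow needed to carry W watts out of the cabinet at a temperature
    # rise of deltaT degrees F:   CFM = W * 3.412 / (1.08 * deltaT)
    awk 'BEGIN { watts = 64 * 75; dT = 20;
                 printf "%.0f W needs about %.0f CFM at a %.0f F rise\n",
                        watts, watts * 3.412 / (1.08 * dT), dT }'
    # prints: 4800 W needs about 758 CFM at a 20 F rise

That is within reach of a bank of ordinary case fans pulling through a reasonably open cabinet, but the room HVAC still has to remove the same 16,000-odd BTU/hr, which is the point being made here.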
From Dean.Carpenter at pharma.com Mon Mar 12 12:15:58 2001 From: Dean.Carpenter at pharma.com (Carpenter, Dean) Date: Mon, 12 Mar 2001 15:15:58 -0500 Subject: Beowulf digest, Vol 1 #304 - 13 msgs Message-ID: <759FC8B57540D311B14E00902727A0C002EC4785@a1mbx01.pharma.com> Hmmm. Has anyone looked at the teeny tiny RazorBlade systems from Cell Computing ? If P3-500 is enough for a node, these things are *small*. http://www.cellcomputing.com -- Dean Carpenter deano at areyes.com dean.carpenter at pharma.com dean.carpenter at purduepharma.com 94TT :) -----Original Message----- From: pbn2au [mailto:pbn2au at qwest.net] Sent: Tuesday, March 06, 2001 9:37 PM To: beowulf at beowulf.org Subject: Re: Beowulf digest, Vol 1 #304 - 13 msgs > Dean.Carpenter at pharma.com said: > > We, like most out there I'm sure, are constrained, by money and by > > space. We need to get lots of cpus in as small a space as possible. > > Lots of 1U VA-Linux or SGI boxes would be very cool, but would drain > > the coffers way too quickly. Generic motherboards in clone cases is > > cheap, but takes up too much room. > > > So, a colleague and I are working on a cheap and high-density 1U node. > > So far it looks like we'll be able to get two dual-CPU (P3) > > motherboards per 1U chassis, with associated dual-10/100, floppy, CD > > and one hard drive. And one PCI slot. Although it would be nice to > > have several Ultra160 scsi drives in raid, a generic cluster node (for > > our uses) will work fine with a single large UDMA-100 ide drive. > > > That's 240 cpus per 60U rack. We're still working on condensed power > > for the rack, to simplify things. Note that I said "for our uses" > > above. Our design goals here are density and $$$. Hence some of the > > niceties are being foresworn - things like hot-swap U160 scsi raid > > drives, das blinken lights up front, etc. > > > So, what do you think ? If there's interest, I'll keep you posted on > > our progress. If there's LOTS of interest, we may make a larger > > production run to make these available to others. > > > -- Dean Carpenter deano at areyes.com dean.carpenter at pharma.com > > dean.carpenter at purduepharma.com 94TT :) Dean, Get rid of the cases!!!! You can put the motherboards together using all- threads. There are a couple of companies selling 90 degree pci slot adapters, for the nics. By running 2 motherboards on a regular power supply, using just the nic card, processor and ram, (use boot proms on the nics) you can get 40 boards in a 5 foot Rack mount. use a shelf every 4 boards to attach the power supply top and bottom. With a fully enclosed case 8 100 mm fans are sufficient to cool the entire setup. Conversely if you use 32 boards and a 32 port router/switch you can have nodes on wheels!! It may sound nuts, but mine has a truncated version of this setup. using 4 boards I was able to calculate the needed power for fans and by filling my tower with 36 naked m\boards running full steam, I calculated the air flow. Yes it sounds rinky-dink but under smoked glass it looks awesome!! From joelja at darkwing.uoregon.edu Mon Mar 12 12:25:39 2001 From: joelja at darkwing.uoregon.edu (Joel Jaeggli) Date: Mon, 12 Mar 2001 12:25:39 -0800 (PST) Subject: 8 DIMM Slot PIII/Athlon Motherboard ? In-Reply-To: <001e01c0aa19$99576600$5df31d97@W2KCECCHI1> Message-ID: I see $407 for 512MB registered ecc dimms from crucial which is quite a bit better than the $700ea or so we paid 3 months ago... if you're purchasing memory to put on serverworks boards... 
it needs to be ecc and registered... On Sun, 11 Mar 2001, Gianluca Cecchi wrote: > > > >8 $160 32x8 512MB DIMMs cost $1280, while 4 $650 16x16 1GB DIMMs > > >cost $2600. > Any pointer to where to buy for these proces true good memories? > Thanks > Bye, > Gianluca Cecchi > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- -------------------------------------------------------------------------- Joel Jaeggli joelja at darkwing.uoregon.edu Academic User Services consult at gladstone.uoregon.edu PGP Key Fingerprint: 1DE9 8FCA 51FB 4195 B42A 9C32 A30D 121E -------------------------------------------------------------------------- It is clear that the arm of criticism cannot replace the criticism of arms. Karl Marx -- Introduction to the critique of Hegel's Philosophy of the right, 1843. From JParker at coinstar.com Mon Mar 12 12:38:40 2001 From: JParker at coinstar.com (JParker at coinstar.com) Date: Mon, 12 Mar 2001 12:38:40 -0800 Subject: high physical density cluster design Message-ID: G'Day ! True ... but they spend alot of time and effort on the problem. I used to know this guy who ran CFD code for Intel to analyize heat transfer within computer cases. Kinda catch-22 ... you need a cluster to run the CFD code needed to build a cluster ;-) cheers, Jim Parker Sailboat racing is not a matter of life and death .... It is far more important than that !!! Joel Jaeggli regon.edu> cc: Sent by: Subject: Re: high physical density cluster design beowulf-admin at beowu lf.org 03/12/01 11:54 AM On Sun, 11 Mar 2001, pbn2au wrote: > > > > Problem 2 - Heat Dissipation > > """""""""""""""""""""""""""" > > The other problem we're going to have is heat. We're going to need to build > > our cabinet such that its relatively sealed, except at front, so we can get > > some coherent airflow in between boards. I am thinking we're going to need to > > mount extra fans on the back (this is going to make the 2x2 design a bit more > > tricky, but at only 64 odd machines we can go with 2x1 config instead, 2 > > stacks of 32, just 16U high). I dont know what you can suggest here, its all > > going to depend on physical configuration. The machine is housed in a proper > > environment (Datavaults.com's facilities, where I work :) thats climate > > controlled, but the inside of the cabinet will still need massive airflow, > > even with the room at 68F. > > For this much heat, I am not sure that you should not rethink this > whole Idea. Some suggestions( not tongue in cheek) : Have you > considered a refrigeration unit? Put it in a walk-in freezer and go > from there. Another option will be to build a sealed water tight box router vendors have boxes that need to dissapate in excess of 5Kw in one 42u box... These are air cooled... > and encase the boards in chilled mineral oil. The conductivity of the > oil is legendary, and nonexistent. use a small unit to chill one end, > and a couple of stirring units to keep it circulating. liquid cooling is typically used only as a last resort by hardware vendors when the individual components need to have more heat dissapated than the thermal conductivity of air will allow. > The biggest > issue is not heat within the unit but the effect on other units in the > same room. no matter what you have to exhaust the heat from the room. 
> > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- -------------------------------------------------------------------------- Joel Jaeggli joelja at darkwing.uoregon.edu Academic User Services consult at gladstone.uoregon.edu PGP Key Fingerprint: 1DE9 8FCA 51FB 4195 B42A 9C32 A30D 121E -------------------------------------------------------------------------- It is clear that the arm of criticism cannot replace the criticism of arms. Karl Marx -- Introduction to the critique of Hegel's Philosophy of the right, 1843. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From Eugene.Leitl at lrz.uni-muenchen.de Mon Mar 12 13:31:46 2001 From: Eugene.Leitl at lrz.uni-muenchen.de (Eugene.Leitl at lrz.uni-muenchen.de) Date: Mon, 12 Mar 2001 22:31:46 +0100 Subject: Beowulf digest, Vol 1 #315 - 9 msgs References: Message-ID: <3AAD4042.487C7928@lrz.uni-muenchen.de> Andreas Boklund wrote: > I have already started to filter him out so i hope he wont send enything > usefull to this list in the future :( SOP should be instant unsubscription. Actually, CHEMINF-L is way worse. Here you can get ~10 out of office autoreplies to a post, some of them make it to the list. Unfortunately, there's no license to mail. From yoon at bh.kyungpook.ac.kr Mon Mar 12 15:11:24 2001 From: yoon at bh.kyungpook.ac.kr (Yoon Jae Ho) Date: Tue, 13 Mar 2001 08:11:24 +0900 Subject: (no subject) References: <3AA51874.488827EF@niptron.com> <004501c0aafe$51e437e0$954f1e42@ne.mediaone.net> Message-ID: <002001c0ab49$c8cfd580$5f72f2cb@TEST> There are many Projects related to the distributed computing. SETI, Distributed.net, Process Tree, ..... .... Grid Computing Have a nice day ! ----- Original Message ----- From: David Grant To: Sent: Monday, March 12, 2001 11:11 PM Subject: (no subject) > here's an interesting spin on clustering.... > > > FROM: Smart Partner Magazine, a ZD Net Company > Date: MARCH 5, 2001 > > > WE WANT YOUR CYCLES > > By David Hakala > > Free ISP Juno Online Services wants it's users to contribute their idle > clock cycles to a distributed "supercomputer" which the struggling service > would rent to research organizations and corporations that need massive > computation power. > > The plan would require users to install client software and leave their PC's > on 24 hours a day. The client would start processing data when the PC's > screensaver kicks in, and upload the results when the user connects to the > Internet. > > Currently, the Juno Virtual Supercomputer Network consists of a few > volunteers. But CEO Charles Ardai says that participation may be required. > > So far, however, there are no takers. Maybe researchers are leery of > outsourcing critical apps to freeloaders. > > -David Hakala, Smart Partner Magazine > > > > --------------------------------------------------------------------------------------- Yoon Jae Ho Economist POSCO Research Institute yoon at bh.kyungpook.ac.kr jhyoon at mail.posri.re.kr http://ie.korea.ac.kr/~supercom/ Korea Beowulf Supercomputer Imagination is more important than knowledge. A. Einstein "??????? ??? ???" ??? ??, " ??? ??? ??" ?? ??? ??(???? ???? ???? ??) "????? '???? ????? ??'??? ? ? ???, ??? ??? ????? ??? ????." ?? ??? "??? ?? ??? ??? ??? ??? ??? ??" ??? 
2000.4.22 "???? ???? ?? ??? ??? ??? ????" ? ?? 2000.4.29 "???? ??? ??? ??? ??? ????" ? ?? 2000.4.24 http://www.kichun.co.kr 2001.1.6 http://www.c3tv.com 2001.1.10 ---------------------------------------------------------------------------------------- From jalton at olsh.cx Mon Mar 12 16:44:05 2001 From: jalton at olsh.cx (James Alton) Date: Mon, 12 Mar 2001 16:44:05 -0800 Subject: 10/100 NICs Message-ID: When building a beowulf, especially when using cheap 10/100 cards, performance matters a lot. Thatt is why I am asking the question of, if I am going to buy a 10/100 card, what is faster? A Kingston card, or a 3com 905B card? Is there a page that shows benchmarks of 10/100 cards? Would a cheapo realtek card (10/100) get the same performance as far as speed? Also, is gigabit worth it to put in every node? (What type of speed would I expect from gigabit? 1000Mbits/sec? lol.) James Alton jalton at olsh.cx From szii at sziisoft.com Mon Mar 12 18:35:40 2001 From: szii at sziisoft.com (szii at sziisoft.com) Date: Mon, 12 Mar 2001 18:35:40 -0800 Subject: 10/100 NICs References: Message-ID: <019501c0ab66$4de5e5e0$fd02a8c0@surfmetro.com> I saw, somewhere, a benchmarking between a whole slew of cards. I remember that the Intel EtherExpressPro 100s had the lowest latency, although I prefer the 3c905B myself, which was 2nd or 3rd. -M ----- Original Message ----- From: James Alton To: Sent: Monday, March 12, 2001 4:44 PM Subject: 10/100 NICs > When building a beowulf, especially when using cheap 10/100 cards, > performance matters a lot. Thatt is why I am asking the question of, if I am > going to buy a 10/100 card, what is faster? A Kingston card, or a 3com 905B > card? Is there a page that shows benchmarks of 10/100 cards? Would a cheapo > realtek card (10/100) get the same performance as far as speed? Also, is > gigabit worth it to put in every node? (What type of speed would I expect > from gigabit? 1000Mbits/sec? lol.) > > James Alton > jalton at olsh.cx > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From siegert at sfu.ca Mon Mar 12 20:16:34 2001 From: siegert at sfu.ca (Martin Siegert) Date: Mon, 12 Mar 2001 20:16:34 -0800 Subject: 10/100 NICs In-Reply-To: <019501c0ab66$4de5e5e0$fd02a8c0@surfmetro.com>; from szii@sziisoft.com on Mon, Mar 12, 2001 at 06:35:40PM -0800 References: <019501c0ab66$4de5e5e0$fd02a8c0@surfmetro.com> Message-ID: <20010312201634.A5222@stikine.ucs.sfu.ca> On Mon, Mar 12, 2001 at 06:35:40PM -0800, szii at sziisoft.com wrote: > I saw, somewhere, a benchmarking between a whole slew of cards. > I remember that the Intel EtherExpressPro 100s had the lowest > latency, although I prefer the 3c905B myself, which was 2nd or 3rd. I cannot confirm this: in my tests (http://www.sfu.ca/~siegert/nic-test.html) the eepro100 always had the highest latency. 
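For anyone who wants to reproduce this kind of NIC comparison on their own hardware, a minimal netperf recipe (netperf.org comes up again later in this digest); the node name is a placeholder and netserver has to be running on the far end first:

    netserver                                    # on the receiving node
    netperf -H node02 -t TCP_STREAM -l 30        # bulk TCP throughput, 30 s
    netperf -H node02 -t TCP_RR -l 30            # request/response rate
    netperf -H node02 -t UDP_STREAM -l 30 -- -m 1472   # UDP, 1472-byte sends

The TCP_RR transaction rate is roughly the inverse of round-trip latency, which is where the eepro100 / tulip / 3c905 differences discussed here show up most clearly.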
Martin ======================================================================== Martin Siegert Academic Computing Services phone: (604) 291-4691 Simon Fraser University fax: (604) 291-4242 Burnaby, British Columbia email: siegert at sfu.ca Canada V5A 1S6 ======================================================================== From mathboy at velocet.ca Mon Mar 12 22:47:10 2001 From: mathboy at velocet.ca (Velocet) Date: Tue, 13 Mar 2001 01:47:10 -0500 Subject: DDR ram & thrashing L1/L2 cache Message-ID: <20010313014710.M44330@velocet.ca> Anyone played with boards that support DDR ram @ 266Mhz with Gaussian98 or other computational software that thrashes the cache? Im finding that the increased bus speeds on any of the boards and CPUs I've been testing with g98 is the biggest speedup over any other factor. (cache size on board, cache speed, etc). I am wondering if anyone has comparisons for the same CPUs with the same amount of ram on DDR and non DDR boards+memory for jobs that thrash cache. Thanks. /kc -- Ken Chase, math at velocet.ca * Velocet Communications Inc. * Toronto, CANADA From rauch at inf.ethz.ch Tue Mar 13 01:24:35 2001 From: rauch at inf.ethz.ch (Felix Rauch) Date: Tue, 13 Mar 2001 10:24:35 +0100 (CET) Subject: 10/100 NICs In-Reply-To: Message-ID: On Mon, 12 Mar 2001, James Alton wrote: > Also, is gigabit worth it to put in every node? (What type of speed > would I expect from gigabit? 1000Mbits/sec? lol.) On our older 400 MHz PII cluster with Linux kernel 2.2.x we got a TCP performance with the standard Linux TCP stack of about 42 MB/s. With "speculative defragmentation" and true zero-copy we were able to sustain about 65 MB/s [1]. Now that we upgraded our cluster to 1 GHz PIII (STL2 boards) with Linux kernel 2.4.1, we get a standard TCP performance of over 100 MB/s. - Felix [1] http://www.cs.inf.ethz.ch/CoPs/publications/ -- Felix Rauch | Email: rauch at inf.ethz.ch Institute for Computer Systems | Homepage: http://www.cs.inf.ethz.ch/~rauch/ ETH Zentrum / RZ H18 | Phone: ++41 1 632 7489 CH - 8092 Zuerich / Switzerland | Fax: ++41 1 632 1307 From shahin at labf.org Tue Mar 13 02:52:44 2001 From: shahin at labf.org (Mofeed Shahin) Date: Tue, 13 Mar 2001 21:22:44 +1030 Subject: Dual Athlon In-Reply-To: References: Message-ID: <01031321224402.01059@relativity.labf.org> So Robert, when are you going to let us know the results of the Dual Athlon ? :-) Mof. From rgb at phy.duke.edu Tue Mar 13 03:50:16 2001 From: rgb at phy.duke.edu (Robert G. Brown) Date: Tue, 13 Mar 2001 06:50:16 -0500 (EST) Subject: Dual Athlon In-Reply-To: <01031321224402.01059@relativity.labf.org> Message-ID: On Tue, 13 Mar 2001, Mofeed Shahin wrote: > So Robert, when are you going to let us know the results of the Dual Athlon ? > :-) > > Mof. They got my account setup yesterday, but for some reason I'm having a hard time connecting via ssh (it's rejecting my password). We've tried both a password they sent me and an MD5 crypt I sent them. Very strange -- I use OpenSSH routinely to connect all over the place so I'm reasonably sure my client is OK. Anyway, I expect it is something trivial and that I'll get in sometime this morning. I spent the time yesterday that I couldn't get in profitably anyway packaging stream and a benchmark sent to me by Thomas Guignol of the list up into make-ready tarball/RPM's. 
At the moment my list looks something like: stream guignol cpu-rate lmbench (ass'td) LAM/MPI plus two benchmarks (Josip and Doug each suggested one) EPCC OpenMP microbenchmarks (probably with PGI) possibly some fft timings (Martin Seigert) in roughly that order, depending on how much time I get and how well things go. I'm going to TRY to build a page with all the tests I used in tarball/rpm form, results, and commentary. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu From brian at chpc.utah.edu Tue Mar 13 07:04:43 2001 From: brian at chpc.utah.edu (Brian Haymore) Date: Tue, 13 Mar 2001 08:04:43 -0700 Subject: DDR ram & thrashing L1/L2 cache References: <20010313014710.M44330@velocet.ca> Message-ID: <3AAE370B.9BBEAE75@chpc.utah.edu> Velocet wrote: > > Anyone played with boards that support DDR ram @ 266Mhz with Gaussian98 or > other computational software that thrashes the cache? > > Im finding that the increased bus speeds on any of the boards and CPUs > I've been testing with g98 is the biggest speedup over any other factor. > (cache size on board, cache speed, etc). > > I am wondering if anyone has comparisons for the same CPUs with the same > amount of ram on DDR and non DDR boards+memory for jobs that thrash cache. > > Thanks. > > /kc > -- > Ken Chase, math at velocet.ca * Velocet Communications Inc. * Toronto, CANADA > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf I will have these results from a 1.2Ghz Athlon with a 266Mhz FSB and PC2100 memory in a few days. I'll post my results to the list then. -- Brian D. Haymore University of Utah Center for High Performance Computing 155 South 1452 East RM 405 Salt Lake City, Ut 84112-0190 Email: brian at chpc.utah.edu - Phone: (801) 585-1755 - Fax: (801) 585-5366 From becker at scyld.com Tue Mar 13 08:02:42 2001 From: becker at scyld.com (Donald Becker) Date: Tue, 13 Mar 2001 11:02:42 -0500 (EST) Subject: 10/100 NICs In-Reply-To: Message-ID: On Mon, 12 Mar 2001, James Alton wrote: > When building a beowulf, especially when using cheap 10/100 cards, > performance matters a lot. Thatt is why I am asking the question of, if I am > going to buy a 10/100 card, what is faster? A Kingston card, or a 3com 905B > card? The 3c905B will usually win, since it can Receive into arbitrarily aligned Rx buffers Calculate UDP/TCP/IP checksums in hardware The tulip design has an excellent multicast filter, and should be used when there is heavy multicast traffic. Note that some of the tulip clones (ADMtek and ASIX) omit the 16 element perfect filter and 512 slot hash filter, instead substituting only a 64 slot hash filter. > Is there a page that shows benchmarks of 10/100 cards? The benchmarks vary by the CPU in use. The Rx alignment doesn't matter with some CPUs, but is a killer with the IA64 and Alpha. > Would a cheapo > realtek card (10/100) get the same performance as far as speed? No -- the RTL chips require the CPU to do an extra copy for every packet transferred, both Rx and Tx. > Also, is gigabit worth it to put in every node? (What type of speed > would I expect from gigabit? 1000Mbits/sec? lol.) It depends on the cost. Right now the switch cost is the real issue. 
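Coming back to Ken Chase's DDR question a few messages up: STREAM (already on rgb's benchmark list) is the quick way to see what a 266 MHz memory bus actually buys. A sketch, assuming gcc and a copy of stream.c from http://www.cs.virginia.edu/stream/, and remembering to make the arrays several times larger than L2 cache so you measure memory rather than cache:

    # bump the array-size #define in stream.c to a few million elements,
    # then build and run the same binary on the PC133 box and the DDR box:
    gcc -O3 -fomit-frame-pointer -o stream stream.c
    ./stream

Cache-thrashing codes like g98 tend to track the Triad bandwidth figure more closely than the CPU clock, which fits Ken's observation that bus speed has been the biggest single factor.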
Donald Becker becker at scyld.com Scyld Computing Corporation http://www.scyld.com 410 Severn Ave. Suite 210 Second Generation Beowulf Clusters Annapolis MD 21403 410-990-9993 From Scott.Delinger at ualberta.ca Tue Mar 13 08:15:02 2001 From: Scott.Delinger at ualberta.ca (Scott L. Delinger) Date: Tue, 13 Mar 2001 09:15:02 -0700 Subject: 10/100 NICs In-Reply-To: References: Message-ID: At 16.44 -0800 2001.3.12, James Alton wrote: >Is there a page that shows benchmarks of 10/100 cards? http://www.netperf.org/ Browse the database for Fast Ethernet. -- Scott L. Delinger, Ph.D. Senior System Administrator/Interim IT Manager Department of Chemistry, University of Alberta Edmonton, Alberta, Canada T6G 2G2 Scott.Delinger at ualberta.ca From chris at ambigc.com Tue Mar 13 09:07:27 2001 From: chris at ambigc.com (Chris Hamilton) Date: Wed, 14 Mar 2001 01:07:27 +0800 Subject: 2.2.18 with updated bonding patch acting wierd Message-ID: <010c01c0abe0$18a71140$0c0a0a0a@SNARF> I have 2 computers with 3 3com 3c905s on each Abit KA7 (Athlon) motherboard. One of the three cards eth0 on each node is the Interoffice connection for direct access to the node (to use the node as a regular Linux system). The other two are bonded on a cheap vlan'ed switch for cluster communication. The network cards are not sharing interrupts, though they are with other components. I have also tried it with the bonded cards sharing an irq and got the same results. The 2.2.18 kernel is patched with devfs, mosix, the tcp-patch-for-2.2.17-14 tcp_nodelay type patch, and the latest 2.2.18 patch for bonding. When I build the bonding and ethernet into the kernel I get a network result similar to http://www.beowulf.org/pipermail/beowulf/2000-October/010325.html in that the connections seem to only accidentally see each other. The switch has a large amount of activity. Now what is really interesting is that I then proceeded to put all 4 bonded Ethernets (2 per 2 computers) on to the same vlan. Presto I have connections with 125Mbps TCP and 175Mbps UDP according to netperf. Now my questions: Why are they even working on the same lan? Are they falling into the mode 1 i.e.. backup and not round robin? Why won't they work separated on by vlans? Why is my TCP so crappy? -- the interoffice connections ran through the same switch gives 95Mbps. Why does the if* tools hang and the network fail to connect (though ifconfig successfully set up the network earlier) when I make bonding and 3c59x modules and not monolithic? Thank you for any insights, Chris Hamilton From diehl at borg.umn.edu Tue Mar 13 10:31:35 2001 From: diehl at borg.umn.edu (Jim Diehl) Date: Tue, 13 Mar 2001 12:31:35 -0600 (CST) Subject: Gig-E equipment suggestions Message-ID: Hello, I am a student with the University of Minnesota Fibre Channel research group. We are currently comparing FC SANs to Ethernet-based storage methods using Linux. In order to have a fair (gigabit speed) comparison we need to purchase some Gig-E equipment. We are looking for one 7 or 8 port Gig-E switch and 5 or 6 GNICs. These GNICs must have Linux drivers, of course. I have compiled a list and would greatly appreaciate any comments or suggestions (on this list or elsewhere) regarding what equipment to purchase. GigE NICs (all with some kind of on-board TCP/IP offloading) Netgear GA620[T] (same linux driver as 3com) $325 D-Link DGE-500[sx/t] (linux support??) 
$265 (or $120 for copper) 3Com 3C985B-sx (same driver as netgear) $640 Intel PRO/1000 F (on sale 2 for 1) $550 (for 2) And some copper and optical switches: Switches: Netgear GS504 (optical) 4-port $1400-1500 Netgear GS504T (copper) 4-port $900-1000 D-Link DGS-3204 4-port $1300-1400 D-Link DGS-3208F 8-port $2700 3Com 4900sx Superstack 12-port $4600 Intel OpenBox something 7-port $6500 I know the Netgear and 3Com cards have drivers available at http://jes.home.cern.ch/jes/gige/acenic.html but I'm not sure about the D-Link card. Do you recommend copper (cat5) or optical versions? Is there such a thing as an 8-port copper GigE switch (I can't find them on the various vendor's sites)? Thank you for your time and I look forward to any of your input. Jim Diehl University of Minnesota Fibre Channel Group www.borg.umn.edu/fc From Eugene.Leitl at lrz.uni-muenchen.de Tue Mar 13 13:15:36 2001 From: Eugene.Leitl at lrz.uni-muenchen.de (Eugene.Leitl at lrz.uni-muenchen.de) Date: Tue, 13 Mar 2001 22:15:36 +0100 Subject: [Fwd: CCL:Short Budget] Message-ID: <3AAE8DF8.FD28E4E@lrz.uni-muenchen.de> -------- Original Message -------- From: "Familia Esguerra Neira" Subject: CCL:Short Budget To: chemistry at ccl.net Hello all, I would like to ask for your advice on buying a PC for running gaussian 98 and gamess-us jobs under Slackware Linux 7.1. My budget is U$4000 and I have been looking at Dell computers for the fastest machine I can get with this money. I am undecided on whether to buy a dual pentium III 933MHz or a single pentium IV 1.5 GHz processor both with 1Gb PC800 RDRAM; if any of you out there has any information as to which of this two options would work better, or other suggestion on how to invest my money in order to obtain the fastest performing machine I would greatly appreciate it. Thanking your advice, Mauricio Esguerra Neira Grupo de Quimica Teorica Universidad Nacional de Colombia -= This is automatically added to each message by mailing script =- CHEMISTRY at ccl.net -- To Everybody | CHEMISTRY-REQUEST at ccl.net -- To Admins MAILSERV at ccl.net -- HELP CHEMISTRY or HELP SEARCH CHEMISTRY-SEARCH at ccl.net -- archive search | Gopher: gopher.ccl.net 70 Ftp: ftp.ccl.net | WWW: http://www.ccl.net/chemistry/ | Jan: jkl at osc.edu From dvos12 at calvin.edu Tue Mar 13 16:18:13 2001 From: dvos12 at calvin.edu (David Vos) Date: Tue, 13 Mar 2001 19:18:13 -0500 (EST) Subject: Dual Athlon In-Reply-To: Message-ID: I've had combatibility problems between OpenSSH and ssh.com's implementation. I had two linux boxen that could telnet back and forth, but could not ssh. I put ssh.com's on both and the problem went away. David On Tue, 13 Mar 2001, Robert G. Brown wrote: > On Tue, 13 Mar 2001, Mofeed Shahin wrote: > > > So Robert, when are you going to let us know the results of the Dual Athlon ? > > :-) > > > > Mof. > > They got my account setup yesterday, but for some reason I'm having a > hard time connecting via ssh (it's rejecting my password). We've tried > both a password they sent me and an MD5 crypt I sent them. Very strange > -- I use OpenSSH routinely to connect all over the place so I'm > reasonably sure my client is OK. Anyway, I expect it is something > trivial and that I'll get in sometime this morning. I spent the time > yesterday that I couldn't get in profitably anyway packaging stream and > a benchmark sent to me by Thomas Guignol of the list up into make-ready > tarball/RPM's. 
At the moment my list looks something like: > > stream > guignol > cpu-rate > lmbench (ass'td) > LAM/MPI plus two benchmarks (Josip and Doug each suggested one) > EPCC OpenMP microbenchmarks (probably with PGI) > possibly some fft timings (Martin Seigert) > > in roughly that order, depending on how much time I get and how well > things go. I'm going to TRY to build a page with all the tests I used > in tarball/rpm form, results, and commentary. > > rgb > > -- > Robert G. Brown http://www.phy.duke.edu/~rgb/ > Duke University Dept. of Physics, Box 90305 > Durham, N.C. 27708-0305 > Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu > > > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > From jakob at unthought.net Tue Mar 13 16:29:08 2001 From: jakob at unthought.net (=?iso-8859-1?Q?Jakob_=D8stergaard?=) Date: Wed, 14 Mar 2001 01:29:08 +0100 Subject: [Announce] ANTS-0.4.10 Message-ID: <20010314012908.B16797@unthought.net> Hi everyone ! A long time ago, in a galaxy far far away, I announced a job distribution system called "jobd". That name clashed with another project, so I changed the name to ANTS (Autonomous Networked Task Scheduler). This name also reflects that my system does not use a central scheduler or server - all nodes are equal. The system accepts jobs (such as compile jobs) on the local node, allocates a job-slot on the best suited node, and remotely executes the job there. Think of it as a "clever rsh". I use the system in production for large compile jobs. I can compile the Linux-2.4.2 kernel for i686 in 73 seconds on a 9-cpu cluster. Documentation, .tar.gz and .src.rpm is available at http://unthought.net/antsd/ Comments, suggestions, questions, contributions etc. are welcome of course. Cheers, -- ................................................................ : jakob at unthought.net : And I see the elder races, : :.........................: putrid forms of man : : Jakob ?stergaard : See him rise and claim the earth, : : OZ9ABN : his downfall is at hand. : :.........................:............{Konkhra}...............: From fmuldoo at alpha2.eng.lsu.edu Wed Mar 14 16:45:18 2001 From: fmuldoo at alpha2.eng.lsu.edu (Frank Muldoon) Date: Wed, 14 Mar 2001 18:45:18 -0600 Subject: Fortran 90 and BeoMPI Message-ID: <3AB0109E.B0116669@me.lsu.edu> Does anyone out there use F90 with BeoMPI ? I have set up a cluster and gotten BeoMPI to work with f77 i.e. g77. I am having problems with the link part of the step. I have edited the supplied mpif90 script and haven't had any more luck then trying "mpif90 -f90=lf95 myfile.f90". I get error messages about being unable to find the mpi routines in the link stage. Thanks, Frank -- Frank Muldoon Computational Fluid Dynamics Research Group Louisiana State University Baton Rouge, LA 70803 225-344-7676 (h) 225-578-5217 (w) From rgb at phy.duke.edu Tue Mar 13 17:23:14 2001 From: rgb at phy.duke.edu (Robert G. Brown) Date: Tue, 13 Mar 2001 20:23:14 -0500 (EST) Subject: Dual Athlon In-Reply-To: Message-ID: On Tue, 13 Mar 2001, David Vos wrote: > I've had combatibility problems between OpenSSH and ssh.com's > implementation. I had two linux boxen that could telnet back and forth, > but could not ssh. I put ssh.com's on both and the problem went away. 
I've experienced similar things in the past, but ssh -v indicates: debug: Remote protocol version 1.99, remote software version OpenSSH_2.3.0p1 debug: no match: OpenSSH_2.3.0p1 Enabling compatibility mode for protocol 2.0 debug: Local version string SSH-2.0-OpenSSH_2.3.0p1 which suggests that they are using OpenSSH also, albeit a slightly earlier revision. The rest of the verbose handshaking proceeds perfectly up to password entry: debug: authentications that can continue: publickey,password debug: next auth method to try is publickey debug: next auth method to try is password rgb at dual's password: debug: authentications that can continue: publickey,password debug: next auth method to try is password Permission denied, please try again. rgb at dual's password: rgb at lucifer|T:113> (where I've tried typing my password and the password for the other account they tried to roll for me maybe fifty times by now -- it is impossible that I'm mistyping). I'm pretty well stuck at this point until they unstick me. I'd get exactly the same "Permission denied" message if the login fails because my account doesn't really exist and I'm warped into NOUSER or if there really is a Failed password or if the account exists but has e.g. a bad shell or bad /etc/passwd file entry. I can debug this sort of thing in five minutes on my own system, but I'm at their mercy on theirs. So far today, the guy I wrote to suggest a few simple tests (like him trying to login and/or ssh to my account with the same password they gave me) hasn't responded at all. I'll give them until tomorrow and then I'll try escalating a bit. rgb > > David > > On Tue, 13 Mar 2001, Robert G. Brown wrote: > > > On Tue, 13 Mar 2001, Mofeed Shahin wrote: > > > > > So Robert, when are you going to let us know the results of the Dual Athlon ? > > > :-) > > > > > > Mof. > > > > They got my account setup yesterday, but for some reason I'm having a > > hard time connecting via ssh (it's rejecting my password). We've tried > > both a password they sent me and an MD5 crypt I sent them. Very strange > > -- I use OpenSSH routinely to connect all over the place so I'm > > reasonably sure my client is OK. Anyway, I expect it is something > > trivial and that I'll get in sometime this morning. I spent the time > > yesterday that I couldn't get in profitably anyway packaging stream and > > a benchmark sent to me by Thomas Guignol of the list up into make-ready > > tarball/RPM's. At the moment my list looks something like: > > > > stream > > guignol > > cpu-rate > > lmbench (ass'td) > > LAM/MPI plus two benchmarks (Josip and Doug each suggested one) > > EPCC OpenMP microbenchmarks (probably with PGI) > > possibly some fft timings (Martin Seigert) > > > > in roughly that order, depending on how much time I get and how well > > things go. I'm going to TRY to build a page with all the tests I used > > in tarball/rpm form, results, and commentary. > > > > rgb > > > > -- > > Robert G. Brown http://www.phy.duke.edu/~rgb/ > > Duke University Dept. of Physics, Box 90305 > > Durham, N.C. 27708-0305 > > Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu > > > > > > > > > > _______________________________________________ > > Beowulf mailing list, Beowulf at beowulf.org > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 
27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu From newt at scyld.com Tue Mar 13 18:32:18 2001 From: newt at scyld.com (Daniel Ridge) Date: Tue, 13 Mar 2001 21:32:18 -0500 (EST) Subject: Fortran 90 and BeoMPI In-Reply-To: <3AB0109E.B0116669@me.lsu.edu> Message-ID: Frank, I would suggest ditching the mpif90 wrapper altogether and just invoke the Fortran compiler directly. 'lf95 -lmpif myfile.f90' (or equivalent command line) The point of the compiler wrappers that some MPI vendors ship is usually one of: 1. hide the rat's nest of little libraries from the end user 2. maintain compatibility with an earlier MPI that (see 1) These reasons leave a bad taste in my mouth. You should be able to treat MPI like any other library. Let me know if this works for you. Regards, Dan Ridge Scyld Computing Corporation On Wed, 14 Mar 2001, Frank Muldoon wrote: > Does anyone out there use F90 with BeoMPI ? I have set up a cluster and > gotten BeoMPI to work with f77 i.e. g77. I am having problems with the > link part of the step. I have edited the supplied mpif90 script and > haven't had any more luck then trying "mpif90 -f90=lf95 myfile.f90". I > get error messages about being unable to find the mpi routines in the > link stage. > > Thanks, > Frank From rgb at phy.duke.edu Tue Mar 13 18:35:07 2001 From: rgb at phy.duke.edu (Robert G. Brown) Date: Tue, 13 Mar 2001 21:35:07 -0500 (EST) Subject: Dual Athlon In-Reply-To: Message-ID: On Tue, 13 Mar 2001, Robert G. Brown wrote: (A bunch of stuff about ssh that is irrelevant -- I'm in!) I should start accumulating numbers in an hour or two if things go well from this point. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu From cozzi at hertz.rad.nd.edu Tue Mar 13 18:43:11 2001 From: cozzi at hertz.rad.nd.edu (Marc Cozzi) Date: Tue, 13 Mar 2001 21:43:11 -0500 Subject: SMP support with the scyld package Message-ID: greetings, I'm considering several dual 1GHz, 1GB Intel/Asus systems. Has anyone used the Beowulf package from Scyld Computing Corporation with SMP systems? Does one have to rebuild the kernel to enable SMP support or is it turned on by default? Are there issues with BProc and SMP? Thanks, Marc Cozzi Univ. of Notre Dame cozzi at nd.edu From lindahl at conservativecomputer.com Tue Mar 13 19:58:46 2001 From: lindahl at conservativecomputer.com (Greg Lindahl) Date: Tue, 13 Mar 2001 22:58:46 -0500 Subject: Fortran 90 and BeoMPI In-Reply-To: ; from newt@scyld.com on Tue, Mar 13, 2001 at 09:32:18PM -0500 References: <3AB0109E.B0116669@me.lsu.edu> Message-ID: <20010313225846.A7961@wumpus.hpti.com> On Tue, Mar 13, 2001 at 09:32:18PM -0500, Daniel Ridge wrote: > The point of the compiler wrappers that some MPI vendors ship is usually > one of: > > 1. hide the rat's nest of little libraries from the end user > > 2. maintain compatibility with an earlier MPI that (see 1) 3. Do clever things like support "-real8" and "-int8" compiler options. If you never want to do anything clever, sure, you can treat MPI like any other library. The reality is that Fortran compilers rarely expose enough information to let you do that, and still support the combinations of features that users actually use. 
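For the Fortran case being discussed in this thread, the two styles look roughly like this. This is a sketch only: the -lmpif library name, the -f90=lf95 option and the mpi-beowulf include directory are taken from other messages in the thread, while the exact paths and flag spellings vary per installation and compiler:

# wrapper style: the script supplies the include path, the library list and
# order, and any compiler-specific extras (such as the -real8/-int8 handling
# mentioned above)
mpif90 -f90=lf95 -o mpi_heat mpi_heat.f90

# "plain library" style: spell it out yourself and treat MPI like any other lib
lf95 -I/usr/include/mpi-beowulf -o mpi_heat mpi_heat.f90 -lmpif

Either way the link only succeeds if the Fortran symbol names in the library match what the compiler emits, which is where Fortran name-mangling conventions come into play.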
-- greg From newt at scyld.com Tue Mar 13 21:26:32 2001 From: newt at scyld.com (Daniel Ridge) Date: Wed, 14 Mar 2001 00:26:32 -0500 (EST) Subject: SMP support with the scyld package In-Reply-To: Message-ID: On Tue, 13 Mar 2001, Marc Cozzi wrote: > greetings, > > I'm considering several dual 1GHz, 1GB Intel/Asus systems. Has anyone > used the Beowulf package from Scyld Computing Corporation with > SMP systems? Does one have to rebuild the kernel to enable SMP > support or is it turned on by default? Are there issues with BProc > and SMP? Scyld's distribution ships with SMP and UP kernels. No problems with respect to UP/SMP with bproc. You can also mix-n-match with no ill effects. Regards, Dan Ridge Scyld Computing Corporation From agrajag at linuxpower.org Tue Mar 13 21:20:41 2001 From: agrajag at linuxpower.org (Jag) Date: Tue, 13 Mar 2001 21:20:41 -0800 Subject: SMP support with the scyld package In-Reply-To: ; from cozzi@hertz.rad.nd.edu on Tue, Mar 13, 2001 at 09:43:11PM -0500 References: Message-ID: <20010313212041.S7935@kotako.analogself.com> On Tue, 13 Mar 2001, Marc Cozzi wrote: > greetings, > > I'm considering several dual 1GHz, 1GB Intel/Asus systems. Has anyone > used the Beowulf package from Scyld Computing Corporation with > SMP systems? Does one have to rebuild the kernel to enable SMP > support or is it turned on by default? Are there issues with BProc > and SMP? Scyld ships UP and SMP kernels. I have a cluster that is running the SMP kernel (although the machines only have one processor per node at the moment). Everything works fine with the one caveat that before you make the node boot image with beosetup, you have to make sure /boot/vmlinuz is pointing to the SMP kernel (that or specify a different kernel when making the image in beosetup). My install was done as an overlay install; I'm not sure whether Scyld's modified anaconda on the CD will do that correctly or not. BProc will still treat each machine as one node even if it has two processors in it. However, I believe that beompi does understand the concept of multiple processors per node and can work with it. Unfortunately I don't have a cluster of SMP machines, so I haven't been able to really test that. Jag -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 232 bytes Desc: not available URL: From tlovie at pokey.mine.nu Tue Mar 13 21:35:18 2001 From: tlovie at pokey.mine.nu (Thomas Lovie) Date: Wed, 14 Mar 2001 00:35:18 -0500 Subject: Dual Athlon In-Reply-To: Message-ID: I had this problem too; what you have to do is compile OpenSSH with the --with-md5-passwords configure option. It appears that this isn't turned on by default. Then everything works fine, since it appears that OpenSSH is trying to do a standard DES crypt then comparing this value to the MD5 in the shadow file. I have complete compatibility between OpenSSH and SSH.com's version 2 of the protocol. Tom Lovie. -----Original Message----- From: beowulf-admin at beowulf.org [mailto:beowulf-admin at beowulf.org]On Behalf Of David Vos Sent: Tuesday, March 13, 2001 7:18 PM To: Robert G. Brown Cc: Beowulf Mailing List Subject: Re: Dual Athlon I've had combatibility problems between OpenSSH and ssh.com's implementation. I had two linux boxen that could telnet back and forth, but could not ssh. I put ssh.com's on both and the problem went away. David On Tue, 13 Mar 2001, Robert G.
Brown wrote: > On Tue, 13 Mar 2001, Mofeed Shahin wrote: > > > So Robert, when are you going to let us know the results of the Dual Athlon ? > > :-) > > > > Mof. > > They got my account setup yesterday, but for some reason I'm having a > hard time connecting via ssh (it's rejecting my password). We've tried > both a password they sent me and an MD5 crypt I sent them. Very strange > -- I use OpenSSH routinely to connect all over the place so I'm > reasonably sure my client is OK. Anyway, I expect it is something > trivial and that I'll get in sometime this morning. I spent the time > yesterday that I couldn't get in profitably anyway packaging stream and > a benchmark sent to me by Thomas Guignol of the list up into make-ready > tarball/RPM's. At the moment my list looks something like: > > stream > guignol > cpu-rate > lmbench (ass'td) > LAM/MPI plus two benchmarks (Josip and Doug each suggested one) > EPCC OpenMP microbenchmarks (probably with PGI) > possibly some fft timings (Martin Seigert) > > in roughly that order, depending on how much time I get and how well > things go. I'm going to TRY to build a page with all the tests I used > in tarball/rpm form, results, and commentary. > > rgb > > -- > Robert G. Brown http://www.phy.duke.edu/~rgb/ > Duke University Dept. of Physics, Box 90305 > Durham, N.C. 27708-0305 > Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu > > > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From edwards at icantbelieveimdoingthis.com Tue Mar 13 22:31:51 2001 From: edwards at icantbelieveimdoingthis.com (Arthur H. Edwards,1,505-853-6042,505-256-0834) Date: Tue, 13 Mar 2001 23:31:51 -0700 Subject: [Fwd: CCL:Short Budget] References: <3AAE8DF8.FD28E4E@lrz.uni-muenchen.de> Message-ID: <3AAF1057.1020307@icantbelieveimdoingthis.com> Eugene.Leitl at lrz.uni-muenchen.de wrote: > -------- Original Message -------- > From: "Familia Esguerra Neira" > Subject: CCL:Short Budget > To: chemistry at ccl.net > > > Hello all, > > > I would like to ask for your advice on buying a PC for running gaussian 98 > and gamess-us jobs under Slackware Linux 7.1. My budget is U$4000 and I > have been looking at Dell computers for the fastest machine I can get with > this money. > > I am undecided on whether to buy a dual pentium III 933MHz or a single > pentium IV 1.5 GHz processor both with 1Gb PC800 RDRAM; if any of you out > there has any information as to which of this two options would work > better, or other suggestion on how to invest my money in order to obtain > the fastest performing machine I would greatly appreciate it. 
> > Thanking your advice, > > Mauricio Esguerra Neira > Grupo de Quimica Teorica > Universidad Nacional de Colombia > > > > > > -= This is automatically added to each message by mailing script =- > CHEMISTRY at ccl.net -- To Everybody | CHEMISTRY-REQUEST at ccl.net -- To Admins > MAILSERV at ccl.net -- HELP CHEMISTRY or HELP SEARCH > CHEMISTRY-SEARCH at ccl.net -- archive search | Gopher: gopher.ccl.net 70 > Ftp: ftp.ccl.net | WWW: http://www.ccl.net/chemistry/ | Jan: jkl at osc.edu > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > I'm running Guassian-98 and GAMESS-US (also GAMESS-UK). I'm using athlon 750's that behave like intel's running at 1.15 times the clock speed. Price/performance dictatest that I use athlon's. By the way, I've been using Debian very successfully with these software packages. Art Edwards From fmuldoo at alpha2.eng.lsu.edu Thu Mar 15 00:26:18 2001 From: fmuldoo at alpha2.eng.lsu.edu (Frank Muldoon) Date: Thu, 15 Mar 2001 02:26:18 -0600 Subject: Fortran 90 and BeoMPI References: Message-ID: <3AB07CAA.507E39BD@me.lsu.edu> I just got done trying linking directly to the mpi libraries using 2 F95 compilers (Lahey & NAG). Both behave the same way as before (output below). I was under the impression that it was often necessary to have separate builds for f90/f95 and f77. For instance the MPICH install guide says "During configuration, a number of F90-specific arguments can be specified. See the output of configure -help. In particular, when using the NAG Fortran 90 compiler, you whould specify -f90nag." Thanks, Frank [root at cfd1 temp]# /usr/local/NAGf95/bin/f95 -lmpif /root/temp/mpi_heat.f90 Extension: /usr/include/mpi-beowulf/mpif.h, line 233: Byte count on numeric data type detected at *@8 Warning: /root/temp/mpi_heat.f90, line 109: Unused symbol TIME_INTEGRATION detected at END@ Warning: /root/temp/mpi_heat.f90, line 109: Unused symbol SUM_RES detected at END@ Warning: /root/temp/mpi_heat.f90, line 109: Unused symbol COMM1D detected at END@ Warning: /root/temp/mpi_heat.f90, line 109: Unused symbol NID detected at END@ [f95 continuing despite warning messages] Deleted feature used: /root/temp/mpi_heat.f90, line 65: PAUSE statement Deleted feature used: /root/temp/mpi_heat.f90, line 66: PAUSE statement Deleted feature used: /root/temp/mpi_heat.f90, line 67: PAUSE statement Deleted feature used: /root/temp/mpi_heat.f90, line 68: PAUSE statement Deleted feature used: /root/temp/mpi_heat.f90, line 103: PAUSE statement mpi_heat.o: In function `main': mpi_heat.o(.text+0x7e): undefined reference to `mpi_init_' mpi_heat.o(.text+0xc7): undefined reference to `mpi_comm_size_' mpi_heat.o(.text+0xe8): undefined reference to `mpi_comm_rank_' mpi_heat.o(.text+0x486): undefined reference to `mpi_barrier_' mpi_heat.o(.text+0x786): undefined reference to `mpi_isend_' mpi_heat.o(.text+0x801): undefined reference to `mpi_isend_' mpi_heat.o(.text+0x886): undefined reference to `mpi_irecv_' mpi_heat.o(.text+0x90b): undefined reference to `mpi_irecv_' mpi_heat.o(.text+0xbe1): undefined reference to `mpi_wait_' mpi_heat.o(.text+0xc1a): undefined reference to `mpi_wait_' mpi_heat.o(.text+0xc53): undefined reference to `mpi_wait_' mpi_heat.o(.text+0xc8c): undefined reference to `mpi_wait_' mpi_heat.o(.text+0x19c6): undefined reference to `mpi_reduce_' mpi_heat.o(.text+0x1ac9): undefined reference to `mpi_finalize_' 
/usr/bin/../lib/libmpif.so: undefined reference to `getarg_' /usr/bin/../lib/libmpif.so: undefined reference to `f__xargc' collect2: ld returned 1 exit status [root at cfd1 temp]# [root at cfd1 temp]# [root at cfd1 temp]# [root at cfd1 temp]# lf95 -lmpif /root/temp/mpi_heat.f90 Compiling file /root/temp/mpi_heat.f90. Compiling program unit main at line 1: mpi_heat.o: In function `SSN4': mpi_heat.o(.text+0x3d): undefined reference to `mpi_init_' mpi_heat.o: In function `SSN6': mpi_heat.o(.text+0x61): undefined reference to `mpi_comm_size_' mpi_heat.o: In function `SSN7': mpi_heat.o(.text+0x78): undefined reference to `mpi_comm_rank_' mpi_heat.o: In function `SSN17': mpi_heat.o(.text+0x28d): undefined reference to `mpi_barrier_' mpi_heat.o: In function `SSN22': mpi_heat.o(.text+0x6f5): undefined reference to `mpi_isend_' mpi_heat.o: In function `SSN23': mpi_heat.o(.text+0x738): undefined reference to `mpi_isend_' mpi_heat.o: In function `SSN24': mpi_heat.o(.text+0x76e): undefined reference to `mpi_irecv_' mpi_heat.o: In function `SSN25': mpi_heat.o(.text+0x7b1): undefined reference to `mpi_irecv_' mpi_heat.o: In function `SSN27': mpi_heat.o(.text+0x97a): undefined reference to `mpi_wait_' mpi_heat.o: In function `SSN28': mpi_heat.o(.text+0x9b0): undefined reference to `mpi_wait_' mpi_heat.o: In function `SSN29': mpi_heat.o(.text+0x9e6): undefined reference to `mpi_wait_' mpi_heat.o: In function `SSN30': mpi_heat.o(.text+0xa1c): undefined reference to `mpi_wait_' mpi_heat.o: In function `SSN46': mpi_heat.o(.text+0x11d2): undefined reference to `mpi_reduce_' mpi_heat.o: In function `SSN50': mpi_heat.o(.text+0x12ed): undefined reference to `mpi_finalize_' mpi_heat.o(.data+0x0): undefined reference to `mpi_finalize_' mpi_heat.o(.data+0x4): undefined reference to `mpi_reduce_' mpi_heat.o(.data+0x8): undefined reference to `mpi_wait_' mpi_heat.o(.data+0xc): undefined reference to `mpi_irecv_' mpi_heat.o(.data+0x10): undefined reference to `mpi_isend_' mpi_heat.o(.data+0x14): undefined reference to `mpi_barrier_' mpi_heat.o(.data+0x18): undefined reference to `mpi_comm_rank_' mpi_heat.o(.data+0x1c): undefined reference to `mpi_comm_size_' mpi_heat.o(.data+0x20): undefined reference to `mpi_init_' mpi_heat.o(.data+0x24): undefined reference to `mpi_wtime_' mpi_heat.o(.data+0x28): undefined reference to `mpi_wtick_' mpi_heat.o(.data+0x2c): undefined reference to `mpi_null_copy_fn_' mpi_heat.o(.data+0x30): undefined reference to `mpi_null_delete_fn_' mpi_heat.o(.data+0x34): undefined reference to `mpi_dup_fn_' /usr/bin/../lib/libmpif.so: undefined reference to `f__xargc' -- Frank Muldoon Computational Fluid Dynamics Research Group Louisiana State University Baton Rouge, LA 70803 225-344-7676 (h) 225-388-5217 (w) From jcownie at etnus.com Wed Mar 14 02:14:10 2001 From: jcownie at etnus.com (James Cownie) Date: Wed, 14 Mar 2001 10:14:10 +0000 Subject: Fortran 90 and BeoMPI In-Reply-To: Your message of "Tue, 13 Mar 2001 21:32:18 EST." Message-ID: <14d8IE-0FG-00@etnus.com> Newt wrote :- > The point of the compiler wrappers that some MPI vendors ship is usually > one of: > > 1. hide the rat's nest of little libraries from the end user > > 2. maintain compatibility with an earlier MPI that (see 1) > > These reasons leave a bad taste in my mouth. You should be able to treat > MPI like any other library. I think you missed the main reason that the MPICH folks, at least, implemented wrappers for the compilers which is :- 0. 
Give a consistent command for compiling MPI codes no matter which platform you are currently working on. There is a class of people (particularly those at the "National Labs") who have accounts on many different machines, and support code on lots of them. To these people, having a common way of asking to compile an MPI code is a big win. For most of us, who only have one machine to worry about, this is less important, of course. -- Jim James Cownie Etnus, LLC. +44 117 9071438 http://www.etnus.com From Eugene.Leitl at lrz.uni-muenchen.de Wed Mar 14 02:54:23 2001 From: Eugene.Leitl at lrz.uni-muenchen.de (Eugene Leitl) Date: Wed, 14 Mar 2001 11:54:23 +0100 (MET) Subject: 630/730 Website (fwd) Message-ID: ---------- Forwarded message ---------- Date: Wed, 14 Mar 2001 10:49:29 +1000 From: Brian Stephenson To: Linuxbios Subject: 630/730 Website I started to put together a website for SiS/Linuxbios users; it is at http://www.users.bigpond.com/brian0029 I started on it about a week ago, but I haven't been keeping up with Linuxbios, so I'm not very current with things. If anyone is interested in helping, let me know: it needs an up-to-date howto, checking for errors, and other things. It seems to display differently in my Netscape and Explorer; anyway, it's just a beginning. Regards Brian From cozzi at hertz.rad.nd.edu Wed Mar 14 03:54:24 2001 From: cozzi at hertz.rad.nd.edu (Marc Cozzi) Date: Wed, 14 Mar 2001 06:54:24 -0500 Subject: SMP support with the scyld package/codine/NFS Message-ID: WOW, that was a fast response! Almost as fast as the DEC True-64 managers list. Thanks for the replies Dan, Jag. Other questions I have with this configuration have to do with operating in the current environment. I currently have lots of SUN 420R SMP systems, IRIX, AIX, and DECs running behind a firewall. The SUN systems use the Codine batch job scheduling submission software (now from SUN) previously from Gridware. Although somewhat limited, it works well in this shop. SUN has recently released a version for Linux with claims to support more platforms in the near future (IRIX, True-64, AIX...). SUN is making the Codine software available at no cost!! Also seems very stable... Also used with all these systems is a common user file system NFS mounted on all boxes. User authentication is via NIS running on the SUN Solaris 8 systems. What documentation I could find for the Scyld software indicates that a master box must be setup with two Ethernets. One pointing to the "outside" and the other to the "inside". I assume this is running ipchains/ipforward and acting somewhat like a firewall. Is this going to cause problems/prevent me from using the existing NFS mounts and NIS authentication scheme? Can I just bring up all the Scyld nodes, including master Scyld system, on the internal network? Once again, thanks for all the experts' help and suggestions. marc -----Original Message----- From: Daniel Ridge [mailto:newt at scyld.com] Sent: Wednesday, March 14, 2001 12:27 AM To: Marc Cozzi Cc: 'beowulf at beowulf.org' Subject: Re: SMP support with the scyld package On Tue, 13 Mar 2001, Marc Cozzi wrote: > greetings, > > I'm considering several dual 1GHz, 1GB Intel/Asus systems. Has anyone > used the Beowulf package from Scyld Computing Corporation with > SMP systems? Does one have to rebuild the kernel to enable SMP > support or is it turned on by default? Are there issues with BProc > and SMP? Scyld's distribution ships with SMP and UP kernels. No problems with respect to UP/SMP with bproc.
You can also mix-n-match with no ill effects. Regards, Dan Ridge Scyld Computing Corporation _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lowther at att.net Wed Mar 14 04:13:52 2001 From: lowther at att.net (lowther at att.net) Date: Wed, 14 Mar 2001 07:13:52 -0500 Subject: SMP support with the scyld package/codine/NFS References: Message-ID: <3AAF6080.993A3180@att.net> Marc Cozzi wrote: > What documentation I could find for > the Scyld software indicates that a master box must be setup with > two Ethernets. One pointing to the "outside" and the other to the > "inside". You can alias your eth1 to eth0. I did it once. Just don't remember how at the moment. Ken From andreas at amy.udd.htu.se Wed Mar 14 06:53:31 2001 From: andreas at amy.udd.htu.se (Andreas Boklund) Date: Wed, 14 Mar 2001 15:53:31 +0100 (CET) Subject: SMP support with the scyld package/codine/NFS In-Reply-To: Message-ID: My cluster is not running Scyld, but the university is using NIS from an IRIX and loads the home volume from another IRIX (NFS), and some of my applications reside on a Quad LINUX from Dell. I have set up a simple iptables firewall/masquerading on my master node that masquerades the nodes from the rest of the world, including the NFS/NIS servers. But it still lets me log on to the nodes via NIS and mount all the NFS volumes that I like. The interesting part is the IP-forwarding and Masquerading section. The rest is just that I don't want people to get access to my cluster, since a computer lab has access to the unix network. So just compile a kernel with iptables (2.4.x) or do the same stuff with the 2.2 version of the firewall code. As for Scyld, I have managed to use only one interface by just changing the value (in the preferences tab, I think) from eth1 to eth0. After that I could assign my nodes real-world IP addresses and allow them to contact other computers. Well, it seemed to work for me; I never did that much testing though. Good luck //Andreas PS.
Feel free to comment my config options, if you have any ideas of improvement :)

***** The start section of my Netfilter script in /etc/rc.d/init.d ****

echo "Turning on IP-forwarding & Masquerading:"
echo 1 > /proc/sys/net/ipv4/ip_forward
iptables -t nat -A POSTROUTING -o eth1 -j MASQUERADE

echo "Starting to filter packets"

## Stop all packets that comes in on the wrong interface
for file in /proc/sys/net/ipv4/conf/*/rp_filter; do
    echo 1 > $file
done

# Open a few ports
iptables -A INPUT -p TCP --destination-port 21 -i eth1 -j ACCEPT    #ftp
iptables -A INPUT -p UDP --destination-port 21 -i eth1 -j ACCEPT    #ftp
iptables -A INPUT -p TCP --destination-port 22 -i eth1 -j ACCEPT    #ssh
iptables -A INPUT -p UDP --destination-port 22 -i eth1 -j ACCEPT    #ssh
iptables -A INPUT -p TCP --destination-port 53 -i eth1 -j ACCEPT    #DNS
iptables -A INPUT -p UDP --destination-port 53 -i eth1 -j ACCEPT    #DNS
iptables -A INPUT -p TCP --destination-port 111 -i eth1 -j ACCEPT   #NIS/YP
iptables -A INPUT -p UDP --destination-port 111 -i eth1 -j ACCEPT   #NIS/YP
iptables -A INPUT -p UDP --destination-port 2049 -i eth1 -j ACCEPT  #NFS

# Allow old/inside connections to reach out and get answers
iptables -A INPUT -m state --state ESTABLISHED,RELATED -i eth1 -j ACCEPT

# open ICMP, let ppl ping the master
iptables -A INPUT -p ICMP -i eth1 -j ACCEPT

# Drop all now unmatched packets
iptables -A INPUT -i eth1 -j DROP

On Wed, 14 Mar 2001, Marc Cozzi wrote: > WOW, that was a fast response! Almost as fast as the > DEC True-64 managers list. > > Thanks for the replies Dan, Jag. > > Other questions I have with this configuration have to do with > operating in the current environment. I currently have lots of > SUN 420R SMP systems, IRIX, AIX, and DECs running behind a firewall. > The SUN systems use the Codine batch job scheduling submission > software (now from SUN) previously from Gridware. All though > somewhat limited, works well in this shop. SUN has recently released > a version for Linux with claims to support more platforms in the > near future, (IRIX, True-64, AIX...) SUN is making the Codine > software available at no cost!! Also seems very stable... > > Also used with all these systems is a common user file system > NFS mounted on all boxes. User authentication is via NIS running > on the SUN Solaris 8 systems. What documentation I could find for > the Scyld software indicates that a master box must be setup with > two Ethernets. One pointing to the "outside" and the other to the > "inside". I assume this is running ipchains/ipforward and acting > somewhat like a firewall. Is this going to cause problems/prevent > me from using the existing NFS mounts and NIS authentication scheme? > Can I just bring up all the Scyld nodes, including master Scyld system, > on the internal network? > > Once again, thanks for all the experts help and suggestions. > > marc > > > -----Original Message----- > From: Daniel Ridge [mailto:newt at scyld.com] > Sent: Wednesday, March 14, 2001 12:27 AM > To: Marc Cozzi > Cc: 'beowulf at beowulf.org' > Subject: Re: SMP support with the scyld package > > > > On Tue, 13 Mar 2001, Marc Cozzi wrote: > > > greetings, > > > > I'm considering several dual 1GHz, 1GB Intel/Asus systems. Has anyone > > used the Beowulf package from Scyld Computing Corporation with > > SMP systems? Does one have to rebuild the kernel to enable SMP > > support or is it turned on by default? Are there issues with BProc > > and SMP? > > Scyld's distribution ships with SMP and UP kernels. No problems with > respect to UP/SMP with bproc.
You can also mix-n-match with no ill > effects. > > Regards, > Dan Ridge > Scyld Computing Corporation > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- ********************************************************* * Administator of Amy and Sfinx(Iris23) * * * * Voice: 070-7294401 * * ICQ: 12030399 * * Email: andreas at shtu.htu.se, boklund at linux.nu * * * * That is how you find me, How do -I- find you ? * ********************************************************* From newt at scyld.com Wed Mar 14 06:55:54 2001 From: newt at scyld.com (Daniel Ridge) Date: Wed, 14 Mar 2001 09:55:54 -0500 (EST) Subject: Fortran 90 and BeoMPI In-Reply-To: <14d8IE-0FG-00@etnus.com> Message-ID: On Wed, 14 Mar 2001, James Cownie wrote: > Newt wrote :- > > > The point of the compiler wrappers that some MPI vendors ship is usually > > one of: > > > > 1. hide the rat's nest of little libraries from the end user > > > > 2. maintain compatibility with an earlier MPI that (see 1) > > > > These reasons leave a bad taste in my mouth. You should be able to treat > > MPI like any other library. > > I think you missed the main reason that the MPICH folks, at least, > implemented wrappers for the compilers which is :- No -- I understand completely how this happened. I'm saying that the MPI spec doesn't seem to require this kind of mechanism (unlike PVM, which explicitly includes its own build system). > 0. Give a consistent command for compiling MPI codes no matter which > platform you are currently working on. Right. This is a general software problem. Lots of people need consistent build environments. I would suggest that people who would be in a position to require per-platform wranglings for MPI often also need to perform similar wranglings to accommodate differences in the C library or in the Fortran environment. MPI compiler wrappers hardly seem like the right place to accommodate these per-platform differences. ./configure seems much more palatable than per-library compiler wrappers. I've seen a number of apps that are MPI enabled and which supply a configure script which work just fine without using the compiler wrappers. What I would like is something straight out of the movie 'Network'. I would like people to go to their windows, open them, and shout "I'm mad as hell and I'm not going to take it!" I think that -- with enough collective cleverness -- we could come up with a better solution. Regards, Dan Ridge Scyld Computing Corporation From newt at scyld.com Wed Mar 14 07:07:40 2001 From: newt at scyld.com (Daniel Ridge) Date: Wed, 14 Mar 2001 10:07:40 -0500 (EST) Subject: SMP support with the scyld package/codine/NFS In-Reply-To: Message-ID: On Wed, 14 Mar 2001, Marc Cozzi wrote: > WOW, that was a fast response! Almost as fast as the > DEC True-64 managers list. > > Thanks for the replies Dan, Jag. No problem. We use a sophisticated message sorting system that runs a neural network on a Scyld Beowulf to prioritize our responses. :) > Also used with all these systems is a common user file system > NFS mounted on all boxes. User authentication is via NIS running > on the SUN Solaris 8 systems.
What documentation I could find for > the Scyld software indicates that a master box must be setup with > two Ethernets. One pointing to the "outside" and the other to the > "inside". I assume this is running ipchains/ipforward and acting > somewhat like a firewall. Is this going to cause problems/prevent > me from using the existing NFS mounts and NIS authentication scheme? > Can I just bring up all the Scyld nodes, including master Scyld system, > on the internal network? You can run a Scyld master node with just one ethernet. The easiest way to do this is to supply "eth0:0" as the Beowulf address. This will overlay a new IP address on top of eth0's regular address. You can configure your nodes to participate in site-wide NFS if you like -- although we don't currently provide any tools to help with this case. With NIS -- this is something you can run on the frontend but which is completely unnecessary on the nodes. With the Scyld software, one never really 'logs in' to the nodes anyway. All jobs are 'pre authenticated' and run with the UID that the job had on the frontend. If that frontend happens to use NIS, no problem. Our goal is for you to be able to treat a Scyld Beowulf system as a single computer for the purposes of site integration. Regards, Dan Ridge Scyld Computing Corporation From lindahl at conservativecomputer.com Wed Mar 14 07:24:43 2001 From: lindahl at conservativecomputer.com (Greg Lindahl) Date: Wed, 14 Mar 2001 10:24:43 -0500 Subject: Fortran 90 and BeoMPI In-Reply-To: <3AB07CAA.507E39BD@me.lsu.edu>; from fmuldoo@alpha2.eng.lsu.edu on Thu, Mar 15, 2001 at 02:26:18AM -0600 References: <3AB07CAA.507E39BD@me.lsu.edu> Message-ID: <20010314102443.A9114@wumpus.hpti.com> On Thu, Mar 15, 2001 at 02:26:18AM -0600, Frank Muldoon wrote: > mpi_heat.o: In function `main': > mpi_heat.o(.text+0x7e): undefined reference to `mpi_init_' Ah. This is the usual problem that MPI was built with the g77 underscore convention (mpi_init__), and NAG is using the "single added underscore" convention. Can't you give NAG a flag to get it to behave like g77?
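A quick way to confirm which convention each side is using is to look at the symbol names directly. A rough sketch (the library path follows the /usr/bin/../lib/libmpif.so seen in the link errors; everything else is illustrative):

# what the Fortran MPI library actually exports
nm -D /usr/lib/libmpif.so | grep -i mpi_init
#   mpi_init__  -> built with g77's double-underscore convention
#   mpi_init_   -> single added underscore (NAG, Lahey, most other compilers)

# what the compiler put in the object file
nm mpi_heat.o | grep -i mpi_init

If the mismatch is confirmed, one option -- assuming you can rebuild the Fortran bindings -- is to build them with g77's -fno-second-underscore so the library exports single-underscore names; whether NAG or Lahey offer an equivalent switch to add the second underscore instead is compiler-specific.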
I've seen a number of apps that are MPI enabled and which > supply a configure script which work just fine without using the > compiler wrappers. Of course, if every software package would be using ./configure, everything would be easy. Try to convince the maintainers of big Fortran packages like Gaussian or CHARMM to switch to ./configure 8-( Sincerely, Bogdan Costescu IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868 E-mail: Bogdan.Costescu at IWR.Uni-Heidelberg.De From agrajag at linuxpower.org Wed Mar 14 07:28:26 2001 From: agrajag at linuxpower.org (Jag) Date: Wed, 14 Mar 2001 07:28:26 -0800 Subject: Fortran 90 and BeoMPI In-Reply-To: ; from bogdan.costescu@iwr.uni-heidelberg.de on Wed, Mar 14, 2001 at 04:28:09PM +0100 References: Message-ID: <20010314072825.T7935@kotako.analogself.com> On Wed, 14 Mar 2001, Bogdan Costescu wrote: > On Wed, 14 Mar 2001, Daniel Ridge wrote: > > > ... I would suggest that people who would be in a > > position to require per-platform wranglings for MPI often also need to > > perform similar wranglings to accomodate differences in the C library > > or in the Fortran environment. MPI compiler wrappers hardly seem like > > the right place to accomodate these per-platform differences. > > I'd like to disagree 8-) > I have here 2 clusters, one running jobs on top of LAM-MPI and the other > running jobs on top of MPICH on top of SCore. By using mpifxx, I'm able to > compile things without knowing where the include and lib files are and > even more without knowing which is the right order of linking the libs; > just try to do this by hand for LAM-MPI for example! > Another example is the 64 bit platforms where you can compile for both 32 > and 64 bit by specifying a simple flag like -32 or -64. Doing linking by > hand means that _I_ have to choose the libraries and I might choose the > wrong one(s) ! Sometimes this is noticed by the linker, but not always... Exactly, which is why autoconf (a program commonly used to make the configure scripts) exists. > > > ./configure seems much more palatable than a per-library compiler > > wrappers. I've seen a number of apps that are MPI enabled and which > > supply a configure script which work just fine without using the > > compiler wrappers. > > Of course, if every software package would be using ./configure, > everything would be easy. Try to convince the maintainers of big Fortran > packages like Gaussian or CHARMM to switch to ./configure 8-( I've never had to compile these programs before, but shouldn't they have their own configure/Makefile system setup so that you don't have to do the linking and such by hand? Jag -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 232 bytes Desc: not available URL: From lindahl at conservativecomputer.com Wed Mar 14 07:40:30 2001 From: lindahl at conservativecomputer.com (Greg Lindahl) Date: Wed, 14 Mar 2001 10:40:30 -0500 Subject: Fortran 90 and BeoMPI In-Reply-To: ; from newt@scyld.com on Wed, Mar 14, 2001 at 09:55:54AM -0500 References: <14d8IE-0FG-00@etnus.com> Message-ID: <20010314104030.A9316@wumpus.hpti.com> On Wed, Mar 14, 2001 at 09:55:54AM -0500, Daniel Ridge wrote: > What I would like is something straight out of the movie 'Network'. > I would like people to go to their windows, open then, and shout > "I'm mad as hell and I'm not going to take it!" 
> > I think that -- with enough collective cleverness -- we could come up > with a better solution. Um, that's what already happened, and the solution is what you see. -- g From bogdan.costescu at iwr.uni-heidelberg.de Wed Mar 14 08:12:38 2001 From: bogdan.costescu at iwr.uni-heidelberg.de (Bogdan Costescu) Date: Wed, 14 Mar 2001 17:12:38 +0100 (CET) Subject: Fortran 90 and BeoMPI In-Reply-To: <20010314072825.T7935@kotako.analogself.com> Message-ID: On Wed, 14 Mar 2001, Jag wrote: > I've never had to compile these programs before, Lucky you! 8-) > ... but shouldn't they have their own configure/Makefile system setup so > that you don't have to do the linking and such by hand? Given that these are programs that run on a multitude of platforms, they expect some kind of common denominator. For example, they try to link with libmpi.a or libmpi.so (-lmpi). Now show me with either LAM-MPI or MPICH how this would work 8-) Since late last year, CHARMM has two new install options: LAMMPI and MPICH which are used for signaling that the Makefile has to be modified to add the respective libraries, while it functioned for years on non-Linux platforms with just -lmpi. However, I never use these options: I always modify the Makefile to replace f77 with mpif77 and I don't care about the rest. There is another problem: when you have several compilers installed on the same system and different MPI libraries compiled for each of them. The SCore system, for example, provides an option (e.g. mpif77 -fc pgi) in order to choose which combination of compiler, flags and include/libraries to use. The same would probably apply to MPI on top of several transport libraries, each with their own include/libs. Sincerely, Bogdan Costescu IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868 E-mail: Bogdan.Costescu at IWR.Uni-Heidelberg.De From dvos12 at calvin.edu Wed Mar 14 09:02:40 2001 From: dvos12 at calvin.edu (David Vos) Date: Wed, 14 Mar 2001 12:02:40 -0500 (EST) Subject: Scyld boot disk & recompiling the kernel Message-ID: I just tested booting a slave node from my Scyld boot disk, and after about 10 minutes of printing dots on the screen, it prints a message about a failed RARP and reboots. The node's MAC address appeared in the Master's beosetup. Also, how do you recompile the kernel in Scyld. I have to edit the schedular code so that instead of sending an idle signal to the CPU, it sends a message to a custom-built PCI card (some electrical engineering students' senior project). I found where to add the code, but I need to know how to compile it in without breaking bproc and stuff. David From newt at scyld.com Wed Mar 14 09:48:46 2001 From: newt at scyld.com (Daniel Ridge) Date: Wed, 14 Mar 2001 12:48:46 -0500 (EST) Subject: Scyld boot disk & recompiling the kernel In-Reply-To: Message-ID: On Wed, 14 Mar 2001, David Vos wrote: > I just tested booting a slave node from my Scyld boot disk, and after > about 10 minutes of printing dots on the screen, it prints a message about > a failed RARP and reboots. The node's MAC address appeared in the > Master's beosetup. Where did it appear in 'beosetup'. If it appears under 'unknown' you have to drag-n-drop it into the center column. Then press the apply button. > Also, how do you recompile the kernel in Scyld. 
I have to edit the > schedular code so that instead of sending an idle signal to the CPU, it > sends a message to a custom-built PCI card (some electrical engineering > students' senior project). I found where to add the code, but I need to > know how to compile it in without breaking bproc and stuff. The kernel source that is in /usr/src/linux is bproc-patched. It's mostly a compile-and-go exercise. You can make the nodes use this new kernel by editing /etc/beowulf/config and running beoboot -2 -n to regenerate the phase-2 boot image. Regards, Scyld Computing Corporation From siegert at sfu.ca Wed Mar 14 10:21:10 2001 From: siegert at sfu.ca (Martin Siegert) Date: Wed, 14 Mar 2001 10:21:10 -0800 Subject: Fortran 90 and BeoMPI In-Reply-To: <3AB07CAA.507E39BD@me.lsu.edu>; from fmuldoo@alpha2.eng.lsu.edu on Thu, Mar 15, 2001 at 02:26:18AM -0600 References: <3AB07CAA.507E39BD@me.lsu.edu> Message-ID: <20010314102110.A22059@stikine.ucs.sfu.ca> On Thu, Mar 15, 2001 at 02:26:18AM -0600, Frank Muldoon wrote: > I just got done trying linking directly to the mpi libraries using 2 F95 compilers (Lahey & NAG). Both behave the > same way as before (output below). I was under the impression that it was often necessary to have separate builds > for f90/f95 and f77. For instance the MPICH install guide says "During configuration, a number of F90-specific > arguments can be specified. See the output of configure -help. In particular, when using the NAG Fortran 90 > compiler, you whould specify -f90nag." > > Thanks, > Frank > > > > [root at cfd1 temp]# /usr/local/NAGf95/bin/f95 -lmpif /root/temp/mpi_heat.f90 > Extension: /usr/include/mpi-beowulf/mpif.h, line 233: Byte count on numeric data type > detected at *@8 > Warning: /root/temp/mpi_heat.f90, line 109: Unused symbol TIME_INTEGRATION > detected at END@ > Warning: /root/temp/mpi_heat.f90, line 109: Unused symbol SUM_RES > detected at END@ > Warning: /root/temp/mpi_heat.f90, line 109: Unused symbol COMM1D > detected at END@ > Warning: /root/temp/mpi_heat.f90, line 109: Unused symbol NID > detected at END@ > [f95 continuing despite warning messages] > Deleted feature used: /root/temp/mpi_heat.f90, line 65: PAUSE statement > Deleted feature used: /root/temp/mpi_heat.f90, line 66: PAUSE statement > Deleted feature used: /root/temp/mpi_heat.f90, line 67: PAUSE statement > Deleted feature used: /root/temp/mpi_heat.f90, line 68: PAUSE statement > Deleted feature used: /root/temp/mpi_heat.f90, line 103: PAUSE statement > mpi_heat.o: In function `main': > mpi_heat.o(.text+0x7e): undefined reference to `mpi_init_' > mpi_heat.o(.text+0xc7): undefined reference to `mpi_comm_size_' > mpi_heat.o(.text+0xe8): undefined reference to `mpi_comm_rank_' > mpi_heat.o(.text+0x486): undefined reference to `mpi_barrier_' > mpi_heat.o(.text+0x786): undefined reference to `mpi_isend_' > mpi_heat.o(.text+0x801): undefined reference to `mpi_isend_' > mpi_heat.o(.text+0x886): undefined reference to `mpi_irecv_' > mpi_heat.o(.text+0x90b): undefined reference to `mpi_irecv_' > mpi_heat.o(.text+0xbe1): undefined reference to `mpi_wait_' > mpi_heat.o(.text+0xc1a): undefined reference to `mpi_wait_' > mpi_heat.o(.text+0xc53): undefined reference to `mpi_wait_' > mpi_heat.o(.text+0xc8c): undefined reference to `mpi_wait_' > mpi_heat.o(.text+0x19c6): undefined reference to `mpi_reduce_' > mpi_heat.o(.text+0x1ac9): undefined reference to `mpi_finalize_' > /usr/bin/../lib/libmpif.so: undefined reference to `getarg_' > /usr/bin/../lib/libmpif.so: undefined reference to `f__xargc' > collect2: 
ld returned 1 exit status > [root at cfd1 temp]# > [root at cfd1 temp]# > [root at cfd1 temp]# > [root at cfd1 temp]# lf95 -lmpif /root/temp/mpi_heat.f90 > Compiling file /root/temp/mpi_heat.f90. > Compiling program unit main at line 1: > mpi_heat.o: In function `SSN4': > mpi_heat.o(.text+0x3d): undefined reference to `mpi_init_' > mpi_heat.o: In function `SSN6': > mpi_heat.o(.text+0x61): undefined reference to `mpi_comm_size_' > mpi_heat.o: In function `SSN7': > mpi_heat.o(.text+0x78): undefined reference to `mpi_comm_rank_' > mpi_heat.o: In function `SSN17': > mpi_heat.o(.text+0x28d): undefined reference to `mpi_barrier_' > mpi_heat.o: In function `SSN22': > mpi_heat.o(.text+0x6f5): undefined reference to `mpi_isend_' > mpi_heat.o: In function `SSN23': > mpi_heat.o(.text+0x738): undefined reference to `mpi_isend_' > mpi_heat.o: In function `SSN24': > mpi_heat.o(.text+0x76e): undefined reference to `mpi_irecv_' > mpi_heat.o: In function `SSN25': > mpi_heat.o(.text+0x7b1): undefined reference to `mpi_irecv_' > mpi_heat.o: In function `SSN27': > mpi_heat.o(.text+0x97a): undefined reference to `mpi_wait_' > mpi_heat.o: In function `SSN28': > mpi_heat.o(.text+0x9b0): undefined reference to `mpi_wait_' > mpi_heat.o: In function `SSN29': > mpi_heat.o(.text+0x9e6): undefined reference to `mpi_wait_' > mpi_heat.o: In function `SSN30': > mpi_heat.o(.text+0xa1c): undefined reference to `mpi_wait_' > mpi_heat.o: In function `SSN46': > mpi_heat.o(.text+0x11d2): undefined reference to `mpi_reduce_' > mpi_heat.o: In function `SSN50': > mpi_heat.o(.text+0x12ed): undefined reference to `mpi_finalize_' > mpi_heat.o(.data+0x0): undefined reference to `mpi_finalize_' > mpi_heat.o(.data+0x4): undefined reference to `mpi_reduce_' > mpi_heat.o(.data+0x8): undefined reference to `mpi_wait_' > mpi_heat.o(.data+0xc): undefined reference to `mpi_irecv_' > mpi_heat.o(.data+0x10): undefined reference to `mpi_isend_' > mpi_heat.o(.data+0x14): undefined reference to `mpi_barrier_' > mpi_heat.o(.data+0x18): undefined reference to `mpi_comm_rank_' > mpi_heat.o(.data+0x1c): undefined reference to `mpi_comm_size_' > mpi_heat.o(.data+0x20): undefined reference to `mpi_init_' > mpi_heat.o(.data+0x24): undefined reference to `mpi_wtime_' > mpi_heat.o(.data+0x28): undefined reference to `mpi_wtick_' > mpi_heat.o(.data+0x2c): undefined reference to `mpi_null_copy_fn_' > mpi_heat.o(.data+0x30): undefined reference to `mpi_null_delete_fn_' > mpi_heat.o(.data+0x34): undefined reference to `mpi_dup_fn_' > /usr/bin/../lib/libmpif.so: undefined reference to `f__xargc' The problem is g77 and libraries built to work with g77: g77 has the "unfortunate" (to put it mildly) property to append two underscores to a function name, if the function name already contains an underscore. E.g., if your fortran program calls MPI_Comm_rank g77 calls mpi_comm_rank__ and looks for that reference in the MPI library. Hence a library built to work with g77 contains mpi_comm_rank__ and not mpi_comm_rank_. Sigh. All other compilers I have worked with so far just append a single underscore (e.g., mpi_comm_rank_) regardless of whether the function name already contains an underscore. Solution? As a workaround you could call mpi_comm_rank_ from your program... Which makes your program non portable, etc. Very ugly. Otherwise you need a new library. I wish that all libraries that support fortran would be built by appending a single underscore to function names by default (and thus breaking compatability with g77). 
Only if support for g77 is explicitely required should a wrapper for g77 be included as well. Performance wise that should be irrelevant: g77 is the slowest compiler around anyway so an additional wrapper doesn't matter much. Martin ======================================================================== Martin Siegert Academic Computing Services phone: (604) 291-4691 Simon Fraser University fax: (604) 291-4242 Burnaby, British Columbia email: siegert at sfu.ca Canada V5A 1S6 ======================================================================== From rgb at phy.duke.edu Wed Mar 14 10:38:39 2001 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed, 14 Mar 2001 13:38:39 -0500 (EST) Subject: Dual Athlon Results Test Page Message-ID: Dear Fellow 'Wulfvolken: I'm starting to get results from various benchmarks on the dual Athlon. Since there are a LOT of benchmarks at this point and a lot of detail describing the system, I've created a website for the benchmark run(s). I will fill it in as things complete and I have time. If this system interests you, please check the website out frequently this week and let me know if any results are insane as they appear (so I have time to repair them before I get kicked off -- or have to share:-( this coming weekend). I'm trying to start with ones that I think will be of greatest general interest, but there is a bit of "easy to run quickly" or "useful to me personally";-) mixed in there as well. At the moment I've done stream (probably the most requested benchmark), cpu-rate (not quite finished in dual mode), a benchmark contributed by Thomas Guignol, and have done but not yet recorded my Monte Carlo benchmark. All system configuration data (that I have) is on the website, and it is pretty complete. Anything that I'm missing that you'd like, ask me and I'll try to add it. I expect to post a set of links to quite a bit of the source code I used to run the benchmarks on the site, but this will probably need to wait until after I finish things up. I'm trying to package things neatly as I go (so stream is available in a make/build/install-ready tarball and matching (src or i386) rpm, ditto cpu-rate, ditto guignol,...). The dual athlon test site URI is: http://www.phy.duke.edu/brahma/dual_athlon Thanks, rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu From rgb at phy.duke.edu Wed Mar 14 11:08:26 2001 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed, 14 Mar 2001 14:08:26 -0500 (EST) Subject: Fortran 90 and BeoMPI In-Reply-To: <20010314102110.A22059@stikine.ucs.sfu.ca> Message-ID: On Wed, 14 Mar 2001, Martin Siegert wrote: > The problem is g77 and libraries built to work with g77: > g77 has the "unfortunate" (to put it mildly) property to append two > underscores to a function name, if the function name already contains > an underscore. E.g., if your fortran program calls MPI_Comm_rank g77 > calls mpi_comm_rank__ and looks for that reference in the MPI library. > Hence a library built to work with g77 contains mpi_comm_rank__ and not > mpi_comm_rank_. Sigh. All other compilers I have worked with so far > just append a single underscore (e.g., mpi_comm_rank_) regardless of > whether the function name already contains an underscore. > Solution? As a workaround you could call mpi_comm_rank_ from your program... > Which makes your program non portable, etc. Very ugly. > Otherwise you need a new library. 
I wish that all libraries that support > fortran would be built by appending a single underscore to function > names by default (and thus breaking compatability with g77). Only if > support for g77 is explicitely required should a wrapper for g77 be > included as well. Performance wise that should be irrelevant: g77 is > the slowest compiler around anyway so an additional wrapper doesn't > matter much. Thanks! This explains something that puzzled the hell out of me when I sought (unsuccessfully) to integrate a single timer object module with both the C and Fortran version of stream on the dual athlon tests. my_second() wasn't such a great name, then. I just don't use Fortran (if I can possibly help it, and I nearly always can:-) these days... rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu From RSchilling at affiliatedhealth.org Wed Mar 14 12:26:23 2001 From: RSchilling at affiliatedhealth.org (Schilling, Richard) Date: Wed, 14 Mar 2001 12:26:23 -0800 Subject: pvm, rsh on FreeBSD and environment variables Message-ID: <51FCCCF0C130D211BE550008C724149EBE1116@mail1.affiliatedhealth.org> Using PVM on FreeBSD, and would like to know if anyone is doing the same. Curious to know how you get rsh to pick up PVM_* environment variables on the slave hosts. I know ssh is popular to use, but curious about working with rsh this option, and if there is a way to get rsh to see the PVM_* environment variables at all Richard Schilling Web Integration Programmer/Webmaster phone: 360.856.7129 fax: 360.856.7166 URL: http://www.affiliatedhealth.org Affiliated Health Services Information Systems 1971 Highway 20 Mount Vernon, WA USA From sshealy at asgnet.psc.sc.edu Wed Mar 14 12:24:35 2001 From: sshealy at asgnet.psc.sc.edu (Scott Shealy) Date: Wed, 14 Mar 2001 15:24:35 -0500 Subject: DHCP - Channel Bonding? Message-ID: <5773B442597BD2118B9800105A1901EE1B4DAB@asgnet2> Anyone know if you can use channel bonding with DHCP.... Thanks, Scott Shealy From dcs at iastate.edu Wed Mar 14 12:39:02 2001 From: dcs at iastate.edu (Dan Smith) Date: Wed, 14 Mar 2001 14:39:02 CST Subject: Bproc or BeoMPI Message-ID: <200103142039.OAA27010@pv74b1.vincent.iastate.edu> I wasn't sure where to send this question as it might be a Bproc thing or maybe just an MPI thing, so I apologize to those who will see it twice. I am playing with the MPI_Gather procedure under BeoMPI and I am getting error messages that look like the following: p1_25945: p4_error: interrupt SIGSEGV: 11 p0_25943: p4_error: interrupt SIGSEGV: 11 This is just using two processors. I get as many error messages as processors running the program. The program still does pretty much what it is supposed to do, though. Are these BProc errors or MPI errors and where can I find info on these error codes? A couple of notes: 1) This only occurs when the value being passed to the root process is (1+i)*2, where i is the process number (i = 0..#_of_processors-1). If I use (1+i)*3 or something other than multiplying by 2, then I do not get the errors. 2) Process number 6 also fails to pass the correct value to the root process every time, even when I don't get the error messages. It always passes the number 7. 3) I shut down each slave node one at a time and ran the program each time. I still the error messgaes and the wrong value passed by process 6. This is really baffling. Any help is greatly appreciated. Dan --- Daniel C. 
Smith | Iowa State University Graduate Assistant | Department of Physics and Astronomy dcs at iastate.edu | Ames, IA 50011 From newt at scyld.com Wed Mar 14 13:00:43 2001 From: newt at scyld.com (Daniel Ridge) Date: Wed, 14 Mar 2001 16:00:43 -0500 (EST) Subject: [bproc]Bproc or BeoMPI In-Reply-To: <200103142039.OAA27010@pv74b1.vincent.iastate.edu> Message-ID: > I wasn't sure where to send this question as it might be a Bproc thing or > maybe just an MPI thing, so I apologize to those who will see it twice. > > I am playing with the MPI_Gather procedure under BeoMPI and I am getting > error messages that look like the following: > > p1_25945: p4_error: interrupt SIGSEGV: 11 > p0_25943: p4_error: interrupt SIGSEGV: 11 For these purposes -- BeoMPI is MPICH. I'm willing to spin a new BeoMPI against a later MPICH if there is a problem in MPICH. Regards, Dan Ridge Scyld Computing Corporation From natorro at fenix.ifisicacu.unam.mx Wed Mar 14 13:14:20 2001 From: natorro at fenix.ifisicacu.unam.mx (Carlos Lopez) Date: Wed, 14 Mar 2001 15:14:20 -0600 Subject: Installing Scyld on an Alpha cluster References: <200103142039.OAA27010@pv74b1.vincent.iastate.edu> Message-ID: <3AAFDF2C.B68EE8A3@fenix.ifisicacu.unam.mx> Hi, I'm pretty new to the list, we, at the Physics Institute at UNAM are trying to install Scyld on a Alpa cluster, so I was wondering if anyone have had the experience, if so can you give me some tips before I start trying it. Thanks a lot inadvance. natorro From Dean.Carpenter at pharma.com Wed Mar 14 13:10:12 2001 From: Dean.Carpenter at pharma.com (Carpenter, Dean) Date: Wed, 14 Mar 2001 16:10:12 -0500 Subject: DHCP - Channel Bonding? Message-ID: <759FC8B57540D311B14E00902727A0C002EC4793@a1mbx01.pharma.com> I was wondering the same thing, or rather a similar thing. We're going to be testing some compute nodes that have dual 10/100 NICs onboard. It would be nice to be able to use both in a bonded setup via the standard Scyld beoboot method. I would assume that the stage 1 boot would use just one nic to start up, but the final stage 3 one would enslave the two eth0 and eth1 once they're up ? -- Dean Carpenter deano at areyes.com dean.carpenter at pharma.com dean.carpenter at purduepharma.com 94TT :) -----Original Message----- From: Scott Shealy [mailto:sshealy at asgnet.psc.sc.edu] Sent: Wednesday, March 14, 2001 3:25 PM To: 'beowulf at beowulf.org' Subject: DHCP - Channel Bonding? Anyone know if you can use channel bonding with DHCP.... Thanks, Scott Shealy From Alan.Holbrook at compaq.com Wed Mar 14 13:09:52 2001 From: Alan.Holbrook at compaq.com (Holbrook, Alan) Date: Wed, 14 Mar 2001 16:09:52 -0500 Subject: Installing Scyld on an Alpha cluster Message-ID: Carlos, We have had Scyld running on Alpha Beowulfs we produce out of my division at Compaq. If you'd care to contact me, I'm sure we can help with advice. Regards, Alan Holbrook Compaq Computer Corporation Product Manager, Linux Beowulf Clusters High Performance Interconnects CustomSYSTEMS Division > * Voice: 603.884.2078 > * FAX: 603.884.0622 > * alan.holbrook at compaq.com > > -----Original Message----- From: Carlos Lopez [mailto:natorro at fenix.ifisicacu.unam.mx] Sent: Wednesday, March 14, 2001 4:14 PM Cc: beowulf at beowulf.org Subject: Installing Scyld on an Alpha cluster Hi, I'm pretty new to the list, we, at the Physics Institute at UNAM are trying to install Scyld on a Alpa cluster, so I was wondering if anyone have had the experience, if so can you give me some tips before I start trying it. Thanks a lot inadvance. 
natorro _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mlucas at imagelinks.com Wed Mar 14 15:00:19 2001 From: mlucas at imagelinks.com (Mark Lucas) Date: Wed, 14 Mar 2001 18:00:19 -0500 Subject: Huinalu Linux SuperCluster Message-ID: Just came across this: Huinalu is a 520-processor IBM Netfinity Linux Supercluster. It consists of 260 nodes, each housing two Pentium III 933 megahertz processors. Their combined theoretical peak performance is a staggering 478 billion floating point operations per second (gigaflops). It is, at the present time, the world's most powerful Linux Supercluster. at http://www.mhpcc.edu/doc/huinalu/huinalu-intro.html Does anyone have any specifics on the hardware cost of this system? Is IBM selling configured Beowulf clusters? Thanks in advance. Mark -- ********************** Mark R Lucas Chief Technical Officer ImageLinks Inc. 4450 W Eau Gallie Blvd Suite 164 Melbourne Fl 32934 321 253 0011 (work) 321 253 5559 (fax) mlucas at imagelinks.com ********************** From edwards at icantbelieveimdoingthis.com Wed Mar 14 15:44:29 2001 From: edwards at icantbelieveimdoingthis.com (Art Edwards) Date: Wed, 14 Mar 2001 16:44:29 -0700 Subject: MPI chokes Message-ID: <20010314164429.A32392@icantbelieveimdoingthis.com> I've installed Scyld on a small cluster and I'm trying to run the test programs that come with beompi The codes run on one node. However, when I try to run on multiple nodes I get the following error jarrett/home/edwardsa>mpirun -np 2 pi3p p0_28682: p4_error: net_create_slave: bproc_rfork: -1 p4_error: latest msg from perror: Invalid argument jarrett/home/edwardsa>bm_list_28683: p4_error: interrupt SIGINT: 2 I have asked about this in a previous message, so here are two more specific questions. The master node has a hostname that is not node0. The first slave node is, as far as beosetup, is node0. Is this a problem? When beompi assigns nodes does it look at a machines file? Should I install a HOSTNAME file on each slave? Art Edwards From mettke at lucent.com Wed Mar 14 16:58:28 2001 From: mettke at lucent.com (Mike Mettke) Date: Wed, 14 Mar 2001 19:58:28 -0500 Subject: Intel 420T has been cancelled References: Message-ID: <3AB013B4.114AE6FF@lucent.com> Everybody, the Intel 420T switch (48 ports, 2 GBIC, 20Gbps switch fabric, 20 microseconds latency, $1500) has been cancelled by Intel. Well, the price was too good anyway .... regards, Mike From rbbrigh at valeria.mp.sandia.gov Wed Mar 14 17:50:28 2001 From: rbbrigh at valeria.mp.sandia.gov (Ron Brightwell) Date: Wed, 14 Mar 2001 18:50:28 -0700 (MST) Subject: Huinalu Linux SuperCluster In-Reply-To: from "Mark Lucas" at Mar 14, 2001 06:00:19 PM Message-ID: <200103150151.SAA31516@dogbert.mp.sandia.gov> > > Huinalu is a 520-processor IBM Netfinity Linux Supercluster. It > consists of 260 nodes, each housing two Pentium III 933 megahertz > processors. Their combined theoretical peak performance is a > staggering 478 billion floating point operations per second > (gigaflops). It is, at the present time, the world's most powerful > Linux Supercluster. > > at http://www.mhpcc.edu/doc/huinalu/huinalu-intro.html > Actually, no it's not -- at least not for a cluster intended to support parallel apps. The Siberia Cplant cluster at Sandia that is currently #82 on the top 500 list has a peak theoretical perfomance of 580 GFLOPS. 
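For calibration, a peak figure like that is just processor count times clock rate times peak floating-point results per cycle. Assuming the Alpha EV6 can retire one FP add and one FP multiply per cycle, the 1024-node Antarctica system mentioned below works out to 1024 x 466 MHz x 2 = 954.4 GFLOPS -- an upper bound that no real application reaches.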
It has demonstrated (with the MPLinpack benchmark) 247.6 GFLOPS. The latest Cplant cluster, called Antarctica, has 1024+ 466 MHz Alpha nodes, with a peak theoretical performance of more than 954 GFLOPS. Keep in mind that peak theoretical performance accurately measures your ability to spend money, while MPLinpack performance accurately measures your ability to seek pr -- I mean it measures the upper bound on compute performance from a parallel app. -Ron From joelja at darkwing.uoregon.edu Wed Mar 14 18:49:10 2001 From: joelja at darkwing.uoregon.edu (Joel Jaeggli) Date: Wed, 14 Mar 2001 18:49:10 -0800 (PST) Subject: Dual Athlon In-Reply-To: Message-ID: I had that issue between ssh-2.0.13 and openssh-2.5.x. One thing you might try (which worked for me) was forcing ssh2 compatibility with openssh by adding the -2 flag, to your openssh client string... I haven't done a tcpdump of the negotiation to see what it was doing yet, but I did note that it wasn't doing it with openssh 2.3 versions. joelja On Tue, 13 Mar 2001, David Vos wrote: > I've had combatibility problems between OpenSSH and ssh.com's > implementation. I had two linux boxen that could telnet back and forth, > but could not ssh. I put ssh.com's on both and the problem went away. > > David > > On Tue, 13 Mar 2001, Robert G. Brown wrote: > > > On Tue, 13 Mar 2001, Mofeed Shahin wrote: > > > > > So Robert, when are you going to let us know the results of the Dual Athlon ? > > > :-) > > > > > > Mof. > > > > They got my account setup yesterday, but for some reason I'm having a > > hard time connecting via ssh (it's rejecting my password). We've tried > > both a password they sent me and an MD5 crypt I sent them. Very strange > > -- I use OpenSSH routinely to connect all over the place so I'm > > reasonably sure my client is OK. Anyway, I expect it is something > > trivial and that I'll get in sometime this morning. I spent the time > > yesterday that I couldn't get in profitably anyway packaging stream and > > a benchmark sent to me by Thomas Guignol of the list up into make-ready > > tarball/RPM's. At the moment my list looks something like: > > > > stream > > guignol > > cpu-rate > > lmbench (ass'td) > > LAM/MPI plus two benchmarks (Josip and Doug each suggested one) > > EPCC OpenMP microbenchmarks (probably with PGI) > > possibly some fft timings (Martin Seigert) > > > > in roughly that order, depending on how much time I get and how well > > things go. I'm going to TRY to build a page with all the tests I used > > in tarball/rpm form, results, and commentary. > > > > rgb > > > > -- > > Robert G. Brown http://www.phy.duke.edu/~rgb/ > > Duke University Dept. of Physics, Box 90305 > > Durham, N.C. 
27708-0305 > > Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu > > > > > > > > > > _______________________________________________ > > Beowulf mailing list, Beowulf at beowulf.org > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- -------------------------------------------------------------------------- Joel Jaeggli joelja at darkwing.uoregon.edu Academic User Services consult at gladstone.uoregon.edu PGP Key Fingerprint: 1DE9 8FCA 51FB 4195 B42A 9C32 A30D 121E -------------------------------------------------------------------------- It is clear that the arm of criticism cannot replace the criticism of arms. Karl Marx -- Introduction to the critique of Hegel's Philosophy of the right, 1843. From hendriks at hendriks.cx Wed Mar 14 21:16:06 2001 From: hendriks at hendriks.cx (Erik Arjan Hendriks) Date: Thu, 15 Mar 2001 00:16:06 -0500 Subject: [bproc]MPI chokes In-Reply-To: <20010314164429.A32392@icantbelieveimdoingthis.com>; from edwards@icantbelieveimdoingthis.com on Wed, Mar 14, 2001 at 04:44:29PM -0700 References: <20010314164429.A32392@icantbelieveimdoingthis.com> Message-ID: <20010315001606.A1939@hendriks.cx> On Wed, Mar 14, 2001 at 04:44:29PM -0700, Art Edwards wrote: > I've installed Scyld on a small cluster and I'm trying to > run the test programs that come with beompi > > The codes run on one node. However, when I try to run > on multiple nodes I get the following error > > jarrett/home/edwardsa>mpirun -np 2 pi3p > p0_28682: p4_error: net_create_slave: bproc_rfork: -1 > p4_error: latest msg from perror: Invalid argument > jarrett/home/edwardsa>bm_list_28683: p4_error: interrupt SIGINT: 2 > > I have asked about this in a previous message, so here > are two more specific questions. > > The master node has a hostname that is not node0. The first > slave node is, as far as beosetup, is node0. Is this a problem? In BProc's terms, the nodes are numbered 0 through n-1. The front end is node -1. > When beompi assigns nodes does it look at a machines file? > Should I install a HOSTNAME file on each slave? BProc doesn't use any host names anywhere so nothing involving hostnames will affect whether or an rfork works. There's some other MPI issue going on here. - Erik From bcomisky at pobox.com Wed Mar 14 23:00:02 2001 From: bcomisky at pobox.com (Bill Comisky) Date: Wed, 14 Mar 2001 23:00:02 -0800 (PST) Subject: dual PIII 133MHz FSB motherboards Message-ID: I'm currently looking into dual PIII 133MHz FSB motherboards for a diskless cluster connected by fast ethernet. I've narrowed the field a bit, and am looking for testimonials on any of the motherboards listed below. Any comments on Linux compatibility, stability, and performance will be greatly appreciated. Also, if anyone has rack mounted any of these, what rackmount case height did they fit in? Realize that for those boards without integrated NIC a card must be added (how much height will this add?). All boards listed are for socket 370. The prices listed are pricewatch.com low end. ServerWorks ServerSet III LE Chipset: ------------------------------------- Tyan Tiger LE (S2515): $440 : 2 Intel 82559 NICs, video, angled DIMM slots, IDE raid (not needed) Tyan Thunder LE (S2510NG): $380 : 2 Intel 82559 NICs, video. 
Available with integrated SCSI (S2510U3NG for $500) SuperMicro 370DLE: $280 : 1 Intel 82559 NIC VIA Apollo Pro133A Chipset: --------------------------- Tyan 2507D (Tiger 230): $120 : no NIC or video Tyan 2505 (Tiger 200): $270 : 2 Intel 82559 integrated NICs, integrated video Abit VP6 : $150 : no NIC or video thanks! Bill -- Bill Comisky bcomisky at pobox.com From yoon at bh.kyungpook.ac.kr Thu Mar 15 01:07:37 2001 From: yoon at bh.kyungpook.ac.kr (Yoon Jae Ho) Date: Thu, 15 Mar 2001 18:07:37 +0900 Subject: Will you add my benchmark for your cluster.top500.org database ? Message-ID: <001201c0ad2f$6568a980$5f72f2cb@TEST> I am very happy to see the topcluster site in the top500.org. At those time (Feb 2000), I sent my idea to you. but no response at that time. As you know well, I suggested the so-called "New Suggestion for so called http://www.beowulf-top500.org to the mailing list in the beowulf group Feb 1st 2000. you can refer it, http://www.beowulf.org/pipermail/beowulf/2000-February/008221.html http://www.beowulf.org/pipermail/beowulf/2000-February/008253.html and www.topcluster.org (? which was linked to the www.beowulf.org ) was born. So I submited my 2 PC(486 & 586) Korea Beowulf information which was made in 1998 to www.topcluster.org and my "Korea Beowulf" information was linked in the www.topcluster.org until it moved to your http://clusters.top500.org site. but In your site, I can't find my Korea Beowulf information. Now, My Suggestion is very simple, Will you add my "korea Beowulf" information (2 PC beowulf made in my home) which you omit it to link in your site ? You may think it is very humble, but for me it was very big happiness to think about my first beowulf made in home. Will you relay above message to the below the professors ? Hans Meuer (Mannheim, Germany), Erich Strohmaier (Berkeley, USA), Jack Dongarra (Knoxville, USA), Horst Simon (Berkeley, USA) Thank you very much --------------------------------------------------------------------------------------- Yoon Jae Ho Economist POSCO Research Institute yoon at bh.kyungpook.ac.kr jhyoon at mail.posri.re.kr http://ie.korea.ac.kr/~supercom/ Korea Beowulf Supercomputer Imagination is more important than knowledge. A. Einstein ---------------------------------------------------------------------------------------- -------------- next part -------------- An HTML attachment was scrubbed... URL: From johannes.grohn at sonera.com Thu Mar 15 04:48:05 2001 From: johannes.grohn at sonera.com (Johannes =?ISO-8859-1?Q?Gr=F6hn?=) Date: Thu, 15 Mar 2001 12:48:05 GMT Subject: changing IP with scyld Message-ID: <20010315.12480500@grohnjo1.tkk.tele.fi> Hello, I'm working on a small cluster running Scyld. Recently I changed the IP addresses range of the nodes and also the ip of the master in /etc/beowulf/config. After rebooting, the nodes appear to be Up but Unavailable in BeoStatus. At this point I can run mpi_mandel as root. However, if I make the nodes available with BeoSetup, I can no longer run any mpi programs. I get the following error: [root at grendel /root]# NP=8 mpi_mandel p0_10360: p4_error: net_create_slave: bproc_rfork: -1 p4_error: latest msg from perror: Input/output error bm_list_10363: p4_error: interrupt SIGINT: 2 [root at grendel /root]# Are there other files that I need to update with the new IP information? 
Thanks for your time, Johannes Gröhn -- __________________________________________________________ | Sonera Corporation | Elimäenkatu 15 3krs | | M&M Research | 00051, Sonera, Finland | | Johannes Gröhn | Mobile: +358 40 716 0899 | | johannes.grohn at sonera.com | Office: +358 20 406 2359 | |_____________________________|___________________________| From rauch at inf.ethz.ch Thu Mar 15 05:34:22 2001 From: rauch at inf.ethz.ch (Felix Rauch) Date: Thu, 15 Mar 2001 14:34:22 +0100 (CET) Subject: Mysterious kernel hangs Message-ID: We recently bought a new 16 node cluster with dual 1 GHz PentiumIII nodes, but machines mysteriously freeze :-( The nodes have STL2 boards (Version A28808-301), onboard adaptec SCSI controllers (7899P), onboard intel Fast Ethernet adapters (82557 [Ethernet Pro 100]) and additional Packet Engines Hamachi GNIC-II Gigabit Ethernet cards. We tried kernels 2.2.x, 2.4.1 and now even 2.4.2-ac20, but it seems to be the same problem with all kernels: When we run experiments which use the network intensively, any of the machines will just freeze after a few hours. The frozen machine does not respond to anything and up to now we were not able to see any log-entries related to the freeze on virtual console 10 :-( We switched now on all the "Kernel Hacking" stuff in the kernel configuration (especially the logging) and we will try again, hopefuly we will at least see some log outputs. The freezes do also happen if we let non-network-intensive jobs run on the machines (e.g. SETI at home), but clearly they happen less often. Does anyone of you have any ideas what could go wrong or what we could try to find the cause of the problems? Regards, Felix -- Felix Rauch | Email: rauch at inf.ethz.ch Institute for Computer Systems | Homepage: http://www.cs.inf.ethz.ch/~rauch/ ETH Zentrum / RZ H18 | Phone: ++41 1 632 7489 CH - 8092 Zuerich / Switzerland | Fax: ++41 1 632 1307 From newt at scyld.com Thu Mar 15 05:42:49 2001 From: newt at scyld.com (Daniel Ridge) Date: Thu, 15 Mar 2001 08:42:49 -0500 (EST) Subject: changing IP with scyld In-Reply-To: <20010315.12480500@grohnjo1.tkk.tele.fi> Message-ID: On Thu, 15 Mar 2001, Johannes Gröhn wrote: > Hello, > I'm working on a small cluster running Scyld. Recently I changed the IP > addresses range of the nodes and also the ip of the master in > /etc/beowulf/config. After rebooting, the nodes appear to be Up but > Unavailable in BeoStatus. At this point I can run mpi_mandel as root. > However, if I make the nodes available with BeoSetup, I can no longer run > any mpi programs. What version of Scyld are you using? In the preview release, nodes that had a boot error would show up as 'unavailable'. In the current release, nodes that had a boot error would show up as 'error'. There is a per-node boot log that often contains the answer to 'why' questions. These files live in /var/log/beowulf/node. and can also be viewed directly through 'beosetup' if you are using the 27BZ-7 or later version of Scyld Beowulf. Nodes are always available through BProc during the unavailable and error phases -- we use the BProc mechanism for configuring a node at boot time. However, the slave filesystem may be in a transitioning state that prevents applications from running except as root. Regards, Dan Ridge Scyld Computing Corporation From rgb at phy.duke.edu Thu Mar 15 06:39:17 2001 From: rgb at phy.duke.edu (Robert G.
Brown) Date: Thu, 15 Mar 2001 09:39:17 -0500 (EST) Subject: Mysterious kernel hangs In-Reply-To: Message-ID: On Thu, 15 Mar 2001, Felix Rauch wrote: > We recently bought a new 16 node cluster with dual 1 GHz PentiumIII > nodes, but machines mysteriously freeze :-( > > The nodes have STL2 boards (Version A28808-301), onboard adaptec SCSI > controllers (7899P), onboard intel Fast Ethernet adapters (82557 > [Ethernet Pro 100]) and additional Packet Engines Hamachi GNIC-II > Gigabit Ethernet cards. > > We tried kernels 2.2.x, 2.4.1 and now even 2.4.2-ac20, but it seems to > be the same problem with all kernels: When we run experiments which > use the network intensively, any of the machines will just freeze > after a few hours. The frozen machine does not respond to anything and > up to now we were not able to see any log-entries related to the > freeze on virtual console 10 :-( We switched now on all the "Kernel > Hacking" stuff in the kernel configuration (especially the logging) > and we will try again, hopefuly we will at least see some log outputs. > > The freezes do also happen if we let non-network-intensive jobs run on > the machines (e.g. SETI at home), but clearly they happen less often. > > Does anyone of you have any ideas what could go wrong or what we could > try to find the cause of the problems?. Dear Felix, If this is happening on all 16 nodes, it sounds very, very much like a kernel deadlock, although problems with the specific motherboard/chipset cannot be ruled out. Can't help you with the motherboard if that turns out to be a problem, but you might check with the kernel list to see if there are known problems. To debug the possibility of a bad device driver or SMP deadlock, try the following: a) Boot half the boxen with UP kernels. See if the freezes still occur on the UP boxen. If they don't, you almost certainly have a deadlock problem within drivers in the SMP kernel(S). Join (at least temporarily) the linux SMP kernel list and seek help there and on the relevant driver list(s). b) Since the problem occurs across kernel revisions (and since the kernels are generally SMP-stable) it is almost certainly in a driver, whether or not the UP-kernel systems lock up. If it isn't in the motherboard. c) Of the devices you are running, I'd suspect the onboard adaptec or the Gigabit card; although Don can probably offer a more informed opinion on the eepro I have the general impression that it is pretty stable (it certainly works fine for us in many boxes). Legend has it that the aic7xxx driver is in a state of upheaval currently as the entire scsi stack is being rebuilt and fixed in the 2.4.x series -- I cannot even get the aic7xxx module to load, for example, on a dual PIII that I've been trying to install with RH 7.1beta/wolverine. I don't know if this would affect the 2.2.x kernels, though. However, I don't know for sure what devices the aic7xxx supports well these days because I finally exited the aic7xxx list because current UDMA controllers and big, fast drives are obsoleting SCSI for all but the most demanding server applications -- all I have are a few legacy Adaptec controllers to support and (knock on wood) they work fine in the 2.2.16 kernels. I do remember that Doug Ledford added some very handy debugging features to the driver module to help debug serious (and very similar) problems I encountered with e.g. the onboard 7890 in our Dell Poweredge 2300's two years ago when the device was first released -- turn these on and see if they help at all. 
Can't help you at all with the Packet Engines driver. d) To identify and repair the "problem child", all I can suggest is the usual trick of removing components one at a time until the systems (hopefully) magically stabilize. Then either replace the component (which may be the cheapest solution even if you have to throw the bad components away - time is expensive and replacement is fast and easy) or (more responsibly) join the relevant device list or kernel list and communicate with the device/kernel maintainer(s). Remember that IDE drives are cheap, fast, and work just fine for most local disk needs on a node, so just disabling all your adaptec controllers (if that turns out to be the problem) and putting IDE drives in would cost you maybe $1.5-2K but could save you days or even weeks of systems programming effort screwing around with the onboard controllers. You can always be a good citizen with one box as a holdout and help Doug Ledford fix the driver while using all the rest. In fact in the short run you could likely/maybe run diskless with 15 nodes (if that stabilized the systems) and help work on the driver. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu From edwards at icantbelieveimdoingthis.com Thu Mar 15 07:46:55 2001 From: edwards at icantbelieveimdoingthis.com (Arthur H. Edwards,1,505-853-6042,505-256-0834) Date: Thu, 15 Mar 2001 08:46:55 -0700 Subject: [bproc]MPI chokes References: <20010314164429.A32392@icantbelieveimdoingthis.com> <20010315001606.A1939@hendriks.cx> Message-ID: <3AB0E3EF.6010601@icantbelieveimdoingthis.com> Erik Arjan Hendriks wrote: > On Wed, Mar 14, 2001 at 04:44:29PM -0700, Art Edwards wrote: > >> I've installed Scyld on a small cluster and I'm trying to >> run the test programs that come with beompi >> >> The codes run on one node. However, when I try to run >> on multiple nodes I get the following error >> >> jarrett/home/edwardsa>mpirun -np 2 pi3p >> p0_28682: p4_error: net_create_slave: bproc_rfork: -1 >> p4_error: latest msg from perror: Invalid argument >> jarrett/home/edwardsa>bm_list_28683: p4_error: interrupt SIGINT: 2 >> >> I have asked about this in a previous message, so here >> are two more specific questions. >> >> The master node has a hostname that is not node0. The first >> slave node is, as far as beosetup, is node0. Is this a problem? > > In BProc's terms, the nodes are numbered 0 through n-1. The front end > is node -1. > > >> When beompi assigns nodes does it look at a machines file? >> Should I install a HOSTNAME file on each slave? > > BProc doesn't use any host names anywhere so nothing involving > hostnames will affect whether or an rfork works. > > There's some other MPI issue going on here. > > - Erik > > > Thanks for the reply. The program dies in the PMPI_INIT phase. What should I be doing to figure this out? Art Edwards From agrajag at linuxpower.org Thu Mar 15 07:44:48 2001 From: agrajag at linuxpower.org (Jag) Date: Thu, 15 Mar 2001 07:44:48 -0800 Subject: [bproc]MPI chokes In-Reply-To: <3AB0E3EF.6010601@icantbelieveimdoingthis.com>; from edwards@icantbelieveimdoingthis.com on Thu, Mar 15, 2001 at 08:46:55AM -0700 References: <20010314164429.A32392@icantbelieveimdoingthis.com> <20010315001606.A1939@hendriks.cx> <3AB0E3EF.6010601@icantbelieveimdoingthis.com> Message-ID: <20010315074448.W7935@kotako.analogself.com> On Thu, 15 Mar 2001, Arthur H. 
Edwards,1,505-853-6042,505-256-0834 wrote: > Erik Arjan Hendriks wrote: > > > On Wed, Mar 14, 2001 at 04:44:29PM -0700, Art Edwards wrote: > > > >> I've installed Scyld on a small cluster and I'm trying to > >> run the test programs that come with beompi > >> > >> The codes run on one node. However, when I try to run > >> on multiple nodes I get the following error > >> > >> jarrett/home/edwardsa>mpirun -np 2 pi3p > >> p0_28682: p4_error: net_create_slave: bproc_rfork: -1 > >> p4_error: latest msg from perror: Invalid argument > >> jarrett/home/edwardsa>bm_list_28683: p4_error: interrupt SIGINT: 2 > >> > > > > BProc doesn't use any host names anywhere so nothing involving > > hostnames will affect whether or an rfork works. > > > > There's some other MPI issue going on here. > > > > - Erik > > > > Thanks for the reply. The program dies in the PMPI_INIT phase. What > should I be doing to figure this out? Based on the error messages from your previous message, it looks like it is trying to rfork to a node that is down. What does the output of 'bpstat' on your cluster look like? Jag -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 232 bytes Desc: not available URL: From edwards at icantbelieveimdoingthis.com Thu Mar 15 08:15:48 2001 From: edwards at icantbelieveimdoingthis.com (Arthur H. Edwards,1,505-853-6042,505-256-0834) Date: Thu, 15 Mar 2001 09:15:48 -0700 Subject: [bproc]MPI chokes References: <20010314164429.A32392@icantbelieveimdoingthis.com> <20010315001606.A1939@hendriks.cx> <3AB0E3EF.6010601@icantbelieveimdoingthis.com> <20010315074448.W7935@kotako.analogself.com> Message-ID: <3AB0EAB4.8080502@icantbelieveimdoingthis.com> Jag wrote: > On Thu, 15 Mar 2001, Arthur H. Edwards,1,505-853-6042,505-256-0834 wrote: > > >> Erik Arjan Hendriks wrote: >> >> >>> On Wed, Mar 14, 2001 at 04:44:29PM -0700, Art Edwards wrote: >>> >>> >>>> I've installed Scyld on a small cluster and I'm trying to >>>> run the test programs that come with beompi >>>> >>>> The codes run on one node. However, when I try to run >>>> on multiple nodes I get the following error >>>> >>>> jarrett/home/edwardsa>mpirun -np 2 pi3p >>>> p0_28682: p4_error: net_create_slave: bproc_rfork: -1 >>>> p4_error: latest msg from perror: Invalid argument >>>> jarrett/home/edwardsa>bm_list_28683: p4_error: interrupt SIGINT: 2 >>>> >>> > > >>> BProc doesn't use any host names anywhere so nothing involving >>> hostnames will affect whether or an rfork works. >>> >>> There's some other MPI issue going on here. >>> >>> - Erik >>> >> >> Thanks for the reply. The program dies in the PMPI_INIT phase. What >> should I be doing to figure this out? > > Based on the error messages from your previous message, it looks like it > is trying to rfork to a node that is down. What does the output of > 'bpstat' on your cluster look like? 
> > > Jag Here is the output from bpstat jarrett/home/edwardsa>bpstat Node Address Status 0 192.168.1.100 up 1 192.168.1.101 up 2 192.168.1.102 up 3 192.168.1.103 up 4 192.168.1.104 up 5 192.168.1.105 up 6 192.168.1.106 up 7 192.168.1.107 down 8 192.168.1.108 down 9 192.168.1.109 down 10 192.168.1.110 down 11 192.168.1.111 down 12 192.168.1.112 down 13 192.168.1.113 down 14 192.168.1.114 down 15 192.168.1.115 down 16 192.168.1.116 down 17 192.168.1.117 down 18 192.168.1.118 down 19 192.168.1.119 down 20 192.168.1.120 down 21 192.168.1.121 down 22 192.168.1.122 down 23 192.168.1.123 down 24 192.168.1.124 down 25 192.168.1.125 down 26 192.168.1.126 down 27 192.168.1.127 down 28 192.168.1.128 down 29 192.168.1.129 down 30 192.168.1.130 down 31 192.168.1.131 down Art Edwards From agrajag at linuxpower.org Thu Mar 15 08:10:41 2001 From: agrajag at linuxpower.org (Jag) Date: Thu, 15 Mar 2001 08:10:41 -0800 Subject: [bproc]MPI chokes In-Reply-To: <3AB0EAB4.8080502@icantbelieveimdoingthis.com>; from edwards@icantbelieveimdoingthis.com on Thu, Mar 15, 2001 at 09:15:48AM -0700 References: <20010314164429.A32392@icantbelieveimdoingthis.com> <20010315001606.A1939@hendriks.cx> <3AB0E3EF.6010601@icantbelieveimdoingthis.com> <20010315074448.W7935@kotako.analogself.com> <3AB0EAB4.8080502@icantbelieveimdoingthis.com> Message-ID: <20010315081041.X7935@kotako.analogself.com> On Thu, 15 Mar 2001, Arthur H. Edwards,1,505-853-6042,505-256-0834 wrote: > > Based on the error messages from your previous message, it looks like it > > is trying to rfork to a node that is down. What does the output of > > 'bpstat' on your cluster look like? > > > > > > Jag > > Here is the output from bpstat > > jarrett/home/edwardsa>bpstat > Node Address Status > 0 192.168.1.100 up > 1 192.168.1.101 up > 2 192.168.1.102 up > 3 192.168.1.103 up > 4 192.168.1.104 up > 5 192.168.1.105 up > 6 192.168.1.106 up > 7 192.168.1.107 down > 8 192.168.1.108 down > 9 192.168.1.109 down Ok.. You seem to be running Scyld's PREVIEW release (27BZ-6). At the end of January, Scyld had an actual release (27BZ-7). The 27BZ-7 release included updated software, including updates for the beompi, which is Scyld's MPI package. I never tried to run MPI programs on the preview release, but my guess is that it is getting confused by all the "down" nodes. I've played with MPI on the 27BZ-7 release and have had no problems when there were down nodes. So, I would recommend to you that you upgrade to the latest release. Also, the reason you have so many "down" nodes is that you gave it a large IP range to use for slave nodes. If you want there to be not as many "down" nodes (that are really nodes that just don't exist), you should use the beosetup program, click on preferences, and adjust the IP range so that there are as many IPs as there are slave nodes. Hope this helps, Jag -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 232 bytes Desc: not available URL: From rbbrigh at valeria.mp.sandia.gov Thu Mar 15 11:12:05 2001 From: rbbrigh at valeria.mp.sandia.gov (Ron Brightwell) Date: Thu, 15 Mar 2001 12:12:05 -0700 (MST) Subject: Huinalu Linux SuperCluster In-Reply-To: <3AB03330.BBED66A2@myri.com> from "Patrick Geoffray" at Mar 14, 2001 10:12:48 PM Message-ID: <200103151913.MAA32264@dogbert.mp.sandia.gov> > > > Actually, no it's not -- at least not for a cluster intended to support > > parallel apps. 
The Siberia Cplant cluster at Sandia that is currently > > #82 on the top 500 list has a peak theoretical perfomance of 580 GFLOPS. > > It has demonstrated (with the MPLinpack benchmark) 247.6 GFLOPS. The latest > > Cplant cluster, called Antarctica, has 1024+ 466 MHz Alpha nodes, with a > > peak theoretical performance of more than 954 GFLOPS. > > The last NCSA Linux cluster (Urbana-Champaign, IL) provides 512 > dual PIII 1GHz, so a theoritical peak of 1 TFLOPS : > http://access.ncsa.uiuc.edu/Headlines/01Headlines/010116.IBM.html I didn't think that machine had been deployed yet, since the above press release says it will be installed in the Summer. I restricted the Antarctica number to what we currently have up and running as a parallel machine. There are another 400+ 466 MHz Alphas sitting next those 1024 nodes that will be integrated in the next few weeks. And thoeretical peak performance of a theoretical machine accurately measures your ability to do math... > > > Keep in mind that peak theoretical performance accurately measures your ability > > to spend money, while MPLinpack performance accurately measures your ability > > to seek pr -- I mean it measures the upper bound on compute performance from > > a parallel app. > > Very true (actually, it measures the upper bound on compute > performance of a dense linear algebra double precision > computation, which indeed covers a large set of // apps. There is > a lot of other codes that do not behave like LU, specially for the > ratio computation/communication). Yes. This was an attempt at humor rather than an exact characterization. (Do // apps scale worse than || apps? :) > > I don't know MPLinpack. Don't you mean HPLinpack ? Sorry, this may be Sandia terminology -- massively parallel linpack. I was speaking of the benchmark and not the acutal code. The measuements I quoted were using a Sandia-developed version of the solver, but we have been using the HPLinpack code from UTK since it was released. -Ron From hahn at coffee.psychology.mcmaster.ca Thu Mar 15 13:35:25 2001 From: hahn at coffee.psychology.mcmaster.ca (Mark Hahn) Date: Thu, 15 Mar 2001 16:35:25 -0500 (EST) Subject: Mysterious kernel hangs In-Reply-To: Message-ID: > If this is happening on all 16 nodes, it sounds very, very much like a > kernel deadlock, although problems with the specific motherboard/chipset > cannot be ruled out. Can't help you with the motherboard if that turns I see essentially zero reason to blame the kernel, especially since: 1. it's new hardware, of unknown quality and config. 2. he's tried a HUGE variety of kernels (2.2 shares very little with 2.4.2-ac20!) 3. he's demonstrated it under purely compute loads. I'd be looking at whether the power is clean, bios/jumper settings, reducing the number of cards, lm-sensors, checking s-specs, etc. From rbbrigh at valeria.mp.sandia.gov Thu Mar 15 15:38:35 2001 From: rbbrigh at valeria.mp.sandia.gov (Ron Brightwell) Date: Thu, 15 Mar 2001 16:38:35 -0700 (MST) Subject: Huinalu Linux SuperCluster In-Reply-To: <3AB118CE.6C23F8BE@myri.com> from "Patrick Geoffray" at Mar 15, 2001 02:32:30 PM Message-ID: <200103152339.QAA00356@dogbert.mp.sandia.gov> > > > number to what we currently have up and running as a parallel machine. > > There are another 400+ 466 MHz Alphas sitting next those 1024 nodes that > > will be integrated in the next few weeks. > > My dream... > How do you do to get all of these toys at Sandia ? (blackmail some > politicians ?) 
> If you figure out that you have too many machines, a lot of people would > be very happy to help you :-) Actually, the Cplant system software was designed from the beginning to support a cluster on the order of 10,000 nodes. The fact that we had fewer was just a limitation of the budget. Our need for help is independent of the number of machines, but comes from the desire to have a more robust environment and more advanced features. The large number of machines should be an enticement for working with/for us, but it isn't the primary reason we need help. (This probably isn't the right forum for recruiting, but send your resumes to jobs at cs.sandia.gov if you would like to join us.) > > How many nodes in Cplant these days (total) ? > The total is hard to get at without a breakdown of the different production and development clusters: SNL/NM ------ Alaska 272 500 MHz EV56 Barrow 96 500 MHz EV56 Siberia 592 500 MHz EV6 Antarctica SON 84 80 466 MHz EV6 + 4 500 MHz EV6 SRN 24 500 MHz EV6 Middle 1536 466 MHz EV6 Iceberg 32 500 MHz EV56 Icberg2 16 500 MHz EV6 ---- 2652 SNL/CA ------ Asilomar 128 433 MHz EV56 Carmel 128 500 MHz EV6 Diablo 256 466 MHz EV6 ? 32 466 MHz EV6 --- 544 ---- 3196 Antarctica is designed to be switchable like the ASCI/Red machine, so it has a large middle section that can move between open, unclassified, and (currently missing) classified "heads". -Ron From rajkumar at csse.monash.edu.au Fri Mar 16 01:21:44 2001 From: rajkumar at csse.monash.edu.au (Rajkumar Buyya) Date: Fri, 16 Mar 2001 20:21:44 +1100 Subject: Fwd: Clusters@TOP500 Debuts -- TOP500 Team Is Publishing a New ListAboutHigh-Performance Clusters Message-ID: <3AB1DB28.48A2C5ED@csse.monash.edu.au> Dear All, I am forwarding the news release that was posted yesterday as I did not see this news on Beowulf list. A new TopClusters (100 to start with) initiative which is a result of discussion that happened between TFCC community with TOP500 Team (Jack Dongarra). There was lot of discussion on TFCC mailing list during Feb. 2000 about this: http://www.listproc.bucknell.edu/archives/tfcc-l/200002/ There was discussion on Beowulf list on this subject as well. The discussion had very much focused on the creation of listing of all major sites that run clusters and the choice of appropriate benchmarks to be used for measuring the performance of various parameters including numeric, I/O, database, web, TPC, simulation, application level performance. Our earlier sites: http://www.TopClusters.org and http://www.Top500Clusters.org now points to this new site. I just noticed Press is already and reported results from this TopClusters project: in one of Australian newspapers as Australia ranked in cluster supercomputing heavyweights http://it.mycareer.com.au/breaking/20010316/A29826-2001Mar16.html?itnewsletter Please submit your cluster details and information for this resource. Thanks. Raj -------- Original Message -------- Subject: Clusters at TOP500 Debuts -- TOP500 Team Is Publishing a New List AboutHigh-Performance Clusters Date: Wed, 14 Mar 2001 22:43:04 -0600 (CST) From: Anas Nashif To: top500-info at top500.org Precedence: bulk PRESS RELEASE Contact: Hans W. Meuer, Erich Strohmaier, Jack J Dongarra and Horst D. 
Simon at clusters at top500.org ============================================================== Clusters at TOP500 Debuts -- TOP500 Team Is Publishing a New List About High-Performance Clusters Reflecting the strong emerging trend of cluster computing in high-performance computing (HPC), the team which has compiled the TOP500 list of global supercomputing sites has developed a similar list to rank the world's top 100 cluster computing systems. A variety of concepts and technologies are used to build these clusters and they are used for quite different applications. "It is quite possible that by the middle of this decade clusters in their myriad forms will be the dominant high-end computing architecture," said Thomas Sterling of the California Institute of Technology and the NASA Jet Propulsion Laboratory in his editorial for the start of this new project. Currently there is no publicly available basis which would allow the compilation of statistics about different technologies and the application areas of cluster computing. To provide a basis for these statistics about cluster computing, the TOP500 team therefore decided to assemble a separate list of high-performance computing clusters called "Clusters @ TOP500." "This is similar to the situation in the general HPC market a decade ago before we started the TOP500 project," said Hans W. Meuer, Professor at the University of Mannheim, Germany, who began the work that led to the TOP500 Supercomputer lists. "Unfortunately, the coverage of cluster computing by the TOP500 is not sufficient to produce specialized statistics about this increasingly important HPC segment. This is mainly due to the scarcity of results of the Linpack benchmark on such systems." The selection of an appropriate cluster specific benchmark for such a ranking is critical and the collection of results for it time consuming, according to Erich Strohmaier, a benchmarking expert at the U.S Department of Energy's National Energy Research Scientific Computing Center (NERSC) and a member of the TOP500 team since it began. To promote the development of this new list the TOP500 team therefore decided to start the collection of data about high-performance clusters and rank them initially by peak-performance only. At the same time TOP500 team is discussing with the Institute of Electrical and Electronics Engineers' Task Force on Cluster Computing (IEEE TFCC) the proper choice of a benchmark for ranking cluster. "This benchmark will be used to rank the new cluster list once a sufficient number of results are available," said Jack Dongarra of the University of Tennessee and a member of the TOP500 Team. "In the meantime the HPC cluster community will already benefit from the available information about prevailing cluster technologies and applications." The collection of information has already started and will continue on an ongoing basis. More background information, access to all collected data, and interfaces for submitting information about new cluster systems can be found at http://clusters.top500.org/ About the TOP500 Supercomputer Sites The TOP500 project was started in 1993 to provide a reliable basis for tracking and detecting trends in high-performance computing. Twice a year, a list of the sites operating the 500 most powerful computer systems is assembled and released. The best performance on the Linpack benchmark is used as performance measure for ranking the computer systems. 
The list contains a variety of information including the system specifications and its major application areas. Analyzing these data in the past has revealed major trends in HPC architectures, technologies, and applications together with the changes in market shares of companies and geographical distribution of consumers and producers of HPC systems. All information about the TOP500 and its results can be accessed at http://www.top500.org/ About the IEEE TFCC The TFCC is an international forum, which promotes cluster computing research and education. It participates in helping to set up and promote technical standards in this area. The Task Force is concerned with issues related to the design, analysis, development and implementation of cluster-based systems. Of particular interest are: cluster hardware technologies, distributed environments, application tools and utilities, as well as the development and optimization of cluster-based applications. Additional information my be found at http://www.ieeetfcc.org/ # # # From jfduff at mtu.edu Fri Mar 16 06:21:48 2001 From: jfduff at mtu.edu (John Duff) Date: Fri, 16 Mar 2001 09:21:48 -0500 (EST) Subject: using graphics cards as generic FLOP crunchers Message-ID: <200103161421.JAA13685@tamarack.cs.mtu.edu> Hello, There are groups at Stanford (WireGL) and Princeton who have done work on parallel graphics on PC clusters. They put a high-end PC graphics card (such as an NVidia card) in each slave node of a cluster, and then parallelize the rendering of 3D scenes across the cluster, taking advantage of the video hardware acceleration, and then combine the image either on a big tiled projecter or on a single computer's monitor. This is all well and good, but it struck me that when other groups at these universities who have no interest in graphics use the same cluster, all that computing horsepower in the GPUs on the graphics cards just sits idle. Would it be possible to write some sort of thin wrapper API over OpenGL that heavy-duty number-crunching parallel apps could use to offload some of the FLOPs from the main cpu(s) on each slave node to the gpu(s) on the graphics card? It would seem pretty obvious that the main cpu(s) would always be faster for generic FLOP computations, so I would think only specific apps might benefit from the extra cycles of the gpu(s). Of course, the synchronization issues might be too much of a pain to deal with in the end as well. Has anyone heard of someone trying this, or know of any showstopper issues? Thanks, John From award at andorra.ad Thu Mar 15 06:35:35 2001 From: award at andorra.ad (Alan Ward) Date: Wed, 15 Mar 2001 15:35:35 +0100 Subject: Mysterious kernel hangs References: Message-ID: <38CF9FB7.209F40C6@andorra.ad> It may seem simplistic, but have you any reason to think your machines aren't simply overheating? There can be a lot of Joules going 'round in a dual box. Try them out at say, 800 MHz, see if there's a difference. Idem with the case open. Best regards, Alan Ward Felix Rauch ha escrit: > > We recently bought a new 16 node cluster with dual 1 GHz PentiumIII > nodes, but machines mysteriously freeze :-( > > The nodes have STL2 boards (Version A28808-301), onboard adaptec SCSI > controllers (7899P), onboard intel Fast Ethernet adapters (82557 > [Ethernet Pro 100]) and additional Packet Engines Hamachi GNIC-II > Gigabit Ethernet cards. 
> > We tried kernels 2.2.x, 2.4.1 and now even 2.4.2-ac20, but it seems to > be the same problem with all kernels: When we run experiments which > use the network intensively, any of the machines will just freeze > after a few hours. The frozen machine does not respond to anything and > up to now we were not able to see any log-entries related to the > freeze on virtual console 10 :-( We switched now on all the "Kernel > Hacking" stuff in the kernel configuration (especially the logging) > and we will try again, hopefuly we will at least see some log outputs. > > The freezes do also happen if we let non-network-intensive jobs run on > the machines (e.g. SETI at home), but clearly they happen less often. > > Does anyone of you have any ideas what could go wrong or what we could > try to find the cause of the problems? > > Regards, > Felix > -- > Felix Rauch | Email: rauch at inf.ethz.ch > Institute for Computer Systems | Homepage: http://www.cs.inf.ethz.ch/~rauch/ > ETH Zentrum / RZ H18 | Phone: ++41 1 632 7489 > CH - 8092 Zuerich / Switzerland | Fax: ++41 1 632 1307 > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lindahl at conservativecomputer.com Fri Mar 16 09:23:20 2001 From: lindahl at conservativecomputer.com (Greg Lindahl) Date: Fri, 16 Mar 2001 12:23:20 -0500 Subject: Huinalu Linux SuperCluster In-Reply-To: ; from mlucas@imagelinks.com on Wed, Mar 14, 2001 at 06:00:19PM -0500 References: Message-ID: <20010316122320.A5524@wumpus> On Wed, Mar 14, 2001 at 06:00:19PM -0500, Mark Lucas wrote: > Huinalu is a 520-processor IBM Netfinity Linux Supercluster. It > consists of 260 nodes, each housing two Pentium III 933 megahertz > processors. Their combined theoretical peak performance is a > staggering 478 billion floating point operations per second > (gigaflops). It is, at the present time, the world's most powerful > Linux Supercluster. Not only is CPlant already faster, but in a few more weeks the FSL AlphaLinux cluster will be expanded to ~600 Alphas, which will give it a theoretical peak of 835 GFlops. And I hear that the 1 TFlop NCSA cluster is actually installed and running. It can be dangerous to claim that anything is the fastest ;-) -- greg From lindahl at conservativecomputer.com Fri Mar 16 09:26:29 2001 From: lindahl at conservativecomputer.com (Greg Lindahl) Date: Fri, 16 Mar 2001 12:26:29 -0500 Subject: Fortran 90 and BeoMPI In-Reply-To: ; from bogdan.costescu@iwr.uni-heidelberg.de on Wed, Mar 14, 2001 at 05:12:38PM +0100 References: <20010314072825.T7935@kotako.analogself.com> Message-ID: <20010316122629.B5524@wumpus> On Wed, Mar 14, 2001 at 05:12:38PM +0100, Bogdan Costescu wrote: > There is another problem: when you have several compilers installed on the > same system and different MPI libraries compiled for each of them. The > SCore system, for example, provides an option (e.g. mpif77 -fc pgi) in > order to choose which combination of compiler, flags and include/libraries > to use. The same would probably apply to MPI on top of several transport > libraries, each with their own include/libs. The easiest way to work around grungy, ugly build systems is to create some name like "mpif77" (or "xlf" ;-) and write a script which does the right thing under the hood -- add extra arguments, delete arguments that aren't right, link the local mpi. 
I've done this a bunch of times; it's not that hard to take something that is quite complex and has never been ported to Linux and compile it that way. -- g From dvos12 at calvin.edu Fri Mar 16 09:55:33 2001 From: dvos12 at calvin.edu (David Vos) Date: Fri, 16 Mar 2001 12:55:33 -0500 (EST) Subject: Scyld Message-ID: I just did a fresh install of Sycld 27BZ-7 (I think it is the newest copy) using the default Gnome installation, but setting my own partitions. After booting, it dumped out a bunch of error messages such as: Finding module dependencies depmod: error in loading shared libraries: libbproc.so.1: cannot open shared object file: No such file or directory And: modprobe: Can't open dependencies file /lib/modules/2.2.17-33/modules.dep (no such file or directory) So I recompiled the kernel for the second error, but modules.dep was never created. (I did the full make clean, make menuconfig, make dep, make bzImage, make modules, make modules_install). The error never went away. For the second, I found the file in /usr/lib, so I added /usr/lib to /etc/ld.so.conf and ran ldconfig. Still didn't help. I don't know what is up. Is this version of Scyld supposed to work differently? If so, it may be the scratch the CD-ROM came with. David From fmuldoo at alpha2.eng.lsu.edu Sat Mar 17 11:13:13 2001 From: fmuldoo at alpha2.eng.lsu.edu (Frank Muldoon) Date: Sat, 17 Mar 2001 13:13:13 -0600 Subject: problems compiling mpich-1.2.1 on Linux PC Message-ID: <3AB3B748.1E1AFB93@me.lsu.edu> I am having problems compiling mpich-1.2.1 on a PC running RedHat 7.0 using gcc. I have tried a number of configure options and have always gotten the same error when I try to compile mpich. I have no problem compiling it on an Alpha running unix. Thanks, Frank gcc -DHAVE_SLOGCONF_H -DMPI_LINUX -c slog_irec_write.c -I.. -I/usr4/programs/mpi_NAGf95/mpich-1.2.1/mpe/slog_api/include slog_irec_write.c: In function `SLOG_Irec_SetMinRec': slog_irec_write.c:1171: `SLOG_nodeID_t' is promoted to `int' when passed through `...' slog_irec_write.c:1171: (so you should pass `int' not `SLOG_nodeID_t' to `va_arg') slog_irec_write.c:1172: `SLOG_cpuID_t' is promoted to `int' when passed through `...' make[7]: *** [slog_irec_write.o] Error 1 make[6]: *** [sloglib] Error 2 make[5]: *** [libslog.a] Error 2 make[4]: *** [default] Error 2 make[3]: *** [build_libs_progs] Error 2 make[2]: *** [mpelib] Error 1 make[1]: *** [mpi-addons] Error 2 make: *** [mpi] Error 2 lab2220b.me.lsu.edu> -- Frank Muldoon Computational Fluid Dynamics Research Group Louisiana State University Baton Rouge, LA 70803 225-344-7676 (h) 225-388-5217 (w) From hack at nt-nv.com Fri Mar 16 11:35:52 2001 From: hack at nt-nv.com (hack at nt-nv.com) Date: Fri, 16 Mar 2001 15:35:52 -0400 Subject: Scyld In-Reply-To: References: Message-ID: <01031615375201.01026@portal1.secure-ezone.com> The libbproc.so.1 library as you stated is in /usr/lib. Create a link from /lib to /usr/lib and try again. It worked for me. ln -s /usr/lib/libbproc.so.1 /lib/libbproc.so.1 Brian. On Fri, 16 Mar 2001, you wrote: > I just did a fresh install of Sycld 27BZ-7 (I think it is the newest copy) > using the default Gnome installation, but setting my own partitions. 
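For what it's worth, a quick way to see what the runtime loader is actually finding before and after that symlink (the depmod path is a guess at where the bproc-linked modutils lives; adjust to taste):

    ldconfig -p | grep bproc        # is libbproc in the loader cache at all?
    ldd /sbin/depmod | grep bproc   # does depmod still resolve it as "not found"?
    # If so, the link into /lib plus a cache rebuild is the simplest cure:
    ln -s /usr/lib/libbproc.so.1 /lib/libbproc.so.1
    ldconfig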
> After booting, it dumped out a bunch of error messages such as: > > Finding module dependencies depmod: error in loading shared libraries: > libbproc.so.1: cannot open shared object file: No such file or directory > > And: > modprobe: Can't open dependencies file /lib/modules/2.2.17-33/modules.dep > (no such file or directory) > > So I recompiled the kernel for the second error, but modules.dep was never > created. (I did the full make clean, make menuconfig, make dep, make > bzImage, make modules, make modules_install). The error never went away. > > For the second, I found the file in /usr/lib, so I added /usr/lib to > /etc/ld.so.conf and ran ldconfig. Still didn't help. > > I don't know what is up. Is this version of Scyld supposed to work > differently? If so, it may be the scratch the CD-ROM came with. > > David > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Fri Mar 16 15:19:01 2001 From: rgb at phy.duke.edu (Robert G. Brown) Date: Fri, 16 Mar 2001 18:19:01 -0500 (EST) Subject: AMD annoyed... Message-ID: Dear List Persons, Well, it turns out AMD is annoyed by my "publishing" pre-release benchmarks of the dual Athlon, in spite of all the caveats and warnings that I was testing a pre-release stepping of the chipset. At a guess, the folks in marketing there don't think that technical folks know the difference between pre-release numbers and final numbers (and perhaps don't understand just how enthusiastic this site was making folks about the dual athlon even in the brief time it was up). Who knows, they could be right. Although I doubt that there is anything they could do to force me, as a matter of courtesy to them and to my hosts at ASL I'm taking the site down "for a few weeks" until AMD gives the go-ahead. I imagine that they'll want me to redo the numbers if and when they get around to releasing the system. Sigh. Hopefully I'll have time -- this week was spring break (and I spent about three days messing with this that I'm less likely to have in April). Hopefully, anybody who really needed the preliminary numbers to help make critical purchase decisions has looked already. I do intend to put the numbers back up (and complete the two or three benchmarks that weren't quite finished) when permitted to do so. Bemusedly yours, rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu From JParker at coinstar.com Fri Mar 16 16:00:11 2001 From: JParker at coinstar.com (JParker at coinstar.com) Date: Fri, 16 Mar 2001 16:00:11 -0800 Subject: problems compiling mpich-1.2.1 on Linux PC Message-ID: G'Day ! I use Debian, so I may be wrong, but didn't Redhat put a development compiler in the 7.0 release ? Have you tried using a known stable version of GCC ? cheers, Jim Parker Sailboat racing is not a matter of life and death .... It is far more important than that !!! Frank Muldoon Sent by: beowulf-admin at beowulf.org 03/17/01 11:13 AM To: beowulf at beowulf.org cc: Subject: problems compiling mpich-1.2.1 on Linux PC I am having problems compiling mpich-1.2.1 on a PC running RedHat 7.0 using gcc. I have tried a number of configure options and have always gotten the same error when I try to compile mpich. I have no problem compiling it on an Alpha running unix. 
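For reference, Red Hat 7.0 shipped a gcc 2.96 snapshot as its default compiler and kept the older egcs 1.1.2 around under the name kgcc, so a first test along these lines is cheap (the -cc option is the usual way to point mpich's configure at a different C compiler; check configure's help output to be sure):

    gcc --version     # 2.96 snapshot on a stock Red Hat 7.0 box
    kgcc --version    # egcs-1.1.2, the "known stable" compiler
    # then redo the build with the older compiler:
    ./configure -cc=kgcc
    make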
Thanks, Frank gcc -DHAVE_SLOGCONF_H -DMPI_LINUX -c slog_irec_write.c -I.. -I/usr4/programs/mpi_NAGf95/mpich-1.2.1/mpe/slog_api/include slog_irec_write.c: In function `SLOG_Irec_SetMinRec': slog_irec_write.c:1171: `SLOG_nodeID_t' is promoted to `int' when passed through `...' slog_irec_write.c:1171: (so you should pass `int' not `SLOG_nodeID_t' to `va_arg') slog_irec_write.c:1172: `SLOG_cpuID_t' is promoted to `int' when passed through `...' make[7]: *** [slog_irec_write.o] Error 1 make[6]: *** [sloglib] Error 2 make[5]: *** [libslog.a] Error 2 make[4]: *** [default] Error 2 make[3]: *** [build_libs_progs] Error 2 make[2]: *** [mpelib] Error 1 make[1]: *** [mpi-addons] Error 2 make: *** [mpi] Error 2 lab2220b.me.lsu.edu> -- Frank Muldoon Computational Fluid Dynamics Research Group Louisiana State University Baton Rouge, LA 70803 225-344-7676 (h) 225-388-5217 (w) _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -------------- next part -------------- An HTML attachment was scrubbed... URL: From siegert at sfu.ca Fri Mar 16 16:38:07 2001 From: siegert at sfu.ca (Martin Siegert) Date: Fri, 16 Mar 2001 16:38:07 -0800 Subject: annoyed (was: AMD annoyed...) In-Reply-To: ; from rgb@phy.duke.edu on Fri, Mar 16, 2001 at 06:19:01PM -0500 References: Message-ID: <20010316163807.A30797@stikine.ucs.sfu.ca> On Fri, Mar 16, 2001 at 06:19:01PM -0500, Robert G. Brown wrote: > > Well, it turns out AMD is annoyed by my "publishing" pre-release > benchmarks of the dual Athlon, in spite of all the caveats and warnings > that I was testing a pre-release stepping of the chipset. At a guess, > the folks in marketing there don't think that technical folks know the > difference between pre-release numbers and final numbers (and perhaps > don't understand just how enthusiastic this site was making folks about > the dual athlon even in the brief time it was up). Who knows, they > could be right. Now I am annoyed. I few weeks ago I basically had decided to build a 96 node beowulf out of dual Athlons. Then AMD pushed back the release of the duals to the second quarter of 2001. Very bad if your grant money arrives in a few weeks and your fellow researchers want to use the new cluster right away. And I still have to test the thing (stability, setup, etc.) ... Now, Robert did part of my job by benchmarking the dual Athlon (thanks Robert!) and they "encourage" him to take his site down ... Does AMD wants to force me to buy Intels? Sigh. Martin ======================================================================== Martin Siegert Academic Computing Services phone: (604) 291-4691 Simon Fraser University fax: (604) 291-4242 Burnaby, British Columbia email: siegert at sfu.ca Canada V5A 1S6 ======================================================================== From andreas at amy.udd.htu.se Sat Mar 17 10:37:47 2001 From: andreas at amy.udd.htu.se (Andreas Boklund) Date: Sat, 17 Mar 2001 19:37:47 +0100 (CET) Subject: Scyld and hostnames on the nodes. HOW? In-Reply-To: <200103161421.JAA13685@tamarack.cs.mtu.edu> Message-ID: Howdy Beowulfers & others. I have a small problem on my Scyld setup. I am using 10'pc's just to see if it would be interesing to use scyld for our next cluster or not. We are running a lot of Fluent simulations. I have no problem spawning jobs onto the nodes (i think). But i get a "gethostbyname" error. 
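A few things worth checking from the front end for a gethostbyname failure like this (node 0, the paths and the bpcp usage are illustrative only; bpcp is bproc's rcp-style copy tool, so verify its syntax locally):

    bpsh 0 hostname          # plain hostname just reads the kernel name, no lookup
    bpsh 0 hostname -f       # -f and -i do call gethostbyname, so they need a
                             # working /etc/hosts or NSS setup on the slave
    bpsh 0 cat /etc/hosts    # does the node have anything to resolve against?
    # If the slave has no hosts file at all, pushing a minimal one over is a
    # common stopgap for programs that insist on name lookups:
    bpcp /etc/hosts 0:/etc/hosts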
And that is correct when i try to do bpsh 0 hostname -f or -s or -i i get a lot of errors. I have added the entry that executes 'hostname node0' in the node_up script but i must be missing something vital. I hope that the hostname issue is the only problem i will have, but since i havent gotten any further i dont know. Is anyone out there running fluent or similar apps on Scyld? If its not possible to run them diskless i might as well go back to the setup i have on the other cluster and use it again atleast i know how that stuff works 100%, or i can run mosix. Sorry for rambeling but i have been tackling this problem for the last 11 hours and now im heading home for food and a bath. Ill get back into it tomorrow morning so i hope someone that has an answer for me works on saturdays :( Best regards //Andreas ********************************************************* * Administator of Amy and Sfinx(Iris23) * * * * Voice: 070-7294401 * * ICQ: 12030399 * * Email: andreas at shtu.htu.se, boklund at linux.nu * * * * That is how you find me, How do -I- find you ? * ********************************************************* From davidgrant at mediaone.net Sat Mar 17 12:48:39 2001 From: davidgrant at mediaone.net (David Grant) Date: Sat, 17 Mar 2001 15:48:39 -0500 Subject: annoyed (was: AMD annoyed...) References: <20010316163807.A30797@stikine.ucs.sfu.ca> Message-ID: <006701c0af23$aad55e00$954f1e42@ne.mediaone.net> Martin, I've had many previous firm commitments from AMD that date back to Q3 of last year. I know your frustration. I've passed on information to end users with FIRM assurances from AMD only to have the timeline changed again. The current official statement from AMD is that the dual Athlon CPU option "should" be released by late Q2. That looks to be June in my books. I don't bet, but if I was a betting man, I'd say we'll see it sometime this summer as a best case scenario..... time may prove me wrong....but I don't think so.... just my .02 David A. Grant, V.P. Cluster Technologies GSH Intelligent Integrated Systems 95 Fairmount St. Fitchburg Ma 01450 Phone 603.898.9717 Fax 603.898.9719 Email: davidg at gshiis.com Web: www.gshiis.com "Providing High Performance Computing Solutions for Over a Decade" ----- Original Message ----- From: "Martin Siegert" To: "Robert G. Brown" Cc: Sent: Friday, March 16, 2001 7:38 PM Subject: annoyed (was: AMD annoyed...) > On Fri, Mar 16, 2001 at 06:19:01PM -0500, Robert G. Brown wrote: > > > > Well, it turns out AMD is annoyed by my "publishing" pre-release > > benchmarks of the dual Athlon, in spite of all the caveats and warnings > > that I was testing a pre-release stepping of the chipset. At a guess, > > the folks in marketing there don't think that technical folks know the > > difference between pre-release numbers and final numbers (and perhaps > > don't understand just how enthusiastic this site was making folks about > > the dual athlon even in the brief time it was up). Who knows, they > > could be right. > > Now I am annoyed. > I few weeks ago I basically had decided to build a 96 node beowulf out > of dual Athlons. > Then AMD pushed back the release of the duals to the second quarter of > 2001. Very bad if your grant money arrives in a few weeks and your > fellow researchers want to use the new cluster right away. > And I still have to test the thing (stability, setup, etc.) ... > > Now, Robert did part of my job by benchmarking the dual Athlon (thanks Robert!) > and they "encourage" him to take his site down ... 
> > Does AMD wants to force me to buy Intels? Sigh. > > Martin > > ======================================================================== > Martin Siegert > Academic Computing Services phone: (604) 291-4691 > Simon Fraser University fax: (604) 291-4242 > Burnaby, British Columbia email: siegert at sfu.ca > Canada V5A 1S6 > ======================================================================== > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From deadline at plogic.com Sat Mar 17 14:15:31 2001 From: deadline at plogic.com (Douglas Eadline) Date: Sat, 17 Mar 2001 17:15:31 -0500 (EST) Subject: AMD annoyed... In-Reply-To: Message-ID: Just the other day someone posted a link to a press release that said AMD was interested in Beowulf Systems. Go figure. Doug ------------------------------------------------------------------- Paralogic, Inc. | PEAK | Voice:+610.814.2800 130 Webster Street | PARALLEL | Fax:+610.814.5844 Bethlehem, PA 18015 USA | PERFORMANCE | http://www.plogic.com ------------------------------------------------------------------- From lowther at att.net Sat Mar 17 14:48:37 2001 From: lowther at att.net (lowther at att.net) Date: Sat, 17 Mar 2001 17:48:37 -0500 Subject: AMD annoyed... References: Message-ID: <3AB3E9C5.BF4DF1E1@att.net> "Robert G. Brown" wrote: > > > Bemusedly yours, > I saw the page briefly before it came down. Would it be 'out of bounds' to give a hint for those waiting whether or not AT THIS MOMENT it is wise for them to wait longer or would they be just as well off going ahead with their projects based on currently available technologies? I'm sure if you had a glowing report, they wouldn't be upset at all if you were to say something positive based on what you know. After all, those waiting know the performance of a single processor board and just need to know in a price/performance framework whether or not they can get at least, say 190% performance gain before they should consider single board solutions? Not that anyone should take your silence as anything other than a gentlemanly agreement with AMD. Ken From rgb at phy.duke.edu Sat Mar 17 15:11:57 2001 From: rgb at phy.duke.edu (Robert G. Brown) Date: Sat, 17 Mar 2001 18:11:57 -0500 (EST) Subject: AMD annoyed... In-Reply-To: <3AB3E9C5.BF4DF1E1@att.net> Message-ID: On Sat, 17 Mar 2001 lowther at att.net wrote: > "Robert G. Brown" wrote: > > > > > > Bemusedly yours, > > > > I saw the page briefly before it came down. Would it be 'out of bounds' > to give a hint for those waiting whether or not AT THIS MOMENT it is > wise for them to wait longer or would they be just as well off going > ahead with their projects based on currently available technologies? > I'm sure if you had a glowing report, they wouldn't be upset at all if > you were to say something positive based on what you know. After all, > those waiting know the performance of a single processor board and just > need to know in a price/performance framework whether or not they can > get at least, say 190% performance gain before they should consider > single board solutions? Not that anyone should take your silence as > anything other than a gentlemanly agreement with AMD. I'm less optimistic than you are about what they would or wouldn't be annoyed by, but let's try it. After all, I have no agreement with AMD at all -- they haven't even talked to me in person. 
I only have heard through "channels" that they object to my publishing free advertising in a venue rich in large scale and technologically knowledgeable purchasers (turnkey companies and end users both) that they couldn't pay an agency any money to penetrate and which, if they did, nobody would take seriously. My summary report would be that folks interested in running CPU bound code are (as one might expect) perfectly safe waiting for the dual Athlon if its release time and expected price point match their needs. I saw nothing at all that would make me hesitate to get it for CPU bound code. For code that is mixed CPU and memory bound code the picture is less clear. Very subjectively it had no major "problems" but OTOH its performance curve was, not unreasonably, quite different from Intel duals. I experienced no system instabilities (the test system didn't crash even under 100% loads over 20 hour periods), although in three or four days that may or may not be significant and there were subsystems I didn't test at all. If your code is expected to be heavily memory I/O IPC bound -- two processors doing a lot of talking to each other on the same system -- then you might do better with Intel. Or you might not -- one reason AMD is probably worried about the pre-release numbers is that they are pre-release, and they may be working on specific subsystems that would significantly alter these numbers. Also, we're talking about a performance >>profile<<, which is a very nonlinear function. Intel might be optimal in one region and AMD in another. For folks in that situation, I'd say that the preliminary numbers I had posted might well convince a lot of people near a neck in the (beta) profile to wait, and would definitely convince people who are already in productive territory to wait, but since AMD won't let me post the figures and numbers, we'll never know who is who... rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu From mathboy at velocet.ca Sat Mar 17 15:24:33 2001 From: mathboy at velocet.ca (Velocet) Date: Sat, 17 Mar 2001 18:24:33 -0500 Subject: AMD annoyed... In-Reply-To: ; from deadline@plogic.com on Sat, Mar 17, 2001 at 05:15:31PM -0500 References: Message-ID: <20010317182433.Z27759@velocet.ca> On Sat, Mar 17, 2001 at 05:15:31PM -0500, Douglas Eadline's all... > > Just the other day someone posted a link to a press release > that said AMD was interested in Beowulf Systems. Well they're obviously reading this list since they pounced on our poor volonteer, so perhaps they're reading this too and will actually quickly move to 1) save face 2) maintain our interest in AMD as a leader for processor choice in really cheap but effective Beowulf/supercomputing solutions. (I think I dropped enough corporate buzzwords to set their alarms off, no? :) [ AMD - in all seriousness, a statement of your position with an EXPLANATION would do A LOT to shore up our burgeoning doubts and wild speculation. ] /kc > > Go figure. > > Doug > ------------------------------------------------------------------- > Paralogic, Inc. 
| PEAK | Voice:+610.814.2800 > 130 Webster Street | PARALLEL | Fax:+610.814.5844 > Bethlehem, PA 18015 USA | PERFORMANCE | http://www.plogic.com > ------------------------------------------------------------------- > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Ken Chase, math at velocet.ca * Velocet Communications Inc. * Toronto, CANADA From cozzi at hertz.rad.nd.edu Sat Mar 17 18:54:10 2001 From: cozzi at hertz.rad.nd.edu (Marc Cozzi) Date: Sat, 17 Mar 2001 21:54:10 -0500 Subject: AMD annoyed... Message-ID: I know they lost a sale of 88 CPU to me. I wanted to wait a bit to see what SMP plans would develop with AMD. Unfortunately the money would not wait. Went with Intel. Maybe next time around? marc -----Original Message----- From: Velocet [mailto:mathboy at velocet.ca] Sent: Saturday, March 17, 2001 6:25 PM To: beowulf at beowulf.org Subject: Re: AMD annoyed... On Sat, Mar 17, 2001 at 05:15:31PM -0500, Douglas Eadline's all... > > Just the other day someone posted a link to a press release > that said AMD was interested in Beowulf Systems. Well they're obviously reading this list since they pounced on our poor volonteer, so perhaps they're reading this too and will actually quickly move to 1) save face 2) maintain our interest in AMD as a leader for processor choice in really cheap but effective Beowulf/supercomputing solutions. (I think I dropped enough corporate buzzwords to set their alarms off, no? :) [ AMD - in all seriousness, a statement of your position with an EXPLANATION would do A LOT to shore up our burgeoning doubts and wild speculation. ] /kc > > Go figure. > > Doug > ------------------------------------------------------------------- > Paralogic, Inc. | PEAK | Voice:+610.814.2800 > 130 Webster Street | PARALLEL | Fax:+610.814.5844 > Bethlehem, PA 18015 USA | PERFORMANCE | http://www.plogic.com > ------------------------------------------------------------------- > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Ken Chase, math at velocet.ca * Velocet Communications Inc. * Toronto, CANADA _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From iseker at isbank.net.tr Sun Mar 18 14:02:26 2001 From: iseker at isbank.net.tr (ILGIN SEKER) Date: Mon, 19 Mar 2001 00:02:26 +0200 Subject: OT: Mailing list question Message-ID: <3AB53072.7FA8B2DE@isbank.net.tr> Does anybody know how I can change the list mode from digest-mode to normal-mode? Or whom should I ask? From rgb at phy.duke.edu Sun Mar 18 15:25:29 2001 From: rgb at phy.duke.edu (Robert G. Brown) Date: Sun, 18 Mar 2001 18:25:29 -0500 (EST) Subject: OT: Mailing list question In-Reply-To: <3AB53072.7FA8B2DE@isbank.net.tr> Message-ID: On Mon, 19 Mar 2001, ILGIN SEKER wrote: > Does anybody know how I can change the list mode from digest-mode to > normal-mode? Or whom should I ask? > To change your subscription (digest mode or unsubscribe) > visit http://www.beowulf.org/mailman/listinfo/beowulf Pretty clear to me... Connect, identify yourself using your mailman passwd, and change the delivery mode. Nothing to it. rgb -- Robert G. 
Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu From goenzoy at gmx.net Sun Mar 18 15:56:58 2001 From: goenzoy at gmx.net (Gottfried F. Zojer) Date: Mon, 19 Mar 2001 00:56:58 +0100 Subject: OT:(maybe)NVidea GeForce(76GFLOPS) Message-ID: <3AB54B49.E020CEBD@gmx.net> Hi, Maybe OT for beowulf but I read in a german computer magazine something about the impressive floating-point performance of the new GeForce 3.Does anybody know any reference in what form it will influence the performance of a cluster primary for 3D Apps. Thanks in advance Gottfried F. Zojer From dvos12 at calvin.edu Sun Mar 18 15:55:49 2001 From: dvos12 at calvin.edu (David Vos) Date: Sun, 18 Mar 2001 18:55:49 -0500 (EST) Subject: OT: Mailing list question In-Reply-To: <3AB53072.7FA8B2DE@isbank.net.tr> Message-ID: Go to: http://www.beowulf.org/mailman/listinfo/beowulf and enter your email address in the box at the bottom of the page and select "Edit Options." David On Mon, 19 Mar 2001, ILGIN SEKER wrote: > Does anybody know how I can change the list mode from digest-mode to > normal-mode? Or whom should I ask? > > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > From billygotee at operamail.com Sun Mar 18 21:19:06 2001 From: billygotee at operamail.com (Brandon Arnold) Date: Sun, 18 Mar 2001 23:19:06 -0600 Subject: Sharing LNE100TX blues... Message-ID: <003a01c0b034$2196e8e0$6801a8c0@communicomm.com> I've been battling with my LNE100TX v4 NIC for the past 3 days. I see its a common problem with most Linux folks that have bought it, but since I've read lots of posts claiming success, I have faith that I'm doing something wrong. For all obvious reasons, the NIC isn't there. Be aware, it works fine when I boot into Windows ME, so the card is plugged in securely. I tried to use the netdrivers.tgz that was included on the installation disk for Linux users. The files unzipped correctly, but only a few compiled. I wont go into extensive detail about that, because most posts suggested compiling the updated tulip.c driver at Becker's site. Therefore, I downloaded the latest tulip.c, pci-scan.c, pci-scan.h, and kern_compat.h, ran DOS2UNIX on them and then placed them all in the /usr/src/modules directory. The compile command gave me fits because it is different for Mandrake 7.2, but I finally came up with the following: For tulip.c: # gcc -I/usr/src/linux-2.2.17/include -DMODULE -Wall -Wstrict-prototypes -O6 -c /usr/src/modules/tulip.c For pci-scan.c: # gcc -I/usr/src/linux-2.2.17/include -DMODULE -D__KERNEL__ -DEXPORT_SYMTAB -Wall -Wstrict-prototypes -O6 -c /usr/src/modules/pci-scan.c They both returned the compiled tulip.o and pci-scan.o files, but both times I received an assembling error in modprob.c. I'm not sure if this matters--it seems like it would. :-) Anyway I ran insmod on both of the compiled files. pci-scan.o installed correctly, but when I tried to install tulip.o, it returned 'tulip is busy' or something to that effect. It seems like I read a few posts where people fixed this problem by changing the slot that the card was in. Anyway, I changed the card to the slot directly below it, but still no cigar. Of course, Windows ME still works. 
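Two details stand out in the transcript above: the tulip.c compile line is missing -D__KERNEL__ (it is present on the pci-scan.c line), and "busy" from insmod usually means another driver has already claimed the card. Something along these lines is worth a try (kernel include path copied from the message; the rest is standard modutils usage):

    cd /usr/src/modules
    gcc -I/usr/src/linux-2.2.17/include -DMODULE -D__KERNEL__ \
        -Wall -Wstrict-prototypes -O6 -c tulip.c
    gcc -I/usr/src/linux-2.2.17/include -DMODULE -D__KERNEL__ -DEXPORT_SYMTAB \
        -Wall -Wstrict-prototypes -O6 -c pci-scan.c
    # make sure no older tulip (or de4x5) module is already bound to the card:
    lsmod
    rmmod tulip 2>/dev/null
    insmod pci-scan.o
    insmod tulip.o
    dmesg | tail             # the driver reports what it detected here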
Please forgive me on the non-descriptiveness of the errors, I had to reboot into Windows ME to type this message and forgot to record the exact statement. I think I included enough information for a guru to cite my problem, but if you need the exact statements, by all means, post and I'll include those as well. I guess what I want is first hand instructions from someone else who had the same problem in Linux Mandrake 7.x and got it fixed, but I also welcome help from anyone else who sees my mistake. Thanks in advance! -Brandon Arnold -------------- next part -------------- An HTML attachment was scrubbed... URL: From davidge at 1stchina.com Mon Mar 19 00:19:44 2001 From: davidge at 1stchina.com (David) Date: Mon, 19 Mar 2001 16:19:44 +0800 Subject: problem when booting the salve machine from floppy disk Message-ID: <200103190826.DAA30610@blueraja.scyld.com> Hello everyone, I've installed the scyld beowulf front-end machine from RPMS. and I boot salve machine using a floppy disk generated by beoboot program on front-end machine. when the salve booting, It stoped at (looping at this step) Sending RAPP requests ............... screen. It seemed that there is no answer from front-end machine. But I found a file named "unknown_addresses" was generated at the /var/beowulf directory on front-end machine. Below is a copy of that file. [root at cluster beowulf]# cat unknown_addresses unknown 52:54:AB:DD:E5:C4 while 52:54:AB:DD:E5:C4 is the NIC's address of my salve machine. Anyone can tell me what make this occurred and how to solve this question? Thanks. Sincerely yours, David Ge Room 604, No. 168, Qinzhou Road, Shanghai. Phone: (021)34140621-12 2001-03-19 16:06:38 From Eugene.Leitl at lrz.uni-muenchen.de Mon Mar 19 01:26:20 2001 From: Eugene.Leitl at lrz.uni-muenchen.de (Eugene Leitl) Date: Mon, 19 Mar 2001 10:26:20 +0100 (MET) Subject: OT:(maybe)NVidea GeForce(76GFLOPS) In-Reply-To: <3AB54B49.E020CEBD@gmx.net> Message-ID: On Mon, 19 Mar 2001, Gottfried F. Zojer wrote: > Hi, > > Maybe OT for beowulf but I read in a german computer magazine > something about the impressive floating-point performance of the > new GeForce 3.Does anybody know any reference in what form > it will influence the performance of a cluster primary for 3D Apps. Apparently, GeForce3 allows you to write little vertex shader programs (up to 128 instructions, 17 opcodes), mostly for adding and multiplying vectors. Could be useful for some codes, but as long as hardware development remains so rapid non-portable programming is probably not worth the effort. From agrajag at linuxpower.org Mon Mar 19 05:23:11 2001 From: agrajag at linuxpower.org (Jag) Date: Mon, 19 Mar 2001 05:23:11 -0800 Subject: problem when booting the salve machine from floppy disk In-Reply-To: <200103190826.DAA30610@blueraja.scyld.com>; from davidge@1stchina.com on Mon, Mar 19, 2001 at 04:19:44PM +0800 References: <200103190826.DAA30610@blueraja.scyld.com> Message-ID: <20010319052311.G13901@kotako.analogself.com> On Mon, 19 Mar 2001, David wrote: > Hello everyone, > > I've installed the scyld beowulf front-end machine from > RPMS. and I boot salve machine using a floppy disk > generated by beoboot program on front-end machine. > when the salve booting, It stoped at (looping at this step) > Sending RAPP requests ............... > screen. It seemed that there is no answer from front-end > machine. But I found a file named "unknown_addresses" > was generated at the /var/beowulf directory on front-end > machine. 
While you're booting the slave node, run 'beosetup' as root on the front-end machine. When the slave node starts sending out the RARP requests, the MAC address will appear in the unkown column of beosetup. Drag and drop that MAC address into the middle column. Then hit 'apply'. As soon as you do this, the front-end machine will recognize the slave node and respond to the RARP requests. Jag -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 232 bytes Desc: not available URL: From rauch at inf.ethz.ch Mon Mar 19 09:19:52 2001 From: rauch at inf.ethz.ch (Felix Rauch) Date: Mon, 19 Mar 2001 18:19:52 +0100 (CET) Subject: Mysterious kernel hangs In-Reply-To: Message-ID: On Thu, 15 Mar 2001, Felix Rauch wrote: > We recently bought a new 16 node cluster with dual 1 GHz PentiumIII > nodes, but machines mysteriously freeze :-( [...] Thanks for the many hints I got. The solution: It was a problem with the BIOS/firmware of the boards. After it has been replaced, the machines seem to run without problems. - Felix -- Felix Rauch | Email: rauch at inf.ethz.ch Institute for Computer Systems | Homepage: http://www.cs.inf.ethz.ch/~rauch/ ETH Zentrum / RZ H18 | Phone: ++41 1 632 7489 CH - 8092 Zuerich / Switzerland | Fax: ++41 1 632 1307 From rcferri at us.ibm.com Mon Mar 19 09:23:50 2001 From: rcferri at us.ibm.com (Richard C Ferri) Date: Mon, 19 Mar 2001 12:23:50 -0500 Subject: Huinalu Linux SuperCluster Message-ID: Mark, Yes, IBM is actually selling preconfigured beowulf clusters. You can search under IBM Solution Series for Linux Clusters for more info. Here is the press release I dug up from http://www.nwfusion.com/archive/2000/104965_08-21-2000.html?nf IBM announced a Linux cluster hardware/software package for high-availability Linux systems. IBM's Solution Series for Linux Clusters includes Netfinity servers running IBM's Linux Utility for Clusters, software that controls multiple servers as one logical node. The cluster package includes high-speed server interfaces from Myricom and Ethernet switches from Extreme Networks for connecting the cluster to a LAN. The package scales up to 64 nodes and supports Caldera, Red Hat, SuSE and TurboLinux distributions. The IBM clusters are available now and start at $115,000. I don't see a lot of info as to what exactly is packaged as part of the solution -- the press release is mostly vapor-speak. Rich Richard Ferri IBM Linux Technology Center rcferri at us.ibm.com 845.433.7920 Mark Lucas @beowulf.org on 03/14/2001 06:00:19 PM Sent by: beowulf-admin at beowulf.org To: beowulf at beowulf.org cc: Subject: Huinalu Linux SuperCluster Just came across this: Huinalu is a 520-processor IBM Netfinity Linux Supercluster. It consists of 260 nodes, each housing two Pentium III 933 megahertz processors. Their combined theoretical peak performance is a staggering 478 billion floating point operations per second (gigaflops). It is, at the present time, the world's most powerful Linux Supercluster. at http://www.mhpcc.edu/doc/huinalu/huinalu-intro.html Does anyone have any specifics on the hardware cost of this system? Is IBM selling configured Beowulf clusters? Thanks in advance. Mark -- ********************** Mark R Lucas Chief Technical Officer ImageLinks Inc. 
4450 W Eau Gallie Blvd Suite 164 Melbourne Fl 32934 321 253 0011 (work) 321 253 5559 (fax) mlucas at imagelinks.com ********************** _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From RSchilling at affiliatedhealth.org Mon Mar 19 09:34:15 2001 From: RSchilling at affiliatedhealth.org (Schilling, Richard) Date: Mon, 19 Mar 2001 09:34:15 -0800 Subject: AMD annoyed... Message-ID: <51FCCCF0C130D211BE550008C724149EBE111B@mail1.affiliatedhealth.org> Robert, It's up to you, but I think the key is that you mention you have no existing agreement with AMD. Without a non-disclosure on your work, then AMD has effectively given you permission to publish whatever you want. You have to believe that they are aware of the implications of non-disclosure - they do this all the time with companies. If you're concerned, call them to verify how they feel about it - they should be very open with you. But, don't make a decision based on speculation (get an official ruling from the company). You also have to realize that as a Beowulf pioneer, your work speaks as loud as AMD's on the subject matter. Many people might not believe the AMD marketing machine, but Mr. Brown working in the trenches is believable. So I don't think your profile will do any harm, IMHO. If AMD processors are weak in some areas, but strong in others . . . so be it. You're work is vitally important for just that reason - to verify the limits of a given technology. Richard Schilling Web Integration Programmer/Webmaster phone: 360.856.7129 fax: 360.856.7166 URL: http://www.affiliatedhealth.org > -----Original Message----- > From: Robert G. Brown [mailto:rgb at phy.duke.edu] > Sent: Saturday, March 17, 2001 3:12 PM > To: lowther at att.net > Cc: Beowulf Mailing List > Subject: Re: AMD annoyed... > > > On Sat, 17 Mar 2001 lowther at att.net wrote: > > > "Robert G. Brown" wrote: > > > > > > > > > Bemusedly yours, > > > > > > > I saw the page briefly before it came down. Would it be > 'out of bounds' > > to give a hint for those waiting whether or not AT THIS MOMENT it is > > wise for them to wait longer or would they be just as well off going > > ahead with their projects based on currently available technologies? > > I'm sure if you had a glowing report, they wouldn't be > upset at all if > > you were to say something positive based on what you know. > After all, > > those waiting know the performance of a single processor > board and just > > need to know in a price/performance framework whether or > not they can > > get at least, say 190% performance gain before they should consider > > single board solutions? Not that anyone should take your silence as > > anything other than a gentlemanly agreement with AMD. > > I'm less optimistic than you are about what they would or wouldn't be > annoyed by, but let's try it. After all, I have no agreement with AMD > at all -- they haven't even talked to me in person. I only have heard > through "channels" that they object to my publishing free > advertising in > a venue rich in large scale and technologically knowledgeable > purchasers > (turnkey companies and end users both) that they couldn't pay > an agency > any money to penetrate and which, if they did, nobody would take > seriously. 
> > My summary report would be that folks interested in running CPU bound > code are (as one might expect) perfectly safe waiting for the dual > Athlon if its release time and expected price point match their needs. > I saw nothing at all that would make me hesitate to get it > for CPU bound > code. For code that is mixed CPU and memory bound code the picture is > less clear. Very subjectively it had no major "problems" but OTOH its > performance curve was, not unreasonably, quite different from Intel > duals. I experienced no system instabilities (the test system didn't > crash even under 100% loads over 20 hour periods), although > in three or > four days that may or may not be significant and there were subsystems > I didn't test at all. > > If your code is expected to be heavily memory I/O IPC bound -- two > processors doing a lot of talking to each other on the same system -- > then you might do better with Intel. Or you might not -- one > reason AMD > is probably worried about the pre-release numbers is that they are > pre-release, and they may be working on specific subsystems that would > significantly alter these numbers. Also, we're talking about a > performance >>profile<<, which is a very nonlinear function. Intel > might be optimal in one region and AMD in another. > > For folks in that situation, I'd say that the preliminary > numbers I had > posted might well convince a lot of people near a neck in the (beta) > profile to wait, and would definitely convince people who are > already in > productive territory to wait, but since AMD won't let me post the > figures and numbers, we'll never know who is who... > > rgb > > -- > Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Mon Mar 19 10:10:13 2001 From: rgb at phy.duke.edu (Robert G. Brown) Date: Mon, 19 Mar 2001 13:10:13 -0500 (EST) Subject: AMD annoyed... In-Reply-To: <51FCCCF0C130D211BE550008C724149EBE111B@mail1.affiliatedhealth.org> Message-ID: On Mon, 19 Mar 2001, Schilling, Richard wrote: > Robert, > > It's up to you, but I think the key is that you mention you have no existing > agreement with AMD. Without a non-disclosure on your work, then AMD has > effectively given you permission to publish whatever you want. You have to > believe that they are aware of the implications of non-disclosure - they do > this all the time with companies. If you're concerned, call them to verify > how they feel about it - they should be very open with you. But, don't make > a decision based on speculation (get an official ruling from the company). Eventually I will, but I suspect that the company that gave me the account as a courtesy (ASL) had a NDA with AMD and that their non-publication restriction was not communicated between different management levels of the company (and hence not to me -- if they had but told me I would have been happy enough to comply anyway if that was a condition of getting the account). I agree that AMD has no legal recourse against me if I choose to publish at this point, but they might well choose to punish the company that gave me the account. Any old guys remember the National Lampoon cover: "Buy this magazine or we'll shoot this dog"? 
Well, I have no particular desire for ASL to get shot for being nice enough to give me an account. Besides, I wasn't finished -- some of the tests I was working on were producing the most "interesting" results. I'm not comfortable publishing these results anyway without more work. I have various things to do this week (including teach, as Duke has started up again post-spring break). One of them is to post the sources used in the benchmarking for comment -- I've just started to add a link to source packages into the http://www.phy.duke.edu/brahma/dual_athlon site both as a resource and as an RFC. When this task is completed I'll try contacting AMD directly so they can look over the stuff I was running (and any comments). They can then run the tests themselves to help them decide when it is "safe" to release me. In the meantime, my time isn't totally being wasted. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu From bryce at redhat.com Fri Mar 16 07:37:57 2001 From: bryce at redhat.com ('Bryce') Date: Fri, 16 Mar 2001 10:37:57 -0500 Subject: using graphics cards as generic FLOP crunchers References: <200103161421.JAA13685@tamarack.cs.mtu.edu> Message-ID: <3AB23355.F9EA8203@redhat.com> John Duff wrote: > Hello, > > There are groups at Stanford (WireGL) and Princeton who have done work on > parallel graphics on PC clusters. They put a high-end PC graphics card (such > as an NVidia card) in each slave node of a cluster, and then parallelize the > rendering of 3D scenes across the cluster, taking advantage of the video > hardware acceleration, and then combine the image either on a big tiled > projecter or on a single computer's monitor. This is all well and good, > but it struck me that when other groups at these universities who have no > interest in graphics use the same cluster, all that computing horsepower > in the GPUs on the graphics cards just sits idle. Would it be possible > to write some sort of thin wrapper API over OpenGL that heavy-duty > number-crunching parallel apps could use to offload some of the FLOPs from > the main cpu(s) on each slave node to the gpu(s) on the graphics card? > It would seem pretty obvious that the main cpu(s) would always be faster > for generic FLOP computations, so I would think only specific apps might > benefit from the extra cycles of the gpu(s). Of course, the synchronization > issues might be too much of a pain to deal with in the end as well. Has > anyone heard of someone trying this, or know of any showstopper issues? > > Thanks, > John > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf Just a tidied up bit of Irc log that you can mull over Phil =--= Bx notes the beowulf geeks are getting seriously freaky, they're searching for a way to use the GFU's on high end video cards to contribute to the processing power of the main FPU * Bx backs away from these guys bx, they're blowing smoke. you can't do that Bx: again ? bx: that's not too insane kx: I guess it depends on what you're trying to do wx, er, it is quite insane. they can do the calculations but there's no way to get the results out. kx: for video rendering it would make sense, I guess ;) kx: read back from that frame buffer you've got memory mapped ? 
kx: Generate two textures the size of the screen. Map them to the display using a multi-texture operation with an alpha-blend operator between the two of them. rx, that would be unworkable; you'd have to scan the entire framebuffer for the pixel that is the result of your calculation kx: Suddenly you get something that looks suspiciously like a vector multiply. sx, hm, possibly kx: most fbs let you readl/writel to arbitrary locations I still think it's impractical kx: No, you can randomly read pixels hm, you could do colourspace conversion quickly too it'd be a neat hack but I suspect you'd be better off buying a faster CPU kx: Not on all cards kx: Some of the cards do YUV on demand through overlay. kx: The Voodoo3 will do either overlay or texture; with texture conversion you can get the data back. With overlay conversion, you can't. sx, either way I suspect you'd have to make your code highly dependant on gfx chipset; and the rate they are iterating right now it'd be a wasted effort kx: A lot of the common multitexture blending modes are pretty standardised, and arithmetically useful. But only 32 bit fixed point. kx: You could even do things like fast array additions by repeatedly mapping a texture down to half its size with bilinear filtering. sx, still, I suspect that using a faster CPU will be easier and cheaper From sean at emphasisnetworks.com Mon Mar 19 02:33:37 2001 From: sean at emphasisnetworks.com (Sean) Date: Mon, 19 Mar 2001 18:33:37 +0800 Subject: beowulf class cluster?? Message-ID: <001d01c0b060$aa6ea1e0$8a01a8c0@emphasisnetworks.com> Hi all. I'm a newbie in cluster field. In six weeks trying in the dark , I've built PVM3.4 , MOSIX and some applications on them like pvmpov. I usually saw the words "beowulf class cluster". I was wondering what kind of cluster is beowulf class.... Can I say MOSIX is beowulf cluster? why can't? Any response is appreciated. Sean. From patrick at myri.com Thu Mar 15 19:53:43 2001 From: patrick at myri.com (Patrick Geoffray) Date: Thu, 15 Mar 2001 22:53:43 -0500 Subject: Huinalu Linux SuperCluster References: <200103152339.QAA00356@dogbert.mp.sandia.gov> Message-ID: <3AB18E47.A2EA16B7@myri.com> Ron Brightwell wrote: > > If you figure out that you have too many machines, a lot of people would > > be very happy to help you :-) > we need help. (This probably isn't the right forum for recruiting, but send > your resumes to jobs at cs.sandia.gov if you would like to join us.) I was more thinking about taking away some of your extra machines :-)))) -- Patrick Geoffray --------------------------------------------------------------- | Myricom Inc | University of Tennessee - CS Dept | | 325 N Santa Anita Ave. | Suite 203, 1122 Volunteer Blvd. | | Arcadia, CA 91006 | Knoxville, TN 37996-3450 | | (626) 821-5555 | Tel/Fax : (865) 974-0482 | --------------------------------------------------------------- From charr at lnxi.com Fri Mar 16 11:15:59 2001 From: charr at lnxi.com (Cameron Harr) Date: Fri, 16 Mar 2001 12:15:59 -0700 Subject: problems compiling mpich-1.2.1 on Linux PC References: <3AB3B748.1E1AFB93@me.lsu.edu> Message-ID: <3AB2666F.7D47ADCE@lnxi.com> An HTML attachment was scrubbed... 
URL: From leo.magallon at grantgeo.com Thu Mar 15 09:07:43 2001 From: leo.magallon at grantgeo.com (Leonardo Magallon) Date: Thu, 15 Mar 2001 11:07:43 -0600 Subject: Cluster and RAID 5 Array bottleneck.( I believe) References: Message-ID: <3AB0F6DF.9341A27E@grantgeo.com> Hi all, We finally finished upgrading our beowulf from 48 to 108 processors and also added a 523GB RAID-5 system to provide a mounting point for all of our "drones". We went with standard metal shelves that cost about $40 installed. Our setup has one machine with the attached RAID Array to it via a 39160 Adaptec Card ( 160Mb/s transfer rate) at which we launch jobs. We export /home and /array ( the disk array mount point) from this computer to all the other machines. They then use /home to execute the app and /array to read and write over nfs to the array. This computer with the array attached to it talks over a syskonnect gig-e card going directly to a port on a switch which then interconnects to others. The "drones" are connected via Intel Ether Express cards running Fast Ethernet to the switches. Our problem is that apparently this setup is not performing well and we seem to have a bottleneck either at the Array or at the network level. In regards to the network level I have changed the numbers nfs uses to pass blocks of info in this way: echo 262144 > /proc/sys/net/core/rmem_default echo 262144 > /proc/sys/net/core/rmem_max /etc/rc.d/init.d/nfs restart echo 65536 > /proc/sys/net/core/rmem_default echo 65536 > /proc/sys/net/core/rmem_max Our mounts are set to use 8192 as read and write block size also. When we start our job here, the switch passes no more than 31mb/s at any moment. A colleague of mine is saying that the problem is at the network level and I am thinking that it is at the Array level because the lights on the array just keep steadily on and the switch is not even at 25% utilization and attaching a console to the array is mainly for setting up drives and not for monitoring. My colleague also copied 175Megabytes over nfs from one computer to another and the transfers took close to 45 seconds. Any comments or suggestions welcomed, Leo. From patrick at myri.com Thu Mar 15 09:35:40 2001 From: patrick at myri.com (Patrick Geoffray) Date: Thu, 15 Mar 2001 12:35:40 -0500 Subject: Huinalu Linux SuperCluster References: Message-ID: <3AB0FD6C.55CB9BDC@myri.com> Mark Lucas wrote: > at http://www.mhpcc.edu/doc/huinalu/huinalu-intro.html > > Does anyone have any specifics on the hardware cost of this system? > Is IBM selling configured Beowulf clusters? Hi Mark, The total price is not public but it's "under $ 10 million" (http://www.computerworld.com/cwi/story/0,1199,NAV47_STO58037,00.html). It's the third large IBM Linux cluster with Myrinet after New Mexico and NCSA at Urbana-Champaign, IL. IBM seems to be very comfortable with this type of product (evolution of the SP market). However, I do not think they are selling small scale Beowulf cluster. Regards. -- Patrick Geoffray --------------------------------------------------------------- | Myricom Inc | University of Tennessee - CS Dept | | 325 N Santa Anita Ave. | Suite 203, 1122 Volunteer Blvd. 
| | Arcadia, CA 91006 | Knoxville, TN 37996-3450 | | (626) 821-5555 | Tel/Fax : (865) 974-0482 | --------------------------------------------------------------- From patrick at myri.com Thu Mar 15 11:32:30 2001 From: patrick at myri.com (Patrick Geoffray) Date: Thu, 15 Mar 2001 14:32:30 -0500 Subject: Huinalu Linux SuperCluster References: <200103151913.MAA32264@dogbert.mp.sandia.gov> Message-ID: <3AB118CE.6C23F8BE@myri.com> Ron Brightwell wrote: > > > The last NCSA Linux cluster (Urbana-Champaign, IL) provides 512 > > dual PIII 1GHz, so a theoritical peak of 1 TFLOPS : > > http://access.ncsa.uiuc.edu/Headlines/01Headlines/010116.IBM.html > > I didn't think that machine had been deployed yet, since the above press > release says it will be installed in the Summer. I restricted the Antarctica The install was in progress last week during the Open Cluster Group workshop on OSCAR at Champaign, I guess it should be up and running by now. > number to what we currently have up and running as a parallel machine. > There are another 400+ 466 MHz Alphas sitting next those 1024 nodes that > will be integrated in the next few weeks. My dream... How do you do to get all of these toys at Sandia ? (blackmail some politicians ?) If you figure out that you have too many machines, a lot of people would be very happy to help you :-) How many nodes in Cplant these days (total) ? -- Patrick Geoffray --------------------------------------------------------------- | Myricom Inc | University of Tennessee - CS Dept | | 325 N Santa Anita Ave. | Suite 203, 1122 Volunteer Blvd. | | Arcadia, CA 91006 | Knoxville, TN 37996-3450 | | (626) 821-5555 | Tel/Fax : (865) 974-0482 | --------------------------------------------------------------- From Daniel.S.Katz at jpl.nasa.gov Thu Mar 15 16:08:47 2001 From: Daniel.S.Katz at jpl.nasa.gov (Daniel S. Katz) Date: Thu, 15 Mar 2001 16:08:47 -0800 Subject: cluster2001 deadline approaching Message-ID: <3AB1598F.C6E907AB@jpl.nasa.gov> Hi, If you need more time, please let me know. Dan Call for Papers Third IEEE International Conference on Cluster Computing Sutton Place Hotel, Newport Beach, California, USA Oct. 8-11, 2001 Sponsored by the IEEE Computer Society, through the Task Force on Cluster Computing (TFCC) Organized by the University of Southern California, University of California at Irvine, and California Institute of Technology Call For Participation (1-page Call for paper in PDF) The rapid emergence of COTS Cluster Computing as a major strategy for delivering high performance to technical and commercial applications is driven by the superior cost effectiveness and flexibility achievable through ensembles of PCs, workstations, and servers. Cluster computing, such as Beowulf class, SMP clusters, ASCI machines, and metacomputing grids, is redefining the manner in which parallel and distributed computing is being accomplished today and is the focus of important research in hardware, software, and application development. The Third IEEE International Conference on Cluster Computing will be held in the beautiful Pacific coastal city, Newport Beach in Southern California, from October 8 to 11, 2001. For the first time, the Cluster 2001 merges four popular professional conferences or workshops: IWCC, PC-NOW, CCC, JPC and German CC into an integrated, large-scale, international forum to be held in Northern America. The conference series was previously held in Australia (1999) and Germany (2000). 
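One late note on the NFS/RAID bottleneck question earlier in this digest: 175 MB in about 45 seconds is roughly 4 MB/s, well below the ~11 MB/s a single Fast Ethernet link can carry, so it pays to separate the array from the network before tuning either. A rough sketch (file names and thread counts are examples only):

    # Is the array itself fast?  Read a big file locally on the server, no NFS:
    time dd if=/array/somebigfile of=/dev/null bs=1024k count=200
    # Is the wire the limit?  Run a raw TCP test (ttcp or netperf) between one
    # drone and the server and compare with the ~4 MB/s seen over NFS.
    # With 100+ clients against one server, the default eight kernel nfsd
    # threads are often too few; count them and raise the number if need be:
    ps ax | grep nfsd
    rpc.nfsd 16              # start 16 nfsd threads (the count is an example)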
For details and updated information, visit the Cluster 2001 official web site: http://andy.usc.edu/cluster2001/. The conference series information can be found at: http://www.clustercomp.org/. We encourage submission of high quality papers reporting original work in theoretical, experimental, and industrial research and development in the following topics, which are not exclusive to cluster architecture, software, protocols, and applications : Hardware Technology for Clustering High-speed System Interconnects Light Weight Communication Protocols Fast Message Passing Libraries Single System Image Services File Systems and Distributed RAID Internet Security and Reliability Cluster Job and Resource Management Data Distribution and Load Balancing Tools for Operating and Managing Clusters Cluster Middleware, Groupware, and Infoware Highly Available Cluster Solutions Problem Solving Environments for Cluster Scientific and E-Commerce Applications Collaborative Work and Multimedia Clusters Performance Evaluation and Modeling Clusters of Clusters/Computational Grids Software Tools for Metacomputing Novel Cluster Systems Architectures Network-based Distributed Computing Mobile Agents and Java for Cluster Computing Massively Parallel Processing Software Environments for Clusters Clusters for Bioinformatics Innovative Cluster Applications Paper Submission The review process will be based on papers not exceeding 6000 words on at most 20 pages. Deadline for Web-based electronic submission is March 19, 2001 in Postscript (*.ps) or Adobe Acrobat v3.0 (*.pdf) format. The submitted file must be viewable with Aladdin GhostScript 5.10 and printable on a standard PostScript Laser printer. No constraint on the format of the submitted draft except the length. However, double-space format is encouraged, in order to provide the referees the convenience of marking comments and corrections on the paper copy. The web site for the paper submission is: http://www.cacr.caltech.edu/cluster2001/papers Proceedings The proceedings of CLUSTER 2001 will be published by IEEE Computer Society. The proceedings will also be made available online through the IEEE digital library following the conference. Panels/Tutorials/Exhibitions Proposals are solicited for special topics and panel sessions. These proposals must be submitted to the Program Chair: Thomas Sterling. Proposals for a half-day or a full-day tutorial related to the conference topics are encouraged and submit the same to tutorial chair: Ira Pramanick. For exhibitions, contact exbition chair: Rawn Shah. Conference Organization General Chairs: ? Kai Hwang (University of Southern California, USA) ? Mark Baker (Portsmouth University, UK) Vice General Chairs: ? Rick Stevens (Argonne National Laboratory, USA) ? Nalini Venkatasubramanian (University of California at Irvine, USA) Steering Committee: ? Mark Baker (University of Portsmouth, UK) ? Pete Beckman (Turbolinux, Inc., USA) ? Bill Blake (Compaq, USA) ? Rajkumar Buyya (Monash University, Australia) ? Giovanni Chiola (DISI - Universita di Genova, Italy) ? Jack Dongarra (University of Tennessee and ORNL, USA) ? Geoffrey Fox (NPAC, Syracuse, USA) ? Al Geist (ORNL, USA) ? Kai Hwang (University of Southern California, USA) ? Rusty Lusk (Argonne National Laboratory, USA) ? Paul Messina (Caltech, USA) ? Greg Pfister (IBM, Advanced Technology & Architecture, Server Design, USA) ? Wolfgang Rehm (Technische Universit?t Chemnitz, Germany) ? Thomas Sterling (JPL and Caltech, USA) ? Rick Stevens (Argonne National Laboratory, USA) ? 
Thomas Stricker (ETH Z?rich, Switzerland) ? Barry Wilkinson (UNCC, USA) Technical Program Chair: Thomas Sterling (Caltech & NASA JPL, USA) Deputy Program Chair: Daniel S. Katz (NASA JPL, USA) Vice Program Chairs: ? Gordon Bell (Microsoft Research, USA) ? Dave Culler (University of California, Berkeley, USA) ? Jack Dongarra (University of Tennessee, USA) ? Jim Gray (Microsoft, USA) ? Bill Gropp (Argonne National Laboratory, USA) ? Ken Kennedy (Rice University, USA) ? Dan Reed (UIUC, USA) ? Chuck Seitz (Myricom Inc., USA) ? Burton Smith (Cray Inc., USA) Program Committee Tutorial Chair: Ira Pramanick (Sun Microsystems, USA) Publications/Proceedings Co-Chairs: Marcin Paprzycki (University of Southern Mississippi, USA) Rajkumar Buyya (Monash University, Australia) Exhibition Chair: Rawn Shah (Sun World Journal, USA) Publicity Chair: Hai Jin (Huazhong University of Science and Technology, China) Poster Chair: Phil Merkey (Michigan Technical University, USA) Conference Venue: Sutton Place Hotel 4500 MacArthur Blvd. Newport Beach, California, 92660 USA Tel: 949-476-2001 Fax: 949-250-7191 Important Deadlines: Paper Submission March 19, 2001 (Extended) Notification of Acceptance June 18, 2001 Camera Ready Papers July 9, 2001 Early Registration August 31, 2001 Tutorial/Exhibition/Panel Proposals June 11, 2001 Cluster2001 is in cooperation with the IEEE TC on Distributed Processing, IEEE TC on Parallel Processing, ACM SIG on Computer Architecture, Univ. of Portmouth, UK, Univ. of California, Berkeley, Rice Univ., Univ. of Illinois, Urbana-Champaaign, Univ. of Tennessee, Monash Univ., Australia,Technical University of Chemnitz, Germany, Huazhong University of Science and Technology, China, Argonne National Lab., NASA Jet Propulsion Lab., National Center for High-Performance Computing, Taiwan, Sun Microsystems, Cray, Compaq, IBM, Microsoft, and Myricom, etc. -- Daniel S. Katz Daniel.S.Katz at jpl.nasa.gov Jet Propulsion Laboratory or d.katz at ieee.org California Institute of Technology (818) 354-7359 (voice) Mail Stop 168-522 (818) 393-3134 (fax) 4800 Oak Grove Drive http://www-hpc.jpl.nasa.gov/PEP/dsk/ Pasadena, CA 91109-8099 From joe.griffin at mscsoftware.com Wed Mar 14 07:41:16 2001 From: joe.griffin at mscsoftware.com (Joe Griffin) Date: Wed, 14 Mar 2001 07:41:16 -0800 Subject: Fortran 90 and BeoMPI References: Message-ID: <3AAF911C.7A2B9E3@macsch.com> If I may add my 2 cents: I like using mpifxx. However, I REALLY disdain the fact that vendors put their scripts in /usr/bin. mpirun is put in /usr/bin by several vendors. I would rather the vendors put files in /usr/local/PRODUCT. We have had mpich, mpipro, and LAM/MPI all on the same platform, and things really get confusing. Trying to determine what file is in /usr/lib/libmpipro.a is not a good use of time. As far as IA64 goes. I think setting parameters such as: setenv IA64ROOT /usr/local/compiler50/ia64 setenv GCCROOT /usr/local/compiler50/ia64 Should be done my the efc and ecc wrapper, and not by the .csrhc and/or .profile files. Regards, Joe Griffin Bogdan Costescu wrote: > > On Wed, 14 Mar 2001, Daniel Ridge wrote: > > > ... I would suggest that people who would be in a > > position to require per-platform wranglings for MPI often also need to > > perform similar wranglings to accomodate differences in the C library > > or in the Fortran environment. MPI compiler wrappers hardly seem like > > the right place to accomodate these per-platform differences. 
> > I'd like to disagree 8-) > I have here 2 clusters, one running jobs on top of LAM-MPI and the other > running jobs on top of MPICH on top of SCore. By using mpifxx, I'm able to > compile things without knowing where the include and lib files are and > even more without knowing which is the right order of linking the libs; > just try to do this by hand for LAM-MPI for example! > Another example is the 64 bit platforms where you can compile for both 32 > and 64 bit by specifying a simple flag like -32 or -64. Doing linking by > hand means that _I_ have to choose the libraries and I might choose the > wrong one(s) ! Sometimes this is noticed by the linker, but not always... > > > ./configure seems much more palatable than a per-library compiler > > wrappers. I've seen a number of apps that are MPI enabled and which > > supply a configure script which work just fine without using the > > compiler wrappers. > > Of course, if every software package would be using ./configure, > everything would be easy. Try to convince the maintainers of big Fortran > packages like Gaussian or CHARMM to switch to ./configure 8-( > > Sincerely, > > Bogdan Costescu > > IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen > Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY > Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868 > E-mail: Bogdan.Costescu at IWR.Uni-Heidelberg.De > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From gerry at cs.tamu.edu Sun Mar 11 19:18:47 2001 From: gerry at cs.tamu.edu (Gerry Creager n5jxs) Date: Sun, 11 Mar 2001 21:18:47 -0600 Subject: Real Time References: <4.3.2.20010306132041.00b87390@pop.pssclabs.com> <20010306174447.A6421@wumpus> <001601c0a6a8$0b704ec0$04a8a8c0@office1> <3AABF6CF.40F120ED@lrz.uni-muenchen.de> Message-ID: <3AAC4017.4CE0B709@cs.tamu.edu> Eugene.Leitl at lrz.uni-muenchen.de wrote: > > Jim Lux wrote: > > > > While not a beowulf, I am currently working on a very hard real time (<1 > > microsecond) system (a radar) using an MPI like interprocessor interface > > between DSPs. It is entirely possible to have hard real time systems with > > nondeterministic communications. > > Can you tell us more? (preferably, without having to kill us > afterwards, of course). Somehow, this sounds like SSAR data reduction technology... Jim, can I get some of the Texas data early if I help? -- Gerry Creager -- gerry at cs.tamu.edu Network Engineering |Research focusing on Academy for Advanced Telecommunications |Satellite Geodesy and and Learning Technologies |Geodetic Control Texas A&M University 979.458.4020 (Phone) -- 979.847.8578 (Fax) From sam at venturatech.com Mon Mar 12 23:05:41 2001 From: sam at venturatech.com (Sam Lewis) Date: Tue, 13 Mar 2001 07:05:41 +0000 Subject: Auto-responder In-Reply-To: <3AAD4042.487C7928@lrz.uni-muenchen.de> Message-ID: My apologies for the "auto-responder hell" that my mail server created in the past few days. My IS department has rectified the problem, so that this will not happen again. To be on the safe side, I've unsubscribed for the foreseeable future. 
Sam Lewis -----Original Message----- From: Eugene.Leitl at lrz.uni-muenchen.de [mailto:Eugene.Leitl at lrz.uni-muenchen.de] Sent: Monday, March 12, 2001 9:32 PM To: Andreas Boklund Cc: beowulf at beowulf.org; sam at venturatech.com Subject: Re: Beowulf digest, Vol 1 #315 - 9 msgs Andreas Boklund wrote: > I have already started to filter him out so i hope he wont send enything > usefull to this list in the future :( SOP should be instant unsubscription. Actually, CHEMINF-L is way worse. Here you can get ~10 out of office autoreplies to a post, some of them make it to the list. Unfortunately, there's no license to mail. From drh at niptron.com Mon Mar 12 07:19:13 2001 From: drh at niptron.com (D. R. Holsbeck) Date: Mon, 12 Mar 2001 09:19:13 -0600 Subject: sk98lin gigabit driver References: Message-ID: <3AACE8F1.4F851C2C@niptron.com> Thanx for the info. We did get it working, after a while. It does seem that most of the issue was the auto-negotiation. Unfortunatley I only had a chezzy unmanageable switch to use. -- drh at niptron.com "Necessity is the mother of taking chances." --Mark Twain. From charr at lnxi.com Tue Mar 13 18:23:09 2001 From: charr at lnxi.com (Cameron Harr) Date: Tue, 13 Mar 2001 19:23:09 -0700 Subject: Dual Athlon References: Message-ID: <3AAED60D.7FF61E3A@lnxi.com> you can try ssh -o "Protocol 1" and that will work for a v 2.x client accessing a 1.x server. "Robert G. Brown" wrote: > > On Tue, 13 Mar 2001, David Vos wrote: > > > I've had combatibility problems between OpenSSH and ssh.com's > > implementation. I had two linux boxen that could telnet back and forth, > > but could not ssh. I put ssh.com's on both and the problem went away. > > I've experienced similar things in the past, but ssh -v indicates: > > debug: Remote protocol version 1.99, remote software version OpenSSH_2.3.0p1 > debug: no match: OpenSSH_2.3.0p1 > Enabling compatibility mode for protocol 2.0 > debug: Local version string SSH-2.0-OpenSSH_2.3.0p1 > > which suggests that they are using OpenSSH also, albeit a slightly > earlier revision. The rest of the verbose handshaking proceeds > perfectly up to password entry: > > debug: authentications that can continue: publickey,password > debug: next auth method to try is publickey > debug: next auth method to try is password > rgb at dual's password: > debug: authentications that can continue: publickey,password > debug: next auth method to try is password > Permission denied, please try again. > rgb at dual's password: > rgb at lucifer|T:113> > > (where I've tried typing my password and the password for the other > account they tried to roll for me maybe fifty times by now -- it is > impossible that I'm mistyping). I'm pretty well stuck at this point > until they unstick me. I'd get exactly the same "Permission denied" > message if the login fails because my account doesn't really exist and > I'm warped into NOUSER or if there really is a Failed password or if the > account exists but has e.g. a bad shell or bad /etc/passwd file entry. > > I can debug this sort of thing in five minutes on my own system, but I'm > at their mercy on theirs. So far today, the guy I wrote to suggest a > few simple tests (like him trying to login and/or ssh to my account with > the same password they gave me) hasn't responded at all. I'll give them > until tomorrow and then I'll try escalating a bit. > > rgb > > > > > David > > > > On Tue, 13 Mar 2001, Robert G. 
Brown wrote: > > > > > On Tue, 13 Mar 2001, Mofeed Shahin wrote: > > > > > > > So Robert, when are you going to let us know the results of the Dual Athlon ? > > > > :-) > > > > > > > > Mof. > > > > > > They got my account setup yesterday, but for some reason I'm having a > > > hard time connecting via ssh (it's rejecting my password). We've tried > > > both a password they sent me and an MD5 crypt I sent them. Very strange > > > -- I use OpenSSH routinely to connect all over the place so I'm > > > reasonably sure my client is OK. Anyway, I expect it is something > > > trivial and that I'll get in sometime this morning. I spent the time > > > yesterday that I couldn't get in profitably anyway packaging stream and > > > a benchmark sent to me by Thomas Guignol of the list up into make-ready > > > tarball/RPM's. At the moment my list looks something like: > > > > > > stream > > > guignol > > > cpu-rate > > > lmbench (ass'td) > > > LAM/MPI plus two benchmarks (Josip and Doug each suggested one) > > > EPCC OpenMP microbenchmarks (probably with PGI) > > > possibly some fft timings (Martin Seigert) > > > > > > in roughly that order, depending on how much time I get and how well > > > things go. I'm going to TRY to build a page with all the tests I used > > > in tarball/rpm form, results, and commentary. > > > > > > rgb > > > > > > -- > > > Robert G. Brown http://www.phy.duke.edu/~rgb/ > > > Duke University Dept. of Physics, Box 90305 > > > Durham, N.C. 27708-0305 > > > Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu > > > > > > > > > > > > > > > _______________________________________________ > > > Beowulf mailing list, Beowulf at beowulf.org > > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > > > > > -- > Robert G. Brown http://www.phy.duke.edu/~rgb/ > Duke University Dept. of Physics, Box 90305 > Durham, N.C. 27708-0305 > Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Cameron Harr Applications Engineer Linux NetworX Inc. http://www.linuxnetworx.com From Sven.Hiller at brr.de Tue Mar 13 06:21:05 2001 From: Sven.Hiller at brr.de (Sven Hiller) Date: Tue, 13 Mar 2001 15:21:05 +0100 Subject: CeBit Message-ID: <3AAE2CD1.89867928@brr.de> Is anybody form the commercial Beowulf vendors presenting something at the CeBIT in Hannover? -- ----------------------------------- Dr. Sven Hiller Turbine Aerodynamics/CFD Rolls-Royce Deutschland Ltd & Co KG Eschenweg 11 D-15827 Dahlewitz/Germany Tel: +49-33708-6-1142 Fax: +49-33708-6-3292 e-mail: sven.hiller at rolls-royce.com ----------------------------------- From jakob at unthought.net Mon Mar 19 11:21:06 2001 From: jakob at unthought.net (=?iso-8859-1?Q?Jakob_=D8stergaard?=) Date: Mon, 19 Mar 2001 20:21:06 +0100 Subject: Cluster and RAID 5 Array bottleneck.( I believe) In-Reply-To: <3AB0F6DF.9341A27E@grantgeo.com>; from leo.magallon@grantgeo.com on Thu, Mar 15, 2001 at 11:07:43AM -0600 References: <3AB0F6DF.9341A27E@grantgeo.com> Message-ID: <20010319202106.B31224@unthought.net> On Thu, Mar 15, 2001 at 11:07:43AM -0600, Leonardo Magallon wrote: > Hi all, > ... > > When we start our job here, the switch passes no more than 31mb/s at any moment. What do you get from a standard local Bonnie benchmark, or a dd if=hugefile of=/dev/null ? 
Before guessing whether the low performance is due to the network or the disks, you should try to benchmark what your disks can actually deliver... Do a netperf on the network as well, so you know what you can actually push thru your gig-e. > > A colleague of mine is saying that the problem is at the network level and I am > thinking that it is at the Array level because the lights on the array just keep > steadily on and the switch is not even at 25% utilization and attaching a > console to the array is mainly for setting up drives and not for monitoring. I would suspect the array too. But I'm interested in hearing what your testing shows :) -- ................................................................ : jakob at unthought.net : And I see the elder races, : :.........................: putrid forms of man : : Jakob ?stergaard : See him rise and claim the earth, : : OZ9ABN : his downfall is at hand. : :.........................:............{Konkhra}...............: From jimlux at jpl.nasa.gov Sun Mar 11 18:16:48 2001 From: jimlux at jpl.nasa.gov (Jim Lux) Date: Sun, 11 Mar 2001 18:16:48 -0800 Subject: Real Time References: <4.3.2.20010306132041.00b87390@pop.pssclabs.com> <20010306174447.A6421@wumpus> <001601c0a6a8$0b704ec0$04a8a8c0@office1> <3AABF6CF.40F120ED@lrz.uni-muenchen.de> Message-ID: <003b01c0aa9a$7f8238a0$04a8a8c0@office1> We've got a couple of papers in the works, one of which is being presented in a few hours later tonight at the IEEE 2001 Aerospace Conference in Big Sky, MT. We're building a breadboard demonstration of a "general purpose DSP" based orbiting scatterometer to be used to measure winds over the ocean. A scatterometer is basically a time domain reflectometer or radar that measures the backscatter (at Ku-band, 13.4 GHz) from the ocean's surface to an accuracy of better than 0.1 dB. I hesitate to call it a radar, because unlike a radar, we know where the target is, and how far away it is, so there's not much "detection" or "ranging" involved. More on the general scatterometry program at http://winds.jpl.nasa.gov/. The unique (and painful) thing about this is maintaining the accuracy of the backscatter cross section measurement. We're using a set of 3 space qualified hybrids from Astrium (formerly Matra Marconi Space) which integrate a ADSP21020 clone, memory, and peripherals into a space qualified rad tolerant package that is functionally much like a SHARC. The processors communicate by using a high speed (150 Mbps) serial link called SpaceWire (http://www.estec.esa.nl/tech/spacewire/) implemented in an ASIC developed by Dornier Satelliten Systeme which provides 3 ports (+ a port to the processor) as well as "wormhole" routing. The architecture is basically a master/slaves scheme, with a master DSP controlling the transmit functions and also assigning one of many receiver DSPs in a "round robin" sort of scheme to process the echoes (the processing time for one echo is somewhat greater than the interpulse interval). In this breadboard, all the timing is done by the DSP directly without using any "glue logic" in an external FPGA, including generating sampling clocks, transmitter and receiver gates, etc. We've implemented a message passing system to communicate between the processors with a C-language API based on MPI-like framework (i.e. Send_Message, Receive Message, etc.) (Actually, I edited the MPI include file to create the API definition..) 
Spacewire is fast and low latency, but, especially with wormhole routing, not deterministic in terms of timing, and furthermore, we need submicrosecond timing accuracy to meet the echo range resolution requirement. We have come up with a fairly simple (and clever, if I do say so myself) way to do this, which is the subject of another paper, currently wending its way through JPL's document review process so that we don't run afoul of ITAR. (It's not having to kill you, Eugene, that I worry about, it's me going to prison....). Any day now, we'll be able to release all the details. So.. it's not really a Beowulf (no OS, not really a commodity processor (nothing for space is commodity..)), but, certainly, the techniques we are using could be used, especially with the help of a bit of external timing hardware to make up for the non-deterministic timing of the processors in your basic PC. DSP processors are really nice that way... the same sequence of instructions will execute in the same length of time, every time. Certainly, getting timing down to sub milliseconds is a real possibility with PC hardware and Linux. ----- Original Message ----- From: To: "Jim Lux" Cc: "Greg Lindahl" ; Sent: Sunday, March 11, 2001 2:06 PM Subject: Re: Real Time > Jim Lux wrote: > > > > While not a beowulf, I am currently working on a very hard real time (<1 > > microsecond) system (a radar) using an MPI like interprocessor interface > > between DSPs. It is entirely possible to have hard real time systems with > > nondeterministic communications. > > Can you tell us more? (preferably, without having to kill us > afterwards, of course). > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > From whitney at math.berkeley.edu Sun Mar 11 12:51:26 2001 From: whitney at math.berkeley.edu (Wayne Whitney) Date: Sun, 11 Mar 2001 12:51:26 -0800 (PST) Subject: 8 DIMM Slot PIII/Athlon Motherboard ? In-Reply-To: <001e01c0aa19$99576600$5df31d97@W2KCECCHI1> Message-ID: On Sun, 11 Mar 2001, Gianluca Cecchi wrote: > Any pointer to where to buy for these proces true good memories? Well, I don't know if these prices will get you true good memory, I was just quoting the lowest www.pricewatch.com prices. So there are probably hidden costs, excessive shipping charges and so on. Cheers, Wayne From toon at moene.indiv.nluug.nl Wed Mar 14 12:09:11 2001 From: toon at moene.indiv.nluug.nl (Toon Moene) Date: Wed, 14 Mar 2001 21:09:11 +0100 Subject: Fortran 90 and BeoMPI References: <3AB07CAA.507E39BD@me.lsu.edu> <20010314102110.A22059@stikine.ucs.sfu.ca> Message-ID: <3AAFCFE7.7722DD68@moene.indiv.nluug.nl> Martin Siegert wrote: ... > > mpi_heat.o(.data+0x30): undefined reference to `mpi_null_delete_fn_' > > mpi_heat.o(.data+0x34): undefined reference to `mpi_dup_fn_' > The problem is g77 and libraries built to work with g77: > g77 has the "unfortunate" (to put it mildly) property to append two > underscores to a function name Well, it doesn't *have* to. System integrators could just as well build the MPI libraries with g77 [other-options] -fno-second-underscore ... to prevent this effect. Of course, *then* they have to tell g77 users of their systems to _also_ use this option when compiling stuff that needed MPI libs. 
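One quick way to see the effect is to look at the symbol names g77 actually emits; a minimal sketch follows (the test file is invented, and the nm lines only show the mangled names one would expect):

    bash$ cat utest.f
          call mpi_init(ierr)
          end
    bash$ g77 -c utest.f && nm utest.o | grep -i mpi_init
             U mpi_init__
    bash$ g77 -fno-second-underscore -c utest.f && nm utest.o | grep -i mpi_init
             U mpi_init_

The second form matches libraries that were themselves built with -fno-second-underscore, which is exactly why both sides have to agree on the option.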
-- Toon Moene - mailto:toon at moene.indiv.nluug.nl - phoneto: +31 346 214290 Saturnushof 14, 3738 XG Maartensdijk, The Netherlands Maintainer, GNU Fortran 77: http://gcc.gnu.org/onlinedocs/g77_news.html Join GNU Fortran 95: http://g95.sourceforge.net/ (under construction) From patrick at myri.com Wed Mar 14 19:12:48 2001 From: patrick at myri.com (Patrick Geoffray) Date: Wed, 14 Mar 2001 22:12:48 -0500 Subject: Huinalu Linux SuperCluster References: <200103150151.SAA31516@dogbert.mp.sandia.gov> Message-ID: <3AB03330.BBED66A2@myri.com> Hi Ron, Ron Brightwell wrote: > Actually, no it's not -- at least not for a cluster intended to support > parallel apps. The Siberia Cplant cluster at Sandia that is currently > #82 on the top 500 list has a peak theoretical performance of 580 GFLOPS. > It has demonstrated (with the MPLinpack benchmark) 247.6 GFLOPS. The latest > Cplant cluster, called Antarctica, has 1024+ 466 MHz Alpha nodes, with a > peak theoretical performance of more than 954 GFLOPS. The latest NCSA Linux cluster (Urbana-Champaign, IL) provides 512 dual PIII 1GHz nodes, so a theoretical peak of 1 TFLOPS: http://access.ncsa.uiuc.edu/Headlines/01Headlines/010116.IBM.html > Keep in mind that peak theoretical performance accurately measures your ability > to spend money, while MPLinpack performance accurately measures your ability > to seek pr -- I mean it measures the upper bound on compute performance from > a parallel app. Very true (actually, it measures the upper bound on compute performance of a dense linear algebra double precision computation, which indeed covers a large set of // apps. There are a lot of other codes that do not behave like LU, especially in the ratio of computation to communication). I don't know MPLinpack. Don't you mean HPLinpack? Regards. -- Patrick Geoffray --------------------------------------------------------------- | Myricom Inc | University of Tennessee - CS Dept | | 325 N Santa Anita Ave. | Suite 203, 1122 Volunteer Blvd. | | Arcadia, CA 91006 | Knoxville, TN 37996-3450 | | (626) 821-5555 | Tel/Fax : (865) 974-0482 | --------------------------------------------------------------- From bargle at umiacs.umd.edu Wed Mar 14 19:24:59 2001 From: bargle at umiacs.umd.edu (Gary Jackson) Date: Wed, 14 Mar 2001 22:24:59 -0500 Subject: Switch configuration for channel bonding Message-ID: <200103150324.WAA00808@leviathan.umiacs.umd.edu> What's an appropriate switch configuration for channel bonding? -- Gary Jackson bargle at umiacs.umd.edu From daniel.pfenniger at obs.unige.ch Thu Mar 15 06:18:34 2001 From: daniel.pfenniger at obs.unige.ch (Pfenniger Daniel) Date: Thu, 15 Mar 2001 15:18:34 +0100 Subject: Mysterious kernel hangs References: Message-ID: <3AB0CF3A.D15F8FF2@obs.unige.ch> Felix Rauch wrote: > > We recently bought a new 16 node cluster with dual 1 GHz PentiumIII > nodes, but machines mysteriously freeze :-( ... If this depends neither on the kernel nor on the communications, I would suggest a relation with temperature. Typically, on compute-intensive tasks the temperature can rise by a few degrees, and components such as the memory or the processor can stop working, without any error message! A further action might be to increase the node cooling.
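One way to check the temperature hypothesis is to log sensor readings while a compute-intensive job runs. This is only a sketch and assumes the lm_sensors package is installed and configured for the boards in question; the log path is arbitrary:

    # record temperatures and fan speeds once a minute during a burn-in run
    while true; do
        date
        sensors | grep -i -e temp -e fan
        sleep 60
    done >> /var/log/node-temps.log

If the freezes correlate with a rise of a few degrees in the log, better cooling is the obvious next step.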
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Dr Daniel Pfenniger | Daniel.Pfenniger at obs.unige.ch Geneva Observatory, University of Geneva | tel: +41 (22) 755 2611 CH-1290 Sauverny, Switzerland | fax: +41 (22) 755 3983 __________________________________________________________________________ From drh at niptron.com Thu Mar 15 07:08:06 2001 From: drh at niptron.com (D. R. Holsbeck) Date: Thu, 15 Mar 2001 09:08:06 -0600 Subject: Mysterious kernel hangs References: Message-ID: <3AB0DAD6.C3F07F48@niptron.com> Felix Rauch wrote: You might want to change the eepro module to the latest version. We have had some of the same issues. At first we tried the Intel version. Seemed to help, but performance wasnt so hot. We are currently testing the latest one from http://www.scyld.com/network/eepro100.html. And have seen good things so far. > > We recently bought a new 16 node cluster with dual 1 GHz PentiumIII > nodes, but machines mysteriously freeze :-( > > The nodes have STL2 boards (Version A28808-301), onboard adaptec SCSI > controllers (7899P), onboard intel Fast Ethernet adapters (82557 > [Ethernet Pro 100]) and additional Packet Engines Hamachi GNIC-II > Gigabit Ethernet cards. > > We tried kernels 2.2.x, 2.4.1 and now even 2.4.2-ac20, but it seems to > be the same problem with all kernels: When we run experiments which > use the network intensively, any of the machines will just freeze > after a few hours. The frozen machine does not respond to anything and > up to now we were not able to see any log-entries related to the > freeze on virtual console 10 :-( We switched now on all the "Kernel > Hacking" stuff in the kernel configuration (especially the logging) > and we will try again, hopefuly we will at least see some log outputs. > > The freezes do also happen if we let non-network-intensive jobs run on > the machines (e.g. SETI at home), but clearly they happen less often. > > Does anyone of you have any ideas what could go wrong or what we could > try to find the cause of the problems? > > Regards, > Felix > -- > Felix Rauch | Email: rauch at inf.ethz.ch > Institute for Computer Systems | Homepage: http://www.cs.inf.ethz.ch/~rauch/ > ETH Zentrum / RZ H18 | Phone: ++41 1 632 7489 > CH - 8092 Zuerich / Switzerland | Fax: ++41 1 632 1307 > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- drh at niptron.com "Necessity is the mother of taking chances." --Mark Twain. From lindahl at conservativecomputer.com Mon Mar 19 11:19:53 2001 From: lindahl at conservativecomputer.com (Greg Lindahl) Date: Mon, 19 Mar 2001 14:19:53 -0500 Subject: Cluster and RAID 5 Array bottleneck.( I believe) In-Reply-To: <3AB0F6DF.9341A27E@grantgeo.com>; from leo.magallon@grantgeo.com on Thu, Mar 15, 2001 at 11:07:43AM -0600 References: <3AB0F6DF.9341A27E@grantgeo.com> Message-ID: <20010319141953.A2210@wumpus.hpti.com> On Thu, Mar 15, 2001 at 11:07:43AM -0600, Leonardo Magallon wrote: > A colleague of mine is saying that the problem is at the network > level and I am thinking that it is at the Array level because the > lights on the array just keep steadily on and the switch is not even > at 25% utilization and attaching a console to the array is mainly > for setting up drives and not for monitoring. One way to approach this problem is to test the two independently. To test the array, you can dd a large file on the server. 
It needs to be big enough that you don't get any caching effects. Something like: dd if=/dev/zero of=/array/big.file bs=1024k count=2048 (2 gigabytes) time dd if=/array/big.file bs=1024k But, to simulate your client's workload better, you would want to start one smaller dd for each client. Reading N independent files is harder for an array than 1 big file. To test the network, create a small file that does fit into memory, and rcp it to all the nodes. That doesn't have the inefficiency of NFS (especially UDP) but it gives you a "speed of light" for the network: dd if=/dev/zero of=/tmp/delete.me bs=1024k count=50 (50 megabytes) rcp /tmp/delete.me client001:/tmp/delete.me Again you need one rcp per client, simultaneously. To test with NFS, read the same small file simultaneously from every client: rlogin client001 time cp /array/50.megabyte.file /tmp/delete.me -- greg From dvos12 at calvin.edu Mon Mar 19 12:13:34 2001 From: dvos12 at calvin.edu (David Vos) Date: Mon, 19 Mar 2001 15:13:34 -0500 (EST) Subject: 8 DIMM Slot PIII/Athlon Motherboard ? In-Reply-To: Message-ID: www.crucial.com Great prices right now. PC133 256MB ECC SDRAM for under $100. David On Sun, 11 Mar 2001, Wayne Whitney wrote: > On Sun, 11 Mar 2001, Gianluca Cecchi wrote: > > > Any pointer to where to buy for these proces true good memories? > > Well, I don't know if these prices will get you true good memory, I was > just quoting the lowest www.pricewatch.com prices. So there are probably > hidden costs, excessive shipping charges and so on. > > Cheers, Wayne > > > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > From Dean.Carpenter at pharma.com Mon Mar 19 12:46:22 2001 From: Dean.Carpenter at pharma.com (Carpenter, Dean) Date: Mon, 19 Mar 2001 15:46:22 -0500 Subject: DHCP - Channel Bonding? Message-ID: <759FC8B57540D311B14E00902727A0C002EC47A8@a1mbx01.pharma.com> Anyone shed any light on this ? We have some test nodes coming in shortly, and the motherboards have dual 10/100 nics on board - Intel EtherExpress Pros I believe. We'll be using a Cisco 3548 switch - what do we have to do to get these to use both nics into that switch ? -- Dean Carpenter deano at areyes.com dean.carpenter at pharma.com dean.carpenter at purduepharma.com 94TT :) -----Original Message----- From: Carpenter, Dean Sent: Wednesday, March 14, 2001 4:10 PM To: 'Scott Shealy'; 'beowulf at beowulf.org' Subject: RE: DHCP - Channel Bonding? I was wondering the same thing, or rather a similar thing. We're going to be testing some compute nodes that have dual 10/100 NICs onboard. It would be nice to be able to use both in a bonded setup via the standard Scyld beoboot method. I would assume that the stage 1 boot would use just one nic to start up, but the final stage 3 one would enslave the two eth0 and eth1 once they're up ? -- Dean Carpenter deano at areyes.com dean.carpenter at pharma.com dean.carpenter at purduepharma.com 94TT :) -----Original Message----- From: Scott Shealy [mailto:sshealy at asgnet.psc.sc.edu] Sent: Wednesday, March 14, 2001 3:25 PM To: 'beowulf at beowulf.org' Subject: DHCP - Channel Bonding? Anyone know if you can use channel bonding with DHCP.... 
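For reference, the node-side bonding setup of the 2.2/2.4 kernel era looks roughly like the sketch below; the address and interface names are only examples, the kernel needs the bonding driver and the ifenslave tool, and the switch ports typically also have to be grouped into a trunk/EtherChannel on the Cisco side. Whether bond0 can then be configured via DHCP rather than a static address is exactly the open question here.

    # on each node
    modprobe bonding
    ifconfig bond0 192.168.1.10 netmask 255.255.255.0 up
    ifenslave bond0 eth0
    ifenslave bond0 eth1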
Thanks, Scott Shealy From Eugene.Leitl at lrz.uni-muenchen.de Mon Mar 19 13:09:09 2001 From: Eugene.Leitl at lrz.uni-muenchen.de (Eugene.Leitl at lrz.uni-muenchen.de) Date: Mon, 19 Mar 2001 22:09:09 +0100 Subject: using graphics cards as generic FLOP crunchers References: <200103161421.JAA13685@tamarack.cs.mtu.edu> <3AB23355.F9EA8203@redhat.com> Message-ID: <3AB67575.82A850B6@lrz.uni-muenchen.de> 'Bryce' wrote: > Bx notes the beowulf geeks are getting seriously freaky, they're searching for a way to use the GFU's on high > end video cards to contribute to the processing power of the main FPU This is not the first time the topic came up. Last time we decided it wasn't worthwhile, iirc. > wx, er, it is quite insane. they can do the calculations but there's no way to get the results out. It depends on the amount of calculations vs. result data to be moved. Clearly, 64 MBytes VRAM are usable for ANNs, which do require extensive matrix multiplications (at the least the canonical kind), whereas the traffic from and to the input and output layer is relatively negligable. > kx: Suddenly you get something that looks suspiciously like a vector multiply. GeForce3 does do vector addition and vector multiply, amongst other things. > sx, still, I suspect that using a faster CPU will be easier and cheaper The problem is the overhead of dealing with the nooks and crannies of horribly misused 3d accelerators. Also, the hardware is getting stale awfully quickly, so your investments would seem to have a very short half life time. From lindahl at conservativecomputer.com Mon Mar 19 13:28:52 2001 From: lindahl at conservativecomputer.com (Greg Lindahl) Date: Mon, 19 Mar 2001 16:28:52 -0500 Subject: using graphics cards as generic FLOP crunchers In-Reply-To: <3AB67575.82A850B6@lrz.uni-muenchen.de>; from Eugene.Leitl@lrz.uni-muenchen.de on Mon, Mar 19, 2001 at 10:09:09PM +0100 References: <200103161421.JAA13685@tamarack.cs.mtu.edu> <3AB23355.F9EA8203@redhat.com> <3AB67575.82A850B6@lrz.uni-muenchen.de> Message-ID: <20010319162851.A2928@wumpus.hpti.com> On Mon, Mar 19, 2001 at 10:09:09PM +0100, Eugene.Leitl at lrz.uni-muenchen.de wrote: > Also, the hardware is getting stale awfully quickly, > so your investments would seem to have a very short half life time. That's what the VSIP standard is for -- among other things, it's fairly good for abstracting away the difficulties of using attached array processors. It's pretty big, but if you wanted to go down this path, it would be good to write your software to use it. -- g From lowther at att.net Mon Mar 19 14:15:05 2001 From: lowther at att.net (lowther at att.net) Date: Mon, 19 Mar 2001 17:15:05 -0500 Subject: AMD annoyed... References: Message-ID: <3AB684E9.4AC6349C@att.net> "Robert G. Brown" wrote: > I agree that AMD has no legal > recourse against me if I choose to publish at this point, but they might > well choose to punish the company that gave me the account. > Exactly right. This is not a bridge the Beowulf community wants to burn. If they said this was a ready to ship production chip set I would view this totally differently. It is encourageing that they are asking for input, even if it is a bit late. 
Ken From rauch at inf.ethz.ch Mon Mar 19 15:10:02 2001 From: rauch at inf.ethz.ch (Felix Rauch) Date: Tue, 20 Mar 2001 00:10:02 +0100 (CET) Subject: Cluster and RAID 5 Array bottleneck.( I believe) In-Reply-To: <3AB0F6DF.9341A27E@grantgeo.com> Message-ID: On Thu, 15 Mar 2001, Leonardo Magallon wrote: > When we start our job here, the switch passes no more than 31mb/s at > any moment. It could very well be the case that this number is the maximum that the NFS server can deliver. We had a diploma thesis here in 1999 [1] which examined the performance of NFS on Linux with various parameters such as CPU speed, network, etc. At that time, a 400 MHz PII as NFS server could not deliver more than 25 MB/s from the cache to an NFS client over Gigabit Ethernet (for large single files). And that was only possible with the optimal configuration for short times. Sustained bandwith was lower. I don't know the details of your server, but 35 MB/s seems likely to be the maximum that the NFS servers memory system and CPU can deliver. Did you try TCP measurements to see what the network can deliver? TCP performance should be higher than NFS performance. - Felix [1] http://www.cs.inf.ethz.ch/stricker/sada/archive/isele.[html|pdf] -- Felix Rauch | Email: rauch at inf.ethz.ch Institute for Computer Systems | Homepage: http://www.cs.inf.ethz.ch/~rauch/ ETH Zentrum / RZ H18 | Phone: ++41 1 632 7489 CH - 8092 Zuerich / Switzerland | Fax: ++41 1 632 1307 From rgb at phy.duke.edu Mon Mar 19 16:25:51 2001 From: rgb at phy.duke.edu (Robert G. Brown) Date: Mon, 19 Mar 2001 19:25:51 -0500 (EST) Subject: Auto-responder In-Reply-To: Message-ID: On Tue, 13 Mar 2001, Sam Lewis wrote: > > My apologies for the "auto-responder hell" that my mail server created in > the past few days. My IS department has rectified the problem, so that this > will not happen again. To be on the safe side, I've unsubscribed for the > foreseeable future. I personally wouldn't recommend that you unsubscribe -- just convert to e.g. digest mode and be careful about the use of "vacation". With mailman, you can also literally turn the list on and off like a spigot. That way you can remain subscribed all the time, but turn off the list before you leave and turn it on again when you get back. You can catch up at your leisure by browsing the time-ordered list archives for the time you were gone -- you needn't miss anything if you don't want to. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu From yoon at bh.kyungpook.ac.kr Mon Mar 19 17:01:21 2001 From: yoon at bh.kyungpook.ac.kr (Yoon Jae Ho) Date: Tue, 20 Mar 2001 10:01:21 +0900 Subject: Will you add my benchmark for your cluster.top500.org database ? References: <200103161233.GAA23764@hw.top500.org> <0103160954380L.30623@avicenna> Message-ID: <002401c0b0d9$4a9bf320$5f72f2cb@TEST> Dear Anas Nashif. Thank you for you kind response about your thinking about clusters.top500.org . But, I think there are many things to consider for you to run the clusters.top500.org. First of all, What do you think about the "Beowulf Philosophy" ? and The idea what I suggested first (Feb 1st 2000) in the http://www.beowulf.org/pipermail/beowulf/2000-February/008221.html and many excellent ideas (including suggested by Professor Robert G. Brown ) added during discussion is only rejected by your future guidelines to be followed for the name of competition ? 
What is the incentive for many people to make a beowulf or enlist it in your clusters.top500.org or contribute to the Linux kernel ? You may be worry it will be end up in chaos if everybody enlist the 2 node (home-)clusters in your clusters.top500.org. But we live in the Chaos society which has attraction & nature order but it looks like disorder. I think the new idea & progress for mankind comes from the very small idea which we would like to skip or omit it. If you run cluster database, please don't make the restrictive guideline for enlisting your DB. The great idea comes from very small idea or from the beowulf example, it may comes from 2 node beowulf. So please don't judge the rule for enlisting your DB using University or Company name but the very small people & their very small idea about beowulf & their endeavor. Thank you very much. I will wait your response or your clusters.top500.org guideline. P.S : Why www.beowulf.org people don't make the so called www.top500-beowulf.org DB using very different benchmark and enlist it in the www.beowulf.org or www.beowulf-underground.org directly instead of linking clusters.top500.org ? I could make it about 1 years ago. but I think it will be better to make it in the United States instead of Korea. and now clusters.top500.org is established. but some future clusters.top500.org guideline prevent my (enlist-requested) KoreaBeowulf-24 node for Economic & Business Application to enlist in clusters.top500.org ? Until now, I never think about so many restriction in the beowulf society, but now I see that inner computer society have many restrictive idea that has opposite view of Open Source or Open Society. Will you run your clusters.top500.org for more unrestrictively ? Thank you for my small & humble suggestion . ------------------------------------------------------------------------- Yoon Jae Ho Economist POSCO Research Institute yoon at bh.kyungpook.ac.kr jhyoon at mail.posri.re.kr http://ie.korea.ac.kr/~supercom/ Korea Beowulf Supercomputer news://203.242.114.96/koreabeowulf BBS Imagination is more important than knowledge. A. Einstein ------------------------------------------------------------------------- ----- Original Message ----- From: Anas Nashif To: Yoon Jae Ho Sent: Friday, March 16, 2001 11:54 PM Subject: Re: Will you increase the number cluster sublist showing from 100 to 1000 ? On March 16, 2001 07:33 am, you wrote: > Now The cluster database can show only 100 clusters per search. No, it can show more. But there are nomore than 100 clusters in the DB right now. > Will you increase the number to 1000 instead of 100 ? No, you can enter the desired number in the form. > Because in the world, there are so many beowulf clusters. and They will > enlist their beowulf in this DB. I hope so. > I think in the Cluster Database, there must have at least 1000 results per > query. Nobody will want to have 1000 results per query. Thats why we call it a sub-list generator, wehre you can crete a list clusters of your interest. > because the purpose for me to propose so called topcluster DB (It was Feb > 1st, 2000 using the beowulf mailing list > http://www.beowulf.org/pipermail/beowulf/2000-February/008221.html) is not > for the competition like top500 list but their usefulness & flexibilities. Well, we have defined another purpose. Several things might be just like you expect but it makes no sense for us to enlist everybodys 2 node (home-)clusters. 
To be in the list and to be ranked, several guidelines must be followed otherwise this will end up in chaos. Anas > Thank you > http://ie.korea.ac.kr/~supercom/ From mathboy at velocet.ca Mon Mar 19 17:11:11 2001 From: mathboy at velocet.ca (Velocet) Date: Mon, 19 Mar 2001 20:11:11 -0500 Subject: Cluster and RAID 5 Array bottleneck.( I believe) In-Reply-To: ; from rauch@inf.ethz.ch on Tue, Mar 20, 2001 at 12:10:02AM +0100 References: <3AB0F6DF.9341A27E@grantgeo.com> Message-ID: <20010319201111.B27759@velocet.ca> On Tue, Mar 20, 2001 at 12:10:02AM +0100, Felix Rauch's all... > On Thu, 15 Mar 2001, Leonardo Magallon wrote: > > When we start our job here, the switch passes no more than 31mb/s at > > any moment. > > It could very well be the case that this number is the maximum that > the NFS server can deliver. We had a diploma thesis here in 1999 [1] > which examined the performance of NFS on Linux with various parameters > such as CPU speed, network, etc. > > At that time, a 400 MHz PII as NFS server could not deliver more than > 25 MB/s from the cache to an NFS client over Gigabit Ethernet (for > large single files). And that was only possible with the optimal > configuration for short times. Sustained bandwith was lower. > > I don't know the details of your server, but 35 MB/s seems likely to > be the maximum that the NFS servers memory system and CPU can > deliver. > > Did you try TCP measurements to see what the network can deliver? TCP > performance should be higher than NFS performance. NFSv3 should show a DRASTIC improvement on these stats, though I dont have any equipment to test this (only GbE could do this as FastEther (100Mbps) is a theoretical max of 12.5Mbps, which NFSv3 does with ease in our operations everyday). You might also want to check out the nbd (network block device) in linux 2.4 - mounting a filesystem on an nbd is supposedly even faster than nfs. Perhaps I can get Ben LaHaise of redhat to comment here.. (bcc'd in case he doenst want his address known). /kc > - Felix > > [1] http://www.cs.inf.ethz.ch/stricker/sada/archive/isele.[html|pdf] > -- > Felix Rauch | Email: rauch at inf.ethz.ch > Institute for Computer Systems | Homepage: http://www.cs.inf.ethz.ch/~rauch/ > ETH Zentrum / RZ H18 | Phone: ++41 1 632 7489 > CH - 8092 Zuerich / Switzerland | Fax: ++41 1 632 1307 > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Ken Chase, math at velocet.ca * Velocet Communications Inc. * Toronto, CANADA From matz at wsunix.wsu.edu Mon Mar 19 17:28:55 2001 From: matz at wsunix.wsu.edu (Phillip D. Matz) Date: Mon, 19 Mar 2001 17:28:55 -0800 Subject: Netgear Fast Ethernet Cards Message-ID: <000c01c0b0dd$21411bf0$b4297986@chem.wsu.edu> Hi, If I have the choice of building a Beowulf with the Netgear FA310TX, FA311 or FA312 fast ethernet cards which one should I choose? Reasons why I should select one over the other? They are all priced about the same on insight.com, and I like the 310TX - but should I be going with the other models for any reason? Thanks for reading this. Respectfully, Phillip Matz matz at wsunix.wsu.edu -------------- next part -------------- An HTML attachment was scrubbed... URL: From natorro at fenix.ifisicacu.unam.mx Mon Mar 19 17:40:22 2001 From: natorro at fenix.ifisicacu.unam.mx (Carlos Lopez) Date: Mon, 19 Mar 2001 19:40:22 -0600 Subject: Trying to install scyld on an alpha cluster. 
Message-ID: <3AB6B506.A0BE31F1@fenix.ifisicacu.unam.mx> Hi evrybody, I finally decided to install scyld on the alpha's cluster, I downloaded the src.rpm's at www.scyld.com and tried to recompile them, I failed miserably, I get to compile te kernel, and some other tools, but the most important package seems to be the bproc package (in fact it is what makes a beowulf an scyld beowulf) but I can't compile it, it says something about the kernel headers, and I did install the kernel-headers package, I have no idea what to do now, and the cluster is not working by now... any help will be greatly appreciated. Greetings natorro From w.l.kleb at larc.nasa.gov Mon Mar 19 11:42:14 2001 From: w.l.kleb at larc.nasa.gov (william l kleb) Date: Mon, 19 Mar 2001 14:42:14 -0500 Subject: dual PIII 133MHz FSB motherboards References: Message-ID: <3AB66116.5A688687@larc.nasa.gov> Bill Comisky wrote: > > Also, if anyone has rack mounted any of these, what rackmount case height > did they fit in? fwiw, we have a prototype cluster running redhat 7.0+ with a master node in a http://www.pcpowercooling.com/ mid-tower using a supermicro 370DL3 m/b and four http://www.evserv.com/ 1U rackmount cases (available in the us through http://www.initiative-tech.com/) housing tyan 2510NG m/b's. the evserv cases are much nicer than others we evaluated but do not go to the extreme of the supermicro server case with hot swap, etc. up to 8 1U's can be hung vertically in a custom computer cart build from http://www.8020.net/ extruded aluminum "tinker toys" with a 19" x 14" footprint including a shelf at the bottom for the switches (fastether, kvm, and power). so far (a couple months) we've been happy... -- bil From newt at scyld.com Mon Mar 19 20:15:14 2001 From: newt at scyld.com (Daniel Ridge) Date: Mon, 19 Mar 2001 23:15:14 -0500 (EST) Subject: Trying to install scyld on an alpha cluster. In-Reply-To: <3AB6B506.A0BE31F1@fenix.ifisicacu.unam.mx> Message-ID: On Mon, 19 Mar 2001, Carlos Lopez wrote: > Hi evrybody, I finally decided to install scyld on the alpha's cluster, > I downloaded the src.rpm's at www.scyld.com and tried to recompile them, > I failed miserably, I get to compile te kernel, and some other tools, > but the most important package seems to be the bproc package (in fact it > is what makes a beowulf an scyld beowulf) but I can't compile it, it > says something about the kernel headers, and I did install the > kernel-headers > package, I have no idea what to do now, and the cluster is not working > by now... any help will be greatly appreciated. We know for certian that BProc builds just fine on the Alpha -- that's how we build it for our customers. My guess is that the 'kernel headers' RPM you installed is not from a kernel RPM that includes the BProc patch. No problem -- just apply the BProc patch from the SRC rpm to the kernel you've installed. This should allow you to compile and build BProc for Alpha. You'll need to also rebuild your kernel with BProc turned on in order to use this feature. This is one reason that people often prefer to get a distribution from us. See the Scyld Partners page at: http://www.scyld.com/page/vendors for a list of vendors that provide Scyld preinstalled on systems. 
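In outline, the patch-and-rebuild step being described might look like the following. The patch file name and the -p level are guesses (check the spec file inside the SRPM), and on Alpha the final kernel image target differs from the x86 bzImage:

    # unpack the source RPM so its patches land under /usr/src/redhat/SOURCES
    rpm -ivh kernel-2.2.17-33.beo.src.rpm
    ls /usr/src/redhat/SOURCES | grep -i bproc     # locate the BProc patch

    # apply it to the kernel tree you are actually building from
    cd /usr/src/linux
    patch -p1 < /usr/src/redhat/SOURCES/bproc.patch   # illustrative file name

    # enable the BProc option, then rebuild the kernel and modules
    make menuconfig
    make dep && make && make modules && make modules_install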
Regards (and good luck), Dan Ridge Scyld Computing Corporation From jakob at unthought.net Mon Mar 19 20:32:21 2001 From: jakob at unthought.net (=?iso-8859-1?Q?Jakob_=D8stergaard?=) Date: Tue, 20 Mar 2001 05:32:21 +0100 Subject: Cluster and RAID 5 Array bottleneck.( I believe) In-Reply-To: <20010319201111.B27759@velocet.ca>; from mathboy@velocet.ca on Mon, Mar 19, 2001 at 08:11:11PM -0500 References: <3AB0F6DF.9341A27E@grantgeo.com> <20010319201111.B27759@velocet.ca> Message-ID: <20010320053221.D31501@unthought.net> On Mon, Mar 19, 2001 at 08:11:11PM -0500, Velocet wrote: ... > NFSv3 should show a DRASTIC improvement on these stats, though I dont > have any equipment to test this (only GbE could do this as FastEther (100Mbps) > is a theoretical max of 12.5Mbps, which NFSv3 does with ease in our operations > everyday). > > You might also want to check out the nbd (network block device) in linux > 2.4 - mounting a filesystem on an nbd is supposedly even faster than > nfs. Perhaps I can get Ben LaHaise of redhat to comment here.. (bcc'd > in case he doenst want his address known). Two clients can't mount the same FS over nbd though... The clients will all have meta-data caches, and they will be inconsistent even though the block data device is the same on the server. -- ................................................................ : jakob at unthought.net : And I see the elder races, : :.........................: putrid forms of man : : Jakob ?stergaard : See him rise and claim the earth, : : OZ9ABN : his downfall is at hand. : :.........................:............{Konkhra}...............: From davidge at 1stchina.com Tue Mar 20 00:27:27 2001 From: davidge at 1stchina.com (David) Date: Tue, 20 Mar 2001 16:27:27 +0800 Subject: problem when booting the salve machine from floppy disk Message-ID: <200103200833.DAA10662@blueraja.scyld.com> Hello everyone, I've instaled Scyld Beowulf front-end machine & salve machine but there are some failed notice in the salve machine log and It seemed that the salve machine can not share in the fond-end machine's work. below is the salve machine's log ----- node_up: Setting system clock. node_up: TODO set interface netmask. node_up: Configuring loopback interface. setup_fs: Configuring node filesystems... setup_fs: Using /etc/beowulf/fstab setup_fs: Checking /dev/ram3 (type=ext2)... setup_fs: Hmmm...This appears to be a ramdisk. setup_fs: I'm going to try to try checking the filesystem (fsck) anyway. setup_fs: If it is a RAM disk the following will fail harmlessly. e2fsck 1.18, 11-Nov-1999 for EXT2 FS 0.5b, 95/08/09 ext2fs_check_if_mount: No such file or directory while determining whether /dev/ram3 is mounted. Couldn't find ext2 superblock, trying backup blocks... The superblock could not be read or does not describe a correct ext2 filesystem. If the device is valid and it really contains an ext2 filesystem (and not swap or ufs or something else), then the superblock is corrupt, and you might try running e2fsck with an alternate superblock: e2fsck -b 8193 e2fsck: Bad magic number in super-block while trying to open /dev/ram3 setup_fs: FSCK failure. (OK for RAM disks) setup_fs: Creating ext2 on /dev/ram3... mke2fs 1.18, 11-Nov-1999 for EXT2 FS 0.5b, 95/08/09 ext2fs_check_if_mount: No such file or directory while determining whether /dev/ram3 is mounted. setup_fs: Mounting /dev/ram3 on /rootfs//... (type=ext2; options=defaults) grep: BProc move failed. modprobe: Can't locate module ext2 bpsh: Child process exit abnormally. Failed to mount /dev/ram3 on /. 
---------------------- Anyone can tell me what make this occurred and how to solve this question? Thanks. Sincerely yours, David Ge Room 604, No. 168, Qinzhou Road, Shanghai. Phone: (021)34140621-12 2001-03-20 16:06:38 From bremner at unb.ca Tue Mar 20 07:42:22 2001 From: bremner at unb.ca (David Bremner) Date: Tue, 20 Mar 2001 11:42:22 -0400 (AST) Subject: Huinalu Linux SuperCluster In-Reply-To: <3AB0FD6C.55CB9BDC@myri.com> References: <3AB0FD6C.55CB9BDC@myri.com> Message-ID: <15031.31326.249833.53437@convex.cs.unb.ca> Patrick Geoffray writes: > Mark Lucas wrote: > > > at http://www.mhpcc.edu/doc/huinalu/huinalu-intro.html > > > > Does anyone have any specifics on the hardware cost of this system? > > Is IBM selling configured Beowulf clusters? > > It's the third large IBM Linux cluster with Myrinet after New Mexico and > NCSA at Urbana-Champaign, IL. IBM seems to be very comfortable with this > type of product (evolution of the SP market). However, I do not think > they are selling small scale Beowulf cluster. > Ask your friendly IBM rep. They are willing to quote quite small configurations where I live. All the best, David From Dean.Carpenter at pharma.com Tue Mar 20 08:12:26 2001 From: Dean.Carpenter at pharma.com (Carpenter, Dean) Date: Tue, 20 Mar 2001 11:12:26 -0500 Subject: PXE booting Message-ID: <759FC8B57540D311B14E00902727A0C002EC47B2@a1mbx01.pharma.com> Anyone out there using PXE to boot their nodes for a Scyld Beowulf ? What did you do for the configuration ? -- Dean Carpenter deano at areyes.com dean.carpenter at pharma.com dean.carpenter at purduepharma.com 94TT :) From newt at scyld.com Tue Mar 20 10:00:28 2001 From: newt at scyld.com (Daniel Ridge) Date: Tue, 20 Mar 2001 13:00:28 -0500 (EST) Subject: PXE booting In-Reply-To: <759FC8B57540D311B14E00902727A0C002EC47B2@a1mbx01.pharma.com> Message-ID: On Tue, 20 Mar 2001, Carpenter, Dean wrote: > Anyone out there using PXE to boot their nodes for a Scyld Beowulf ? What > did you do for the configuration ? We've done PXE booting using Peter Anvin's SYSLINUX for PXE as the bootstrapper. If you run 'beoboot -2 -i -o /tmp/foofoo' or similar, the beoboot tool will spit out separate kernel and initrd images which you can then supply to SYSLINUX. Naturally, several other arragements are possible with PXE. Regards, Dan Ridge Scyld Computing Corporation From natorro at fenix.ifisicacu.unam.mx Tue Mar 20 10:15:25 2001 From: natorro at fenix.ifisicacu.unam.mx (Carlos Lopez) Date: Tue, 20 Mar 2001 12:15:25 -0600 Subject: Trying to install scyld on an alpha cluster. References: Message-ID: <3AB79E3D.6DC93CED@fenix.ifisicacu.unam.mx> Daniel Ridge wrote: > We know for certian that BProc builds just fine on the Alpha -- that's how > we build it for our customers. Cool, that means I certainly can make it :-) > My guess is that the 'kernel headers' RPM you installed is not from a > kernel RPM that includes the BProc patch. No problem -- just apply the > BProc patch from the SRC rpm to the kernel you've installed. 
I compiled the kernel-2.2.17-33.beo.src.rpm with 'rpm --rebuild kernel-2.2.17-33.beo.src.rpm' and it compiled just fine. It was running a 2.2.16-22 kernel (the one that comes with Red Hat 6.2) and I uninstalled every package related to the old kernel, including kernel-headers, kernel-source, kernel-smp and kernel. I installed all the binaries generated by my compilation of kernel-2.2.17-33.beo.src.rpm except kernel (the normal kernel; I installed kernel-smp instead) and kernel-BOOT, so I rebooted, and finally it worked. Then I tried to recompile the bproc rpm; it didn't work, it says the same thing I told you before. It seems like some headers are missing. I did install them from the compilation of the kernel-2.2.17-33.beo.src.rpm but I don't know what I did wrong... You say I just have to apply the BProc patch from the SRC rpm to the kernel I installed; how do I do this? I'm very confused, because when I do the 'rpm --rebuild kernel-2.2.17-33.beo.src.rpm' thing, it answers all the questions when it does 'make config' (or at least that's what I think it's doing) and generates the binary rpm's. Well, in a few words: how do I apply the bproc patch to the kernel I've installed? I was thinking the compilation takes place in /usr/src/linux but it seems to be doing so in /usr/src/redhat/BUILD/linux or something like that. > This should allow you to compile and build BProc for Alpha. You'll need to > also rebuild your kernel with BProc turned on in order to use this > feature. I don't know how to do this; I mean, from the SRC rpm's, how do I turn the bproc on? > This is one reason that people often prefer to get a distribution from > us. See the Scyld Partners page at: > > http://www.scyld.com/page/vendors > > for a list of vendors that provide Scyld preinstalled on systems. Of course :-) I've seen that already, but the problem is that we have already bought a cluster. It came from Microway, and came with a terrible implementation of a beowulf, no NFS and no xntpd, so the distribution of binaries is a pain in the ass and the node clocks were losing sync all the time... That's why I decided to change to Scyld: I've already installed it on a PC cluster, and it runs beautifully, the administration is really easy and it works a lot better. I sent an email asking for help to Microway but no answer from them. I'm really sorry to bother you with questions that are supposed to come from newbies, but as a matter of fact I'm the 'expert' here and I haven't been able to make this Alpha cluster work. Thanks a lot for any more help you could give me. Greetings and again thank you. natorro
From davidge at 1stchina.com Tue Mar 20 18:34:54 2001 From: davidge at 1stchina.com (David) Date: Wed, 21 Mar 2001 10:34:54 +0800 Subject: how to let salve machine share the front-end machine's load Message-ID: <200103210241.VAA20336@blueraja.scyld.com> Hello everyone, I've installed a Scyld Beowulf front-end machine and a slave machine. Now I have a question: how do I make the slave machine share the front-end machine's load? Do I need to install all the software (such as MySQL, Apache) on the slave machine again, or simply mount the /usr directory from the front-end machine, or do nothing (it seems this does not work), or use Scyld's library to write some special program? Thanks. Sincerely yours, David Ge Room 604, No. 168, Qinzhou Road, Shanghai. Phone: (021)34140621-12 2001-03-20 16:06:38
From wyy at cersa.admu.edu.ph Tue Mar 20 18:53:04 2001 From: wyy at cersa.admu.edu.ph (Horatio B. Bogbindero) Date: Wed, 21 Mar 2001 10:53:04 +0800 (PHT) Subject: PXE booting In-Reply-To: Message-ID: > > > Anyone out there using PXE to boot their nodes for a Scyld Beowulf ? What > > did you do for the configuration ? > > We've done PXE booting using Peter Anvin's SYSLINUX for PXE as the > bootstrapper. If you run 'beoboot -2 -i -o /tmp/foofoo' or similar, > the beoboot tool will spit out separate kernel and initrd images which you > can then supply to SYSLINUX. > > Naturally, several other arragements are possible with PXE. > How about an arrangement with a PXE daemon, tftp and dhcp? I was able to get a prototype running a while back based on Intel's documentation for Red Hat. However, I stopped for a while because of a problem... -------------------------------------- William Emmanuel S. Yu Ateneo Cervini-Eliazo Networks (ACENT) email : william.s.yu at ieee.org web : http://cersa.admu.edu.ph/ phone : 63(2)4266001-5925/5904 A citizen of America will cross the ocean to fight for democracy, but won't cross the street to vote in a national election. -- Bill Vaughan
From agrajag at linuxpower.org Tue Mar 20 20:00:39 2001 From: agrajag at linuxpower.org (Jag) Date: Tue, 20 Mar 2001 20:00:39 -0800 Subject: how to let salve machine share the front-end machine's load In-Reply-To: <200103210241.VAA20336@blueraja.scyld.com>; from davidge@1stchina.com on Wed, Mar 21, 2001 at 10:34:54AM +0800 References: <200103210241.VAA20336@blueraja.scyld.com> Message-ID: <20010320200039.H13901@kotako.analogself.com> On Wed, 21 Mar 2001, David wrote: > Hello everyone, > > I've installed a Scyld Beowulf front-end machine and a slave machine. > > Now I have a question: how do I make the slave machine share the > front-end machine's load? Do I need to install all the software > (such as MySQL, Apache) on the slave machine again, or simply mount > the /usr directory from the front-end machine, or do nothing > (it seems this does not work), or use Scyld's library to write > some special program? What exactly are you trying to do? It sounds like you're trying to do server load balancing. This isn't what a Beowulf cluster is designed to do. Beowulf clusters are designed to run self-contained compute jobs that are coded in such a way that different parts of the job can be run at the same time without compromising the results. A load-balancing cluster is designed to take requests from remote users and assign the new requests to whichever system in the cluster has the lowest load. While these tasks seem to be somewhat related, the way the systems have to be set up is completely different, and a Scyld system just isn't designed to handle load-balancing server tasks. Jag
From carlos at nernet.unex.es Wed Mar 21 07:35:20 2001 From: carlos at nernet.unex.es (Carlos J. García Orellana) Date: Wed, 21 Mar 2001 16:35:20 +0100 Subject: Problem with BeoMPI Message-ID: <004901c0b21c$8a82a4e0$7c12319e@unex.es> Hello, I want to use BeoMPI in our Scyld Beowulf cluster. I have compiled the example pi3p.f, and using one processor it works; however, with two or more processors it fails with this error message: p0_19537: p4_error: net_create_slave: host not a bproc node: -3 p4_error: latest msg from perror: Success I'm using the latest Scyld Beowulf software. Please, could I have some help? Thanks. Carlos J. García Orellana Dpto. Electronica.
Universidad de Extremadura - SPAIN From agrajag at linuxpower.org Wed Mar 21 07:49:25 2001 From: agrajag at linuxpower.org (Jag) Date: Wed, 21 Mar 2001 07:49:25 -0800 Subject: Problem with BeoMPI In-Reply-To: <004901c0b21c$8a82a4e0$7c12319e@unex.es>; from carlos@nernet.unex.es on Wed, Mar 21, 2001 at 04:35:20PM +0100 References: <004901c0b21c$8a82a4e0$7c12319e@unex.es> Message-ID: <20010321074925.I13901@kotako.analogself.com> On Wed, 21 Mar 2001, Carlos J. Garc?a Orellana wrote: > Hello, > > I want to use BeoMPI in our Scyld beowulf cluster. I have compiled the > example pi3p.f, and using one processor it works, however with two or more > the problem fail with this error message: > > p0_19537: p4_error: net_create_slave: host not a bproc node: -3 > p4_error: latest msg from perror: Success That seems strange.. what does running the command 'bpstat' return? Also, what happens when you run the command 'bpsh 0 echo foo'? Jag -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 232 bytes Desc: not available URL: From carlos at nernet.unex.es Wed Mar 21 08:09:10 2001 From: carlos at nernet.unex.es (=?iso-8859-1?Q?Carlos_J._Garc=EDa_Orellana?=) Date: Wed, 21 Mar 2001 17:09:10 +0100 Subject: Problem with BeoMPI References: <004901c0b21c$8a82a4e0$7c12319e@unex.es> <20010321074925.I13901@kotako.analogself.com> Message-ID: <005901c0b221$43f3f420$7c12319e@unex.es> The results of these commands: -> bpstat bash$ bpstat Node Address Status User Group 0 192.168.1.100 up any any 1 192.168.1.101 up any any 2 192.168.1.102 up any any 3 192.168.1.103 up any any 4 192.168.1.104 up any any 5 192.168.1.105 up any any 6 192.168.1.106 up any any 7 192.168.1.107 up any any 8 192.168.1.108 up any any 9 192.168.1.109 up any any 10 192.168.1.110 up any any 11 192.168.1.111 up any any 12 192.168.1.112 up any any 13 192.168.1.113 up any any 14 192.168.1.114 up any any 15 192.168.1.115 up any any 16 192.168.1.116 up any any 17 192.168.1.117 up any any 18 192.168.1.118 up any any 19 192.168.1.119 up any any 20 192.168.1.120 up any any 21 192.168.1.121 up any any 22 192.168.1.122 up any any 23 192.168.1.123 up any any 24 192.168.1.124 down any any 25 192.168.1.125 up any any 26 192.168.1.126 down any any 27 192.168.1.127 down any any 28 192.168.1.128 down any any 29 192.168.1.129 down any any 30 192.168.1.130 down any any -> bpsh 0 echo foo bash$ bpsh 0 echo foo foo bash$ ----- Original Message ----- From: "Jag" To: "Carlos J. Garc?a Orellana" Cc: Sent: Wednesday, March 21, 2001 4:49 PM Subject: Re: Problem with BeoMPI From RSchilling at affiliatedhealth.org Wed Mar 21 10:44:49 2001 From: RSchilling at affiliatedhealth.org (Schilling, Richard) Date: Wed, 21 Mar 2001 10:44:49 -0800 Subject: how to let salve machine share the front-end machine's load Message-ID: <51FCCCF0C130D211BE550008C724149EBE1124@mail1.affiliatedhealth.org> I do something similar. I mount via NFS a directory on the slave machine. That directory containes executables common to nodes of a particular hardware type. I also have my CVS repository on a slave machine. I mount those directories under /usr somewhere: /usr/cvsroot - NFS mount to the node with the CVS repository /usr_shared - NFS mount to the node with executables and other "usr" type files I can share among nodes. notice that the head node still has it's own /usr partition. Mounting /usr with NFS can be done, but you can't mount /usr locally and via NFS at the same time, of course. 
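As a concrete illustration of the layout described above, an /etc/fstab on a client node might contain entries like the following. The host name "node1" and the export paths are placeholders, not taken from the message, and the mount options are just common defaults.

  node1:/export/cvsroot      /usr/cvsroot   nfs   rw,hard,intr   0 0
  node1:/export/usr_shared   /usr_shared    nfs   rw,hard,intr   0 0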
Mounting the CVS repository with NFS works really well. You might also try downloading the clusterit toolkit - it allows you to spawn simple jobs across network nodes. Really handy when you have to grep files on several nodes or check disk space or something like that. Richard Schilling Web Integration Programmer/Webmaster phone: 360.856.7129 fax: 360.856.7166 URL: http://www.affiliatedhealth.org Affiliated Health Services Information Systems 1971 Highway 20 Mount Vernon, WA USA > -----Original Message----- > From: David [mailto:davidge at 1stchina.com] > Sent: Tuesday, March 20, 2001 6:35 PM > To: beowulf at beowulf.org > Subject: how to let salve machine share the front-end machine's load > > > Hello everyone, > > I've instaled Scyld Beowulf front-end machine & salve machine. > > Now I have a question, How to make the salve machine share the > front-end machine's load? Need I install all the software(suche > lick MySQL, Apache)on the salve machine again or simple mount > the /usr directory from the front-end machine, or do nothing > (It seemed this doesnot work), or use scyld's libary to write > some programe special? > > Thanks. > > > > Sincerely yours, > David Ge > Room 604, No. 168, Qinzhou Road, Shanghai. > Phone: (021)34140621-12 > 2001-03-20 16:06:38 > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) > visit http://www.beowulf.org/mailman/listinfo/beowulf > From rgb at phy.duke.edu Wed Mar 21 10:48:39 2001 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed, 21 Mar 2001 13:48:39 -0500 (EST) Subject: Test source site Message-ID: Dear List Friends, Several persons have been asking for a look at the sources used to conduct the d*** A***on test that I cannot discuss or publish at this point;-), and I've started to put together a site of those sources. I've put some effort into packaging at least the ones that can easily be packaged -- tests derived from a particular build of LAM-MPI are a bit harder to wrap up in a simple tarball, for example. With luck, however, I'll get "something" together eventually even for them. In the meantime, the first three (stream, cpu-rate, Thomas' Guignon's benchmark) are available with annotations and comments in both tgz and rpm form at: http://www.phy.duke.edu/brahma/dual_athlon/tests.html I view all of these packages as much as Requests for Comment (RFC) as general beowulf/systems engineering resources -- feel free to grab them and apply them to your own systems but also consider letting me know if the packaging works well, if the tests are broken, if you are deeply offended that I would tamper with e.g. stream by packaging it or hacking it, or whatever. As I get more tests packaged, more links will light up un the URL above. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu From RSchilling at affiliatedhealth.org Wed Mar 21 10:54:03 2001 From: RSchilling at affiliatedhealth.org (Schilling, Richard) Date: Wed, 21 Mar 2001 10:54:03 -0800 Subject: Test source site Message-ID: <51FCCCF0C130D211BE550008C724149EBE1126@mail1.affiliatedhealth.org> Have you also done any work with the AMD 64 bit simulator? If so, how did it go? 
Richard Schilling Web Integration Programmer/Webmaster phone: 360.856.7129 fax: 360.856.7166 URL: http://www.affiliatedhealth.org Affiliated Health Services Information Systems 1971 Highway 20 Mount Vernon, WA USA > -----Original Message----- > From: Robert G. Brown [mailto:rgb at phy.duke.edu] > Sent: Wednesday, March 21, 2001 10:49 AM > To: Beowulf Mailing List > Subject: Test source site > > > Dear List Friends, > > Several persons have been asking for a look at the sources used to > conduct the d*** A***on test that I cannot discuss or publish at this > point;-), and I've started to put together a site of those sources. > I've put some effort into packaging at least the ones that > can easily be > packaged -- tests derived from a particular build of LAM-MPI are a bit > harder to wrap up in a simple tarball, for example. With > luck, however, > I'll get "something" together eventually even for them. In the > meantime, the first three (stream, cpu-rate, Thomas' Guignon's > benchmark) are available with annotations and comments in both tgz and > rpm form at: > http://www.phy.duke.edu/brahma/dual_athlon/tests.html I view all of these packages as much as Requests for Comment (RFC) as general beowulf/systems engineering resources -- feel free to grab them and apply them to your own systems but also consider letting me know if the packaging works well, if the tests are broken, if you are deeply offended that I would tamper with e.g. stream by packaging it or hacking it, or whatever. As I get more tests packaged, more links will light up un the URL above. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -------------- next part -------------- An HTML attachment was scrubbed... URL: From rgb at phy.duke.edu Wed Mar 21 10:54:35 2001 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed, 21 Mar 2001 13:54:35 -0500 (EST) Subject: Test source site In-Reply-To: <51FCCCF0C130D211BE550008C724149EBE1126@mail1.affiliatedhealth.org> Message-ID: On Wed, 21 Mar 2001, Schilling, Richard wrote: > Have you also done any work with the AMD 64 bit simulator? If so, how did > it go? No. I'd welcome contributions or comments or instructions on how to go about it, though. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu From lindahl at conservativecomputer.com Wed Mar 21 11:49:43 2001 From: lindahl at conservativecomputer.com (Greg Lindahl) Date: Wed, 21 Mar 2001 14:49:43 -0500 Subject: Test source site In-Reply-To: ; from rgb@phy.duke.edu on Wed, Mar 21, 2001 at 01:54:35PM -0500 References: <51FCCCF0C130D211BE550008C724149EBE1126@mail1.affiliatedhealth.org> Message-ID: <20010321144943.A1658@wumpus.hpti.com> > > Have you also done any work with the AMD 64 bit simulator? If so, how did > > it go? > > No. I'd welcome contributions or comments or instructions on how to go > about it, though. The 64-bit simulator doesn't have the stuff needed to do any performance analysis with it. It doesn't simulate any of the super-scalar stuff or instruction latencies. 
I talked to the authors about this at a show, since I'd love to be able to work out the performance boost for codes that use 64-bit integers... -- g From RSchilling at affiliatedhealth.org Wed Mar 21 11:58:00 2001 From: RSchilling at affiliatedhealth.org (Schilling, Richard) Date: Wed, 21 Mar 2001 11:58:00 -0800 Subject: Test source site Message-ID: <51FCCCF0C130D211BE550008C724149EBE112A@mail1.affiliatedhealth.org> So it's mostly for the application layer, then. I did get the sense it is enough to start doing preliminary work on - that is compiling and testing basic code. I do expect, however that if the simulator runs the code, the hardware will (with perhaps modifications due to errata), and I'll be able to get a jump on application/tools development. I'll let you know if I have luck with it. Richard Schilling Web Integration Programmer/Webmaster phone: 360.856.7129 fax: 360.856.7166 URL: http://www.affiliatedhealth.org Affiliated Health Services Information Systems 1971 Highway 20 Mount Vernon, WA USA > -----Original Message----- > From: Greg Lindahl [mailto:lindahl at conservativecomputer.com] > Sent: Wednesday, March 21, 2001 11:50 AM > To: Beowulf Mailing List > Subject: Re: Test source site > > > > > Have you also done any work with the AMD 64 bit > simulator? If so, how did > > > it go? > > > > No. I'd welcome contributions or comments or instructions > on how to go > > about it, though. > > The 64-bit simulator doesn't have the stuff needed to do any > performance analysis with it. It doesn't simulate any of the > super-scalar stuff or instruction latencies. I talked to the authors > about this at a show, since I'd love to be able to work out the > performance boost for codes that use 64-bit integers... > > -- g > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) > visit http://www.beowulf.org/mailman/listinfo/beowulf > -------------- next part -------------- An HTML attachment was scrubbed... URL: From larry at pssclabs.com Wed Mar 21 13:32:45 2001 From: larry at pssclabs.com (Larry Lesser) Date: Wed, 21 Mar 2001 13:32:45 -0800 Subject: Cables Message-ID: <4.3.2.20010321132558.00b61210@pop.pssclabs.com> Phil: We have extension cables for the keyboard, mouse and video. On the power supply I need to know which kind you have inside the case. There are two types. The difference is how the black power cable is connected to the power supply. If you go from the back of the case you will see three wires connected to the back of the case. These are inside the black power cable. Now the end of the cable going into the power supply is either one that has a normal plug that you can pull out of the power supply, or it is the kind that is wired directly inside the power supply. If you can clarify this then we can get you the correct one. Also, please give us the best address for you. Thanks,. 
Larry ===================================== Larry Lesser PSSC Labs voice: (949) 380-7288 fax: (949) 380-9788 larry at pssclabs.com http://www.pssclabs.com ===================================== From lindahl at conservativecomputer.com Wed Mar 21 14:20:53 2001 From: lindahl at conservativecomputer.com (Greg Lindahl) Date: Wed, 21 Mar 2001 17:20:53 -0500 Subject: Test source site In-Reply-To: <51FCCCF0C130D211BE550008C724149EBE112A@mail1.affiliatedhealth.org>; from RSchilling@affiliatedhealth.org on Wed, Mar 21, 2001 at 11:58:00AM -0800 References: <51FCCCF0C130D211BE550008C724149EBE112A@mail1.affiliatedhealth.org> Message-ID: <20010321172053.A1539@wumpus.hpti.com> On Wed, Mar 21, 2001 at 11:58:00AM -0800, Schilling, Richard wrote: > So it's mostly for the application layer, then. I did get the sense it is > enough to start doing preliminary work on - that is compiling and testing > basic code. Sure, but they're paying someone to do the gcc backend, right? It's probably not so hot now and will be a lot better when the hardware is closer to reality. Good luck, and let us know how you fare... -- greg From bill at billnorthrup.com Wed Mar 21 14:51:06 2001 From: bill at billnorthrup.com (Bill Northrup) Date: Wed, 21 Mar 2001 14:51:06 -0800 Subject: Hybrid Master. Message-ID: <007101c0b259$6a5babc0$2048000a@enshq> Hello List, Being both a Mosix and Beowulf enthusiast I would like to combine my masters on to one machine. I have followed the list for some time and understand many folks run hybrid systems. However I am running the Scyld release 2 distribution and the kernal versions required by Mosix is that of a plain vanilla type from kernel.org. My assumptions are that Mosix is kernal specific and Scyld is not so picky. In short my question is how do I go about it? Should I install plain Red Hat, plain kernel, Mosix then Scyld RPMS? Install the Scyld master, plain kernel, Mosix and Scyld again? I am not a kernel junkie and still pretty deep into the learning curve. I think a to do list with considerations and caveats would start me down my path, but feel free to get overly detailed! Please feel free to take it off list with me as well. If I get it working I promise to write a white paper or contribute the procedure in some way. BTW: I am not trying to mix the slaves. Thank-you Bill -------------- next part -------------- An HTML attachment was scrubbed... URL: From billygotee at operamail.com Tue Mar 20 10:47:19 2001 From: billygotee at operamail.com (Brandon Arnold) Date: Tue, 20 Mar 2001 12:47:19 -0600 Subject: Ok so I ditched the LNE100TX Message-ID: <000e01c0b261$3ef6afe0$0201a8c0@communicomm.com> I ditched the new Linksys card I bought and decided to continue battling with my old card, a Kingston KNE30BT(runs off rtl8029 driver, ne2k-pci). I figured out that the whole time, it was assigning the card to IRQ 0. I turned of Plug-n-Play BIOS and the card works fine. Nowthen, I decided to install VMWare because I dunno I just wanted to, and it screwed my ethernet up again! Now, Linux can't even detect my card despite the IRQ correction and everything. Upon bootup, it loads a lot of vmnet modules and such, when I do a ifconfig, I get something about vmnet1 but not eth0!! This is bullshit. It seems like vmware could find some way to utilize the drivers that are already there. I uninstalled vmware and the card is no longer working...best thing I can think of is reinstall Mandrake 7.2 and that really pisses me off... 
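For anyone hitting a similar conflict, a minimal diagnostic sketch follows. The ne2k-pci and vmnet module names come from the message above; whether simply removing the VMware modules is enough to get eth0 back is an assumption, not a tested fix.

  lsmod                    # list what is currently loaded; look for vmnet/vmmon
  rmmod vmnet vmmon        # unload the VMware networking modules (names may vary)
  modprobe ne2k-pci        # reload the driver the Kingston card was using
  dmesg | tail             # see whether eth0 was detected again
  cat /proc/interrupts     # confirm the card was assigned a sane IRQ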
-Brandon From walke at usna.edu Wed Mar 21 17:02:40 2001 From: walke at usna.edu (Vann H. Walke) Date: Wed, 21 Mar 2001 20:02:40 -0500 Subject: Sharing LNE100TX blues... References: <003a01c0b034$2196e8e0$6801a8c0@communicomm.com> Message-ID: <3AB94F30.9010408@usna.edu> Try Donald Becker's latest net-driver package: http://www.scyld.com/network/updates.html It should get everything up and working... Good Luck, Vann Brandon Arnold wrote: > I've been battling with my LNE100TX v4 NIC for the past 3 days. I see > its a common problem with most Linux folks that have bought it, but > since I've read lots of posts claiming success, I have faith that I'm > doing something wrong. > > > > For all obvious reasons, the NIC isn't there. Be aware, it works fine > when I boot into Windows ME, so the card is plugged in securely. I > tried to use the netdrivers.tgz that was included on the installation > disk for Linux users. The files unzipped correctly, but only a few > compiled. I wont go into extensive detail about that, because most > posts suggested compiling the updated tulip.c driver at Becker's > site. Therefore, I downloaded the latest tulip.c, pci-scan.c, > pci-scan.h, and kern_compat.h, ran DOS2UNIX on them and then placed > them all in the /usr/src/modules directory. The compile command gave > me fits because it is different for Mandrake 7.2, but I finally came > up with the following: > > > > For tulip.c: > > > > # gcc -I/usr/src/linux-2.2.17/include -DMODULE -Wall > -Wstrict-prototypes -O6 -c /usr/src/modules/tulip.c > > > > For pci-scan.c: > > > > # gcc -I/usr/src/linux-2.2.17/include -DMODULE -D__KERNEL__ > -DEXPORT_SYMTAB -Wall -Wstrict-prototypes -O6 -c > /usr/src/modules/pci-scan.c > > > > They both returned the compiled tulip.o and pci-scan.o files, but both > times I received an assembling error in modprob.c. I'm not sure if > this matters--it seems like it would. :-) Anyway I ran insmod on both > of the compiled files. pci-scan.o installed correctly, but when I > tried to install tulip.o, it returned 'tulip is busy' or something to > that effect. It seems like I read a few posts where people fixed this > problem by changing the slot that the card was in. Anyway, I changed > the card to the slot directly below it, but still no cigar. Of > course, Windows ME still works. Please forgive me on the > non-descriptiveness of the errors, I had to reboot into Windows ME to > type this message and forgot to record the exact statement. I think I > included enough information for a guru to cite my problem, but if you > need the exact statements, by all means, post and I'll include those > as well. I guess what I want is first hand instructions from someone > else who had the same problem in Linux Mandrake 7.x and got it fixed, > but I also welcome help from anyone else who sees my mistake. Thanks > in advance! > > > > -Brandon Arnold > From ok at mailcall.com.au Thu Mar 22 10:49:50 2001 From: ok at mailcall.com.au (Omar Kilani) Date: Fri, 23 Mar 2001 05:49:50 +1100 Subject: Installing Scyld on the Master ... Message-ID: <5.0.2.1.2.20010323020625.0246bec0@172.17.0.107> Hello, Has anyone installed Scyld onto Mylex hardware RAID (DAC960*) controlled drives? If so, can someone provide information on how they got the Scyld installer to recognise the RAID controller and install onto it? Choosing 'expert' install on startup asks for a driver disk, so should I recompile the module using the beowulf modified kernel sources and stick the .o on a floppy disk, then create the necessary /dev entries? 
Is there another way? Please advise, Omar Kilani

From ok at mailcall.com.au Thu Mar 22 12:12:30 2001 From: ok at mailcall.com.au (Omar Kilani) Date: Fri, 23 Mar 2001 07:12:30 +1100 Subject: Installing Scyld on the Master ... In-Reply-To: <5.0.2.1.2.20010323020625.0246bec0@172.17.0.107> Message-ID: <5.0.2.1.2.20010323071002.0247da00@172.17.0.107> I feel a bit awkward replying to my own post, but for future reference for anyone else: Start the install in 'expert' mode. 'Cancel' the driver disk dialog. Choose keyboard and language. At the 'special devices' screen, choose 'Add new devices'. Find 'Mylex DAC960', choose OK. From then on it works beautifully; the Red Hat partition program even understands the DAC960 /dev/rd/* notation. Regards, Omar Kilani At 05:49 AM 3/23/01 +1100, you wrote: >Hello, > >Has anyone installed Scyld onto Mylex hardware RAID (DAC960*) controlled >drives? >If so, can someone provide information on how they got the Scyld installer >to recognise the RAID controller and install onto it? > >Choosing 'expert' install on startup asks for a driver disk, so should I >recompile the module using the beowulf modified kernel sources and stick >the .o on a floppy disk, then create the necessary /dev entries? Is there >another way? > >Please Advise, >Omar Kilani > > >_______________________________________________ >Beowulf mailing list, Beowulf at beowulf.org >To change your subscription (digest mode or unsubscribe) visit >http://www.beowulf.org/mailman/listinfo/beowulf

From myrridin at wlug.westbo.se Thu Mar 22 00:02:50 2001 From: myrridin at wlug.westbo.se (Daniel Persson) Date: Thu, 22 Mar 2001 09:02:50 +0100 (CET) Subject: Q system on a Sycld cluster. Message-ID: Hi all, A while back there was a discussion about which batch system to use on a Scyld cluster. What did you settle on in the end? Which batch system could/should one use on a Scyld Beowulf 2 cluster? Is it possible to use, for example, PBS? What I need is a rather simple system without too many fancy features - suggestions, anyone? Regards Daniel BTW - The mailing list archive seems to have stopped at February. -- Daniel Persson Westbo Linux User Group ---> http://wlug.westbo.se A Swedish site about Gnome ---> http://wlug.westbo.se/gnome My personal pages ---> http://wlug.westbo.se/~myrridin Today's comment: A star captain's most solemn oath is that he will give his life, even his entire crew, rather than violate the Prime Directive. -- Kirk, "The Omega Glory", stardate unknown

From ole at scali.no Thu Mar 22 03:05:21 2001 From: ole at scali.no (Ole W. Saastad) Date: Thu, 22 Mar 2001 12:05:21 +0100 Subject: BERT 77: Automatic Parallelizer Message-ID: <3AB9DC71.BC9A0EE3@scali.no> Hi, I have some questions about the BERT 77 Automatic Parallelizer from Paralogic (http://www.plogic.com/bert.html). At first sight it looks nice and shiny, but what has users' experience been? Many of my colleagues run climate models with OpenMP on fast sequential machines and would consider MPI-based clusters if they could get some help to make the transition. This tool might be useful as a start for switching from OpenMP or sequential code to MPI-based code. The task of converting programs from sequential to parallel is not a trivial one, and I am very interested in how well Paralogic's program performs. A test would be to crunch the g98 Fortran source code through it and see if it gives any reasonable results. Has anyone done tests with programs of this size?
The examples shows tremendous speedup by splitting loops over many nodes, but in real lift things are different. -- Ole W. Saastad, Dr.Scient. Scali AS P.O.Box 70 Bogerud 0621 Oslo NORWAY Tel:+47 22 62 89 68(dir) mailto:ole at scali.no http://www.scali.com ScaMPI: bandwidth 150 MB/sec. latency 5us. From rogers at seuss.chem.upenn.edu Thu Mar 22 05:52:00 2001 From: rogers at seuss.chem.upenn.edu (Christopher L. Rogers) Date: Thu, 22 Mar 2001 08:52:00 -0500 (EST) Subject: Memory Issues: /proc/kcore vs. free Message-ID: <200103221352.IAA291162@seuss.chem.upenn.edu> folks: i am trying to determine whether or not i have some bad memory or not on some of my beowulf nodes. One thing i noticed right away is that ls -l of /proc/kcore maxs out at 940 MB for physical ram greater than 1 GB. I'm running 2048 MB, which the bios detects. free reports 2074 MB but as i said earlier /proc/kcore reports 940 MB. (i do have mem=2048M appended in my lilo.conf) So is this cause for alarm? Which is the true amount the system sees? I need to routinely run jobs on these nodes that are about 1 to 1.25 GB., so it makes a difference. The system in question is a dual PIII Supermicro 370DL3 with their latest bios. The bios reports the 2 GB of ram as PC133, and the vendor claims it's true registered ECC. The kernel is a stock Redhat 7.0 "enterprise" kernel (2.2.16-22enterprise) Thanks for all your help. -Chris Rogers crogers at sas.upenn.edu From Dean.Carpenter at pharma.com Thu Mar 22 06:23:10 2001 From: Dean.Carpenter at pharma.com (Carpenter, Dean) Date: Thu, 22 Mar 2001 09:23:10 -0500 Subject: Hybrid Master. Message-ID: <759FC8B57540D311B14E00902727A0C002EC47C9@a1mbx01.pharma.com> Excellent. That's a direction we want to pursue as well. I'm still learning about clusters, so as many how-to documents as possible would be a great help. -- Dean Carpenter Principal Architect Purdue Pharma dean.carpenter at pharma.com deano at areyes.com 94TT :) -----Original Message----- From: Bill Northrup [mailto:bill at billnorthrup.com] Sent: Wednesday, March 21, 2001 5:51 PM To: beowulf at beowulf.org Subject: Hybrid Master. Hello List, Being both a Mosix and Beowulf enthusiast I would like to combine my masters on to one machine. I have followed the list for some time and understand many folks run hybrid systems. However I am running the Scyld release 2 distribution and the kernal versions required by Mosix is that of a plain vanilla type from kernel.org. My assumptions are that Mosix is kernal specific and Scyld is not so picky. In short my question is how do I go about it? Should I install plain Red Hat, plain kernel, Mosix then Scyld RPMS? Install the Scyld master, plain kernel, Mosix and Scyld again? I am not a kernel junkie and still pretty deep into the learning curve. I think a to do list with considerations and caveats would start me down my path, but feel free to get overly detailed! Please feel free to take it off list with me as well. If I get it working I promise to write a white paper or contribute the procedure in some way. BTW: I am not trying to mix the slaves. Thank-you Bill -------------- next part -------------- An HTML attachment was scrubbed... URL: From joelja at darkwing.uoregon.edu Thu Mar 22 06:52:51 2001 From: joelja at darkwing.uoregon.edu (Joel Jaeggli) Date: Thu, 22 Mar 2001 06:52:51 -0800 (PST) Subject: Memory Issues: /proc/kcore vs. free In-Reply-To: <200103221352.IAA291162@seuss.chem.upenn.edu> Message-ID: compile a kernel with high memory support set to 4GB.... 
options are: < 1GB (off) > 1GB <= 4GB or < 64GB it's in the processor type and features dialog if you run make menuconfig on a kernel tree... joelja On Thu, 22 Mar 101, Christopher L. Rogers wrote: > folks: > > i am trying to determine whether or not i have > some bad memory or not on some of my beowulf nodes. > One thing i noticed right away is that ls -l of /proc/kcore > maxs out at 940 MB for physical ram greater than 1 GB. > > I'm running 2048 MB, which the bios detects. free reports > 2074 MB but as i said earlier /proc/kcore reports 940 MB. > (i do have mem=2048M appended in my lilo.conf) > > So is this cause for alarm? Which is the true amount the > system sees? I need to routinely run jobs on these nodes > that are about 1 to 1.25 GB., so it makes a difference. > > The system in question is a dual PIII Supermicro 370DL3 > with their latest bios. The bios reports the 2 GB of ram > as PC133, and the vendor claims it's true registered ECC. > The kernel is a stock Redhat 7.0 "enterprise" kernel > (2.2.16-22enterprise) > > Thanks for all your help. > > -Chris Rogers > crogers at sas.upenn.edu > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- -------------------------------------------------------------------------- Joel Jaeggli joelja at darkwing.uoregon.edu Academic User Services consult at gladstone.uoregon.edu PGP Key Fingerprint: 1DE9 8FCA 51FB 4195 B42A 9C32 A30D 121E -------------------------------------------------------------------------- It is clear that the arm of criticism cannot replace the criticism of arms. Karl Marx -- Introduction to the critique of Hegel's Philosophy of the right, 1843. From gary at umsl.edu Thu Mar 22 07:39:54 2001 From: gary at umsl.edu (Gary Stiehr) Date: Thu, 22 Mar 2001 09:39:54 -0600 Subject: Memory Issues: /proc/kcore vs. free References: <200103221352.IAA291162@seuss.chem.upenn.edu> Message-ID: <3ABA1CCA.EFA3C914@umsl.edu> Hi, We used to have some nodes that had SuperMicro boards and others that had a different brand motherboard. Each node had 128MB of RAM. The SuperMicro-based nodes, however, only showed 64MB of RAM when running free (even though the bios recognized all 128 MB). Supposedly the bios on these boards did not properly pass some information to the linux kernel about the amount of memory. It was suggested that we boot with linux mem=128MB at the lilo prompt or add append="mem=128M" to /etc/lilo.conf in the "image" sections. Unfortunately, this did not work for us but we feel that there may have been other complications keeping this suggestion from working for us. We did not have those nodes for much longer and our jobs were not very memory intensive so we did not investigate it much further. But I think that the suggestion above was a good starting point and hopefully it WILL help you. -- Gary Stiehr Information Technology Services University of Missouri - St. Louis gary at umsl.edu "Christopher L. Rogers" wrote: > > folks: > > i am trying to determine whether or not i have > some bad memory or not on some of my beowulf nodes. > One thing i noticed right away is that ls -l of /proc/kcore > maxs out at 940 MB for physical ram greater than 1 GB. > > I'm running 2048 MB, which the bios detects. free reports > 2074 MB but as i said earlier /proc/kcore reports 940 MB. > (i do have mem=2048M appended in my lilo.conf) > > So is this cause for alarm? 
Which is the true amount the > system sees? I need to routinely run jobs on these nodes > that are about 1 to 1.25 GB., so it makes a difference. > > The system in question is a dual PIII Supermicro 370DL3 > with their latest bios. The bios reports the 2 GB of ram > as PC133, and the vendor claims it's true registered ECC. > The kernel is a stock Redhat 7.0 "enterprise" kernel > (2.2.16-22enterprise) > > Thanks for all your help. > > -Chris Rogers > crogers at sas.upenn.edu > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rauch at inf.ethz.ch Thu Mar 22 08:12:29 2001 From: rauch at inf.ethz.ch (Felix Rauch) Date: Thu, 22 Mar 2001 17:12:29 +0100 (CET) Subject: Memory Issues: /proc/kcore vs. free In-Reply-To: <3ABA1CCA.EFA3C914@umsl.edu> Message-ID: On Thu, 22 Mar 2001, Gary Stiehr wrote: [...] > Supposedly the bios on these boards did not properly pass some > information to the linux kernel about the amount of memory. It was > suggested that we boot with > > linux mem=128MB > > at the lilo prompt or add > > append="mem=128M" > > to /etc/lilo.conf in the "image" sections. > > Unfortunately, this did not work for us but we feel that there may have > been other complications keeping this suggestion from working for us. There were times when it was suggested to put a little less memory in the "mem=..." option than what the machines actually had. In this example, it might help to use append="mem=127M" but I don't know if this is still true with recent kernels. - Felix -- Felix Rauch | Email: rauch at inf.ethz.ch Institute for Computer Systems | Homepage: http://www.cs.inf.ethz.ch/~rauch/ ETH Zentrum / RZ H18 | Phone: ++41 1 632 7489 CH - 8092 Zuerich / Switzerland | Fax: ++41 1 632 1307 From lindahl at conservativecomputer.com Thu Mar 22 09:57:05 2001 From: lindahl at conservativecomputer.com (Greg Lindahl) Date: Thu, 22 Mar 2001 12:57:05 -0500 Subject: parallelizing OpenMP apps In-Reply-To: <3AB9DC71.BC9A0EE3@scali.no>; from ole@scali.no on Thu, Mar 22, 2001 at 12:05:21PM +0100 References: <3AB9DC71.BC9A0EE3@scali.no> Message-ID: <20010322125705.A1872@wumpus.hpti.com> On Thu, Mar 22, 2001 at 12:05:21PM +0100, Ole W. Saastad wrote: > Many of my colleges run climate models with OpenMP on fast > sequential machines and would consider MPI based clusters if > they could get some help to make the transition. I would suggest checking out SMS: http://www-ad.fsl.noaa.gov/ac/sms.html It is a system aimed at the weather community. You add HPF-like directives, and the system does the rest. We have seen O(100) speedup on O(100) nodes... for a real code used for production air travel weather prediction in the US. It only does some kinds of stencils and some kinds of spectral codes, and it only does F77, but if it does what your code uses... -- greg From timothy.g.mattson at intel.com Thu Mar 22 11:11:08 2001 From: timothy.g.mattson at intel.com (Mattson, Timothy G) Date: Thu, 22 Mar 2001 11:11:08 -0800 Subject: parallelizing OpenMP apps Message-ID: Mapping OpenMP onto an HPF-style programming environment can be as-hard or harder than going straight to MPI. Before taking such a drastic step, you should consider your options in the OpenMP space. I haven't personnally used it, but the OMNI compiler project at RWCP lets you run OpenMP programs on a cluster. 
I can't find their URL right now (my web proxy is acting up), but I think you can learn more about it by following links off the OpenMP web sige (www.openmp.org). KAI also has a system that supports moving OpenMP onto clusters --- though I don't think its an official product yet (you'll have to ask them). Good luck. --Tim -----Original Message----- From: Greg Lindahl [mailto:lindahl at conservativecomputer.com] Sent: Thursday, March 22, 2001 9:57 AM To: beowulf at beowulf.org Subject: parallelizing OpenMP apps On Thu, Mar 22, 2001 at 12:05:21PM +0100, Ole W. Saastad wrote: > Many of my colleges run climate models with OpenMP on fast > sequential machines and would consider MPI based clusters if > they could get some help to make the transition. I would suggest checking out SMS: http://www-ad.fsl.noaa.gov/ac/sms.html It is a system aimed at the weather community. You add HPF-like directives, and the system does the rest. We have seen O(100) speedup on O(100) nodes... for a real code used for production air travel weather prediction in the US. It only does some kinds of stencils and some kinds of spectral codes, and it only does F77, but if it does what your code uses... -- greg _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From romie at pinguin.stttelkom.ac.id Thu Mar 22 11:19:26 2001 From: romie at pinguin.stttelkom.ac.id (romie) Date: Fri, 23 Mar 2001 02:19:26 +0700 (JAVT) Subject: newbie Message-ID: hi .. is it possible to establish beowulf paralel computation using 2 computers ?? 1 node server & 1 node client , connected directly each other with cross-connect UTP , with ethrnet bonding(2 ethernet) too, so in this case ,so it needs 4 ethernet card ... is it possible ? i hope to see ur comment soon .. with love - romie - From bill at billnorthrup.com Thu Mar 22 11:42:36 2001 From: bill at billnorthrup.com (Bill Northrup) Date: Thu, 22 Mar 2001 11:42:36 -0800 Subject: newbie References: Message-ID: <002701c0b308$3f91bd80$2001a8c0@enshq> Romie, Welcome to parallel computing. Please feel free to search the list archive, your questions are answered many times over with great detail. Also take a look at the www.beowulf.org site, years of reading material as well. A good list for newbies beowulf-newbie at fecundswamp.net Bill ----- Original Message ----- From: "romie" To: Sent: Thursday, March 22, 2001 11:19 AM Subject: newbie > hi .. > > is it possible to establish beowulf paralel computation using > 2 computers ?? 1 node server & 1 node client , connected directly each > other with cross-connect UTP , with ethrnet bonding(2 ethernet) too, so > in this case ,so it needs 4 ethernet card ... > > is it possible ? i hope to see ur comment soon .. 
> > with love > - romie - > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > From ctierney at hpti.com Thu Mar 22 11:42:06 2001 From: ctierney at hpti.com (Craig Tierney) Date: Thu, 22 Mar 2001 12:42:06 -0700 Subject: parallelizing OpenMP apps In-Reply-To: ; from timothy.g.mattson@intel.com on Thu, Mar 22, 2001 at 11:11:08AM -0800 References: Message-ID: <20010322124206.B6375@hpti.com> On Thu, Mar 22, 2001 at 11:11:08AM -0800, Mattson, Timothy G wrote: > > Mapping OpenMP onto an HPF-style programming environment can be as-hard or > harder than going straight to MPI. Before taking such a drastic step, you > should consider your options in the OpenMP space. > > I haven't personnally used it, but the OMNI compiler project at RWCP lets > you run OpenMP programs on a cluster. I can't find their URL right now (my > web proxy is acting up), but I think you can learn more about it by > following links off the OpenMP web sige (www.openmp.org). KAI also has a > system that supports moving OpenMP onto clusters --- though I don't think > its an official product yet (you'll have to ask them). > RWCP's OMNI compiler is an OpenMP preprocessor. It allows you to run OpenMP codes on an SMP system as if you really had an OpenMP compiler. It does not work with clusters, unless you have a cluster of SMP boxes, and you use a combination of OpenMP and MPI. I have been testing the latest version of OMNI on a couple of ES40s (2 cpus and 4 cpus). The have some test OpenMP codes that do compile cleanly. I am getting pretty good speed ups with the codes. I am seeing a 3.2-3.4x for 4 cpus, depending on the test code (NAS Parallel Benchmarks). With the memory bandwidth of an ES40, I don't think I would get much better than that. It works with C and Fortran 77. It does not work with Fortran 90. I would like to try and get MM5 or some other real codes going. My initial attempt to compile MM5 with OpenMP failed, but right now I claim pilot error. Craig > Good luck. > > --Tim > > > > > > -----Original Message----- > From: Greg Lindahl [mailto:lindahl at conservativecomputer.com] > Sent: Thursday, March 22, 2001 9:57 AM > To: beowulf at beowulf.org > Subject: parallelizing OpenMP apps > > > On Thu, Mar 22, 2001 at 12:05:21PM +0100, Ole W. Saastad wrote: > > > Many of my colleges run climate models with OpenMP on fast > > sequential machines and would consider MPI based clusters if > > they could get some help to make the transition. > > I would suggest checking out SMS: > > http://www-ad.fsl.noaa.gov/ac/sms.html > > It is a system aimed at the weather community. You add HPF-like > directives, and the system does the rest. We have seen O(100) speedup > on O(100) nodes... for a real code used for production air travel > weather prediction in the US. > > It only does some kinds of stencils and some kinds of spectral codes, > and it only does F77, but if it does what your code uses... 
> > -- greg > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Craig Tierney (ctierney at hpti.com) phone: 303-497-3112 From chrisa at ASPATECH.COM.BR Thu Mar 22 12:53:03 2001 From: chrisa at ASPATECH.COM.BR (Chris Richard Adams) Date: Thu, 22 Mar 2001 17:53:03 -0300 Subject: Endless RARP Requests Message-ID: During a simple installation of a Master node, at the begining of the install - 'it' recognizes my two Network cards (eth0, eth1) then it says, "Sending RARP Requests..." after that I just see dots until infinity. What might be the problem? Thanks, Chris From JParker at coinstar.com Thu Mar 22 13:04:03 2001 From: JParker at coinstar.com (JParker at coinstar.com) Date: Thu, 22 Mar 2001 13:04:03 -0800 Subject: Someone good with electronics might make good use of this ... Message-ID: G'Day ! http://linux.com/hardware/newsitem.phtml?sid=1&aid=11945 cheers, Jim Parker Sailboat racing is not a matter of life and death .... It is far more important than that !!! -------------- next part -------------- An HTML attachment was scrubbed... URL: From James.P.Lux at jpl.nasa.gov Thu Mar 22 14:07:13 2001 From: James.P.Lux at jpl.nasa.gov (Jim Lux) Date: Thu, 22 Mar 2001 14:07:13 -0800 Subject: Someone good with electronics might make good use of this ... Message-ID: <003e01c0b31c$72da76f0$61064f89@cerulean.jpl.nasa.gov> It's basically a 128 MHz 586 (not a pentium, more like the AM5x86 or the Cyrix) with 8K L1 cache, FPU, no RAM. They cost about $60 each in smallish quantities (10-100+).. Off hand, you might be better off using a high density pentium mobo -----Original Message----- From: JParker at coinstar.com To: beowulf at beowulf.org Date: Thursday, March 22, 2001 1:14 PM Subject: Someone good with electronics might make good use of this ... >G'Day ! > >http://linux.com/hardware/newsitem.phtml?sid=1&aid=11945 > >cheers, >Jim Parker > >Sailboat racing is not a matter of life and death .... It is far more >important than that !!! From newt at scyld.com Thu Mar 22 14:00:43 2001 From: newt at scyld.com (Daniel Ridge) Date: Thu, 22 Mar 2001 17:00:43 -0500 (EST) Subject: Endless RARP Requests In-Reply-To: Message-ID: The Scyld install disk times out if you fail to respond to the directions on the screen. It presumes that you wish to bring the machine up as a slave node instead of performing an OS install on the master. Follow the prompt on the screen when the machine boots to perform an OS install. You can then use the same disk to boot your nodes (as you have discovered) Regards, Dan Ridge Scyld Computing Corporation On Thu, 22 Mar 2001, Chris Richard Adams wrote: > During a simple installation of a Master node, at the begining of the > install - 'it' recognizes my two Network cards (eth0, eth1) then it > says, "Sending RARP Requests..." after that I just see dots until > infinity. What might be the problem? From rgb at phy.duke.edu Thu Mar 22 15:08:04 2001 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu, 22 Mar 2001 18:08:04 -0500 (EST) Subject: newbie In-Reply-To: Message-ID: On Fri, 23 Mar 2001, romie wrote: > hi .. > > is it possible to establish beowulf paralel computation using > 2 computers ?? 
1 node server & 1 node client , connected directly each > other with cross-connect UTP , with ethrnet bonding(2 ethernet) too, so > in this case ,so it needs 4 ethernet card ... > > is it possible ? i hope to see ur comment soon .. > > with love > - romie - One can certainly use two computers in parallel on a computation, even if they are just two computers connected directly with crossover UTP. I've never tried channel bonding in this way and don't know enough to know if it is possible. It is useful to remember, though, that 100 Mbps switches are now extremely cheap, and would let you add more nodes. You are pretty much limited to a speedup of two with only two nodes. You can "get started" with two, but you'll likely want to add more, and a switch makes this easy. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu From sean at eevillage.com.tw Thu Mar 22 18:20:56 2001 From: sean at eevillage.com.tw (Sean) Date: Fri, 23 Mar 2001 10:20:56 +0800 Subject: I need a start point Message-ID: <004701c0b33f$e5220f20$8a01a8c0@emphasisnetworks.com> Hi all. I'm a newbie in beowulf and I got mosix and pvm clusters. I'm interested in coding some program about IC simulator and biotechnology or something like that... would somebody give me a hint or path to get in that field ? any response is appreciated. Sean. From agrajag at linuxpower.org Thu Mar 22 20:11:58 2001 From: agrajag at linuxpower.org (Jag) Date: Thu, 22 Mar 2001 20:11:58 -0800 Subject: Q system on a Sycld cluster. In-Reply-To: ; from myrridin@wlug.westbo.se on Thu, Mar 22, 2001 at 09:02:50AM +0100 References: Message-ID: <20010322201158.L13901@kotako.analogself.com> On Thu, 22 Mar 2001, Daniel Persson wrote: > Hi all, > > A while back there was discussion about wich batch system to use on a > Scyld cluster. However, what did you people come up to ? > > Wich bacth system could/should one use on Scyld Beowulf2 cluster ? > > Is it possible to use for ex PBS ? > > What i need is a rather simple system whitout to many fancy features - > suggestions anyone ? I wrote a very simple system that you might be interested in. It was designed to run programs that weren't programed for parallel processing, but for which you can specify certain command line options or input (through stdin) to specify how the program should run differently on each node. It also redirects the output to a file. Each node the job is run on gets its own file for the output. There are a few restraints on it based on how it does the I/O redirection. It requires that each job you run have a control directory. This directory stores the input you gives the program on each node, and is where the output goes. Due to this, it needs to be located somewhere that is nfs mounted by the slave nodes (such as in /home). The actual program being run must also be on an nfs mounted filesystem. If anyone is interested in this simple system, let me know and I'll clean it up for release. Hopefully in the future I will modify it to use some of the tricks that bpsh does so that the control directory and program being run won't have to be nfs mounted. (Are we going to see a libbpsh?) However, this probablly won't be there if I do a release this month or next. > > BTW - The mailinglist archive seems to have stopped at February. 
I've noticed this as well and pointed it out to the person responsible, but it still doesn't seem to be fixed yet :( Jag -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 232 bytes Desc: not available URL: From ok at mailcall.com.au Fri Mar 23 14:10:57 2001 From: ok at mailcall.com.au (Omar Kilani) Date: Sat, 24 Mar 2001 09:10:57 +1100 Subject: 3COM 3C905C(X) and Scyld Message-ID: <5.0.2.1.2.20010324090619.028a1ec0@172.17.0.107> Hello, I have set up a cluster of 8 1.2 Ghz Athlons, each one has a 3COM 3C905CX. The problem is that, with the 3c59x.o module, the cards do not recieve any data. This is described in Beowulf digest, Vol 1 #274. So I am using the 3COM supplied 3c90x.o module, which recieves data, but theres one problem: When I boot a node, the driver is reported as inserted 'boot: installing module '3c90x', I see a 3COM message telling me the card has been found, and then the RARP process takes place, in which, the node recieves an IP address. After this, the machine hangs with: SIOCSFFLAGS: No such device On the screen. I cannot go past this point. Please advise. Best Regards, Omar Kilani From chrisa at ASPATECH.COM.BR Fri Mar 23 06:06:37 2001 From: chrisa at ASPATECH.COM.BR (Chris Richard Adams) Date: Fri, 23 Mar 2001 11:06:37 -0300 Subject: MPI/Beowulf vs. SM Programming/OpenMP Message-ID: Hi everyone; I've been coding for almost 5 years in a mix of C, Java and now Python. I am now focused on learning more about parallel programming for applications/algorithms related to genetic sequencial analysis within databases - bioinformatics. The last month or so I've been studying about different methods that exist and I'm getting confused about where to start. I was convinced after reading material on Beowulf that it was the way to go, but I've recently stumbled upon the OpenMP site and read more about shared memory techniques. It seems to me for the type of applications I'm focusing on...this is a much better approach because I don't have to spend so much time learning MPI and all the communications. I can just focus on learning the algorthms (this is their big sell point anyway). 1.) Is this really true? 2.) Can anyone point out how Beowulf/MPI is the better solution and learning path? 3.) Is their room for both Beowulf/MPI and Shared-Mem tech. in the future? I would really appreciate hearing your feedback. Regards, Chris From parkw at better.net Fri Mar 23 06:23:49 2001 From: parkw at better.net (William Park) Date: Fri, 23 Mar 2001 09:23:49 -0500 Subject: 3COM 3C905C(X) and Scyld In-Reply-To: <5.0.2.1.2.20010324090619.028a1ec0@172.17.0.107>; from ok@mailcall.com.au on Sat, Mar 24, 2001 at 09:10:57AM +1100 References: <5.0.2.1.2.20010324090619.028a1ec0@172.17.0.107> Message-ID: <20010323092349.A3106@better.net> On Sat, Mar 24, 2001 at 09:10:57AM +1100, Omar Kilani wrote: > Hello, > > I have set up a cluster of 8 1.2 Ghz Athlons, each one has a 3COM > 3C905CX. The problem is that, with the 3c59x.o module, the cards do > not recieve any data. This is described in Beowulf digest, Vol 1 > #274. So I am using the 3COM supplied 3c90x.o module, which recieves > data, but theres one problem: When I boot a node, the driver is > reported as inserted 'boot: installing module '3c90x', I see a 3COM > message telling me the card has been found, and then the RARP process > takes place, in which, the node recieves an IP address. 
After this, > the machine hangs with: > > SIOCSFFLAGS: No such device > > On the screen. I cannot go past this point. Please advise. > > Best Regards, Omar Kilani I presume you're talking about 2.2.x kernel, since 2.4.2 works with 3c905CX. I had the same problem with 3c59x-2.2.16/18. Try Andrew Morton's 3c59x-2.2.19pre11 or kernel-2.2.19pre18. :wq --William Park, Open Geometry Consulting, Linux/Python, 8 CPUs. From frank at joerdens.de Fri Mar 23 06:48:52 2001 From: frank at joerdens.de (Frank Joerdens) Date: Fri, 23 Mar 2001 15:48:52 +0100 Subject: OT: computationally hard or not (algorithm question) Message-ID: <20010323154852.A9050@rakete.joerdens.de> I've been lurking on this this list for a few months cuz I think beowulfs are cool - but so far I've had neither the money nor any use for one. Now I have a problem which might fall into the general vicinity of beowulfery (but then again, it might not). You might have a pointer to the relevant resources (Yes! I am willing and able to RTFM!): Consider an n-dimensional array containing something on the order of 10^4 (i.e. thens of thousands) elements, where each number may assume an integer value between 1-12. For the sake of the argument (it's easier to visualize; actually, n would be something on the order of a dozen), assume that n is 3, so you're basically looking at a box-shaped cloud of dots in 3-D space, where each dot's position is defined by its three-dimensional carthesian co-ordinates which are stored in the array. Now, take any one of those dots, search for the 10 (some figure on this order of magnitude) dots which are closest to it, and order those by proximity to the origin of the search. This sounds pretty hard, computationally. Not for 3 Dimensions, since there the number of possible positions is only 12^3 = 1728, but for 12, its 12^12 = 8916100448256. I guess you'd have to find some efficient shortest-pair algorithm that works for n dimensions (my algorithm book only has one that works for 2), find the shortest pair, remove it from the array, then find the next value etc.. Does anyone have an idea, or a pointer to some reading matter about it, as to how computationally expensive that would be? Any guess as to what kind of hardware you'd have to throw at it (I could settle for less dimensions if it turns out that I can't afford the hardware) when I want the result set within a couple of seconds at the _very_ most? Does anyone have a link to an article about the best algorithm for this kind of problem? Many thanks in advance, Frank From jared_hodge at iat.utexas.edu Fri Mar 23 06:50:11 2001 From: jared_hodge at iat.utexas.edu (Jared Hodge) Date: Fri, 23 Mar 2001 08:50:11 -0600 Subject: BIOS Message-ID: <3ABB62A3.A5290DC1@iat.utexas.edu> Howdy (That's Texan for "hello"), I've got a question about setting BIOS settings on a cluster's nodes. The way I've been going about making changes is moving the keyboard and monitor to all of the nodes. I'm sure this isn't what everyone does. I imagine there's a way to put the settings on a floppy or maybe even send them over the network, but I don't have any experience with this. Has anyone done this? What's the typical way of doing this on one of the really large systems where moving a keyboard and monitor from node to node would be impractical? -- Jared Hodge Institute for Advanced Technology The University of Texas at Austin 3925 W. 
Braker Lane, Suite 400 Austin, Texas 78759 Phone: 512-232-4460 FAX: 512-471-9096 Email: Jared_Hodge at iat.utexas.edu From demeler at bioc09.v19.uthscsa.edu Fri Mar 23 07:03:09 2001 From: demeler at bioc09.v19.uthscsa.edu (Borries Demeler) Date: Fri, 23 Mar 2001 09:03:09 -0600 (CST) Subject: Bioinformatics/Beowulf applications Message-ID: <200103231503.JAA13937@bioc09.v19.uthscsa.edu> Greetings, I have been put in charge of developing our new bioinformatics core facility. Having used Beowulf computers in my research for simple Monte Carlo (hydrodynamic modeling of biological macromolecules), I am planning to expand onto other areas of Beowulf implementations. To get started, I am looking for links (perhaps a well organized web site?) and contacts to groups that have implemented Beowulfs in bioinformatic applications. In particular I am looking for Beowulf enabled software/applications in: * molecular modeling/statistical mechanics * structure solutions for X-ray crystallography data * structure solutions for NMR data * structure prediction software * sequence analysis/genomics * machine vision/pattern recognition/neural networks (self-organizing maps) * proteomics others? If you know of a website with relevant links or are working on applications that may be potentially useful for other researchers, I would be grateful for some feedback. Thank you very much, -Borries ******************************************************************************* * Borries Demeler, Ph.D. * * The University of Texas Health Science Center at San Antonio * * Dept. of Biochemistry, 7703 Floyd Curl Drive, San Antonio, Texas 78229-3900 * * Voice: 210-567-6592, Fax: 210-567-4575, Email: demeler at biochem.uthscsa.edu * ******************************************************************************* From timm at fnal.gov Fri Mar 23 07:13:52 2001 From: timm at fnal.gov (Steven Timm) Date: Fri, 23 Mar 2001 09:13:52 -0600 (CST) Subject: BIOS In-Reply-To: <3ABB62A3.A5290DC1@iat.utexas.edu> Message-ID: It depends to some extent on which motherboard you are running. The Intel L440GX series, which is most of what we have here at FNAL currently, allows to direct the BIOS I/O out one of the COM ports, usually COM2. We have some clusters where we have hooked these up to a console server and been able to access all the BIOS from there. Intel has a very nice client to do this on the Windows side as well, unfortunately it can only cover four nodes at once and they are not planning to expand it. It would in theory be possible to extend vacm (from VA Linux) to do this but the last I talked to their developers they weren't moving this way. Steve Timm ------------------------------------------------------------------ Steven C. Timm (630) 840-8525 timm at fnal.gov http://home.fnal.gov/~timm/ Fermilab Computing Division/Operating Systems Support Scientific Computing Support Group--Computing Farms Operations On Fri, 23 Mar 2001, Jared Hodge wrote: > Howdy (That's Texan for "hello"), > I've got a question about setting BIOS settings on a cluster's nodes. > The way I've been going about making changes is moving the keyboard and > monitor to all of the nodes. I'm sure this isn't what everyone does. I > imagine there's a way to put the settings on a floppy or maybe even send > them over the network, but I don't have any experience with this. Has > anyone done this? What's the typical way of doing this on one of the > really large systems where moving a keyboard and monitor from node to > node would be impractical? 
> -- > Jared Hodge > Institute for Advanced Technology > The University of Texas at Austin > 3925 W. Braker Lane, Suite 400 > Austin, Texas 78759 > > Phone: 512-232-4460 > FAX: 512-471-9096 > Email: Jared_Hodge at iat.utexas.edu > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > From agrajag at linuxpower.org Fri Mar 23 07:54:17 2001 From: agrajag at linuxpower.org (Jag) Date: Fri, 23 Mar 2001 07:54:17 -0800 Subject: BIOS In-Reply-To: <3ABB62A3.A5290DC1@iat.utexas.edu>; from jared_hodge@iat.utexas.edu on Fri, Mar 23, 2001 at 08:50:11AM -0600 References: <3ABB62A3.A5290DC1@iat.utexas.edu> Message-ID: <20010323075417.M13901@kotako.analogself.com> On Fri, 23 Mar 2001, Jared Hodge wrote: > Howdy (That's Texan for "hello"), > I've got a question about setting BIOS settings on a cluster's nodes. > The way I've been going about making changes is moving the keyboard and > monitor to all of the nodes. I'm sure this isn't what everyone does. I > imagine there's a way to put the settings on a floppy or maybe even send > them over the network, but I don't have any experience with this. Has > anyone done this? What's the typical way of doing this on one of the > really large systems where moving a keyboard and monitor from node to > node would be impractical? You might want to look into getting a KVM. They can be expensive, but they're extremely usefull. You plug a monitor, keyboard, and mouse into the KVM, then hook the KVM up to the monitor, keyboard, and mouse ports of whatever machines you're interested in. Then with either a touch of the button on the KVM, or hitting certain hotkeys on the keyboard (depending on how fancy your KVM is), your keyboard/mouse/monitor will be used for whatever system you want. KVMs are also smart and make all the machines its attached to always think there is a monitor/keyboard/mouse attached, even if you're using a different system on the KVM at the moment. Hope this helps, Jag -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 232 bytes Desc: not available URL: From anders.lennartsson at foi.se Tue Mar 6 07:53:49 2001 From: anders.lennartsson at foi.se (Anders Lennartsson) Date: Tue, 06 Mar 2001 16:53:49 +0100 Subject: problems with etherchannel and NatSemi DP83815 cards Message-ID: <3AA5080D.9E37F163@foi.se> Hi BACKGROUND: I'm setting up a Debian GNU/Linux based cluster, currently with 4 nodes, each a PPro 200 :( but there may be more/other stuff coming :). Considering the costs, we settled for Netgear 311 ethernet cards, for which there is support in 2.4.x kernels. Patches are available for kernels 2.2.x, but since 2.4 is here... I have checked and the driver is a slightly modified version derived from natsemi.c available on www.scyld.com. There are some additions in the later not included in the one provided in the kernel source though. Initially I put one card in each machine and verified that everything worked. I tested with NTtcp (netperf derivative?) and the the throughput asymptotically went up to about 90Mbits per second when two cards were connected through a 100Mbps switch (where are the last 10?). Then I set out for etherchannel bonding. 
It was a bit tricky to find a working ifenslave.c, the one on www.beowulf.org seemed old and I found a newer at pdsf.nersc.gov/linux/ Then it seemed to work after doing: ifconfig bond0 192.168.1.x netmask 255.255.255.0 up ./ifenslave bond0 eth0 (bond0 gets the MAC adress from eth0) ./ifenslave bond0 eth1 When testing the setup by ftping a large file between two nodes messages of the following type was output repeatedly on the console: ethX ... Something wicked happened! 0YYY where X was 0 or 1 and YYY was one of 500, 700, 740, 749, 749, see below. Same thing happened when running NPtcp as package size came above a few kbytes, speeds approx 50MBits per second. QUESTIONS: Anyone got ideas as to the nature/solution of this problem? I suppose the PCI interface on these particular motherboards may play a significant role. Maybe the driver itself? Or is just the processor too slow? Does anyone have experience of this with for instance 3c905? Otherwise a very stable card IMHO. It is about three times more expensive which isn't that much for one or two, although I could imagine substantial savings for a large cluster. But if my hours are included ... Regards, Anders SOME DETAILED INFO: >From syslog, kernel identifying network cards: (eth2 is for accessing from outside the dedicated networks) Mar 1 21:30:53 beo101 kernel: http://www.scyld.com/network/natsemi.html Mar 1 21:30:53 beo101 kernel: (unofficial 2.4.x kernel port, version 1.0.3, January 21, 2001 Jeff Garzik, Tjeerd Mulder) Mar 1 21:30:53 beo101 kernel: eth0: NatSemi DP83815 at 0xc4800000, 00:02:e3:03:da:87, IRQ 12. Mar 1 21:30:53 beo101 kernel: eth0: Transceiver status 0x7869 advertising 05e1. Mar 1 21:30:53 beo101 kernel: eth1: NatSemi DP83815 at 0xc4802000, 00:02:e3:03:de:43, IRQ 10. Mar 1 21:30:53 beo101 kernel: eth1: Transceiver status 0x7869 advertising 05e1. Mar 1 21:30:53 beo101 kernel: eth2: NatSemi DP83815 at 0xc4804000, 00:02:e3:03:dc:2c, IRQ 11. Mar 1 21:30:53 beo101 kernel: eth2: Transceiver status 0x7869 advertising 05e1. some lines of the wicked message: (above those are the two lines where eth0 and eth1 are reported when ifenslave is run) Mar 1 21:30:56 beo101 /usr/sbin/cron[189]: (CRON) STARTUP (fork ok) Mar 1 21:35:26 beo101 kernel: eth0: Setting full-duplex based on negotiated link capability. Mar 1 21:35:32 beo101 ntpd[182]: time reset -0.474569 s Mar 1 21:35:32 beo101 ntpd[182]: kernel pll status change 41 Mar 1 21:35:32 beo101 ntpd[182]: synchronisation lost Mar 1 21:35:37 beo101 kernel: eth1: Setting full-duplex based on negotiated link capability. Mar 1 21:38:01 beo101 /USR/SBIN/CRON[211]: (mail) CMD ( if [ -x /usr/sbin/exim -a -f /etc/exim.conf ]; then /usr/sbin/exim -q >/dev/null 2>&1; fi) Mar 1 21:39:49 beo101 kernel: eth1: Something Wicked happened! 0700. Mar 1 21:40:04 beo101 kernel: eth0: Something Wicked happened! 0700. Mar 1 21:40:08 beo101 kernel: eth1: Something Wicked happened! 0700. Mar 1 21:40:08 beo101 kernel: eth0: Something Wicked happened! 0700. Mar 1 21:40:12 beo101 last message repeated 2 times Mar 1 21:40:12 beo101 kernel: eth1: Something Wicked happened! 0700. Mar 1 21:40:13 beo101 last message repeated 2 times Mar 1 21:40:15 beo101 kernel: eth0: Something Wicked happened! 0700. Mar 1 21:40:16 beo101 kernel: eth0: Something Wicked happened! 0700. Mar 1 21:40:18 beo101 kernel: eth1: Something Wicked happened! 0700. Mar 1 21:40:19 beo101 kernel: eth1: Something Wicked happened! 0700. Mar 1 21:40:19 beo101 kernel: eth0: Something Wicked happened! 0700. 
Mar 1 21:40:20 beo101 kernel: eth1: Something Wicked happened! 0700. Mar 1 21:40:20 beo101 kernel: eth1: Something Wicked happened! 0700. Mar 1 21:40:21 beo101 kernel: eth0: Something Wicked happened! 0700. Mar 1 21:40:22 beo101 last message repeated 3 times Mar 1 21:40:22 beo101 kernel: eth1: Something Wicked happened! 0700. Mar 1 21:40:22 beo101 kernel: eth1: Something Wicked happened! 0700. Mar 1 21:40:22 beo101 kernel: eth0: Something Wicked happened! 0700. Mar 1 21:40:22 beo101 kernel: eth0: Something Wicked happened! 0700. Mar 1 21:40:22 beo101 kernel: eth1: Something Wicked happened! 0700. Mar 1 21:40:22 beo101 kernel: eth0: Something Wicked happened! 0500. Mar 1 21:40:22 beo101 kernel: eth0: Something Wicked happened! 0740. Mar 1 21:40:22 beo101 kernel: eth0: Something Wicked happened! 0740. Mar 1 21:40:23 beo101 kernel: eth0: Something Wicked happened! 0700. Mar 1 21:40:23 beo101 kernel: eth1: Something Wicked happened! 0700. Mar 1 21:40:23 beo101 kernel: eth1: Something Wicked happened! 0700. Mar 1 21:40:23 beo101 kernel: eth0: Something Wicked happened! 0740. Mar 1 21:40:23 beo101 kernel: eth1: Something Wicked happened! 0740. Mar 1 21:40:23 beo101 kernel: eth1: Something Wicked happened! 0700. Mar 1 21:40:23 beo101 kernel: eth1: Something Wicked happened! 0700. Mar 1 21:40:23 beo101 kernel: eth1: Something Wicked happened! 0500. Mar 1 21:40:23 beo101 kernel: eth0: Something Wicked happened! 0500. The result of ifconfig: bond0 Link encap:Ethernet HWaddr 00:02:E3:03:DA:87 inet addr:192.168.1.101 Bcast:192.168.1.255 Mask:255.255.255.0 UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1 RX packets:0 errors:0 dropped:0 overruns:0 frame:0 TX packets:1834429 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:0 (0.0 b) TX bytes:986886789 (941.1 Mb) eth0 Link encap:Ethernet HWaddr 00:02:E3:03:DA:87 inet addr:192.168.1.101 Bcast:192.168.1.255 Mask:255.255.255.0 UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1 RX packets:907798 errors:0 dropped:0 overruns:0 frame:0 TX packets:915439 errors:1776 dropped:0 overruns:1776 carrier:1776 collisions:0 txqueuelen:100 RX bytes:435552233 (415.3 Mb) TX bytes:491795214 (469.0 Mb) Interrupt:12 eth1 Link encap:Ethernet HWaddr 00:02:E3:03:DA:87 inet addr:192.168.1.101 Bcast:192.168.1.255 Mask:255.255.255.0 UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1 RX packets:907768 errors:0 dropped:0 overruns:0 frame:0 TX packets:915466 errors:1748 dropped:0 overruns:1748 carrier:1748 collisions:0 txqueuelen:100 RX bytes:434992308 (414.8 Mb) TX bytes:489766183 (467.0 Mb) Interrupt:10 Base address:0x2000 eth2 Link encap:Ethernet HWaddr 00:02:E3:03:DC:2C inet addr:150.227.64.210 Bcast:150.227.64.255 Mask:255.255.255.0 UP BROADCAST RUNNING MTU:1500 Metric:1 RX packets:13122 errors:0 dropped:0 overruns:0 frame:0 TX packets:1182 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:100 RX bytes:1032660 (1008.4 Kb) TX bytes:943713 (921.5 Kb) Interrupt:11 Base address:0x4000 lo Link encap:Local Loopback inet addr:127.0.0.1 Mask:255.0.0.0 UP LOOPBACK RUNNING MTU:3904 Metric:1 RX packets:8 errors:0 dropped:0 overruns:0 frame:0 TX packets:8 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:552 (552.0 b) TX bytes:552 (552.0 b) From andres at chem.duke.edu Tue Mar 20 08:48:27 2001 From: andres at chem.duke.edu (Gerardo Andres Cisneros) Date: Tue, 20 Mar 2001 11:48:27 -0500 (EST) Subject: Cluster Question (fwd) Message-ID: Hello All, As I have said below, I have built a very small cluster (8 nodes) 
running a slightly modified version of RedHat Linux 6.2 and I'm trying to run a parallel version of a computational chemistry program (g98). This program uses Linda for the paralellization but I'm having problems with it. As stated below I'm having problems with either g98 or Linda killing the processes on the slave nodes once they're done. We've looked into a bunch of things including hardware malfunction but everything seems Ok. We have checked almost everything Dr. Brown suggested as per his experience with PVM (included below) but we can find no problems in the Linda conf file or the UID's belonging to a different user or the dameons not running. I was wondering if anyone out there is using Linda and/or g98 and has encountered similar problems?. Any help is greatly appreciated. I would also very much apreciate if you could reply directly to me since I'm not subscribed to this list. Thank you very much in advance, Best Regards, Andres -- G. Andres Cisneros Department of Chemistry Duke University andres at chem.duke.edu ---------- Forwarded message ---------- Date: Tue, 20 Mar 2001 11:11:28 -0500 (EST) From: Robert G. Brown To: Gerardo Andres Cisneros Subject: Re: Cluster Question On Tue, 20 Mar 2001, Gerardo Andres Cisneros wrote: > > Dear Prof. Brown, > > I'm a grad student working for Dr. W. Yang at the Chemistry Dept. > > We have built a beowulf cluster using 8 Dell PC's donated by intel, i > installed Dulug Linux 6.2 on all of them and I am now trying to run some > programs in parallel. > > Specifically I'm trying to run Gaussian98 on it so I had to download Linda > which is basically software based shared memory (virtual shared memory). > > I was wondering if you had ever used this software and if so if I could > get some pointers. Unfortunately I've never used G98 or Linda either one, so I don't know how helpful I can be. I'd recommend posting the problem to the beowulf list though, as there are probably folks out there who have used the two together. > My problem is that every time I try to run a big job on more than one node > the program crashes before finnishing. The program is supposed to kill > the processes on the slave nodes but it doesn't do it so they just sit on > the slave nodes occupying memory until eventually one of the nodes just > runs out of memory and the process dies. > > If I do a run with a veryverbose flag for linda I get a bunch of "Killed > by signal 15" messages stating that it killed the remote processes when > they're done but it doesn't actually do it. > > A message to CCL produced a bunch of replies telling me to upgrade the > kernel which I did (from 2.2.16-3 to 2.2.17-4) but still no go. > > Somebody else told me that he once had a simmilar problem but it was > caused by bad grounding of his network cards so static electricity was > building up and crashing his machines but I doubt that is the case here > since the network card is chipset to the motherboard. We have 8 Dell > Optiplex (I'm sorry I didn't mention that before). > > I would very much appretiate any suggestions you might have on this. I doubt very much that it is static electricity, and our Dells (probably from the same batch as yours) are rock stable under load and running a nearly identical setup. Besides, I can only assume that all the chassis are plugged into properly grounded three prong plugs and sit on a rack of some sort as well. 
I've never had any instabilities of any systems anywhere that I could identify with static electricity although perhaps you might if you had some sort of active source of high voltage nearby (a van DeGraff accelerator, a tesla coil, or some such). Ordinarily the ground wire of the power cable is connected to the chassis and absolutely prevents the buildup of static on connected components. Besides, this would be more likely to kill your whole computer than to just shut down one particular process. You haven't had any problems running e.g. NFS have you? Or connecting and transferring large files via scp? Why would a hardware problem pick on G98 with this whole raft of things to choose from? A problem in Linda seems much, much more likely especially given that it is failing to to successfully kill the remote processes when it claims that it is doing so. I've encountered the identical problem in recent versions of PVM -- the pvm_kill command is there, but I'll be damned if I could ever make it actually kill off the slaves in a master-slave calculation. Curiously, they could be killed off from the daemon command interface, so PVM had the capability -- there was just some sort of bug in the command implementation. I wish I could be of some help to you as you try to figure this out, but there isn't a lot I can think of trying without any hands on experience with Linda/G98. One thing might be permissions -- perhaps the remote slaves are being spawned but end up belonging to a UID that doesn't correspond with the source of the kill signal so that the kill signal is ignored, for example. If you can, look in the /var/log/messages on the slave nodes and see what kinds of things are being logged at the time of a kill. Look in the slave sources and see what the signal handler is doing. Snoop the net and verify that there are packets being sent that actually contain the kill signal. Run a remote host monitor tool (e.g. procstatd and watchman from the brahma site in physics) on the nodes and watch e.g. their memory consumption and network and CPU load -- is the problem a simple memory leak somewhere? Still, I think your best bet is the beowulf list itself. Surely somebody on it can help you better than I am able to. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu From charr at lnxi.com Mon Mar 5 09:09:33 2001 From: charr at lnxi.com (Cameron Harr) Date: Mon, 05 Mar 2001 10:09:33 -0700 Subject: Typical hardware References: <759FC8B57540D311B14E00902727A0C002EC475C@a1mbx01.pharma.com> Message-ID: <3AA3C84D.E423912@lnxi.com> If you could get two duals per 1U, that'd be great density. I must warn you though of heating issues. Even if you manage to get the logistics of your idea to work, heat will be a huge concern. So if you can get the company to drop the computer room to a refrigerator and have the cost come out of their budget, you may ok. "Carpenter, Dean" wrote: > > We're just now beginning to mess around with clustering - initial > proof-of-concept for the local code and so on. So far so good, using spare > equipment we have lying around, or on eval. > > Next step is to use some "real" hardware, so we can get a sense of the > throughput benefit. For example, right now it's a mishmosh of hardware > running on a 3Com Switch 1000, 100m to the head node, and 10m to the slaves. 
> The throughput one will be with 100m switched all around, possibly with a > gig uplink to the head node. > > Based on this, we hunt for money for the production cluster(s) ... > > What hardware are people using ? I've done a lot of poking around at the > various clusters linked to off beowulf.org, and seen mainly two types : > > 1. Commodity white boxes, perhaps commercial ones - typical desktop type > cases. These take up a chunk of real estate, and give no more than 2 cpus > per box. Lots of power supplies, shelf space, noise, space etc etc. > > 2. 1U or 2U rackmount boxes. Better space utilization, still 2 cpus per > box, but costing a whole lot more $$$. > > We, like most out there I'm sure, are constrained, by money and by space. > We need to get lots of cpus in as small a space as possible. Lots of 1U > VA-Linux or SGI boxes would be very cool, but would drain the coffers way > too quickly. Generic motherboards in clone cases is cheap, but takes up too > much room. > > So, a colleague and I are working on a cheap and high-density 1U node. So > far it looks like we'll be able to get two dual-CPU (P3) motherboards per 1U > chassis, with associated dual-10/100, floppy, CD and one hard drive. And > one PCI slot. Although it would be nice to have several Ultra160 scsi > drives in raid, a generic cluster node (for our uses) will work fine with a > single large UDMA-100 ide drive. > > That's 240 cpus per 60U rack. We're still working on condensed power for > the rack, to simplify things. Note that I said "for our uses" above. Our > design goals here are density and $$$. Hence some of the niceties are being > foresworn - things like hot-swap U160 scsi raid drives, das blinken lights > up front, etc. > > So, what do you think ? If there's interest, I'll keep you posted on our > progress. If there's LOTS of interest, we may make a larger production run > to make these available to others. > > -- > Dean Carpenter > deano at areyes.com > dean.carpenter at pharma.com > dean.carpenter at purduepharma.com > 94TT :) > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Cameron Harr Applications Engineer Linux NetworX Inc. http://www.linuxnetworx.com From kinghorn at pqs-chem.com Mon Mar 19 12:15:58 2001 From: kinghorn at pqs-chem.com (Donald B. Kinghorn) Date: Mon, 19 Mar 2001 14:15:58 -0600 Subject: AMD annoyed... References: Message-ID: <3AB668FE.58A10E26@pqs-chem.com> ... I sent a note to a "good guy" I know at AMD ... hopefully he'll let the right people know that the beowulf community is REALLY interested in what they have to offer. ... we'll see if they respond ... -Don From patrick at myri.com Mon Mar 5 08:54:24 2001 From: patrick at myri.com (Patrick Geoffray) Date: Mon, 05 Mar 2001 11:54:24 -0500 Subject: [Fwd: 8 node cluster help!] References: <3AA3B670.1060703@lnxi.com> Message-ID: <3AA3C4C0.AD8CBE67@myri.com> nate fuhriman wrote: > This is the result with a data size of 4000. 8000 crashed the machine. > Remember this is with HEAVY swapping because it was on a single machine. > (hd light was constant) > W00L2L2 4000 1 2 2 52143.23 8.187e-04 If it's swapping, it's normal that the performance is bad, like everywhere in the HPC world. A matrix of 4000 needs (4000*4000*8) = 125 MB at the application level. 
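[Editor's aside, not part of Patrick's post: the sizing arithmetic generalises easily. A back-of-the-envelope helper in C, where the 256 MB per node and the 80% memory fraction are assumptions for illustration, not figures from the thread; compile with -lm.]

/* hplsize.c -- rough HPL problem-size estimate: an N x N double matrix
 * needs N*N*8 bytes, and a common rule of thumb is to choose N so the
 * matrix fills most of the cluster's RAM without swapping. */
#include <stdio.h>
#include <math.h>

int main(void)
{
    const int    nodes    = 8;
    const double mem_mb   = 256.0;   /* RAM per node -- assumed value */
    const double fraction = 0.80;    /* leave room for OS + MPI buffers (assumption) */
    double total_bytes = nodes * mem_mb * 1024.0 * 1024.0 * fraction;
    long   nmax = (long)sqrt(total_bytes / 8.0);

    printf("matrix of 4000 needs about %.0f MB\n",
           4000.0 * 4000.0 * 8.0 / (1024 * 1024));
    printf("largest comfortable HPL N for %d nodes of %.0f MB: about %ld\n",
           nodes, mem_mb, nmax);
    return 0;
}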
The benchmark also uses 4 processes (2x2): if it's on a single machine, I hope you have 4 processors in the box ;-) To forecast the result of HPL (High Performance Linpack), you can take the peak of DGEMM reported by ATLAS, multiply it by the number of processors and then multiply by 0.75 (a rough efficiency of 75%, maybe 50% for a cheap interconnect) to get the total performance. So 8 nodes with a Pentium III at 550 MHz, assuming that you are using ATLAS to generate the BLAS and you have enough memory in each box to reach a point close to the DGEMM peak, should get something between 4 and 6 GFlops. If any machine swaps, the performance goes down the toilet :-)) Hope it helps. -- Patrick Geoffray --------------------------------------------------------------- | Myricom Inc | University of Tennessee - CS Dept | | 325 N Santa Anita Ave. | Suite 203, 1122 Volunteer Blvd. | | Arcadia, CA 91006 | Knoxville, TN 37996-3450 | | (626) 821-5555 | Tel/Fax : (865) 974-0482 | --------------------------------------------------------------- From chr at kisac.cgr.ki.se Tue Mar 20 12:32:34 2001 From: chr at kisac.cgr.ki.se (Christian Storm) Date: Tue, 20 Mar 2001 21:32:34 +0100 (MET) Subject: NFS file server performance Message-ID: Hi, I just took over a Beowulf cluster and I'm having a great time reconfiguring everything ... :) It is partly used on large databases (up to 20 GB). Local storage is not possible, therefore the databases reside on a disk of a dedicated file server that is mounted on all nodes of the cluster. Here are the questions: 1. To improve performance, two additional network cards were put into the file server. Then the cluster was split into three networks (all sitting on the same switch). Each subnetwork mounts the filesystem through a different NIC. This *seems* to work, but it is rather static and not very elegant. I experimented with the new 2.4 bridging feature (by assigning all 3 NICs to a bridge), but it just seems to add redundancy, not performance. I assume some kind of channel bonding would be needed - as far as I know, supported by the network cards (3c980) but not by the driver (3c90x). Does anybody know a solution? 2. What would be a good number of NFS daemons to run on the file server? (accessed by 12 nodes through 3 NICs, PIII-500 system with SCSI) Currently I'm running 16, with the socket input queue resized to 1 MB. Thanks in advance Christian From sean at emphasisnetworks.com Wed Mar 21 02:25:23 2001 From: sean at emphasisnetworks.com (Sean) Date: Wed, 21 Mar 2001 18:25:23 +0800 Subject: I need a start point Message-ID: <00ec01c0b1f1$48dc3680$8a01a8c0@emphasisnetworks.com> Hi all. I'm a newbie in Beowulf and I've got MOSIX and PVM clusters. I'm interested in coding some program for IC simulation, biotechnology or something like that... would somebody give me a hint or a path to get into that field? Any response is appreciated. Sean.
From bill at billnorthrup.com Wed Mar 21 13:34:11 2001 From: bill at billnorthrup.com (Bill Northrup) Date: Wed, 21 Mar 2001 13:34:11 -0800 Subject: Hybrid Master.. Message-ID: <000b01c0b24e$ab6bdd20$2048000a@enshq> Hello List, Being both a Mosix and Beowulf enthusiast I would like to combine my masters on to one machine. I have followed the list for some time and understand many folks run hybrid systems. However I am running the Scyld release 2 distribution and the kernal versions required by Mosix is that of a plain vanilla type from kernel.org. My assumptions are that Mosix is kernal specific and Scyld is not so picky. In short my question is how do I go about it? Should I install plain Red Hat, plain kernel, Mosix then Scyld RPMS? Install the Scyld master, plain kernel, Mosix and Scyld again? I am not a kernel junkie and still pretty deep into the learning curve. I think a to do list with considerations and caveats would start me down my path, but feel free to get overly detailed! Please feel free to take it off list with me as well. If I get it working I promise to write a white paper or contribute the procedure in some way. Thank-you Bill -------------- next part -------------- An HTML attachment was scrubbed... URL: From blah at kvack.org Tue Mar 20 07:43:11 2001 From: blah at kvack.org (Benjamin C.R. LaHaise) Date: Tue, 20 Mar 2001 10:43:11 -0500 (EST) Subject: (no subject) Message-ID: [Please note: I'm not on the Beowulf list, so cc me on any comments] jakob at unthought.net wrote: > Two clients can't mount the same FS over nbd though... You can for read only data (yes, changing the data requires remounting), and many workloads have a large readonly workset (ie executables and some data files). For the case of g98 a per-client disk image provides the space and performance benefits of a hard drive with the managability and ease of maintenance networked servers offer. We know that NFS sucks, and we'd like to use something better. This isn't perfect, but it's better than NFS. > The clients will all have meta-data caches, and they will be > inconsistent even though the block data device is the same on the > server. That's the entire point of using nbd: unneeded network traffic is avoided, and the server is simpler and faster. -ben From bill at billnorthrup.com Mon Mar 19 13:44:40 2001 From: bill at billnorthrup.com (Bill Northrup) Date: Mon, 19 Mar 2001 13:44:40 -0800 Subject: Cluster and RAID 5 Array bottleneck.( I believe) References: <3AB0F6DF.9341A27E@grantgeo.com> Message-ID: <002a01c0b0bd$cd9893d0$2048000a@enshq> Leonardo, Hi.. Check for a duplex issue, it's a good idea to hard configure the speed and duplex on both the server and switch; especially with Cisco products. If your switch can allow it you might want to fiddle with port priorities but this is mostly for fine tuning. Are your port counters reporting anything? There are some other issues that come to mind but every install and configuration is unique, I will paint with broad brush and hope I hit something. A few things pop into my mind when funneling by a factor of 10 from a gig-e to fast-e, buffers get hammered and the TCP window size. Try opening up the TCP window size and map your performance, you may find a better number. You may be able to get a good feel for the problem by trying to isolate it.. Connect a client directly via crossover cable to the array box (assuming it has a fast-e port somewhere).. Run your test again. Maybe connect the array to the switch via fast-e and test again. 
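[Editor's sketch to go with the isolation advice above -- not a tool from the thread: a crude timed sequential write. Run it on the file server against the local array, then on a compute node against the NFS mount, and compare the MB/s figures to see whether the array or the network/NFS path is the bottleneck. The program name and arguments are made up for the example.]

/* wtest.c -- time a large sequential write to a given path. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/time.h>

#define CHUNK (1024 * 1024)     /* 1 MB per write() */

int main(int argc, char **argv)
{
    static char buf[CHUNK];
    struct timeval t0, t1;
    double secs, mb;
    int fd, i, nchunks;

    if (argc != 3) {
        fprintf(stderr, "usage: %s <testfile> <size_in_MB>\n", argv[0]);
        return 1;
    }
    nchunks = atoi(argv[2]);
    memset(buf, 0xAA, sizeof(buf));

    fd = open(argv[1], O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) { perror(argv[1]); return 1; }

    gettimeofday(&t0, NULL);
    for (i = 0; i < nchunks; i++)
        if (write(fd, buf, CHUNK) != CHUNK) { perror("write"); return 1; }
    fsync(fd);                  /* make sure the data really reached the disk/server */
    close(fd);
    gettimeofday(&t1, NULL);

    secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
    mb   = (double)nchunks;
    printf("%.1f MB in %.2f s = %.1f MB/s\n", mb, secs, mb / secs);
    return 0;
}

For example, "./wtest /array/testfile 512" on the server versus "./wtest /array/testfile 512" on a node gives a direct local-vs-NFS comparison with the same block of code.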
If everything seems to be swell until the switch is in the mix, maybe borrow a different one to try out. As far as system configuration goes I'll leave that to the list gods. I hope I provided some value. Bill ----- Original Message ----- From: "Leonardo Magallon" To: "Beowulf List" Sent: Thursday, March 15, 2001 9:07 AM Subject: Cluster and RAID 5 Array bottleneck.( I believe) > Hi all, > > > We finally finished upgrading our beowulf from 48 to 108 processors and also > added a 523GB RAID-5 system to provide a mounting point for all of our > "drones". We went with standard metal shelves that cost about $40 installed. > Our setup has one machine with the attached RAID Array to it via a 39160 Adaptec > Card ( 160Mb/s transfer rate) at which we launch jobs. We export /home and > /array ( the disk array mount point) from this computer to all the other > machines. They then use /home to execute the app and /array to read and write > over nfs to the array. > This computer with the array attached to it talks over a syskonnect gig-e card > going directly to a port on a switch which then interconnects to others. The > "drones" are connected via Intel Ether Express cards running Fast Ethernet to > the switches. > Our problem is that apparently this setup is not performing well and we seem > to have a bottleneck either at the Array or at the network level. In regards to > the network level I have changed the numbers nfs uses to pass blocks of info in > this way: > > echo 262144 > /proc/sys/net/core/rmem_default > echo 262144 > /proc/sys/net/core/rmem_max > /etc/rc.d/init.d/nfs restart > echo 65536 > /proc/sys/net/core/rmem_default > echo 65536 > /proc/sys/net/core/rmem_max > > Our mounts are set to use 8192 as read and write block size also. > > When we start our job here, the switch passes no more than 31mb/s at any moment. > > A colleague of mine is saying that the problem is at the network level and I am > thinking that it is at the Array level because the lights on the array just keep > steadily on and the switch is not even at 25% utilization and attaching a > console to the array is mainly for setting up drives and not for monitoring. > > My colleague also copied 175Megabytes over nfs from one computer to another and > the transfers took close to 45 seconds. > > > Any comments or suggestions welcomed, > > Leo. > > > > > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From JParker at coinstar.com Fri Mar 23 08:51:43 2001 From: JParker at coinstar.com (JParker at coinstar.com) Date: Fri, 23 Mar 2001 08:51:43 -0800 Subject: Q system on a Sycld cluster. Message-ID: G'Day ! I would be interested in a look ... cheers, Jim Parker Sailboat racing is not a matter of life and death .... It is far more important than that !!! Jag Sent by: beowulf-admin at beowulf.org 03/22/01 08:11 PM To: Daniel Persson cc: beowulf at beowulf.org Subject: Re: Q system on a Sycld cluster. On Thu, 22 Mar 2001, Daniel Persson wrote: > Hi all, > > A while back there was discussion about wich batch system to use on a > Scyld cluster. However, what did you people come up to ? > > Wich bacth system could/should one use on Scyld Beowulf2 cluster ? > > Is it possible to use for ex PBS ? > > What i need is a rather simple system whitout to many fancy features - > suggestions anyone ? I wrote a very simple system that you might be interested in. 
It was designed to run programs that weren't programed for parallel processing, but for which you can specify certain command line options or input (through stdin) to specify how the program should run differently on each node. It also redirects the output to a file. Each node the job is run on gets its own file for the output. There are a few restraints on it based on how it does the I/O redirection. It requires that each job you run have a control directory. This directory stores the input you gives the program on each node, and is where the output goes. Due to this, it needs to be located somewhere that is nfs mounted by the slave nodes (such as in /home). The actual program being run must also be on an nfs mounted filesystem. If anyone is interested in this simple system, let me know and I'll clean it up for release. Hopefully in the future I will modify it to use some of the tricks that bpsh does so that the control directory and program being run won't have to be nfs mounted. (Are we going to see a libbpsh?) However, this probablly won't be there if I do a release this month or next. > > BTW - The mailinglist archive seems to have stopped at February. I've noticed this as well and pointed it out to the person responsible, but it still doesn't seem to be fixed yet :( Jag -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: attvo8mu.dat Type: application/octet-stream Size: 240 bytes Desc: not available URL: From djholm at fnal.gov Fri Mar 23 09:27:59 2001 From: djholm at fnal.gov (Don Holmgren) Date: Fri, 23 Mar 2001 11:27:59 -0600 Subject: BIOS In-Reply-To: Message-ID: We've had some limited success on L440GX motherboards with dumping, under Linux, the CMOS data area on a node configured to our liking to a file. Most, and on some motherboards, all of the BIOS settings are there. Then, again under Linux on another _identical_ (same motherboard, same BIOS revision) system we write the CMOS dump file to the CMOS data area. This often succeeds in fixing all of the settings. For very large clusters, this is a much better way of controlling the BIOS settings. We also use the serial line BIOS redirects that Steve talks about, but automating the various BIOS settings via the serial line is quite difficult - impossible without the serial redirect! Now for all the caveats. There is an existing Linux interface to the CMOS data area, via /dev/nvram, which takes care of computing and writing the standard checksum (bytes 2E and 2F) which the BIOS uses during its POST routines to verify that the CMOS data are not corrupted. You must open /dev/nvram, lseek to the byte you want to change, and read or write and the driver will handle the checksum for you. Unfortunately, some motherboards like the L440GX have an additional CRC which covers an undocumented group of bytes and which is computed in an undocumented order (and so almost impossible to reverse engineer). During POST if the CRC isn't correct, the BIOS resets all the settings to their default values. Also, some motherboards use more than the "standard" (if such a standard exists) number of bytes (/dev/nvram lets you manipulate 50 bytes, IIRC). Since L440GX boards use more than the standard set of CMOS locations, we had to write codes that use ioports 0x70 and 0x71 to dump or to write about 240 bytes in the nvram area. If only the BIOS and motherboard manufactures would post memory maps for the nvram area! 
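[Editor's sketch of the /dev/nvram save/restore approach Don describes -- an illustration, not code from the original post. It assumes the stock Linux nvram driver and only touches the bytes that driver exposes (50 bytes as Don recalls; some kernels expose more); the extended L440GX area behind ports 0x70/0x71 and its undocumented CRC are deliberately left out.]

/* nvramdump.c -- save or restore the CMOS bytes exposed through /dev/nvram.
 * The driver recomputes the standard checksum (bytes 2E/2F) on write.
 * Must be run as root.  Usage: nvramdump save <file> | nvramdump load <file> */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>

#define NVRAM_BYTES 50          /* bytes handled here; driver-dependent */

int main(int argc, char **argv)
{
    unsigned char buf[NVRAM_BYTES];
    int save, nv, file, n;

    if (argc != 3 || (strcmp(argv[1], "save") && strcmp(argv[1], "load"))) {
        fprintf(stderr, "usage: %s save|load <file>\n", argv[0]);
        return 1;
    }
    save = (strcmp(argv[1], "save") == 0);

    nv = open("/dev/nvram", save ? O_RDONLY : O_RDWR);
    if (nv < 0) { perror("/dev/nvram"); return 1; }

    if (save) {
        n = read(nv, buf, NVRAM_BYTES);
        if (n <= 0) { perror("read nvram"); return 1; }
        file = open(argv[2], O_WRONLY | O_CREAT | O_TRUNC, 0600);
        if (file < 0 || write(file, buf, n) != n) { perror(argv[2]); return 1; }
        printf("saved %d bytes\n", n);
    } else {
        file = open(argv[2], O_RDONLY);
        if (file < 0) { perror(argv[2]); return 1; }
        n = read(file, buf, NVRAM_BYTES);
        if (n <= 0) { perror("read file"); return 1; }
        if (write(nv, buf, n) != n) { perror("write nvram"); return 1; }
        printf("wrote %d bytes\n", n);
    }
    return 0;
}

The workflow Don outlines would then be: configure one node by hand, run "nvramdump save good.cmos" on it, and run "nvramdump load good.cmos" on each identical node -- same motherboard and same BIOS revision, as he stresses.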
Don Holmgren Fermilab On Fri, 23 Mar 2001, Steven Timm wrote: > It depends to some extent on which motherboard you are running. > The Intel L440GX series, which is most of what we have here at FNAL > currently, allows to direct the BIOS I/O out one of the COM ports, > usually COM2. We have some clusters where we have hooked these > up to a console server and been able to access all the BIOS from there. > > Intel has a very nice client to do this on the Windows side as well, > unfortunately it can only cover four nodes at once and they are not > planning to expand it. > > It would in theory be possible to extend vacm (from VA Linux) to do > this but the last I talked to their developers they weren't > moving this way. > > Steve Timm > > ------------------------------------------------------------------ > Steven C. Timm (630) 840-8525 timm at fnal.gov http://home.fnal.gov/~timm/ > Fermilab Computing Division/Operating Systems Support > Scientific Computing Support Group--Computing Farms Operations > > On Fri, 23 Mar 2001, Jared Hodge wrote: > > > Howdy (That's Texan for "hello"), > > I've got a question about setting BIOS settings on a cluster's nodes. > > The way I've been going about making changes is moving the keyboard and > > monitor to all of the nodes. I'm sure this isn't what everyone does. I > > imagine there's a way to put the settings on a floppy or maybe even send > > them over the network, but I don't have any experience with this. Has > > anyone done this? What's the typical way of doing this on one of the > > really large systems where moving a keyboard and monitor from node to > > node would be impractical? > > -- > > Jared Hodge > > Institute for Advanced Technology > > The University of Texas at Austin > > 3925 W. Braker Lane, Suite 400 > > Austin, Texas 78759 > > > > Phone: 512-232-4460 > > FAX: 512-471-9096 > > Email: Jared_Hodge at iat.utexas.edu From newt at scyld.com Fri Mar 23 09:32:29 2001 From: newt at scyld.com (Daniel Ridge) Date: Fri, 23 Mar 2001 12:32:29 -0500 (EST) Subject: 3COM 3C905C(X) and Scyld In-Reply-To: <5.0.2.1.2.20010324090619.028a1ec0@172.17.0.107> Message-ID: On Sat, 24 Mar 2001, Omar Kilani wrote: > address. After this, the machine hangs with: > > SIOCSFFLAGS: No such device Do you get this message immediately after the rarp stage succeeds? If not, I wonder if you have regenerated both your phase 1 and phase 2 boot images to include the new 3com driver. To regenerate the phase 2 image, try 'beoboot -2 -n' and reboot a node to see if the problem fixes itself. Regards, Dan Ridge Scyld Computing Corporation From agrajag at linuxpower.org Fri Mar 23 10:56:43 2001 From: agrajag at linuxpower.org (Jag) Date: Fri, 23 Mar 2001 10:56:43 -0800 Subject: Scyld Beowulf problem - PLEASE HELP In-Reply-To: <5747AD9390EED211A1E000805FA719ED03B5AC87@PROWLER>; from RYANNET@techdata.com on Wed, Nov 22, 2000 at 02:08:31PM -0500 References: <5747AD9390EED211A1E000805FA719ED03B5AC87@PROWLER> Message-ID: <20010323105643.N13901@kotako.analogself.com> On Wed, 22 Nov 2000, Yannetta, Robert wrote: > I have a 3-PC beowulf cluster that will not work. During the "Quick Setup" > install, two things are not made clear in the instructions on the CD: > > 1: During slave node disk partitioning, typing "beoboot-install -a hda" > results in the error message "Failed to read partition table from hda on > node 0." That should be "beoboot-install -a /dev/hda" > > 2: The next instruction: "Update the file /etc/beowulf/fstab on the front > end machine." 
How should this be added and with what information? The > instructions do not say. This is just a matter of making what partitions are mounted match with what partitions you made. The instructions can't say exactly as there's more than one way to partition the drives on the slave node. If you used beofdisk -d to get a default partition layout, then commenting the $RAMDISK line in /etc/beowulf/fstab and uncommenting the two lines that begin with /dev/hda *should* work. But as I don't know exactly how your harddrives are partitioned, I can't say if that's right or not. Jag -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 232 bytes Desc: not available URL: From rgb at phy.duke.edu Fri Mar 23 11:36:23 2001 From: rgb at phy.duke.edu (Robert G. Brown) Date: Fri, 23 Mar 2001 14:36:23 -0500 (EST) Subject: OT: computationally hard or not (algorithm question) In-Reply-To: <20010323154852.A9050@rakete.joerdens.de> Message-ID: On Fri, 23 Mar 2001, Frank Joerdens wrote: > Consider an n-dimensional array containing something on the order of > 10^4 (i.e. thens of thousands) elements, where each number may assume an > integer value between 1-12. For the sake of the argument (it's easier to > visualize; actually, n would be something on the order of a dozen), > assume that n is 3, so you're basically looking at a box-shaped cloud of > dots in 3-D space, where each dot's position is defined by its > three-dimensional carthesian co-ordinates which are stored in the array. > > Now, take any one of those dots, search for the 10 (some figure on this > order of magnitude) dots which are closest to it, and order those by > proximity to the origin of the search. This sounds pretty hard, > computationally. Not for 3 Dimensions, since there the number of > possible positions is only 12^3 = 1728, but for 12, its 12^12 = > 8916100448256. I guess you'd have to find some efficient shortest-pair > algorithm that works for n dimensions (my algorithm book only has one > that works for 2), find the shortest pair, remove it from the array, > then find the next value etc.. > > Does anyone have an idea, or a pointer to some reading matter about it, > as to how computationally expensive that would be? Any guess as to what > kind of hardware you'd have to throw at it (I could settle for less > dimensions if it turns out that I can't afford the hardware) when I want > the result set within a couple of seconds at the _very_ most? > > Does anyone have a link to an article about the best algorithm for this > kind of problem? Not per se, but this problem (or problems that significantly overlap with this problem) arise fairly often in both physics and in multivariate statistics. I'm not absolutely certain that I understand your problem. As you describe it, it doesn't sound all that computationally complex. However, it does sound very close in structure to problems that are complex. There are two generically different ways to approach the solution, depending on which category it is in. You indicate that you intend to pick a point and then look for its ten nearest neighbors (according to some metric you haven't mentioned, so I'll assume a flat Euclidean metric). 
If I understand this correctly (and there aren't any complications that kick the problem into the "complex" category) a solution might look like: typedef struct { double coord[12]; } element; element elements[10000]; double distance[10000] then of course it is simple to pick an element (or a point), evaluate the distance of each element in the elements from the element or point (storing the result in e.g. distance[i]) and sort the indices of distance (not the list itself). This scales no worse than N plus the scaling of the sort algorithm used, and the scaling has nothing to do with the number of points in the space. Indeed, each element can contain real number coordinates (hence double coord[12]) instead of one of 12 possible integer values with some integer metric and it won't make a lot of difference in the time required for the computation and no difference at all in the scaling and a beowulf is likely not needed unless you plan to do this a LOT. Alternatively, the problem may be something where you have to store the elements in an array that maps to the spatial coordinates -- you have a ************Space array addressable as Space[a][b][c][d][e][f][g][h][i][j][k][l] where each array element contains either 0 or one of 12 values. You then have to pick a point (a nontrivial operation in and of itself, as if a-l can be 1-12 each and there are only 10^4 slots that aren't 0, you'll have to pick random coordinates a lot of times to find one that doesn't contain 0. Since this array is sparse, the sane thing would be to invert it as described above; however, one COULD always e.g. search concentric 12-cubes around the point until one finds ten entries. This would scale terribly -- remember the space is only 0.0000001% occupied (or something like that -- I'm not doing the arithmetic carefully) so you'd have to search rather large cubes around the point before you were at all "likely" to find 10 hits if the elements aren't spatially bunched in their distribution. There are a number of physics and statistics problems that lie between these two extremes -- the dimensionality of the space is high (perhaps much higher than 12) but one cannot simply write down a way of manipulating the elements as a simple vector. In that event methods of searching for optima in high dimensional spaces come into play -- conjugate gradient methods or genetic optimization methods or simulated annealing methods. A classic example is the travelling salesman problem, where there are e.g. 12 cities that must be visited in some order and the goal is to minimize a cost function associated with each pairwise connection between cities. The space of possible solutions scales rather poorly with the number of cities, and finding a >>good<< solution isn't easy or particularly likely with random search methods, let alone finding the >>best<< solution. GA's or simulated annealing both work much better; gradient methods are likely to be irrelevant if the cost functions have no continuous differential relationships to be exploited. In physics, related problems are e.g. the spin glass, where one seeks a configuration of spins that minimizes a free energy function where a random "cost function" is the sign of the spin coupling between nearest neighbor lattice sites. 
Both of these problems are known to be "difficult" in that finding methods that scale well with the number of spins or cities is not easy, and both of them are >>very<< difficult from the point of view of being able to find "the" best solution instead of an "acceptably good solution" (solution methods are almost invariably stochastic, so the best one can give is a probability that a given solution is the best one unless an exhaustive search is completed). It isn't completely clear that your problem is in this latter category of computational complexity, but if it is you'll need to real a whole lot about search and optimization methodology for high dimensional spaces before proceeding. Hope this helps you begin to get a grip on the problem. I may be way off base in my understanding of what you want to do -- in that event, please describe the problem in more detail. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu From acgetchell at ucdavis.edu Fri Mar 23 13:19:01 2001 From: acgetchell at ucdavis.edu (Adam Getchell) Date: Fri, 23 Mar 2001 13:19:01 -0800 Subject: computationally hard or not (algorithm question) References: <20010323154852.A9050@rakete.joerdens.de> Message-ID: <004d01c0b3de$e5993ab0$8c11eda9@lanitza> Hi Frank, I'm not an expert or anything, but I happen to have "Introduction to Algorithms" by Thomas Cormen, Charles Leiserson, and Ronald Rivest ("CLR" for short) and Chapter 35, Computational Geometry, might have what you're looking for. The closest points algorithm they used is based on divide and conquer, and runs in O(nlgn) time. It's a 2-D algorithm, but I don't *think* generalizing to n-dimensions will change things since the first step is to do a sort on the X and Y coordinates which is O(nlgn) time per dimension, but done serially so running time shouldn't be affected. (Well, actually, you store your points in an appropriate data structure and keep a table of pointers for each dimension and sort those ....) However, if you want more immediate satisfaction the basic algorithm is also described here: http://www.cs.cornell.edu/cs409-sp99/Lectures/Lecture%2010/sld002.htm The divide part shouldn't be hard, since the line will still divide your region in to (XL, XR) (YL,YR) (ZL,ZR) ... for each dimension. I'm not certain how constraining your "box" will be; for 2-D you can reduce it to checking 7 other neighbors, but generalized for a hypercube of n dimensions the points might be a bit larger. You say 12 dimensions? CLR mentions using L-distance (or Manhattan distance) instead of Euclidean distance ... perhaps that may make it more tractable. A practicing computer scientist probably has a much better idea than my naive ramblings .... Hope that helps, --Adam acgetchell at ucdavis.edu ----- Original Message ----- From: "Frank Joerdens" To: Cc: ; ; Sent: Friday, March 23, 2001 6:48 AM Subject: OT: computationally hard or not (algorithm question) > I've been lurking on this this list for a few months cuz I think > beowulfs are cool - but so far I've had neither the money nor any use > for one. Now I have a problem which might fall into the general vicinity > of beowulfery (but then again, it might not). You might have a pointer > to the relevant resources (Yes! I am willing and able to RTFM!): > > Consider an n-dimensional array containing something on the order of > 10^4 (i.e. 
thens of thousands) elements, where each number may assume an > integer value between 1-12. For the sake of the argument (it's easier to > visualize; actually, n would be something on the order of a dozen), > assume that n is 3, so you're basically looking at a box-shaped cloud of > dots in 3-D space, where each dot's position is defined by its > three-dimensional carthesian co-ordinates which are stored in the array. > > Now, take any one of those dots, search for the 10 (some figure on this > order of magnitude) dots which are closest to it, and order those by > proximity to the origin of the search. This sounds pretty hard, > computationally. Not for 3 Dimensions, since there the number of > possible positions is only 12^3 = 1728, but for 12, its 12^12 = > 8916100448256. I guess you'd have to find some efficient shortest-pair > algorithm that works for n dimensions (my algorithm book only has one > that works for 2), find the shortest pair, remove it from the array, > then find the next value etc.. > > Does anyone have an idea, or a pointer to some reading matter about it, > as to how computationally expensive that would be? Any guess as to what > kind of hardware you'd have to throw at it (I could settle for less > dimensions if it turns out that I can't afford the hardware) when I want > the result set within a couple of seconds at the _very_ most? > > Does anyone have a link to an article about the best algorithm for this > kind of problem? > > Many thanks in advance, > Frank > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > From newt at scyld.com Fri Mar 23 13:34:34 2001 From: newt at scyld.com (Daniel Ridge) Date: Fri, 23 Mar 2001 16:34:34 -0500 (EST) Subject: Scyld Beowulf channel bonding In-Reply-To: <3A24C866.6C785EDA@sis.it> Message-ID: I've never set up channel bonding on Scyld Beowulf -- but I have a quick note about insmod on Scyld. > I'm looking to use ethernet channel bonding on my Scyld Beowulf 2 > cluster. > I do the following for two 2 node cluster (Master and one slave): > 1) boot up slave node without bonding; > 2) after the node is up : > bpcp bonding.o 0:/tmp > bpsh 0 /sbin/insmod /tmp/bonding.o You can do this in one operation with: insmod --node 0 bonding.o Regards, Dan Ridge Scyld Computing Corporation From joerg at jasper.stanford.edu Fri Mar 23 13:45:38 2001 From: joerg at jasper.stanford.edu (=?iso-8859-1?Q?J=F6rg?= Kaduk) Date: Fri, 23 Mar 2001 13:45:38 -0800 Subject: computationally hard or not (algorithm question) References: <20010323154852.A9050@rakete.joerdens.de> <004d01c0b3de$e5993ab0$8c11eda9@lanitza> Message-ID: <3ABBC402.35CD668A@jasper.stanford.edu> Hi all, I am not an expert on anything either, but I would follow Adam in saying Frank should look at geometric algorithms. I also think, depending on the problem it should be possible to avoid to evaluate all point pair distances. I think, there should be algorithms, which determine hyperspaces from properties of the original set (convex would be nice). Using these hyperspaces one should be able to reduce the amount of sorting significantly. In 2d there are ways to cut the original set using hyperspaces (in this case lines) and search only in the area determined by the cuts. Unfortunately I do not know anything more about this, sorry. 
Good luck, Joerg Adam Getchell wrote: > > Hi Frank, > > I'm not an expert or anything, but I happen to have "Introduction to > Algorithms" by Thomas Cormen, Charles Leiserson, and Ronald Rivest ("CLR" > for short) and Chapter 35, Computational Geometry, might have what you're > looking for. > > The closest points algorithm they used is based on divide and conquer, and > runs in O(nlgn) time. It's a 2-D algorithm, but I don't *think* generalizing > to n-dimensions will change things since the first step is to do a sort on > the X and Y coordinates which is O(nlgn) time per dimension, but done > serially so running time shouldn't be affected. (Well, actually, you store > your points in an appropriate data structure and keep a table of pointers > for each dimension and sort those ....) > > However, if you want more immediate satisfaction the basic algorithm is also > described here: > > http://www.cs.cornell.edu/cs409-sp99/Lectures/Lecture%2010/sld002.htm > > The divide part shouldn't be hard, since the line will still divide your > region in to (XL, XR) (YL,YR) (ZL,ZR) ... for each dimension. I'm not > certain how constraining your "box" will be; for 2-D you can reduce it to > checking 7 other neighbors, but generalized for a hypercube of n dimensions > the points might be a bit larger. You say 12 dimensions? CLR mentions using > L-distance (or Manhattan distance) instead of Euclidean distance ... perhaps > that may make it more tractable. > > A practicing computer scientist probably has a much better idea than my > naive ramblings .... > > Hope that helps, > > --Adam > acgetchell at ucdavis.edu > > ----- Original Message ----- > From: "Frank Joerdens" > To: > Cc: ; ; > Sent: Friday, March 23, 2001 6:48 AM > Subject: OT: computationally hard or not (algorithm question) > > > I've been lurking on this this list for a few months cuz I think > > beowulfs are cool - but so far I've had neither the money nor any use > > for one. Now I have a problem which might fall into the general vicinity > > of beowulfery (but then again, it might not). You might have a pointer > > to the relevant resources (Yes! I am willing and able to RTFM!): > > > > Consider an n-dimensional array containing something on the order of > > 10^4 (i.e. thens of thousands) elements, where each number may assume an > > integer value between 1-12. For the sake of the argument (it's easier to > > visualize; actually, n would be something on the order of a dozen), > > assume that n is 3, so you're basically looking at a box-shaped cloud of > > dots in 3-D space, where each dot's position is defined by its > > three-dimensional carthesian co-ordinates which are stored in the array. > > > > Now, take any one of those dots, search for the 10 (some figure on this > > order of magnitude) dots which are closest to it, and order those by > > proximity to the origin of the search. This sounds pretty hard, > > computationally. Not for 3 Dimensions, since there the number of > > possible positions is only 12^3 = 1728, but for 12, its 12^12 = > > 8916100448256. I guess you'd have to find some efficient shortest-pair > > algorithm that works for n dimensions (my algorithm book only has one > > that works for 2), find the shortest pair, remove it from the array, > > then find the next value etc.. > > > > Does anyone have an idea, or a pointer to some reading matter about it, > > as to how computationally expensive that would be? 
Any guess as to what > > kind of hardware you'd have to throw at it (I could settle for less > > dimensions if it turns out that I can't afford the hardware) when I want > > the result set within a couple of seconds at the _very_ most? > > > > Does anyone have a link to an article about the best algorithm for this > > kind of problem? > > > > Many thanks in advance, > > Frank > > > > _______________________________________________ > > Beowulf mailing list, Beowulf at beowulf.org > > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Jörg Kaduk Tel.: 1 650 325 1521 x 416 Carnegie Institution of Washington FAX: 1 650 325 6857 Dept. of Plant Biology 260 Panama Street joerg at jasper.stanford.edu Stanford, CA 94305-4150 http://Jasper.Stanford.EDU/joerg/ From timothy.g.mattson at intel.com Fri Mar 23 14:02:57 2001 From: timothy.g.mattson at intel.com (Mattson, Timothy G) Date: Fri, 23 Mar 2001 14:02:57 -0800 Subject: MPI/Beowulf vs. SM Programming/OpenMP Message-ID: Chris, This is a very complex issue. For any generalization I make, you can find conflicting cases. So take all of the following with a grain of salt. First, here are some advantages of OpenMP: OpenMP is a much easier way to write parallel programs than MPI. It is much smaller than MPI so it's much easier to learn. OpenMP encourages the writing of parallel programs that are semantically consistent with their serial counterparts. In fact, "encourage" is too weak a term -- it's actually a bit tricky to write OpenMP programs that are not sequentially consistent. This is very important to software developers who need to support both parallel and sequential hardware. Finally, in many cases, you can use OpenMP to add parallelism incrementally. You start with a serial code, and bit by bit add parallelism until you achieve the desired performance. I'm not saying this is impossible with MPI, but it's not a well-supported mode of programming in the MPI space. ... But there is a downside... OpenMP is less general than MPI. I've been writing parallel programs for almost 2 decades and I can honestly say that I've only encountered a few algorithms I can't express with MPI. It can get horrendously difficult, but MPI is quite general. OpenMP, on the other hand, is geared toward "loop splitting" or loosely synchronous SPMD algorithms. A general MPMD algorithm with lots of asynchronous events would be hard to do with OpenMP. (Actually, it can be hard with MPICH as well, but then you can go with MPI-LAM or PVM.) MPI also maps onto a wider range of hardware than OpenMP. Yes, there are some attempts to map OpenMP onto distributed memory systems, but this will only work for a subset of OpenMP applications. On the other hand, MPI does quite well on shared memory systems. This last point is very important. For most people on this list -- clusters are the parallel architecture of choice. OpenMP works well on SMP nodes within the cluster, but if you want to parallelize jobs across the cluster, MPI is a much better option than OpenMP. So you need to look at your application to figure out what sort of parallel algorithms you'll be using, and you need to understand your target hardware to make sure the code will run where you need it to.
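To make the "loop splitting" style concrete, here is a minimal sketch -- my own, not taken from any particular package -- of the same dot product written first with an OpenMP directive and then with the decomposition and reduction spelled out by hand in MPI:

    /* OpenMP version: the directive splits the loop across threads
       and combines the partial sums; a serial build still works. */
    #define N 1000000
    double a[N], b[N];

    double dot_openmp(void)
    {
        double sum = 0.0;
        int i;
    #pragma omp parallel for reduction(+:sum)
        for (i = 0; i < N; i++)
            sum += a[i] * b[i];
        return sum;
    }

    /* MPI version: you decide who owns which elements and you combine
       the partial results yourself.  (Here every rank holds a copy of
       a[] and b[]; a real code would distribute the slices.) */
    #include <mpi.h>

    double dot_mpi(int *argc, char ***argv)
    {
        int rank, size, i;
        double local = 0.0, sum = 0.0;

        MPI_Init(argc, argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        for (i = rank; i < N; i += size)    /* every size-th element */
            local += a[i] * b[i];
        MPI_Allreduce(&local, &sum, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
        MPI_Finalize();
        return sum;
    }

Same arithmetic either way; the difference is how much of the bookkeeping you write yourself.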
With that information in hand, you can decide whether to go with OpenMP or MPI. If its clusters you want to play with, chance are MPI will better suit you. There is much more to say, but that will hopefully be enough to get you started. --Tim Mattson -----Original Message----- From: Chris Richard Adams [mailto:chrisa at aspatech.com.br] Sent: Friday, March 23, 2001 6:07 AM To: Beowulf (E-mail) Subject: MPI/Beowulf vs. SM Programming/OpenMP Hi everyone; I've been coding for almost 5 years in a mix of C, Java and now Python. I am now focused on learning more about parallel programming for applications/algorithms related to genetic sequencial analysis within databases - bioinformatics. The last month or so I've been studying about different methods that exist and I'm getting confused about where to start. I was convinced after reading material on Beowulf that it was the way to go, but I've recently stumbled upon the OpenMP site and read more about shared memory techniques. It seems to me for the type of applications I'm focusing on...this is a much better approach because I don't have to spend so much time learning MPI and all the communications. I can just focus on learning the algorthms (this is their big sell point anyway). 1.) Is this really true? 2.) Can anyone point out how Beowulf/MPI is the better solution and learning path? 3.) Is their room for both Beowulf/MPI and Shared-Mem tech. in the future? I would really appreciate hearing your feedback. Regards, Chris _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at coffee.psychology.mcmaster.ca Fri Mar 23 15:00:07 2001 From: hahn at coffee.psychology.mcmaster.ca (Mark Hahn) Date: Fri, 23 Mar 2001 18:00:07 -0500 (EST) Subject: OT: computationally hard or not (algorithm question) In-Reply-To: <20010323154852.A9050@rakete.joerdens.de> Message-ID: > Consider an n-dimensional array containing something on the order of > 10^4 (i.e. thens of thousands) elements, where each number may assume an 10K elements is a fairly small number. especially when the space is 12^12 (density is ~ 1/900K). > Now, take any one of those dots, search for the 10 (some figure on this > order of magnitude) dots which are closest to it, and order those by why 10? > 8916100448256. I guess you'd have to find some efficient shortest-pair > algorithm that works for n dimensions (my algorithm book only has one > that works for 2), find the shortest pair, remove it from the array, > then find the next value etc.. do you really want to extract pairs like this? it sounds a bit like a dendrogram (from clustering). the obvious place to look is any algorithms book (kD trees), and possibly also the clustering literature. sorting the dimensions based on their span or variance might be a smart heuristic, too. I've been thinking of something similar: clustering ~25600-dimensional data. the data are actually 400-sample timecourses recorded from 64 scalp electrodes, and generally we'd have a few thousand of them to cluster. (actually just rejecting outliers would probably be of immediate practical interest...) there's also a large body of literature on analyzing similarity in data with many dimensions, each of which is 26 deep, and in analyzing data with many, many dimensions, each of which is 4 deep. 
(text and genetics if you didn't guess ;) > dimensions if it turns out that I can't afford the hardware) when I want > the result set within a couple of seconds at the _very_ most? the enclosed program run in around 7 seconds on my cheap duron/600, and is a fairly obvious implementation of what you described. it's also O(N^2) in the number of elements, but linear in the length of each string. the next obvious step would be to use a kD tree, or something a little smarter than the quadratic rescanning of the array. regards, mark hahn. -------------- next part -------------- #include #include #include #include typedef char atom; static inline double sq(atom a) { return double(a)*double(a); } const unsigned itemLength = 12; class Item { atom *data; public: Item() { data = new atom[itemLength]; for (unsigned i=0; i items(itemCount); for (unsigned a=0; a You might be reducing your throughput by running three NICS on the same physical segment through the same switch. The bottleneck will be the I/O on the file server because it has to serve I/O through three NICS to get to the same set of disks. Well, you might check to see if any logical subnets are set up (you know, are the computers in the cluster grouped together with diferent IP domains?). There may have been a reason for that. Good luck. --Richard -----Original Message----- From: Christian Storm [mailto:chr at kisac.cgr.ki.se] Sent: Tuesday, March 20, 2001 12:33 PM To: beowulf at beowulf.org Subject: NFS file server performance Hi, I just took over a Beowulf cluster and I'm having having a great time reconfiguring everything ... :) It is partly used on large databases (up to 20 GB). Local storage is not possible therefore the databases reside on a disk of a dedicated file server that is mounted on all nodes of the cluster. Here are the questions: 1. To improve performance two additional networks card were put into the file server. Then the cluster was splitted in three networks (all sitting on the same switch). Each subnetwork is mounting the file throw a different NIC. These *seems* to work. But it is rather static and it is not very elegant ... . I experienced with the new 2.4 bridiging feature (by assigning all 3 NICs to a bridge), but it just seems to add redundancy, not performance. I assume some kind of channel bonding would be needed - as far as I know supported by the network-cards (3c980) but not by the driver (3c90x). Anybody knows a solution ? 2. What would be good number of NFS Daemons to run on the file-server ? (accessed by 12 nodes through 3 NICs, PIII500 system with SCSI) Currently I'm running 16 with socket input queue resized to 1MB. Thanks in advance Christian _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -------------- next part -------------- An HTML attachment was scrubbed... URL: From lindahl at conservativecomputer.com Fri Mar 23 16:45:57 2001 From: lindahl at conservativecomputer.com (Greg Lindahl) Date: Fri, 23 Mar 2001 19:45:57 -0500 Subject: parallelizing OpenMP apps In-Reply-To: ; from timothy.g.mattson@intel.com on Thu, Mar 22, 2001 at 11:11:08AM -0800 References: Message-ID: <20010323194557.E2533@wumpus.hpti.com> On Thu, Mar 22, 2001 at 11:11:08AM -0800, Mattson, Timothy G wrote: > Mapping OpenMP onto an HPF-style programming environment can be as-hard or > harder than going straight to MPI. That's true in general. 
But this tool (SMS) was designed for weather and climate codes. You actually start by throwing away all the OpenMP cruft and converting to a serial code. Then the tool can insert most of the data layout directives for you if you only have one data decomposition, which is fairly common. > I haven't personnally used it, but the OMNI compiler project at RWCP lets > you run OpenMP programs on a cluster. As Craig Tierney points out, this isn't true, but if it was, I assure you that scalability would be poor for programs that aren't embarrassingly parallel. So it would depend on how many jobs and of what size you're wanting to run: if you want to run single codes on a large numbers of nodes, OpenMP isn't going to get you there. I believe that the Portland Group's HPF compiler does have the ability to compile down to message passing of a couple of types. But scaling is poor compared to MPI, because the compiler can't combine messages as well as a human or SMS can. If you're praying for a 2X speedup, it may get you there. If you want 100X... -- g From rajkumar at csse.monash.edu.au Fri Mar 23 22:56:51 2001 From: rajkumar at csse.monash.edu.au (Rajkumar Buyya) Date: Sat, 24 Mar 2001 17:56:51 +1100 Subject: [Fwd: NSF/TFCC Workshop on Teaching Computing] Message-ID: <3ABC4533.E5113714@csse.monash.edu.au> -------- Original Message -------- Subject: NSF/TFCC Workshop on Teaching Computing Date: Fri, 23 Mar 2001 11:06:15 -0400 From: Barry Wilkinson Reply-To: abw at UNCC.EDU Organization: University of North Carolina at Charlotte To: uparc-l at bucknell.edu I apologize if you receive this message multiple times. To be removed for this mailing list, please send me email at abw at uncc.edu. Please note this workshop is free to faculty and includes accommodation but space is limited! Barry Wilkinson University of North Carolina at Charlotte ________________________________________________________________ ADVANCE ANNOUNCEMENT NSF/TFCC Workshop on Teaching Cluster Computing Wednesday July 11th - Friday July 13th, 2001 Department of Computer Science University of North Carolina at Charlotte http://www.cs.uncc.edu/~abw/CCworkshop2001/ This intensive workshop, funded by the National Science Foundation* and sponsored by the IEEE Task Force on Cluster Computing, provides educators with materials and formal instruction to enable them to teach cluster computing at the undergraduate and graduate level. Participants will receive formal lectures and guided hands-on experience using a dedicated cluster of SUN computers. In addition to conventional message-passing cluster computing using industry standard message-passing tools, participants will also learn distributed shared memory programming on a cluster using readily available software. Finally, a full day is dedicated to how to obtain and install the software needed, and to teach cluster computing. Comprehensive educational materials, including a textbook, will be provided for use in the workshop and in their courses after returning to their home institution. Post-workshop follow-up/support will be available. The workshop will last three days and will take place in the Department of Computer Science at the University of North Carolina at Charlotte. There are no fees for this workshop. Accommodation and meals will be provided at the University of North Carolina at Charlotte at no charge to the participants. However, participants are expected to provide for their own travel to and from Charlotte. 
In unusual circumstances, some travel expenses of participants may be paid but only if the participants cannot obtain needed support and there are workshop funds available. Contact the workshop organizer for more information. Provisional Timetable - see http://www.cs.uncc.edu/~abw/CCworkshop2001/ Organizer and Instructor: Barry Wilkinson, Professor Department of Computer Science University of North Carolina at Charlotte (704) 687 4879 abw at uncc.edu (preferred) TO REGISTER Send request by email to organizer at abw at uncc.edu giving name, position, and affiliation (full address). The workshop is for faculty/instructors who are interested to teaching cluster computing at their own institution. Basic knowledge of C is assumed. Space on this workshop is limited to a maximum of 15 participants. *Note: The workshop is contingent upon funding by the National Science Foundation, which is anticipated but not yet formally approved. From frank at joerdens.de Sat Mar 24 08:45:24 2001 From: frank at joerdens.de (Frank Joerdens) Date: Sat, 24 Mar 2001 17:45:24 +0100 Subject: computationally hard or not (algorithm question) In-Reply-To: <004d01c0b3de$e5993ab0$8c11eda9@lanitza>; from acgetchell@ucdavis.edu on Fri, Mar 23, 2001 at 01:19:01PM -0800 References: <20010323154852.A9050@rakete.joerdens.de> <004d01c0b3de$e5993ab0$8c11eda9@lanitza> Message-ID: <20010324174524.A14930@rakete.joerdens.de> On Fri, Mar 23, 2001 at 01:19:01PM -0800, Adam Getchell wrote: > Hi Frank, > > I'm not an expert or anything, but I happen to have "Introduction to > Algorithms" by Thomas Cormen, Charles Leiserson, and Ronald Rivest ("CLR" > for short) and Chapter 35, Computational Geometry, might have what you're > looking for. Got that too :), this is where I found the 2D shortest-pair algorithm I mentioned. Thanks, Frank From frank at joerdens.de Sat Mar 24 10:47:01 2001 From: frank at joerdens.de (Frank Joerdens) Date: Sat, 24 Mar 2001 19:47:01 +0100 Subject: OT: computationally hard or not (algorithm question) In-Reply-To: ; from rgb@phy.duke.edu on Fri, Mar 23, 2001 at 02:36:23PM -0500 References: <20010323154852.A9050@rakete.joerdens.de> Message-ID: <20010324194701.A15121@rakete.joerdens.de> On Fri, Mar 23, 2001 at 02:36:23PM -0500, Robert G. Brown wrote: [ . . . ] > You indicate that you intend to pick a point and then look for its ten > nearest neighbors (according to some metric you haven't mentioned, so > I'll assume a flat Euclidean metric). If I understand this correctly > (and there aren't any complications that kick the problem into the > "complex" category) a solution might look like: > > typedef struct { > double coord[12]; > } element; > > element elements[10000]; > double distance[10000] > > then of course it is simple to pick an element (or a point), evaluate > the distance of each element in the elements from the element or point > (storing the result in e.g. distance[i]) and sort the indices of > distance (not the list itself). This scales no worse than N plus the > scaling of the sort algorithm used, and the scaling has nothing to do > with the number of points in the space. Indeed, each element can > contain real number coordinates (hence double coord[12]) instead of one > of 12 possible integer values with some integer metric and it won't make > a lot of difference in the time required for the computation and no > difference at all in the scaling and a beowulf is likely not needed > unless you plan to do this a LOT. Hm, true. It's actually not as complicated as I imagined! 
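Something like this, I guess -- a quick sketch of your suggestion, with made-up names, squared Euclidean distance, and random data standing in for the real array (switching to the Manhattan distance that Adam's CLR chapter mentions would be a one-line change):

    #include <stdio.h>
    #include <stdlib.h>

    #define NELEM 10000   /* number of stored points  */
    #define NDIM  12      /* dimensions, values 1..12 */
    #define NBEST 10      /* neighbours to report     */

    typedef struct { int coord[NDIM]; } element;

    static element elements[NELEM];
    static double  distance[NELEM];

    /* squared Euclidean distance; for Manhattan distance,
       sum the absolute differences instead */
    static double dist2(const element *a, const element *b)
    {
        double s = 0.0;
        int k;
        for (k = 0; k < NDIM; k++) {
            double d = a->coord[k] - b->coord[k];
            s += d * d;
        }
        return s;
    }

    /* sort indices by distance[], smallest first */
    static int by_distance(const void *pa, const void *pb)
    {
        double da = distance[*(const int *)pa];
        double db = distance[*(const int *)pb];
        return (da > db) - (da < db);
    }

    int main(void)
    {
        static int order[NELEM];
        element query;
        int i, k;

        /* random coordinates just to have something to search */
        for (i = 0; i < NELEM; i++)
            for (k = 0; k < NDIM; k++)
                elements[i].coord[k] = rand() % 12 + 1;
        for (k = 0; k < NDIM; k++)
            query.coord[k] = rand() % 12 + 1;

        /* one pass to compute distances, then sort the indices only */
        for (i = 0; i < NELEM; i++) {
            distance[i] = dist2(&query, &elements[i]);
            order[i] = i;
        }
        qsort(order, NELEM, sizeof(int), by_distance);

        for (i = 0; i < NBEST; i++)
            printf("#%d: element %d, squared distance %.0f\n",
                   i + 1, order[i], distance[order[i]]);
        return 0;
    }

One pass over the 10^4 elements plus an O(N log N) sort of the indices, so it should come in well under a second on a single node.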
I'll have to play with this a bit more to figure out whether it would be sufficiently efficient but I think it might actually be that simple. > > Alternatively, the problem may be something where you have to store the > elements in an array that maps to the spatial coordinates -- you have a > ************Space array addressable as > Space[a][b][c][d][e][f][g][h][i][j][k][l] where each array element > contains either 0 or one of 12 values. You then have to pick a point (a > nontrivial operation in and of itself, as if a-l can be 1-12 each and > there are only 10^4 slots that aren't 0, you'll have to pick random > coordinates a lot of times to find one that doesn't contain 0. Since > this array is sparse, the sane thing would be to invert it as described > above; however, one COULD always e.g. search concentric 12-cubes around > the point until one finds ten entries. This would scale terribly -- > remember the space is only 0.0000001% occupied (or something like that > -- I'm not doing the arithmetic carefully) so you'd have to search > rather large cubes around the point before you were at all "likely" to > find 10 hits if the elements aren't spatially bunched in their > distribution. Someone else suggested this approach and I initially thought it would be the solution. But you're obviously right in that the sparseness of the array would make it very unlikely that you'd find anything nearby if the distribution is random (which I'd expect it to be, more or less). This _would_ scale terribly! > > There are a number of physics and statistics problems that lie between > these two extremes -- the dimensionality of the space is high (perhaps > much higher than 12) but one cannot simply write down a way of > manipulating the elements as a simple vector. In that event methods of > searching for optima in high dimensional spaces come into play -- > conjugate gradient methods or genetic optimization methods or simulated > annealing methods. A classic example is the travelling salesman > problem, where there are e.g. 12 cities that must be visited in some That was on mind when I posted the question. I wrote a little program in Pascal a while back that implemented the subset-sum algorithm described in Cormen, Leiserson and Rivest's Introduction To Algorithms, which is in the same class of NP-hard problems as the travelling salesman problem. This did scale terribly with the number of elements . . . Many thanks, Frank From frank at joerdens.de Sat Mar 24 11:17:23 2001 From: frank at joerdens.de (Frank Joerdens) Date: Sat, 24 Mar 2001 20:17:23 +0100 Subject: OT: computationally hard or not (algorithm question) In-Reply-To: ; from hahn@coffee.psychology.mcmaster.ca on Fri, Mar 23, 2001 at 06:00:07PM -0500 References: <20010323154852.A9050@rakete.joerdens.de> Message-ID: <20010324201723.B15121@rakete.joerdens.de> On Fri, Mar 23, 2001 at 06:00:07PM -0500, Mark Hahn wrote: [ . . . ] > > Now, take any one of those dots, search for the 10 (some figure on this > > order of magnitude) dots which are closest to it, and order those by > > why 10? Something along those lines. The result set should very easily "human-parsable", visually, at a glance, more or less. [ . . . ] > there's also a large body of literature on analyzing similarity in > data with many dimensions, each of which is 26 deep, and in analyzing > data with many, many dimensions, each of which is 4 deep. > (text and genetics if you didn't guess ;) That sounds very interesting indeed! 
To spill the beans about what I'm up to (it's a little embarassing, seeing that you supercomputing people are into fairly serious stuff mostly; protein sequences, physics, data mining etc.), it's an online community/game where I want to give users the option to find "similar" avatars to theirs. Similarity is a massively complex notion, psychologically, and ideally this would be a problem that you'd attack via some AI strategy (expert systems, or systems that "learn" similarity), but the process needs to be _fast_: If a user comes to the site, outfits her/his avatar with attributes and then hits the go button to look for potential buddies, something around a second would be acceptable. This is bearing in mind that there might be dozens of those queries running more or less simultaneously, or in quick succession. Hence my thinking that you'd define a set of properties according to which every attribute is rated. Consider clothing: Every item would get a rating according to coolness, elegance and sportiness (we played that through; this would be _way_ too narrow, you'd need a quite a few more differentiated categories), which means you're looking at a 3D space where each dimension might be only 3 dimensions deep (2 = very, 1 = kind of, 0 = not at all). Actually, I am thinking now that I could do with a depth of only 3 or 4 but that I'd still need around a dozen dimensions (we haven't fixed the property set yet). Do you have a starting point from where to dive into this body of literature by any chance? > > dimensions if it turns out that I can't afford the hardware) when I want > > the result set within a couple of seconds at the _very_ most? > > the enclosed program run in around 7 seconds on my cheap duron/600, > and is a fairly obvious implementation of what you described. > it's also O(N^2) in the number of elements, but linear in the > length of each string. the next obvious step would be to use a > kD tree, or something a little smarter than the quadratic rescanning > of the array. I don't read C too well but as far as I could make out, the program you enclosed (great!) implements Robert's first suggestion to -------------------------- begin quote -------------------------- to pick an element (or a point), evaluate the distance of each element in the elements from the element or point (storing the result in e.g. distance[i]) and sort the indices of distance (not the list itself). -------------------------- end quote -------------------------- I'll try that! Many thanks, Frank From rgb at phy.duke.edu Sat Mar 24 14:59:05 2001 From: rgb at phy.duke.edu (Robert G. Brown) Date: Sat, 24 Mar 2001 17:59:05 -0500 (EST) Subject: OT: computationally hard or not (algorithm question) In-Reply-To: <20010324201723.B15121@rakete.joerdens.de> Message-ID: On Sat, 24 Mar 2001, Frank Joerdens wrote: > That sounds very interesting indeed! To spill the beans about what I'm > up to (it's a little embarassing, seeing that you supercomputing people > are into fairly serious stuff mostly; protein sequences, physics, data > mining etc.), it's an online community/game where I want to give users > the option to find "similar" avatars to theirs. 
Similarity is a > massively complex notion, psychologically, and ideally this would be a > problem that you'd attack via some AI strategy (expert systems, or > systems that "learn" similarity), but the process needs to be _fast_: If > a user comes to the site, outfits her/his avatar with attributes and > then hits the go button to look for potential buddies, something around > a second would be acceptable. This is bearing in mind that there might > be dozens of those queries running more or less simultaneously, or in > quick succession. Hence my thinking that you'd define a set of > properties according to which every attribute is rated. Consider > clothing: Every item would get a rating according to coolness, elegance > and sportiness (we played that through; this would be _way_ too narrow, > you'd need a quite a few more differentiated categories), which means > you're looking at a 3D space where each dimension might be only 3 > dimensions deep (2 = very, 1 = kind of, 0 = not at all). Actually, I am > thinking now that I could do with a depth of only 3 or 4 but that I'd > still need around a dozen dimensions (we haven't fixed the property set > yet). > > Do you have a starting point from where to dive into this body of > literature by any chance? Similarity is my business. Read about Parzen-Bayes classifiers -- this isn't exactly what you want (because you don't have discrete identifiable classes) but the notion of a classification/similarity metric is developed there. The next interesting idea to explore would be using a neural network. One can build a kind of network that identifies a given individual out of a crowd, and then "score" the crowd with the network to find near misses. Expensive on a per-person basis, but depending on how hard you want to work you might be able to distribute the time. This would also parallelize very nicely -- train nets for each person in parallel on many hosts, coarsely aggregate them by score, and perhaps identify a classification schema a posteriori. The same kind of thing is VERY useful in e.g. Web Business (or any business) identifying (potential) customers "like" your best customers, for example. > I don't read C too well but as far as I could make out, the program you > enclosed (great!) implements Robert's first suggestion to > > -------------------------- begin quote -------------------------- > to pick an element (or a point), evaluate the distance of each element > in the elements from the element or point (storing the result in e.g. > distance[i]) and sort the indices of distance (not the list itself). > -------------------------- end quote -------------------------- > > I'll try that! Read about Parzen-Bayes first. This is still they way you'd want to store the data; PB will just help you work out a way of creating a "good" similarity metric. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu From kragen at pobox.com Sat Mar 24 18:09:46 2001 From: kragen at pobox.com (kragen at pobox.com) Date: Sat, 24 Mar 2001 21:09:46 -0500 (EST) Subject: Huinalu Linux SuperCluster Message-ID: <200103250209.VAA07287@kirk.dnaco.net> "Ron Brightwell" writes: > Keep in mind that peak theoretical performance accurately measures your ability > to spend money, while MPLinpack performance accurately measures your ability > to seek pr -- I mean it measures the upper bound on compute performance from > a parallel app. 
A particular parallel app, not any arbitrary parallel app; if your parallel app is SETI at Home, I'd expect to see numbers closer to peak theoretical performance than to MPLinpack. From rbbrigh at valeria.mp.sandia.gov Sat Mar 24 18:47:53 2001 From: rbbrigh at valeria.mp.sandia.gov (Ron Brightwell) Date: Sat, 24 Mar 2001 19:47:53 -0700 (MST) Subject: Huinalu Linux SuperCluster In-Reply-To: <200103250209.VAA07287@kirk.dnaco.net> from "kragen@pobox.com" at Mar 24, 2001 09:09:46 PM Message-ID: <200103250249.TAA13062@dogbert.mp.sandia.gov> > > > Keep in mind that peak theoretical performance accurately measures your ability > > to spend money, while MPLinpack performance accurately measures your ability > > to seek pr -- I mean it measures the upper bound on compute performance from > > a parallel app. > > A particular parallel app, not any arbitrary parallel app; if your > parallel app is SETI at Home, I'd expect to see numbers closer to peak > theoretical performance than to MPLinpack. If your parallel app is SETI at Home, you don't have a parallel app. As I said before, I wasn't trying to be exact with my description of MPLinpack -- I was trying to be humorous. One of the reasons that I didn't try for exactness is because I'm very familiar with the tendancy of a few people on this list to try to "correct" others down to the point of arguing semantic details that don't add any value to the discussion. If you want to start a thread about your lax definition of a parallel app, then that's ok, but it's unlikely to be of any value to most people on the list. -Ron From rajkumar at csse.monash.edu.au Sun Mar 25 04:01:37 2001 From: rajkumar at csse.monash.edu.au (Rajkumar Buyya) Date: Sun, 25 Mar 2001 22:01:37 +1000 Subject: CCGrid 2001: cfp (March 31, early bird deadline)! Message-ID: <3ABDDE21.A86D51DA@csse.monash.edu.au> Dear Friends, Please find enclosed advance program and call for participation for the: CCGrid 2001: First ACM/IEEE International Symposium on Cluster Computing & the Grid to be held in Brisbane, Australia (15-18 May 2001). The program consists of - 6 Keynote Speakers (leading international experts in cluster and grid computing) - 2 Invited Talks - 1 Panel session - 4 Industry/State-of-the-art talks - 82 technical papers - 7 workshops - 3 tutorials (open to all and FREE i.e., no extra fee) The conference also hosts poster and research exhibition sessions and the submissions for such poster papers is still open. We are expecting a large attendance. Please plan to participate and register early to take advantage of low registration fee. *** The deadline for early registration (highly discounted fee) is: 31 March, 2001. *** We are looking forward to welcome and see you in Brisbane! Thank you very much. 
Sincerely Yours, CCGrid 2001 Team http://www.ccgrid.org ------------------------------------------------------------------------------------- ######################################################################## # # # ### ### #### ##### ### #### #### ### ### ## # # # # # # # # # # # # # # # # # # # # # ## #### # # # # # # # # # # # # # # # # # # # # # # # # # # # # ### ### #### # # ### #### ##### ### ### ### # # # ######################################################################## First ACM/IEEE International Symposium on Cluster Computing & the Grid (CCGrid 2001) http://www.ccgrid.org | www.ccgrid2001.qut.edu.au 15-18 May 2001, Rydges Hotel, South Bank, Brisbane, Australia CALL FOR PARTICIPATION ---------------------- *** Early bird registration 31 March *** Keynotes ******** * The Anatomy of the Grid: Enabling Scalable Virtual Organizations Ian Foster, Argonne National Laboratory and the University of Chicago, USA * Making Parallel Processing on Clusters Efficient, Transparent and Easy for Programmers Andrzej Goscinski, Deakin University, Australia * Programming High Performance Applications in Grid Environments Domenico Laforenza, CNUCE-Institute of the Italian National Research Council, Italy * Global Internet Content Delivery Bruce Maggs, Carnegie Mellon University and Akamai Technologies, Inc., USA. * Grid RPC meets Data Grid: Network Enabled Services for Data Farming on the Grid Satoshi Matsuoka, Tokyo Institute of Technology, Japan * The Promise of InfiniBand for Cluster Computing Greg Pfister, IBM Server Technology & Architecture, Austin, USA Invited Plenary Talks ********************* * The World Wide Computer: Prospects for Parallel and Distributed Computing on the Web Gul A. Agha, University of Illinois, Urbana-Champaign (UIUC), USA * Terraforming Cyberspace Jeffrey M. Bradshaw, University of West Florida, USA Industry Plenary Talks ********************** * High Performance Computing at Intel: The OSCAR software solution stack for cluster computing Tim Mattson, Intel Corporation, USA * MPI/FT: Architecture and Taxonomies for Fault-Tolerant, Massage-Passing Middleware for Performance-Portable Parallel Computing Tony Skjellum, MPI Software Technology, Inc., USA * Effective Internet Grid Computing for Industrial Users Ming Xu, Platform Corporation, Canada * Sun Grid Engine: Towards Creating a Compute Power Grid Wolfgang Gentzsch, Sun Microsystems, USA FREE Tutorials ************** * The Globus Toolkit for Grid Computing Ian Foster, Argonne National Laboratory, USA * An Introduction to OpenMP Tim Mattson, Intel Corporation, USA * Three Tools to Help with Cluster and Grid Computing: ATLAS, PAPI, and NetSolve University of Tennessee and Oak Ridge National Laboratory, USA Panel ***** * The Grid: Moving it to Prime Time Moderator: David Abramson, Monash University, Australia. 
Symposium Mainstream Sessions ***************************** (Features 45 papers selected out of 126 submissions by peer review) * Component and Agent Approaches * Distributed Shared Memory * Grid Computing * Input/Output and Databases * Message Passing and Communication * Performance Evaluation * Scheduling and Load balancing * Tools for Management, Monitoring and Debugging Workshops ********* (Features 37 peer-reviewed papers selected by workshop organisers) * Agent based Cluster and Grid Computing * Cluster Computing Education * Distributed Shared Memory on Clusters * Global Computing on Personal Devices * Internet QoS for Global Computing * Object & Component Technologies for Cluster Computing * Scheduling and Load Balancing on Clusters Important Dates *************** * Early bird registration 31 March (register online, check out web site) * Tutorials & workshops 15 May * Symposium main stream & workshops 16-18 May Call for Poster/Research Exhibits: ********************************** Those interested in exhibiting poster papers, please contact Poster Chair Hai Jin (hjin at hust.edu.cn) or browse conference website for details. Sponsors ******** * IEEE Computer Society (www.computer.org) * IEEE Task Force on Cluster Computing (www.ieeetfcc.org) * Association for Computing Machinery (ACM) and SIGARCH (www.acm.org) * IEEE Technical Committee on Parallel Processing (TCPP) * Queensland Uni. of Technology (QUT), Australia (www.qut.edu.au) * Platform Computing, Canada (www.platform.com) * Australian Partnership for Advanced Computing (APAC) (www.apac.edu.au) * Society for Industrial and Applied Mathematics (SIAM, USA) (www.siam.org) * MPI Software Technology Inc., USA (www.mpi-softtech.com) * International Business Machines (IBM) (www.ibm.com) * Akamai Technologies, Inc., USA (www.akamai.com) * Sun Microsystems, USA (www.sun.com) * Intel Corporation, USA (www.intel.com) Further Information ******************* Please browse the symposium web site: http://www.ccgrid.org | www.ccgrid2001.qut.edu.au For specific clarifications, please contact one of the following: Conference Chairs: R. Buyya (rajkumar at buyya.com) or G. Mohay (mohay at fit.qut.edu.au) PC Chair: Paul Roe (ccgrid2001 at qut.edu.au) ------------------------------------------------------------------------------------ From ole at scali.no Sun Mar 25 22:26:23 2001 From: ole at scali.no (Ole W. Saastad) Date: Mon, 26 Mar 2001 08:26:23 +0200 Subject: parallelizing OpenMP apps; pghpf & MPI References: <200103241700.MAA29335@blueraja.scyld.com> Message-ID: <3ABEE10F.8A0A0F45@scali.no> > Greg Lindahl wrote: > I believe that the Portland Group's HPF compiler does have the ability > to compile down to message passing of a couple of types. But scaling > is poor compared to MPI, because the compiler can't combine messages > as well as a human or SMS can. If you're praying for a 2X speedup, it > may get you there. If you want 100X... > Greg Lindahl Portland hpf does indeed use MPI as the transport layer. I works well with ScaMPI which is the implementation I have tested. I get speedups from 2.33 to 3.04 with 4 cpus for the BN-H benchmark, class W, with MG as an exception where the serial code is better. For the pfbench benchmark I get speedups ranging from 1.35 to 3.71, again with one exception where the serial code run faster. Our license is limited to four cpus so I have not tested with more. More information under support at Scali's web site (see below). Ole -- Ole W. Saastad, Dr.Scient. 
Scali AS P.O.Box 70 Bogerud 0621 Oslo NORWAY Tel:+47 22 62 89 68(dir) mailto:ole at scali.no http://www.scali.com ScaMPI: bandwidth .gt. 220 MB/sec. latency .lt. 4us. From ole at scali.no Mon Mar 26 04:34:46 2001 From: ole at scali.no (Ole W. Saastad) Date: Mon, 26 Mar 2001 14:34:46 +0200 Subject: parallelizing OpenMP apps; pghpf & MPI Message-ID: <3ABF3766.CB45125A@scali.no> > Greg Lindahl wrote: > I believe that the Portland Group's HPF compiler does have the ability > to compile down to message passing of a couple of types. But scaling > is poor compared to MPI, because the compiler can't combine messages > as well as a human or SMS can. If you're praying for a 2X speedup, it > may get you there. If you want 100X... > Greg Lindahl Portland hpf does indeed use MPI as the transport layer. I works well with ScaMPI which is the implementation I have tested. I get speedups from 2.33 to 3.04 with 4 cpus for the BN-H benchmark, class W, with MG as an exception where the serial code is better. For the pfbench benchmark I get speedups ranging from 1.35 to 3.71, again with one exception where the serial code run faster. Our license is limited to four cpus so I have not tested with more. More information under support at Scali's web site (see below). Ole -- Ole W. Saastad, Dr.Scient. Scali AS P.O.Box 70 Bogerud 0621 Oslo NORWAY Tel:+47 22 62 89 68(dir) mailto:ole at scali.no http://www.scali.com ScaMPI: bandwidth .gt. 220 MB/sec. latency .lt. 4us. From josip at icase.edu Mon Mar 26 08:38:02 2001 From: josip at icase.edu (Josip Loncaric) Date: Mon, 26 Mar 2001 11:38:02 -0500 Subject: NFS file server performance References: <51FCCCF0C130D211BE550008C724149EBE1139@mail1.affiliatedhealth.org> Message-ID: <3ABF706A.FF629BD2@icase.edu> Linux NFS is a bottleneck in itself, even with Gigabit Ethernet. You can speed up the network, but the current Linux NFS implementation has limitations which make it five times slower than other forms of file transfer. Straight rcp or ftp via Gigabit Ethernet typically reaches about 25-30 MB/s, while NFS using the same hardware delivers only about 5-6 MB/s. Sincerely, Josip -- Dr. Josip Loncaric, Research Fellow mailto:josip at icase.edu ICASE, Mail Stop 132C PGP key at http://www.icase.edu./~josip/ NASA Langley Research Center mailto:j.loncaric at larc.nasa.gov Hampton, VA 23681-2199, USA Tel. +1 757 864-2192 Fax +1 757 864-6134 From yocum at linuxcare.com Mon Mar 26 09:39:09 2001 From: yocum at linuxcare.com (Dan Yocum) Date: Mon, 26 Mar 2001 11:39:09 -0600 Subject: NFS file server performance References: <51FCCCF0C130D211BE550008C724149EBE1139@mail1.affiliatedhealth.org> <3ABF706A.FF629BD2@icase.edu> Message-ID: <3ABF7EBD.2DB46DE@linuxcare.com> Josip, Josip Loncaric wrote: > > Linux NFS is a bottleneck in itself, even with Gigabit Ethernet. You > can speed up the network, but the current Linux NFS implementation has > limitations which make it five times slower than other forms of file Well, I think that statement requires a little more qualification: Linux <-> Linux NFSv2 performance is quite good - up to about 8MB/s on fast ethernet. Linux <-> IRIX NFSv2 is slightly less than that, but not 1/5 the performance. Linux <-> AIX NFSv2, I believe is somewhat less again, (though I didn't have a machine to do tests on this). Linux <-> OSF/1/Tru64 is "challenging" but again, I didn't have a machine to perform these test. Linux <-> Sun NFSv2 sucks under normal conditions. 
Thomas Davis gets good performance, but he's got some big E450 or something that brute forces the data through. Apparently there are double caching issues between SunOS and Linux NFS server/clients. I don't know the details. I performed a bunch of bonnie tests between various setups when I was back at Fermilab and have since trashed the results so I can't quote the exact data. I can't comment on NFFv3 performance since I haven't done any tests, but I would be very interested in seeing what it is between various setups Linux <-> Sun, Linux <-> IRIX, Linux <-> Linux (hint, hint). Cheers, Dan -- Dan Yocum, Sr. Linux Consultant Linuxcare, Inc. 630.697.8066 tel yocum at linuxcare.com, http://www.linuxcare.com Linuxcare. Putting open source to work. From edwards at icantbelieveimdoingthis.com Mon Mar 26 10:02:09 2001 From: edwards at icantbelieveimdoingthis.com (Art Edwards) Date: Mon, 26 Mar 2001 11:02:09 -0700 Subject: NFS file server performance In-Reply-To: <3ABF706A.FF629BD2@icase.edu>; from josip@icase.edu on Mon, Mar 26, 2001 at 11:38:02AM -0500 References: <51FCCCF0C130D211BE550008C724149EBE1139@mail1.affiliatedhealth.org> <3ABF706A.FF629BD2@icase.edu> Message-ID: <20010326110209.A1653@icantbelieveimdoingthis.com> I'm new to Beowulf. I have just gotten a small athalon cluster running under Scyld and I was interested in your comments about file transfer. I am under the impression that an MPI data transfer does not use NFS. Is this true? Also, Given the bus speed of normal motherboards, does increasing the network speed have a large impact on global performance? Art Edwards On Mon, Mar 26, 2001 at 11:38:02AM -0500, Josip Loncaric wrote: > Linux NFS is a bottleneck in itself, even with Gigabit Ethernet. You > can speed up the network, but the current Linux NFS implementation has > limitations which make it five times slower than other forms of file > transfer. Straight rcp or ftp via Gigabit Ethernet typically reaches > about 25-30 MB/s, while NFS using the same hardware delivers only about > 5-6 MB/s. > > Sincerely, > Josip > > -- > Dr. Josip Loncaric, Research Fellow mailto:josip at icase.edu > ICASE, Mail Stop 132C PGP key at http://www.icase.edu./~josip/ > NASA Langley Research Center mailto:j.loncaric at larc.nasa.gov > Hampton, VA 23681-2199, USA Tel. +1 757 864-2192 Fax +1 757 864-6134 > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From agrajag at linuxpower.org Mon Mar 26 10:05:11 2001 From: agrajag at linuxpower.org (Jag) Date: Mon, 26 Mar 2001 10:05:11 -0800 Subject: NFS file server performance In-Reply-To: <20010326110209.A1653@icantbelieveimdoingthis.com>; from edwards@icantbelieveimdoingthis.com on Mon, Mar 26, 2001 at 11:02:09AM -0700 References: <51FCCCF0C130D211BE550008C724149EBE1139@mail1.affiliatedhealth.org> <3ABF706A.FF629BD2@icase.edu> <20010326110209.A1653@icantbelieveimdoingthis.com> Message-ID: <20010326100511.R13901@kotako.analogself.com> On Mon, 26 Mar 2001, Art Edwards wrote: > I'm new to Beowulf. I have just gotten a small athalon cluster running > under Scyld and I was interested in your comments about file transfer. > I am under the impression that an MPI data transfer does not use NFS. Is > this true? Also, Given the bus speed of normal motherboards, does > increasing the network speed have a large impact on global performance? MPI does not use nfs. 
nfs is used when you're on a slave node and try to look at a file in /home. It uses nfs to make that file (which is really on the head node) available to the slave node. The performance increase from increasing network speed depends on what you are doing. If your jobs are sending a lot of data over the network and are having to sit idle while it waits for data to be transfered, then yes, increasing network speed will improve overal performance. If that's not the case, then increasing network speed will probablly speed up the startup of your programs a little as well as boottime for the slave nodes, but won't really affect much other than that. Jag -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 232 bytes Desc: not available URL: From josip at icase.edu Mon Mar 26 10:25:56 2001 From: josip at icase.edu (Josip Loncaric) Date: Mon, 26 Mar 2001 13:25:56 -0500 Subject: Asus motherboards and ACPI Message-ID: <3ABF89B4.EB686DD3@icase.edu> Although Linux currently does not care about ACPI and strictly speaking this problem has virtually zero impact on Beowulf clusters, it may be of interest to those who use dual boot machines at their desk. Many popular Asus motherboards, including the P2B-D with PCB revision 1.05 and earlier, have a minor hardware problem when ACPI is used (too many ACPI events are generated, which can lead to system instability under ACPI mode Windows 2000). See these links: http://france.asus.com/products/techref/Acpi/solution.html http://www.asus.com/Products/Techref/Acpi/win2000.html Despite the Asus' optimistic description, this hardware bug frequently crashes the ACPI mode W2K kernel (however, the APM mode W2K kernel works fine). While Asus has a simple fix (move one resistor) this operation is not to be attempted lightly. The SMT resistor in question is only about 1mm long and you need experience and good equipment to do such precision work. If you want to install W2K on one of the affected motherboards but want to avoid hardware rework, be sure to override the W2K setup (which defaults to ACPI kernel if there is ACPI support in BIOS) and choose an appropriate non-ACPI kernel at install time (press F5 when asked for third party SCSI drivers: ask Microsoft for details). Sincerely, Josip From josip at icase.edu Mon Mar 26 11:29:35 2001 From: josip at icase.edu (Josip Loncaric) Date: Mon, 26 Mar 2001 14:29:35 -0500 Subject: NFS file server performance References: <51FCCCF0C130D211BE550008C724149EBE1139@mail1.affiliatedhealth.org> <3ABF706A.FF629BD2@icase.edu> <3ABF7EBD.2DB46DE@linuxcare.com> Message-ID: <3ABF989F.3AD163B8@icase.edu> Dan Yocum wrote: > > Josip Loncaric wrote: > > > > Linux NFS is a bottleneck in itself, even with Gigabit Ethernet. You > > can speed up the network, but the current Linux NFS implementation has > > limitations which make it five times slower than other forms of file > > Well, I think that statement requires a little more qualification: Linux > <-> Linux NFSv2 performance is quite good - up to about 8MB/s on fast > ethernet. Linux <-> Linux NFSv2 over Gigabit Ethernet performs worse than over Fast Ethernet, but rcp and ftp improve by about a factor of 2.5-3. While rcp/ftp use TCP, NFSv2 uses UDP and each of its 8KB blocks is split into 6 UDP packets. If any of the six is lost, all six have to be resent. 
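The arithmetic behind that, assuming standard 1500-byte Ethernet frames (my numbers, not quoted from any spec): an 8 KB NFS block goes out as one UDP datagram of a bit over 8192 bytes, and each 1500-byte frame carries about 1480 bytes of it after the IP header, so

    8192 / 1480  ~  5.5  ->  6 fragments per block

and if each fragment is dropped independently with probability p, the whole block is lost with probability about 1 - (1-p)^6 ~ 6p for small p. A loss rate TCP would barely notice therefore gets multiplied by six before NFSv2's own timeout and retransmit machinery even kicks in.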
To minimize this unhappy situation, our Gigabit Ethernet cards interrupt the CPU only on every 6th packet received, but the fundamental problem is that faster networks increase the probablilty that NFSv2 will drop some packets, so after retransmits the performance gets worse than what you'd see on a slower network. Unfortunately, Linux NFSv3 is not out of its development phase yet. To answer Art's questions, MPI communication uses TCP (not NFS), except of course when the code wants to access NFS mounted filesystems. Faster networks can improve global performance -- if communication is a bottleneck (when this can be avoided, faster network won't help much). While the PCI bus can limit performance of any high-end network card, this limit is 133 MB/s or higher. With the right hardware, you can typically reach about an order of magnitude higher bandwidth than Fast Ethernet. Is it worth the cost? Would you rather have more CPUs or a faster network? You have to benchmark your code and then decide. Most of our codes are written to avoid the communication bottleneck. Only when this cannot be done do faster networks become attractive. Sincerely, Josip -- Dr. Josip Loncaric, Research Fellow mailto:josip at icase.edu ICASE, Mail Stop 132C PGP key at http://www.icase.edu./~josip/ NASA Langley Research Center mailto:j.loncaric at larc.nasa.gov Hampton, VA 23681-2199, USA Tel. +1 757 864-2192 Fax +1 757 864-6134 From xyzzy at speakeasy.org Mon Mar 26 12:00:40 2001 From: xyzzy at speakeasy.org (Trent Piepho) Date: Mon, 26 Mar 2001 12:00:40 -0800 (PST) Subject: NFS file server performance In-Reply-To: <3ABF7EBD.2DB46DE@linuxcare.com> Message-ID: On Mon, 26 Mar 2001, Dan Yocum wrote: > I can't comment on NFFv3 performance since I haven't done any tests, but > I would be very interested in seeing what it is between various setups > Linux <-> Sun, Linux <-> IRIX, Linux <-> Linux (hint, hint). This is Linux <-> Linux, 2.2.18, 2 channel 100MB ethernet. This disk is a raid5 array on a mylex extremeraid 2000. Unfortunately, neither machine was idle during the tests, but that effected the "per char" tests mostly. I'm not sure how to check what nfs version is getting used, but I don't think it's NFSv3. -------Sequential Output-------- ---Sequential Input-- --Random-- -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks--- Machine MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU /sec %CPU NFS 1024 10591 79.8 13400 16.2 5613 13.4 9629 83.1 17727 33.2 554.0 9.6 Local 1024 9488 91.0 17369 16.4 9995 19.0 9214 78.9 51125 27.7 675.7 6.4 From Dean.Carpenter at pharma.com Mon Mar 26 12:04:25 2001 From: Dean.Carpenter at pharma.com (Carpenter, Dean) Date: Mon, 26 Mar 2001 15:04:25 -0500 Subject: SMP support with the scyld package Message-ID: <759FC8B57540D311B14E00902727A0C002EC47E5@a1mbx01.pharma.com> Hummm. I see that it ships with the 2.2.17 sources, but I don't see any SMP kernels lying around ... Anything specific in the config that needs to be done to build a "Scyld" kernel ? I know bproc has to get in there, and I see a few bproc* files in the source tree. Or is it as simple as a make clean make dep make bzImage in the 2.2.17 source tree ? How about 2.2.19 ? Given a virgin kernel source tree from kernel.org, what needs doing to integrate bproc with it ? 
-- Dean Carpenter Principal Architect Purdue Pharma dean.carpenter at pharma.com deano at areyes.com 94TT :) -----Original Message----- From: Jag [mailto:agrajag at linuxpower.org] Sent: Wednesday, March 14, 2001 12:21 AM To: Marc Cozzi Cc: 'beowulf at beowulf.org' Subject: Re: SMP support with the scyld package On Tue, 13 Mar 2001, Marc Cozzi wrote: > greetings, > > I'm considering several dual 1GHz, 1GB Intel/Asus systems. Has anyone > used the Beowulf package from Scyld Computing Corporation with > SMP systems? Does one have to rebuild the kernel to enable SMP > support or is it turned on by default? Are there issues with BProc > and SMP? Scyld ships UP and SMP kernel. I have a cluster that is running the SMP kernel (although the machines only have on processer per at the moment). Everything works fine with the one caveat that before you make the node boot image with beosetup, you have to make sure /boot/vmlinuz is pointing to the SMP kernel (that or specify a different kernel when making the image in beosetup). My install was done as an overlay install, I'm not sure if you use Scyld's modified anaconda on the CD if it will do that correctly or not. BProc will still treat each machine as one node even if it has two processors in it. However, I believe that beompi does understand the concept of multiple processors per node and can work with it. Unfortunately I don't have a cluster of SMP machines, so I haven't been able to really test that. Jag From agrajag at linuxpower.org Mon Mar 26 12:22:21 2001 From: agrajag at linuxpower.org (Jag) Date: Mon, 26 Mar 2001 12:22:21 -0800 Subject: SMP support with the scyld package In-Reply-To: <759FC8B57540D311B14E00902727A0C002EC47E5@a1mbx01.pharma.com>; from Dean.Carpenter@pharma.com on Mon, Mar 26, 2001 at 03:04:25PM -0500 References: <759FC8B57540D311B14E00902727A0C002EC47E5@a1mbx01.pharma.com> Message-ID: <20010326122221.S13901@kotako.analogself.com> On Mon, 26 Mar 2001, Carpenter, Dean wrote: > Hummm. I see that it ships with the 2.2.17 sources, but I don't see any SMP > kernels lying around ... It's in the kernel-smp package. ftp://ftp.scyld.com/pub/beowulf/current/RPMS/i686/kernel-smp-2.2.17-33.beo.i686.rpm -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 232 bytes Desc: not available URL: From Dean.Carpenter at pharma.com Mon Mar 26 14:06:08 2001 From: Dean.Carpenter at pharma.com (Carpenter, Dean) Date: Mon, 26 Mar 2001 17:06:08 -0500 Subject: SMP support with the scyld package Message-ID: <759FC8B57540D311B14E00902727A0C002EC47E6@a1mbx01.pharma.com> Ah ha. OK, I found that on the CD, and installed it rpm -i kernel-smp-2.2.17-33.beo.i686.rpm No problem, and I see the new kernel in /boot and the modules in /lib/modules/2.2.17-beosmp. So I change /boot/vmlinuz to point to the new kernel, and then create the netboot image ... beoboot -d -2 -n -k /boot/vmlinuz-2.2.17-33.beosmp -m /lib/modules/2.2.17.beosmp That chews for a little while, and the debug output looks OK. I noticed that it recreated the /var/beowulf/boot.img file though ... Why ? In any case, I reboot on of the dual nodes with the *existing* boot floppy (created last week from beosetup gui) and it appears to come up OK, but not all the way. The log shows that it's still trying to use the UP modules in /lib/modules/2.2.17.beo. Dang. 
I thought the phase 1 kernel on the floppy was a generic one (from /boot/vmlinuz.beoboot, which is just a symlink to vmlinuz-2.2.17-33.beobeoboot :) just to get to the master node to boot the "real" kernel - the smp one just installed. Hmmm, OK, recreate the floppy from the gui. I don't think it should be different, but just in case. This is worse. It dies while loading vmlinuz, asking for another boot disk. Ah ha (again). I just noticed module-info in /boot, pointing to the UP modules-info-2.2.17-33.beo file rather than the smp one. I just changed that and rebooted a couple of the dual nodes. We'll see. Nope. Systems booted from the old floppy still try to use the UP modules in /lib/modules/2.2.17-33.beo So where exactly do all the kernels fit in ? /boot/vmlinuz-2.2.17-33.beobeoboot aka vmlinuz.beoboot /var/beowulf/boot.img What's the connection between these for the floppy ? /boot/vmlinuz-2.2.17-33.beosmp Sorry for being thick here ... -- Dean Carpenter Principal Architect Purdue Pharma dean.carpenter at pharma.com deano at areyes.com 94TT :) -----Original Message----- From: Jag [mailto:agrajag at linuxpower.org] Sent: Monday, March 26, 2001 3:22 PM To: Carpenter, Dean Cc: 'beowulf at beowulf.org' Subject: Re: SMP support with the scyld package On Mon, 26 Mar 2001, Carpenter, Dean wrote: > Hummm. I see that it ships with the 2.2.17 sources, but I don't see any SMP > kernels lying around ... It's in the kernel-smp package. ftp://ftp.scyld.com/pub/beowulf/current/RPMS/i686/kernel-smp-2.2.17-33.beo.i 686.rpm From keithu at parl.clemson.edu Mon Mar 26 15:19:03 2001 From: keithu at parl.clemson.edu (Keith Underwood) Date: Mon, 26 Mar 2001 18:19:03 -0500 (EST) Subject: SMP support with the scyld package In-Reply-To: <759FC8B57540D311B14E00902727A0C002EC47E6@a1mbx01.pharma.com> Message-ID: Ok, try booting the master SMP (even if it is a UP box, it's ok). That should do it. Not sure why, but I seem to remember having that problem when I had an SMP master and UP nodes. Keith On Mon, 26 Mar 2001, Carpenter, Dean wrote: > Ah ha. OK, I found that on the CD, and installed it > > rpm -i kernel-smp-2.2.17-33.beo.i686.rpm > > No problem, and I see the new kernel in /boot and the modules in > /lib/modules/2.2.17-beosmp. > > So I change /boot/vmlinuz to point to the new kernel, and then create the > netboot image ... > > beoboot -d -2 -n -k /boot/vmlinuz-2.2.17-33.beosmp -m > /lib/modules/2.2.17.beosmp > > That chews for a little while, and the debug output looks OK. I noticed > that it recreated the /var/beowulf/boot.img file though ... Why ? > > In any case, I reboot on of the dual nodes with the *existing* boot floppy > (created last week from beosetup gui) and it appears to come up OK, but not > all the way. The log shows that it's still trying to use the UP modules in > /lib/modules/2.2.17.beo. > > Dang. I thought the phase 1 kernel on the floppy was a generic one (from > /boot/vmlinuz.beoboot, which is just a symlink to > vmlinuz-2.2.17-33.beobeoboot :) just to get to the master node to boot the > "real" kernel - the smp one just installed. Hmmm, OK, recreate the floppy > from the gui. I don't think it should be different, but just in case. > > This is worse. It dies while loading vmlinuz, asking for another boot disk. > > Ah ha (again). I just noticed module-info in /boot, pointing to the UP > modules-info-2.2.17-33.beo file rather than the smp one. I just changed > that and rebooted a couple of the dual nodes. We'll see. > > Nope. 
Systems booted from the old floppy still try to use the UP modules in > /lib/modules/2.2.17-33.beo > > So where exactly do all the kernels fit in ? > > /boot/vmlinuz-2.2.17-33.beobeoboot aka vmlinuz.beoboot > /var/beowulf/boot.img What's the connection between these > for the floppy ? > /boot/vmlinuz-2.2.17-33.beosmp > > Sorry for being thick here ... > > -- > Dean Carpenter > Principal Architect > Purdue Pharma > dean.carpenter at pharma.com > deano at areyes.com > 94TT :) > > > -----Original Message----- > From: Jag [mailto:agrajag at linuxpower.org] > Sent: Monday, March 26, 2001 3:22 PM > To: Carpenter, Dean > Cc: 'beowulf at beowulf.org' > Subject: Re: SMP support with the scyld package > > > On Mon, 26 Mar 2001, Carpenter, Dean wrote: > > > Hummm. I see that it ships with the 2.2.17 sources, but I don't see any > SMP > > kernels lying around ... > > It's in the kernel-smp package. > ftp://ftp.scyld.com/pub/beowulf/current/RPMS/i686/kernel-smp-2.2.17-33.beo.i > 686.rpm > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > --------------------------------------------------------------------------- Keith Underwood Parallel Architecture Research Lab (PARL) keithu at parl.clemson.edu Clemson University From johannes.grohn at sonera.com Tue Mar 27 00:13:17 2001 From: johannes.grohn at sonera.com (Johannes =?ISO-8859-1?Q?Gr=F6hn?=) Date: Tue, 27 Mar 2001 08:13:17 GMT Subject: Scyld Beowulf channel bonding In-Reply-To: <3A24C866.6C785EDA@sis.it> References: <3A24C866.6C785EDA@sis.it> Message-ID: <20010327.8131700@grohnjo1.tkk.tele.fi> Hi, I am in the same situation. I would like to do channel bonding with scyld, but I get the same error as Massimo. Is it because bond0 takes the ip address from eth0? Does anyone have any ideas on how to get bonding to work with scyld? Thanks, Johannes >>>>>>>>>>>>>>>>>> Original Message <<<<<<<<<<<<<<<<<< On 11/29/00, 12:12:06 PM, wrote regarding Scyld Beowulf channel bonding: > Hi, > I'm looking to use ethernet channel bonding on my Scyld Beowulf 2 > cluster. > I do the following for two 2 node cluster (Master and one slave): > 1) boot up slave node without bonding; > 2) after the node is up : > bpcp bonding.o 0:/tmp > bpsh 0 /sbin/insmod /tmp/bonding.o > 3) configure bond0: > bpsh 0 /sbin/ifconfig bond0 10.0.0.1 netmask 255.255.255.0 > broadcast 10.255.255.255 up > but it fails and the slave node demon die. > Regards, > Massimo Torquati > HuginSoft. > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From romie at pinguin.stttelkom.ac.id Mon Mar 26 23:50:25 2001 From: romie at pinguin.stttelkom.ac.id (romie) Date: Tue, 27 Mar 2001 14:50:25 +0700 (JAVT) Subject: diskless problem Message-ID: dear, Regarding to Beowulf-installation and Administration HOWTO section 6.2 Configuring Disk-less Clients there is CONFIG_RNFS_BOOTP and CONFIG_RNFS_RARP option to answer "y" (enabel) when compile kernel, but I can't find that option when i compiled my 2.2.16 linux kernel. i kept compiling my kernel without that option,and got error message when i start up my node client with that kernel. there's no problem with scdt and acdn scrypt,it's run well , and my node client got its ip address by using RARP . 
on the server side , there's directory /tftpboot/Template/10.14.1.1/ contain bin, etc ,sbin directories , ... warning : unable to open an initial console. kernel panic: No init found. Try passing int option to kernel what's wrong with my kernel ?? and where can i find that 2 kernel option above. thank you best regard - romie - From myrridin at wlug.westbo.se Tue Mar 27 11:48:16 2001 From: myrridin at wlug.westbo.se (Daniel Persson) Date: Tue, 27 Mar 2001 21:48:16 +0200 (CEST) Subject: Matlab on B2 Message-ID: Hi again, Since the mailarchive isnt browseble this month, could someone help me with getting Matlab to run on a Scyld cluster ? I know that there where discussion about it and i belive that someone had a nice solution for it (or was it with octave ?) /Daniel -- Daniel Persson Westbo Linux User Group ---> http://wlug.westbo.se A swedish site about Gnome ---> http://wlug.westbo.se/gnome My personal pages ---> http://wlug.westbo.se/~myrridin Dagens kommentar : The brain is a wonderful organ; it starts working the moment you get up in the morning, and does not stop until you get to school. From chrisa at ASPATECH.COM.BR Tue Mar 27 12:40:42 2001 From: chrisa at ASPATECH.COM.BR (Chris Richard Adams) Date: Tue, 27 Mar 2001 17:40:42 -0300 Subject: Master node install stops during "performing post install configuration" Message-ID: Hi all; During the master node install, once all the files are installed - it makes it to the end and says, "performing post install configuration" and thats it. Even thought the install takes less than 15 minutes - I've waiting up to 1/2hr - and never makes it past that message. The mouse still moves so I don't think it's frozen. I have repeated this error on two different PCs. One compaq (ipaq) the other a std. PC with an Intel board. I am doing a custom install, but nothing complicated - all beowulf packages with some additional telnet, inetd, ftp services. I'll try to do just a default install and see, but any clues? I had Redhat 6.2 installed on the machines prior - so I don't think there would be any hardware conflicts. ??? Thanks, Chris From chendric at qssmeds.com Tue Mar 27 12:51:37 2001 From: chendric at qssmeds.com (Chris Hendrickson) Date: Tue, 27 Mar 2001 15:51:37 -0500 Subject: Master node install stops during "performing post install configuration" References: Message-ID: <3AC0FD59.3020800@qssmeds.com> I encountered that same problem, do not enter any IP information for the internal (192.168.1.1) ethernet device, when I did not change any settings to that device, but instead left it at it's default, it continued properly. Chris Chris Richard Adams wrote: > Hi all; > > During the master node install, once all the files are installed - it > makes it to the end and says, "performing post install configuration" > and thats it. Even thought the install takes less than 15 minutes - > I've waiting up to 1/2hr - and never makes it past that message. The > mouse still moves so I don't think it's frozen. -- "The box said requires Windows 95 or better... So I installed Linux" Chris Hendrickson QSS Group. Inc - MEDS NASA/Goddard Space Flight Center Voice: (301) 867-0081 Fax: (301) 867-0089 From chrisa at ASPATECH.COM.BR Tue Mar 27 13:09:27 2001 From: chrisa at ASPATECH.COM.BR (Chris Richard Adams) Date: Tue, 27 Mar 2001 18:09:27 -0300 Subject: Master node install stops during "performing post install configuration" Message-ID: uhhhh... Not sure if this is the cause, but should I have the slave nodes already attached to the network. 
Just looking in the docs, I see that during install it will write to the floppy disks of each slave node. Could this be the cause, because i do not have any slave nodes attached yet? Thanks, CHris > -----Original Message----- > From: Chris Richard Adams > Sent: Tuesday, March 27, 2001 5:41 PM > To: Beowulf (E-mail) > Subject: Master node install stops during "performing post install > configuration" > > > Hi all; > > During the master node install, once all the files are installed - it > makes it to the end and says, "performing post install configuration" > and thats it. Even thought the install takes less than 15 minutes - > I've waiting up to 1/2hr - and never makes it past that message. The > mouse still moves so I don't think it's frozen. > > I have repeated this error on two different PCs. One compaq (ipaq) the > other a std. PC with an Intel board. I am doing a custom install, but > nothing complicated - all beowulf packages with some > additional telnet, > inetd, ftp services. I'll try to do just a default install > and see, but > any clues? > > I had Redhat 6.2 installed on the machines prior - so I don't think > there would be any hardware conflicts. > > ??? > Thanks, > Chris > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) > visit http://www.beowulf.org/mailman/listinfo/beowulf > From Dean.Carpenter at pharma.com Tue Mar 27 13:27:22 2001 From: Dean.Carpenter at pharma.com (Carpenter, Dean) Date: Tue, 27 Mar 2001 16:27:22 -0500 Subject: SMP support with the scyld package Message-ID: <759FC8B57540D311B14E00902727A0C002EC47EC@a1mbx01.pharma.com> OK, I think I've answered one or two of my own questions here ... but there are still issues. /var/beowulf/boot.img is the netboot'd kernel, taken from the -k parameter to beoboot. The phase 1 kernel boots this image. No problem - I got it :) Doh - it was late, I was tired - that's it. 1st problem : ------------- This appears to be with the smp kernel package. It won't boot on the master node. I install the rpm with the command show below, and it appears to go fine. The files are all there, including the /lib/modules/2.2.17-33.beosmp. rpm -i kernel-smp-2.2.17-33.beo.i686.rpm I adjust the master node thus : /boot/vmlinuz -> /boot/vmlinuz-2.2.17-33.beosmp /boot/System.map -> /boot/System.map-2.2.17-33.beosmp /boot/modules-info -> /boot/module-info-2.2.17-33.beosmp (actually same as .beo one) Add a new entry to /etc/lilo.conf for the smp kernel, and reboot the master node. It starts to come up, tried to load the aic7xxx module which has tons of unknown symbols and fails, so the root fs fails to mount, and it dies. It was trying to load from /lib/aic7xxx.o - weird location ... 2nd problem : ------------- For the hell of it, I tried to build a new kernel. I copied /usr/src/linux-2.2.17 away to save it, then copied configs/kernel-2.2.17-i686-smp.config to .config and ran : make clean make dep make bzImage which fails with : scripts/split-include include/linux/autoconf.h include/config find: *: No such file or directory scripts/split-include: find - pclose: Success <<<<--- see below make: *** [include/config/MARKER] Error 1 I added the " - pclose" to the split-include.c file at the very end to verify where the error was being thrown. 3rd problem : (actually, from original question below, partially solved) ------------- In any case, I try to create a phase 2 boot image using the smp kernel from the rpm on the CD. 
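One possible reading of the 1st problem above -- this is an assumption, not something confirmed in the thread -- is that the module being loaded from /lib/aic7xxx.o points to an initrd built for the UP kernel still being used under the SMP kernel, hence the unresolved symbols and the unmountable root. If so, a Red Hat-style fix would look roughly like this (adjust the label and root device to your own setup):

    # Build an initrd that matches the SMP kernel, so the aic7xxx it carries
    # is the SMP build rather than the UP one.
    mkinitrd /boot/initrd-2.2.17-33.beosmp.img 2.2.17-33.beosmp

    # Give the SMP entry in /etc/lilo.conf its own initrd line, for example:
    #   image=/boot/vmlinuz-2.2.17-33.beosmp
    #       label=beosmp
    #       initrd=/boot/initrd-2.2.17-33.beosmp.img
    #       read-only
    #       root=/dev/sda1      # whatever the real root device is
    # then re-run lilo so the change takes effect.
    /sbin/lilo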
My original beoboot -2 from the earlier post failed because the nodes tried to load modules from /lib/modules/2.2.17-33.beo. That was my fimble fungers there - I used -m /lib/modules/2.2.17.beosmp - forgot the -33 in there. This is the one that almost works : beoboot -d -2 -n -k /boot/vmlinuz-2.2.17-33.beosmp -m /lib/modules/2.2.17-33.beosmp The node comes up, but fails because /dev/bproc doesn't exist :( So it looks like BProc wasn't integrated properly in the rpm ? I found bproc.o in /lib/modules/2.2.17-33.beo/misc, but it wasn't in the /lib/modules/2.2.17-33.beosmp/misc dir. Hrm. rpm -qf bproc.o doesn't know anything about it either - no package appears to own it. It doesn't show up in the list of files via rpm -ql for the standard kernel-2.2.17-33.beo or the kernel-smp-2.2.17-33.beo. Oh well. Copy the bproc.o file to the smp modules tree. Rebuild the boot image with the beoboot -2 command as listed above. Sure enough - bproc.o is now listed in the debug output. Reboot a dual-cpu node to see if it takes :) Nope. bpsh 10 uname -r still shows 2.2.17-33.beo as the kernel. Dang. --------------- Given a raw Scyld install, can anyone show a cookbook sequence to converting the master node as well as the slave nodes to a custom kernel ? Not just the SMP kernel from the rpm - we'll want to play with MOSIX, as well as the 2.2.19 kernel. 1. Install x and y packages to solve problem #2 above so we can actually build kernels. 2. Pull down virgin kernel sources and untar into /usr/src/linux 3. Integrate the bproc patches ..... (how - what else ?) 4. Integrate MOSIX (and any other cool) patches 5. Build the kernel and modules (any special considerations ?) 6. Build the custom phase 2 boot image 7. Rock and roll :) -- Dean Carpenter Principal Architect Purdue Pharma dean.carpenter at pharma.com deano at areyes.com 94TT :) -----Original Message----- From: Carpenter, Dean [mailto:Dean.Carpenter at pharma.com] Sent: Monday, March 26, 2001 5:06 PM To: 'Jag' Cc: 'beowulf at beowulf.org' Subject: RE: SMP support with the scyld package Ah ha. OK, I found that on the CD, and installed it rpm -i kernel-smp-2.2.17-33.beo.i686.rpm No problem, and I see the new kernel in /boot and the modules in /lib/modules/2.2.17-beosmp. So I change /boot/vmlinuz to point to the new kernel, and then create the netboot image ... beoboot -d -2 -n -k /boot/vmlinuz-2.2.17-33.beosmp -m /lib/modules/2.2.17.beosmp That chews for a little while, and the debug output looks OK. I noticed that it recreated the /var/beowulf/boot.img file though ... Why ? In any case, I reboot on of the dual nodes with the *existing* boot floppy (created last week from beosetup gui) and it appears to come up OK, but not all the way. The log shows that it's still trying to use the UP modules in /lib/modules/2.2.17.beo. Dang. I thought the phase 1 kernel on the floppy was a generic one (from /boot/vmlinuz.beoboot, which is just a symlink to vmlinuz-2.2.17-33.beobeoboot :) just to get to the master node to boot the "real" kernel - the smp one just installed. Hmmm, OK, recreate the floppy from the gui. I don't think it should be different, but just in case. This is worse. It dies while loading vmlinuz, asking for another boot disk. Ah ha (again). I just noticed module-info in /boot, pointing to the UP modules-info-2.2.17-33.beo file rather than the smp one. I just changed that and rebooted a couple of the dual nodes. We'll see. Nope. 
Systems booted from the old floppy still try to use the UP modules in /lib/modules/2.2.17-33.beo So where exactly do all the kernels fit in ? /boot/vmlinuz-2.2.17-33.beobeoboot aka vmlinuz.beoboot /var/beowulf/boot.img What's the connection between these for the floppy ? /boot/vmlinuz-2.2.17-33.beosmp Sorry for being thick here ... -- Dean Carpenter Principal Architect Purdue Pharma dean.carpenter at pharma.com deano at areyes.com 94TT :) -----Original Message----- From: Jag [mailto:agrajag at linuxpower.org] Sent: Monday, March 26, 2001 3:22 PM To: Carpenter, Dean Cc: 'beowulf at beowulf.org' Subject: Re: SMP support with the scyld package On Mon, 26 Mar 2001, Carpenter, Dean wrote: > Hummm. I see that it ships with the 2.2.17 sources, but I don't see any SMP > kernels lying around ... It's in the kernel-smp package. ftp://ftp.scyld.com/pub/beowulf/current/RPMS/i686/kernel-smp-2.2.17-33.beo.i 686.rpm _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From chendric at qssmeds.com Tue Mar 27 13:24:42 2001 From: chendric at qssmeds.com (Chris Hendrickson) Date: Tue, 27 Mar 2001 16:24:42 -0500 Subject: Master node install stops during "performing post install configuration" References: Message-ID: <3AC1051A.6060504@qssmeds.com> Chris Richard Adams wrote: > uhhhh... Not sure if this is the cause, but should I have the slave > nodes already attached to the network. Just looking in the docs, I see > that during install it will write to the floppy disks of each slave > node. Could this be the cause, because i do not have any slave nodes > attached yet? > > Thanks, > CHris > that should not be an issue, the master node does not actually write to the floppys of the compute nodes. After the install however, you must create a boot floppy for each node (the floppy will be created on the master node and then walked over to the computer nodes), but even so, that does not come into play untill after the reboot. Chris -- "The box said requires Windows 95 or better... So I installed Linux" Chris Hendrickson QSS Group. Inc - MEDS NASA/Goddard Space Flight Center Voice: (301) 867-0081 Fax: (301) 867-0089 From chrisa at ASPATECH.COM.BR Tue Mar 27 13:39:01 2001 From: chrisa at ASPATECH.COM.BR (Chris Richard Adams) Date: Tue, 27 Mar 2001 18:39:01 -0300 Subject: Master node install stops during "performing post install configuration" Message-ID: I entered the eth1 data manually when it presents both eth0 & eth1, then I noticed it presents the same info (i think the same info) as default in the following install window. I'll leave all eth1 data blank and see what happens. Thanks... > -----Original Message----- > From: Chris Hendrickson [mailto:chendric at qssmeds.com] > Sent: Tuesday, March 27, 2001 6:25 PM > To: Chris Richard Adams > Cc: Beowulf (E-mail) > Subject: Re: Master node install stops during "performing post install > configuration" > > > Chris Richard Adams wrote: > > > uhhhh... Not sure if this is the cause, but should I have the slave > > nodes already attached to the network. Just looking in the > docs, I see > > that during install it will write to the floppy disks of each slave > > node. Could this be the cause, because i do not have any slave nodes > > attached yet? > > > > Thanks, > > CHris > > > that should not be an issue, the master node does not > actually write to > the floppys of the compute nodes. 
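If the walk-a-floppy-to-each-node step ever needs scripting, the write itself is only a dd; the image name below is a placeholder, not a Scyld default -- use whatever phase 1 image beosetup actually produced on the master:

    # Write an already-generated phase 1 boot image to floppy (placeholder path).
    dd if=/tmp/node-phase1.img of=/dev/fd0 bs=1440k
    # Make sure the write has hit the media before pulling the disk.
    sync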
After the install however, you must > create a boot floppy for each node (the floppy will be created on the > master node and then walked over to the computer nodes), but even so, > that does not come into play untill after the reboot. > > Chris > > -- > "The box said requires Windows 95 or better... So I installed Linux" > > Chris Hendrickson > QSS Group. Inc - MEDS > NASA/Goddard Space Flight Center > Voice: (301) 867-0081 Fax: (301) 867-0089 > > From lowther at att.net Tue Mar 27 15:33:14 2001 From: lowther at att.net (lowther at att.net) Date: Tue, 27 Mar 2001 18:33:14 -0500 Subject: Matlab on B2 References: Message-ID: <3AC1233A.B1CCA913@att.net> Daniel Persson wrote: > > Hi again, > > Since the mailarchive isnt browseble this month, could someone help me > with getting Matlab to run on a Scyld cluster ? > > I know that there where discussion about it and i belive that someone had > a nice solution for it (or was it with octave ?) > > /Daniel > I don't remember the exact particulars, but there is a parallel version of octave out there. I did a google search. Try "octave" + "parallel". -- Ken Lowther Youngstown, Ohio http://www.atmsite.org From carlos at nernet.unex.es Wed Mar 28 01:09:25 2001 From: carlos at nernet.unex.es (=?Windows-1252?Q?Carlos_J._Garc=EDa_Orellana?=) Date: Wed, 28 Mar 2001 11:09:25 +0200 Subject: BeoMPI doesn't work with more than 3 nodes Message-ID: <01a401c0b766$c98ca9c0$7c12319e@unex.es> Hello, I want to work with BeoMPI in our cluster, so I have started with examples. First, it doesn't work because I was using a wrong 'mpirun' script, after that, the 'cpi' example works fine with 1, 2 or 3 processors. However, when I try to use more nodes, it doesn?t work, why? Please, which is the right setup to work with BeoMPI?. Thanks. Carlos. 
PD: Output of executing 'cpi' with 3 and 4 nodes (p4dbg=10) [root at nereapc mpiex]# mpirun -np 3 ./cpi -p4dbg 10 10: xm_30944: (-) using procgroup file /proc/self/fd/3 10: p0_30944: (0.000007) Beowulf: using beowulf version of gethostbyname_p4 10: p0_30944: (0.001176) hostname in first line of procgroup is -1 10: p0_30944: (0.001228) hostname for first entry in proctable is -1 10: p0_30944: (0.001257) Beowulf: using beowulf version of gethostbyname_p4 10: p0_30944: (0.001421) Beowulf: using beowulf version of gethostbyname_p4 10: rm_30946: (-) Beowulf: using beowulf version of gethostbyname_p4 10: p0_30944: (0.050455) Beowulf: using beowulf version of gethostbyname_p4 10: rm_30948: (-) Beowulf: using beowulf version of gethostbyname_p4 10: p0_30944: (0.098959) Beowulf: using beowulf version of gethostbyname_p4 10: p0_30944: (0.101561) sent msg of type 1010101010 from 0 to 1 via socket 6 10: p0_30944: (0.101661) sent msg of type 1010101010 from 0 to 2 via socket 7 10: p0_30944: (0.101728) sent msg of type 1010101010 from 0 to 1 via socket 6 10: p0_30944: (0.101772) sent msg of type 1010101010 from 0 to 2 via socket 7 10: p0_30944: (0.101811) sent msg of type 1010101010 from 0 to 1 via socket 6 10: p0_30944: (0.101868) sent msg of type 1010101010 from 0 to 2 via socket 7 10: p0_30944: (0.141063) received type=1010101010, from=1 10: p0_30944: (0.141094) received type=1010101010, from=2 10: p0_30944: (0.141155) sent msg of type 1010101010 from 0 to 1 via socket 6 10: p0_30944: (0.141198) sent msg of type 1010101010 from 0 to 2 via socket 7 Process 0 on -1 10: p0_30944: (0.143268) sent msg of type 0 from 0 to 2 via socket 7 10: p0_30944: (0.143493) sent msg of type 0 from 0 to 1 via socket 6 Process 2 on 2 Process 1 on 1 10: p0_30944: (0.143752) received type=0, from=1 10: p0_30944: (0.143826) received type=0, from=2 pi is approximately 3.1416009869231249, Error is 0.0000083333333318 wall clock time = 0.000780 10: p0_30944: (0.143927) sent msg of type 0 from 0 to 2 via socket 7 10: p0_30944: (0.143973) sent msg of type 0 from 0 to 1 via socket 6 10: p0_30944: (0.144176) received type=0, from=2 10: p0_30944: (0.144249) sent msg of type 0 from 0 to 1 via socket 6 10: p0_30944: (0.144301) received type=0, from=1 10: p0_30944: (0.144346) sent msg of type 0 from 0 to 2 via socket 7 [root at nereapc mpiex]# [root at nereapc mpiex]# mpirun -np 4 ./cpi -p4dbg 10 10: xm_30951: (-) using procgroup file /proc/self/fd/3 10: p0_30951: (0.000012) Beowulf: using beowulf version of gethostbyname_p4 10: p0_30951: (0.001121) hostname in first line of procgroup is -1 10: p0_30951: (0.001220) hostname for first entry in proctable is -1 10: p0_30951: (0.001265) Beowulf: using beowulf version of gethostbyname_p4 10: p0_30951: (0.001425) Beowulf: using beowulf version of gethostbyname_p4 10: p0_30951: (0.001529) Beowulf: using beowulf version of gethostbyname_p4 10: rm_30953: (-) Beowulf: using beowulf version of gethostbyname_p4 10: p0_30951: (0.050602) Beowulf: using beowulf version of gethostbyname_p4 10: rm_30955: (-) Beowulf: using beowulf version of gethostbyname_p4 10: p0_30951: (0.099172) Beowulf: using beowulf version of gethostbyname_p4 10: rm_30957: (-) Beowulf: using beowulf version of gethostbyname_p4 10: p0_30951: (0.147644) Beowulf: using beowulf version of gethostbyname_p4 10: p0_30951: (0.151243) sent msg of type 1010101010 from 0 to 1 via socket 6 10: p0_30951: (0.151315) sent msg of type 1010101010 from 0 to 2 via socket 7 10: p0_30951: (0.151384) sent msg of type 1010101010 from 0 to 1 via socket 
6 10: p0_30951: (0.151449) sent msg of type 1010101010 from 0 to 2 via socket 7 10: p0_30951: (0.151493) sent msg of type 1010101010 from 0 to 1 via socket 6 10: p0_30951: (0.151557) sent msg of type 1010101010 from 0 to 2 via socket 7 p1_30953: p4_error: Timeout in establishing connection to remote process: 0 From deadline at plogic.com Wed Mar 28 05:26:45 2001 From: deadline at plogic.com (Douglas Eadline) Date: Wed, 28 Mar 2001 08:26:45 -0500 (EST) Subject: BERT 77: Automatic Parallelizer In-Reply-To: <3AB9DC71.BC9A0EE3@scali.no> Message-ID: On Thu, 22 Mar 2001, Ole W. Saastad wrote: > Hi, > I have some questions about the Bert 77 Automatic Parallelizer > from Paralogic. (http://www.plogic.com/bert.html) > > At first sight it looks nice and shiny, but how is the users > experience? If you need to do more than the demo version let me know. #define software_conversion_soap_box In general FORTRAN conversion (any conversion) is a tricky thing. First, as far as I know, there is no "silver bullet" tool that will take any standard F77 code and produce a nice scalable application for a parallel computer. As we have found, almost every code can be better parallelized by user intervention. In this regard, BERT provides a PROCESS for doing a parallelization. Second, it is important to understand the difference between concurrent and parallel (concurrent parts of your program are parts of the program that can run independently and parallel parts of your program are concurrent parts that run on different processors) Presumably you want your parallel parts to run faster than sequentially. Concurrency does not imply parallel execution. Executing concurrent parts of a code in parallel is function of the machine. (i.e. it is an efficiency issue) With respect to efficiency BERT does a good job. #endif > > Many of my colleges run climate models with OpenMP on fast > sequential machines and would consider MPI based clusters if > they could get some help to make the transition. > This tool might be useful as a start to switch from OpenMP > or sequential code to MPI based code. The task of converting > programs from sequential to parallel is not a trivial one and > I am very interested in how well Paralogic's program perform. > We don't support OpenMP directives. But these directives can aid in the placement of BERT directives. > I test would be to crunch the g98 fortran source code through > and see if it gave any reasonable results. We have done some large codes - in excess of 100K lines. First thing to remember is that the global analysis required for big codes takes some time, so some time is need to work with these codes. (i.e. it is not a simple point and click afternoon experience) > Have anyone of you done test with programs this size ? > > The examples shows tremendous speedup by splitting loops over > many nodes, but in real lift things are different. > Of course we give some examples that show good speedup. But, what is most interesting, some of the example show very poor results on some of the example machine profiles. However, the thing to remember, the true achievable speed-up for your application is based the algorithm (no tool rewrites algorithms), the properties of the machines (i.e. CPU, interconnect, compiler, message passing API, etc.) and how efficiently you can map the concurrent parts of your algorithm to a specific machine. BERT attempts to help the user do this. It should also be noted that BERT can provide "negative" results (i.e. 
it is very hard to parallelize this application and speedup will be minimal). Although, not good news, it is an important data point about your application. There have been people who have spent months converting a code only to learn that is will not work well on a specific parallel computer. Doug -- ------------------------------------------------------------------- Paralogic, Inc. | PEAK | Voice:+610.814.2800 130 Webster Street | PARALLEL | Fax:+610.814.5844 Bethlehem, PA 18015 USA | PERFORMANCE | http://www.plogic.com ------------------------------------------------------------------- From sshealy at asgnet.psc.sc.edu Wed Mar 28 09:08:28 2001 From: sshealy at asgnet.psc.sc.edu (Scott Shealy) Date: Wed, 28 Mar 2001 12:08:28 -0500 Subject: Matlab on B2 Message-ID: <5773B442597BD2118B9800105A1901EE1B4DC9@asgnet2> >Daniel Persson wrote: > > Hi again, > > Since the mailarchive isnt browseble this month, could someone help me > with getting Matlab to run on a Scyld cluster ? > > I know that there where discussion about it and i belive that someone had > a nice solution for it (or was it with octave ?) > > /Daniel > >I don't remember the exact particulars, but there is a parallel version >of octave out there. I did a google search. Try "octave" + >"parallel". Also try sci-lab(also open source) they also have some parallel support. From JParker at coinstar.com Wed Mar 28 09:24:45 2001 From: JParker at coinstar.com (JParker at coinstar.com) Date: Wed, 28 Mar 2001 09:24:45 -0800 Subject: Channel Bonding Message-ID: G'Day ! If I have 8 nodes each with (2) 100Mb Ethernet cards, and I want to use channel bonding. Can I use (1) 16 port switch, or do I need (2) 8 ports ? If I use the (2) 8 ports, do the switches need to be connected to each other or are they isolated from each other ? cheers, Jim Parker Sailboat racing is not a matter of life and death .... It is far more important than that !!! -------------- next part -------------- An HTML attachment was scrubbed... URL: From tmattox at engr.uky.edu Wed Mar 28 10:57:30 2001 From: tmattox at engr.uky.edu (Timothy I Mattox) Date: Wed, 28 Mar 2001 13:57:30 -0500 (EST) Subject: Channel Bonding In-Reply-To: Message-ID: Hello, Unless the 16 port switch can be configured to handle it (and most can not as far as I know), you would need two 8 port switches that are NOT connected together. Some high end switches have a form of trunking (I'm not sure which flavor of trunking will work) that can properly handle having more than one connection appear to have the same MAC address. Also, from comments here on the list, it seems that not all VLAN support is created equal, so splitting a 16 port switch into two VLANs won't necessarily work either. The fundamental problem is that channel bonding makes several NICs in the same box have identical MAC addresses, and that breaks the most commonly used method(s) for routing ethernet packets inside of switches, since MAC addresses are supposed to be unique. Channel bonding is a very effective technique when configured properly. On Wed, 28 Mar 2001 JParker at coinstar.com wrote: > G'Day ! > > If I have 8 nodes each with (2) 100Mb Ethernet cards, and I want to use > channel bonding. Can I use (1) 16 port switch, or do I need (2) 8 ports ? > If I use the (2) 8 ports, do the switches need to be connected to each > other or are they isolated from each other ? > > cheers, > Jim Parker > > Sailboat racing is not a matter of life and death .... It is far more > important than that !!! 
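For reference, the node-side setup being discussed is roughly the sequence below, per the stock Linux bonding driver and its ifenslave helper; the address is only an example and module/file names can differ between kernel builds:

    # Load the bonding driver and bring up the virtual interface.
    /sbin/insmod bonding
    /sbin/ifconfig bond0 10.0.0.1 netmask 255.255.255.0 up

    # Enslave the physical NICs. After this both report the bond's MAC address,
    # which is exactly why the two switch fabrics have to stay isolated.
    /sbin/ifenslave bond0 eth0
    /sbin/ifenslave bond0 eth1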
-- Tim Mattox - tmattox at ieee.org - http://home.earthlink.net/~timattox http://aggregate.org/KAOS/ - http://advogato.org/person/tmattox/ From jakob at unthought.net Wed Mar 28 13:25:14 2001 From: jakob at unthought.net (=?iso-8859-1?Q?Jakob_=D8stergaard?=) Date: Wed, 28 Mar 2001 23:25:14 +0200 Subject: Channel Bonding In-Reply-To: ; from tmattox@engr.uky.edu on Wed, Mar 28, 2001 at 01:57:30PM -0500 References: Message-ID: <20010328232513.A21554@unthought.net> On Wed, Mar 28, 2001 at 01:57:30PM -0500, Timothy I Mattox wrote: > Hello, > Unless the 16 port switch can be configured to handle it (and most can > not as far as I know), you would need two 8 port switches that are NOT > connected together. Some high end switches have a form of trunking > (I'm not sure which flavor of trunking will work) that can properly > handle having more than one connection appear to have the same MAC > address. Also, from comments here on the list, it seems that not all > VLAN support is created equal, so splitting a 16 port switch into two > VLANs won't necessarily work either. > > The fundamental problem is that channel bonding makes several NICs in > the same box have identical MAC addresses, and that breaks the most > commonly used method(s) for routing ethernet packets inside of switches, > since MAC addresses are supposed to be unique. At work we use an intel switch that allows "trunking" of several ports. however, a 24 port switch has three "groups" of eight ports each (or was it four groups of six ? I forgot), and you can only do trunking between ports in the same group, and you can only trunk once in each group. Thus, we can only create three (or four) trunk "sets" for each switch. This is very switch-specific - you should check the capabilities of your own switch. I think this kind of capability is becoming more normal in lower end switches as well. However, once the switch is set up to trunk a few ports, enabling it in RedHat 7 with a 2.4 kernel is so easy it's almost cheating :) It works very well indeed. The RedHat initscripts are prepared for this setup, so there's no special hackery needed at all. I don't know about other distributions. It's correct that the kernel uses the same MAC on all NICs that are trunked, but this is what the switch expects, and it's the only sane way to do it as I see it. And I don't know why VLANs got involved in this discussion at all :) -- ................................................................ : jakob at unthought.net : And I see the elder races, : :.........................: putrid forms of man : : Jakob ?stergaard : See him rise and claim the earth, : : OZ9ABN : his downfall is at hand. : :.........................:............{Konkhra}...............: From JParker at coinstar.com Wed Mar 28 13:44:17 2001 From: JParker at coinstar.com (JParker at coinstar.com) Date: Wed, 28 Mar 2001 13:44:17 -0800 Subject: Channel Bonding Message-ID: G'Day ! Thank you for all your responses. In summary it sounds like the cheapest and easiest way is to use (2) 8 port switches that are independent of each other. (ie not uplinked). Do they have to be switches or will a generic non-switching hub work ? cheers, Jim Parker Sailboat racing is not a matter of life and death .... It is far more important than that !!! ----- Forwarded by Jim Parker/Coinstar Inc/US on 03/28/01 01:47 PM ----- Jim Parker 03/28/01 09:24 AM To: beowulf at beowulf.org cc: Subject: Channel Bonding G'Day ! If I have 8 nodes each with (2) 100Mb Ethernet cards, and I want to use channel bonding. 
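The Red Hat 7 initscripts setup Jakob mentions boils down, as far as I recall, to something like the following three files; treat this as a sketch and check the option names against your initscripts version:

    # /etc/modules.conf -- map the bond0 device onto the bonding module
    alias bond0 bonding

    # /etc/sysconfig/network-scripts/ifcfg-bond0
    DEVICE=bond0
    IPADDR=10.0.0.1            # example address
    NETMASK=255.255.255.0
    ONBOOT=yes
    BOOTPROTO=none

    # /etc/sysconfig/network-scripts/ifcfg-eth0 (ifcfg-eth1 identical apart from DEVICE)
    DEVICE=eth0
    MASTER=bond0
    SLAVE=yes
    ONBOOT=yes
    BOOTPROTO=none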
Can I use (1) 16 port switch, or do I need (2) 8 ports ? If I use the (2) 8 ports, do the switches need to be connected to each other or are they isolated from each other ? cheers, Jim Parker Sailboat racing is not a matter of life and death .... It is far more important than that !!! -------------- next part -------------- An HTML attachment was scrubbed... URL: From tmattox at engr.uky.edu Wed Mar 28 15:20:06 2001 From: tmattox at engr.uky.edu (Timothy I Mattox) Date: Wed, 28 Mar 2001 18:20:06 -0500 (EST) Subject: Channel Bonding In-Reply-To: <20010328232513.A21554@unthought.net> Message-ID: Hello, Please excuse beating this dead horse... No, a "normal" switch should NOT expect to see the same MAC address coming from more than one port at any one time. The standards body that formalized the ethernet standard set up a mechanism for the entire industry to be able to guarantee that every single Ethernt NIC sold in the world has a unique MAC address! It is a fundamental part of how ethernet works. The way Channel Bonding is implemented violates this unique MAC address per device standard, by deliberately making several NICs have the same MAC address. It is a very cool way to have implemented the concept so that higher layers of the network stack could be oblivious to the fact that the lower level packets were going out (and coming in) on different NICs. However, you usually can't just wire it all up and have it "just work", but I'll get back to that in a moment... Each time a packet with a particular source MAC address goes through a unmanaged switch, the port it arrived on is recorded in the switch's routing table, replacing any previous entry for that MAC address. If you connect several NICs with the same MAC address to the switch, at any one time it would have only one port listed for a particular MAC address, whichever NIC was the last to send out a packet. Granted, not all switches have to work this way, but unmanaged switches some how have to learn where each MAC address is located... So, for all the posts from people saying "I tried channel boding, and it was much SLOWER than when I used just one NIC"... this is what is likely going on... your switch at any one time will be sending all the packets for a particular MAC address down ONE port, and thus blocking, and overflowing, and just making a real mess. Originally the way to make channel bonding work was to make N copies of your network, where N is the number of NICs you intend to bond together. And your N networks had to be isolated from each other, so that their tree-spanning algorithm, etc. would not get confused by seeing the same MAC coming from multiple places. More modern/advanced/expensive switches have added the ability to properly handle having the same MAC address appear to be connected to several ports. This has gone under a variety of names, but I think the most commonly used term is "trunking". As far as I was aware, not all implementations of trunking can be used to connect bonded NICs. Please correct me if I am wrong on this aspect of trunking. My comment about VLAN's not helping out was that it would seem that you could split a switch in two (or three) parts, each as a Virtual LAN, and then connect up your bonded NICs, one to each VLAN segment. However, that appears not to always work either. My guess is that the internal routing tables in some switches with VLAN support will still do lookups based on MAC addresses, and keep only one entry per MAC address. 
I am only guessing on this since we haven't played with VLAN stuff yet in our lab. Here is some OLD documentation for channel bonding: http://www.beowulf-underground.org/doc_project/BIAA-HOWTO/Beowulf-Installation-and-Administration-HOWTO-12.html The most resent bonding documentation I can find is in the kernel source: /usr/src/linux/Documentation/networking/bonding.txt However, that document ASSUMES you will be using switches that support trunking... ignoring that channel bonding worked before such switches existed (1994?). Anyway, my point was that it takes a special switch to be able to do channel bonding WITHIN one switch. So to answer the original question of do I need one 16 way switch, or two 8 way switches for an 8 cluster still is answered by: Get two cheap unmanaged 8-port switches and do NOT tie them together. You can get 10/100 8-port switches for less than $80... see http://www.buy.com for a few choices. I would love to know of alternatives that would not use the duplicated MAC address implementation of channel bonding. Our FNN work with KLAT2 has made me look around for such alternatives, and the closest I came across was the work to combine more than one PPP connection (dual channel ISDN, or multiple regular modems). It looks like we are going to have to roll our own solution when we have time to do it... On Wed, 28 Mar 2001, Jakob ?stergaard wrote: > On Wed, Mar 28, 2001 at 01:57:30PM -0500, Timothy I Mattox wrote: > > Hello, > > Unless the 16 port switch can be configured to handle it (and most can > > not as far as I know), you would need two 8 port switches that are NOT > > connected together. Some high end switches have a form of trunking > > (I'm not sure which flavor of trunking will work) that can properly > > handle having more than one connection appear to have the same MAC > > address. Also, from comments here on the list, it seems that not all > > VLAN support is created equal, so splitting a 16 port switch into two > > VLANs won't necessarily work either. > > > > The fundamental problem is that channel bonding makes several NICs in > > the same box have identical MAC addresses, and that breaks the most > > commonly used method(s) for routing ethernet packets inside of switches, > > since MAC addresses are supposed to be unique. > > At work we use an intel switch that allows "trunking" of several ports. > however, a 24 port switch has three "groups" of eight ports each (or was it four > groups of six ? I forgot), and you can only do trunking between ports in the > same group, and you can only trunk once in each group. Thus, we can only create > three (or four) trunk "sets" for each switch. > > This is very switch-specific - you should check the capabilities of your own > switch. I think this kind of capability is becoming more normal in lower > end switches as well. > > However, once the switch is set up to trunk a few ports, enabling it in RedHat > 7 with a 2.4 kernel is so easy it's almost cheating :) It works very well > indeed. The RedHat initscripts are prepared for this setup, so there's no > special hackery needed at all. I don't know about other distributions. > > It's correct that the kernel uses the same MAC on all NICs that are trunked, > but this is what the switch expects, and it's the only sane way to do it as I > see it. 
And I don't know why VLANs got involved in this discussion at all :) -- Tim Mattox - tmattox at ieee.org - http://home.earthlink.net/~timattox http://aggregate.org/KAOS/ - http://advogato.org/person/tmattox/ From Dean.Carpenter at pharma.com Thu Mar 29 07:46:43 2001 From: Dean.Carpenter at pharma.com (Carpenter, Dean) Date: Thu, 29 Mar 2001 10:46:43 -0500 Subject: SMP support with the scyld package Message-ID: <759FC8B57540D311B14E00902727A0C002EC47F5@a1mbx01.pharma.com> Hummm. List is being very quiet these days. Has anyone had luck/experience with the stuff I posted earlier ? --- Given a raw Scyld install, can anyone show a cookbook sequence to converting the master node as well as the slave nodes to a custom kernel ? Not just the SMP kernel from the rpm - we'll want to play with MOSIX, as well as the 2.2.19 kernel. 1. Install x and y packages to solve problem #2 above so we can actually build kernels. 2. Pull down virgin kernel sources and untar into /usr/src/linux 3. Integrate the bproc patches ..... (how - what else ?) 4. Integrate MOSIX (and any other cool) patches 5. Build the kernel and modules (any special considerations ?) 6. Build the custom phase 2 boot image 7. Rock and roll :) -- Dean Carpenter Principal Architect Purdue Pharma dean.carpenter at pharma.com deano at areyes.com 94TT :) From jearle at nortelnetworks.com Thu Mar 29 07:52:07 2001 From: jearle at nortelnetworks.com (Jonathan Earle) Date: Thu, 29 Mar 2001 10:52:07 -0500 Subject: Channel Bonding Message-ID: <28560036253BD41191A10000F8BCBD116BDDD1@zcard00g.ca.nortel.com> > -----Original Message----- > From: Timothy I Mattox [mailto:tmattox at engr.uky.edu] > > So, for all the posts from people saying "I tried channel boding, > and it was much SLOWER than when I used just one NIC"... this is what > is likely going on... your switch at any one time will be sending > all the packets for a particular MAC address down ONE port, and thus > blocking, and overflowing, and just making a real mess. If all the data went down only one port, then at worst, shouldn't the speed have been the same as when only one NIC was used? In my case, two PCs, each with a Znyx 4port card (tulip bsed), kernel 2.4.0-test9, I didn't use a switch, but instead wired each port to the other PC using a crossover cable (PC1/port1 to PC2/port1, PC1/port2 to PC2/port2, etc). Even in this config, the speed was dramatically slower than using one NIC. Jon From r.grenyer at ic.ac.uk Thu Mar 29 09:04:25 2001 From: r.grenyer at ic.ac.uk (Grenyer, Richard) Date: Thu, 29 Mar 2001 18:04:25 +0100 Subject: Odd request Message-ID: Someone wrote: >And your N networks had to be isolated from each other, >so that their tree-spanning algorithm, etc. would not get confused What is this tree-spanning algorithm. Is it detailed anywhere? What would I need to search around under - keywords, acronyms, similar? Many thanks, Rich Grenyer From tmattox at engr.uky.edu Thu Mar 29 10:39:15 2001 From: tmattox at engr.uky.edu (Timothy I Mattox) Date: Thu, 29 Mar 2001 13:39:15 -0500 (EST) Subject: Odd request In-Reply-To: Message-ID: Q: "Spanning tree algorithm?" A: I have found a good reference for ethernet is the O'Reilly book: "Ethernet: The Definitive Guide" by Charles E. Spurgeon, 2000. There is a standard method that ethernet switches use to detect loops, and will shut down one of the links in the loop. Essentially they form a tree by sending out special packets amongst themselves. 
I just did a google search on "spanning tree algorithm ethernet switch" and found this from Cisco: http://www.cisco.com/univercd/cc/td/doc/product/rtrmgmt/sw_ntman/cwsimain/cwsi2/cwsiug2/vlan2/stpapp.htm "Understanding Spanning-Tree Protocol" And this article from Network World Fusion: http://www.nwfusion.com/netresources/0524spanning.html "Spanning tree is still with us" On Thu, 29 Mar 2001, Grenyer, Richard wrote: > Someone wrote: > > >And your N networks had to be isolated from each other, > >so that their tree-spanning algorithm, etc. would not get confused > > What is this tree-spanning algorithm. Is it detailed anywhere? What would I > need to search around under - keywords, acronyms, similar? > > Many thanks, > > Rich Grenyer -- Tim Mattox - tmattox at ieee.org - http://home.earthlink.net/~timattox http://aggregate.org/KAOS/ - http://advogato.org/person/tmattox/ From bob at drzyzgula.org Thu Mar 29 10:45:59 2001 From: bob at drzyzgula.org (Bob Drzyzgula) Date: Thu, 29 Mar 2001 13:45:59 -0500 Subject: Odd request In-Reply-To: References: Message-ID: <20010329134559.A18701@drzyzgula.org> The "Spanning Tree Algorithm" has its roots (sorry) in the branch (geez, I really didn't set out to do this) of mathematics known as Graph Theory. It describes a mechanism whereby a cycle-free subtree of a connected graph, reaching every node of the graph, can be identified. As with most of these things, there's more than one way to do it -- any given graph will potentially have several different spanning subtrees, although some will be "better" or "more optimal" than others according to some criteria or another. In particular, depending on the problem one is attempting to solve, one might be looking for a spanning subtree which maximizes or minimizes the number or aggregate length (or perhaps "cost") of the arcs which connect the various nodes. In LANs, one usually is interested in a shortest-path spanning tree. Any decent book on graph theory should discuss this -- I just refreshed my memory on this out of my copy of "Flows in Networks", by Ford & Fulkerson, Princeton 1962 (out of print, sadly). When translated to a network, however, a spanning tree algorithm is used as the basis of what is then called a "Spanning Tree Protocol". The whole point here is to find a set of point-to-point links in a network over which you can reach any network attachment point, but within which one cannot get caught in a circle without immediately backtracking over the last link followed. Since MAC-layer bridges (and hence switches, which are fancy bridges) know only about "here" and "there" and not "over yonder" -- their view does not extend beyond the immediately-connected segements -- the failure to limit the usable paths in this way can result in packets spinning around in a cycle and never being forwarded to their intended destination. As with just about any protocol, this too has been subject to revision, reuse and refinement over the years. The protocol used by modern switches is specified by the IEEE 802.1d document which, alas, is not available for free download anywhere -- IEEE charges real money for their standards documents. I believe that the protocol used in 802.1d is based on an earlier DEC protocol, but incompatibly so. Just about any reasonably detailed book on network technology should contain an explaination of spanning tree. As I was searching for some other references (no, I didn't just write up all this detail from memory), one book that kept coming up was "Interconnections", by Radia Pearlman. 
At Amazon, sixteen reviewers give it all five stars, one only gives it four stars because he found himself underprepared :-) Thus, I assume that it's a decent reference: http://www.amazon.com/exec/obidos/ASIN/0201634481/ Hope this helps, --Bob On Thu, Mar 29, 2001 at 06:04:25PM +0100, Grenyer, Richard wrote: > Someone wrote: > > >And your N networks had to be isolated from each other, > >so that their tree-spanning algorithm, etc. would not get confused > > What is this tree-spanning algorithm. Is it detailed anywhere? What would I > need to search around under - keywords, acronyms, similar? > > Many thanks, > > Rich Grenyer > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lusk at mcs.anl.gov Thu Mar 29 11:15:21 2001 From: lusk at mcs.anl.gov (Rusty Lusk) Date: Thu, 29 Mar 2001 13:15:21 -0600 Subject: MPI/Beowulf vs. SM Programming/OpenMP In-Reply-To: Message from "Mattson, Timothy G" of "Fri, 23 Mar 2001 14:02:57 PST." Message-ID: <200103291915.NAA08188@mcs.anl.gov> | A general MPMD algorithm with lots of assynchronous events would be hard to | do with OpenMP. (actually, it can be hard with MPIch as well, but then you | can go with MPI-LAM or PVM). Hi Tim, I am curious as to what you are referring to with regard to MPICH. Rusty From chrisa at ASPATECH.COM.BR Thu Mar 29 12:41:15 2001 From: chrisa at ASPATECH.COM.BR (Chris Richard Adams) Date: Thu, 29 Mar 2001 17:41:15 -0300 Subject: Installing module 3c90x.o returns: init_module: Device busy Message-ID: Hi all; I need to create the module for the 3c905C-TX card. I got the source code from the Scyld site and followed the directions explicitly. I compiled with only a few warnings. When I try to install the module with 'insmod 3c90x.o' I get the error: 3c90x.o init_module: Device or Resource busy. I am running the Beowulf 2 prerelease Scyld version 2.2-16. I did not see this driver installed to begin with, hence this is why I am building it. Suggestions?? THanks, Chris From bill at billnorthrup.com Thu Mar 29 13:19:51 2001 From: bill at billnorthrup.com (Bill Northrup) Date: Thu, 29 Mar 2001 13:19:51 -0800 Subject: VP6 -would you like to test it? Message-ID: <006001c0b895$fdd9c840$1901a8c0@bob> Hello, I have just brought a Abit VP6 online with 2x 1000mhz and 256mb of ram.. The SCYLD#2 distro was slammed on and a few Compaq desktops were recruited as workers (1 450 int and the other a 380 AMD). A Cisco 3524 rounds out the connections as it hangs on my DSL here at home. I am letting it burn in for awhile and during that time, I would like to open it up to few folks to maybe test you app or favorite benchmarks. Drop me an email off the list and I will be more than happy to try and accomodate you. Bill -------------- next part -------------- An HTML attachment was scrubbed... URL: From Dean.Carpenter at pharma.com Thu Mar 29 15:10:05 2001 From: Dean.Carpenter at pharma.com (Carpenter, Dean) Date: Thu, 29 Mar 2001 18:10:05 -0500 Subject: SMP kernel for Scyld Message-ID: <759FC8B57540D311B14E00902727A0C002EC47FD@a1mbx01.pharma.com> Hey All - Anyone have any luck with the kernel-smp-2.2.17-33.beo.i686.rpm package ? Any special tricks to getting it run cleanly on the master or slave nodes ? As I said before, it appears to have some trouble with the modules. Just out of interest, why isn't the smp version the default instead of the up one ? 
-- Dean Carpenter Principal Architect Purdue Pharma dean.carpenter at pharma.com deano at areyes.com 94TT :) From bill at math.ucdavis.edu Thu Mar 29 16:10:24 2001 From: bill at math.ucdavis.edu (Bill Broadley) Date: Thu, 29 Mar 2001 16:10:24 -0800 Subject: How to use a beowulf class materials? Message-ID: <20010329161024.A1623@sphere.math.ucdavis.edu> I've been tasked to teach a class in using a beowulf. I've parallelized a few serial fortan codes to use MPI, so I have the technical side covered, at least for the 1st quarter. Anyone have suggestions for textbooks? Know of existing class outlines available on the web? Any other pointers? I'm considering: Parallel Programming With MPI by Peter Pacheco to cover the MPI part, but probably need an additional book on parallel algorithms and programming. I suspect it's of interest to the list, so please follow up to the list. -- Bill Broadley Programmer/Admin Mathematics/Institute of Theoretical Dynamics University of California, Davis From agrajag at linuxpower.org Thu Mar 29 17:38:13 2001 From: agrajag at linuxpower.org (Jag) Date: Thu, 29 Mar 2001 17:38:13 -0800 Subject: SMP kernel for Scyld In-Reply-To: <759FC8B57540D311B14E00902727A0C002EC47FD@a1mbx01.pharma.com>; from Dean.Carpenter@pharma.com on Thu, Mar 29, 2001 at 06:10:05PM -0500 References: <759FC8B57540D311B14E00902727A0C002EC47FD@a1mbx01.pharma.com> Message-ID: <20010329173813.V13901@kotako.analogself.com> On Thu, 29 Mar 2001, Carpenter, Dean wrote: > Hey All - > > Anyone have any luck with the kernel-smp-2.2.17-33.beo.i686.rpm package ? Yes > Any special tricks to getting it run cleanly on the master or slave nodes ? First I installe dthe kernel-smp package on the master node. I then edited the lilo.conf to boot the smp kernel. I ran '/sbin/lilo', then rebooted it. I then made the /boot/vmlinuz symlink point to the smp kernel. After that, I ran beosetup and told it to create the BeoBoot file. Then I rebooted all the slave nodes and everything worked fine. > As I said before, it appears to have some trouble with the modules. I had this problem. Booting into the smp kernel before remaking the beoboot file seemed to fix it. > > Just out of interest, why isn't the smp version the default instead of the > up one ? I'm not sure. The default kernel should be decided by anaconda (Red Hat's installer). Jag -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 232 bytes Desc: not available URL: From yoon at bh.kyungpook.ac.kr Thu Mar 29 21:51:56 2001 From: yoon at bh.kyungpook.ac.kr (Yoon Jae Ho) Date: Fri, 30 Mar 2001 14:51:56 +0900 Subject: How to use a beowulf class materials? References: <20010329161024.A1623@sphere.math.ucdavis.edu> Message-ID: <002301c0b8dd$8e79bd00$5f72f2cb@TEST> > I've been tasked to teach a class in using a beowulf. I've parallelized a > few serial fortan codes to use MPI, so I have the technical side covered, > at least for the 1st quarter. If I have a chance to be a teacher like you, I want to be a coordinator not teach item like MPI code or ... ... I mean in the internet there are many discussion group(mailing list) or there archive. It is the first step to learn. and the e-mail for the very important contact point when your student don't know or ask something to the authors. > Anyone have suggestions for textbooks? Know of existing class > outlines available on the web? Any other pointers? 
first of all, Will you show your student the www.beowulf.org 's discussion archive & the linking site from various University or Org? and www.beowulf-underground.org ? and other mailing list, FAQ, howto or Usenet site related in mpi .. > > I'm considering: Parallel Programming With MPI by Peter Pacheco to > cover the MPI part, but probably need an additional book on parallel > algorithms and programming. > > I suspect it's of interest to the list, so please follow up to the list. I read "Parallel Programming" by Wilkinson & Allen Prentice Hall ISDN 0-13-671710-1 and the Program Source like mpich or LAM itself is starting point to teach, I think. and the Presentation & FAQ or manual by author is another point to start. Thank you very much --------------------------------------------------------------------- Yoon Jae Ho Economist POSCO Research Institute yoon at bh.kyungpook.ac.kr jhyoon at mail.posri.re.kr http://ie.korea.ac.kr/~supercom/ Korea Beowulf Supercomputer Imagination is more important than knowledge. A. Einstein From Eugene.Leitl at lrz.uni-muenchen.de Fri Mar 30 01:00:12 2001 From: Eugene.Leitl at lrz.uni-muenchen.de (Eugene Leitl) Date: Fri, 30 Mar 2001 11:00:12 +0200 (MET DST) Subject: Director of Sientific Computing (fwd) Message-ID: ---------- Forwarded message ---------- Date: Fri, 30 Mar 2001 09:42:06 +0100 From: Stuart Mackenzie To: MOLECULAR-DYNAMICS-NEWS at JISCMAIL.AC.UK Subject: Director of Sientific Computing Apologies for possible multiple postings I would be grateful if you would bring the following advertisement to the attention of any suitably qualified colleagues. _______________________________________________ Dr Stuart Mackenzie, Lecturer in Physical Chemistry, Department of Chemistry, University of Warwick, Coventry CV4 7AL tel. 02476 523241 fax. 02476 524112 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx UNIVERSITY OF WARWICK Director of The Centre for Scientific Computing The University is establishing a new Centre for Scientific Computing which will be inter-disciplinary in nature. In its initial phases, the University will provide up to six academic and technical staff positions to launch this development. The first step is to seek a Director at Professorial level. The Director will be closely involved in appointing other colleagues and will be expected to develop the Centre, shape its intellectual profile and nurture its growth into a major research enterprise. It is also intended that the Centre will become involved in graduate education. The successful candidate should have an international reputation for research, a significant proportion of which is computationally intensive, and a broad vision for the development of computationally-based research in the sciences, engineering and other academic areas. The individual will be attached to a particular Department although their role will be primarily within the Centre. Currently the seven Departments engaged in the Centre's development are Biological Sciences, Chemistry, Computer Science, Engineering, Mathematics, Physics and Statistics. However, applicants from any area are encouraged to apply. Salary will be within the Professorial range. For further information see http://www.warwick.ac.uk/fac/sci/ScientificComputing/index.htm Informal enquiries about the post may be made to Professor Stuart Palmer, Chair of the Search Committee (tel: 02476 523399; e-mail: S.B.Palmer at warwick.ac.uk) Further particulars can be obtained from the Personnel Office, University of Warwick, Coventry, CV4 7AL. 
Telephone: 024 7652 3627 and from jobs.ac.uk/jobfiles/AC1076.html. Applications (3 copies) should name three referees. Please quote Ref No. 35/3A/00. The closing date for applications is: 11 May 2001. From kodym at mit.jyu.fi Fri Mar 30 02:12:18 2001 From: kodym at mit.jyu.fi (Petr Ladislav Kodym) Date: Fri, 30 Mar 2001 13:12:18 +0300 (EEST) Subject: How to use a beowulf class materials? Message-ID: Hi, >Anyone have suggestions for textbooks? Know of existing class >outlines available on the web? Any other pointers? > >I'm considering: Parallel Programming With MPI by Peter Pacheco to >cover the MPI part, Have a look at MPI course material from EPCC: http://www.epcc.ed.ac.uk/epcc-tec/documents/psform-sn-mpi.html > but probably need an additional book on parallel algorithms and programming. What about this one: Introduction to Parallel Computing : Design and Analysis of Parallel Algorithms by Vipin Kumar, Ananth Grama, Anshul Gupta, George Karypis http://www.amazon.com/exec/obidos/ASIN/0805331700/qid%3D985946846/107-7176547-6865350 Petr From bogdan.costescu at iwr.uni-heidelberg.de Fri Mar 30 02:29:43 2001 From: bogdan.costescu at iwr.uni-heidelberg.de (Bogdan Costescu) Date: Fri, 30 Mar 2001 12:29:43 +0200 (CEST) Subject: Installing module 3c90x.o returns: init_module: Device busy In-Reply-To: Message-ID: On Thu, 29 Mar 2001, Chris Richard Adams wrote: > I need to create the module for the 3c905C-TX card. I got the source > code from the Scyld site and followed the directions explicitly. I > compiled with only a few warnings. When I try to install the module > with 'insmod 3c90x.o' I get the error: 3c90x.o init_module: Device or > Resource busy. I think that you are making some kind of confusion here. AFAIK, the Scyld site does not distribute any driver called 3c90x - this is the one distributed by 3Com. Don's driver is called 3c59x and you should have it already compiled. RedHat is distributing both drivers and announcing 3c90x as the right one for 905B and 905C cards, although 3c59x supports them too. But I do have the impression that their detection is not very good: anaconda would use (and add to /etc/conf.modules) 3c59x, while later, kudzu would install 3c90x. Maybe this situation will change with the 2.4 kernel, as 3c90x in its current form is not 2.4 ready. Sincerely, Bogdan Costescu IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868 E-mail: Bogdan.Costescu at IWR.Uni-Heidelberg.De From agrajag at linuxpower.org Fri Mar 30 05:48:56 2001 From: agrajag at linuxpower.org (Jag) Date: Fri, 30 Mar 2001 05:48:56 -0800 Subject: Installing module 3c90x.o returns: init_module: Device busy In-Reply-To: ; from chrisa@ASPATECH.COM.BR on Thu, Mar 29, 2001 at 05:41:15PM -0300 References: Message-ID: <20010330054855.X13901@kotako.analogself.com> On Thu, 29 Mar 2001, Chris Richard Adams wrote: > Hi all; > > I need to create the module for the 3c905C-TX card. I got the source > code from the Scyld site and followed the directions explicitly. I > compiled with only a few warnings. When I try to install the module > with 'insmod 3c90x.o' I get the error: 3c90x.o init_module: Device or > Resource busy. > > I am running the Beowulf 2 prerelease Scyld version 2.2-16. I did not > see this driver installed to begin with, hence this is why I am building > it. I had similar problem when I tried the prerelease. 
However, when I tried the actual release (27bz-7), this problem went away. With the 27bz-7 release it used the 3c59x driver for my 3c905C-TX ethernet card and it works just fine. Jag -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 232 bytes Desc: not available URL: From rgb at phy.duke.edu Fri Mar 30 06:47:26 2001 From: rgb at phy.duke.edu (Robert G. Brown) Date: Fri, 30 Mar 2001 09:47:26 -0500 (EST) Subject: How to use a beowulf class materials? In-Reply-To: <20010329161024.A1623@sphere.math.ucdavis.edu> Message-ID: On Thu, 29 Mar 2001, Bill Broadley wrote: > > I've been tasked to teach a class in using a beowulf. I've parallelized a > few serial fortan codes to use MPI, so I have the technical side covered, > at least for the 1st quarter. > > Anyone have suggestions for textbooks? Know of existing class > outlines available on the web? Any other pointers? It isn't written as a textbook and isn't finished, but http://www.phy.duke.edu/brahma/beowulf_online_book/ ("Engineering a Beowulf-Style Compute Cluster") might have some useful stuff. Contributions are always welcome as well. Other links on the Brahma site are also likely to be useful. Of course you know about "How to Build a Beowulf Cluster" book by Sterling, Becker, .... not exactly a textbook either but still "the" beowulf resource. > I'm considering: Parallel Programming With MPI by Peter Pacheco to > cover the MPI part, but probably need an additional book on parallel > algorithms and programming. http://www-unix.mcs.anl.gov/dbpp/ (can also be purchased, I believe). Excellent resource. > > I suspect it's of interest to the list, so please follow up to the list. > > -- > Bill Broadley > Programmer/Admin > Mathematics/Institute of Theoretical Dynamics > University of California, Davis > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu From jared_hodge at iat.utexas.edu Fri Mar 30 07:14:48 2001 From: jared_hodge at iat.utexas.edu (Jared Hodge) Date: Fri, 30 Mar 2001 09:14:48 -0600 Subject: How to use a beowulf class materials? References: <20010329161024.A1623@sphere.math.ucdavis.edu> Message-ID: <3AC4A2E8.6BBC77FD@iat.utexas.edu> If I were going to teach a class on Beowulf computing, I would start with the systems perspective and go over various different setups (NOWs, COWs, Beowulfs, and Big Iron computers). Discuss networking possibilities (Gigabit connections, channel bonding and the switch placement algorithms, latency and Bandwidth considerations). I would also focus on diskless clusters some since this (I think) is the way cluster computing is going. In the second section I would get more into parallel programming algorithms and techniques, etc. You've obviously got the background for this. Remember that although Beowulf clusters are getting more and more popular, probably very few of your students will actually work with a "purebred" beowulf cluster and even if they do, many of the tools will be somewhat different by the time they graduate. Teach them to understand why certain practices have become highly used and they will be ahead of the game in any parallel computing environment. 
Be sure to include links to sources of information, too. Bill Broadley wrote: > > I've been tasked to teach a class in using a beowulf. I've parallelized a > few serial fortran codes to use MPI, so I have the technical side covered, > at least for the 1st quarter. > > Anyone have suggestions for textbooks? Know of existing class > outlines available on the web? Any other pointers? > > I'm considering: Parallel Programming With MPI by Peter Pacheco to > cover the MPI part, but probably need an additional book on parallel > algorithms and programming. > > I suspect it's of interest to the list, so please follow up to the list. > > -- > Bill Broadley > Programmer/Admin > Mathematics/Institute of Theoretical Dynamics > University of California, Davis > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Jared Hodge Institute for Advanced Technology The University of Texas at Austin 3925 W. Braker Lane, Suite 400 Austin, Texas 78759 Phone: 512-232-4460 FAX: 512-471-9096 Email: Jared_Hodge at iat.utexas.edu From JParker at coinstar.com Fri Mar 30 11:32:30 2001 From: JParker at coinstar.com (JParker at coinstar.com) Date: Fri, 30 Mar 2001 11:32:30 -0800 Subject: VA Linux's system imager Message-ID: G'Day ! Has anyone used this ? Any comments ? Seems like a good way to keep a cluster updated. http://systemimager.sourceforge.net/ cheers, Jim Parker Sailboat racing is not a matter of life and death .... It is far more important than that !!! -------------- next part -------------- An HTML attachment was scrubbed... URL: From timothy.g.mattson at intel.com Fri Mar 30 11:44:57 2001 From: timothy.g.mattson at intel.com (Mattson, Timothy G) Date: Fri, 30 Mar 2001 11:44:57 -0800 Subject: MPI/Beowulf vs. SM Programming/OpenMP Message-ID: Rusty, I guess my post wasn't clear enough. I am working with a software vendor. They need to launch several different programs that interact through MPI. You can do this in PVM, you can do this with LAM/MPI, but you can't do this with MPIch. Or I should say, I don't think it can be done with MPIch. The folks at this company looked into the matter and independently reached the same conclusion. As I see it, with MPIch, I must have everything in a single program. Yes, the programs can execute radically different pathways, so in most cases this doesn't restrict the algorithms I can work with. There are other engineering issues, however, that sometimes force me to run completely different programs that cooperate through MPI. This is the situation the unnamed software vendor is faced with. So that is what I meant by my general Multiple Program Multiple Data (or MPMD) comment. Does this make sense? Did I get something wrong, and is it indeed the case that MPIch can handle this situation? I'd love to be wrong, as it would make my life simpler if I could do everything with only one version of MPI. --Tim -----Original Message----- From: Rusty Lusk [mailto:lusk at mcs.anl.gov] Sent: Thursday, March 29, 2001 11:15 AM To: Mattson, Timothy G Cc: 'Chris Richard Adams'; Beowulf (E-mail) Subject: Re: MPI/Beowulf vs. SM Programming/OpenMP | A general MPMD algorithm with lots of asynchronous events would be hard to | do with OpenMP. (actually, it can be hard with MPIch as well, but then you | can go with MPI-LAM or PVM). Hi Tim, I am curious as to what you are referring to with regard to MPICH.
Rusty From ctierney at hpti.com Fri Mar 30 12:47:55 2001 From: ctierney at hpti.com (Craig Tierney) Date: Fri, 30 Mar 2001 13:47:55 -0700 Subject: MPI/Beowulf vs. SM Programming/OpenMP In-Reply-To: ; from timothy.g.mattson@intel.com on Fri, Mar 30, 2001 at 11:44:57AM -0800 References: Message-ID: <20010330134755.I17830@hpti.com> I guess I am confused now. Are you saying you want to be able to startup a.out on 3 processors and b.out on 2 processors and have them talk to each other over MPI (mpich)? Mpich can do this (I think it is with the -p4pg option). I use gm over mpich and I do this as well. I have to modify the mpirun script (Myricom's verison doesn't support it) but I have users that do this now on the system. Craig On Fri, Mar 30, 2001 at 11:44:57AM -0800, Mattson, Timothy G wrote: > Rusty, > > I guess my post wasn't clear enough. > > I am working with a software vendor. They need to launch several different > programs that interact through MPI. You can do this in PVM, you can do this > with LAM/MPI, but you can't do this with MPIch. Or I should say, I don't > think it can be done with MPIch. The folks at this company looked into the > matter and independently reached the same conclustion. > > As I see it, with MPIch, I must have everything in a single program. Yes, > the programs can execite readically different pathways so in most cases, > this doesn't restrict the algorithms I can work with. There are other > engineering issues, however, that sometimes force me to run completely > different programs that cooperate through MPI. This is the situation the > un-named software vendor is faced with. > > So that is is what I meant by my general Multiple Program Multiple Data (or > MPMD) comment. > > Does this make sense? Did I get something wrong and it is indeed the case > that MPIch can handle this situation? I'd love to be wrong as it would make > my life simpler if I could do everything with only one version of MPI. > > --Tim > > -----Original Message----- > From: Rusty Lusk [mailto:lusk at mcs.anl.gov] > Sent: Thursday, March 29, 2001 11:15 AM > To: Mattson, Timothy G > Cc: 'Chris Richard Adams'; Beowulf (E-mail) > Subject: Re: MPI/Beowulf vs. SM Programming/OpenMP > > > | A general MPMD algorithm with lots of assynchronous events would be hard > to > | do with OpenMP. (actually, it can be hard with MPIch as well, but then > you > | can go with MPI-LAM or PVM). > > Hi Tim, > > I am curious as to what you are referring to with regard to MPICH. > > Rusty > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Craig Tierney (ctierney at hpti.com) phone: 303-497-3112 From chrisa at ASPATECH.COM.BR Fri Mar 30 13:06:47 2001 From: chrisa at ASPATECH.COM.BR (Chris Richard Adams) Date: Fri, 30 Mar 2001 18:06:47 -0300 Subject: Installing module 3c90x.o returns: init_module: Device busy Message-ID: OK...I guess I should upgrade anyway. I downloaded the RPMS from the site for the 27bz-7 version. I assume I just install all those packages and I am officially upgraded? At that point I could use the 3c59x drivers for my 3c905C-TX card? advice appreciated! 
Chris > -----Original Message----- > From: Jag [mailto:agrajag at linuxpower.org] > Sent: Friday, March 30, 2001 10:49 AM > To: Chris Richard Adams > Cc: Beowulf (E-mail) > Subject: Re: Installing module 3c90x.o returns: init_module: > Device busy > > > On Thu, 29 Mar 2001, Chris Richard Adams wrote: > > > Hi all; > > > > I need to create the module for the 3c905C-TX card. I got > the source > > code from the Scyld site and followed the directions explicitly. I > > compiled with only a few warnings. When I try to install the module > > with 'insmod 3c90x.o' I get the error: 3c90x.o init_module: > Device or > > Resource busy. > > > > I am running the Beowulf 2 prerelease Scyld version 2.2-16. > I did not > > see this driver installed to begin with, hence this is why > I am building > > it. > > I had similar problem when I tried the prerelease. However, when I > tried the actual release (27bz-7), this problem went away. With the > 27bz-7 release it used the 3c59x driver for my 3c905C-TX ethernet card > and it works just fine. > > > Jag > From rbbrigh at valeria.mp.sandia.gov Fri Mar 30 13:40:28 2001 From: rbbrigh at valeria.mp.sandia.gov (Ron Brightwell) Date: Fri, 30 Mar 2001 14:40:28 -0700 (MST) Subject: MPI/Beowulf vs. SM Programming/OpenMP In-Reply-To: from "Mattson, Timothy G" at Mar 30, 2001 11:44:57 AM Message-ID: <200103302141.OAA19907@dogbert.mp.sandia.gov> > > I guess my post wasn't clear enough. > > I am working with a software vendor. They need to launch several different > programs that interact through MPI. You can do this in PVM, you can do this > with LAM/MPI, but you can't do this with MPIch. Or I should say, I don't > think it can be done with MPIch. The folks at this company looked into the > matter and independently reached the same conclustion. > > As I see it, with MPIch, I must have everything in a single program. Yes, > the programs can execite readically different pathways so in most cases, > this doesn't restrict the algorithms I can work with. There are other > engineering issues, however, that sometimes force me to run completely > different programs that cooperate through MPI. This is the situation the > un-named software vendor is faced with. > > So that is is what I meant by my general Multiple Program Multiple Data (or > MPMD) comment. > > Does this make sense? Did I get something wrong and it is indeed the case > that MPIch can handle this situation? I'd love to be wrong as it would make > my life simpler if I could do everything with only one version of MPI. > I think there's a mixture of misunderstanding in what MPICH can do and in what MPMD means. MPICH supports the launching of several different binaries as a single MPI job, i.e. all the processes are in the same MPI_COMM_WORLD. For lack of a better term, I've called this mode static MPMD, since the number of binaries and the number of processes are fixed throughout the life of the parallel job. MPICH doesn't yet support the MPI-2 style client/server operations that allow independent parallel jobs to join, which is what LAM and PVM support. I've called this dynamic MPMD since both the number of processes and number of binaries can change throughout the life of the parallel job. If there are better terms to describe the combination of these two things (multiple binaries and joining independent parallel jobs), let me know. I've had similar confusing discussions with users about which functionality they wanted. 
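To make the "static MPMD" mode that Craig Tierney and Ron Brightwell describe above concrete: with MPICH's ch_p4 device, mpirun can be given a procgroup file that names a different binary per host, and all of the processes start in the same MPI_COMM_WORLD. If I remember the format correctly it is <host> <#additional procs> <path to binary>; the host names and program paths below are made up for the example.

    # procgroup file "apps.pg"
    local 0
    node1 1 /home/user/solver
    node2 1 /home/user/monitor

    # starts /home/user/master locally, plus solver on node1 and monitor on node2
    mpirun -p4pg apps.pg /home/user/master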
-Ron From gscluster at hotmail.com Sat Mar 31 15:46:52 2001 From: gscluster at hotmail.com (Georgia Southern Beowulf Cluster Project) Date: Sat, 31 Mar 2001 18:46:52 -0500 Subject: Beowulf Trivia Question. Message-ID: Hello all, I'm part of a team completing the first Beowulf-style cluster available in our area and I've a question that everyone asks? "Where does the name Beowulf originate?" Essentially why did the NASA and Scyld guys call their clustering technique/philosophy "Beowulf"? Is it just a fancy, cool-sounding name, or is there a deeper meaning, possibly metaphor? Hope someone knows because I get asked this about every 5 minutes on some occassions and I don't have much of an answer. Thanks, Wes Wells Georgia Southern Beowulf cluster _________________________________________________________________ Get your FREE download of MSN Explorer at http://explorer.msn.com From Nordwall at pnl.gov Fri Mar 30 12:51:37 2001 From: Nordwall at pnl.gov (Nordwall, Douglas J) Date: Fri, 30 Mar 2001 12:51:37 -0800 Subject: VA Linux's system imager Message-ID: <3A2AF12EC5C8AE45B4EB90379D1235B98B6C4D@pnlmse04.pnl.gov> I actually use system imager for our client linux machines here at PNL. Not bad at all. For our clusters, we've been experimenting with npaci rocks. I've been impressed so far -----Original Message----- From: JParker at coinstar.com [mailto:JParker at coinstar.com] Sent: Friday, March 30, 2001 11:33 AM To: beowulf at beowulf.org Subject: VA Linux's system imager G'Day ! Has anyone used this ? Any comments ? Seems like a good way to keep a cluster updated. http://systemimager.sourceforge.net/ cheers, Jim Parker Sailboat racing is not a matter of life and death .... It is far more important than that !!! -------------- next part -------------- An HTML attachment was scrubbed... URL: From kodym at math.jyu.fi Fri Mar 30 02:08:12 2001 From: kodym at math.jyu.fi (Petr Ladislav Kodym) Date: Fri, 30 Mar 2001 13:08:12 +0300 (EEST) Subject: How to use a beowulf class materials? In-Reply-To: <20010329161024.A1623@sphere.math.ucdavis.edu> Message-ID: Hi, >Anyone have suggestions for textbooks? Know of existing class >outlines available on the web? Any other pointers? > >I'm considering: Parallel Programming With MPI by Peter Pacheco to >cover the MPI part, Have a look at MPI course material from EPCC: http://www.epcc.ed.ac.uk/epcc-tec/documents/psform-sn-mpi.html > but probably need an additional book on parallel algorithms and programming. What about this one: Introduction to Parallel Computing : Design and Analysis of Parallel Algorithms by Vipin Kumar, Ananth Grama, Anshul Gupta, George Karypis http://www.amazon.com/exec/obidos/ASIN/0805331700/qid%3D985946846/107-7176547-6865350 Petr From rockwlrs at cs.byu.edu Thu Mar 29 10:22:21 2001 From: rockwlrs at cs.byu.edu (Nathan C Summers) Date: Thu, 29 Mar 2001 11:22:21 -0700 (MST) Subject: SMP support with the scyld package In-Reply-To: <759FC8B57540D311B14E00902727A0C002EC47F5@a1mbx01.pharma.com> Message-ID: On Thu, 29 Mar 2001, Carpenter, Dean wrote: > Hummm. List is being very quiet these days. Has anyone had luck/experience > with the stuff I posted earlier ? > > --- > > Given a raw Scyld install, can anyone show a cookbook sequence to converting > the master node as well as the slave nodes to a custom kernel ? Not just > the SMP kernel from the rpm - we'll want to play with MOSIX, as well as the > 2.2.19 kernel. > > 1. Install x and y packages to solve problem #2 above so we can actually > build kernels. > 2. 
Pull down virgin kernel sources and untar into /usr/src/linux > 3. Integrate the bproc patches ..... (how - what else ?) Get the bproc patches from the source RPM. > 4. Integrate MOSIX (and any other cool) patches The MOSIX and bproc patches conflict like crazy. After spending days trying to get them to play nice together, I realized that I had better things to do with my time. It seems that MOSIX and bproc want to do almost the exact same thing in slightly different ways in the same parts of the code. A unified interface should be possible (I'd use the bproc approach when resolving the conflicts, since it seems to be cleaner.) Unfortunately, suggesting this union would revive the Eternal Flamewar of Process Migration, since both camps think that the slight differences in thier code form the One True Way to Migrate Processes. Sigh, I hate politics... > 5. Build the kernel and modules (any special considerations ?) Pretty much any kernel should work on the master image (with the bproc stuff, of course.) I'd recommend the same image on the slaves that you use on the master, so your selection should be based on the hardware requirements. It's pretty easy if your cluster is completely homogeneous, more complicated if not. > 6. Build the custom phase 2 boot image > 7. Rock and roll :) Rockwalrus From bal at morelinux.com Wed Mar 28 21:12:05 2001 From: bal at morelinux.com (bal) Date: Thu, 29 Mar 2001 10:42:05 +0530 Subject: mpptest report errors on linux Message-ID: <3AC2C425.53F5D87F@morelinux.com> hi everybody I am trying to build a four node linux cluster. All machines are PIII 500 Mhz with 10/100 rtl 8139 ethrnet card connected using a 100Mbps switch. mpich-1.2.1 is installed from source. mpptest reports following problem ./runmpptest -long -blocking -bisect -fname long-blocking-bisect -gnuplot -np Bisection tests-blocking Exceeded 900.000000 seconds, aborting [0] MPI Abort by user Aborting program ! [0] Aborting program! p0_1458: p4_error: : 1 bm_list_1459: p4_error: interrupt SIGINT: 2 rm_l_2_21356: p4_error: interrupt SIGINT: 2 rm_l_3_12637: p4_error: interrupt SIGINT: 2 rm_l_1_14220: p4_error: interrupt SIGINT: 2 p3_12636: p4_error: interrupt SIGINT: 2 p2_21355: p4_error: interrupt SIGINT: 2 p1_14219: p4_error: interrupt SIGINT: 2 /usr/local/beowulf/mpich-1.2.1/bin/mpirun: line 1: 1458 Broken pipe I have tested it on kernel 2.2.18 with mosix and without mosix, also on kernel 2.2.17 Do I have to apply some patches to kernel? It seems the problem is releted with increased message length. Problem even appears when using option short in place of long for some tests. Thanks bal at morelinux.com From sita_krish at hotmail.com Wed Mar 28 07:54:09 2001 From: sita_krish at hotmail.com (Krishna Prasad) Date: Wed, 28 Mar 2001 21:24:09 +0530 Subject: help please Message-ID: anyone out there, Can you give me some examples of parallel processing applications that can be done ? (like ray tracing that has been done using pvm). I want the applications to be done by me. So please give some real time programs that can be speeded up using the distributed processing technique. If possible please give me some web sites that have the information or source code about the parallel processing applications. Reply to my email address: sita_krish at hotmail.com THANKS, yours sincerely, krishna prasad. _________________________________________________________________________ Get Your Private, Free E-mail from MSN Hotmail at http://www.hotmail.com. 
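Coming back to Nathan Summers' answer above on building a custom bproc kernel: in shell terms the recipe is roughly the sketch below. The source RPM and patch file names are guesses, and the final step is whatever your Scyld release uses to rebuild the phase 2 BeoBoot image (beosetup does it from the GUI).

    rpm -ivh kernel-2.2.17-33.beo.src.rpm              # assumed name of the Scyld kernel source RPM
    cd /usr/src/linux                                  # virgin 2.2.x tree unpacked here
    patch -p1 < /usr/src/redhat/SOURCES/bproc.patch    # assumed name of the bproc patch from the source RPM
    make menuconfig                                    # enable bproc and your hardware options
    make dep && make bzImage && make modules && make modules_install
    # copy the new bzImage and System.map into /boot, rerun /sbin/lilo on the master,
    # then rebuild the phase 2 boot image (e.g. via beosetup) and reboot the slave nodes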
From Hans_Skagerlind at Dell.com Tue Mar 27 04:39:05 2001 From: Hans_Skagerlind at Dell.com (Hans_Skagerlind at Dell.com) Date: Tue, 27 Mar 2001 06:39:05 -0600 Subject: Linux on Xeon Message-ID: Does anyone know where to find info about comparing/benchmarking Linux on Xeon vs Pentium III processors? Thanks for any advice! Hans Skagerlind Technical Sales Representative Advanced Systems Group Tel: +46 08 590 05 462 Fax: +46 08 590 05 599 E-mail: hans_skagerlind at dell.com From bgbruce at it-curacao.com Sat Mar 24 07:27:10 2001 From: bgbruce at it-curacao.com (B.G. Bruce) Date: Sat, 24 Mar 2001 11:27:10 -0400 Subject: 2Gb Fibre channel ICS Message-ID: <01032411412804.01426@portal1.it-curacao.com> Has anyone looked at the performance and viability of small-scale clusters (less than 16 nodes) utilizing QLogic's 230x adapters and their SanBOX2 switches? We are looking to build a cluster that primarily runs serialized apps, or VIA-aware databases, which would benefit more from FC than, say, NFS/CODA servers running over Myrinet that had a FC backend; however, we may look at parallelizing some of the more compute-intensive app servers. Thoughts anyone? Regards, Brian. From javier.iglesias at freesurf.ch Sat Mar 24 20:01:13 2001 From: javier.iglesias at freesurf.ch (Javier Iglesias) Date: Sun, 25 Mar 2001 05:01:13 +0100 Subject: NFS Message-ID: <985489273.webexpressdV3.1.f@smtp.freesurf.ch> Hi, I'm planning a cluster for the computer science institute I'm working for, and am collecting as much information on Beowulf as I can; this is my first post to this list. It looks like using NFS is a great administrator time saver, but comes with a high network footprint. Has anyone tried using 2 NICs per node, configured so that: 1/ the first is dedicated to interprocessor communication, while 2/ the second is used for NFS, DB calls, ... Is this already known to be a good or a bad idea? --javier -- Enjoy more time at less cost with sunrise freetime http://go.sunrise.ch/de/sel/default.asp From quarshie at mail.eecis.udel.edu Fri Mar 23 08:25:05 2001 From: quarshie at mail.eecis.udel.edu (Rene Quarshie) Date: Fri, 23 Mar 2001 11:25:05 -0500 (EST) Subject: Beowulf digest, Vol 1 #332 - 20 msgs In-Reply-To: <200103231608.LAA02131@blueraja.scyld.com> Message-ID: Hi; installation from Scyld CD: I just installed a Beowulf cluster and I'm learning how to run MPI on the cluster. My problem is, any time I compile using "cc -lmpi cpi.c -o cpi" it works fine, but when I try "f77 -lmpif -o pi3.f pi3" I get this error: pi3.f: In program `main': pi3.f:22: include 'mpif.h' ^ Unable to open INCLUDE file `mpif.h' at (^) I know that mpif.h is in the include dir... I just don't know what's going on. Last question: how do you run an executable on a Beowulf if the Scyld CD was used to build the cluster? Any help greatly appreciated. Rene
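On Rene's f77 question just above: the "Unable to open INCLUDE file `mpif.h'" error usually means the compiler was never told which directory mpif.h lives in, and note that as typed, "-o pi3.f" would make the Fortran source file the output file. A sketch of the more usual invocations follows; the install paths are assumptions, so adjust them to wherever the Scyld packages actually put MPI.

    # if the MPI package provides a Fortran wrapper script, it adds the paths by itself
    mpif77 -o pi3 pi3.f

    # otherwise, point f77 at the directory that really contains mpif.h
    f77 -I/usr/include -o pi3 pi3.f -lmpif

    # MPI programs are then normally started through mpirun rather than run directly
    mpirun -np 4 ./pi3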