Advice for 2nd cluster installation

Fri Jan 10 08:41:08 PST 2003

On Thu, 2003-01-09 at 21:33, Kwan Wing Keung wrote:

>     The question is: will there be a timing difference when a processor
>     in the 3rd blade, inside the 3rd chassis, communicates
>     (through MPI) with a processor in another blade within the SAME chassis,
>     as compared to a processor in a blade housed within ANOTHER
>     chassis?

Yes.  You are increasing the number of switch hops by at least 1.  There
will be a timing difference, unless you get a blade system which uses a
secondary network on each blade to connect directly to the same switch
(such as Myrinet or Infiniband).

>     A salesperson from one vendor told me that there should be some
>     difference, as communication within a blade centre goes through the
>     back-plane.  Once it leaves the blade centre, the communication
>     has to go through an inter-chassis switch, so there should be some
>     timing difference.  He further told me that this is the beauty of

That is substantially correct.

>     the "infiniteband", with which I don't have any experience.

Infiniband.  Not many people have experience with it yet.

>     However, another salesperson told me that they should be the same,
>     because all processors within the entire "rack" should have distinct
>     IP addresses, so communication between any 2 processors should
>     be fair and equal.

Sounds like a salesperson who didn't have their techie on the line with
them.  They are wrong.  Having distinct IP addresses says nothing about
how many switches a packet has to pass through.

Within one chassis, using the networks supplied on the blades, you have
an internal 10/100 or (preferably) GigE switch.  Between chassis, you
should have a GigE switch.

Look at it as a hierarchy.  Draw a blade.  Put a circle around the
blade.  Draw the 7U rack-mount chassis around the blade, and put a circle
around that.  Every circle your communication path crosses is a
boundary, and each boundary adds latency and limits bandwidth.

The circles represent network/communications bus fabric boundaries. 
Boundary crossings cost time and bandwidth.  You are drawing a logical
picture of the network.
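The circle picture can be turned into a small counting rule.  The sketch
below is a hypothetical model, not anything taken from a real cluster
setup: each processor is located by an assumed (chassis, blade) pair, the
blades' own network plus the chassis' internal switch are assumed to be
in use, and every circle left or entered counts as one hop.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    """Hypothetical address of a processor: which chassis, which blade."""
    chassis: int
    blade: int


def boundary_crossings(src: Location, dst: Location) -> int:
    """Count the circles (blade, chassis) a message leaves and enters.

    Assumes the blade's own network, the chassis' internal switch and a
    single inter-chassis switch; every crossing is counted as one hop.
    """
    if src == dst:
        return 0  # both processors on the same blade: no boundary crossed
    if src.chassis == dst.chassis:
        return 2  # leave the source blade, enter the destination blade
    return 4  # leave blade and chassis, then enter chassis and blade


if __name__ == "__main__":
    a = Location(chassis=3, blade=1)
    print(boundary_crossings(a, Location(chassis=3, blade=5)))  # 2
    print(boundary_crossings(a, Location(chassis=1, blade=2)))  # 4
```

The same-chassis case costs 2 crossings and the cross-chassis case 4,
which is exactly the difference the first salesperson was describing.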

If you use a network fabric other than the internal 7U switch network,
draw all of your blades sitting inside that one switch (a large circle).
With Myrinet or its ilk (Quadrics, DolphinICS, et al) this picture
emerges (a star network up to a point, then more structure beyond that
as node counts exceed switch capacity).  There you have nearly the same
latency/bandwidth between every pair of ports, so it doesn't matter which
blade center your unit is in.  You effectively cross a boundary twice
(out of one blade and into another).
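The "star up to a point, then structure beyond that" behaviour can be
sketched with one common way of building larger switch networks: a folded
Clos (fat tree) of identical k-port switches.  That topology is an
assumption chosen for illustration, not a statement about how any
particular vendor builds its large networks.  Under it, L levels of
switches hold up to 2(k/2)^L hosts, and the longest host-to-host path
passes through 2L - 1 switches.

```python
def levels_needed(nodes: int, ports: int) -> int:
    """Smallest number of switch levels in a folded Clos (fat tree) of
    identical `ports`-port switches that can hold `nodes` hosts.

    Assumes an even port count, with each non-top switch splitting its
    ports half down / half up, so L levels hold 2 * (ports // 2) ** L hosts.
    """
    if nodes < 1 or ports < 4 or ports % 2:
        raise ValueError("need nodes >= 1 and an even port count >= 4")
    levels = 1
    while 2 * (ports // 2) ** levels < nodes:
        levels += 1
    return levels


def worst_case_switches(nodes: int, ports: int) -> int:
    """Switches on the longest host-to-host path: up L levels, down again."""
    return 2 * levels_needed(nodes, ports) - 1


if __name__ == "__main__":
    ports = 16  # placeholder port count, not a specific product's figure
    for n in (16, 64, 200):
        # columns: nodes, switch levels, worst-case switches traversed
        print(n, levels_needed(n, ports), worst_case_switches(n, ports))
```

With the placeholder 16-port switches this prints `16 1 1`, `64 2 3` and
`200 3 5`: a single switch gives every pair of ports the same path, and
once that capacity is exceeded the path depth grows only slowly with the
node count.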

With the internal switches in use rather than the Myrinet-type devices,
talking between chassis means crossing 2 boundaries on the way out
(blade, then chassis) and 2 on the way in (chassis, then blade).  That is
4 effective hops versus 2.
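To turn the 4-versus-2 hop count into a rough time estimate, the sketch
below multiplies crossings by a cost per crossing.  The fabric labels and
the uniform per-crossing cost are assumptions for illustration; the cost
argument is a placeholder to be replaced by numbers measured on your own
hardware (for example with an MPI ping-pong test), since real switches
and NICs do not all cost the same.

```python
from enum import Enum


class Fabric(Enum):
    """Hypothetical labels for the two setups discussed above."""
    CHASSIS_SWITCHES = "internal chassis switch + inter-chassis GigE switch"
    SINGLE_FABRIC = "every blade cabled to one Myrinet-type switch"


def crossings(fabric: Fabric, same_chassis: bool) -> int:
    """Effective hops between processors on two different blades."""
    if fabric is Fabric.SINGLE_FABRIC:
        return 2  # out of one blade, into the other, wherever they sit
    return 2 if same_chassis else 4  # chassis boundary adds 1 out + 1 in


def estimated_latency_us(fabric: Fabric, same_chassis: bool,
                         per_crossing_us: float) -> float:
    """Crude model: latency grows linearly with boundary crossings."""
    return crossings(fabric, same_chassis) * per_crossing_us


if __name__ == "__main__":
    cost = 1.0  # placeholder cost per crossing, in microseconds
    for fabric in Fabric:
        for same in (True, False):
            where = "same chassis" if same else "other chassis"
            print(fabric.name, where, estimated_latency_us(fabric, same, cost))
```

Whatever per-crossing cost you plug in, the internal-switch setup doubles
the estimate for cross-chassis traffic, while the single-fabric setup
gives the same estimate for every pair of blades.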

-- 
Joseph Landman, Ph.D
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://scalableinformatics.com
phone: +1 734 612 4615