[Beowulf] How to configure a cluster network

andrew holway andrew at moonet.co.uk
Thu Jul 24 11:17:38 PDT 2008


:) Jan and I work together at ClusterVision.

On Thu, Jul 24, 2008 at 7:14 PM, Jan Heichler <jan.heichler at gmx.net> wrote:
> Hello Daniel,
>
> On Thursday, 24 July 2008, you wrote:
>
> [network configurations]
>
> I have to say I am not sure that all the configs you sketched really work. I
> have never seen anybody create loops in an IB fabric.
>
> DP> Since I am not a network expert, I would be glad if somebody could
> DP> explain why the first solution is the best one.
>
> Let's say it as follows:
>
> 1) Most applications are latency-driven, not bandwidth-driven. That means
> that half bisection bandwidth does not cut your application performance
> down to 50%. For most applications the impact should be less than 5%; for
> some it really is 0%.
>
> 2) Static routing in IB networks limits your bandwidth for many of the
> possible communication patterns anyway. For completely random communication
> the achievable bandwidth was below 50%. So you buy an IB fabric with full
> bisection bandwidth but can't use it anyway, and reducing the bisection
> bandwidth no longer has much impact (as far as I understood most
> whitepapers).
>
> 3) Today you usually have 4 or 8 cores in one node. 12 nodes times 4 or 8
> cores makes 48 or 96 cores that are connected with one hop on the same
> switch. Many applications don't scale to that number of processes anyway.
> Before you try to optimize the network to the maximum, it may be better to
> think about your application, your usual job sizes and the scheduling of
> the jobs. Try to avoid "cross-switch communication" if possible. Say you run
> small jobs of 8 nodes each, with 12 nodes on each switch and half bisection
> bandwidth between the switches. Job 1 then gets 8 nodes on the first
> switch. Job 2 gets 4 nodes on switch one and 4 on switch two.
> Your bisection bandwidth is big enough to handle this.
>
> I vote for the fat tree in picture one because I know it works, and given
> points 1) to 3) above it will give you good performance, especially if you
> run more than just one application (optimizing mostly means optimizing
> for a single use case; if you have more than one it is hard to optimize).
>
> Regards,
>
> Jan
>
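The arithmetic in point 3 of the quoted message can be checked with a short sketch. This is only an illustration under stated assumptions: the fill-switches-in-order placement policy, the function names, and the reading of "half bisection" as 6 uplinks per 12-node switch (each uplink as fast as a node link) are hypothetical and not taken from the thread.

```python
def cores_per_switch(nodes_per_switch, cores_per_node):
    """Cores that sit one hop apart on the same switch."""
    return nodes_per_switch * cores_per_node


def place_jobs(job_sizes, num_switches, nodes_per_switch):
    """Fill switches in order (hypothetical policy, not a real scheduler).

    Returns one dict per job mapping switch index -> nodes used there.
    """
    free = [nodes_per_switch] * num_switches
    placements = []
    for size in job_sizes:
        if size > sum(free):
            raise ValueError(f"job of {size} nodes does not fit")
        placement, remaining = {}, size
        for switch, available in enumerate(free):
            take = min(available, remaining)
            if take > 0:
                placement[switch] = take
                free[switch] -= take
                remaining -= take
        placements.append(placement)
    return placements


def worst_case_cross_links(placement):
    """Nodes outside the job's largest per-switch group.

    Assumption: at worst, each of those nodes drives one full link
    of traffic across the switch boundary.
    """
    return sum(placement.values()) - max(placement.values())


if __name__ == "__main__":
    nodes_per_switch = 12
    uplinks = nodes_per_switch // 2  # assumed meaning of "half bisection"
    for cores in (4, 8):
        total = cores_per_switch(nodes_per_switch, cores)
        print(f"{cores} cores/node -> {total} cores per switch")
    jobs = place_jobs([8, 8], 2, nodes_per_switch)
    for number, placement in enumerate(jobs, start=1):
        needed = worst_case_cross_links(placement)
        print(f"job {number}: {placement}, needs {needed} of {uplinks} uplinks")
```

With two 8-node jobs, the first job stays on one switch and the second is split 4 and 4, so under these assumptions it needs at most 4 of the 6 uplinks.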


