[Beowulf] How to configure a cluster network
andrew at moonet.co.uk
Thu Jul 24 11:17:38 PDT 2008
:) Jan and I work together at ClusterVision.
On Thu, Jul 24, 2008 at 7:14 PM, Jan Heichler <jan.heichler at gmx.net> wrote:
> Hello Daniel,
> On Thursday, 24 July 2008, you wrote:
> [network configurations]
> I have to say I am not sure that all the configs you sketched really work. I
> have never seen anybody create loops in an IB fabric.
> DP> Since I am not a network expert I would be glad if somebody explained
> DP> why the first solution is the best one.
> Let me put it this way:
> 1) Most applications are latency driven, not bandwidth driven. That means
> that half bisection bandwidth does not cut your application performance
> down to 50%. For most applications the impact should be less than 5%; for
> some it is really 0%.
> 2) Static routing in IB networks limits your bandwidth for many of the
> possible communication patterns anyway. For completely random communication
> the usable bandwidth was something below 50%. So you buy an IB fabric with
> full bisection bandwidth but can't use all of it anyway, and reducing the
> bisection bandwidth does not hurt that much anymore (as far as I understood
> most whitepapers).
> 3) Today you usually have 4 or 8 cores in one node. 12 nodes times 4 or 8 cores
> makes 48 or 96 cores that are connected with one hop on the same switch.
> Many applications don't scale to that number of processes anyway. Before you
> try to optimize the network to the maximum, it may be better
> to think about your application, your usual job sizes and the scheduling of
> the jobs. Try to avoid "cross switch communication" if possible. If you run
> small jobs of, let's say, 8 nodes and you have 12 nodes on each switch
> and half bisection bandwidth between them, then job 1 gets 8 nodes on the
> first switch. Job 2 gets 4 nodes on switch one and 4 on switch two.
> Your bisection bandwidth is big enough to handle this.
> I vote for the fat tree in picture one because I know it works, and with 1)
> to 3) mentioned above it will give you good performance, especially if you
> run more than just one application (optimizing mostly means optimizing
> for a single use case; if you have more than one it is hard to optimize).
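
To make the arithmetic in point 3) concrete, here is a rough Python sketch. It is only an illustration, not how any real scheduler places jobs: the function names are made up, the placement is a plain "fill switch one first" rule, and I am assuming that half bisection bandwidth with 12 nodes per switch means 6 uplinks per switch at the same link speed as the node links.

```python
# Hypothetical sketch of the job placement described in point 3).
NODES_PER_SWITCH = 12    # from the post
UPLINKS_PER_SWITCH = 6   # assumption: half bisection = 6 uplinks per 12 nodes


def cores_per_switch(cores_per_node, nodes=NODES_PER_SWITCH):
    """Cores that can reach each other with a single hop on one switch."""
    return cores_per_node * nodes


def place_jobs(job_sizes, num_switches, nodes_per_switch=NODES_PER_SWITCH):
    """Fill switches in order; return one {switch: node_count} dict per job."""
    free = [nodes_per_switch] * num_switches
    placements = []
    for size in job_sizes:
        if size > sum(free):
            raise ValueError("not enough free nodes for this job")
        placement = {}
        remaining = size
        for sw in range(num_switches):
            if remaining == 0:
                break
            take = min(free[sw], remaining)
            if take:
                placement[sw] = take
                free[sw] -= take
                remaining -= take
        placements.append(placement)
    return placements


def uplink_demand(placements, num_switches):
    """Worst-case number of full-rate flows leaving each switch.

    For a job with n nodes on a switch and the rest elsewhere, at most
    min(n, rest) node links can be busy with cross-switch traffic at once.
    """
    demand = [0] * num_switches
    for placement in placements:
        total = sum(placement.values())
        for sw, n in placement.items():
            demand[sw] += min(n, total - n)
    return demand


if __name__ == "__main__":
    print(cores_per_switch(4), cores_per_switch(8))  # 48 96
    jobs = place_jobs([8, 8], num_switches=2)
    print(jobs)                                      # [{0: 8}, {0: 4, 1: 4}]
    print(uplink_demand(jobs, 2))                    # [4, 4]
```

Under these assumptions job 1 never leaves switch one, and job 2 needs at most 4 uplinks' worth of bandwidth on each switch, which fits into the 6 assumed uplinks.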