[Beowulf] Multirail Clusters: need comments
Mark Hahn hahn at physics.mcmaster.ca
Mon Dec 5 06:41:51 PST 2005
> > so you're talking about 6 GB/s over quad-rail. it's hard to
> > imagine what you would do with that kind of bandwidth
>
> This is a very bold assertion...

well, 6 GB/s does seem like a lot, even for a 16-core machine, which is barely practical today. it's also a fat enough node to make you really want topology-aware MPI.

> > only you can answer this. my experience is that very few applications
> > need anything like that much bandwidth - I don't think anyone ever saturated
> > 1x quadrics on our alpha/elan3 clusters (~300 MB/s), and even on
> > opteron+myri-d, latency seems like more of an issue.
>
> Your alpha/elan3 clusters would have been quad CPU machines.

right, so crudely speaking, we can characterize them as 50 seconds (4G ram/300) and 22 flops/byte (833*2*4 mflops/300).

> The one point you have missed however with multi-rail is that network
> bandwidth is per *node* whereas the number of CPUs per node is for the

no, that's obvious.

> large part increasing. 1Gb/s seems like a lot (or at least it did) but
> put it in a 16 CPU machine and all of a sudden you have *less* per CPU
> bandwidth than you had seven years ago in your alpha/elan3. Couple that

5 years :(

well, today 16x is a bit exotic; I think we can agree that 2x2 is probably the norm. so take a single infinipath link in a 2x2 node (say 2.2 GHz dual-core opterons, with 16GB). that leads to 10 seconds, but 11 flops/byte: a different balance for sure, but how wrong?

> with CPUs being n times faster to boot and all of a sudden multi-rail
> is starting to look less pie-in-the-sky and more like a good idea.

are cpus or nodes getting faster faster than interconnects? dunno. 5 years ago, 4x alphas were a pretty sane choice, mainly for lack of attractive alternatives. it's just my perception, but I think there might actually be _less_ variance now in cores/node, with 2x2 being the most common and cost-effective configuration. 4-socket doesn't seem to be getting all that much traction, though no doubt 4-core chips will change the core/node average in a couple of years.

> appear to hold true, for example given a 16*16 machine average bandwidth
> between two CPUs won't quite double as you double the number of rails
> because 15/256 ranks are local to any given process so will get linear
> bandwidth independent of the number of rails. This however is simply a
> matter of understanding the topology of the machine. Another odd case

and indeed of your jobs. would a bandwidth-intensive program actually run on all 256 cpus, or would it tend to settle for 16x or 32x runs? (in the 16x case, which fits inside a single node, the number of rails might be completely moot!)

in summary, I suspect that a "balance"-based argument (probably flops/byte) makes sense, but it's not quite clear how much cpus are outpacing interconnect bandwidth. naturally, every application falls at a different place on this particular metric...

regards, mark hahn.
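The balance figures in this thread come down to a few lines of arithmetic. The sketch below is only an illustration, not anything from the original post. It computes peak flops per byte of link bandwidth and the time to push a node's memory through its link. It also adds a toy model of the 16x16 rail argument. Its assumptions are:

- 2 flops/cycle per core for both the 833 MHz alpha and the 2.2 GHz opteron, following the 833*2*4 formula;
- an InfiniPath rate of 1600 MB/s, which is inferred from the "10 seconds" and "11 flops/byte" figures and not stated in the thread;
- decimal units;
- for the rail model, hypothetical per-rail and intra-node rates, with perfect striping and no contention.

The function names are made up for this example.

```python
"""Crude node-balance arithmetic for the multirail discussion (a sketch)."""


def flops_per_byte(clock_mhz: float, flops_per_cycle: int, cores: int,
                   link_mb_s: float) -> float:
    """Peak node MFLOPS divided by link MB/s: flops available per byte moved."""
    return clock_mhz * flops_per_cycle * cores / link_mb_s


def drain_seconds(mem_mb: float, link_mb_s: float) -> float:
    """Seconds to push the node's entire memory through its link."""
    return mem_mb / link_mb_s


def mean_peer_bandwidth(nodes: int, cores_per_node: int, rails: int,
                        rail_mb_s: float, local_mb_s: float) -> float:
    """Average bandwidth from one rank to every other rank.

    Toy model (ASSUMPTION): peers on the same node see a fixed local_mb_s;
    peers on other nodes see rails * rail_mb_s (perfect striping, no
    contention between the ranks that share a node's rails).
    """
    total = nodes * cores_per_node
    local_peers = cores_per_node - 1          # 15 of the 255 other ranks in 16x16
    remote_peers = total - cores_per_node     # every rank on another node
    remote_mb_s = rails * rail_mb_s           # only off-node traffic gains from rails
    return (local_peers * local_mb_s + remote_peers * remote_mb_s) / (total - 1)


if __name__ == "__main__":
    # Quad-CPU alpha, one elan3 rail at ~300 MB/s.
    print(f"alpha/elan3: {flops_per_byte(833, 2, 4, 300):.1f} flops/byte")
    # 2x2 opteron with 16 GB; ASSUMED 1600 MB/s InfiniPath link.
    print(f"opteron/infinipath: {flops_per_byte(2200, 2, 4, 1600):.1f} flops/byte, "
          f"{drain_seconds(16_000, 1600):.1f} s to drain memory")
    # 16x16 machine; 1000 MB/s per rail and 5000 MB/s on-node are HYPOTHETICAL.
    one = mean_peer_bandwidth(16, 16, 1, 1000, 5000)
    two = mean_peer_bandwidth(16, 16, 2, 1000, 5000)
    print(f"doubling rails scales mean peer bandwidth by {two / one:.2f}x")
```

On-node peers gain nothing from extra rails, so the doubling ratio stays below 2 whenever the local rate is positive. How far below 2 depends entirely on the hypothetical rates chosen. The model also ignores contention among the 16 ranks that share a node's rails, so its absolute numbers are optimistic.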
