[Beowulf] Broadcast - not for HPC - or is it?
Matt Hurd matthurd at acm.org
Tue Oct 5 17:23:36 PDT 2010
> From your description as well as from a quick look at the website, it
> looks and smells like a hub - I mean a dumb hub, like those which
> existed in the '90s before switching hubs (now called switches) took
> over. If so, then HPC might not be a good target for you, as it has
> long ago adopted switches for good reasons.

Not as clever as a hub, as a hub goes from any one of N ports to any one or all of N, with collision sense/detect relying on back-off. This thing just goes from port A to port B1 ... port Bn using a simple optical coupler in the core. There is no contention, as the paths are direct. I can't see it being too useful for HPC myself, but as Kevin pointed out, perhaps there is a corner case or two.

It does allow one of the B ports to be bi-directional, so that a trader could set up a subscription to a multicast group to be used by all ports. However, allowing no client-to-server traffic is a security benefit, and I guess if an exchange used such a thing it should just broadcast, or some such, and disable the bi-directional port.

>> Primarily focused on low-latency
>> distribution of market data to multiple users as the port to port
>
> HPC usage is a mixture of point-to-point and collective
> communications; most (all?) MPI libraries use low-level point-to-point
> communications to achieve collective ones over Ethernet. Another
> important point is that the collective communications can be started
> by any of the nodes - it's not one particular node which generates
> data and then spreads it to the others; it's also relatively common
> that 2 or more nodes reach the point of collective communication at
> the same time, leading to a higher load on the interconnect, maybe
> congestion.
>
> What might be worth a try is a mixed network config where
> point-to-point communications go through one NIC connected to a switch
> and the collective communications that can use a broadcast go through
> another NIC connected to your packet replicator. However, IMHO it
> would only make sense if the packet replicator makes some guarantees
> about delivery: e.g. that it would accept a packet from node B even if
> a packet from node A is being broadcast at that time; this packet
> from node B would be broadcast immediately after the previous
> transmission has finished. This of course means that each
> NIC-to-packet-replicator link needs to be duplex and some buffering
> should be present - this was not the case with the dumb hubs mentioned
> earlier. I think that such a setup would be enough for MPI_Barrier and
> MPI_Bcast.
>
> One other HPC-related application that comes to my mind is distributed
> storage. One of the main problems is keeping redundant metadata to
> prevent the whole storage going down if one of the metadata servers
> goes down. With such a packet replicator, the active metadata server
> can broadcast it to the others; this would be just one operation -
> with a switched architecture, this would require N-1 operations (N
> being the total nr. of metadata servers) and would lose any pretence
> of atomicity and speed.

Not a bad thought, the storage one, but again I reckon that a sub-microsecond switch would be a winner there on the functionality front. Switches like the Fulcrum-based ones are pretty impressive and not too expensive. Along those lines, though it's not an HPC app, at least in my head, replication is useful for small fault-tolerant quorums with microsecond-oriented failover.

>> They suggested interest in bigger port counts and mentioned >1000 ports.
>
> Hmmm, if it's only like a dumb hub (no duplex, no buffering), then I
> have a hard time imagining how it would work at these port counts -
> the number of collisions would be huge...

Nope, not a dumb hub; even dumber ;-) No collisions, just a tree of optical couplers frantically splitting the photon streams.
The only real trick, albeit a pretty minor one, is ensuring the signal integrity stays within budget and is suitable for non-thinking plug-and-play.

Regards,
--Matt.
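The quoted point that MPI libraries build collectives out of point-to-point messages, and the remark that a switched network needs N-1 operations where the replicator needs one, can be illustrated with a binomial-tree broadcast, one common way to implement a broadcast over point-to-point links. The Python sketch below computes, for N ranks and a given root, which rank sends to which in each round. It is a generic illustration under that assumption, not the algorithm of any particular MPI library; it makes no MPI calls, and the function name is hypothetical.

```python
def binomial_bcast_schedule(n: int, root: int = 0) -> list[list[tuple[int, int]]]:
    """Return rounds of (sender, receiver) pairs for a binomial-tree broadcast.

    Ranks are relabelled relative to the root so the tree can start anywhere.
    In the round with distance `step`, every rank whose relative rank is below
    `step` already holds the data and sends it `step` positions further on.
    """
    rounds = []
    step = 1
    while step < n:
        pairs = []
        for rel in range(step):  # relative ranks that already hold the data
            dst_rel = rel + step
            if dst_rel < n:
                pairs.append(((rel + root) % n, (dst_rel + root) % n))
        rounds.append(pairs)
        step *= 2
    return rounds


if __name__ == "__main__":
    for n in (8, 1000):
        sched = binomial_bcast_schedule(n)
        sends = sum(len(r) for r in sched)
        print(f"N={n}: {len(sched)} rounds, {sends} point-to-point messages")
```

The total message count is still N-1, as the quoted text says; what the tree buys is that the messages are spread over many senders, so the broadcast completes in ceil(log2 N) rounds rather than N-1 back-to-back sends from the root. An optical replicator would instead deliver all copies in a single transmission.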
