[Beowulf] fast interconnects, HT 3.0 ...
Richard Walsh rbw at ahpcrc.org
Tue May 23 10:14:08 PDT 2006
Eugen Leitl wrote:
> On Tue, May 23, 2006 at 11:04:52AM -0500, Richard Walsh wrote:
>
>>> I don't know (like it would stop me), but there are few-port HT switches,
>>> and if there are several ports on one chassis one could wire up
>>> some topology, which, hopefully, will match the problem.
>>>
>> HT switches ... ?? ... can you point me to a reference?
>>
>
> Google knows of several, and even some hotplug connector specs.
> See also http://www.commsdesign.com/design_corner/showArticle.jhtml?articleID=16503595
> for a review.
>
Thanks.

>>> I'm not sure how this is different from vanilla packet-switched
>>> MPI network. It's not about maintaining memory coherency.
>>>
>> Well, of course you can run MPI over it as you can on the Cray and
>> Altix, but you are artificially separating memory in software that
>> is in fact closer in hardware.
>>
>
> If I have some 10^3 nodes, and the context is not read-only
> I always have to wait to make sure nobody is trying to write to
> the same location. It's a worst case, but in a relativistic universe
> maintaining the illusion of coherence over many copies is an
> expensive one. Lots of signalling back and forth, until you
> know the state is settled for sure. This might work for 8, 16, maybe 32 systems
> in a close enough location -- but with 10^3 or 10^6 nodes it
> has to give.
>
Mmm ... I do not think we are connecting. Off-board non-coherence is
managed by the application and is made possible in part by the pGAS
syntax in UPC/CAF. We have some very novel, fine-grained UPC CFD codes
running on the Cray X1 that do indexed adaptive mesh regeneration to
model the flapping wings of a model hummingbird and follow the vortices
it sheds. Performance is good and we manage the off-board
incoherence/synchrony nicely. It would have been almost impossible to
write in MPI, and its performance would have been poor. The application
has reasonably good scaling properties as it is. It even runs on our
cluster ... yes, in UPC ... (albeit much more slowly).

It has some data locality (it is not GUPS-like), but the remeshing
approach is fine-grained ... the "messages" are direct remote memory
puts and gets driven by vector instructions. HT 3.0 is presumably more
elaborate than the Cray X1 ISA, but can it provide similar, more direct,
off-chassis, non-coherent memory addressing, no? This is in tune with
the UPC and CAF programming models.

>> That is where the pGAS programming models become more efficient. Remote
>> memory references expressed in the syntax and compiled to
>> instructions for direct puts and gets without management or
>> translation by a NIC. It
>>
>
> We're talking lunatic fringe interconnects where the wire or the fibre
> is your FIFO, and the switch makes a routing decision after a few bits
> of the headers have streamed past -- which is reasonably close to c.
> With 10 GBit data rates and above that's a quick decision to take.
> At 10 GBit/s your serial bit is just ~3 cm or 100 ps short -- in vacuum.
> Shorter in glass, and much shorter in copper. So a very short message
> can arrive within a few ns, which is order of magnitude RAM access.
>
I am talking about improving on the ~1500 ns required by the best of
today's interconnects for a single, remote 8-byte reference, and perhaps
further hiding that reduced latency in a pipelined vector load operation
inside the pipe. The question was: what can HT 3.0 provide
non-coherently, off-board, in this regard? Maybe the answer is nothing
... but I have not heard it cogently argued yet.

>
>> would seem
>> that HT 3.0 supports this model across chassis as long as the
>> programmer manages memory synchronization.
>>
>
> You have to bite the bullet and manage synchronization by higher-order
> protocols. The physical world at the bottom is fundamentally message-passing.
> You might notice it very much if you're working on us scale, but
> in ns and below it you can't ignore it.
>
OK ... everything is a message ... even a Cray X1 vector write, but I am
comparing MPI messages with something much smaller and more primitive.

>
>>>> Sounds like the Cray X1E pGAS memory model. Is there a role for
>>>>
>>> I don't think there is any other model but message passing. It's not
>>> like this is a ccHT a la HORUS
>>> http://en.wikipedia.org/wiki/HORUS_interconnect
>>>
>> The inter-chassis, but non-coherent interface that HT 3.0 supports
>> would seem to work very nicely with UPC and CAF. They run very well
>> on the Cray X1, which provides coherent memory on-board only as well.
>>
>
>

-- 
Richard B. Walsh

Project Manager
Network Computing Services, Inc.
Army High Performance Computing Research Center (AHPCRC)
rbw at ahpcrc.org  |  612.337.3467

-----------------------------------------------------------------------
This message (including any attachments) may contain proprietary or
privileged information, the use and disclosure of which is legally
restricted.  If you have received this message in error please notify
the sender by reply message, do not otherwise distribute it, and delete
this message, with all of its contents, from your files.
-----------------------------------------------------------------------
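
For anyone who wants to check the figures traded in this thread (a 10 Gbit/s serial bit being about 100 ps or ~3 cm long in vacuum, against ~1500 ns for a remote 8-byte reference), here is a minimal back-of-the-envelope sketch in Python. The function names are made up for illustration, and the 0.67 velocity factor for glass fibre is an assumed typical value, not a number from the discussion.

```python
# Back-of-the-envelope check of the latency figures quoted in the thread.
# All helper names here are hypothetical, chosen only for this sketch.

C_VACUUM_M_PER_S = 299_792_458  # speed of light in vacuum (exact by SI definition)


def bit_time_s(rate_bits_per_s: float) -> float:
    """Duration of one serial bit at the given line rate."""
    return 1.0 / rate_bits_per_s


def bit_length_m(rate_bits_per_s: float, velocity_factor: float = 1.0) -> float:
    """Physical length of one bit in flight.

    velocity_factor is the signal speed as a fraction of c
    (1.0 for vacuum; smaller in glass or copper).
    """
    return C_VACUUM_M_PER_S * velocity_factor * bit_time_s(rate_bits_per_s)


def distance_covered_m(latency_s: float, velocity_factor: float = 1.0) -> float:
    """How far a signal travels during the given latency."""
    return C_VACUUM_M_PER_S * velocity_factor * latency_s


if __name__ == "__main__":
    rate = 10e9          # 10 Gbit/s, as in the quoted text
    fibre_vf = 0.67      # ASSUMPTION: typical glass fibre, refractive index ~1.5
    remote_ref = 1500e-9 # ~1500 ns remote 8-byte reference, as in the reply

    print(f"bit time:             {bit_time_s(rate) * 1e12:.0f} ps")
    print(f"bit length (vacuum):  {bit_length_m(rate) * 100:.2f} cm")
    print(f"bit length (fibre):   {bit_length_m(rate, fibre_vf) * 100:.2f} cm")
    print(f"light path in 1500 ns: {distance_covered_m(remote_ref):.0f} m (vacuum)")
```

The last line is the point of the comparison: a signal covers several hundred metres in 1500 ns, so for machines in one room the wire itself accounts for only a small part of that latency budget.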
