[Beowulf] 1.2 us IB latency?
Bill Broadley bill at cse.ucdavis.edu
Wed Mar 28 13:04:37 PDT 2007
Peter Kjellstrom wrote:
> On Wednesday 28 March 2007, Mark Hahn wrote:
>>>> start timer
>>>> send(other,small-message) recv(first,small-message)
>>>> recv(other,small-message) send(first,small-message)
>>>> stop timer
>>>>
>>>> I'll actually see 2.4 us between the timer calls? if I understand,
>>>> aggregation would only help on a streaming test. in fact, this kind
>>>> of isolated RPC-like exchange is what I see most commonly.
>>> Assuming you could time it with any accuracy, yes.
>> that's not an issue - rdtsc is perfectly good into the tens of ns range.
>
> I'll have to hack together a rdtsc based mpi microbenchmark some day it seems
> =)

Might I suggest just passing an MPI_INT back and forth and decrementing it each time to ensure that the message makes it all the way to userspace before heading back to the other node? Seems like it would allow for easier timing (with gettimeofday) and also take into account real-world effects like interrupts and scheduling. I guess it depends on whether you want marketing numbers or real-world numbers ;-).

You might also want to do this in parallel; after all, few clusters let their communication layer sit idle while a single pair of nodes communicates. Beyond that, you might want all possible pairs to communicate to see what effect locality has. This might be especially useful for comparing interconnects with various fractions of backplane bandwidth and differing methods for handling contention.

I've written a code that does the above. It's still somewhat raw: I've yet to add some sanity checking and command-line options to avoid recompiling, both high on my todo list.
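The ping-pong suggested above (bounce one integer back and forth, decrement it at every hop, and time the whole loop with gettimeofday) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the benchmark mentioned above: to stay self-contained it swaps MPI for two POSIX pipes between a parent and a forked child on one machine. On a cluster, each write/read pair would instead be an MPI_Send/MPI_Recv of one MPI_INT between two ranks. The names pingpong_rtt_us, send_int and recv_int are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Assumption: pipes stand in for MPI_Send/MPI_Recv of a single MPI_INT. */
static void send_int(int fd, int v) {
    if (write(fd, &v, sizeof v) != (ssize_t)sizeof v) abort();
}

static int recv_int(int fd) {
    int v;
    if (read(fd, &v, sizeof v) != (ssize_t)sizeof v) abort();
    return v;
}

/* Bounce an int `iters` round trips between two processes. Both sides
   decrement it before replying, so every message must actually be consumed
   in user space. Returns the mean round-trip time in microseconds (timed with
   gettimeofday around the whole loop), or -1.0 on error. */
double pingpong_rtt_us(int iters) {
    if (iters <= 0) return -1.0;

    int to_child[2], to_parent[2];
    if (pipe(to_child) != 0 || pipe(to_parent) != 0) return -1.0;

    pid_t pid = fork();
    if (pid < 0) return -1.0;
    if (pid == 0) {                           /* child: the "other" node */
        close(to_child[1]);
        close(to_parent[0]);
        for (int i = 0; i < iters; i++) {
            int v = recv_int(to_child[0]);
            send_int(to_parent[1], v - 1);    /* decrement before replying */
        }
        _exit(0);
    }

    close(to_child[0]);                       /* parent: the "first" node */
    close(to_parent[1]);
    int token = 2 * iters;                    /* two decrements per round trip */
    struct timeval t0, t1;
    gettimeofday(&t0, NULL);
    for (int i = 0; i < iters; i++) {
        send_int(to_child[1], token);
        token = recv_int(to_parent[0]) - 1;
    }
    gettimeofday(&t1, NULL);
    waitpid(pid, NULL, 0);
    close(to_child[1]);
    close(to_parent[0]);

    assert(token == 0);                       /* every hop decremented it once */
    double us = (t1.tv_sec - t0.tv_sec) * 1e6 + (t1.tv_usec - t0.tv_usec);
    return us / iters;
}
```

Calling pingpong_rtt_us(10000) returns the mean round-trip time; halving it gives the one-way figure usually quoted as latency (the 2.4 us between timer calls in Mark's example corresponds to 1.2 us one-way). Timing many round trips with one pair of gettimeofday calls is what makes microsecond resolution adequate here.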
I do have some data from an InfiniPath cluster to post. This data set is for 4 processors per node on 64 nodes (of a 177-node cluster); each node has a single port on a 288-port IB switch:

http://cse.ucdavis.edu/~bill/n64p256/band_results.txt
http://cse.ucdavis.edu/~bill/n64p256/lat_results.txt

They should be easy to visualize; I personally use gnuplot's splot "filename" matrix. To see the 32-node numbers with a single process per node, just replace the directory name above with "n32p1".

> Sorry for being unclear here. What I wanted to say was that, unrelated to 1.5
> us ping-pong on mpi I have also observed verbs level latency (ib_write_lat)
> of around 1 us. And that figure is not affected by any mvapich trickery :-).

Good to hear. I'll happily share the source if you (or anyone else) are willing to run my benchmark.
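Measuring all possible pairs in parallel, as described in this post, needs a schedule in which every rank is busy in each step and every pair eventually meets. A round-robin tournament (the "circle method") is one standard way to get that. The sketch below illustrates that technique only; it is not a description of how Bill's own code pairs ranks, and rr_partner is a hypothetical name. It assumes an even number of ranks; with an odd count, a dummy rank can be added and whoever draws it sits the round out.

```c
#include <assert.h>

/* Hypothetical helper: partner of `rank` in round `round` of a round-robin
   ("circle method") schedule over an even number `n` of ranks.
   Rank n-1 stays fixed while ranks 0..n-2 rotate. Every round pairs all
   ranks at once, and over rounds 0..n-2 each pair meets exactly once. */
int rr_partner(int rank, int round, int n) {
    assert(n >= 2 && n % 2 == 0);
    assert(rank >= 0 && rank < n && round >= 0 && round < n - 1);
    int m = n - 1;                     /* number of rotating ranks (odd) */
    if (rank == m) return round;       /* fixed rank meets rank `round` */
    if (rank == round) return m;       /* ...and rank `round` meets it */
    /* Remaining ranks pair up symmetrically around `round`, modulo m. */
    return ((2 * round - rank) % m + m) % m;
}
```

Each rank would loop over the n-1 rounds, run a timed ping-pong against rr_partner(rank, round, n) (for instance with the lower-numbered rank of each pair sending first), and store the result in row `rank` of an n-by-n matrix, which is the kind of file gnuplot's splot "filename" matrix plots directly.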
