[Beowulf] GlusterFS 1.2-BENKI (GNU Cluster File System) - Announcement

Mark Hahn hahn at mcmaster.ca
Fri Feb 9 15:21:15 PST 2007


>>> On IB - nfs works only with IPoIB, whereas glusterfs does SDP (and ib-verbs,
>>> from the source repository) and is clearly way faster than NFS.
>>
>> "clearly"s like that make me nervous.  to an IB enthusiast, SDP may be
>> more aesthetically pleasing, but why do you think IPoIB should be noticeably
>> slower than SDP?  lower cpu overhead, probably, but many people have no
>> problem running IP at wirespeed on IB/10GE-speed wires...
>
> As I understand it, one reason why SDP is faster than IPoIB is that the
> way IPoIB is currently spec'ed requires there be an extra copy relative
> to SDP.

that's what I meant by "cpu overhead".  but the point is that current
CPUs have 10-20 GB/s of memory bandwidth hanging around, so it's not 
necessarily much of a win to avoid a copy.  even in olden days, 
it was common to show some workloads where having the host do TCP
checksumming actually _benefited_ performance by populating the cache.
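
to put a rough number on that, here is a back-of-envelope sketch in Python.
the 10-20 GB/s memory bandwidth is the figure above; the 1 GB/s stream rate
(roughly a 10GE-class wire) and the function name are assumptions made up for
illustration, and the model pessimistically charges every copy one full read
plus one full write to memory, ignoring any cache hits.

```python
def copy_bandwidth_fraction(stream_gb_s, mem_bw_gb_s, extra_copies=1):
    """Fraction of memory bandwidth eaten by extra copies of a data stream.

    Simplifying assumption: each copy reads the data once and writes it
    once, and nothing is served from cache.
    """
    traffic_gb_s = 2.0 * extra_copies * stream_gb_s
    return traffic_gb_s / mem_bw_gb_s


if __name__ == "__main__":
    stream = 1.0  # GB/s, hypothetical stream rate (about 10GE wirespeed)
    for mem_bw in (10.0, 20.0):  # GB/s, the range quoted in the text
        frac = copy_bandwidth_fraction(stream, mem_bw)
        print(f"{mem_bw:.0f} GB/s memory: one extra copy costs {frac:.0%}")
```

under these assumptions one extra copy is a measurable but modest slice of
the memory system, and it shrinks further at lower stream rates.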

> It is also specced with a smaller MTU, which makes a fair
> difference.  I believe there is movement afoot to change the spec to
> allow for a larger MTU, but I'm not an IB expert and don't follow it
> religiously.

MTU is another one of those things that got a rep for importance,
but which really only matters in certain circumstances.  bigger MTU
reduces the per-packet overhead.  the table in question, if I squint at it,
appears to show ~300 MB/s on a single node.  with 8k packets, that's
~40k pps, vs ~5k pps for 64k MTU.  seems like a big win, right?  well,
except why assume each packet requires an interrupt?
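
the packet-rate arithmetic can be checked with a few lines of Python.
this is only a sketch: it assumes 1 MB = 10^6 bytes, 8k = 8192 and
64k = 65536 bytes, and the function names and the coalescing factor of 16
packets per interrupt are hypothetical values chosen for illustration, not
anything measured.

```python
def packets_per_second(throughput_bytes_s, packet_bytes):
    """Packet rate needed to carry a given throughput at a given packet size."""
    return throughput_bytes_s / packet_bytes


def interrupts_per_second(pps, packets_per_interrupt=1):
    """Interrupt rate if the NIC batches several packets per interrupt."""
    return pps / packets_per_interrupt


if __name__ == "__main__":
    throughput = 300e6  # bytes/s, the ~300 MB/s read off the table
    for size in (8192, 65536):
        pps = packets_per_second(throughput, size)
        # 16 packets per interrupt is a made-up coalescing factor
        irqs = interrupts_per_second(pps, packets_per_interrupt=16)
        print(f"{size:6d}-byte packets: {pps:8.0f} pps, {irqs:7.0f} irq/s coalesced")
```

with any batching at all, the small-packet interrupt rate drops to the same
order as the large-MTU packet rate, which is the point of the question above.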

reducing the overhead, whether through fewer copies or bigger MTUs
is certainly a good thing.  these days, neither is necessarily essential
unless you're really, really pushing the limits.  there are only a few
people in the universe (such as CERN, or perhaps the big telescopes) 
who genuinely have those kinds of data rates.  we're a pretty typical 
supercomputing center, I think, and see only quite short bursts into 
the GB/s range (aggregate, quadrics+lustre).

I'm genuinely curious: do you (anyone) have applications which sustain
many GB/s of either IPC or IO?

regards, mark hahn.


