[Beowulf] Cluster consistency checks

Michael Di Domenico mdidomenico4 at gmail.com
Tue Mar 29 06:10:59 PDT 2016


On Tue, Mar 29, 2016 at 9:01 AM, Olli-Pekka Lehto
<olli-pekka.lehto at csc.fi> wrote:
>>>>> - Simple MPI latency / bandwidth test called mpisweep that tests every
>>>>> link (I'll put this up on github later as well)
>>>
>>> Any reference to mpisweep yet?
>>>
>>> Google didn't give me much...
>>>
>>
>> That's an internal code I whipped up at some point. Pretty much the minimum
>> viable program to do a sweep of all the connections. I'll try to clean it up a
>> bit and put it up in the next few days.
>
> I put it now up on github. Very simple and short :)
>
> https://github.com/CSC-IT-Center-for-Science/mpisweep

As a programming exercise, it would be handy to extend this to do
"around the world" ping tests, by which I mean: host 1 transfers to all,
then host 2 transfers to all, then host 3 transfers to all, and so on.
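A minimal sketch of the ordering described above, written in plain Python so it runs without MPI. The function name and the idea of emitting (round, src, dst) tuples are illustrative assumptions, not part of mpisweep; in a real test each tuple would drive an MPI ping-pong or bandwidth transfer. Because every host takes a turn as the sender, each ordered pair is visited once, so both directions of every link get exercised.

```python
from typing import Iterator, List, Tuple


def around_the_world_schedule(hosts: List[str]) -> Iterator[Tuple[int, str, str]]:
    """Yield (round, src, dst) for a hypothetical 'one host sends to all' sweep.

    Round r uses hosts[r] as the only sender, so every ordered pair
    (src, dst) with src != dst appears exactly once. That covers both
    directions of each link, which matters when faults are one-way.
    """
    for rnd, src in enumerate(hosts):
        for dst in hosts:
            if dst != src:
                yield rnd, src, dst


if __name__ == "__main__":
    # Hypothetical node names, for illustration only.
    nodes = ["n1", "n2", "n3"]
    for rnd, src, dst in around_the_world_schedule(nodes):
        print(f"round {rnd}: {src} -> {dst}")
```

For N hosts this yields N*(N-1) directed transfers, so it scales quadratically; that is usually acceptable for an occasional consistency check but worth keeping in mind on large clusters.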

I recently found a misbehaving IB card inside a multi-card switch
using this approach. It was a very odd result, whereby only certain
pairings of hosts, in certain directions, were failing with high latency
and low bandwidth.
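To spot the kind of one-directional fault described above, the results of a full directed sweep can be scanned for outliers. This is a hedged sketch, not a description of how the fault was actually found: it assumes latencies in microseconds keyed by (src, dst), and the "slower than 2x the median" threshold is an arbitrary illustrative choice. A pair is reported as asymmetric when its reverse direction was measured and looks normal.

```python
import statistics
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def flag_slow_pairs(lat_us: Dict[Pair, float], factor: float = 2.0) -> List[Tuple[Pair, bool]]:
    """Return (pair, asymmetric) for directed pairs slower than factor * median.

    `asymmetric` is True when the reverse direction was measured and is
    NOT itself flagged, i.e. only one direction of the link looks slow.
    The threshold rule is an assumption chosen for illustration.
    """
    threshold = factor * statistics.median(lat_us.values())
    slow = {pair for pair, v in lat_us.items() if v > threshold}
    result = []
    for pair in sorted(slow):
        rev = (pair[1], pair[0])
        result.append((pair, rev in lat_us and rev not in slow))
    return result


if __name__ == "__main__":
    # Hypothetical measurements: only c -> a is slow.
    measured = {
        ("a", "b"): 1.5, ("b", "a"): 1.6,
        ("a", "c"): 1.5, ("c", "a"): 9.0,
        ("b", "c"): 1.4, ("c", "b"): 1.5,
    }
    print(flag_slow_pairs(measured))
```

With the sample data the median is 1.5 us, so only ("c", "a") crosses the 3.0 us threshold and is reported as asymmetric. The same check could be run on bandwidth numbers by flagging values *below* a fraction of the median instead.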

