[Beowulf] IPoIB failure

Chris Samuel samuel at unimelb.edu.au
Wed Jan 28 03:06:11 PST 2015


On Wed, 28 Jan 2015 11:51:16 AM Peter Kjellström wrote:

> The problem is most easily demonstrated by restarting the SM and then
> bringing up new ipoib interfaces on 6.6 hosts. This creates islands of
> connectivity.

Hmm, we have managed switches, so we don't restart the SMs on them unless
we have to do a complete power-out for the machine room, which is very rare.
That could well explain why we're not seeing this problem!

All the best,
Chris
-- 
 Christopher Samuel        Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/      http://twitter.com/vlsci


