Myrinet hardware reliability
Victoria Pennington v.pennington at man.ac.uk
Fri Feb 7 01:40:05 PST 2003
Hi,

We have a 113 node IBM x330 cluster with Myrinet 2000. We're experiencing very high failure rates on Myrinet switch ports (average 3 per month) and on Myrinet NICs to a lesser extent (about 1 per month). Ports and NICs are fine one minute, then one or the other just dies (for good). Cables (fibre, not copper) seem fine - one or two failures only in nearly a year. There is no pattern in the failures, and they are entirely unrelated to usage levels; seldom used nodes are just as likely to have failures as heavily used nodes.

We have another small IBM cluster with Myrinet 2000 (16 port switch with copper cables), and this has run solidly for nearly 2 years with not one Myrinet hardware fault.

I'd be really interested to know of others' experiences with Myrinet kit, especially in larger clusters.

Thanks
Victoria

---
Dr Victoria Pennington
Manchester Computing, Kilburn Building,
University of Manchester, Oxford Road, Manchester M13 9PL
tel. 0161 275 6830, email: v.pennington at man.ac.uk
More information about the Beowulf mailing list
