[Beowulf] Repeated Dell SC1435 crash / hang. How to get the vendor to resolve the issue when 20% of the servers fail in first year?
Joe Landman landman at scalableinformatics.com
Mon Apr 6 06:11:50 PDT 2009
Chris Samuel wrote:
> ----- "Rahul Nabar" <rpnabar at gmail.com> wrote:
>
>> I contact Dell. Responses range from the clueless to absurd. First,
>> they convinced us it was Fedora. So I shifted to CentOS. They still
>> claim CentOS is "unvalidated" but I refuse to spend a fortune to move
>> over to RHEL like they want me to.
>
> Not that this helps, but you have my sympathy as I've
> been dealing with the same stuff from IBM over a storage
> server they sold us.
>
> Turns out I can make 7-12 drives in their external
> enclosures fail in short order (seconds to minutes
> between failures) by telling the software RAID to
> do a check, thus:
>
> for i in md[0123]; do
>   echo check > /sys/block/$i/md/sync_action
> done

Are these softirq cpu hangs? could you tell me what

  cat /sys/block/md[0123]/md/stripe_cache_size

reports?

>
> Even though we could reproduce it on 64-bit Debian
> and 32-bit CentOS they wouldn't escalate the issue
> until we could reproduce it on RHEL5 - which we did
> today.
>
> Sigh..
>

--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
       http://jackrabbit.scalableinformatics.com
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615
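
For anyone wanting to collect the values asked about above in one pass, the following is a minimal sketch, not something from the original thread. It assumes a POSIX shell and the usual md sysfs layout; the function name md_report and its optional directory argument are made up for illustration (the argument only exists so the function can be pointed at a copy of the tree). stripe_cache_size is only present on parity RAID levels, so a missing attribute is reported as "n/a".

```shell
#!/bin/sh
# md_report is a hypothetical helper, not part of mdadm or the kernel.
# It prints stripe_cache_size and sync_action for each md array found
# under the given directory (default: /sys/block).
md_report() {
    root="${1:-/sys/block}"
    for dir in "$root"/md*/md; do
        # Skip the unexpanded glob when no md arrays exist.
        [ -d "$dir" ] || continue
        name=$(basename "$(dirname "$dir")")
        for attr in stripe_cache_size sync_action; do
            if [ -r "$dir/$attr" ]; then
                value=$(cat "$dir/$attr")
            else
                # Attribute absent, e.g. stripe_cache_size on RAID1.
                value="n/a"
            fi
            printf '%s %s=%s\n' "$name" "$attr" "$value"
        done
    done
}
```

Calling md_report with no argument reads the live /sys/block tree. It only reads attributes and never writes to sync_action, so it does not itself start a check.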
