[Beowulf] Tips for diagnosing intermittent problems on a small cluster
stephen mulcahy smulcahy at aplpi.com
Mon Nov 26 00:47:19 PST 2007
- Previous message: [Beowulf] Tips for diagnosing intermittent problems on a small cluster
- Next message: [Beowulf] I/O workload of an application in distributed file system
Andrew M.A. Cater wrote:
> Same here on a single machine with an earlier model Tyan board - it
> happened to us either after a very occasional kernel panic/exception or
> after 25-28 days of continuous running. I've got a 2885 here, if I can
> just find two Opterons, memory and a case :-) I'll let you know if this
> one does it too.
>
> There _may_ be some PSU involvement with ours: the machine and fans are
> running but not accepting connections. You have to disconnect the power
> for a few minutes for it to even boot again properly. Powercycling from
> the front panel doesn't always work.
>
> Debian etch, stock Debian kernel (2.6.18-5 from memory).

We're running pretty much the same kernels. To be fair to our system, it seems to be rock solid in general and certainly has no problems switching on or off normally. Nor have we seen any kernel panics (as far as I can remember).

Nonetheless, I'm hearing some anecdotes relating to some Tyan motherboards and this kind of behaviour. Until I figure out the root cause and a more targeted fix, I guess we'll be rebooting more often!

Thanks,

-stephen

-- 
Stephen Mulcahy, Applepie Solutions Ltd., Innovation in Business Center,
GMIT, Dublin Rd, Galway, Ireland. +353.91.751262 http://www.aplpi.com
Registered in Ireland, no. 289353 (5 Woodlands Avenue, Renmore, Galway)
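The stop-gap of rebooting more often can be made systematic by checking uptime on each node. The following is a minimal Python sketch, not something posted in the thread: it parses the Linux /proc/uptime file and reports when a node has passed a hypothetical 20-day threshold, picked only to stay below the 25-28 days of continuous running after which hangs were reported. The threshold value and all function names are assumptions for illustration.

```python
import os

# Hypothetical threshold (an assumption, not from the thread): stay below the
# lowest reported failure point of ~25 days of continuous running.
REBOOT_AFTER_DAYS = 20.0
SECONDS_PER_DAY = 86400.0


def parse_uptime(text: str) -> float:
    """Return seconds since boot from the contents of Linux /proc/uptime.

    The file holds two numbers: seconds since boot, then cumulative idle time.
    """
    return float(text.split()[0])


def needs_reboot(uptime_seconds: float, limit_days: float = REBOOT_AFTER_DAYS) -> bool:
    """True once the node has been up for at least limit_days."""
    return uptime_seconds / SECONDS_PER_DAY >= limit_days


def read_uptime(path: str = "/proc/uptime") -> float:
    """Read and parse the uptime file on the local node."""
    with open(path) as f:
        return parse_uptime(f.read())


if __name__ == "__main__":
    # Only meaningful on Linux, where /proc/uptime exists.
    if os.path.exists("/proc/uptime"):
        up = read_uptime()
        print(f"up {up / SECONDS_PER_DAY:.1f} days; reboot due: {needs_reboot(up)}")
```

Run from cron on each node, a check like this could flag machines for a scheduled reboot before they reach the window where the hangs appeared. It only works around the symptom; it does not identify the root cause.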
More information about the Beowulf mailing list
