[Beowulf] Strange hardware? problems
Orion Poplawski orion at cora.nwra.com
Fri Apr 27 10:17:57 PDT 2007
I'm at a loss and trying to see if anyone else has had similar problems. We've got two pairs of identical machines:

- 2 Tyan S2882 dual processor Opteron 244 stepping 10
- 2 Tyan S2882-D dual processor dual core Opteron 275 stepping 2

We have two (relatively complicated) numerical models (RAMS and a homegrown one) that will blow up in random locations on the 244 machines but run fine on the 275 machines. By "blow up" I mean that the calculations appear to get corrupted in some way: in RAMS the numbers become unphysical and the simulation exits, and with the other model we get segfaults.

Memtest86 runs fine. No other hardware issues that I can find. We've tried FC4/5 on the 244 machines. At one point all four machines were running identical FC5 installs with the same problems.

Unfortunately, the problem is not exactly reproducible. The simulations crash at different times, but they do crash at some point given the length of the runs we are doing.

Are there any CPU tests out there that would check the accuracy of various calculations?

Thanks!

--
Orion Poplawski
Technical Manager        303-415-9701 x222
NWRA/CoRA Division       FAX: 303-415-9702
3380 Mitchell Lane       orion at cora.nwra.com
Boulder, CO 80301        http://www.cora.nwra.com
