[Beowulf] Multisocket mainboard hardware problems
Bruno Coutinho coutinho at dcc.ufmg.br
Thu Jan 15 15:28:22 PST 2009
Some common CPU tests:

- linpack
- mprime: http://www.mersenne.org/freesoft/
- compile a kernel

Linpack and mprime are great for CPU burn-in tests. Mprime has an option to
verify results, so you can detect arithmetic errors, and there's an option for
testing your machine without joining the grid.

2009/1/15 Jon Aquilina <eagles051387 at gmail.com>

> Try running memtest86+. It's a CD that you boot from that tests the memory.
> Leave it running for a few hours to make sure whether it is the RAM or the
> sockets. I am not sure about how to test the CPU.
>
> On Tue, Jan 13, 2009 at 10:26 AM, Francesco Pietra <
> francesco.pietra at accademialucchese.it> wrote:
>
>> Hi:
>>
>> I am posting here from a suggestion on the Debian amd64 site. My
>> original posting to the mainboard factory/vendor in Europe only
>> resulted in uninteresting suggestions, and they did not answer any
>> more.
>>
>> My question is directed to the attention of users familiar with
>> multisocket UMA-type mainboards based on 875 dual opteron AMD CPU. My
>> own is Supermicro H8QC8 with chipset nVidia CK804 and AMD 8132, driven
>> by Debian Linux amd64 lenny.
>>
>> One of the CPUs has suddenly lost viability to its 4-slots memory bank
>> (shut down the machine in order, the problem arose on next loading
>> Linux). Still, the CPU cores are OK, hypertransport links are
>> fully working, parallelization to both Amber 10 and NWChem 5.1 is
>> fully provided, but one of the CPUs must be slower, having to borrow
>> memory from the other banks. The hardware status, after a period of
>> complete darkness, is described in the attached
>> lshw_deb64_7Jan2009.txt.
>>
>> As each bank of Kingston DDR1 is filled 2+2+1+1 GB, I identified the
>> faulty bank, removed all slots from there, and replaced the 1+1 GB
>> slots at another bank with 2 + 2 GB from the faulty bank, so that now
>> the computer is at 20GB. The situation is described in the attached
>> lshw_deb64_lessCPU2_scrambling1G_2G_CPU4_7Jan2009.txt. Actually,
>> identification of the CPU (CPU2) related to the faulty mem bank is
>> insecure: I just considered the nearest CPU to the faulty bank. The
>> manual is not helpful in this regard.
>>
>> I understand that, in order to remove non-mainboard causes, I should
>> be certain that a CPU has not lost memory control. Since replacing (I
>> have one spare second-hand CPU) or scrambling, the CPUs is quite
>> troublesome, and risky, in my context (there is very little space
>> around the mainboard in the rack that I engineered to accept the
>> mainboard). Ventilation is excellent, however.
>>
>> Therefore, is there any software way to check if the CPUs are fully in
>> order, including the memory controller? lshw and other software
>> provided only partial help in my hands.
>>
>> Also any other suggestion would be greatly appreciated.
>>
>> Thanks for your kind attention
>>
>> francesco pietra
>> _______________________________________________
>> Beowulf mailing list, Beowulf at beowulf.org
>> To change your subscription (digest mode or unsubscribe) visit
>> http://www.beowulf.org/mailman/listinfo/beowulf
>>
>
> --
> Jonathan Aquilina
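
To make the "verify results" idea at the top of this reply concrete, here is a toy sketch in Python. It is not how mprime or linpack work internally; it only shows the general pattern of a result-checking burn-in: run a deterministic computation many times and flag any run whose result differs. The names workload, find_mismatches and burn_in are made up for this illustration, and the reference value is computed on the same machine, so this can only catch intermittent errors, not a consistently wrong result.

```python
import hashlib


def workload(seed: int, rounds: int = 20000) -> str:
    """Deterministic integer-heavy loop; returns a SHA-256 hex digest of all states."""
    x = seed
    digest = hashlib.sha256()
    for _ in range(rounds):
        # One step of a 64-bit linear congruential generator with fixed constants.
        x = (x * 6364136223846793005 + 1442695040888963407) % (1 << 64)
        digest.update(x.to_bytes(8, "little"))
    return digest.hexdigest()


def find_mismatches(results, reference):
    """Return the indices of runs whose result differs from the reference."""
    return [i for i, r in enumerate(results) if r != reference]


def burn_in(repeats: int = 5, seed: int = 12345) -> bool:
    """Repeat the same workload and report whether every run agreed."""
    reference = workload(seed)  # assumption: the first run is taken as the reference
    results = [workload(seed) for _ in range(repeats)]
    return not find_mismatches(results, reference)


if __name__ == "__main__":
    print("consistent" if burn_in() else "MISMATCH: possible hardware error")
```

To load every core, one would start one copy of such a script per core and let them run for hours. A real tool like mprime compares against known-good results, which is stronger than the self-consistency check shown here.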
