Burn-in Utilities
Velocet math at velocet.ca
Wed Apr 24 08:30:08 PDT 2002
On Wed, Apr 24, 2002 at 10:45:19AM -0400, Justin Nemmers's all...
> All:
> I am in search of a utility that will allow me to burn-in a
> new PC. Ideally, it would peg the procs at 100% as well as exercise
> the memory (as much as 2Gb/Node. I know there is a Sun provided
> utility to do this on Sparc systems, but does anyone have a
> suggestion for a linux-based (perl would work, too) that will do the
> same thing?

The packages cpuburn and memtest (in Debian and Red Hat, AFAIK) will do you nicely. We run five or so of each of burnMMX, burnK7 and memtest on our Athlon machines for 2-3 days and see if even one crashes. We've had a crash on machines tested AFTER being in service with no problems for 3-4 months, so it's definitely a hardcore exercise. Oh, we also stick dnetc on them on top of all that just to make sure it's hurting.

I think the burn programs are set to generate the most heat possible in the CPU during operation. They definitely draw the most current: when we were first setting up our cluster and weren't sure of power draw, 8 dual 1.333GHz Athlon boards (no drives) would run G98 fine on a 15 amp circuit, but as soon as we ran burnMMX/burnK7 we'd blow breakers.

We run 5-10 copies to get a nice high context-switch rate going and exercise the OS as well ;) Through trial and error we found that running only 1 each of burnMMX/burnK7 at a time will often not crash for days, whereas running 5-10 will. In fact, we only consider a crash within 12 hours to be a reason to RMA a board if it's slated for a workstation running Windows. A board that lasts 12 hours of that test is roughly equivalent to one that crashes once every 3-6 months of regular Linux desktop use. And with Windows, how can you tell? :)

It's actually surprising how well you can measure the quality of boards that way. Out of 40 246x Tyan boards we found one bad stick of RAM and no bad CPUs or boards using this method. However, with ECS K75As we found 1 in 10 boards as shipped to us would die in 1-6 hours under this load, and another 1 in 10 would die within the 2-3 days.

    while ! burnMMX; do RMA_via_VAR; done

Nonetheless we've never seen every unit of a certain brand always crash within that time - eventually we get good boards - so with proper sorting after testing in this manner you can always end up with a set of good boards (at least as far as these tests are concerned).

So far, with any board that makes it past 2-3 days of this, we've never seen a problem with Gaussian98, Gromacs or distributed.net afterwards (at least until we hit long-term electron migration problems due to regular CPU heat wear and tear...), but none of our boards/CPUs are there yet (the PcChips M817 LMRs are hitting 16 months of continuous operation).

/kc

>
> Cheers,
> Justin
> --
>
> System Administrator
> National Institutes of Health
> Center for Information Technology
> 9000 Rockville PK
> Building 12B 2N/207
> Bethesda, MD 20892-5680
> 301.496.0396
> http://biowulf.nih.gov
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

--
Ken Chase, math at velocet.ca * Velocet Communications Inc. * Toronto, CANADA
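The multi-copy run described in the message above could be scripted roughly as follows. This is only a sketch, not the poster's actual setup: it assumes GNU coreutils `timeout` is installed and that the burn programs keep running until they detect a fault. The function name `burn_in` and its arguments are hypothetical. A hard lockup of the machine will of course never be reported by a script running on that same machine.

```shell
#!/bin/sh
# burn_in DURATION COPIES CMD...   (hypothetical helper, not part of cpuburn)
#
# Starts COPIES instances of each CMD, lets them run for DURATION (any value
# GNU timeout accepts, e.g. 12h or 3d), and returns 0 only if every instance
# was still running when its time was up.
# Example, assuming the cpuburn binaries are on PATH:
#   burn_in 3d 5 burnMMX burnK7
burn_in() {
    duration=$1
    copies=$2
    shift 2

    pids=""
    for cmd in "$@"; do
        i=0
        while [ "$i" -lt "$copies" ]; do
            # timeout exits with status 124 only when it had to stop the
            # command itself, i.e. the command survived the whole run.
            timeout "$duration" "$cmd" >/dev/null 2>&1 &
            pids="$pids $!"
            i=$((i + 1))
        done
    done

    failed=0
    for pid in $pids; do
        wait "$pid"
        status=$?
        if [ "$status" -ne 124 ]; then
            echo "worker $pid exited early with status $status"
            failed=1
        fi
    done
    return "$failed"
}
```

Running several copies per program, rather than one, follows the observation in the message that a single copy often runs for days without exposing a marginal board.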
