[Beowulf] Cluster Benchmarks?
Robert G. Brown rgb at phy.duke.edu
Mon Jun 14 12:02:09 PDT 2004
On Mon, 14 Jun 2004, Jonathan Michael Nowacki wrote:

> Does anyone know of any impartial benchmark websites? Something that
> would compare the Xserve vs. Opteron vs. Athlon vs. P4 for scientific
> computing?
>
> I found this website, but it's internal use only. Too bad.
> http://www.unc.edu/atn/asg/benchmark/benchmark_2003.html

Curiously enough, I'm working actively on cpu_rate, an impartial benchmark.

cpu_rate is a fully GPL v2b, open source benchmark that I'm rewriting in an "object oriented" design where each test is in a more or less standard wrapper with well-defined init, alloc, free, test and results routines. The rest of the code is a reusable, consistent timing shell that automagically computes how hard it needs to work to get good precision with a high precision timer (usually the cpu clock itself, but on an Xserve you might have to use gettimeofday -- the "automagic" part means that if you do this you'll still get consistent precision but the inner loops will likely have to run longer to get it).

The timing harness runs a selectable number of iterations of the entire timing process (default 100) and returns both the mean timing and the standard deviation, in nanoseconds, of the selected operation.

This is in the general category of "microbenchmark" -- the code in the tested segment of the testing routines is typically a small code fragment or even an atomic operation. However, the harness can manage vector streams with a single index, and there is even a trick one can use to automatically test at least memory access both for streaming (sequential) access and for random/shuffled access, which can be a very illuminating test.
Tests that it will run at the moment include:

rgb at lilith|B:1112>cpu_rate -l
 #  Name               Remark
========================================================================
 0  Null test          Test validation loop, should take "no time" (infinite rate)
 1  bogomflops         d[i] = (ad + d[i])*(bd - d[i])/d[i] (8 byte double vector)
 2  bogomtrids         d[i] = (ad + bd - cd)*d[i] (8 byte double vector)
 3  stream copy        d[i] = a[i] (8 byte double vector)
 4  stream scale       d[i] = xtest*d[i] (8 byte double vector)
 5  stream add         d[i] = a[i] + b[i] (8 byte double vector)
 6  stream triad       d[i] = a[i] + xtest*b[i] (8 byte double vector)
 7  memory read/write  Reads, then writes (4 byte integer vector)
 8  memory read        Reads (4 byte integer vector)
 9  memory write       Writes (4 byte integer vector)
10  savage             xtest = tan(atan(exp(log(sqrt(xtest*xtest)))))

Note that it contains the four stream tests, sort of. I say sort of because although it tests the stream operations, it uses strictly malloc'd memory for the vectors and consequently has one more (pointer) layer of indirection than stream, which allocates the vectors in the data segment (no dynamic pointers). The stream results are typically a tiny bit slower than "stream" per se, but I think they are more useful as you can observe how stream results vary with vector size as one sweeps across e.g. various cache sizes and strides.

I'm also going to experiment a bit to see if I can have a hard allocated variant of stream independent of a malloc'd version, and use some clever indirection to avoid malloc'ing memory for the latter until one exceeds the hard allocated data space. The "cost" of this will be that the code will have a rather large default data size even if one is running (say) savage, which requires no memory at all to speak of.

As noted, the direct memory tests can use shuffled or sequential access with dramatically different rates, as one would expect, running out of main memory for vectors larger than cache.
Alas, as I write this I >>AM<< working on it, and am about 2/3 of the way through eliminating what I hope/expect is the last pernicious memory leak. With luck, it will take me only another hour or two to get the code to where it runs perfectly for the last three tests and a bit longer to run the full suite of tests on Celeron, P4, AMD boxen just to be sure it works there still. By (maybe) five or six pm EST I'll likely have the new image up, in what I'd call late alpha or beta mode.

The whole point of the rewrite is that this suite SHOULD be very easy to add your own code fragment to for testing purposes. Copy any of the existing tests (mflops.c and mflops.h, say) to mynewtest.c and mynewtest.h. Add a couple of lines to tests.h (one to an enum list, one include line for mynewtest.h). Add a line to cpu_rate_startup.c to call the initialization routine. Edit mynewtest.c in pretty obvious ways, documented (I hope) in the comments, compile, run. Out should spit nanosecond timings of said operation(s), with standard deviation and a "bogomega"-rate of same (millions of operations per second).

Eventually it should be straightforward to instrument lots of microscopic operations and subroutine/library calls. I wrote this originally (back in the 80's, MUCH more crudely) to try to answer the simple question "how fast can this system do a floating point operation", a thing that vendor estimates and most benchmarks of the day never answered to anything like my satisfaction.

You can find it either on my personal web pages under Beowulf or on the brahma site under resources, but you might wait until I announce the new revision "formally" hopefully later today, along with the URL(s), to retrieve it.

Hope it helps.

   rgb

> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
Robert G. Brown                        http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu
