[Beowulf] Nehalem memory configs
Joe Landman landman at scalableinformatics.com
Sat Apr 11 11:43:42 PDT 2009
Since the part is released, I can report a stream test :)

richard.walsh at comcast.net wrote:
> 64 GB/sec is the right dual-socket theoretical number for this
> situation, and Intel presents the value of 33 GB/sec for the stream
> triad for the dual socket boards, so 35 GB/sec could be a copy
> perhaps, but nothing was mentioned about any benchmark in the memory
> piece. In any case, I think we have the right theoretical and
> probable real-world numbers expressed here, if people were wondering.

2-socket Intel MB with 2 dual core (not quad core) Nehalem E5502 1.8 GHz
processors, running stream omp (I bumped N way up to get a reasonable
measurement).

landman at velocibunny:~/stream$ ./stream_c_omp.exe
-------------------------------------------------------------
STREAM version $Revision: 5.8 $
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 200000000, Offset = 0
Total memory required = 4577.6 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Number of Threads requested = 4
-------------------------------------------------------------
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 130623 microseconds.
   (= 130623 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:         16545.0680     0.1942       0.1934       0.1958
Scale:        16098.2714     0.1996       0.1988       0.2019
Add:          17929.8514     0.2684       0.2677       0.2697
Triad:        17682.8117     0.2719       0.2715       0.2722
-------------------------------------------------------------
Solution Validates
-------------------------------------------------------------

and for laughs, same test run (with same binary) on Shanghai 2.3 GHz
(2376) with OMP_NUM_THREADS=4

landman at pegasus-a3g:~/stream$ ./stream_c_omp.exe
-------------------------------------------------------------
STREAM version $Revision: 5.8 $
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 200000000, Offset = 0
Total memory required = 4577.6 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Number of Threads requested = 4
-------------------------------------------------------------
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 210029 microseconds.
   (= 210029 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:         10885.6547     0.2943       0.2940       0.2946
Scale:        10966.1188     0.2923       0.2918       0.2929
Add:          12019.7420     0.4002       0.3993       0.4012
Triad:        12127.1875     0.3965       0.3958       0.3968
-------------------------------------------------------------
Solution Validates
-------------------------------------------------------------

I suspect we have the pegasus memory in a non-optimal config, will look
later on next week.

Assuming we can get a pair of quad core Nehalem units into our test
machine, it appears that 32 GB/s on stream is quite possible. Right now
it looks like ~4 GB/s per thread.

--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
       http://jackrabbit.scalableinformatics.com
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615
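
The per-thread estimate in the message can be reproduced from the reported timings. What follows is a minimal sketch, not part of the original post. It assumes STREAM's usual byte-counting convention (Copy and Scale touch two arrays per iteration, Add and Triad touch three, and 1 MB = 1e6 bytes for rates), and the function names are hypothetical. Under those assumptions the Nehalem Triad figure works out to about 4.4 GB/s per thread. Extrapolating that to 8 threads assumes linear scaling, which the post itself treats only as a possibility.

```python
# Sanity-check the STREAM figures quoted in the post.
# Assumption: STREAM counts 2 arrays for Copy/Scale, 3 for Add/Triad,
# and reports rates in MB/s with 1 MB = 1e6 bytes.

N = 200_000_000        # "Array size" from the STREAM banner
BYTES_PER_WORD = 8     # "8 bytes per DOUBLE PRECISION word"
THREADS = 4            # "Number of Threads requested"

ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}

# "Min time" column (seconds) copied from the two runs in the post.
MIN_TIME = {
    "Nehalem E5502": {"Copy": 0.1934, "Scale": 0.1988, "Add": 0.2677, "Triad": 0.2715},
    "Shanghai 2376": {"Copy": 0.2940, "Scale": 0.2918, "Add": 0.3993, "Triad": 0.3958},
}


def rate_mb_s(kernel, seconds):
    """Aggregate bandwidth in MB/s for one kernel, from its best time."""
    bytes_moved = ARRAYS_TOUCHED[kernel] * BYTES_PER_WORD * N
    return bytes_moved / seconds / 1e6


def per_thread_gb_s(kernel, seconds, threads=THREADS):
    """Aggregate bandwidth divided evenly across threads, in GB/s."""
    return rate_mb_s(kernel, seconds) / threads / 1e3


def total_memory_mib():
    """Footprint of the three arrays, in units of 2**20 bytes."""
    return 3 * BYTES_PER_WORD * N / 2**20


if __name__ == "__main__":
    print(f"arrays occupy {total_memory_mib():.1f} MB")
    for system, times in MIN_TIME.items():
        for kernel, seconds in times.items():
            print(f"{system:14s} {kernel:6s} "
                  f"{rate_mb_s(kernel, seconds):10.1f} MB/s total, "
                  f"{per_thread_gb_s(kernel, seconds):5.2f} GB/s per thread")
```

The recomputed rates differ from the printed ones by a few MB/s only because the "Min time" column is rounded to four decimal places.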
