[Beowulf] Woodcrest Memory bandwidth
Joe Landman landman at scalableinformatics.com
Mon Aug 14 13:02:40 PDT 2006
Mark Hahn wrote:
> kinda sucks, doesn't it? here's what I get for a not-new dual-275 with
> 8x1G PC3200 (I think):
>
> Function      Rate (MB/s)   RMS time     Min time     Max time
> Copy:         5714.6837     0.0840       0.0840       0.0841
> Scale:        5821.0766     0.0825       0.0825       0.0826
> Add:          6437.8226     0.1119       0.1118       0.1120
> Triad:        6414.2079     0.1123       0.1123       0.1124
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf

Now that I am back from "da Yoo Pee" I can post some of my numbers.

Here is our dual core opteron 275.

4-threads
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:         9999.3465     0.0356       0.0320       0.0360
Scale:        8888.4147     0.0360       0.0360       0.0360
Add:          9230.2533     0.0542       0.0520       0.0560
Triad:        9230.2321     0.0538       0.0520       0.0560

1-thread
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:         4705.6130     0.0711       0.0680       0.0720
Scale:        4705.6130     0.0702       0.0680       0.0720
Add:          4615.1161     0.1067       0.1040       0.1080
Triad:        4444.1975     0.1080       0.1080       0.1080

using a PathScale compiled binary. I see slightly higher numbers using
PGI 6.1-2 compiled binaries for single threads, not sure why.
The 6.1-5/6 compiled are worse :(

Function      Rate (MB/s)   RMS time     Min time     Max time
Copy:         6666.9379     0.0454       0.0300       0.0500
Scale:        4000.0610     0.0567       0.0500       0.0600
Add:          4285.7330     0.0758       0.0700       0.0800
Triad:        4285.7330     0.0747       0.0700       0.0900

Same code binary on woodcrest 2.66 GHz

Function      Rate (MB/s)   RMS time     Min time     Max time
Copy:         5000.1240     0.0427       0.0400       0.0500
Scale:        5000.1240     0.0452       0.0400       0.0500
Add:          5000.0445     0.0685       0.0600       0.0800
Triad:        5000.0445     0.0712       0.0600       0.0800

Intel 9.1 compiled version (64 bit)

1-thread
Function      Rate (MB/s)   RMS time     Min time     Max time
Copy:         4447.6829     0.1440       0.1439       0.1445
Scale:        4613.8072     0.1388       0.1387       0.1390
Add:          4256.9431     0.2256       0.2255       0.2259
Triad:        4187.6605     0.2294       0.2292       0.2302

2-threads
Function      Rate (MB/s)   RMS time     Min time     Max time
Copy:         7288.3813     0.0882       0.0878       0.0893
Scale:        7186.2381     0.0891       0.0891       0.0893
Add:          7085.0852     0.1357       0.1355       0.1365
Triad:        6916.0273     0.1389       0.1388       0.1392

3-threads
Function      Rate (MB/s)   RMS time     Min time     Max time
Copy:         6589.2489     0.0989       0.0971       0.1001
Scale:        6528.4171     0.0988       0.0980       0.0997
Add:          6535.0076     0.1488       0.1469       0.1504
Triad:        6563.9202     0.1486       0.1463       0.1496

4-threads
Function      Rate (MB/s)   RMS time     Min time     Max time
Copy:         6645.4125     0.0965       0.0963       0.0976
Scale:        6994.6233     0.0916       0.0915       0.0917
Add:          6373.0207     0.1508       0.1506       0.1509
Triad:        6710.7522     0.1432       0.1431       0.1433

I may have been Bill's 10 GB/s source, and that may have been a mixup on
my part.
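For anyone cross-checking these tables: as far as I know, the standard stream.c derives each rate from the minimum time, as bytes moved divided by the best time, in units of 10^6 bytes per second. Below is a minimal Python sketch of that relation. It assumes 8-byte doubles and the usual byte counts (two arrays for Copy and Scale, three for Add and Triad). The array length is not stated in this post, so the helper `implied_array_length` (a hypothetical name) only backs it out from the rounded numbers in a table; treat the result as an estimate.

```python
# Bytes moved per array element for each STREAM kernel, assuming 8-byte doubles.
# Copy and Scale touch two arrays; Add and Triad touch three.
BYTES_PER_ELEMENT = {"Copy": 16, "Scale": 16, "Add": 24, "Triad": 24}


def stream_rate_mb_s(kernel, n_elements, min_time_s):
    """Rate in MB/s (1 MB = 10**6 bytes) from array length and best time."""
    return 1.0e-6 * BYTES_PER_ELEMENT[kernel] * n_elements / min_time_s


def implied_array_length(kernel, rate_mb_s, min_time_s):
    """Estimate the array length implied by a reported rate and min time.

    The printed min time is rounded to four decimals, so this is approximate.
    """
    return rate_mb_s * 1.0e6 * min_time_s / BYTES_PER_ELEMENT[kernel]


if __name__ == "__main__":
    # Copy row of the 4-thread Opteron 275 table.
    n = implied_array_length("Copy", 9999.3465, 0.0320)
    print(f"implied array length: about {n:.3g} elements")
    # Recompute the Triad rate of the same table from that estimate.
    print(f"Triad rate: {stream_rate_mb_s('Triad', n, 0.0520):.1f} MB/s")
```

Running the same check on tables from different binaries is a quick way to see whether they were built with the same array size, which matters when comparing rates across compilers.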
FWIW: the PathScale compiled binaries on this machine give

Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:         7272.4071     0.0453       0.0440       0.0480
Scale:        7272.2298     0.0462       0.0440       0.0480
Add:          5999.6258     0.0827       0.0800       0.0840
Triad:        5999.6302     0.0831       0.0800       0.0840

and the PGI compiled ones give

Function      Rate (MB/s)   RMS time     Min time     Max time
Copy:         6608.0161     0.0970       0.0969       0.0977
Scale:        4592.3298     0.1395       0.1394       0.1397
Add:          4259.8885     0.2262       0.2254       0.2269
Triad:        4244.0478     0.2269       0.2262       0.2273

They may be slightly different versions of the original source (notice
the labels on the columns), but the core measurements are the same.

On the Opteron 275, we have two memory nodes, each with multiple banks
per node.

landman at dualcore:~/stream> numactl --show
policy: default
preferred node: current
physcpubind: 0 1 2 3
cpubind:
nodebind:
membind: 0 1

landman at dualcore:~/stream> numactl --hardware
available: 2 nodes (0-1)
node 0 size: 2015 MB
node 0 free: 1276 MB
node 1 size: 4025 MB
node 1 free: 2416 MB
node distances:
node   0   1
  0:  10  20
  1:  20  10

On the woodcrest, it looks like a single memory node.

landman at woody:~> numactl --show
policy: default
preferred node: current
physcpubind: 0 1 2 3
cpubind:
nodebind:
membind: 0

landman at woody:~> numactl --hardware
available: 1 nodes (0-0)
node 0 size: 4017 MB
node 0 free: 2649 MB
node distances:
node   0
  0:  10

I have it on good authority that with the other chipset (we have a
Blackford here), we should see higher numbers. Not exceeding the
Opteron 275 though. When I have time, I will investigate this more and
write about it on my blog.

FWIW, I am not seeing a clear performance picture emerging. I have
heard speculation/rumor from others, but I prefer measurement, and my
measurements, while consistent, are not exposing a nice and meaningful
picture where I can say "yes it's faster" or "no it isn't".

What I can say is that Woodcrest is interesting. It just may be
overhyped by a "compliant" media.
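One way to put the multi-thread tables side by side is to divide each Triad rate by the 1-thread rate from the same binary. The sketch below does only that, using Triad numbers copied from this post. The grouping labels are my reading of the tables: the Intel 9.1 runs are taken to be on the Woodcrest box, which the post implies but does not state outright. The names `TRIAD_MB_S` and `speedups` are illustrative.

```python
# Triad rates in MB/s copied from the tables in this post, keyed by thread count.
# The machine/compiler labels are an interpretation of the post, not a quote.
TRIAD_MB_S = {
    "Opteron 275 (PathScale)": {1: 4444.1975, 4: 9230.2321},
    "Woodcrest 2.66 GHz (Intel 9.1)": {
        1: 4187.6605,
        2: 6916.0273,
        3: 6563.9202,
        4: 6710.7522,
    },
}


def speedups(rates_by_threads):
    """Return {threads: rate / 1-thread rate} for one machine/compiler pair."""
    base = rates_by_threads[1]
    return {t: rate / base for t, rate in sorted(rates_by_threads.items())}


if __name__ == "__main__":
    for label, rates in TRIAD_MB_S.items():
        for threads, ratio in speedups(rates).items():
            print(f"{label}: {threads} thread(s) -> {ratio:.2f}x")
```

This only restates the measurements as ratios; it says nothing about why the Woodcrest numbers flatten out past two threads.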
--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 734 786 8452 or +1 866 888 3112
cell : +1 734 612 4615
