[Beowulf] CCL:Question regarding Mac G5 performance
Joe Landman landman at scalableinformatics.com
Mon May 24 15:40:45 PDT 2004
Konstantin Kudin wrote:
>--- Joe Landman <landman at scalableinformatics.com> wrote:
>
>>>It is unlikely that one can gain much speed from going to 64 bits, but
>>>the support for larger memory and unlimited scratch files is very
>>>worthwhile in itself.
>>
>>I have seen in md43, moldy, and a few others, about 20-30% under gcc
>>recompilation with -m64 on Opteron. For informatics codes it was
>>about the same.
>
> Well, bioinformatics codes presumably run mostly integer operations.
>This is very different from heavy floating point calculations in G03
>which are double precision in all architectures and run with
>approximately the same speed regardless of 32/64 bitness.

md43 and moldy are molecular dynamics codes. I had been thinking of running some tests with GAMESS, or some other codes, just so I have a sense of how electronic structure codes do on the system.

For computationally intensive codes, going to 64 bits on the Opteron gets you a) double the number of general purpose registers (16 instead of 8), and b) double the number of SSE registers (16 instead of 8). This means that when the optimizer is dealing with code under heavy register pressure, it can do a better job of scheduling the resources. It also means that some codes may be able to keep more instructions in flight per cycle because more registers are available. The address space is also flat as compared to the segmented space of the 32-bit mode. So it is most definitely not simply a matter of a 32-bit vs. 64-bit address space. That advantage is there, but it is not the only one.

One of the interesting side effects of NUMA systems has to do with memory bandwidth per CPU as you increase the number of CPUs in a node. For a correctly populated system (i.e. memory evenly distributed among the CPUs), each CPU has full bandwidth to its local memory, and an additional latency hop to remote memory on the same node.
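As a rough illustration of the bandwidth point (a toy model, not a benchmark), the sketch below assumes each CPU's memory controller delivers a fixed bandwidth (an arbitrary unit of 1.0 here) that is split evenly among the CPUs streaming from it. It ignores interconnect limits and the extra remote-access latency. The function `per_cpu_bandwidth` and its parameters are hypothetical, made up for this illustration.

```python
from collections import Counter


def per_cpu_bandwidth(controller_bw, data_home):
    """Bandwidth each CPU sees in a toy NUMA model.

    controller_bw: bandwidth one CPU's memory controller delivers (arbitrary unit).
    data_home[i]: index of the CPU whose local memory holds CPU i's working set.
    Assumes a controller's bandwidth is split evenly among the CPUs streaming
    from it; ignores interconnect limits and the extra remote-access latency.
    """
    sharers = Counter(data_home)  # how many CPUs stream from each controller
    return [controller_bw / sharers[home] for home in data_home]


n_cpus = 4
evenly_populated = per_cpu_bandwidth(1.0, list(range(n_cpus)))  # each CPU uses local memory
all_on_cpu0 = per_cpu_bandwidth(1.0, [0] * n_cpus)               # all memory stacked on CPU 0

print(evenly_populated)                          # [1.0, 1.0, 1.0, 1.0]
print(all_on_cpu0)                               # [0.25, 0.25, 0.25, 0.25]
print(sum(evenly_populated), sum(all_on_cpu0))   # 4.0 1.0
```

In this model, aggregate bandwidth grows with the number of CPUs when memory is evenly populated. When everything hangs off one CPU, it stays at that single controller's limit no matter how many CPUs are added.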
If you stack all the memory on a single CPU (as I have seen many folks do before running benchmarks and reporting their results), all the CPUs share that one CPU's memory bandwidth. In this case, you get the sort of results we see occasionally reported here. Similar results occur if you have a kernel (say an ancient one like 2.4.18) that doesn't know much about NUMA and related scheduling.

>>Your mileage will vary of course, but I expect with Gaussian and others that
>>overflow memory, the overall system design will be as important (if not more so)
>>to the overall performance than the CPU architecture, unless you can somehow
>>isolate the computation to never spill to disk.
>
> With G03, some types of jobs will mostly be compute bound, and others
>will be mostly I/O bound. This is reasonably trivial to predict
>beforehand. I've tested jobs which were compute bound because testing
>the other side of the equation is more difficult due to more factors.
>
> For I/O bound jobs a box with loads of RAM and fast sequential I/O is
>the best. Something like dual-quad Opteron with 16-32Gb of RAM and 2-4
>ATA disks with RAID0 (striping) is a good choice these days.

:) I might suggest the 3ware folks for their controllers. Just pick your file systems and stripe width carefully.

Joe
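The advice to choose the stripe width carefully comes down to how a striped array maps file offsets onto disks. The sketch below assumes a plain RAID0 layout with equal-size disks, where consecutive fixed-size chunks go round-robin to the disks. Real hardware or software RAID may differ in details. The function names and the 64 KiB chunk, 4-disk, and 4 KiB block figures are hypothetical. It shows that a stripe-aligned, full-stripe request keeps every disk busy, while a chunk-sized request that starts mid-chunk is split across two disks.

```python
# Toy model of a plain RAID0 (striped) layout: equal-size disks, fixed chunk
# size, chunks assigned round-robin. All sizes below are hypothetical.

def raid0_location(offset, chunk_size, n_disks):
    """Map a logical byte offset on the array to (disk index, byte offset on that disk)."""
    chunk = offset // chunk_size      # which chunk of the array holds this byte
    disk = chunk % n_disks            # chunks rotate round-robin over the disks
    row = chunk // n_disks            # full stripes that come before this chunk
    return disk, row * chunk_size + offset % chunk_size


def disks_touched(offset, length, chunk_size, n_disks):
    """Return the sorted list of disks a single contiguous request touches."""
    first = offset // chunk_size
    last = (offset + length - 1) // chunk_size
    return sorted({c % n_disks for c in range(first, last + 1)})


KIB = 1024
CHUNK = 64 * KIB    # hypothetical chunk (stripe unit) size
DISKS = 4           # hypothetical 4-disk array
FS_BLOCK = 4 * KIB  # hypothetical file system block size

stripe_width = CHUNK * DISKS   # bytes in one full stripe across all disks
stride = CHUNK // FS_BLOCK     # file system blocks per chunk

print(stripe_width // KIB)                             # 256
print(stride)                                          # 16
print(raid0_location(4 * CHUNK + 10, CHUNK, DISKS))    # (0, 65546)
print(disks_touched(0, stripe_width, CHUNK, DISKS))    # [0, 1, 2, 3]
print(disks_touched(CHUNK // 2, CHUNK, CHUNK, DISKS))  # [0, 1]
```

The `stride` value (chunk size expressed in file system blocks) is the kind of number some file systems accept as a layout hint at creation time; ext2/ext3, for example, take a `stride` setting in mke2fs. Matching it to the array, and keeping large sequential scratch I/O aligned to full stripes, is what lets all the disks work in parallel.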
