[Beowulf] Accelerator for data compressing
Eric Thibodeau kyron at neuralbs.com
Tue Oct 7 11:07:14 PDT 2008
Dmitri Chubarov wrote:
> Hello,
>
> we have got a VX50 down here as well. We have observed very different
> scalability on different applications. With an OpenMP molecular
> dynamics code we have got over 14 times speedup while on a 2D finite
> difference scheme I could not get far beyond 3 fold.
>

2D finite difference can be comm intensive if the mesh is too small for each processor to have a fair amount of work to do before needing the neighboring values from a "far" node.

> On Tue, Oct 7, 2008 at 10:45 PM, Eric Thibodeau <kyron at neuralbs.com> wrote:
>
>> PS: Interesting figures, I couldn't resist compressing the same binary DB on
>> a 16-core Opteron (Tyan VX50) machine and was dumbfounded to get horrible
>> results given the same context. The processing speed only came up to 6.4E6
>> bytes/sec ...for 16 cores, and they were all at 100% during the entire run
>> (FWIW, I tried different block sizes and it does have an impact but this
>> also changes the problem parameters).
>>
>
> Reading your message in the Beowulf list I should say that it looks
> interesting and probably shows something happening with the memory
> access on the NUMA nodes. Did you try to run the archiver with
> different affinity settings?
>

I don't have affinity control over the app per se; I would have to look at and modify pbzip's code. Note, however, that a process's assignment to a processor is governed by the kernel, so it is a scheduler issue. I have also noticed that the kernel doesn't just have fun moving the processes around the cores.

> We have observed that the memory architecture shows some strange
> behaviour. For instance the latency for a read from the same NUMA node
> for different nodes varies significantly.
>

This is the nature of NUMA. Furthermore, if you have to cross to a far CPU, the latency also depends on that CPU's load.

> Also on the profiler I often see that x86 instructions that have one
> of the operands in memory may take disproportionately long. I believe that could explain the 100% CPU
> load reported by the kernel.
>

How do you identify the specific instruction using a profiler? This is something that interests me.

> From the very little knowledge of this platform that we have got, I
> tend to advise the users not to expect good speedup on their
> multithreaded applications.

Using OpenMP (from GCC 4.3.x) and an embarrassingly parallel problem (computing K-Means on a large database), I do get significant speedup (15-16).

> Yet it would be interesting to get a
> better understanding of the programming techniques for this sedecimus
> and the similar machines.

OpenMP is IMHO the easiest; it will bring you the most performance out of 3 lines of #pragma directives. If you manage to get a cluster of VX50s, then learn a bit of MPI to glue all of this together ;)

> Even more so due to the QPI systems becoming
> commercially available very soon.

Don't know that one (QPI)... oh... new Intel stuff... no matter how much I try to stay ahead, I'm always years behind!

> At the moment we have got a few
> small kernels written in C and Fortran with OpenMP that we use to
> evaluate different parallelization strategies. Unfortunately, there
> are no tools I would know of that could help to explain what's going
> on inside the memory of this machine.
>

Of course, check out TAU ( http://www.cs.uoregon.edu/research/tau/home.php ); it will at least help you identify bottlenecks, and it gives you an impressive profiling infrastructure.

> I am very much interested to hear more about your experience with VX50.
>
> Best regards,
> Dima Chubarov
>
> --
> Dmitri Chubarov
> junior researcher
> Siberian Branch of the Russian Academy of Sciences
> Institute of Computational Technologies
> http://www.ict.nsc.ru/indexen.php
>

Eric
