[Beowulf] Accelerator for data compressing


Eric Thibodeau kyron at neuralbs.com
Tue Oct 7 11:07:14 PDT 2008


Dmitri Chubarov wrote:
> Hello,
>
> we have got a VX50 down here as well. We have observed very different
> scalability on different applications. With an OpenMP molecular
> dynamics code we have got over 14 times speedup while on a 2D finite
> difference scheme I could not get far beyond 3 fold.
>   
2D finite difference can be communication-intensive if the mesh is too
small for each processor to have a fair amount of work to do before it
needs the neighboring values from a "far" node.
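A minimal C sketch of that argument, assuming an n x n mesh split into a p x p grid of square subdomains with a one-cell-wide halo (the function name and the decomposition are illustrative, not taken from Dmitri's code). Each subdomain updates (n/p)^2 points per step but needs 4(n/p) neighbor values, so the ratio of remote reads to useful work is 4p/n and grows as the mesh shrinks or the core count rises. On a shared-memory box like the VX50 those remote reads are loads from another NUMA node's memory rather than messages.

```c
/* Hypothetical helper: halo values needed per interior point updated,
 * for an n x n mesh split into a p x p grid of square subdomains with a
 * one-cell-wide halo. Assumes n is divisible by p. */
double halo_to_interior_ratio(long n, long p)
{
    long side = n / p;                              /* subdomain edge length */
    double interior = (double)side * (double)side;  /* points computed locally */
    double halo = 4.0 * (double)side;               /* neighbor values read per step */
    return halo / interior;                         /* simplifies to 4 / side */
}
```

For example, a 256 x 256 mesh on 16 cores (4 x 4) gives 0.0625 halo values per interior point, four times the ratio of a 1024 x 1024 mesh on the same cores.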
> On Tue, Oct 7, 2008 at 10:45 PM, Eric Thibodeau <kyron at neuralbs.com> wrote:
>   
>> PS: Interesting figures, I couldn't resist compressing the same binary DB on
>> a 16Core Opteron (Tyan VX50) machine and was dumbfounded to get horrible
>> results given the same context. The processing speed only came up to 6.4E6
>> bytes/sec ...for 16 cores, and they were all at 100% during the entire run
>> (FWIW, I tried different block sizes and it does have an impact but this
>> also changes the problem parameters).
>>     
>
> Reading your message in the Beowulf list I should say that it looks
> interesting and probably shows something happening with the memory
> access on the NUMA nodes. Did you try to run the archiver with
> different affinity settings?
>   
I don't have affinity control over the app per se; I would have to
look at and modify pbzip's code. Note, though, that the assignment of a
PID to a processor is governed by the kernel, so this is a scheduler
issue. I have also noticed that the kernel seems to enjoy moving the
processes around the cores.
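Without touching the archiver's source, the affinity mask can be set from outside with `taskset -c <cpu-list>` or, for NUMA node binding, `numactl --cpunodebind=<node> --membind=<node>`. For code you can modify, a minimal Linux-specific sketch using `sched_setaffinity(2)` might look like the following; the helper names are hypothetical.

```c
#define _GNU_SOURCE
#include <sched.h>

/* Pin the calling thread (pid 0 = caller) to a single CPU.
 * Returns 0 on success, -1 on failure (errno set by the kernel). */
int pin_self_to_cpu(int cpu)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set);
}

/* Return the lowest-numbered CPU the caller may currently run on, or -1. */
int first_allowed_cpu(void)
{
    cpu_set_t set;
    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, &set))
            return cpu;
    return -1;
}
```

Threads created after the call inherit the mask, so pinning before spawning a thread pool restricts the whole pool to that CPU; per-thread placement would use `pthread_setaffinity_np` instead.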
> We have observed that the memory architecture shows some strange
> behaviour. For instance the latency for a read from the same NUMA node
> for different nodes varies significantly.
>   
This is the nature of NUMA. Furthermore, if you have to cross to a far
CPU, the latency also depends on that CPU's load.
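One common way to measure such per-node latency differences is a pointer-chasing microbenchmark: every load depends on the previous one, so the average time per hop approximates the load-to-use latency. A minimal sketch, assuming Linux/POSIX `clock_gettime` and a buffer much larger than the caches; the function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <time.h>

static volatile size_t sink;  /* keeps the compiler from deleting the chase loop */

/* Build one random cycle over n slots (Sattolo's algorithm) so that
 * following next[i] visits every slot in an unpredictable order and the
 * prefetchers cannot hide memory latency. Assumes RAND_MAX >= n (glibc). */
size_t *make_chain(size_t n, unsigned seed)
{
    size_t *next = malloc(n * sizeof *next);
    if (!next)
        return NULL;
    for (size_t i = 0; i < n; i++)
        next[i] = i;
    srand(seed);
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)rand() % i;  /* j < i guarantees a single cycle */
        size_t tmp = next[i];
        next[i] = next[j];
        next[j] = tmp;
    }
    return next;
}

/* Average nanoseconds per dependent load over `steps` hops. */
double chase_ns_per_load(const size_t *next, size_t steps)
{
    struct timespec t0, t1;
    size_t p = 0;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t s = 0; s < steps; s++)
        p = next[p];                    /* each load waits for the previous one */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    sink = p;
    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
              + (double)(t1.tv_nsec - t0.tv_nsec);
    return ns / (double)steps;
}
```

Running the same binary under `numactl --cpunodebind=0 --membind=0` and then `--membind=1` (with, say, n = 1 << 24 entries, 128 MiB with 8-byte `size_t`) should expose the local vs remote gap; results will vary with chain length and with whatever else is running on the target node.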
> Also on the profiler I often see that x86 instructions that have one
> of the operands in memory may
> take disproportionally long. I believe that could explain the 100% CPU
> load reported by the kernel.
>   
How do you identify the specific instruction using a profiler? This is
something that interests me.
> From the very little knowledge of this platform that we have got, I
> tend to advise the users not to expect good speedup on their
> multithreaded applications. 
Using OpenMP (from GCC 4.3.x) on an embarrassingly parallel problem
(computing K-Means on a large database), I do get a significant speedup
(15 to 16 times).
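To compare scaling figures like Dmitri's 14x and 3x or the 15-16x above, the Karp-Flatt metric (a standard measure, not something used in this thread) turns a measured speedup s on p processors into an estimated serial fraction e = (1/s - 1/p) / (1 - 1/p). A small C sketch:

```c
/* Karp-Flatt metric: experimentally determined serial fraction e for a
 * measured speedup s on p processors (p > 1). A small e that stays flat
 * as p grows suggests good scaling; a large or rising e points at serial
 * work or parallel overhead (memory traffic, synchronization). */
double karp_flatt(double speedup, int p)
{
    double inv_p = 1.0 / (double)p;
    return (1.0 / speedup - inv_p) / (1.0 - inv_p);
}

/* Parallel efficiency: fraction of ideal linear speedup achieved. */
double efficiency(double speedup, int p)
{
    return speedup / (double)p;
}
```

On 16 cores, a 14x speedup gives e ≈ 0.0095 while 3x gives e ≈ 0.29, i.e. the finite difference run behaves as if roughly 29% of its work were serial, whether that comes from genuinely serial code or from memory and synchronization overhead.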
> Yet it would be interesting to get a
> better understanding of the programming techniques for this sedecimus
> and the similar machines.
IMHO, OpenMP is the easiest option: it brings you the most performance
for about 3 lines of #pragma directives. If you manage to get a cluster
of VX50s, then learn a bit of MPI to glue all of this together ;)
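As an illustration of how little OpenMP code is involved, here is a sketch of the assignment step of K-Means (label each point with its nearest centroid), which is embarrassingly parallel. This is a hypothetical function, not Eric's actual code; the row-major array layout is an assumption.

```c
#include <float.h>
#include <stddef.h>

/* Assign each of n points (dim coordinates each, row-major) to its nearest
 * of k centroids. Iterations are independent, so a single pragma
 * parallelizes the loop; without -fopenmp the pragma is ignored and the
 * loop runs serially with the same result. */
void assign_clusters(const double *points, size_t n, size_t dim,
                     const double *centroids, size_t k, int *labels)
{
    /* Signed loop index for compatibility with OpenMP 2.5 (GCC 4.3). */
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < (long)n; i++) {
        const double *x = points + (size_t)i * dim;
        double best = DBL_MAX;
        int best_c = 0;
        for (size_t c = 0; c < k; c++) {
            const double *m = centroids + c * dim;
            double d2 = 0.0;                /* squared Euclidean distance */
            for (size_t j = 0; j < dim; j++) {
                double diff = x[j] - m[j];
                d2 += diff * diff;
            }
            if (d2 < best) {
                best = d2;
                best_c = (int)c;
            }
        }
        labels[i] = best_c;
    }
}
```

`schedule(static)` gives each thread a fixed contiguous range of points; if the arrays are first initialized with the same static schedule, Linux's default first-touch policy tends to place each thread's points on its own NUMA node, which matters on a machine like the VX50.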
> Even more so due to the QPI systems becoming
> commercially available very soon.
I don't know that one (QPI)... oh, new Intel stuff. No matter how much
I try to stay ahead, I'm always years behind!
>  At the moment we have got a few
> small kernels written in C and Fortran with OpenMP that we use to
> evaluate different parallelization strategies. Unfortunately, there
> are no tools I would know of that could help to explain what's going
> on inside the memory of this machine.
>   
Of course, check out TAU
(http://www.cs.uoregon.edu/research/tau/home.php); it will at least help
you identify bottlenecks, and it gives you an impressive profiling
infrastructure.
> I am very much interested to hear more about your experience with VX50.
>
> Best regards,
>   Dima Chubarov
>
> --
>   Dmitri Chubarov
>   junior researcher
>   Siberian Branch of the Russian Academy of Sciences
>   Institute of Computational Technologies
>   http://www.ict.nsc.ru/indexen.php
>   

Eric