[Beowulf] Accelerator for data compression

Vincent Diepeveen diep at xs4all.nl
Fri Oct 3 08:27:36 PDT 2008


Hi Carsten,

In your example the only thing that seems to matter to you is the speed
of *collecting* data, in other words the real-time compression speed
that tape streamers can achieve, to give one example.

In your example you need to compress the data every time.

That's not realistic, however. I'll give you two arguments.

In an ideal situation you compress things only once, so decompression
speed is what matters if you want to look up data in real time. It
would be far better in your situation to have the data already well
compressed on your drive; doing real-time compression/decompression
adds little then, and the hardware compression in those tape streamers
usually amounts to simplistic run-length encoding anyway.
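
A minimal sketch of what run-length encoding amounts to, using standard
tools (the input string here is made up):

    echo "aaaabbbcc" | fold -w1 | uniq -c
    # counts each run of identical bytes: prints "4 a", "3 b", "2 c"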

Another flaw is that you assume the data on your 10 TB array is never
accessed by any user over the network.

Vincent




On Oct 3, 2008, at 1:49 PM, Carsten Aulbert wrote:

> Hi Vincent,
>
> Vincent Diepeveen wrote:
>> Ah, you googled for 2 seconds and found some old homepage.
>
> Actually no, I just looked at my freshmeat watchlist of items still to
> look at :)
>
>>
>> Look especially at compressed sizes and decompression times.
>
> Yeah, I'm currently looking at
> http://www.maximumcompression.com/data/summary_mf3.php
>
> We have a Gbit network, i.e. for us this test is a null test: it takes
> 7-zip close to 5 minutes to compress the 311 MB data set, which we
> could blow over the network in less than 5 seconds, so in this case
> tar would be our favorite ;)
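>
> A back-of-envelope check of those figures, taking "close to 5 minutes"
> as 300 s and GbE payload as roughly 119 MB/s (both approximate):
>
>     echo "scale=1; 311/300" | bc   # ~1.0 MB/s 7-zip input rate
>     echo "scale=1; 311/119" | bc   # ~2.6 s to send 311 MB uncompressed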
>
>>
>> The only thing you want to limit over your network is the amount of
>> bandwidth you use, and really good compression is very helpful there.
>> How long compression takes is nearly irrelevant, as long as it doesn't
>> take an infinite amount of time (I remember a New Zealand compressor
>> which took 24 hours to compress 100 MB of data). Note that we are
>> already at the point where compression time hardly matters; you can
>> buy a GPU to offload it from your servers.
>>
>
> No, quite the contrary. I would like to use a compressor within a pipe
> to increase the throughput over the network, i.e. to get around the
> ~120 MB/s limit.
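>
> A sketch of such a pipe, with a made-up hostname and paths; gzip -1
> stands in for whichever fast compressor ends up being used:
>
>     # compressed transfer: only wins if the compressor keeps up
>     # with the ~120 MB/s link
>     tar cf - /data/chunks | gzip -1 | ssh newhost 'gzip -d | tar xf - -C /data'
>
>     # plain copy for comparison
>     tar cf - /data/chunks | ssh newhost 'tar xf - -C /data'
>
> A parallel compressor such as pigz could replace gzip here to put
> every core to work.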
>
>> Query time (so decompression time) is important though.
>>
>
> No, for us that number is at least as important as the compression
> time.
>
> Imagine the following situation:
>
> We have a file server with close to 10 TB of data on it, in nice chunks
> with a size of about 100 MB each. We buy a new server with new disks
> which can hold 20 TB, and we would like to copy the data over. So for
> us the more important figure is the compression/decompression speed,
> which should be >> 100 MB/s on our systems.
>
> If 7-zip can only compress data at a rate of less than, say, 5 MB/s
> (input data), I can copy the data over uncompressed much, much faster,
> regardless of how many unused cores I have in the system. Exactly for
> these cases I would like to use all available cores to compress the
> data fast enough to increase the throughput.
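>
> As a rough model (an assumption, not a measurement): the pipe moves
> data at the rate of its slowest stage, so compressing only wins when
> min(compression speed, decompression speed) beats the raw link rate.
> For 10 TB:
>
>     echo "10*1024*1024/119/3600" | bc   # ~24 h raw over GbE at ~119 MB/s
>     echo "10*1024*1024/5/3600" | bc     # ~582 h if a 5 MB/s compressor gates the pipe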
>
> Am I missing something vital?
>
> Cheers
>
> Carsten
>



