[Beowulf] MonetDB's lightweight compression; Re: Accelerator for data compressing
atp at piskorski.com
Fri Oct 3 13:05:49 PDT 2008
On Fri, Oct 03, 2008 at 01:49:04PM +0200, Carsten Aulbert wrote:
> No, quite on the contrary. I would like to use a compressor within a
> pipe to increase the throughput over the network, i.e. to get around the
> ~ 120 MB/s limit.
Carsten, it is probably not directly relevant to you, but you may want
to check out MonetDB, particularly their newer "X100" bleeding-edge
R&D version. Among other things, they've published papers with lots
of interesting detail on using super-lightweight software compression
to greatly increase database disk IO bandwidth.
Their main software tricks for faster disk IO seemed to be:
One, EXTREMELY lightweight compression schemes - basically table
lookups designed to be as CPU friendly as possible. Two, keep the data
compressed in RAM as well so that you can cache more of the data, and
indeed keep it compressed until as late in the CPU processing
pipeline as possible. From what I remember, MonetDB/X100 actually
does all decompression solely in the CPU cache, inline with query
execution.
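To make the "table lookup" idea concrete, here is a minimal sketch of plain dictionary encoding in Python. It is NOT MonetDB/X100's actual code or scheme, just a generic stand-in; the function names, the 1-byte code limit and the 1024-value chunk size are my own assumptions. Decompression is nothing more than indexing a small table, and it happens one small chunk at a time, right before the consuming operator uses the values, loosely mimicking "decompress in the cache, inline with the query". (In Python the chunking only shows the structure; it won't demonstrate real cache effects.)

```python
def dict_encode(values):
    """Map each distinct value to a 1-byte code (assumes <= 256 distinct values)."""
    table = sorted(set(values))          # the lookup table, kept small
    if len(table) > 256:
        raise ValueError("too many distinct values for 1-byte codes")
    code_of = {v: i for i, v in enumerate(table)}
    codes = bytes(code_of[v] for v in values)  # compressed column: 1 byte per value
    return table, codes


def dict_decode_chunks(table, codes, chunk_size=1024):
    """Yield decoded values one small chunk at a time (hypothetical chunk size),
    so only a small decoded vector exists at any moment."""
    for start in range(0, len(codes), chunk_size):
        chunk = codes[start:start + chunk_size]
        yield [table[c] for c in chunk]  # decompression is just a table lookup


def sum_column(table, codes):
    """A toy 'query operator' that consumes the column chunk by chunk."""
    total = 0
    for vec in dict_decode_chunks(table, codes):
        total += sum(vec)
    return total
```

For example, `dict_encode([100, 200, 100, 300, 200, 100])` gives the table `[100, 200, 300]` and six 1-byte codes, and `sum_column` over them returns 1000 without ever materializing the whole uncompressed column.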
Back c. July 2005, Sandor Heman, one of the MonetDB guys, looked at
zlib, bzip2, LZRW, and LZO as ways to improve database disk IO
bandwidth, and concluded:
"... in general, it is very unlikely that we could achieve any
bandwidth gains with these algorithms. LZRW and LZO might increase
bandwidth on relatively slow disk systems, with bandwidths up to
100MB/s, but this would induce high processing overheads, which
interferes with query execution. On a fast disk system, such as our
350MB/s 12 disk RAID, all the generic algorithms will fail to achieve [...]"
"Super-Scalar Database Compression between RAM and CPU Cache"
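Heman's conclusion follows from simple arithmetic. Here is a rough back-of-envelope model, assuming (my assumption, not from the thesis) that disk IO and decompression overlap perfectly and decompression runs on a single core: the uncompressed bandwidth delivered is the smaller of disk bandwidth times compression ratio and the decompressor's output speed. The 2:1 ratio and 150 MB/s decompressor speed below are made-up illustrative numbers, not measurements of LZO or LZRW.

```python
def effective_bandwidth(disk_mb_s, ratio, decomp_mb_s):
    """Uncompressed MB/s delivered to the consumer.

    Assumes IO and decompression overlap perfectly and decompression runs
    on one core, so whichever stage is slower is the bottleneck."""
    return min(disk_mb_s * ratio, decomp_mb_s)


def cpu_fraction(disk_mb_s, ratio, decomp_mb_s):
    """Fraction of one core spent decompressing at the delivered rate."""
    return effective_bandwidth(disk_mb_s, ratio, decomp_mb_s) / decomp_mb_s


# Hypothetical numbers: 2:1 compression, decompressor emits 150 MB/s.
for disk in (100.0, 350.0):
    bw = effective_bandwidth(disk, 2.0, 150.0)
    print(f"disk {disk:.0f} MB/s -> {bw:.0f} MB/s delivered, "
          f"{cpu_fraction(disk, 2.0, 150.0):.0%} of a core busy")
```

With these made-up numbers the 100 MB/s disk gains (150 vs. 100 MB/s) but burns a whole core, which is the "high processing overhead" Heman mentions, while the 350 MB/s RAID actually loses (150 vs. 350 MB/s). Only a decompressor fast enough that `disk_mb_s * ratio`, not `decomp_mb_s`, is the bottleneck flips the second case. The same arithmetic applies to compressing over your ~120 MB/s network link.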
Back in 2006, the cool new X100 features were not released in MonetDB
proper (which is Open Source), but by now that may have changed.
Lots of links:
"MonetDB/X100 - A DBMS In The CPU Cache"
Andrew Piskorski <atp at piskorski.com>