[Beowulf] MonetDB's lightweight compression; Re: Accelerator for data compressing

Andrew Piskorski atp at piskorski.com
Fri Oct 3 13:05:49 PDT 2008


On Fri, Oct 03, 2008 at 01:49:04PM +0200, Carsten Aulbert wrote:

> No, quite on the contrary. I would like to use a compressor within a
> pipe to increase the throughput over the network, i.e. to get around the
> ~ 120 MB/s limit.

Carsten, it is probably not directly relevant to you, but you may want
to check out MonetDB, particularly their newer "X100" bleeding edge
R&D version.  Among other things, they've published papers with lots
of interesting detail on using super-lightweight software compression
to greatly increase database disk IO bandwidth.

Their main software tricks for faster disk IO seemed to be:
One, EXTREMELY lightweight compression schemes - basically table
lookups designed to be as CPU friendly as possible.  Two, keep the data
compressed in RAM as well so that you can cache more of the data, and
indeed keep it compressed until as late in the CPU processing
pipeline as possible.  From what I remember, MonetDB/X100 actually
does all decompression solely in the CPU cache, inline with query
processing.
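
To make the "table lookup" idea concrete, here is a minimal sketch of
dictionary encoding in Python.  This is NOT MonetDB/X100's actual
scheme, just an illustration of the general style: each value is
replaced by a one-byte code, and decoding is a plain array lookup done
a small chunk at a time, right where the query consumes the values.
The function names and the chunk size of 1024 are my own assumptions,
and Python obviously cannot show the CPU cache effects.

```python
from array import array

def dict_encode(values):
    """Replace each value by a one-byte code; return (dictionary, codes)."""
    dictionary = []      # code -> value (the lookup table)
    index = {}           # value -> code (only needed while encoding)
    codes = array("B")   # unsigned bytes, so at most 256 distinct values
    for v in values:
        code = index.get(v)
        if code is None:
            if len(dictionary) == 256:
                raise ValueError("more than 256 distinct values")
            code = len(dictionary)
            index[v] = code
            dictionary.append(v)
        codes.append(code)
    return dictionary, codes

def dict_decode_chunks(dictionary, codes, chunk=1024):
    """Yield decoded values one small chunk at a time.

    The data stays compressed until a chunk is actually needed;
    decoding is nothing more than one table lookup per value.
    """
    for start in range(0, len(codes), chunk):
        yield [dictionary[c] for c in codes[start:start + chunk]]

def count_matching(dictionary, codes, wanted, chunk=1024):
    """Toy 'query': count values equal to `wanted`, decoding inline."""
    total = 0
    for block in dict_decode_chunks(dictionary, codes, chunk):
        total += sum(1 for v in block if v == wanted)
    return total
```

For a column with few distinct values, say city names, the codes array
takes one byte per row regardless of how long the strings are, so more
rows fit in RAM and fewer bytes come off the disk per row scanned.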

Back c. July 2005, Sandor Heman, one of the MonetDB guys, looked at
zlib, bzip2, lzrw, and lzo, to improve database disk IO bandwidth, and
claimed that:

  "... in general, it is very unlikely that we could achieve any
  bandwidth gains with these algorithms. LZRW and LZO might increase
  bandwidth on relatively slow disk systems, with bandwidths up to
  100MB/s, but this would induce high processing overheads, which
  interferes with query execution. On a fast disk system, such as our
  350MB/s 12 disk RAID, all the generic algorithms will fail to achieve
  any speedup."

  http://www.google.com/search?q=MonetDB+LZO+Heman&btnG=Search
  http://homepages.cwi.nl/~heman/downloads/msthesis.pdf
    "Super-Scalar Database Compression between RAM and CPU Cache"

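A rough way to see why a fast RAID defeats the generic algorithms is
a back-of-the-envelope model in which the uncompressed data rate is
capped by whichever is slower, the disk delivering compressed bytes or
the decompressor producing output.  This is my own simplification,
not a formula from the thesis.  The 2.0 compression ratio and the
300 MB/s decompression speed below are made-up numbers for
illustration; only the 100 MB/s and 350 MB/s disk figures come from
the quote.

```python
def effective_bandwidth(disk_mb_s, ratio, decompress_mb_s):
    """Uncompressed MB/s delivered to the query (simplified model).

    disk_mb_s       -- raw disk bandwidth in compressed MB/s
    ratio           -- uncompressed size / compressed size
    decompress_mb_s -- decompressor output speed in uncompressed MB/s
    """
    return min(disk_mb_s * ratio, decompress_mb_s)

def speedup(disk_mb_s, ratio, decompress_mb_s):
    """Gain over reading the same data uncompressed from the same disk."""
    return effective_bandwidth(disk_mb_s, ratio, decompress_mb_s) / disk_mb_s

# Hypothetical decompressor: ratio 2.0, 300 MB/s output.
print(speedup(100, 2.0, 300))   # slow disk: the disk is the bottleneck
print(speedup(350, 2.0, 300))   # fast RAID: the decompressor is
```

Under these assumed numbers the slow disk gains, while the 350 MB/s
RAID ends up slower than reading uncompressed data.  The model also
ignores the CPU time the decompressor takes away from query
execution, which is the other half of the complaint in the quote.
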
Back in 2006, the cool new X100 features were not released in MonetDB
proper (which is Open Source), but by now that may have changed.
Lots of links:

  http://www.bestechvideos.com/2008/02/21/monetdb-x100-a-very-fast-column-store
  ftp://ftp.research.microsoft.com/pub/debull/A05june/issue1.htm
    "MonetDB/X100 - A DBMS In The CPU Cache"
  http://www.monetdb.nl/
  http://homepages.cwi.nl/~mk/MonetDB/
  http://sourceforge.net/projects/monetdb/
  http://homepages.cwi.nl/~boncz/x100.html
  http://www.jsequeira.com/blog/2006/01/17.html

-- 
Andrew Piskorski <atp at piskorski.com>
http://www.piskorski.com/


