[Beowulf] Accelerator for data compressing
Vincent Diepeveen diep at xs4all.nl
Fri Oct 3 03:53:49 PDT 2008
- Previous message: [Beowulf] Accelerator for data compressing
- Next message: [Beowulf] Accelerator for data compressing
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
hi Carsten,

Ah, you googled for 2 seconds and found some oldie homepage. Try this homepage:

www.maximumcompression.com

Far better testing over there. Note that it's the same test set there that gets compressed a lot. In real life, database-type data has all kinds of patterns, which PPM-type compressors find.

My experience is that at terabyte level the better compressors at maximumcompression.com are a bit too slow (PAQ) and not as good as simple things like 7-zip.

Look especially at compressed sizes and decompression times. The only thing you want to limit is the amount of bandwidth used over your network. Really good compression is very helpful then. How long compression takes is nearly irrelevant, as long as it doesn't take infinite amounts of time (I remember a New Zealand compressor which took 24 hours to compress 100 MB of data). Note that we are already at a phase where compression time hardly matters; you can buy a GPU to offload that work from your servers.

Query time (so decompression time) is important though.

If we look at the graphics there:

026  7-Zip 4.60b   -m0=ppmd:o=4        764420   81.58  1.4738
..
94   BZIP2 1.0.5   -9                  890163   78.55  1.7162
..
158  PKZIP 2.50    -exx               1250536   69.86  2.4110
159  HIT 2.10      -x                 1250601   69.86  2.4111
160  GZIP 1.3.5    -9                 1254351   69.77  2.4184
161  ZIP 2.2       -9                 1254444   69.77  2.4185
162  WINZIP 8.0    (Max Compression)  1254444   69.77  2.4185

Note a real supercompressor gets it even tinier:

003  WinRK 3.0.3   PWCM 912MB          568919   86.29  1.0969

Again, all these tests are at micro level: just a few megabytes of data that get compressed. You don't build a big infrastructure just for a few megabytes, so it's not so relevant. The traffic over your network dominates there, and there are plenty of idle server cores. In fact there are many companies now that buy dual cores, as they do not know how to keep the cores in quad cores busy.

This is all micro level. Things really change when you have terabytes to compress and HUGE files.
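To put the quoted sizes in perspective, here is a small Python sketch that only redoes arithmetic on the figures above. The column meanings are an assumption on my part: I read the three numeric columns as compressed size in bytes, percent saved, and bits per byte, which fits if bits per byte is 8 * (1 - saved fraction). The names `relative_size` and `bits_per_byte` are made up for this illustration.

```python
# Compressed sizes (bytes) and percent figures copied from the list above.
SIZES = {
    "WinRK 3.0.3 PWCM": 568919,
    "7-Zip 4.60b ppmd": 764420,
    "BZIP2 1.0.5 -9": 890163,
    "GZIP 1.3.5 -9": 1254351,
}
PERCENT = {
    "WinRK 3.0.3 PWCM": 86.29,
    "7-Zip 4.60b ppmd": 81.58,
    "BZIP2 1.0.5 -9": 78.55,
    "GZIP 1.3.5 -9": 69.77,
}


def relative_size(name, baseline="GZIP 1.3.5 -9"):
    """Fraction of the baseline's output size that `name` needs."""
    return SIZES[name] / SIZES[baseline]


def bits_per_byte(percent_saved):
    """Assumed meaning of the last column: 8 * (1 - saved fraction)."""
    return 8.0 * (1.0 - percent_saved / 100.0)


for name in SIZES:
    print(f"{name:18s} {relative_size(name):.3f} x gzip, "
          f"{bits_per_byte(PERCENT[name]):.4f} bits/byte")
```

Under that reading, the 7-Zip PPMd output is roughly 61% of the gzip output for the same input, and that is the fraction of the network traffic you would actually be paying for.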
Bzip2 is ugly slow for files in gigabyte size, 7-zip is totally beating it there.

Vincent

On Oct 3, 2008, at 11:27 AM, Carsten Aulbert wrote:

> Hi all
>
> Bill Broadley wrote:
>>
>> Another example:
>> http://bbs.archlinux.org/viewtopic.php?t=11670
>>
>> 7zip compress: 19:41
>> Bzip2 compress: 8:56
>> Gzip compress: 3:00
>>
>> Again 7zip is a factor of 6 and change slower than gzip.
>
> Have you looked into threaded/parallel bzip2?
>
> freshmeat has a few of those, e.g.
>
> http://freshmeat.net/projects/bzip2smp/
> http://freshmeat.net/projects/lbzip2/
>
> (with the usual disclaimer that I haven't tested them myself).
>
> HTH
>
> carsten
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
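For anyone who wants to check compressed size and decompression time on their own data instead of trusting a website, a minimal sketch using only the Python standard library could look like the one below. Assumptions: `make_sample` generates made-up database-like rows purely as a stand-in for real data, and the `lzma` module implements LZMA/XZ (the family behind 7-Zip's default method), not the PPMd method from the 7-Zip row quoted earlier. Timings from a single run on a small buffer are only indicative.

```python
import bz2
import lzma
import time
import zlib


def make_sample(n_records=20000):
    """Build repetitive, database-like text (hypothetical stand-in data)."""
    rows = [
        f"{i:08d},user{i % 97},status={'ok' if i % 5 else 'fail'}\n"
        for i in range(n_records)
    ]
    return "".join(rows).encode("ascii")


# Each entry maps a label to a (compress, decompress) pair.
CODECS = {
    "zlib -9": (lambda d: zlib.compress(d, 9), zlib.decompress),
    "bz2 -9": (lambda d: bz2.compress(d, 9), bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}


def measure(data):
    """Return {label: (compressed_bytes, decompress_seconds)}."""
    results = {}
    for name, (compress, decompress) in CODECS.items():
        packed = compress(data)
        start = time.perf_counter()
        restored = decompress(packed)
        elapsed = time.perf_counter() - start
        if restored != data:
            raise ValueError(f"{name}: round trip failed")
        results[name] = (len(packed), elapsed)
    return results


if __name__ == "__main__":
    sample = make_sample()
    for name, (size, seconds) in measure(sample).items():
        print(f"{name:8s} {size:9d} bytes  "
              f"{100.0 * size / len(sample):6.2f}% of input  "
              f"decompress {seconds * 1000:.2f} ms")
```

To test real data, replace `make_sample()` with the bytes of your own file. For the terabyte case discussed here you would want to stream in chunks rather than hold everything in memory, which this sketch does not attempt.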
