The arrow of time

Ivan Voras' blog

What to use for log compression?

I've made a patch for newsyslog to use xz compression, since I simply assumed xz would be better because of its structure. After all, at least it doesn't process files in independent 900 kB blocks the way bzip2 does. But I think I've found a case where xz may be worse than bzip2.

This is an ordinary Apache log file, compressed with several compressors:

-rw-r--r--  1 ivoras  ivoras  52300562 May 15 01:02 log
-rw-r--r--  1 ivoras  ivoras   1972651 May 15 01:02 log.bz2
-rw-r--r--  1 ivoras  ivoras   3170087 May 15 01:02 log.gz
-rw-r--r--  1 ivoras  ivoras   5570494 May 15 01:02 log.lzo
-rw-r--r--  1 ivoras  ivoras   2081620 May 15 01:02 log.xz

Bzip2 is actually the best one here, followed by xz and gzip. Lzop is the worst but it's incredibly fast (while xz is incredibly slow).
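To put numbers on that ranking, here is a small sketch that derives the compression ratios from the file sizes in the listing above. The byte counts are copied from the listing; the names `original`, `compressed` and `ratio_percent` are only illustrative.

```python
# Sizes in bytes, copied from the directory listing above.
original = 52300562
compressed = {
    "bzip2": 1972651,
    "gzip": 3170087,
    "lzop": 5570494,
    "xz": 2081620,
}


def ratio_percent(size, base=original):
    """Compressed size as a percentage of the uncompressed size."""
    return 100.0 * size / base


# Smallest output first.
ranking = sorted(compressed, key=compressed.get)
for name in ranking:
    size = compressed[name]
    print(f"{name:6s} {size:>9d} bytes  {ratio_percent(size):5.2f}% of original")

# How much larger the xz output is relative to the bzip2 output.
xz_vs_bzip2 = 100.0 * (compressed["xz"] - compressed["bzip2"]) / compressed["bzip2"]
print(f"xz output is {xz_vs_bzip2:.1f}% larger than the bzip2 output")
```

So the gap between bzip2 and xz on this file is a few percent of the compressed size, while gzip and lzop trail by a much wider margin.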

#1 Re: What to use for log compression?

Added on 2010-07-15T15:54 by aggg

And what is the conclusion? :)

In my opinion, the best option is whichever one you can read back most easily, with zcat, through a pipe, or with some other tool.

On our production servers we get 1+ GB of syslogs every month, with monthly rotation. We keep the current month uncompressed and three more months compressed with 'compress' in .Z format, because it is the fastest format, especially when reading back with zcat. These are AIX-based systems.

#2 Re: What to use for log compression?

Added on 2010-07-15T16:04 by Ivan Voras

The conclusion is that there are non-obvious cases where xz isn't the best in terms of compression efficiency :)

Xz has been imported to FreeBSD base as one of the default compressors alongside compress, gzip and bzip2 and it looked like it would be excellent (due to its very large dictionary) for compressing highly redundant data like web logs.

Of course, this test probably just runs into an edge case; xz should win overall.
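The reasoning about the large dictionary can be illustrated with a deliberately artificial sketch using Python's standard `gzip`, `bz2` and `lzma` modules. The data here is an assumption made up for the demonstration, not a log file: a 2 MiB block of pseudo-random bytes repeated twice, so the only redundancy sits 2 MiB apart. That is beyond gzip's 32 KiB window and beyond bzip2's 900 kB block, but within the default xz dictionary.

```python
import bz2
import gzip
import lzma
import random

# Artificial input: one incompressible 2 MiB chunk, followed by an exact copy.
CHUNK_BYTES = 2 * 1024 * 1024
rng = random.Random(0)  # fixed seed so the run is repeatable
chunk = rng.getrandbits(8 * CHUNK_BYTES).to_bytes(CHUNK_BYTES, "little")
data = chunk + chunk

sizes = {
    "gzip": len(gzip.compress(data, compresslevel=9)),
    "bzip2": len(bz2.compress(data, compresslevel=9)),
    "xz": len(lzma.compress(data, preset=6)),
}

for name, size in sizes.items():
    print(f"{name:6s} {size:>9d} bytes  {100.0 * size / len(data):6.2f}% of input")
```

Only xz should come out at roughly half the input size here, since it is the only one of the three that can see the second copy as a repeat of the first. Real logs are a different kind of redundancy (short-range, highly structured text), which is presumably why this advantage did not show up in the Apache log test.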

#3 Re: What to use for log compression?

Added on 2010-07-15T16:08 by voretaq7

My inclination would be to stay with bzip2 -- I find it is a reasonable compromise between speed (compression/decompression), ease of use (bzcat to review the log files without having to uncompress them to disk) & size (on average pretty small).

It also doesn't beat the server up too much in terms of CPU/memory utilization when rolling a bunch of logs @T00 which is an important consideration for some of my systems.

 

xz compression has the potential to be better than bzip2 (space-wise), but at the expense of a more resource- and time-intensive process, which isn't what I look for in my log compression (YMMV) :-)
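The "review without uncompressing to disk" workflow (zcat, bzcat, xzcat) can also be sketched in Python with the standard library. The helper names `open_log` and `grep_log` are hypothetical, and the suffix table is an assumption covering only .gz, .bz2 and .xz; the .Z format from 'compress' is not handled by the standard library.

```python
import bz2
import gzip
import lzma
from pathlib import Path

# Map file suffixes to the matching stdlib opener; anything else is read as plain text.
OPENERS = {
    ".gz": gzip.open,
    ".bz2": bz2.open,
    ".xz": lzma.open,
}


def open_log(path):
    """Open a (possibly compressed) log for streaming text reads."""
    path = Path(path)
    opener = OPENERS.get(path.suffix, open)
    return opener(path, "rt", encoding="utf-8", errors="replace")


def grep_log(path, needle):
    """Yield lines containing `needle`, decompressing on the fly."""
    with open_log(path) as fh:
        for line in fh:
            if needle in line:
                yield line.rstrip("\n")
```

For example, `list(grep_log("log.bz2", " 404 "))` would scan a rotated log line by line without writing a decompressed copy to disk.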

#4 Re: What to use for log compression?

Added on 2010-07-15T17:48 by jlaffaye

I'd say gzip when you want to compress "on the fly" due to its good performance (ala zfs).

Bzip2 when space is important and you can afford more cpu cycles.

Lzma/xz when it is a "compress one time, uncompress x times" strategy (therefore not for logs, but say, for packages? ;p) because uncompressing xz is far faster than uncompressing bzip2.
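These trade-offs are easy to measure on your own data. Below is a rough timing sketch with Python's standard `gzip`, `bz2` and `lzma` modules; the sample payload is synthetic and hypothetical (log-like lines, not a real log), the library defaults are used for compression levels, and the absolute numbers depend on the machine, so treat it as a starting point rather than a benchmark.

```python
import bz2
import gzip
import lzma
import time

# Hypothetical sample: repetitive, log-like lines (replace with real log bytes).
sample = b"".join(
    b'192.0.2.%d - - [15/May/2010:01:02:%02d +0200] "GET /page/%d HTTP/1.1" 200 %d\n'
    % (i % 250, i % 60, i % 500, 1000 + i % 9000)
    for i in range(20000)
)

CODECS = {
    "gzip": (gzip.compress, gzip.decompress),
    "bzip2": (bz2.compress, bz2.decompress),
    "xz": (lzma.compress, lzma.decompress),
}


def benchmark(data):
    """Time one compress and one decompress pass per codec."""
    results = {}
    for name, (compress, decompress) in CODECS.items():
        t0 = time.perf_counter()
        packed = compress(data)
        t1 = time.perf_counter()
        restored = decompress(packed)
        t2 = time.perf_counter()
        if restored != data:
            raise ValueError(f"{name} round trip failed")
        results[name] = {
            "size": len(packed),
            "compress_s": t1 - t0,
            "decompress_s": t2 - t1,
        }
    return results


results = benchmark(sample)
for name, r in results.items():
    print(f"{name:6s} size={r['size']:>8d}  "
          f"compress={r['compress_s']:.3f}s  decompress={r['decompress_s']:.3f}s")
```

A single pass like this is noisy; repeating the measurement several times and taking the minimum gives steadier numbers.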

#5 Re: What to use for log compression?

Added on 2010-07-15T18:58 by Adam Stylinski

Jlaffaye: syslog in FreeBSD already compresses most logs by default (except Apache's), so having ZFS's gzip underneath syslog's gzip would kind of add storage overhead. There's probably a way to turn off syslog's compression on rotation with a simple configuration change for either syslog or periodic that I don't know about, though.

#6 Re: What to use for log compression?

Added on 2010-07-20T05:31 by edogawaconan

Adam: use "-" flag (instead of J or Z) in newsyslog.conf to skip compression.

#7 Re: What to use for log compression?

Added on 2010-07-22T00:36 by db

After doing some evaluation for myself I started using both ppmd-7z and 7z (manually selecting ppmd by command line option; both available in ports).  I started with ppmd via 7z a couple years ago and then wrote new scripts that utilized ppmd-7z instead after I found it.  It's certainly not the best option for everybody, but on my systems it's fast enough and the compression ratios I'm getting are insane.  It allows me to keep logs around for much longer.

Comments are subject to moderation and will be deleted if deemed inappropriate. All content is © Ivan Voras. Comments are owned by their authors... who agree to basically surrender all rights by publishing them here :)