So I was asked to look at compression on the log servers and see whether changing to xz would save us some space. My test is not comprehensive, but it shows what might happen.
Basic summary: XZ may save us up to 2% over what we are currently saving, but its real advantage over bzip2 is the speed of uncompressing files. [Compression may also be faster for some files.]
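File         | Size (KiB) | Gzip  | G%   | Bzip2 | B%   | XZ    | X%
messages.log |     644568 | 10992 | 98.3 |  4856 | 99.2 |  5940 | 99.1
mail.log     |     610816 | 65060 | 89.3 | 40836 | 93.3 | 35536 | 94.2
TOTAL        |    1255384 | 76052 | 93.9 | 45692 | 96.4 | 41476 | 96.7

All sizes are in KiB as reported by du -s. G%, B%, and X% are the percentage of space saved relative to the uncompressed size.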
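For anyone who wants to check the percentage columns, here is a minimal sketch of the arithmetic, assuming the "percent saved" figure is simply one minus compressed size over original size. The function name pct_saved is made up for this example; the numbers passed in are the du -s sizes from the table.

```shell
# pct_saved ORIGINAL COMPRESSED
# Print the percentage of space saved, rounded to one decimal place.
# (Hypothetical helper name; both sizes must be in the same unit.)
pct_saved() {
    awk -v o="$1" -v c="$2" 'BEGIN { printf "%.1f\n", (1 - c / o) * 100 }'
}

pct_saved 644568 5940     # messages.log compressed with xz
pct_saved 610816 35536    # mail.log compressed with xz
```

The same call works for the TOTAL row by passing the summed sizes.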
Program | Compression Time | Uncompression Time
GZIP    | 00m43.416s       | 00m10.033s
BZIP2   | 10m42.296s       | 01m02.525s
XZ      | 10m15.937s       | 00m12.565s
Raw data below
[root@log01 smooge-b]# du -s messages.log mail.log
644568  messages.log
610816  mail.log
[root@log01 smooge-b]# time gzip -v -9 messages.log mail.log
messages.log:    98.3% -- replaced with messages.log.gz
mail.log:        89.3% -- replaced with mail.log.gz

real    0m43.416s
user    0m41.335s
sys     0m1.736s
[root@log01 smooge-b]# du -s messages.log.gz mail.log.gz
10992   messages.log.gz
65060   mail.log.gz
[root@log01 smooge-b]# time gunzip -v messages.log.gz mail.log.gz
messages.log.gz:         98.3% -- replaced with messages.log
mail.log.gz:     89.3% -- replaced with mail.log

real    0m10.033s
user    0m6.948s
sys     0m3.004s

[root@log01 smooge-b]# time bzip2 -v -9 messages.log mail.log
  messages.log: 133.043:1,  0.060 bits/byte, 99.25% saved, 659381328 in, 4956148 out.
  mail.log:     14.961:1,  0.535 bits/byte, 93.32% saved, 624854215 in, 41766136 out.

real    10m42.296s
user    10m36.948s
sys     0m1.608s
[root@log01 smooge-b]# du -sc messages.log.bz2 mail.log.bz2
4856    messages.log.bz2
40836   mail.log.bz2
45692   total
[root@log01 smooge-b]# time bunzip2 -v messages.log.bz2 mail.log.bz2
  messages.log.bz2: done
  mail.log.bz2:     done

real    1m2.525s
user    0m44.779s
sys     0m4.956s

[root@log01 smooge-b]# time xz -v -9 messages.log mail.log
messages.log (1/2)
  100.0 %   5,923.6 KiB / 628.8 MiB = 0.009   3.1 MiB/s   3:21

mail.log (2/2)
  100.0 %   34.7 MiB / 595.9 MiB = 0.058   1.4 MiB/s   6:53

real    10m15.937s
user    10m8.550s
sys     0m3.552s
[root@log01 smooge-b]# du -s messages.log.xz mail.log.xz
5940    messages.log.xz
35536   mail.log.xz
[root@log01 smooge-b]# time unxz -v messages.log.xz mail.log.xz
messages.log.xz (1/2)
  100.0 %   5,923.6 KiB / 628.8 MiB = 0.009   140 MiB/s   0:04

mail.log.xz (2/2)
  100.0 %   34.7 MiB / 595.9 MiB = 0.058   74 MiB/s   0:08

real    0m12.565s
user    0m8.709s
sys     0m3.636s
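If someone wants to repeat the comparison on other logs without replacing the files in place, something like the following might do. This is only a sketch and not what was run above: compare_tools is a made-up name, it writes the compressed stream to stdout with -c (which gzip, bzip2, and xz all accept) instead of replacing the file, and it times with whole-second resolution from date rather than with time.

```shell
# compare_tools FILE TOOL...
# Compress FILE with each TOOL at level 9, writing to stdout so FILE itself
# is left in place, and print one line per tool:
#   tool compressed_bytes elapsed_seconds
# (Hypothetical helper; tools that are not installed are skipped.)
compare_tools() {
    ct_file=$1
    shift
    for ct_tool in "$@"; do
        if ! command -v "$ct_tool" >/dev/null 2>&1; then
            echo "$ct_tool: not installed, skipping" >&2
            continue
        fi
        ct_start=$(date +%s)
        # Count the compressed bytes; tr strips any padding wc may add.
        ct_bytes=$("$ct_tool" -9 -c "$ct_file" | wc -c | tr -d '[:space:]')
        ct_end=$(date +%s)
        echo "$ct_tool $ct_bytes $((ct_end - ct_start))"
    done
}
```

Usage would be along the lines of: compare_tools messages.log gzip bzip2 xz. Note the byte counts will differ slightly from du -s, which reports disk usage in blocks.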
On Thu, 26 Aug 2010, Stephen John Smoogen wrote:
> So I was asked to look at compression on the log servers and see whether changing to xz would save us some space. My test is not comprehensive, but it shows what might happen.
> Basic summary: XZ may save us up to 2% over what we are currently saving, but its real advantage over bzip2 is the speed of uncompressing files. [Compression may also be faster for some files.]
It does take a while to grep through the bzipped logs. If you want to re-compress them all, I say have at it.
-Mike
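On grepping through the compressed logs: the cost is mostly the decompression, since the log has to be streamed through the decompressor before grep sees it. A rough sketch of that pipeline is below, assuming the compressor can be guessed from the file extension. search_log is a made-up name; the bzgrep, xzgrep, and zgrep wrappers that ship with the respective tools do much the same job.

```shell
# search_log PATTERN FILE
# Pick a decompressor from the file extension, stream the log to stdout, and
# grep the stream, so nothing is unpacked on disk.
# (Hypothetical helper; uncompressed files are passed through cat.)
search_log() {
    sl_pattern=$1
    sl_file=$2
    case $sl_file in
        *.gz)  gzip  -dc -- "$sl_file" ;;
        *.bz2) bzip2 -dc -- "$sl_file" ;;
        *.xz)  xz    -dc -- "$sl_file" ;;
        *)     cat -- "$sl_file" ;;
    esac | grep -e "$sl_pattern"
}
```

For example, search_log sshd messages.log.bz2 and search_log sshd messages.log.xz would run the same search against either format.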
On Thu, Aug 26, 2010 at 17:44, Mike McGrath <mmcgrath@redhat.com> wrote:
> On Thu, 26 Aug 2010, Stephen John Smoogen wrote:
>> So I was asked to look at compression on the log servers and see whether changing to xz would save us some space. My test is not comprehensive, but it shows what might happen.
>> Basic summary: XZ may save us up to 2% over what we are currently saving, but its real advantage over bzip2 is the speed of uncompressing files. [Compression may also be faster for some files.]
> It does take a while to grep through the bzipped logs. If you want to re-compress them all, I say have at it.
OK, I will look at it after I get the hardware call in tomorrow.
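For the re-compression itself, one cautious way to go about it is sketched below. This is an assumption about how it could be done, not a description of what will be run: recompress_to_xz is a made-up name, and the idea is to keep each .bz2 until the new .xz has been shown to decompress to the same contents.

```shell
# recompress_to_xz FILE.bz2
# Write FILE.xz next to the original and delete the .bz2 only after the
# decompressed contents of both archives produce the same checksum.
# (Hypothetical helper; assumes bzip2, xz, and cksum are installed.)
recompress_to_xz() {
    rx_src=$1
    rx_dst=${rx_src%.bz2}.xz
    # Refuse to touch a source archive that is already damaged.
    bzip2 -t -- "$rx_src" || return 1
    bzip2 -dc -- "$rx_src" | xz -9 > "$rx_dst" || return 1
    rx_old=$(bzip2 -dc -- "$rx_src" | cksum)
    rx_new=$(xz -dc -- "$rx_dst" | cksum)
    if [ "$rx_old" = "$rx_new" ]; then
        rm -- "$rx_src"
    else
        echo "checksum mismatch, keeping $rx_src" >&2
        rm -f -- "$rx_dst"
        return 1
    fi
}
```

Going by the timings earlier in the thread, the xz -9 step dominates, so a run over all the archived logs would take a while.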
infrastructure@lists.fedoraproject.org