Lzip benchmark

Lzip has been designed, written and tested with great care to be the standard general-purpose compressor for Unix-like systems. On this page you can find some (totally unscientific[1]) tests comparing the (de)compression speeds and compressed sizes of gzip, bzip2 and lzip. In short, lzip is the perfect replacement for most uses of gzip and bzip2. It can be about as fast as gzip or can compress more than bzip2 (but not at the same time).

Lzip is probably the best compressor for local online documentation (texinfo manuals and man pages). On average, it produces compressed texinfo manuals 19% smaller and man pages 6% smaller than those produced by gzip, without noticeable differences in decompression speed. It also requires very little memory to decompress. For example, 'lzip.1.lz' is decompressed on my machine in 2 ms using 82 kB of RAM.

In the tests below, times are measured compressing or decompressing from RAM to /dev/null on an idle machine and taking the best of three trials.
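
The measurement procedure described above can be approximated with a small shell function. This is only a sketch, not the script used for these tests: the name best_of_three is made up for illustration, and it assumes GNU date (for the non-POSIX %N nanosecond format) and that the input file is already in RAM, for example in a tmpfs or cached by a previous read.

```shell
# best_of_three (hypothetical helper): run a command three times, discarding
# its standard output, and print the shortest wall-clock time in seconds.
# Assumes GNU date; the %N (nanoseconds) format is not in POSIX.
best_of_three() {
    best=''
    for trial in 1 2 3; do
        start=$(date +%s%N)
        "$@" > /dev/null
        end=$(date +%s%N)
        elapsed=$((end - start))
        if [ -z "$best" ] || [ "$elapsed" -lt "$best" ]; then
            best=$elapsed
        fi
    done
    # Convert nanoseconds to seconds, keeping millisecond resolution.
    awk -v ns="$best" 'BEGIN { printf "%.3f\n", ns / 1000000000 }'
}
```

For example, 'best_of_three lzip -cd /dev/shm/gcc-4.7.2.tar.lz' would time the decompression of a copy kept in /dev/shm (a path assumed here; any RAM-backed location would do).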

The compressors tested are:
gzip-1.8
bzip2-1.0.6
lzip-1.18

The files tested are:
cantrbry.tar
gcc-4.7.2.tar
gmp-5.0.1.tar
hawaii-c                Digital elevation map (DEM) of Hawaii from the
                        USGS database, "HAWAII - C HI NE05-01W"
icecat-3.5.3-x86.tar
solfege-3.14.6.tar

Decompression speed

Gzip decompresses faster than lzip, but lzip is fast enough that the speed difference is significantly reduced when reading from, or writing to, storage media. Using multimember files, plzip can decompress as fast as gzip using just two processors. (This test does not preload the compressed files into RAM.)

$ ls -go linux-libre-3.12.5-gnu.tar*
-rw-r--r-- 1 535347200 Dec 12  2013 linux-libre-3.12.5-gnu.tar
-rw-r--r-- 1 112399638 Dec 12  2013 linux-libre-3.12.5-gnu.tar.gz
-rw-r--r-- 1  74330266 Dec 12  2013 linux-libre-3.12.5-gnu.tar.lz
-rw-r--r-- 1  75127916 Dec 12  2013 linux-libre-3.12.5-gnu-mm.tar.lz

$ time gzip -t linux-libre-3.12.5-gnu.tar.gz
real    0m5.271s
user    0m3.430s
sys     0m0.190s

$ time lzip -t linux-libre-3.12.5-gnu.tar.lz
real    0m7.782s
user    0m7.730s
sys     0m0.050s

$ time plzip -t linux-libre-3.12.5-gnu-mm.tar.lz
real    0m5.473s
user    0m8.810s
sys     0m0.240s

Lzip vs gzip

The following table shows that "lzip -0" is comparable in both compression ratio and compression speed with gzip's default compression level. Lzip decompression is slower here than in the bzip2 table below because lzip decompresses faster as the compression ratio increases.

file       cantrbry        gcc       gmp  hawaii-c    icecat   solfege
size        2821120  529940480  12687360   9840640  32419840  15964160

gzip
  size       739064  107838136   2652995   1772672  12085988   3606713
  time       0.217s    23.652s    0.582s    0.852s    2.521s    0.571s
  time -d    0.023s     3.452s    0.087s    0.067s    0.304s    0.097s

lzip -0
  size       589704  106353465   2607594   1595903  11234174   3321985
  time       0.147s    22.820s    0.558s    0.396s    2.232s    0.721s
  time -d    0.062s    10.137s    0.247s    0.165s    1.055s    0.337s

Lzip vs bzip2

Bzip2, having an algorithm very different from those of gzip and lzip, is more difficult to match. "lzip -3" seems to be the closest replacement for "bzip2 -9", even if variations between the two are notable. Note that lzip decompresses about 3 times faster than bzip2.

file       cantrbry        gcc       gmp  hawaii-c    icecat   solfege
size        2821120  529940480  12687360   9840640  32419840  15964160

bzip2 -9
  size       570856   82994239   2006109    708873  11162963   3047512
  time       0.385s      2m10s    2.890s    6.735s    7.519s    7.871s
  time -d    0.139s    27.817s    0.673s    0.563s    2.431s    0.802s

lzip -3
  size       519202   86981371   2063379   1261624  10070541   2811227
  time       0.773s      1m58s    2.766s    1.744s   13.454s    2.633s
  time -d    0.059s     8.593s    0.201s    0.131s    0.944s    0.296s
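
The "about 3 times" figure can be checked against the "time -d" rows of the table above. The following sketch divides the bzip2 decompression time by the lzip one for each file; the function name speedup is an illustrative choice, and the times are copied from the table.

```shell
# speedup (hypothetical helper): print the ratio slow / fast with two
# decimals, i.e. how many times shorter the second time is than the first.
speedup() {
    awk -v slow="$1" -v fast="$2" 'BEGIN { printf "%.2f\n", slow / fast }'
}

# Decompression times in seconds, copied from the table above:
# file, "bzip2 -9" time, "lzip -3" time.
while read -r file bzip2_time lzip_time; do
    printf '%-10s %s\n' "$file" "$(speedup "$bzip2_time" "$lzip_time")"
done <<'EOF'
cantrbry 0.139 0.059
gcc 27.817 8.593
gmp 0.673 0.201
hawaii-c 0.563 0.131
icecat 2.431 0.944
solfege 0.802 0.296
EOF
```

With these numbers the ratios range from roughly 2.4 (cantrbry) to 4.3 (hawaii-c).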

Compression ratio

Lzip goes beyond gzip and bzip2 on compression ratio. Here is the complete range of compressed sizes produced from the files above.

file       cantrbry        gcc       gmp  hawaii-c    icecat   solfege
size        2821120  529940480  12687360   9840640  32419840  15964160

gzip -9      736221  106713313   2632726   1574440  12041750   3561394
bzip2 -9     570856   82994239   2006109    708873  11162963   3047512

lzip -0      589704  106353465   2607594   1595903  11234174   3321985
lzip -1      583538  100356610   2393196   1664538  10881157   2977186
lzip -2      554104   94105269   2254242   1564685  10529621   2900473
lzip -3      519202   86981371   2063379   1261624  10070541   2811227
lzip -4      498387   78981585   1877795   1133250   9596939   2758507
lzip -5      488380   73417637   1748853    905509   9323190   2625566
lzip -6      486875   68613820   1679403    749030   9118578   2583494
lzip -7      484132   63126266   1653092    710414   9046260   2477311
lzip -8      482663   61649754   1642662    702582   8980468   2462646
lzip -9      481413   60880185   1639086    700891   8975976   2455262

Lzip vs xz

Xz has a complex format, partially specialized in the compression of executables and designed to be extended by proprietary formats. Of the four compressors tested here, xz is the only one alien to the Unix concept of "doing one thing and doing it well". It is inadequate for data sharing and also for long-term archiving.

In general, the more complex the format, the less probable it is that it can be decoded in the future. But the xz format, just like its infamous predecessor lzma-alone, is especially badly designed. Xz copies almost all the defects of gzip and then adds some more. Using xz for anything other than compressing short-lived executables is not advisable. And even for this use one must be careful to choose the right integrity check.

Don't get me wrong. I am very grateful to Igor Pavlov for inventing/discovering LZMA, but xz is the third attempt of his followers to take advantage of the popularity of 7-zip and replace gzip and bzip2 with inappropriate or badly designed formats. In particular, it is regrettable that support for lzma-alone was implemented in both GNU and Linux.

But some users have asked how lzip compares with xz, so I have added some tests.

First test: Lzip compresses tarballs more than xz.

I have downloaded the latest version of all the projects I could find on ftp.gnu.org that are distributing xz tarballs but are not (yet) distributing lzip tarballs. Below is a directory listing containing the downloaded tar.xz files and their tar.lz versions produced with "lzip -9". [Note: since 2015-07-08 I just add or remove projects to this list as needed. Keeping all projects updated to the latest version is too much work.]

Total size of tar.lz files = 146076183 bytes.
Total size of tar.xz files = 150702548 bytes.

More than 3% of bandwidth could be saved if only the maintainers of these projects changed their automake setting from "dist-xz" to "dist-lzip".
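
The 3% figure follows directly from the two totals above. As a sketch, the totals can be obtained by summing the size column of "ls -go" output, and the saving computed from them. The function names below are illustrative, and the size is assumed to be the third field, as in the listings on this page.

```shell
# sum_sizes (hypothetical helper): read "ls -go" lines on standard input
# and print the sum of the third field (the size in bytes).
sum_sizes() {
    awk '{ total += $3 } END { printf "%.0f\n", total }'
}

# saving_percent (hypothetical helper): percentage saved by going from
# a size of $1 bytes to a size of $2 bytes, with two decimals.
saving_percent() {
    awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%.2f\n", (before - after) * 100 / before }'
}

# Totals of the tar.xz and tar.lz files, copied from the text above.
saving_percent 150702548 146076183    # prints 3.07
```

In a directory holding the files, the totals themselves could be recomputed with 'ls -go *.tar.lz | sum_sizes' and 'ls -go *.tar.xz | sum_sizes'.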

Note that each and every one of the 44 tar.lz files is smaller than its tar.xz version. This shows the real value of buzzwords like LZMA2. ;-)

The case of glibc-2.20.tar.lz is especially interesting. Lzip compressed it better than xz in spite of xz being invoked with the non-standard "--extreme" option and using twice as much RAM as lzip.

-rw-r--r-- 1  1209917 Apr 25  2012 autoconf-2.69.tar.lz
-rw-r--r-- 1  1214744 Apr 25  2012 autoconf-2.69.tar.xz
-rw-r--r-- 1   611691 Mar 20 13:27 autoconf-archive-2016.03.20.tar.lz
-rw-r--r-- 1   613612 Mar 20 13:27 autoconf-archive-2016.03.20.tar.xz
-rw-r--r-- 1  1014605 Aug 30  2014 autogen-5.18.4.tar.lz
-rw-r--r-- 1  1017936 Aug 30  2014 autogen-5.18.4.tar.xz
-rw-r--r-- 1  1485345 Dec 24  2013 automake-1.14.1.tar.lz
-rw-r--r-- 1  1488984 Dec 24  2013 automake-1.14.1.tar.xz
-rw-r--r-- 1   584053 Mar 30  2013 barcode-0.99.tar.lz
-rw-r--r-- 1   586028 Mar 30  2013 barcode-0.99.tar.xz
-rw-r--r-- 1  1849101 Dec  5  2013 bison-3.0.2.tar.lz
-rw-r--r-- 1  1927296 Dec  5  2013 bison-3.0.2.tar.xz
-rw-r--r-- 1   183037 May 12  2015 bool-0.2.2.tar.lz
-rw-r--r-- 1   183576 May 12  2015 bool-0.2.2.tar.xz
-rw-r--r-- 1   521980 Oct 11  2011 cflow-1.4.tar.lz
-rw-r--r-- 1   526880 Oct 11  2011 cflow-1.4.tar.xz
-rw-r--r-- 1   793511 Aug  1  2013 combine-0.4.0.tar.lz
-rw-r--r-- 1   794716 Aug  1  2013 combine-0.4.0.tar.xz
-rw-r--r-- 1   399832 Nov  2  2013 complexity-1.1.tar.lz
-rw-r--r-- 1   401220 Nov  2  2013 complexity-1.1.tar.xz
-rw-r--r-- 1  5364984 Jul 19  2014 coreutils-8.23.tar.lz
-rw-r--r-- 1  5375612 Jul 19  2014 coreutils-8.23.tar.xz
-rw-r--r-- 1   513690 Mar 16  2013 cppi-1.18.tar.lz
-rw-r--r-- 1   515664 Mar 16  2013 cppi-1.18.tar.xz
-rw-r--r-- 1  1423335 Mar  4  2012 dico-2.2.tar.lz
-rw-r--r-- 1  1445224 Mar  4  2012 dico-2.2.tar.xz
-rw-r--r-- 1  1192566 Mar 24  2013 diffutils-3.3.tar.lz
-rw-r--r-- 1  1197832 Mar 24  2013 diffutils-3.3.tar.xz
-rw-r--r-- 1 34000192 Mar 11  2013 emacs-24.3.tar.lz
-rw-r--r-- 1 35565352 Mar 11  2013 emacs-24.3.tar.xz
-rw-r--r-- 1  1464600 Jun  2  2010 gcal-3.6.tar.lz
-rw-r--r-- 1  1516104 Jun  2  2010 gcal-3.6.tar.xz
-rw-r--r-- 1 13824725 Mar  4  2012 gcide-0.51.tar.lz
-rw-r--r-- 1 14343984 Mar  4  2012 gcide-0.51.tar.xz
-rw-r--r-- 1 17512274 Jul 29  2014 gdb-7.8.tar.lz
-rw-r--r-- 1 17664316 Jul 29  2014 gdb-7.8.tar.xz
-rw-r--r-- 1 12267027 Sep  7  2014 glibc-2.20.tar.lz
-rw-r--r-- 1 12283992 Sep  7  2014 glibc-2.20.tar.xz
-rw-r--r-- 1 14275212 Jan  1  2013 gnu-ghostscript-9.06.0.tar.lz
-rw-r--r-- 1 15659620 Jan  1  2013 gnu-ghostscript-9.06.0.tar.xz
-rw-r--r-- 1   702944 Mar 23  2014 gnu-pw-mgr-1.2.tar.lz
-rw-r--r-- 1   705448 Mar 23  2014 gnu-pw-mgr-1.2.tar.xz
-rw-r--r-- 1  1232290 Jun  3  2014 grep-2.20.tar.lz
-rw-r--r-- 1  1237196 Jun  3  2014 grep-2.20.tar.xz
-rw-r--r-- 1  5058491 Jun 28  2012 grub-2.00.tar.lz
-rw-r--r-- 1  5136412 Jun 28  2012 grub-2.00.tar.xz
-rw-r--r-- 1   925244 Aug 12  2014 gtypist-2.9.5.tar.lz
-rw-r--r-- 1   929356 Aug 12  2014 gtypist-2.9.5.tar.xz
-rw-r--r-- 1   722360 Jun 10  2013 gzip-1.6.tar.lz
-rw-r--r-- 1   725084 Jun 10  2013 gzip-1.6.tar.xz
-rw-r--r-- 1   157458 Jul 26  2014 help2man-1.46.1.tar.lz
-rw-r--r-- 1   158796 Jul 26  2014 help2man-1.46.1.tar.xz
-rw-r--r-- 1   997137 Feb  3  2012 idutils-4.6.tar.lz
-rw-r--r-- 1  1001496 Feb  3  2012 idutils-4.6.tar.xz
-rw-r--r-- 1  1327183 Jan 13  2014 inetutils-1.9.2.tar.lz
-rw-r--r-- 1  1331608 Jan 13  2014 inetutils-1.9.2.tar.xz
-rw-r--r-- 1   854695 Oct 18  2011 libtool-2.4.2.tar.lz
-rw-r--r-- 1   868760 Oct 18  2011 libtool-2.4.2.tar.xz
-rw-r--r-- 1  1860152 Jul  8  2015 libunistring-0.9.6.tar.lz
-rw-r--r-- 1  1960488 Jul  8  2015 libunistring-0.9.6.tar.xz
-rw-r--r-- 1  1144715 Sep 22  2013 m4-1.4.17.tar.lz
-rw-r--r-- 1  1149088 Sep 22  2013 m4-1.4.17.tar.xz
-rw-r--r-- 1  2059431 Sep  8  2010 mailutils-2.2.tar.lz
-rw-r--r-- 1  2268636 Sep  8  2010 mailutils-2.2.tar.xz
-rw-r--r-- 1  1071213 Mar 13  2013 mpfr-3.1.2.tar.lz
-rw-r--r-- 1  1074388 Mar 13  2013 mpfr-3.1.2.tar.xz
-rw-r--r-- 1  1145822 Jul 16  2011 myserver-0.11.tar.lz
-rw-r--r-- 1  1176472 Jul 16  2011 myserver-0.11.tar.xz
-rw-r--r-- 1  1591986 Jul 29  2014 parted-3.2.tar.lz
-rw-r--r-- 1  1655244 Jul 29  2014 parted-3.2.tar.xz
-rw-r--r-- 1   672538 Sep 12  2012 patch-2.7.tar.lz
-rw-r--r-- 1   674544 Sep 12  2012 patch-2.7.tar.xz
-rw-r--r-- 1   593559 Jul  7  2010 rush-1.7.tar.lz
-rw-r--r-- 1   600248 Jul  7  2010 rush-1.7.tar.xz
-rw-r--r-- 1  1161654 Jan  4 09:44 sed-4.3.tar.lz
-rw-r--r-- 1  1167168 Jan  4 09:44 sed-4.3.tar.xz
-rw-r--r-- 1  1082263 Oct 19  2013 sharutils-4.14.tar.lz
-rw-r--r-- 1  1089052 Oct 19  2013 sharutils-4.14.tar.xz
-rw-r--r-- 1  3443366 Apr  8  2013 smalltalk-3.2.5.tar.lz
-rw-r--r-- 1  3513508 Apr  8  2013 smalltalk-3.2.5.tar.xz
-rw-r--r-- 1  1938267 Jul 27  2014 tar-1.28.tar.lz
-rw-r--r-- 1  1966884 Jul 27  2014 tar-1.28.tar.xz
-rw-r--r-- 1   222604 Jun 12  2013 teseq-1.1.tar.lz
-rw-r--r-- 1   223360 Jun 12  2013 teseq-1.1.tar.xz
-rw-r--r-- 1  3960770 Jun 26  2015 texinfo-6.0.tar.lz
-rw-r--r-- 1  4086712 Jun 26  2015 texinfo-6.0.tar.xz
-rw-r--r-- 1  1658364 Jan 19  2014 wget-1.15.tar.lz
-rw-r--r-- 1  1679908 Jan 19  2014 wget-1.15.tar.xz

Second test: Lzip compresses large tarballs more than xz.

"xz -9" uses a dictionary size twice as large as "lzip -9" (and twice as large as "lzma -9"). This makes it appear as if xz could compress large files a little more than lzip. To find the truth just pass to lzip the arguments equivalent to those of "xz -9" (or to xz the arguments equivalent to those of "lzip -9"), and lzip will usually compress more than xz:

  linux-libre-3.12.5-gnu.tar (size 535347200)
  "lzip -m64 -s64MiB"               74192464   9m18s
  "xz -9"                           74306080   9m 7s

  "lzip -9"                         74330266  10m55s
  "xz --lzma2=nice=273,dict=32MiB"  74563636  10m15s

Note that using plain "-9" on both compressors, lzip usually compresses large files about as much as xz, but uses half the RAM to compress and requires half the RAM to decompress.

(This test was made using xz-5.2.1).

Third test: Lunzip decompresses faster than unxz in busybox.

If your unxz applet in busybox seems to decompress faster than the lunzip applet, it may be because you are trying to decompress standard xz files, whose integrity the unxz applet can't verify. Creating an xz file with the correct check type for the unxz applet (CRC32) usually makes it decompress slower than lunzip:

  "busybox unxz -t linux-libre-3.12.5-gnu.tar.xz"        8.331s
  "busybox lunzip -t linux-libre-3.12.5-gnu.tar.lz"      8.714s
  "busybox unxz -t linux-libre-3.12.5-gnu.tar.crc32.xz"  9.723s

Note that error detection in the xz format is silently broken. Section 2.1.1.2 'Stream Flags' of the xz format specification allows decompressors to produce garbage output without issuing any warning. It is unsafe to decompress standard xz files with busybox; even more unsafe than decompressing lzma-alone files. Corruption in compressed LZMA2 packets is detected about as unsafely as in lzma-alone, but the integrity of the uncompressed LZMA2 packets can't be verified at all, making corruption undetectable in a potentially large fraction of the file.

For example, sampling the files in the first test above with unzcrash[2] at 16 KiB intervals has found that unxz does not detect corruption in the following fractions:

  automake-1.14.1.tar.xz           4%
  combine-0.4.0.tar.xz            15%
  emacs-24.3.tar.xz              0.7%
  gcide-0.51.tar.xz               25%
  gnu-ghostscript-9.06.0.tar.xz   10%
  grub-2.00.tar.xz                 3%
  octave-4.0.0.tar.xz            3.5%
  texinfo-6.0.tar.xz               3%

A quick search revealed some more xz files unsafe for busybox:

  cairo-1.14.6.tar.xz                                43%
  firefox-47.0.1.source.tar.xz                       14%
  firefox-kde-opensuse-47.0.1-1-x86_64.pkg.tar.xz    26%
  gimp-2.8.18-i586-1.txz                            1.4%
  gtk+-3.21.4.tar.xz                                1.4%
  libvorbis-1.3.5.tar.xz                            9.6%
  linux-3.16.35.tar.xz                              0.3%
  MPlayer-1.2_20160125-i586-3.txz                   1.3%
  php-7.0.9.tar.xz                                    5%
  Python-3.5.2.tar.xz                               4.6%
  ruby-2.3.1.tar.xz                                 2.6%

And here is a quick test of the (lack of) interoperability between xz-utils and xz-embedded (busybox):

# First create an xz file containing an uncompressed LZMA2 chunk.

$ echo 'The quick brown fox jumps over the lazy dog.' | xz > fox.xz

# Now open fox.xz with a hex editor and modify any character in the
# sentence above (which xz stores uncompressed). When you try to
# decompress the modified file you'll notice that xz-utils detects the
# corruption, but busybox's xz does not:

$ xz -t fox.xz
xz: fox.xz: Compressed data is corrupt
$ echo $?
1
$ busybox unxz -t fox.xz
$ busybox unxz -cd fox.xz
The quick brown fox jumps over the lazy fog.
$ echo $?
0

If you download xz-compressed tarballs and then decompress them with busybox (or with any other xz-embedded implementation), please write to the maintainers of the corresponding projects asking them either to use --check=crc32 when creating the tarballs or to switch to a safe-by-default compressed format, like lzip, bzip2 or gzip.
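
For maintainers, the first option amounts to one extra argument to xz. The following is a minimal sketch, assuming xz-utils and tar are installed; the function name and the project-1.0 directory mentioned afterwards are placeholders.

```shell
# make_crc32_tarball (hypothetical helper): pack directory $1 into
# $1.tar.xz using CRC32 as the xz integrity check.
make_crc32_tarball() {
    tar -cf - "$1" | xz --check=crc32 > "$1.tar.xz"
}
```

After 'make_crc32_tarball project-1.0', running 'xz --list project-1.0.tar.xz' should show CRC32 in the Check column.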

(This test was made using busybox-1.25.0 with lzip support).

Notes

[1] Paraphrasing John von Neumann, there's no sense in being precise when you don't even know what kind of hardware or compiler the person reading this will use. But in case you need a reference, this test was run on an AMD Athlon 64 X2 Dual Core Processor 5200+ running in 64 bit mode, and lzip was compiled out of the box with gcc-6.1.0.

[2] The unzcrash tool is included in the lziprecover package.


Copyright © 2017 Antonio Diaz Diaz.

You are free to copy, modify and distribute all or part of this article without limitation.

Updated: 2017-03-17