Lzip benchmark

Lzip has been designed, written, and tested with great care to replace gzip and bzip2 as the general-purpose compressed format for Unix-like systems. On this page you can find some (totally unscientific[1]) tests comparing the (de)compression speeds and compressed sizes of gzip, bzip2, and lzip.

Lzip is probably the best compressor for local online documentation (texinfo manuals and man pages). On average, it produces compressed texinfo manuals 19% smaller and man pages 6% smaller than gzip, without noticeable differences in decompression speed. It also requires little memory to decompress. For example, 'lzip.1.lz' is decompressed on my machine in 2 ms using 82 kB of RAM.

In the tests below, times are measured compressing or decompressing from RAM to /dev/null on an idle machine and taking the best of three trials.

The compressors tested are:
gzip-1.11
bzip2-1.0.6
lzip-1.23

The files tested are:
cantrbry.tar
gcc-4.7.2.tar
gmp-5.0.1.tar
hawaii-c                Digital elevation map (DEM) of Hawaii from the
                        USGS database, "HAWAII - C HI NE05-01W"
icecat-3.5.3-x86.tar
solfege-3.14.6.tar

Listing file sizes

The bzip2 format stores neither the uncompressed nor the compressed size of each block in the file. Therefore it can't list the file sizes. Bzip2 does not even provide a '--list' option.

The gzip format only stores the uncompressed size truncated to 32 bits. Therefore it can accurately list only files with an uncompressed size smaller than 4 GiB and with at most one non-empty member. In all other cases, the uncompressed size reported is wrong:

$ gzip-1.11 --list 4GiBzeros.gz
         compressed        uncompressed  ratio uncompressed_name
            4168175                   0   0.0% 4GiBzeros
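
The zero in the listing follows directly from the 32-bit field. As a rough illustration (a Python sketch using only the standard library, not part of the original tests), the last four bytes of a gzip member hold the uncompressed size modulo 2^32 in little-endian order, and a reader that looks only at the end of the file sees only the last member. The helper name gzip_isize is made up for this example.

```python
import gzip
import struct

def gzip_isize(blob: bytes) -> int:
    """Return ISIZE, the little-endian 32-bit field that ends a gzip member."""
    return struct.unpack("<I", blob[-4:])[0]

# Single small member: the trailer holds the true uncompressed size.
one = gzip.compress(b"x" * 100_000)
print(gzip_isize(one))

# Two members: the trailer at the end of the file describes only the last one.
two = one + gzip.compress(b"y" * 5)
print(gzip_isize(two), len(gzip.decompress(two)))

# A size of exactly 4 GiB wraps around to 0 in a 32-bit field.
print((4 * 2**30) % 2**32)
```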

Recent versions of the gzip tool try to overcome this limitation of the format by making '--list' decompress the file and report the full sizes. This works, and for small files the inefficiency is tolerable. But for files of about 4 GiB of uncompressed size it is about 10_000 times slower than reading the sizes from the member trailer (21 seconds instead of 2 ms):

$ gzip-1.13 --list 4GiBzeros.gz
         compressed        uncompressed  ratio uncompressed_name
            4168175          4294967296  99.9% 4GiBzeros

The lzip format provides a distributed index with 64-bit fields allowing it to efficiently print correct uncompressed and compressed sizes even for multimember files. Lzip provides an efficient and reliable '--list' option:

$ lzip --list 2x4GiBzeros.lz 4GiBzeros.lz
  uncompressed     compressed   saved  name
    8589934592        1211910  99.99%  2x4GiBzeros.lz
    4294967296         605955  99.99%  4GiBzeros.lz
   12884901888        1817865  99.99%  (totals)
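
To see why a distributed index makes listing cheap, here is a minimal Python sketch (not lzip's actual code). It assumes the trailer layout given in the lzip format documentation: a 20-byte trailer with CRC32 (4 bytes), data size (8 bytes), and member size (8 bytes), all little-endian, where the member size includes header and trailer. The function names are hypothetical, and the test members are synthetic: their payload is zero-filled placeholder bytes, not a valid LZMA stream. A real implementation also validates headers and copes with trailing data.

```python
import struct

# Assumed trailer layout: CRC32, data size, member size (little-endian).
TRAILER = struct.Struct("<IQQ")
HEADER_SIZE = 6  # "LZIP" magic, version byte, coded dictionary size byte

def list_members(blob: bytes):
    """Walk the trailers backwards; return (data_size, member_size) per member."""
    sizes = []
    pos = len(blob)
    while pos > 0:
        _crc, data_size, member_size = TRAILER.unpack_from(blob, pos - TRAILER.size)
        if member_size < HEADER_SIZE + TRAILER.size or member_size > pos:
            raise ValueError("bad member size in trailer")
        sizes.append((data_size, member_size))
        pos -= member_size  # jump straight to the end of the previous member
    sizes.reverse()
    return sizes

def fake_member(data_size: int, payload_len: int) -> bytes:
    """Build a synthetic member (placeholder payload) just to exercise the walk."""
    header = b"LZIP\x01\x0c"
    member_size = HEADER_SIZE + payload_len + TRAILER.size
    return header + bytes(payload_len) + TRAILER.pack(0, data_size, member_size)

blob = fake_member(4 * 2**30, 100) + fake_member(4 * 2**30, 50)
members = list_members(blob)
print(members)
print(sum(d for d, _ in members), sum(m for _, m in members))
```

Only one trailer per member is read, so the cost depends on the number of members, not on the amount of compressed data.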

Decompression speed

Gzip decompresses most files faster than lzip, but lzip is fast enough that when reading from, or writing to, storage media the speed difference is significantly reduced. Thanks to its distributed index, multimember lzip files can be decompressed in parallel, allowing plzip to decompress as fast as gzip when plzip uses two processors, or faster if more processors are available. (This test does not preload the compressed files into RAM):

$ ls -go linux-libre-3.12.5-gnu.tar*
-rw-r--r-- 1 535347200 Dec 12  2013 linux-libre-3.12.5-gnu.tar
-rw-r--r-- 1 112399638 Dec 12  2013 linux-libre-3.12.5-gnu.tar.gz
-rw-r--r-- 1  74330266 Dec 12  2013 linux-libre-3.12.5-gnu.tar.lz
-rw-r--r-- 1  75127916 Dec 12  2013 linux-libre-3.12.5-gnu-mm.tar.lz

$ time gzip -t linux-libre-3.12.5-gnu.tar.gz
real    0m5.295s
user    0m3.580s
sys     0m0.050s

$ time lzip -t linux-libre-3.12.5-gnu.tar.lz
real    0m7.227s
user    0m7.080s
sys     0m0.070s

$ time plzip -t linux-libre-3.12.5-gnu-mm.tar.lz
real    0m5.642s
user    0m8.450s
sys     0m0.220s
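
One way to read these timings is to divide the CPU time (user + sys) by the elapsed time. The Python sketch below does that with the figures above. The interpretation is an assumption: a ratio well below 1 suggests the process spent time waiting (plausibly for storage, since the files were not preloaded), and a ratio above 1 means that more than one processor was busy.

```python
# (real, user, sys) in seconds, copied from the 'time' output above.
runs = {
    "gzip -t": (5.295, 3.580, 0.050),
    "lzip -t": (7.227, 7.080, 0.070),
    "plzip -t": (5.642, 8.450, 0.220),
}

def cpu_per_wall(real: float, user: float, system: float) -> float:
    """CPU seconds consumed per second of elapsed time."""
    return (user + system) / real

for name, times in runs.items():
    print(f"{name:9} {cpu_per_wall(*times):.2f}")
```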

Lzip vs gzip

The following table shows that 'lzip -0' is comparable both on compression ratio and compression speed with gzip's default compression level. Lzip decompression is slower here than in the bzip2 table below because it speeds up with compression ratio.

file       cantrbry        gcc       gmp  hawaii-c    icecat   solfege
size        2821120  529940480  12687360   9840640  32419840  15964160

gzip
  size       739064  107838136   2652995   1772672  12085988   3606713
  time       0.212s    23.112s    0.568s    0.843s    2.457s    0.556s
  time -d    0.023s     3.457s    0.087s    0.067s    0.305s    0.097s

lzip -0
  size       589704  106353465   2607594   1595903  11234174   3321985
  time       0.144s    22.265s    0.543s    0.384s    2.196s    0.699s
  time -d    0.057s     8.939s    0.221s    0.149s    0.921s    0.284s
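
For a per-file view of "comparable", the following Python sketch divides the 'lzip -0' figures by the gzip figures from the table above. The numbers are copied from the table; nothing is measured here.

```python
files = ["cantrbry", "gcc", "gmp", "hawaii-c", "icecat", "solfege"]
gzip_size = [739064, 107838136, 2652995, 1772672, 12085988, 3606713]
lzip0_size = [589704, 106353465, 2607594, 1595903, 11234174, 3321985]
gzip_time = [0.212, 23.112, 0.568, 0.843, 2.457, 0.556]
lzip0_time = [0.144, 22.265, 0.543, 0.384, 2.196, 0.699]

# Ratios below 1 mean 'lzip -0' produced a smaller file or finished sooner.
size_ratio = [l / g for l, g in zip(lzip0_size, gzip_size)]
time_ratio = [l / g for l, g in zip(lzip0_time, gzip_time)]

for name, s, t in zip(files, size_ratio, time_ratio):
    print(f"{name:9} size x{s:.2f}  compression time x{t:.2f}")
```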

Lzip vs bzip2

Bzip2, having an algorithm very different from those of gzip and lzip, is more difficult to match. 'lzip -3' seems to be the closest replacement for 'bzip2 -9', even if variations between the two are notable. Note that lzip decompresses about 3 times faster than bzip2.

file       cantrbry        gcc       gmp  hawaii-c    icecat   solfege
size        2821120  529940480  12687360   9840640  32419840  15964160

bzip2 -9
  size       570856   82994239   2006109    708873  11162963   3047512
  time       0.383s      2m10s    2.882s    6.712s    7.488s    7.859s
  time -d    0.138s    27.838s    0.673s    0.548s    2.430s    0.801s

lzip -3
  size       519202   86981371   2063379   1261624  10070541   2811227
  time       0.760s      1m55s    2.716s    1.712s   13.197s    2.576s
  time -d    0.054s     7.710s    0.184s    0.122s    0.848s    0.256s
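
The "about 3 times faster" figure can be checked against the 'time -d' rows above. This Python sketch computes the per-file and overall ratios; the numbers are copied from the table.

```python
files = ["cantrbry", "gcc", "gmp", "hawaii-c", "icecat", "solfege"]
bzip2_d = [0.138, 27.838, 0.673, 0.548, 2.430, 0.801]  # bzip2 -9, seconds
lzip3_d = [0.054, 7.710, 0.184, 0.122, 0.848, 0.256]   # lzip -3, seconds

# How many times longer bzip2 takes to decompress each file.
speedup = {name: b / l for name, b, l in zip(files, bzip2_d, lzip3_d)}
overall = sum(bzip2_d) / sum(lzip3_d)

for name, ratio in speedup.items():
    print(f"{name:9} {ratio:.2f}")
print(f"overall   {overall:.2f}")
```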

Compression ratio

Lzip goes beyond gzip and bzip2 on compression ratio. Here is the complete range of compressed sizes produced from the files above.

file       cantrbry        gcc       gmp  hawaii-c    icecat   solfege
size        2821120  529940480  12687360   9840640  32419840  15964160

gzip -9      736221  106713313   2632726   1574440  12041750   3561394
bzip2 -9     570856   82994239   2006109    708873  11162963   3047512

lzip -0      589704  106353465   2607594   1595903  11234174   3321985
lzip -1      583538  100356610   2393196   1664538  10881157   2977186
lzip -2      554104   94105269   2254242   1564685  10529621   2900473
lzip -3      519202   86981371   2063379   1261624  10070541   2811227
lzip -4      498387   78981585   1877795   1133250   9596939   2758507
lzip -5      488380   73417637   1748853    905509   9323190   2625566
lzip -6      486875   68613820   1679403    749030   9118578   2583494
lzip -7      484132   63126266   1653092    710414   9046260   2477311
lzip -8      482663   61649754   1642662    702582   8980468   2462646
lzip -9      481413   60880185   1639086    700891   8975976   2455262
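
As one way of reading the table, this Python sketch finds, for each file, the lowest lzip level whose output is smaller than that of 'bzip2 -9'. The sizes are copied from the table above; the helper name is made up for this example.

```python
files = ["cantrbry", "gcc", "gmp", "hawaii-c", "icecat", "solfege"]
bzip2_9 = [570856, 82994239, 2006109, 708873, 11162963, 3047512]
lzip = [
    [589704, 106353465, 2607594, 1595903, 11234174, 3321985],  # -0
    [583538, 100356610, 2393196, 1664538, 10881157, 2977186],  # -1
    [554104, 94105269, 2254242, 1564685, 10529621, 2900473],   # -2
    [519202, 86981371, 2063379, 1261624, 10070541, 2811227],   # -3
    [498387, 78981585, 1877795, 1133250, 9596939, 2758507],    # -4
    [488380, 73417637, 1748853, 905509, 9323190, 2625566],     # -5
    [486875, 68613820, 1679403, 749030, 9118578, 2583494],     # -6
    [484132, 63126266, 1653092, 710414, 9046260, 2477311],     # -7
    [482663, 61649754, 1642662, 702582, 8980468, 2462646],     # -8
    [481413, 60880185, 1639086, 700891, 8975976, 2455262],     # -9
]

def first_level_below(column: int, target: int):
    """Lowest lzip level whose size for this file is below target, else None."""
    for level, row in enumerate(lzip):
        if row[column] < target:
            return level
    return None

result = {name: first_level_below(i, bzip2_9[i]) for i, name in enumerate(files)}
print(result)
```

The level needed varies widely from file to file, with the elevation map (hawaii-c) being the hardest case for lzip.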

Lzip vs xz

Xz has a complex format, partially specialized in the compression of executables and designed to be extended by proprietary formats. Of the four compressors tested here, xz is the only one alien to the Unix concept of "doing one thing and doing it well". It is inadequate for long-term archiving and inadvisable for data sharing and for free software distribution. If you are distributing software in xz format, please consider using lzip instead. See "Xz format inadequate for long-term archiving".

Xz is poorly designed and should not be used, but some users have asked how lzip compares with xz, so I have added some tests.

First test: Lzip compresses tarballs more than xz.

I have downloaded the latest version of all the projects I could find in ftp.gnu.org that are distributing xz tarballs but are not yet distributing lzip tarballs. Below is a directory listing containing the downloaded tar.xz files and their tar.lz versions produced with 'lzip -9'. [Note: since 2015-07-08 I just add or remove projects to this list as needed. Keeping all projects updated to the latest version is too much work.]

Total size of tar.lz files = 228_324_089 bytes.
Total size of tar.xz files = 233_953_780 bytes.

More than 2% of bandwidth could be saved if only the maintainers of these projects changed their automake setting from 'dist-xz' to 'dist-lzip'.
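
The figure follows from the two totals above, as this short Python calculation shows:

```python
lz_total = 228_324_089  # bytes, all tar.lz files
xz_total = 233_953_780  # bytes, all tar.xz files

saved = xz_total - lz_total
print(saved, f"{100 * saved / xz_total:.2f}%")
```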

Note that each and every one of the 50 tar.lz files is smaller than its tar.xz version. The case of glibc-2.20.tar.lz is especially interesting. Lzip compressed it better than xz in spite of xz being invoked with the non-standard '--extreme' option and using twice as much RAM as lzip.

-rw-r--r-- 1  1209917 Apr 25  2012 autoconf-2.69.tar.lz
-rw-r--r-- 1  1214744 Apr 25  2012 autoconf-2.69.tar.xz
-rw-r--r-- 1   611691 Mar 20  2016 autoconf-archive-2016.03.20.tar.lz
-rw-r--r-- 1   613612 Mar 20  2016 autoconf-archive-2016.03.20.tar.xz
-rw-r--r-- 1  1014605 Aug 30  2014 autogen-5.18.4.tar.lz
-rw-r--r-- 1  1017936 Aug 30  2014 autogen-5.18.4.tar.xz
-rw-r--r-- 1  1485345 Dec 24  2013 automake-1.14.1.tar.lz
-rw-r--r-- 1  1488984 Dec 24  2013 automake-1.14.1.tar.xz
-rw-r--r-- 1   584053 Mar 30  2013 barcode-0.99.tar.lz
-rw-r--r-- 1   586028 Mar 30  2013 barcode-0.99.tar.xz
-rw-r--r-- 1   183037 May 12  2015 bool-0.2.2.tar.lz
-rw-r--r-- 1   183576 May 12  2015 bool-0.2.2.tar.xz
-rw-r--r-- 1   521980 Oct 11  2011 cflow-1.4.tar.lz
-rw-r--r-- 1   526880 Oct 11  2011 cflow-1.4.tar.xz
-rw-r--r-- 1   793511 Aug  1  2013 combine-0.4.0.tar.lz
-rw-r--r-- 1   794716 Aug  1  2013 combine-0.4.0.tar.xz
-rw-r--r-- 1   399832 Nov  2  2013 complexity-1.1.tar.lz
-rw-r--r-- 1   401220 Nov  2  2013 complexity-1.1.tar.xz
-rw-r--r-- 1  5364984 Jul 19  2014 coreutils-8.23.tar.lz
-rw-r--r-- 1  5375612 Jul 19  2014 coreutils-8.23.tar.xz
-rw-r--r-- 1   513690 Mar 16  2013 cppi-1.18.tar.lz
-rw-r--r-- 1   515664 Mar 16  2013 cppi-1.18.tar.xz
-rw-r--r-- 1  1423335 Mar  4  2012 dico-2.2.tar.lz
-rw-r--r-- 1  1445224 Mar  4  2012 dico-2.2.tar.xz
-rw-r--r-- 1  1192566 Mar 24  2013 diffutils-3.3.tar.lz
-rw-r--r-- 1  1197832 Mar 24  2013 diffutils-3.3.tar.xz
-rw-r--r-- 1 34000192 Mar 11  2013 emacs-24.3.tar.lz
-rw-r--r-- 1 35565352 Mar 11  2013 emacs-24.3.tar.xz
-rw-r--r-- 1  1887433 Aug 29  2019 findutils-4.7.0.tar.lz
-rw-r--r-- 1  1895048 Aug 29  2019 findutils-4.7.0.tar.xz
-rw-r--r-- 1  1464600 Jun  2  2010 gcal-3.6.tar.lz
-rw-r--r-- 1  1516104 Jun  2  2010 gcal-3.6.tar.xz
-rw-r--r-- 1 75482104 Jul  5  2017 gcc-6.4.0.tar.lz
-rw-r--r-- 1 76156220 Jul  5  2017 gcc-6.4.0.tar.xz
-rw-r--r-- 1 13824725 Mar  4  2012 gcide-0.51.tar.lz
-rw-r--r-- 1 14343984 Mar  4  2012 gcide-0.51.tar.xz
-rw-r--r-- 1 17512274 Jul 29  2014 gdb-7.8.tar.lz
-rw-r--r-- 1 17664316 Jul 29  2014 gdb-7.8.tar.xz
-rw-r--r-- 1   582706 Jun  4  2019 gengetopt-2.23.tar.lz
-rw-r--r-- 1   584860 Jun  4  2019 gengetopt-2.23.tar.xz
-rw-r--r-- 1 12267027 Sep  7  2014 glibc-2.20.tar.lz
-rw-r--r-- 1 12283992 Sep  7  2014 glibc-2.20.tar.xz
-rw-r--r-- 1 14275212 Jan  1  2013 gnu-ghostscript-9.06.0.tar.lz
-rw-r--r-- 1 15659620 Jan  1  2013 gnu-ghostscript-9.06.0.tar.xz
-rw-r--r-- 1   702944 Mar 23  2014 gnu-pw-mgr-1.2.tar.lz
-rw-r--r-- 1   705448 Mar 23  2014 gnu-pw-mgr-1.2.tar.xz
-rw-r--r-- 1  1232290 Jun  3  2014 grep-2.20.tar.lz
-rw-r--r-- 1  1237196 Jun  3  2014 grep-2.20.tar.xz
-rw-r--r-- 1  5058491 Jun 28  2012 grub-2.00.tar.lz
-rw-r--r-- 1  5136412 Jun 28  2012 grub-2.00.tar.xz
-rw-r--r-- 1  1134733 Oct  3  2015 gslip-1.0.2.tar.lz
-rw-r--r-- 1  1136628 Oct  3  2015 gslip-1.0.2.tar.xz
-rw-r--r-- 1   925244 Aug 12  2014 gtypist-2.9.5.tar.lz
-rw-r--r-- 1   929356 Aug 12  2014 gtypist-2.9.5.tar.xz
-rw-r--r-- 1   722360 Jun 10  2013 gzip-1.6.tar.lz
-rw-r--r-- 1   725084 Jun 10  2013 gzip-1.6.tar.xz
-rw-r--r-- 1   157458 Jul 26  2014 help2man-1.46.1.tar.lz
-rw-r--r-- 1   158796 Jul 26  2014 help2man-1.46.1.tar.xz
-rw-r--r-- 1   997137 Feb  3  2012 idutils-4.6.tar.lz
-rw-r--r-- 1  1001496 Feb  3  2012 idutils-4.6.tar.xz
-rw-r--r-- 1   613281 Sep  6  2018 indent-2.2.12.tar.lz
-rw-r--r-- 1   620280 Sep  6  2018 indent-2.2.12.tar.xz
-rw-r--r-- 1  1327183 Jan 13  2014 inetutils-1.9.2.tar.lz
-rw-r--r-- 1  1331608 Jan 13  2014 inetutils-1.9.2.tar.xz
-rw-r--r-- 1  4206085 Nov  8  2019 libredwg-0.9.2.tar.lz
-rw-r--r-- 1  4582968 Nov  8  2019 libredwg-0.9.2.tar.xz
-rw-r--r-- 1   854695 Oct 18  2011 libtool-2.4.2.tar.lz
-rw-r--r-- 1   868760 Oct 18  2011 libtool-2.4.2.tar.xz
-rw-r--r-- 1  1860152 Jul  8  2015 libunistring-0.9.6.tar.lz
-rw-r--r-- 1  1960488 Jul  8  2015 libunistring-0.9.6.tar.xz
-rw-r--r-- 1  1144715 Sep 22  2013 m4-1.4.17.tar.lz
-rw-r--r-- 1  1149088 Sep 22  2013 m4-1.4.17.tar.xz
-rw-r--r-- 1  2059431 Sep  8  2010 mailutils-2.2.tar.lz
-rw-r--r-- 1  2268636 Sep  8  2010 mailutils-2.2.tar.xz
-rw-r--r-- 1  1071213 Mar 13  2013 mpfr-3.1.2.tar.lz
-rw-r--r-- 1  1074388 Mar 13  2013 mpfr-3.1.2.tar.xz
-rw-r--r-- 1  1145822 Jul 16  2011 myserver-0.11.tar.lz
-rw-r--r-- 1  1176472 Jul 16  2011 myserver-0.11.tar.xz
-rw-r--r-- 1  1407848 Mar 31  2017 nano-2.8.0.tar.lz
-rw-r--r-- 1  1413796 Mar 31  2017 nano-2.8.0.tar.xz
-rw-r--r-- 1  1591986 Jul 29  2014 parted-3.2.tar.lz
-rw-r--r-- 1  1655244 Jul 29  2014 parted-3.2.tar.xz
-rw-r--r-- 1   672538 Sep 12  2012 patch-2.7.tar.lz
-rw-r--r-- 1   674544 Sep 12  2012 patch-2.7.tar.xz
-rw-r--r-- 1   593559 Jul  7  2010 rush-1.7.tar.lz
-rw-r--r-- 1   600248 Jul  7  2010 rush-1.7.tar.xz
-rw-r--r-- 1  1161654 Jan  4  2017 sed-4.3.tar.lz
-rw-r--r-- 1  1167168 Jan  4  2017 sed-4.3.tar.xz
-rw-r--r-- 1  1082263 Oct 19  2013 sharutils-4.14.tar.lz
-rw-r--r-- 1  1089052 Oct 19  2013 sharutils-4.14.tar.xz
-rw-r--r-- 1  3443366 Apr  8  2013 smalltalk-3.2.5.tar.lz
-rw-r--r-- 1  3513508 Apr  8  2013 smalltalk-3.2.5.tar.xz
-rw-r--r-- 1  2052694 Dec 17  2017 tar-1.30.tar.lz
-rw-r--r-- 1  2108028 Dec 17  2017 tar-1.30.tar.xz
-rw-r--r-- 1   222604 Jun 12  2013 teseq-1.1.tar.lz
-rw-r--r-- 1   223360 Jun 12  2013 teseq-1.1.tar.xz
-rw-r--r-- 1  3960770 Jun 26  2015 texinfo-6.0.tar.lz
-rw-r--r-- 1  4086712 Jun 26  2015 texinfo-6.0.tar.xz
-rw-r--r-- 1   326754 Jan 14  2017 vc-dwim-1.8.tar.lz
-rw-r--r-- 1   327492 Jan 14  2017 vc-dwim-1.8.tar.xz

Second test: Why does 'xz -9' compress large tarballs more than 'lzip -9'?

Because 'xz -9' uses a dictionary size twice as large as 'lzip -9' (and twice as large as 'lzma -9'). This makes it appear as if xz could compress large files a little more than lzip. But if you pass to lzip the arguments equivalent to those of 'xz -9' (or to xz the arguments equivalent to those of 'lzip -9'), lzip will usually compress more than xz:

  linux-libre-3.12.5-gnu.tar (size 535347200)
  "lzip -m64 -s64MiB"               74192464   9m16s
  "xz -9"                           74306080   9m 7s

  "lzip -9"                         74330266  10m53s
  "xz --lzma2=nice=273,dict=32MiB"  74563636  10m15s

Note that using plain '-9' on both compressors, lzip usually compresses large files about as much as xz, while using half the RAM to compress and requiring half the RAM to decompress.

(This test was made using lzip-1.19 and xz-5.2.1).

Third test: Why does unxz decompress faster than lunzip in busybox?

Because xz-utils produces by default files whose integrity the unxz applet can't check. Creating an xz file with the correct check type for the unxz applet (CRC32) usually makes it decompress slower than lunzip:

  "busybox unxz -t linux-libre-3.12.5-gnu.tar.xz"        8.331s
  "busybox lunzip -t linux-libre-3.12.5-gnu.tar.lz"      8.714s
  "busybox unxz -t linux-libre-3.12.5-gnu.tar.crc32.xz"  9.723s

Note that error detection in the xz format is silently broken. Both xz-utils and the unxz applet ignore the recommendations of the xz format specification. Xz-utils uses by default an optional check type (CRC64) in the files it produces, preventing decompressors that do not support the optional check types from checking the integrity of the data. The unxz applet does not warn if it finds an unsupported check type, which greatly increases the probability of corruption going unnoticed. It is unsafe to decompress standard xz files with busybox; even less safe than decompressing lzma-alone files. Corruption in compressed LZMA2 packets is detected about as unsafely as in lzma-alone, but the integrity of the uncompressed LZMA2 packets can't be checked at all, making corruption undetectable in a potentially large fraction of the file.

For example, sampling the files in the first test above with unzcrash[2] at 16 KiB intervals has found that unxz does not detect corruption in the following fractions of each file:

  automake-1.14.1.tar.xz           4%
  combine-0.4.0.tar.xz            15%
  emacs-24.3.tar.xz              0.7%
  gcc-6.4.0.tar.xz               0.2%
  gcide-0.51.tar.xz               25%
  gnu-ghostscript-9.06.0.tar.xz   10%
  grub-2.00.tar.xz                 3%
  octave-4.0.0.tar.xz            3.5%
  texinfo-6.0.tar.xz               3%

A quick search revealed some more xz files unsafe for busybox:

  cairo-1.14.6.tar.xz                                43%
  firefox-47.0.1.source.tar.xz                       14%
  firefox-kde-opensuse-47.0.1-1-x86_64.pkg.tar.xz    26%
  gimp-2.8.18-i586-1.txz                            1.4%
  gtk+-3.21.4.tar.xz                                1.4%
  libvorbis-1.3.5.tar.xz                            9.6%
  linux-3.16.35.tar.xz                              0.3%
  MPlayer-1.2_20160125-i586-3.txz                   1.3%
  php-7.0.9.tar.xz                                    5%
  Python-3.5.2.tar.xz                               4.6%
  ruby-2.3.1.tar.xz                                 2.6%

And here is a quick test of the lack of safe interoperability between xz-utils and xz-embedded (busybox):

# First create an xz file containing an uncompressed LZMA2 chunk.

$ echo 'The quick brown fox jumps over the lazy dog.' | xz > fox.xz

# Now open fox.xz with a hex editor and modify any character in the
# sentence above (which xz stores uncompressed). When you try to
# decompress the modified file you'll notice that xz-utils detects the
# corruption, but busybox's xz does not:

$ xz -t fox.xz
xz: fox.xz: Compressed data is corrupt
$ echo $?
1
$ busybox unxz -t fox.xz
$ busybox unxz -cd fox.xz
The quick brown fox jumps over the lazy fog.
$ echo $?
0

(This test was made using busybox-1.25.0 with lzip support).
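
The check type discussed in the third test can be read from the first bytes of an xz file. The Python sketch below is an illustration, not part of the original tests. It assumes the stream header layout from the xz file format specification: a 6-byte magic followed by two stream-flag bytes, the second of which carries the check ID in its low four bits. The function name is hypothetical, and the streams are produced with Python's standard lzma module rather than with the xz command.

```python
import lzma

XZ_MAGIC = b"\xfd7zXZ\x00"
# Check IDs with assigned names; the others are treated as unknown here.
CHECK_NAMES = {0x00: "None", 0x01: "CRC32", 0x04: "CRC64", 0x0A: "SHA-256"}

def xz_check_type(blob: bytes) -> str:
    """Return the name of the integrity check declared in the stream header."""
    if blob[:6] != XZ_MAGIC:
        raise ValueError("not an xz stream")
    check_id = blob[7] & 0x0F
    return CHECK_NAMES.get(check_id, f"unknown (0x{check_id:02X})")

data = b"The quick brown fox jumps over the lazy dog.\n"

# Python's lzma module defaults to CRC64 for the xz format.
default_blob = lzma.compress(data)
# Explicitly request CRC32, the check type the unxz applet can verify.
crc32_blob = lzma.compress(data, check=lzma.CHECK_CRC32)

print(xz_check_type(default_blob))
print(xz_check_type(crc32_blob))
```

To inspect a file on disk, pass its first 8 bytes (or more) to xz_check_type.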

Notes

[1] Paraphrasing John von Neumann, there's no sense in being precise when you don't even know what kind of hardware or compiler the person reading this will use. But in case you need a reference, this test was run on an AMD Athlon 64 X2 Dual Core Processor 5200+ running in 64-bit mode, and lzip was compiled out of the box with gcc-6.1.0.

[2] The unzcrash tool is included in the lziprecover package.


Copyright © 2024 Antonio Diaz Diaz.

You are free to copy, modify, and distribute all or part of this article without limitation.

Updated: 2024-11-23
