Softpanorama
(slightly skeptical) Open Source Software Educational Society

May the source be with you, but remember the KISS principle ;-)

Softpanorama Search

Compression Algorithms

News

See Also Recommended Books Recommended Links Recommended Papers Magazines eBooks FAQs eBooks
Huffman LZW LZH Bijective RLE JPEG History Humor Etc

These page is very incomplete and other pages give a much better overview of the various compression algorithms. The following types of compression are well documented elsewhere:

Bijective compression means that for any file X, F( F'( X ) ) == X. (F is either the compressor or decompressor, and F' is its opposite number.) This is definitely not the case for most conventional compression algorithms. This type of compression is important in crypto algorithms.

Compression always rely on non-randomness of the text: random sequence of bytes is poorly compressible. 

In general the compression can be viewed as a conversion of static text into a special program, In other words compression consists of converting a static test into some kind of code for a specially constructed (may be unique for this case) abstract machine.  Programs in this abstract machine are shorter then the original text while output of the program after processing by an interpreter (decompressor) is exactly the same (original) text.  For example if we first process the text and construct a dictionary of words used and then replace each word with the index to a dictionary that will be a primitive compression that might work well on text with a small dictionary and frequently repeated words. For example it might work well for the answer to some lengthy form like "Yes No n/a Yes No n/a n/a Yes No No No Yes n/a n/a No". In this case the dictionary consists of three entries and each reply can be encoded in two bits. 

It is important to understand that the more you know about the structure of the text, the more specialized processor you and construct and the higher level of compression you can achieve. For example, if you know that a particular text is, say, an http log then you can compress it several times better then using any generic algorithms.

Compression is also connected with the pattern matching. The notion of a longest common subsequence (LCS) of two strings is widely used to compare files and can be used in compression by splitting text into several parts if diffing them against each other. In the most primitive case such diffing can be based on lines or paragraphs. The "diff" command of UNIX system implement an algorithm of finding the shortest sequence of editing command able to convert one string to another. Informally, the result of a diff algorithm gives the minimum number of operations (insert a symbol, or delete a symbol) to transform one string into the other. This is related to an edit distance, called the Levenshtein distance, with the additional operation of substitution, and with weights associated to operations. Hirschberg (1975) presents the computation of the LCS in linear space.  This ideas can be very efficiently can be used for compression of hhtp logs (for example Apache logs) and, say, windows of the last 1K lines often contains at least one string that is very close to the current (in Levenshhtein metric) so that you can encode the whole string by the reference to the "etalon" string and a few operations needs to convert "etalon" to the current string.

In its turn, the search of LCS is connected with efficient string matching algorithms. The first discovered linear-time string-matching algorithm is from Morris and Pratt (1970). It has been improved by Knuth, Morris, and Pratt (1976). The search behaves like a recognition process by automaton, and a character of the text is compared to a character of the pattern no more than \log_\Phi(m+1) (\Phi is the golden ratio (1+\sqrt 5)/2).  The Boyer and Moore's algorithm (1977) is considered as the most efficient string-matching algorithm in usual applications. A simplified version of it (or the entire algorithm) is often implemented in text editors for the "search" and "substitute" commands. Several variants of Boyer and Moore's algorithm avoid the quadratic behavior when searching for all occurrences of the pattern.  In 1975, Aho and Corasick designed an O(n\log\sigma) algorithm to solve this problem, with a running time independent of the number of patterns. It is implemented by the "fgrep" command under the UNIX operating system. In applications where the text is to be searched for several patterns, it is the text that needs to be preprocessed. Even if no further information is known on their syntactic structure, it is possible and indeed extremely efficient to built an index that supports searches. Data structures to represent indexes on text files are: suffix trees (Weiner 1973, McCreight 1976, Ukkonen 1994), direct acyclic word graph (Blumer et al., 1985), suffix automata (Crochemore, 1986), and suffix arrays (Manber and Myers, 1993). All algorithms (except for suffix arrays) build the index in time O(n\log\sigma).

References

  1. Aho, A.V. 1990. Algorithms for finding patterns in strings. In: Handbook of Theoretical Computer Science, Algorithms and Complexity, Vol. A, ch. 5, pp 255--330. J. van Leeuwen ed., Elsevier, Amsterdam.
  2. Bell, T.C., Cleary J.G. and Witten, I.H. 1990. Text Compression. Prentice Hall, Englewood Cliffs, New Jersey.
  3. Cormen, T.H., Leiserson C.E. and Rivest, R.L. 1990. Introduction to Algorithms, ch. 34, pp 853--885. MIT Press.
  4. Crochemore, M. and Rytter W. 1994. Text Algorithms. Oxford University Press.
  5. Gonnet, G.H. and Baeza-Yates, R.A. 1991. Handbook of Algorithms and Data Structures, ch. 7, pp 251--288. Addison-Wesley.
  6. Nelson, M. 1992. The Data Compression Book. M&T Books.
  7. Sedgewick R. 1990. Algorithms in C, ch. 19 and 22. Addison-Wesley.
  8. Stephen, G.A. 1994. String Searching Algorithms. World Scientific Press

Notes:
  • This is a Spartan WHYFF (We Help You For Free) site written by people for whom English is not a native language. Some amount of grammar and spelling errors should be expected.
  • The site contain some broken links as it develops like a living tree... Please try to use Google, Open directory, etc. to find a replacement link (see HOWTO search the WEB for details). We would appreciate if you can mail us a correct link.
Google Search
Open directory

Research Index


Old News ;-)

[Jan 30, 2008] History of Data Compression in Japan

In 1987, I was asked by a magazine editor to write an article about data compression. I wrote a manuscript and an accompanying program, sent them to the editor, and forgot about them. The next time I heard from him I was told that the magazine was discontinued. So I uploaded my program, a simple implementation of the LZSS compression algorithm (see below) to PC-VAN, a big Japanese BBS run by NEC. That was May 1, 1988.

Soon a number of hobby programmers gathered and began improving on that program. The project culminated in Kazuhiko Miki's archiver LArc, which was fairly widey used in Japan. (Dr. Miki was then a medical specialist working at a governmental office. I heard he left office and began work on freeware/shareware promotion.)

The LZSS algorithm is based on a very simple idea. Suppose I'm going to write "compression" here. But probably I've already used that word before in this file. If I used that word 57 characters before, I might as well write "go 57 characters backward, and read 11 characters," or <57,11> for short. In general, when I've already used the string of characters among the recent 4096 characters, say, I encode the string by a <position,length> pair.

In Storer's [8] terminology, this is a sliding dictionary algorithm, analyzed first by Ziv and Lempel [14] and then by Storer and Szymanski [9], among others.

Later versions of my LZSS implementations and Miki's LArc used binary search trees to make string search faster; see Bell [1].

Incidentally, there are two distinct Ziv-Lempel (LZ) methods: sliding dictionary [14] and dynamic dictionary [15] in Storer's [8] terminology. The LZW algorithm [12] belongs to the latter. Most pre-LHarc compression tools, such as 'compress', 'ARC', and 'PKARC', used LZW.

During the summer of 1988, I wrote another compression program, LZARI. This program is based on the following observation: Each output of LZSS is either a single character or a <position,length> pair. A single character can be coded as an integer between 0 and 255. As for the <length> field, if the range of <length> is 2 to 257, say, it can be coded as an integer between 256 and 511. Thus, I can say that there are 512 kinds of "characters," and the "characters" 256 through 511 are accompanied by a <position> field. These 512 "characters" can be Huffman-coded, or better still, algebraically coded. The <position> field can be coded in the same manner. In LZARI I used an adaptive algebraic compression [13], [2] to encode the "characters," and static algebraic compression to encode the <position> field. (There were several versions of LZARI; some of them were slightly different from the above description.) The compression of LZARI was very tight, though rather slow.

Haruyasu Yoshizaki (Yoshi), a physician and guru hobby programmer, worked very hard to make LZARI faster. Most importantly, he replaced LZARI's algebraic compression by dynamic Huffman coding.

His program, LZHUF, was very successful. It was much faster than my LZARI. As for compression ratio, Huffman cannot beat algebraic compression, but the difference turned out to be very small.

Yoshi rewrote the compression engine of LZHUF in assembler, and added a nifty user interface. His archiver, LHarc, soon became the de facto standard among Japanese BBS users. After Prof. Kenjirou Okubo, a mathematician, introduced LHarc to the United States, it became world-famous. Other vendors began using similar techniques: sliding dictionary plus statistical compressions such as Huffman and Shannon-Fano. (I wondered why they used Shannon-Fano rather than Huffman which is guaranteed to compress tighter than Shannon-Fano. As it turned out, a then-popular book on compression published in U.S. contained a wrong description and buggy sample programs, such that Shannon-Fano outperformed (buggy) Huffman on many files.)

Although LHarc was much faster than LZARI, we weren't quite satisfied with its speed. Because LHarc was based on dynamic Huffman, it had to update Huffman tree every time it received a character. Yoshi and I tried other dynamic Huffman algorithms [5], [10], [11], but improvements were not as great as we desired.

So I took a different step: replacing LHarc's dynamic Huffman by a static Huffman method.

Traditional static Huffman coding algorithm first scans the input file to count character distribution, then builds Huffman tree and encodes the file. In my approach, the input file is read only once. It is first compressed by a sliding dictionary method like LZARI and LHarc, and at the same time the distributions of the "characters" (see above) and positions are counted. The output of this process is stored in main memory. When the buffer in memory is full (or the input is exhausted), the Huffman trees are constructed, and the half-processed content of the buffer is actually compressed and output.

In static Huffman, the Huffman tree must be stored in the compressed file. In the traditional approach this information consumes hundreds of bytes. My approach was to standardize Huffman trees so that (1) each left subtree is no deeper than its right counterpart, and (2) the leaves at the same level are sorted in ascending order. In this way the Huffman tree can be uniquely specified by the lengths of the codewords. Moreover, the resulting table is again compressed by the same Huffman algorithm.

To make the decoding program simpler, the Huffman tree is adjusted so that the codeword lengths do not exceed 16 bits. Since this adjusting is rarely needed, the algorithm is made very simple. It does not create optimal length-limited Huffman trees; see e.g. [6] for an optimal algorithm. Incidentally, my early program had a bug here, which was quickly pointed out and corrected by Yoshi.

The sliding dictionary algorithm is also improved by Yoshi using a "PATRICIA tree" data structure; see McCreight [7] and Fiala and Greene [4].

After completing my algorithm, I learned that Brent [3] also used a sliding dictionary plus Huffman coding. His method, SLH, is simple and elegant, but since it doesn't find the most recent longest match, the distribution of match position becomes flat. This makes the second-stage Huffman compression less efficient.

On the basis of these new algorithms, Yoshi began to rewrite his LHarc, but it took him so long (remember he was a busy doctor!) that I decided to write my own archiver. My archiver was quite recklessly named 'ar'. (Actually I appended version numbers as in 'ar002' for version 0.02.) I should have named it 'har' (after my name), say, because 'ar' collides with the name of UNIX's archiver. I didn't want my program to compete with LHarc, but I wanted many people to try the algorithm, so I wrote it in pure ANSI C. This is the reason 'ar' lacked many bells and whistles necessary for a real archiver.

Note: The version of 'ar002' most often found in the U.S. had a bug. Line 24 of maketbl.c should read, of course,
    while (i <= 16) {
        weight[i] = 1U << (16 - i);  i++;
    }
Somehow the bug didn't show up when compiled by Turbo C.

Yoshi finally showed us his new archiver written in C. It was tentatively named LHx. He then rewrote the main logic in assembler. Yoshi and I wrote an article describing his new archiver, which would be named LH, in the January, 1991, issue of "C Magazine" (in Japanese). The suffix 'arc' of LHarc was deliberately dropped because the people who sold ARC did not want others to use the name.

Then we learned that for the new DOS 5.0, LH meaned LoadHigh, an internal command. We decided to rename LH to LHA.

Also, I was told that the algorithm described in Fiala and Greene [4] got patented ("Textual Substitution Data Compression With Finite Length Search Windows," U.S. Patent 4,906,991, Mar. 6, 1990. Actually they got three patents! The other two were: "Start, Step, Stop Unary Encoding for Data Compression," Application Ser. No. 07/187,697, and "Search Tree Data Structure Encoding for Textual Substitution Data Compression Systems," Application Ser. No. 07/187,699.)

Furthermore, I learned that the original Ziv-Lempel compression method (Eastman et al., U.S. Patent 4,464,650, 8/1984) and the LZW method (Welch, 4,558,302, 12/1985) were patented. I also heard that Richard Stallman, of the Free Software Foundation, author of the EMACS editor and leader of the GNU project, ceased to use 'compress' program any more because its LZW algorithm got patented.

Are algorithms patentable? (See [16].) If these patents should turn out to be taken seriously, all compression programs now in use may infringe some of these patents. (Luckily, not all claims made by those algorithm patents seems to be valid.)


The foregoing is a slight modification of what I wrote in 1991. The year 1991 was a very busy year for me. In 1992, I joined the faculty of Matsusaka University. This opportunity should have given me more free time, but as it turned out I got ever busier. I stopped hacking on my compression algorithms; so did Yoshi.

Luckily, all good things in LHA were taken over, and all bad things abandoned, by the new great archiver zip and the compression tool gzip. I admire the efforts of Jean-loup Gailly and others.

A brief historical comment on PKZIP: At one time a programmer for PK and I were in close contact. We exchanged a lot of ideas. No wonder PKZIP and LHA are so similar.

Another historical comment: LHICE and ICE are definitely not written by Yoshi (or me or anyone I know). I think they are faked versions of LHarc.

REFERENCES

[1]
Timothy C. Bell. Better OPM/L text compression. IEEE Transactions on Communications, COM-34(12):1176--1182, 1986.
[2]
Timothy C. Bell, John G. Cleary, and Ian H. Witten. Text Compression. Prentice Hall, 1990.
[3]
R. P. Brent. A linear algorithm for data compression. The Australian Computer Journal, 19(2):64--68, 1987.
[4]
Edward R. Fiala and Daniel H. Greene. Data compression with finite windows. Communications of the ACM, 32(4):490--505, 1989.
[5]
Donald E. Knuth. Dynamic Huffman coding. Journal of Algorithms, 6:163--180, 1985.
[6]
Lawrence L. Larmore and Daniel S. Hirschberg. A fast algorithm for optimal length-limited Huffman codes. Journal of the Association for Computing Machinery, 37(3):464--473, 1990.
[7]
Edward M. McCreight. A space-economical suffix tree construction algorithm. Journal of the Association for Computing Machinery, 23(2):262--272, 1976.
[8]
James A. Storer. Data Compression: Methods and Theory. Computer Science Press, Rockville, MD., 1988.
[9]
James A. Storer and Thomas G. Szymanski. Data compression via textual substitution. Journal of the Association for Computing Machinery, 29(4):928--951, 1982.
[10]
Jeffrey Scott Vitter. Design and analysis of dynamic Huffman codes. Journal of the Association for Computing Machinery, 34(4):825--845, 1987.
[11]
Jeffrey Scott Vitter. Algorithm 673: Dynamic Huffman coding. ACM Transactions on Mathematical Software, 15(2):158--167, 1989.
[12]
Terry A. Welch. A technique for high-performance data compression. IEEE Computer}, 17(6):8--19, 1984.
[13]
Ian H. Witten, Radford M. Neal, and John G. Cleary. Arithmetic coding for data compression. Communications of the ACM, 30(6):520--540, 1987.
[14]
Jacob Ziv and Abraham Lempel. A universal algorithm for sequential data compression. IEEE Transactions on Information Theory, IT-23(3):337--343, 1977.
[15]
Jacob Ziv and Abraham Lempel. Compression of individual sequences via variable-rate coding. IEEE Transactions on Information Theory, IT-24(5):530--536, 1978.
[16]
Edward N. Zalta. Are algorithms patentable? Notices of the American Mathematical Society, 35(6):796--799, 1988.

Haruhiko Okumura,
okumura@matsusaka-u.ac.jp

Last modified: Tue Mar 17 17:02:03 1998

Arturo Campos home page

The column "html" shows the size of the article in html format and if clicked opens it in a new window. The "zip" column contains the zipped article and images or source if any. Note that brighter rows mean new or updated article. If you take the time to read the articles, please take the time to tell me what do you think of them.
 
Title  Html  Zip Last update
Arithmetic coding, entropy coder 20k 7k 22-Jul-1999
Bwt, block reduction 21k  8k 22-Jul-1999
Canonical huffman. 15k  5k 23-Jul-1999
Crc-32, the standard Crc 6k 7k 10-Aug-1999
Finite context modeling 37k 12k 16-Nov-1999
Flexible parsing, improvement of lz  10k 4k 24-Jul-1999
Lz77, also called lzss 40k 14k 23-Jul-1999
Lzw, for gif decoding 27k 9k 23-Jul-1999
Mtf, a transformation scheme 9k 3k 24-Jul-1999
Implementing ppmc with hash tables  111k 327k 21-March-2000 
Quasi Static model 19k 6k 13-Aug-1999
Range coder 24k 8k 17-Nov-1999
Rle, Basic scheme 7k 3k 24-Jul-1999

As you can read in the index I'm going to publish as soon as possible new articles, if you want to get an email when this happens, please fill this form, moreover you can use it to tell me which file format do you prefer. If you have any question about the articles feel free to email me.

My goal is to offer the best articles about compression for free. Thus I need your feedback on the articles to further improve them: What you don't know. What was confusing or unclear. What difficulties you found while implementing it. Or what you miss. Teeling me so is the only way of further improving articles for you, and depends only of you.

Note that since my first articles both my knowledge about compression and english have greatly improved, thus you can notice a difference in quality between old articles or the latest ones. Unfortunately I don't have the time to rewrite all the articles (as I would like to) so instead please tell me which articles do you find worth rewriting, and I'll do so.

[Oct 31, 2004] Data Compression -- mini-tutorial by Greg Goebel.

The spread of computing has led to an explosion in the volume of data to be stored on hard disks and sent over the Internet. This growth has led to a need for "data compression", that is, the ability to reduce the amount of storage or Internet bandwidth required to handle data.

This document provides a survey of data compression techniques. The focus is on the most prominent data compression schemes, particularly popular archivers, JPEG image compression, and MPEG video and audio compression.

[April 22 2000] Slashdot Phillip W. Katz, Creator Of PKZIP, Dead At 37

  • by zpengo (99887) on Saturday April 22 2000, @06:31PM (#1116033) Homepage
    Oh, the memories...

    I used to go down to the local computer store, which had bins and bins of the latest shareware, all on precious 5 1/4 disks. Each one held some sort of magic that would transform my XT with Hercules graphics into a completely absorbing experience.

    Video games, clones of major applications, dinky little Pascal compilers, my first version of Spacewar....

    But there was a key to all of that magic. Back then, there were no auto-installing CDs. There was no "setup.exe" There would just be a single file, with that ever-familiar extension: ".ZIP"

    I had been on the scene long enough to know what was up, so I not only had PKZIP/PKUNZIP installed on my 4 meg harddrive, but I even had it in the PATH.

    A few keystrokes later, the magic was unlocked.

    We don't know how much we owe to this great man. I genuinely mourn his passing.

    37 (Score:4)

    by roman_mir (125474) on Saturday April 22 2000, @06:36PM (#1116035) Homepage Journal
    So many celebrities, poets, actors, revolutionaries, wariers, politicians etc have died on 33 and 37, I tell you, if you pass 37 you'll probably live a long life.

    (to those of us who remember Vladimir Visotskiy) Na zifre 37, kovaren bog, rebrom vopros postavil: ili, ili
    Na etom rubeje legli i Bairon i Rembo a nineshnie kak-to proskochili...
  • Sad day.... by maelstrom (Score:1) Saturday April 22 2000, @06:37PM
  •  

    *sigh* (Score:5)

    by Seumas (6865) on Saturday April 22 2000, @06:37PM (#1116040)
    This is the first I've heard of his death and I have to say that it really makes me feel sad. I'm not aware of much that he's done outside of PKZIP, but I sure remember using ZIP for everything online (especially when a 2400 baud modem was considered fast and a zipped file could half your online time).

    Huffman, Postel, Stevens . . . Now P.W. Katz. I feel guilty for not ever considering any of these people beyond what their program does or does not do for me -- or how I benefitted from their books, until after their death. To think that while we're all out there unzipping our latest copy of the Jargon file or stashing a bunch of porn in a password protected ZIP file, this guy was suffering a serious problem which eventually took his life at the age of *thirty-seven*.

    I'm only 22. I spend all my time working at a desk. I haven't been in-shape for almost six years. I could be next. I could be next and I haven't offered a damn thing to the computer or internet community. These people -- and many others, have.

    I hope that we'll remember these things in subsequent posts in reply to this article. The last thing we need is another disgustingly barbaric replay of the posts we saw when W. Richard Stevens died.

    I hope you have peace, Phillip.

    W. Richard Stevens Slashdot Article [slashdot.org]
    W. Richard Stevens Home Page [kohala.com]
    David Huffman Slashdot Article [slashdot.org]
    Jon Postel Slashdot Article [slashdot.org]
    Jon Postel's Home Page [isi.edu]

  • by farrellj (563) on Saturday April 22 2000, @07:04PM (#1116054) Homepage Journal
    O.K., here is the story as I remember it.

    Phil wrote a better compression program that was compatible with System Enhancements Associates (SEA) program called ARC. So they litigated. And so Phil went off and found a better algorithem for compression, and brought out PKZIP.Many people in the BBS community thought that SEA was a little heavyhanded (Perception, I don't know the reality), and moved to PKZIP. Others moved over for the speed and the better compression. The rest is history.

    See also "arc wars" [cs.hut.fi]MIT Jargon File ver 299. This story seems to have been dropped from the current Jargon File [jargon.org] for some reason.

    ttyl
    Farrell McGovern
    Former Sysop, Data/SFnet (One of the first few hundred Fidonet BBSs!) and Solsbury Hill, founding member of PODSnet.

  • by DeepDarkSky (111382) on Saturday April 22 2000, @07:04PM (#1116055)
    I definitely remember Phil Katz and all the controversy surrounding him, and how grateful I was to have discovered his programs. I remember the first compression program which was SEA's ARC program. It was very slow. Then my friend and I discovered PKARC and PKXARC, which were much faster than ARC. As PKARC gained popularity because of its overall superiority, SEA sued Phil Katz, and he in turn created PKPAK/PKUNPAK (I think it was still paired like that). Tha PKPAK series didn't last long. The PKZIP series came out next, and that was the series that created the ubiquitous ZIP format that we see today. If I remember correctly, PKZ204G was the last official DOS version of the program, and there were plenty of trojans, etc. that were going around, and Phil created self-authenticating zip files, etc. Lots of neat little cool things. I also remember that other programs were giving PK a run for his money, such as ARJ and LHARC, but they never achieved the overall speed/performance/compression that PKZIP ever did (they were often better in one thing or another but not overall). Then WINZIP came out, and I kind lost sight of PK.

    I still have thousands of ZIP files that were zipped with PKZIP. If it wasn't for him, I wouldn't have been as into computers as I am, it was because of those early days of playing around with PKARC and PKXARC that really got me started. I am terribly sad to see him go and in such (I think) indignant way.

  • Well, PKZip seems to have stopped development around 1993, well before WinZip became popular. PKZIp v2.04g was pretty much the last version I know of, and it came out february 1, 1993. Up until then there had been fairly frequent updates, but throughout 1993, 1994, and 1995, PKZIp v2.04g for DOS remained the standard compression tool. Only then, after 2-3 years of no updates, did other tools like WinZip become popular. PKZIp finally made a Windows product in 1997 or 1998, but they were long gone by then. I'm not sure what led to the development halt, but the original stuff is fine coding...
  • by Harinath (26296) on Saturday April 22 2000, @07:28PM (#1116067) Homepage
    IIRC, the ZIP file format was made public domain, thus allowing the Info-ZIP people to write a program that reads ZIP archives, which in turn allowed WinZip to not have to license software from PKware.

    Unlike LZW, the ZIP "deflate" algorithms (LZ77 + Shannon Fano encoding) are unemcumbered. These compression algorithms are used in GNU Zip (gzip) partly for that reason. I think gzip can even read .zip archives with only one file inside. The zip algorithm is also in the zlib library, which is used in the PNG format, for one. The "deflate" algorithm is also described in RFC 1951.

    So, thanks PK, for providing one of the tools that enable us to thumb our noses at Unisys :-)
  • by Megane (129182) on Saturday April 22 2000, @07:31PM (#1116069)
    From the Milwaukee Journal Sentinel:
    The original obituary notice [jsonline.com]
    A more complete article [jsonline.com]

    One interesting quote:
    "It was just a hobby," he said. "I didn't expect it to turn into a business."

    I had a moderately successful shareware program myself during the '80s, and it sure didn't help my life much. Fortunately I have no interest in booze or drugs -- they just get in the way of hacking. And also fortunately, I let it go when it wasn't successful any more. Maybe a little later than I should have, but I did move on.
  • by ContinuousPark (92960) on Saturday April 22 2000, @07:32PM (#1116071)
    Couldn't find much so far about him but I came across this page where several patents on data compression algorithms are mentioned and this led me to one of his patents [ibm.com], a so-called string searcher and compressor.

    It would be interesting to know if what the patent decribes is the technology behind PKZIP.

    I first found out about this on CNET Download.com's front page; there was this little message in memoriam of PK but I don't think it was mentioned on News.com; that was strange. This is a sad event and I think it would be more convenient and respectful if we didn't get to know the details of his death just because it turned out to be a morbid and attention-attracting story for the media.

 

News - Paid Death Notices - Katz, Phillip W.

Katz, Phillip W.

Publication Date: April 19, 2000
Age 37. Passed away unexpectedly on Fri., April 14, 2000. Beloved son of Hildegard and beloved brother of Cynthia. Also survived by other relatives and friends. Phil was a graduate of UWM Computer Science Engineering Program. He was the author of the PKZIP/PKUNZIP software and owner of PKWARE Inc. Co. Private services have been held. Memorials to the charity of your choice would be appreciated.

ZWASKA FUNERAL HOME 354-5330 Serving the Family

The Rise and Fall of a Software Star - Phil Katz Loved Code -- and Liquor

 

 

 


Recommended Links


In case of broken links please try to use Google search. If you find the page please notify us about new location
Google     


FAQs

compression FAQ


eBooks

Data Compression Debra A. Lelewer and Daniel S. Hirschberg

Sometimes you can find old O'Reilly graphic formats book on the web too:

Encyclopedia of Graphics File Formats, 2nd Edition
The Complete Reference on CD-ROM with Links to Internet Resources

By James D. MurrayWilliam vanRyper
2nd Edition May 1996 

It contains a chapter about graphic algorithms, see TOC


Recommended Papers

AnandTech - All About Compression, Part I

Games++ Games & Game Programming-Algorithms, Tutorials, Source Code, Direct X, C-C++, Java, and Much More.

E186-1 Michael Schindler


Huffman code

Queue, Huffman, Compression and Encryption Applets in C++

Forums - Efficient Huffman implementation

...it is certainly possible to construct the Huffman code without an explicit tree. The main observation is that there is a canonical Huffman code entirely defined by the code lengths of the letters. And, there is a simple data structure to compute the latter in linear time after  a sorting phase without explicitly constructing the tree. This does  save half the memory required for an explicit tree. For example, see my Huffman implementation in the Vcodex package at www.research.att.com/sw/tools.


LZH


LZW

LZW(Lempel-Ziv-Welch) is the most common algorithm used in computer graphics. This lossless method of data compression is found in several image file formats, such as GIF and TIFF, and is also part of the V.42bis modem compression standard and PostScript Level 2.

In 1977, Abraham Lempel and Jakob Ziv created the first of what we now call the LZ family of substitutional compressors. The LZ77 compression algorithms are commonly found in text compression and archiving programs, such as compress, zoo, lha, pkzip, and arj. The LZ78 compression algorithms are more commonly used to compress binary data, such as bitmaps.

In 1984, while working for Unisys, Terry Welch modified the LZ78 compressor for implementation in high-performance disk controllers. The result was the LZW algorithm that is commonly found today.  It is covered by U.S. Patent 4,558,302 (plus its foreign counterparts, issued or pending).  All patents are held by Unisys Corporation. That means that without obtaining a license from Unisys you formally cannot read or write a GIF file :-(. See TF_WP_unisys for more information.


Bijective  compression

I recommend first check  DataCompression.info - Tutorials, Reference, Presentations

Creating a ''One to One Adaptive Huffman Compression-Decompression Program for the real world of 8 bit byte files

The Mandala Centre - Compression and Security - One on one compression FAQ

One to One Compression -- This site discusses a characteristic of some compression algorithms that the author refers to as One to One compression. In a nutshell, this property means that for any file X, F( F'( X ) ) == X. (F is either the compressor or decompressor, and F' is its opposite number.) This is definitely not the case for most conventional compression algorithms.

Bijective Arithmetic Encoding with Really Good End Treatment

DataCompression.info - Adaptive Huffman Coding
... Adaptive Huffman Encoding, Rate, A library to perform adpative ... an implementation of
Vitter's dynamic Huffman compressor, adapted so that it is bijective. ...


History

patenting gifs, lzw compression, unisys gif patent, software for gifs, products using lzw compression algorithm, png file format, development of png file format

Gifs and patents

The patent for LZW compression algorithm is held by Unisys since 1985. Compuserve Gif uses this algorithm. In 1994, Unisys implemented their patent rights by asking for license fees from developers of products that read or write gifs. This announcement took developers by surprise and caused quite a stir on the Internet. There were rumours that Unisys might charge web developers for usage of gifs on their sites. At the same time, it was argued that Gifs are the products of LZW compression and the patent does not extend to the usage of the end product. Actually, web developers had nothing to fear since Unisys planned to collect license fees only from software companies whose products employ LZW algorithm.

Web developers should not be worried. They are free to use gifs on their web sites. However, if you've developed a software that creates or modifies gifs, you would have to pay licensing fees to Unisys.

The business acumen of the people at Unisys has to be admired. It seems that they had waited for Gifs to become popular and beneficial (from a web developers' point of view) before implementing the patent rights. However, there was an interesting and fortunate (?) side effect of this story. It lead to the development of the PNG file format. PNG is a much better and more versatile image format than both JPG and GIF. It has all the bells and whistles of these two file formats.

At the time of writing the browser support for PNG is still quite patchy, though the future does look bright.

LZW compression: conjugations patented

Google Groups View Thread LZW Patent Expiry

Can the LZW algorithm be used freely in all countries after the US
patent expiry in June 2003? I have gathered this information from
newsgroups:

1. The main US patent expires the 20 June 2003. The valid time can not
be extended, if they do not make adjustments/extensions to the patent.
US patent:
http://l2.espacenet.com/espacenet/viewer?PN=US4558302&CY=ep&LG=en&DB=EPD

2. There are other US patents covering this field, but these are filed
in 1999, and are therefore not enforceable. Are there other US
patents?

3. Unisys claim that they have patents in Canada, France, Germany,
U.K.  and Italy, but they have never sued any company in European
court though they have had chances. There are some references in the
US patent: CA1223965, DE3476617D, EP0129439. Are these patents
enforceable? Are there other patents?

4. Patents are pending in Japan. The US patent refers to these:
JP2610084B2, JP5068893B, JP60116228, JP7249996, JP8237138. Do we have
to worry about these?


Can anyone confirm the information above and answer the questions?
Does this mean that we can use the lzw algorithm freely in all
countries after June 2003?
 

 

Etc

Metalinks


Copyright © 1996-2009 by Dr. Nikolai Bezroukov. www.softpanorama.org was created as a service to the UN Sustainable Development Networking Programme (SDNP) in the author free time. Submit comments This document is an industrial compilation designed and created exclusively for educational use and is placed under the copyright of the Open Content License(OPL). Site uses AdSense so you need to be aware of Google privacy policy. Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.

Disclaimer:

Last modified: August 14, 2009