RLE - Run Length EncodingWritten by Paul BourkeAugust 1995 |
Source code
Standard compression C source: rle.c |
Introduction
Run length encoding is a straightforward way of encoding data so that it takes up less space. It is relies on the string being encoded containing runs of the same character. Consider storing the following short string.
abcddddddcbbbbabcdef
There are 20 letters above, if each is stored as a single byte that is 20 bytes in all. However, the runs of "d" and "b' above can be stored as two bytes each, the first indicating how many letters in the run. For example, the following run length encoded string takes only 14 bytes.
abc6dc4babcdef
In general of course it needs to be a bit more sophisticated than the above. For example, there is no way in the above encoding to encode strings with numbers, that is, how would one know whether the number was the length of the run or part of the string content. Also, one would not want to encode runs of length 1 so how does one tell when a run starts and when a literal sequence starts.
The common approach is to use only 7 of the 8 bits to indicate the run length, this is normally interpreted as a signed byte. If the length byte is positive it indicates a replicated run (run of the following byte). If the number is negative then it indicates a literal run, that is, that number of following bytes is copied as is. To illustrate this the following sequence of bytes would encode the example string given above, it requires 17 bytes.
-3 a b c 6 d -1 c 4 b 6 a b c d e f
While 17 bytes to encode what would take 20 without RLE may not sound like much, but as the frequency and length of the repeating characters increases the compression ratio gets better.
Worst caseOf course RLE will not always result in a compression, consider a string where the next character is different from the current character. Every 127 bytes will require a extra byte to indicate a new literal run length.
Best caseThe best case is when 128 identical characters follow each other, this is compressed into 2 bytes instead of 128 giving a compression ratio of 64.
ExampleFor this reason RLE is most often used to compress black and white or 8 bit indexed colour images where long runs are likely. RLE compression is therefore what was used for the original low colour images expected for the Macintosh PICT file format. RLE is not generally used for high colour images such as photographs where in general each pixel will vary from the last.