Jump to content

Projects/Liberty/Appendix 3

From KDE Community Wiki
Revision as of 02:50, 10 January 2013 by Purple-bobby (talk | contribs) (Huffman Style Processing: omission from Sliding-Window Pointer decode)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)

Appendix 3 - HUS/VIP Decompression

In the .hus and .vip format files, the attribute, X displacement, and Y displacement of each stitch are held compressed using a version of the ARJ compression. An attribute has the values: 0x80 (stitch), 0x81 (jump), 0x84 (thread stop), and 0x90 (end stop); that is bit 7 is always set, bit 0 indicates jump without stitch, bit 2 indicates thread change stop, and bit 4 end of pattern stop; bits 1, 3, 5, 6 appear unused.

The compression uses a sliding window of past uncompressed values to source repeated strings of bytes, instead of outputting the string of bytes, a pointer (and length) back to an earlier version of the string is (coded and) output. A feature of this is, a self repeating string can be copied before it is fully output, for example, if 'ABC' has just been output and there is now a string 'ABCABCABC', the length would be 9, but the pointer would only point back 3 bytes (to the 'A') as the 'ABC' is copied to the output, they become available to copy again.

When there are no bytes to copy in the recent output, the new byte has to be output as is, a length of zero (or less than 3) indicates verbatim byte and no pointer. This is encoded as values 0 to 255 indicate verbatim byte and no pointer, values 256 or greater indicate length, but the true length is the value minus 256 plus 3.

How to Decompress

Allocate an output buffer of enough bytes to hold the uncompressed data, access is needed to some of the past decoded bytes. Either fetch bytes and add them to the output, or fetch length and pointer, then copy past bytes to the output. Repeat until output buffer is full.

Pseudo-code of Sliding Window Decompression:

 Let uncompressed := new vector[0 to uncompressed-size - 1] of byte 
     -- allocate output buffer
 Let count := 0 -- the number of bytes output, index to output buffer
 While count < uncompressed-size Do
   Let code := Fetch-Byte-or-Length
   If code < 256 Then -- is byte
     Let uncompressed[count] := code
     Let count := count + 1
   Else
     Let length := code - 256 + 3 -- min length is 3
     Let pointer := Fetch-Pointer
     For length Times Do
       If count < uncompressed-size Then -- not buffer overrun
         If count > pointer Then -- not buffer under-run
           Let uncompressed[count] := 
       	uncompressed[count - pointer - 1]
         Else
           Let uncompressed[count] := 0
         End if
       End if
       Let count := count + 1
     End for
   End if
 End while

Huffman Coding of Bytes or Lengths and Pointers

The sliding-window byte/length and pointer codes are packaged, the first 16 bits gives the package length in sliding-window codes.

To reduce the bit length of the byte or length symbols and the pointers, they are Huffman coded to reduce the length of the most frequent values. The Huffman tree is passed to the compressed output by sending the length of the Huffman code for each symbol in the symbol-set, run of zero encoding is used to fill in the length table of the many unused codes (of zero length). For the byte/length symbol Huffman code lengths, these are themselves Huffman coded.

The number of Huffman code byte/length symbol lengths is given in the next 5 bits (value 0 to 31, but actually limited to 19). If the number is zero then there is only one symbol which is contained in the next 5 bits and its code length is zero. After the third symbol, the next 2 bits provide a 0 to 3 length run of zeros. After the 'number' of symbols the rest of the lengths are set to zero.

Each symbol is processed thus, the next 3 bits give a value 0 to 7; the values 0 to 6 are used directly, but if the value is 7, then a series of next bits, up to 13. The series ends after the first 0 value bit or after the 13th bit, each 1 value bit increments the (length) value (giving a range 7 to 20); I suspect any value over 18 is an error, so a maximum of eleven 1 bits followed by a twelfth bit of 0. The value 0 to 18 is used to set the length of the next code.

A (virtual) Huffman tree is built using these 19 ordered length and symbol values. The next variable length codes can be used to uniquely identify a value.

The next 9 bits give the number byte/length symbols defined (and is typically 511 since the end code is coded as 510 - a bug?), again a 'number' of zero implies the next 9 bits give the only code used and its Huffman code does not need to be transmitted. For 'number' of times, a variable length Huffman code appear in the next bits and gives a value 0 to 18.

There are up to 19 symbols used to compress the byte/length symbol lengths. The symbol values 0 to 2 allow for a run of zeros: 0 implies run-length of one, 1 implies 3 plus the value of the next 4 bits, and 2 implies 20 plus the value of the next 9 bits. The remaining 16 values give the Huffman code length 1 to 16 bits (that is symbol value minus 2).

A (virtual) Huffman tree is built using these 511 ordered length and symbol values.

There are 511 byte/length symbols, byte ranges from 0 to 255 and usable length ranges from 3 to 256, where a length of 257 marks the end of the compressed data.

The pointer values are Huffman encoded and appear next, similar to above, the next 5 bits gives the number pointer symbols ranging from 0 to 19 (or 15). Again, 0 implies there is only one symbol and it is in the next 5 bits. Otherwise, the next 3 bits give a code length 0 to 6, and 7 again is used to increment length by each 1 bit until a 0 bit. The end of the table is filled with 0.

Fetch Byte or Length

Use the next variable length code to find the value of the Byte or Length, by descending the Huffman tree.

Fetch Pointer

Use the next variable length code to find the value of the Pointer, by descending the Huffman tree.

Huffman Style Processing

Initialize

 Let NT = 19
 Let NC = 511
 Let NP = 15
 Let package-remaining := 0
 -- symbol/length code-length code-lengths
 Let T-len := new vector[0 to NT - 1] of (0 to 16)
 -- symbol/length code-lengths
 Let C-len := new vector[0 to NC - 1] of (0 to 16)
 -- pointer code-lengths
 Let P-len := new vector[0 to NP - 1] of (0 to 16)
 

Decode Sliding-window Byte/Length

 -- prepare Huffman decoding trees
 If package-remaining = 0 Then
   Let package-remaining := Get-Bits( 16 )
   Let T-tree := Build-T-tree
   Let C-tree := Build-C-tree( T-tree )
   Let P-tree := Build-P-tree
 End if
 Let package-remaining := package-remaining - 1
 Let byte-length := Huff-Decode( C-tree )
 Return byte-length
 

Decode Sliding-window Pointer

 Let pointer := Huff-Decode( P-tree )
 If pointer <> 0 Then
   Let pointer := pointer - 1
   Let pointer := 2^pointer + Get-Bits( pointer )
 End if
 Return pointer
 

Build T tree

 T-tree.Delete-All
 Let n := Get-Bits( 5 )
 Let n := Min( n, NT )
 If n = 0 Then
   Let c := Get-Bits( 5 )
   T-tree.Append( length := 0, bits := 0, symbol := c )
 Else
   Let i := 0
   While i < n Do
     Let c := Get-Bits( 3 )
     If c = 7 Then
       Let b := Get-Bits( 1 )
       While b = 1 Do
         Let c := c + 1
         Let b := Get-Bits( 1 )
       End while
     End if
     Let T-len[i] := c
     Let i := i + 1
     If i = 3 Then
       Let c := Get-Bits( 2 )
       While c > 0 Do
         Let T-len[i] := 0
         Let i := i + 1
         Let c := c - 1
       End while
     End if
   End while
   While i < NT Do
     Let T-len[i] := 0
     Let i := i + 1
   End while
   Let b := 0
   For len := 1 to 16 Do
     Let b := b * 2 -- shift a 0 into LSB
     For c := 0 to NT-1 Do
       If T-len[c] = len Then
         T-tree.Append( length := len, bits := b, symbol := c )
         Let b := b + 1
       End If
     End for
   End for
 End If
 Return T-tree
 

Build C tree ( T-tree )

 C-tree.Delete-All
 Let n := Get-Bits( 9 )
 Let n := Min( n, NC )
 If n = 0 Then
   Let c := Get-Bits( 9 )
   C-tree.Append( length := 0, bits := 0, symbol := c )
 Else
   Let i := 0
   While i < n Do
     Let c := Huff-Decode( T-tree )
     If c <= 2 Then
       If c = 0 Then
         Let c := 1
       Else If c = 1 Then
         Let c := 3 + Get-Bits( 4 )
       Else
         Let c := 20 + Get-Bits( 9 )
       End if
       While c > 0 Do
         Let C-len[i] := 0
         Let i := i + 1
         Let c := c - 1
       End while
     Else
       Let C-len[i] := c - 2
       Let i := i + 1
     End if
   End while
   While i < NC Do
     Let C-len[i] := 0
     Let i := i + 1
   End while
   Let b := 0
   For len := 1 to 16 Do
     Let b := b * 2 -- shift a 0 into LSB
     For c := 0 to NC-1 Do
       If C-len[c] = len Then
         C-tree.Append( length := len, bits := b, symbol := c )
         Let b := b + 1
       End If
     End for
   End for
 End If
 T-tree.Delete-All
 Return C-tree
 

Build P tree

 P-tree.Delete-All
 Let n := Get-Bits( 5 )
 Let n := Min( n, NP )
 If n = 0 Then
   Let c := Get-Bits( 5 )
   P-tree.Append( length := 0, bits := 0, symbol := c )
 Else
   Let i := 0
   While i < n Do
     Let c := Get-Bits( 3 )
     If c = 7 Then
       Let b := Get-Bits( 1 )
       While b = 1 Do
         Let c := c + 1
         Let b := Get-Bits( 1 )
       End while
     End if
     Let P-len[i] := c
     Let i := i + 1
   End while
   While i < NP Do
     Let P-len[i] := 0
     Let i := i + 1
   End while
   Let b := 0
   For len := 1 to 16 Do
     Let b := b * 2 -- shift a 0 into LSB
     For c := 0 to NP-1 Do
       If P-len[c] = len Then
         P-tree.Append( length := len, bits := b, symbol := c )
         Let b := b + 1
       End If
     End for
   End for
 End If
 Return P-tree
 

Huff Decode ( tree )

 Let sym := 0
 Let b := 0 -- bits
 Let i := 0 -- current tree entry
 Let n := tree.length
 Let l := 0 -- code length
 For i := 0 to n - 1
   If l < tree[i].length Then
     Let b := b * 2^(tree[i].length - l)
     Let b := b + Get-Bits( tree[i].length - l )
     Let l := tree[i].length
   End if
   If tree[i].length = l And tree[i].bits = b Then
     Let sym := tree[i].symbol
     Exit for
   End if
 End for
 Return sym