ULZ is a compression format.
This lossless LZ compression format is designed to be mildly better than RLE but not too difficult to host on Uxn systems. The compressed file contains a stream of commands, not unlike a virtual machine bytecode. There are two types of instructions, LIT and CPY; the CPY opcode has a short mode (CPY1) and a longer mode (CPY2). Decoding works by reading commands from the input until there is no more input.
Instruction | Bit 7 | Bit 6 | Length field | Following bytes
---|---|---|---|---
LIT | 0 | (part of length) | 7 bits | Bytes to copy at pointer...
CPY1 | 1 | 0 | 6 bits | Offset from pointer (1 byte)
CPY2 | 1 | 1 | 14 bits (6 bits, then 1 more byte) | Offset from pointer (1 byte)
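As an illustration of the layout in the table, here is a minimal sketch of a command-header parser in C. The names `ulz_parse`, `struct ulz_cmd`, `ULZ_LIT` and `ULZ_CPY` are hypothetical and not part of the format; the sketch assumes the length and offset adjustments (plus 1 for LIT, plus 4 for CPY, plus 1 for the offset) described in the instruction sections of this page.

```c
#include <stddef.h>

enum ulz_op { ULZ_LIT, ULZ_CPY };

struct ulz_cmd {
	enum ulz_op op;
	unsigned length; /* number of bytes the command produces */
	unsigned offset; /* CPY only: distance back from the output pointer */
	size_t size;     /* number of header bytes consumed */
};

/* Parse the command header at src, where n bytes are available.
   Returns 1 on success, 0 if the header is truncated. */
int
ulz_parse(const unsigned char *src, size_t n, struct ulz_cmd *cmd)
{
	unsigned c;
	if(n < 1)
		return 0;
	c = src[0];
	if(!(c & 0x80)) { /* LIT: 7-bit length field, plus 1 */
		cmd->op = ULZ_LIT;
		cmd->length = c + 1;
		cmd->offset = 0;
		cmd->size = 1;
		return 1;
	}
	cmd->op = ULZ_CPY;
	if(c & 0x40) { /* CPY2: 6 high bits, then one more length byte */
		if(n < 3)
			return 0;
		cmd->length = ((c & 0x3f) << 8 | src[1]) + 4;
		cmd->offset = src[2] + 1;
		cmd->size = 3;
	} else { /* CPY1: 6-bit length field */
		if(n < 2)
			return 0;
		cmd->length = (c & 0x3f) + 4;
		cmd->offset = src[1] + 1;
		cmd->size = 2;
	}
	return 1;
}
```

For instance, the header byte `28` parses as a LIT of 41 bytes, and the pair `b5 12` parses as a CPY1 of 57 bytes starting 19 bytes back.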
As the output file is being assembled, a pointer moves along, and the program appends previously written data, starting from a position up to 256 bytes behind the pointer. When the copy length exceeds that distance, the copy runs into bytes it has just written, so the available bytes repeat.
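The looping behaviour falls out of copying forward one byte at a time. The following is a small sketch of that idea; the function name `ulz_copy_back` and its parameters are hypothetical, and the caller is assumed to have checked that `distance` does not exceed `pos` and that `out` has room for the new bytes.

```c
#include <stddef.h>

/* Append length bytes to out at write position pos, reading from distance
   bytes behind the write position. Returns the new write position.
   The copy runs forward one byte at a time on purpose: when length is
   greater than distance, it rereads bytes written earlier in the same copy. */
size_t
ulz_copy_back(unsigned char *out, size_t pos, size_t distance, size_t length)
{
	size_t i;
	for(i = 0; i < length; i++, pos++)
		out[pos] = out[pos - distance];
	return pos;
}
```

With `ab` already written, copying 6 bytes from a distance of 2 yields `abababab`. Note that `memcpy` is undefined for overlapping regions and `memmove` preserves the original source bytes, so neither would reproduce this repetition.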
Encoded Data
2842 6c75 6520 6c69 6b65 206d 7920 636f 7276 6574 7465 2069 7473 2069 6e20 616e 6420 6f75 7473 6964 650a 8128 2361 7265 2074 6865 2077 6f72 6473 2049 2073 6179 0a41 6e64 2077 6861 7420 4920 7468 696e 6b8a 2909 6665 656c 696e 6773 0a54 8022 066c 6976 6520 696e 8050 1720 6d65 0a49 276d 2062 6c75 650a 4461 2062 6120 6465 6520 6482 0900 69b5 12
The LIT Instruction
The LIT instruction appends a number of bytes to the output equal to the 7 lower bits of the instruction byte, plus 1. The output pointer is moved by that same distance.
    Blue like my corvette its in and outside
    are the words I say
    And what I thinkfeelings
    Tlive in me
    I'm blue
    Da ba dee di
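This page does not describe an encoder, but the LIT rule implies that a single command carries between 1 and 128 bytes, since the 7-bit field holds 0 to 127. As a sketch under that assumption, a hypothetical helper `ulz_emit_lit` could split a longer literal run into several LIT commands:

```c
#include <stddef.h>

/* Write n literal bytes from src to dst as one or more LIT commands.
   Returns the number of bytes written to dst, which needs room for
   n + (n + 127) / 128 bytes. Hypothetical helper, not part of the format. */
size_t
ulz_emit_lit(unsigned char *dst, const unsigned char *src, size_t n)
{
	size_t written = 0, chunk, i;
	while(n > 0) {
		chunk = n > 128 ? 128 : n; /* 7 bits hold 0..127, meaning 1..128 bytes */
		dst[written++] = (unsigned char)(chunk - 1); /* high bit clear: LIT */
		for(i = 0; i < chunk; i++)
			dst[written++] = src[i];
		src += chunk;
		n -= chunk;
	}
	return written;
}
```

Emitting the 41 bytes of the first line of the example produces the header byte `28` followed by the text itself, which matches the start of the encoded data.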
The CPY Instruction
The CPY instruction copies a number of bytes equal to its length field plus 4, starting at a negative offset from the output pointer equal to its offset byte plus 1. In other words, an offset of 0 means going back by 1 byte into the history. Offsets should be treated as the distance from the end of the last byte that was written.
    Blue like my corvette its in and outside
    -----are the words I say
    And what I think--------------feelings
    T----live in---- me
    I'm blue
    Da ba dee d------i---------------------------------------------------------
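Seen from the writing side, the same rule means a copy of `length` bytes from `distance` bytes back is stored as `length - 4` and `distance - 1`. The sketch below shows one way this could be encoded; `ulz_emit_cpy` is a hypothetical helper, and the choice of CPY1 whenever the length field fits in 6 bits is an assumption rather than something the format requires.

```c
#include <stddef.h>

/* Encode one CPY command. length is the number of bytes to copy (4..16387)
   and distance is how far back the source starts (1..256).
   Returns the number of bytes written to dst (2 or 3), or 0 if out of range. */
size_t
ulz_emit_cpy(unsigned char *dst, unsigned length, unsigned distance)
{
	unsigned field;
	if(length < 4 || length > 0x3fff + 4 || distance < 1 || distance > 256)
		return 0;
	field = length - 4;
	if(field < 0x40) { /* CPY1: bits 1 0, then a 6-bit length field */
		dst[0] = (unsigned char)(0x80 | field);
		dst[1] = (unsigned char)(distance - 1);
		return 2;
	}
	/* CPY2: bits 1 1, the 6 high length bits, then the low length byte */
	dst[0] = (unsigned char)(0xc0 | (field >> 8));
	dst[1] = (unsigned char)(field & 0xff);
	dst[2] = (unsigned char)(distance - 1);
	return 3;
}
```

A 5-byte copy from 41 bytes back encodes as `81 28`, and a 57-byte copy from 19 bytes back encodes as `b5 12`; both pairs appear in the encoded data above.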
Decoding the 137 bytes of compressed data yields the following 209 bytes. Note that this short example is not long enough to include usage of the CPY2 instruction.
    Blue like my corvette its in and outside
    Blue are the words I say
    And what I think
    Blue are the feelings
    That live inside me
    I'm blue
    Da ba dee da ba di
    Da ba dee da ba di
    Da ba dee da ba di
    Da ba dee da ba di
Image Compression
The compression works best with tiled assets in the icn or chr formats.
Original: 4096 bytes. Compressed: 2430 bytes (59.32% of the original size).
Implementation
Here's an implementation in Uxntal.
    @decode_ulz ( str* -- )
        ;mem .ptr STZ2
        .File/name DEO2
        &stream ( -- )
            #0001 .File/length DEO2
            ;&b DUP2 .File/read DEO2
            .File/success DEI2 ORA ?{ POP2 JMP2r }
            [ LIT &b $1 ] decode_ulz_byte !&stream

    @decode_ulz_byte ( byte -- )
        DUP #80 AND ?op-cpy

    @op-lit ( byte -- )
        #00 SWP INC2 DUP2 .File/length DEO2
        .ptr LDZ2 DUP2 .File/read DEO2
        ADD2 .ptr STZ2
        JMP2r

    @op-cpy ( byte -- )
        #7f AND DUP #40 AND ?&long
        #00 SWP !&copy
        &long ( byte -- )
            #3f AND getc
        &copy ( length* -- )
            .ptr LDZ2 #00 getc INC2 SUB2 STH2
            #0004 ADD2 #0000
        &l ( -- )
            ( get ) DUP2 STH2kr ADD2 LDA
            ( put ) .ptr LDZ2 STAk INC2 .ptr STZ2 POP
            INC2 GTH2k ?&l
        POP2 POP2 POP2r
        JMP2r
And an implementation in C89.
    #include <stdio.h>
    #include <stdlib.h>

    unsigned char *mem, *ptr;

    int
    decode_ulz(FILE *src)
    {
        int c, i, length; /* c is an int so that EOF differs from byte 0xff */
        unsigned char *copy;
        ptr = mem = malloc(0x10000);
        if(!mem)
            return -1;
        while((c = getc(src)) != EOF) {
            if(c & 0x80) { /* CPY */
                if(c & 0x40) /* CPY2, 14-bit length */
                    length = (c & 0x3f) << 8 | getc(src);
                else /* CPY1, 6-bit length */
                    length = c & 0x3f;
                copy = ptr - (getc(src) + 1);
                for(i = 0; i < length + 4; i++)
                    *(ptr++) = *(copy++);
            } else /* LIT */
                for(i = 0; i < c + 1; i++)
                    *(ptr++) = getc(src);
        }
        return ptr - mem;
    }
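The C89 decoder above trusts its input and writes into a fixed 64 KiB buffer. Where the compressed data comes from an untrusted source, a variant that works on memory buffers and checks every bound may be preferable. The following is a sketch of that idea, not a reference implementation; the name `ulz_decode_mem`, its signature and the `(size_t)-1` error value are assumptions made for this example.

```c
#include <stddef.h>

/* Decode n bytes of ULZ data from src into dst, which holds cap bytes.
   Returns the decoded length, or (size_t)-1 on truncated input, on a copy
   that reaches before the start of the output, or when dst is too small. */
size_t
ulz_decode_mem(const unsigned char *src, size_t n, unsigned char *dst, size_t cap)
{
	size_t in = 0, out = 0, length, distance, i;
	unsigned c;
	while(in < n) {
		c = src[in++];
		if(c & 0x80) { /* CPY */
			length = c & 0x3f;
			if(c & 0x40) { /* CPY2: read the second length byte */
				if(in >= n)
					return (size_t)-1;
				length = length << 8 | src[in++];
			}
			if(in >= n)
				return (size_t)-1;
			length += 4;
			distance = (size_t)src[in++] + 1;
			if(distance > out || length > cap - out)
				return (size_t)-1;
			/* forward byte-by-byte copy, so overlapping copies repeat */
			for(i = 0; i < length; i++, out++)
				dst[out] = dst[out - distance];
		} else { /* LIT */
			length = (size_t)c + 1;
			if(length > n - in || length > cap - out)
				return (size_t)-1;
			for(i = 0; i < length; i++)
				dst[out++] = src[in++];
		}
	}
	return out;
}
```

Run over the 137 bytes of the encoded example on this page with a large enough `dst`, this sketch should return 209 and reproduce the lyrics.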