XXIIVV

ULZ is a compression format.

This lossless LZ compression format is designed to be mildly better than RLE but not too difficult to host on Uxn systems. The compressed file contains a stream of commands, not unlike a virtual machine bytecode. There are two types of instructions, LIT and CPY; the CPY opcode has a short mode and a longer one. Decoding works by reading commands from the input until there is no more input.

Byte                      | Byte                  | Byte
0  LIT  (length, 7 bits)  | Bytes to copy at pointer...
10 CPY1 (length, 6 bits)  | Offset from pointer   |
11 CPY2 (length, 14 bits across two bytes)        | Offset from pointer
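
As a rough illustration of the layout above, the instruction type can be read from the top two bits of the first byte. This is only a sketch: the names ulz_op, ulz_opcode and ulz_header_size are made up for the example and are not part of the format.

```c
/* Hypothetical names for this sketch; the format only names LIT and CPY. */
enum ulz_op { ULZ_LIT, ULZ_CPY1, ULZ_CPY2 };

/* Classify an instruction byte by its top bits. */
enum ulz_op
ulz_opcode(unsigned char b)
{
	if(!(b & 0x80))
		return ULZ_LIT; /* 0xxxxxxx */
	if(!(b & 0x40))
		return ULZ_CPY1; /* 10xxxxxx */
	return ULZ_CPY2; /* 11xxxxxx */
}

/* Number of header bytes, counting the instruction byte itself:
   LIT is followed directly by its literal bytes, CPY1 by an offset byte,
   CPY2 by a second length byte and then an offset byte. */
int
ulz_header_size(unsigned char b)
{
	switch(ulz_opcode(b)) {
	case ULZ_LIT: return 1;
	case ULZ_CPY1: return 2;
	default: return 3;
	}
}
```

For instance, the byte 0x28 is a LIT, while 0x81 and 0xb5 are both CPY1.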

As the output file is being assembled, a pointer moves along, and the program appends previously written data found behind the pointer's position, up to a maximum of 256 bytes back. When the copy length exceeds the distance back from the output pointer, the copy runs into bytes it has just written, so the available bytes repeat in a loop.
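
The looping behaviour falls out of copying one byte at a time, front to back. Here is a minimal sketch of that copy, assuming the output lives in a plain array; ulz_copy_back and its parameters are hypothetical names chosen for the example.

```c
#include <stddef.h>

/* Copy `length` bytes that start `distance` bytes behind position `pos`
   of `out`, one byte at a time. Returns the new output position.
   The caller must ensure distance <= pos and that `out` is large enough. */
size_t
ulz_copy_back(unsigned char *out, size_t pos, size_t distance, size_t length)
{
	size_t i;
	for(i = 0; i < length; i++, pos++)
		out[pos] = out[pos - distance]; /* may read a byte written by this same loop */
	return pos;
}
```

With "ab" already written, copying 6 bytes from a distance of 2 leaves "abababab" in the output. A block copy such as memmove would not give this result, since it behaves as if the source were copied aside first; the byte-by-byte order is what makes the pattern repeat.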

Encoded Data

2842 6c75 6520 6c69 6b65 206d 7920 636f
7276 6574 7465 2069 7473 2069 6e20 616e
6420 6f75 7473 6964 650a 8128 2361 7265
2074 6865 2077 6f72 6473 2049 2073 6179
0a41 6e64 2077 6861 7420 4920 7468 696e
6b8a 2909 6665 656c 696e 6773 0a54 8022
066c 6976 6520 696e 8050 1720 6d65 0a49
276d 2062 6c75 650a 4461 2062 6120 6465
6520 6482 0900 69b5 12

The LIT Instruction

The LIT instruction appends the bytes that follow it in the input to the output. The number of bytes is equal to the 7 lower bits of the instruction byte, plus 1, and the output pointer is moved by that same distance.
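
To make the length rule concrete: the first byte of the encoded data above is 0x28, so the first instruction appends 0x28 + 1 = 41 bytes, which is the opening line of the text and its newline. A small sketch of the operation on in-memory buffers follows; ulz_lit is a hypothetical helper, and it assumes the caller has already checked that both buffers are large enough.

```c
#include <stddef.h>
#include <string.h>

/* Apply the LIT instruction found at in[0].
   Appends the literal bytes to `out` at *out_pos, advances *out_pos,
   and returns the number of input bytes consumed. */
size_t
ulz_lit(const unsigned char *in, unsigned char *out, size_t *out_pos)
{
	size_t count = (size_t)(in[0] & 0x7f) + 1; /* 7 lower bits, plus 1 */
	memcpy(out + *out_pos, in + 1, count);
	*out_pos += count;
	return 1 + count;
}
```

Because of the plus 1, a single LIT carries between 1 and 128 bytes, and an instruction byte of 0x00 still emits one byte.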

Blue like my corvette its in and outside
are the words I say
And what I thinkfeelings
Tlive in me
I'm blue
Da ba dee di

The CPY Instruction

The CPY instruction copies a number of bytes equal to its length field, plus 4, starting at a negative offset from the output pointer equal to its offset byte, plus 1. In other words, an offset of 0 means go back by 1 byte into the history. The offsets should be treated as the distance from the end of the last byte that was written.
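
As a worked case, the first CPY in the encoded data is the byte pair 81 28: the length field is 1, so 1 + 4 = 5 bytes are copied, and the offset is 0x28, so the copy starts 0x28 + 1 = 41 bytes back, which is the word "Blue" and the space after it. The sketch below parses both CPY forms into those two numbers; the struct and function names are hypothetical, and the input is assumed to hold the whole instruction.

```c
#include <stddef.h>

/* Hypothetical container for a parsed CPY instruction. */
struct ulz_cpy {
	size_t length;   /* bytes to copy, bias of 4 already applied */
	size_t distance; /* bytes to step back, bias of 1 already applied */
	size_t size;     /* input bytes used by the instruction: 2 or 3 */
};

/* Parse the CPY instruction starting at in[0], which must have bit 7 set. */
struct ulz_cpy
ulz_parse_cpy(const unsigned char *in)
{
	struct ulz_cpy op;
	if(in[0] & 0x40) { /* CPY2: 6 high bits, then 8 low bits of length */
		op.length = ((size_t)(in[0] & 0x3f) << 8 | in[1]) + 4;
		op.distance = (size_t)in[2] + 1;
		op.size = 3;
	} else { /* CPY1: 6-bit length */
		op.length = (size_t)(in[0] & 0x3f) + 4;
		op.distance = (size_t)in[1] + 1;
		op.size = 2;
	}
	return op;
}
```

The last instruction of the example, b5 12, parses to a length of 57 and a distance of 19, so it copies more bytes than it steps back and relies on the looping behaviour.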

Blue like my corvette its in and outside
-----are the words I say
And what I think--------------feelings
T----live in---- me
I'm blue
Da ba dee d------i--------------------------------------------------------

Decoding the 137 bytes of compressed data yields the following 209 bytes. Note that this short example is not long enough to use the CPY2 instruction.

Blue like my corvette its in and outside
Blue are the words I say
And what I think
Blue are the feelings
That live inside me
I'm blue
Da ba dee da ba di
Da ba dee da ba di
Da ba dee da ba di
Da ba dee da ba di
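
The example can be checked end to end with a small in-memory decoder. What follows is a sketch rather than a reference implementation: ulz_decode_mem and ulz_sample are names invented here, and the error checks (truncated input, a copy reaching before the start of the output, output overflow) are assumptions of the sketch, since the format description does not say how malformed data should be handled.

```c
#include <stddef.h>

/* The 137 bytes from the Encoded Data listing. */
static const unsigned char ulz_sample[] = {
	0x28,0x42,0x6c,0x75,0x65,0x20,0x6c,0x69,0x6b,0x65,0x20,0x6d,0x79,0x20,0x63,0x6f,
	0x72,0x76,0x65,0x74,0x74,0x65,0x20,0x69,0x74,0x73,0x20,0x69,0x6e,0x20,0x61,0x6e,
	0x64,0x20,0x6f,0x75,0x74,0x73,0x69,0x64,0x65,0x0a,0x81,0x28,0x23,0x61,0x72,0x65,
	0x20,0x74,0x68,0x65,0x20,0x77,0x6f,0x72,0x64,0x73,0x20,0x49,0x20,0x73,0x61,0x79,
	0x0a,0x41,0x6e,0x64,0x20,0x77,0x68,0x61,0x74,0x20,0x49,0x20,0x74,0x68,0x69,0x6e,
	0x6b,0x8a,0x29,0x09,0x66,0x65,0x65,0x6c,0x69,0x6e,0x67,0x73,0x0a,0x54,0x80,0x22,
	0x06,0x6c,0x69,0x76,0x65,0x20,0x69,0x6e,0x80,0x50,0x17,0x20,0x6d,0x65,0x0a,0x49,
	0x27,0x6d,0x20,0x62,0x6c,0x75,0x65,0x0a,0x44,0x61,0x20,0x62,0x61,0x20,0x64,0x65,
	0x65,0x20,0x64,0x82,0x09,0x00,0x69,0xb5,0x12
};

/* Decode `in_len` bytes of ULZ data into `out`, which holds `out_cap` bytes.
   Returns the decoded length, or -1 on truncated input, a copy that reaches
   before the start of the output, or an output overflow. */
long
ulz_decode_mem(const unsigned char *in, size_t in_len, unsigned char *out, size_t out_cap)
{
	size_t i = 0, o = 0, n, dist;
	while(i < in_len) {
		unsigned char c = in[i++];
		if(c & 0x80) { /* CPY */
			n = c & 0x3f;
			if(c & 0x40) { /* CPY2: low 8 bits of the length follow */
				if(i >= in_len) return -1;
				n = n << 8 | in[i++];
			}
			if(i >= in_len) return -1;
			dist = (size_t)in[i++] + 1;
			n += 4;
			if(dist > o || n > out_cap - o) return -1;
			for(; n; n--, o++)
				out[o] = out[o - dist]; /* byte-wise, so overlaps repeat */
		} else { /* LIT */
			n = (size_t)c + 1;
			if(n > in_len - i || n > out_cap - o) return -1;
			for(; n; n--)
				out[o++] = in[i++];
		}
	}
	return (long)o;
}
```

Calling ulz_decode_mem(ulz_sample, sizeof(ulz_sample), out, sizeof(out)) with a 256-byte out buffer returns 209, and the buffer then holds the text shown above, without a trailing newline.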

Image Compression

The compression works best with tiled assets in the icn or chr formats.

Original: 4096 bytes
Compressed: 2430 bytes, 59.32%

Implementation

Here's an implementation in Uxntal.

@decode_ulz ( str* -- )
	;mem .ptr STZ2
	.File/name DEO2
	&stream ( -- )
		#0001 .File/length DEO2
		;&b
			DUP2 .File/read DEO2
			.File/success DEI2 ORA ?{ POP2 JMP2r }
			[ LIT &b $1 ] decode_ulz_byte
		!&stream

@decode_ulz_byte ( byte -- )
	DUP #80 AND ?op-cpy

@op-lit ( byte -- )
	#00 SWP INC2
		DUP2 .File/length DEO2
	.ptr LDZ2
		DUP2 .File/read DEO2
		ADD2 .ptr STZ2
	JMP2r

@op-cpy ( byte -- )
	#7f AND
		DUP #40 AND ?&long
	#00 SWP !&copy
&long ( byte -- )
	#3f AND getc
&copy ( length* -- )
	.ptr LDZ2 #00 getc INC2 SUB2 STH2
	#0004 ADD2 #0000
	&l ( -- )
		( get ) DUP2 STH2kr ADD2 LDA
		( put ) .ptr LDZ2 STAk INC2 .ptr STZ2 POP
		INC2 GTH2k ?&l
	POP2 POP2 POP2r
	JMP2r

And an implementation in C89.

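#include <stdio.h>
#include <stdlib.h>

unsigned char *mem, *ptr;

/* Decodes a ULZ stream from src into mem.
   Returns the decoded length, or -1 if the buffer could not be allocated. */
int
decode_ulz(FILE *src)
{
	int c, i, length; /* c is an int so that EOF stays distinct from the byte 0xff */
	unsigned char *copy;
	ptr = mem = malloc(0x10000);
	if(!mem)
		return -1;
	while((c = getc(src)) != EOF) {
		if(c & 0x80) { /* CPY */
			if(c & 0x40) /* CPY2: 14-bit length over two bytes */
				length = (c & 0x3f) << 8 | getc(src);
			else /* CPY1: 6-bit length */
				length = c & 0x3f;
			copy = ptr - (getc(src) + 1);
			for(i = 0; i < length + 4; i++)
				*(ptr++) = *(copy++);
		} else /* LIT */
			for(i = 0; i < c + 1; i++)
				*(ptr++) = getc(src);
	}
	return ptr - mem;
}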