Well, for a game, I think you have already achieved your goal. I only tried 2 CG (game-type) images, but they both compressed very well (over 75%). You also reported a test image with excellent compression (around 65% as I recall). Sure, you might want to tweak the bit allocation for your different codes, and that would improve things for CG images (who knows how much until you try). But the whole principle is just not adequate for highly dithered (real-life) images. This is mainly due to the random nature of the dithering algorithms... it destroys "easy" compressibility.
So unless you have a game with lots of real-life photos, you are done (aside from tweaking, if you care to do so).
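To make the dithering point concrete, here is a small sketch. It does not use your compressor or mine; it swaps in zlib as a generic lossless stand-in, and the image size, the colour bands, and the random-noise "dither" are all made up for illustration.

```python
import random
import zlib

WIDTH, HEIGHT = 160, 100  # hypothetical image size, one byte per pixel


def saving(pixels: bytes) -> float:
    """Fraction of space saved by zlib (0.75 means 75% smaller)."""
    return 1.0 - len(zlib.compress(pixels, 9)) / len(pixels)


def band(x: int, y: int) -> int:
    """Colour index for a 'CG-style' image made of 8 large flat rectangles."""
    return (x // 40) + 4 * (y // 50)


# Flat image: big regions of a single colour each.
flat = bytes(band(x, y) for y in range(HEIGHT) for x in range(WIDTH))

# "Dithered" image: same regions, but every pixel is randomly nudged
# between two neighbouring colours (a crude stand-in for real dithering).
rng = random.Random(1)
dithered = bytes(
    band(x, y) + rng.randint(0, 1) for y in range(HEIGHT) for x in range(WIDTH)
)

flat_saving = saving(flat)
dithered_saving = saving(dithered)
print(f"flat: {flat_saving:.0%} saved, dithered: {dithered_saving:.0%} saved")
```

The random bit added to every pixel is information a lossless coder has to store, so the dithered version cannot shrink nearly as far as the flat one, whatever the coder.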
If you do want real-life (highly dithered) photos, and you want fast decompression, then you'll need to use lossy encoding. Two ways to do this are:
1. what I call "linear" / "bitmap" approximation
2. vector quantization
#2 has been written about in many technical articles. It is the main reason I made ImageWork... to develop VQ routines beginning with single images (a much simpler environment than my full-blown video encoder). In no particular order, here are some of my bookmarks...
A. "Seafloor Image Compression with Large Tilesize Vector Quantization" (PDF, lots of math, abstract in general, but nice photos)
B. "Vector Quantization" (a "just the facts" web page, lots of math symbols, some diagrams, no real code)
C. "Qcc Pack" (this is a SourceForge project, so it has a lot of working C code... I wish I had known about this BEFORE starting my ImageWork)
D. "VECTOR QUANTIZATION" (a web page somewhere between the last two... more in depth than B but less real code than C [only pseudocode])
E. "Algorithms in cluster analysis" (short web page with several links)
F. "The Enhanced LBG algorithm" (PDF, lots of math, abstract in general, but nice photos)
Sorry about link F. That is one of my favorites (I'm working with it now to improve VQ in my ImageWork so I can use it for video). Most annoyingly, I cannot find a direct link to the article. It seems everything on Google takes you to a site that wants you to sign up and/or pay money to download the file. Well, it is a university paper, so I'm pretty sure it is public domain, and the PDF that I have (I did not pay for it or sign up for anything) does not have any copyright notice. So it would probably be safe to upload if anybody really wants it.
Well, #2 can be quite complex, as you might gather from the university papers or from peeking at the C code for hours.
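Stripped of all the refinements, though, the core idea of VQ is small. Here is a minimal sketch of the basic "assign each block to its nearest code, then re-average the codes" loop (plain Lloyd/k-means iteration, not the Enhanced LBG from link F and not what ImageWork actually does). The function names, the random starting codebook, and the fixed number of rounds are my own assumptions for illustration.

```python
import random


def dist2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(codebook, vec):
    """Index of the codebook entry closest to vec."""
    return min(range(len(codebook)), key=lambda i: dist2(codebook[i], vec))


def train_codebook(vectors, size, rounds=10, seed=0):
    """Build a codebook of `size` entries from a list of training vectors."""
    rng = random.Random(seed)
    # Assumption: start from randomly chosen training vectors.
    codebook = [list(v) for v in rng.sample(vectors, size)]
    for _ in range(rounds):
        sums = [[0] * len(vectors[0]) for _ in range(size)]
        counts = [0] * size
        for v in vectors:
            i = nearest(codebook, v)
            counts[i] += 1
            for k, x in enumerate(v):
                sums[i][k] += x
        for i in range(size):
            if counts[i]:  # an unused entry simply keeps its old value
                codebook[i] = [s / counts[i] for s in sums[i]]
    return codebook


def encode(vectors, codebook):
    """Replace every vector by the index of its nearest codebook entry."""
    return [nearest(codebook, v) for v in vectors]


def decode(codes, codebook):
    """Look each code up again; this is all the decoder has to do."""
    return [codebook[c] for c in codes]
```

For an image, each vector would be one small block of pixels (for example the 8 bytes of a character cell). Note how lopsided it is: all the hard work is in training and encoding, while decoding is a table lookup, which is why it suits a slow CPU.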
#1 is not well documented anywhere because it is something I just made up for my original video encoder. It is some sort of cross between VQ and MPEG. Basically, for each byte sequence it searches nearby for a common ("visually almost the same") byte sequence. Because it doesn't search very far, it is kind of like MPEG motion vectors... and because it searches for a byte sequence, it is like a block used in VQ. However, the search is broader than MPEG's, and the sequences are not fixed in size like VQ blocks.
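A rough sketch of that kind of search, to give the flavour only: this is a reconstruction from the description above, not the actual encoder. The name `find_near_match`, the window and length limits, and the use of "at most N differing bits per byte" as the test for "visually almost the same" are all assumptions.

```python
def bit_diff(a: int, b: int) -> int:
    """Number of bits that differ between two byte values."""
    return bin(a ^ b).count("1")


def find_near_match(data, pos, window=64, max_len=16, tolerance=1):
    """Find the longest earlier byte sequence that approximately matches data[pos:].

    Only the previous `window` bytes are searched, and a candidate may not
    run into the sequence being matched. Two bytes count as "almost the
    same" if they differ in at most `tolerance` bits.
    Returns (distance_back, length), or (0, 0) if nothing matches.
    """
    best = (0, 0)
    for start in range(max(0, pos - window), pos):
        length = 0
        while (
            length < max_len
            and pos + length < len(data)
            and start + length < pos
            and bit_diff(data[start + length], data[pos + length]) <= tolerance
        ):
            length += 1
        if length > best[1]:
            best = (pos - start, length)
    return best


data = bytes([0xAA, 0x55, 0xAA, 0x55, 0x00, 0xAA, 0x54, 0xAA, 0x55])
print(find_near_match(data, 5))               # (5, 4): one bit off in the second byte
print(find_near_match(data, 5, tolerance=0))  # (5, 1): exact matching finds far less
```

The encoder would then store the (distance, length) pair instead of the bytes, and the decoder just copies from what it has already drawn. Allowing a small error is what makes it lossy, and it is also what lets it find matches in dithered data where an exact search finds almost nothing.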
By comparison, #1 only works (well) for true bitmap images. My original/old video encoder also tried using it for text mode, which gave very questionable results... which is why I have been working to develop #2 (VQ). VQ works for either real bitmap images or custom-font text-mode (emulated bitmap) images. VQ is "best" with text mode: it works quite well for VIC-II Multi-Color Text, but sucks for VDC text mode (so you probably don't care). Since VQ also works with real bitmap images, you may still be interested. For real bitmap mode, however, you do need to "translate" the fake characters (VQ codes) into real bitmap raster segments. That slows things down a bit, but should be okay for a game... I mean, it runs at about 7 frames per second in video (an estimate based on no audio... someday I'll re-allow silent videos, but at the moment audio is required, which reduces the max frame rate).
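The "translate" step amounts to something like the sketch below. It assumes a linear bitmap (each raster line stored as consecutive bytes, one byte per cell per line) and codebook entries of 8 bytes, one per raster line of the cell; a different bitmap layout would need different addressing. The function name and parameters are made up for illustration.

```python
def codes_to_bitmap(codes, codebook, cells_per_row, cell_height=8):
    """Expand one VQ code per character cell into a linear bitmap.

    codes         -- one codebook index per cell, left to right, top to bottom
    codebook      -- each entry holds cell_height bytes (one per raster line)
    cells_per_row -- cells (and therefore bytes) per raster line
    """
    cell_rows = len(codes) // cells_per_row
    bitmap = bytearray(cell_rows * cell_height * cells_per_row)
    for index, code in enumerate(codes):
        cell_row, cell_col = divmod(index, cells_per_row)
        glyph = codebook[code]
        for line in range(cell_height):
            raster = cell_row * cell_height + line
            # One byte of the "fake character" goes into each raster line.
            bitmap[raster * cells_per_row + cell_col] = glyph[line]
    return bytes(bitmap)
```

In text mode the video chip does this lookup for free on every frame. In bitmap mode the CPU has to do it, 8 byte writes per cell, and that is the slowdown.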
Oh yeah, my time estimates are based on VIC-II because I *still* don't have working VDC video. With the VDC the CPU can run at the full 2MHz, while VIC-II is crippled by a 1MHz CPU *and* bad lines. In other words, there is naturally a bottleneck in moving data into the VDC, but the CPU can run much faster, so I think overall the frame rates will be similar.
To avoid boring everyone, I'll shut up now, but I can elaborate on any of that upon request.