|
Post by VDC 8x2 on Jul 25, 2014 4:23:06 GMT
I think once this is rock solid, the hard part begins: how to write the code that compresses the pictures for this decompress code.
|
|
|
Post by hydrophilic on Jul 25, 2014 12:16:13 GMT
Well for encoding, I would recommend writing it in a high-level language like C or VisualBASIC (or Java or PHP, but I don't like debugging them). The good thing is it should be fast because compression is not complex, and the search is constrained to a maximum of 2K if you used the "big range" long copy... and it only has to search backwards for copy. If you are brave (or maybe just insane) you could try writing/testing it in CBM BASIC.

For my video compressor, I have a build switch to add "debug" info... this will keep track of which types were most useful (raw/rle/skip/copy), the count of bytes used (like 6 for an RLE instance or 12 for a copy instance), and for (long)copy the range (how far it had to search to find a match)... it tracks the min, max, count, and total (average would obviously be total/count) for each type. Based on that info, I could decide which codes were more useful, and how to allocate available bits to each code. Of course the ultimate result, compression ratio, is easy to get by comparing file sizes and doesn't need "debug" info.

Once it works in a high-level language, it is trivial to make into ML if you need it.

If you're confident with the decoder so far, here are a few optimizations you could make... change this

Parse
    ldy #$00
    lda (source),y
    lsr
    bcs SecondTest ;is it long or short copy
    lsr
    bcs Fill ;got fill
    bcc Skip ;skip x bytes
SecondTest:
    lsr
    bcc Copy
    jmp LongCopy
Fill
    ...

with something like

Parse
    ldy #$00
    lda (source),y
    lsr
    bcs SecondTest ;is it long or short copy
    lsr
    bcc Skip ;skip x bytes
Fill
    ...
Skip
    ...
SecondTest:
    lsr
    bcs LongCopy
Copy
    ...

Assuming the branch would reach of course. Fill and Skip are pretty simple so it should.

You can also remove the LDA in this sequence...

    sbc WorkTemp+1
    sta WorkTemp+1 ;saved the results
    ldy #$20 ;source for copy
    sty vdcadr
    lda WorkTemp+1
    waitvdc
    sta vdcdat ;store high byte

Were you going to use this in your GoldBox game translation, or is this for something else?
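A minimal sketch of the backward-only search described above, written in Python rather than C so it is easy to experiment with. The function name, the parameter defaults (a 2047-byte window and 3 to 10 byte matches) and the rule that a match may not overlap the current position are all assumptions for illustration, not something the decoder in this thread dictates.

```python
def find_longest_match(data, pos, max_distance=2047, min_len=3, max_len=10):
    """Search backwards from pos for the longest earlier copy of data[pos:].

    Returns (distance, length) or None. Hypothetical helper: the window
    size and length limits are assumed defaults, not fixed by the format.
    """
    best = None
    start = max(0, pos - max_distance)
    limit = min(max_len, len(data) - pos)
    for cand in range(pos - 1, start - 1, -1):   # nearest candidates first
        # Assumption: the source block must end before pos (no overlap).
        cand_limit = min(limit, pos - cand)
        length = 0
        while length < cand_limit and data[cand + length] == data[pos + length]:
            length += 1
        if length >= min_len and (best is None or length > best[1]):
            best = (pos - cand, length)
            if length == limit:                  # cannot do better, stop early
                break
    return best
```

Because the scan runs from the nearest candidate outwards and only replaces a match with a strictly longer one, ties go to the shortest distance.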
|
|
|
Post by VDC 8x2 on Jul 25, 2014 13:15:21 GMT
I am going to use it to compress the pictures it will be loading in the game.
Aside from the compression of the title page graphic, there is no compression at all in the original game. And that is just a simple rle compression.
It is also going to be for all VDC images. So it was started for GoldBox, but I wanted it to stand alone too.
|
|
|
Post by VDC 8x2 on Jul 25, 2014 14:55:00 GMT
Would skip over be a better option than long copy? Or should I fork the code, one version for skip over and the other for long copy?
|
|
|
Post by hydrophilic on Jul 25, 2014 15:19:23 GMT
Cool, glad to hear some progress with the game too, despite this diversion. Thanks for sharing. Helps me think about ways to improve MediaPlayer which I've been thinking about for awhile but haven't actually started.
Umm, actually in the last code snippet that I quoted, I think you could remove both LDA and STA to WorkTemp+1, because it is never used again by the CPU, just handed to the VDC.
I was thinking long copy, isn't really a good name. I think a better name would be random copy... because it can read from anywhere in the past 1K or 2K (depending on bit allocation), as opposed to standard copy which is like sequential copy (always from the immediately preceding bytes).
Anyway, one way that might work ...
LongCopy:
    sta WorkTemp+1 ;offset high
    wordinc Source
    lda (source),y
    tax ;save length
    lsr WorkTemp+1
    ror
    lsr WorkTemp+1
    ror
    lsr WorkTemp+1
    ror
    sta WorkTemp ;offset low
    txa
    and #%111 ;isolate length bits
    clc
    adc #3 ;minimum copy length
    tax ;bytes to copy
    ;to save code bytes, you could just jump to end of normal copy...
    ldy #$13
    sty vdcadr
    lda vdcdat
    sec
    sbc WorkTemp ;low byte - value.
    sta WorkTemp
    dey
    sty vdcadr
    waitvdc ;wait because reg 12
    lda vdcdat
    sbc WorkTemp+1 ;calc. high adrs
    ldy #$20 ;source for copy
    sty vdcadr
    waitvdc
    sta vdcdat ;store high adrs
    iny
    sty vdcadr
    lda WorkTemp
    sta vdcdat ;store low adrs
    lda #$18
    sta vdcadr
    lda vdcdat ;going set to copy
    ora #$80
    sta vdcdat ;set bit 7
    lda #$1e
    sta vdcadr
    stx vdcdat ;copy x bytes
    rts
That uses a 3-bit copy size and an 11-bit distance:
%hhhLLL11, first byte has the 3 bits of the distance high byte and the top 3 bits of the distance low byte, plus the 2-bit code (%11)
%LLLLLsss, second byte has the remaining 5 bits of the distance low byte, plus 3 bits of size
Size is actually 3+encoded value; so using above method (3-bit size), 3 to 10 bytes can be copied
You might find 4-bit size more useful (copy 3 to 18 bytes), but it would reduce distance from 11-bit (2K) to 10-bit (1K). To do this, all you need is to add another LSR WorkTemp+1, ROR pair near the top (and change the AND #%111 mask to AND #%1111).
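For the encoder side, the 3-bit size / 11-bit distance layout above could be packed and unpacked roughly like this. It is only a sketch in Python: the function names are made up for illustration, and the unpack step mirrors what the LongCopy routine does with its two LSRs followed by three LSR/ROR pairs.

```python
def pack_long_copy(distance, length):
    """Encode a long (random) copy as two bytes: %hhhLLL11, %LLLLLsss.

    Hypothetical encoder helper; distance is 0..2047, length is 3..10.
    """
    if not 0 <= distance <= 0x7FF:
        raise ValueError("distance must fit in 11 bits")
    if not 3 <= length <= 10:
        raise ValueError("length must be 3..10")
    first = ((distance >> 5) << 2) | 0b11          # top 6 distance bits + code %11
    second = ((distance & 0x1F) << 3) | (length - 3)  # low 5 distance bits + size
    return first, second


def unpack_long_copy(first, second):
    """Decode the two bytes the same way the 6502 routine does."""
    if first & 0b11 != 0b11:
        raise ValueError("not a long copy code")
    distance = ((first >> 2) << 5) | (second >> 3)
    length = (second & 0b111) + 3                  # stored size is length - 3
    return distance, length
```

For example, a distance of 1234 with a length of 7 packs to the bytes $9B, $94.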
As for skip over versus long copy: nope, not unless you plan on animated images. Well, I guess you *could* clear the screen first (all mem zero), and then skip might be useful. Yeah, you might need to fork the code and try both options. I know it works well for video, but except for that example I just gave, I don't think it would be handy for static images.
|
|
|
Post by VDC 8x2 on Jul 25, 2014 15:56:03 GMT
I decided to go with the 3 to 10 random copy. Here is the revised code:

Source = $fd
destination = $fb
WorkTemp = $fb
vdcadr = $d600
vdcdat = $d601

defm WAITVDC
    bit $d600
    bpl *-3
endm

defm WordInc
    inc /1
    bne *+4
    inc /1+1
endm

*=$1300
Start
    lda destination+1 ;high byte
    ldy #$12
    sty vdcadr
    WAITVDC ;vdc ready?
    sta vdcdat ;high byte dest
    iny
    sty vdcadr
    lda destination ;low byte
    sta vdcdat ;low byte set. destination is now set
    jmp First

MAINLOOP
    wordinc Source
First
    jsr Parse
    waitvdc ;wait for copy or fill to finish
    bmi Mainloop ;we know it is always mi after vdcwait so use it.

Parse
    ldy #$00
    lda (source),y
    lsr
    bcs SecondTest ;is it long or short copy
    lsr
    bcc Skip ;skip x bytes

Fill
    cmp #$01 ;is it 1
    bne @not1
    ror ;turn it into 128
@not1
    tax
    dex ;number of bytes to fill. 2 to 63 or 128 or 256
    lda #$18
    sta vdcadr
    lda vdcdat ;going to set to fill
    and #$7f
    sta vdcdat ;clear bit 7
    wordinc source
    lda #$1f
    sta vdcadr
    WaitVdc ;wait for ready
    lda (source),y ;byte to repeat
    sta VDCDAT
    lda #$1e
    sta vdcadr
    stx vdcdat ;start fill
    rts

Skip
    tax ;transfer to x for counter
    bne @notzero
    pla ;pull return off stack
    pla
    rts ;done so peace out.
@notzero
    lda #$1f
    sta vdcadr
@loop
    wordinc Source ;get next byte
    lda (source),y
    WAITVDC ;vdc ready?
    sta vdcdat ;store value
    dex
    bne @loop
    rts

SecondTest
    lsr
    bcs LongCopy

Copy
    cmp #$01 ;is it 1
    bne @not1
    ror ;turn it into 128
@not1
    tax ;copies -x bytes forwards +bytes
    bne @not2 ;is it 0
    ldy #$01 ;set high byte to 1 if zero
@not2
    sta WorkTemp ;low byte
    sty WorkTemp+1 ;high byte
Rdy2Subt
    ldy #$13
    sty vdcadr
    lda vdcdat
    sec
    sbc WorkTemp ;low byte - value.
    sta WorkTemp
    dey
    sty vdcadr
    waitvdc ;wait because reg 12
    lda vdcdat
    sbc WorkTemp+1
    ldy #$20 ;source for copy
    sty vdcadr
    waitvdc
    sta vdcdat ;store high byte
    iny
    sty vdcadr
    lda WorkTemp
    sta vdcdat ;store low byte
    lda #$18
    sta vdcadr
    lda vdcdat ;going set to copy
    ora #$80
    sta vdcdat ;set bit 7
    lda #$1e
    sta vdcadr
    stx vdcdat ;copy x bytes
    rts

LongCopy
    sta WorkTemp+1 ;offset high
    wordinc Source
    lda (source),y
    tax ;save length
    lsr WorkTemp+1
    ror
    lsr WorkTemp+1
    ror
    lsr WorkTemp+1
    ror
    sta WorkTemp ;offset low
    txa
    and #%111 ;isolate length bits
    clc
    adc #3 ;minimum copy length
    tax ;bytes to copy
    jmp Rdy2Subt

Now to test it some more.
Attachments: nuvdcpress2.prg (256 B)
|
|
|
Post by hydrophilic on Jul 25, 2014 17:36:16 GMT
Nice start, I guess you can't really tell what would be best until you have working encoder. In my encoder, when debug/analyze mode is turned on, it not only records all the codes it does use, but all the codes it wanted to use but couldn't (for example RLE length over 63, or copy distance over 2K), that way I know if I should change things.
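A rough sketch of what such a debug/analyze recorder might look like, in Python. The class and method names are hypothetical; it simply tracks count, total, min and max per code type, plus a tally of codes the encoder wanted to use but could not.

```python
from collections import defaultdict


class CodeStats:
    """Hypothetical debug recorder for an encoder's code usage."""

    def __init__(self):
        # Per code type: how often it was used and the range of values seen.
        self.used = defaultdict(
            lambda: {"count": 0, "total": 0, "min": None, "max": None}
        )
        # Per code type: how often the encoder wanted it but the format
        # could not express it (e.g. run too long, distance too far).
        self.rejected = defaultdict(int)

    def record(self, kind, value):
        s = self.used[kind]
        s["count"] += 1
        s["total"] += value
        s["min"] = value if s["min"] is None else min(s["min"], value)
        s["max"] = value if s["max"] is None else max(s["max"], value)

    def reject(self, kind):
        self.rejected[kind] += 1

    def average(self, kind):
        s = self.used[kind]
        return s["total"] / s["count"] if s["count"] else 0.0
```

An encoder would call `record("rle", run_length)` for each code it emits and `reject("rle")` whenever a run had to be split or dropped, then dump both tables at the end of a test run.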
Of course I imagine you would want to optimize it for GoldBox game, but to optimize it for general images, you need to test lots of different images. Only testing 6 to 12 images can lead to bad results with many other images if you optimize too tightly with a small sample set.
Actually I wouldn't worry too much, this is supposed to be a simple/fast scheme... you can get much better compression using variable codes but it is much slower to decompress.
Speaking of speed, if your tests work out, another thing that would save 9 cycles for every code is to replace the JSR Parse / RTS with a loop... remove the JSR Parse and replace all RTS with JMP MAINLOOP. Something like

    jmp Parse ; was jmp First
MAINLOOP
    wordinc Source
    waitvdc ;wait for copy or fill to finish
Parse
    ldy #$00
    ...
Skip
    ...
    jmp MainLoop
Fill
    ...
    jmp MainLoop
etc.

Somewhere in there (end of data), you would need an RTS but without the PLA PLA.
I would only use JSR if implementing more complex compression, for example where the decompressor can call itself to decompress a code (for pattern fill or other advanced things)
|
|
|
Post by VDC 8x2 on Jul 25, 2014 20:32:57 GMT
Compiled with the changes you suggested. now to test it again.
|
|
|
Post by VDC 8x2 on Jul 25, 2014 23:19:33 GMT
I was looking at my code and thinking 2 was a break-even point in compression. We don't want to break even, we want to gain ground! I reworked the RLE and copy values to this: 3 to 63 normal, 2=64, 1=128 and 0=256. The gains outweigh the loss of the value 2. I reworked the LongCopy size values to this: 3 to 7 normal, 2=64, 1=128 and 0=256. The gains are the most here, I think.

Source = $fd
destination = $fb
WorkTemp = $fb
vdcadr = $d600
vdcdat = $d601

defm WAITVDC
    bit $d600
    bpl *-3
endm

defm WordInc
    inc /1
    bne *+4
    inc /1+1
endm

*=$1300
Start
    lda destination+1 ;high byte
    ldy #$12
    sty vdcadr
    WAITVDC ;vdc ready?
    sta vdcdat ;high byte dest
    iny
    sty vdcadr
    lda destination ;low byte
    sta vdcdat ;low byte set. destination is now set
    jmp Parse

MAINLOOP
    wordinc Source
    waitvdc ;wait for copy or fill to finish
Parse
    ldy #$00
    lda (source),y
    lsr
    bcs SecondTest ;is it long or short copy
    lsr
    bcc Skip ;skip x bytes

Fill
    cmp #$01 ;is it 1
    bne @not1
    lda #$80 ;1 becomes 128
    bne mainfill ;we know it is nonzero so branch with it
@not1
    cmp #$02
    bne mainfill
    lda #$40 ;2 becomes 64
mainfill
    tax ;0 will become 256
    dex ;number of bytes to fill. 3 to 63 or 64,128,256
    lda #$18
    sta vdcadr
    lda vdcdat ;going to set to fill
    and #$7f
    sta vdcdat ;clear bit 7
    wordinc source
    lda #$1f
    sta vdcadr
    WaitVdc ;wait for ready
    lda (source),y ;byte to repeat
    sta VDCDAT
    lda #$1e
    sta vdcadr
    stx vdcdat ;start fill
    jmp mainloop

Skip
    tax ;transfer to x for counter
    bne @notzero
    rts ;done so peace out.
@notzero
    lda #$1f
    sta vdcadr
@loop
    wordinc Source ;get next byte
    lda (source),y
    WAITVDC ;vdc ready?
    sta vdcdat ;store value
    dex
    bne @loop
    jmp mainloop

SecondTest
    lsr
    bcs LongCopy

Copy
    cmp #$01 ;is it 1
    bne @not1
    lda #$80 ;set to 128
    bne PutInX ;we know it not zero so can use it to branch
@not1
    cmp #$02
    bne PutInx
    lda #$40 ;set to 64
PutInx
    tax ;copies -x bytes forwards +bytes
    bne @not2 ;is it 0
    ldy #$01 ;set high byte to 1 if zero
@not2
    sta WorkTemp ;low byte
    sty WorkTemp+1 ;high byte
Rdy2Subt
    ldy #$13
    sty vdcadr
    lda vdcdat
    sec
    sbc WorkTemp ;low byte - value.
    sta WorkTemp
    dey
    sty vdcadr
    waitvdc ;wait because reg 12
    lda vdcdat
    sbc WorkTemp+1
    ldy #$20 ;source for copy
    sty vdcadr
    waitvdc
    sta vdcdat ;store high byte
    iny
    sty vdcadr
    lda WorkTemp
    sta vdcdat ;store low byte
    lda #$18
    sta vdcadr
    lda vdcdat ;going set to copy
    ora #$80
    sta vdcdat ;set bit 7
    lda #$1e
    sta vdcadr
    stx vdcdat ;copy x bytes
    jmp mainloop

LongCopy
    sta WorkTemp+1 ;offset high
    wordinc Source
    lda (source),y
    tax ;save length
    lsr WorkTemp+1
    ror
    lsr WorkTemp+1
    ror
    lsr WorkTemp+1
    ror
    sta WorkTemp ;offset low
    txa
    and #%111 ;isolate length bits
    cmp #$01 ;is it 1
    bne @skip1
    lda #$80 ;set to 128 bytes
    bne @copybyte ;we know its not zero so use to branch
@skip1
    cmp #$02
    bne @copybyte
    lda #$40 ;set to 64 bytes
@copybyte
    tax ;if zero will become 256
    jmp Rdy2Subt

Now more stuff to test!
Attachments: nuvdcpress2.prg (284 B)
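The reworked length values (3 and up are literal, 2=64, 1=128, 0=256) can be written down as a small lookup, which may help when building a matching encoder. This is a Python sketch with hypothetical names; `field_bits` would be 6 for RLE and normal copy and 3 for LongCopy, as described in the post.

```python
# Field values that no longer mean their literal length (from the post above).
SPECIAL_LENGTHS = {0: 256, 1: 128, 2: 64}


def decode_length(field, field_bits):
    """Map a stored length field to the number of bytes it stands for."""
    if not 0 <= field < (1 << field_bits):
        raise ValueError("field does not fit in the given number of bits")
    return SPECIAL_LENGTHS.get(field, field)


def encodable_lengths(field_bits):
    """All byte counts a single code can express with this field width."""
    return sorted(decode_length(f, field_bits) for f in range(1 << field_bits))
```

With a 3-bit field, `encodable_lengths(3)` gives 3, 4, 5, 6, 7, 64, 128 and 256, so an encoder has to split any other length into several codes or fall back to another code type.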
|
|
|
Post by hydrophilic on Jul 26, 2014 5:25:26 GMT
I worry you're doing too much customization without testing it against an encoder.
Like before it was doing RLE 2~63 or 128 or 256... but now it is doing 3~64 or 128 or 256. I doubt that one extra byte will make any significant gain in most files, and no difference in many files... but the more complex code makes all RLE operations slower.
Also an RLE of 2 is sometimes useful. Now you may find that hard to believe, I know that I did when I saw the statistics. It took some work to figure out why... I can't explain it in words, so here is an example: 63 uncompressible bytes, run length of 2(some byte), run length of 6(some other byte)
Now *without* RLE 2 it encodes as 1 byte(code) + 63 bytes(raw data) + 1 byte(code) + 2 bytes(raw data) + 1 byte(code) + 1 byte(RLE data) = 69 bytes
With RLE 2, it encodes as 1 byte(code) + 63 bytes(raw data) + 1 byte(code) + 1 byte(RLE data) + 1 byte(code) + 1 byte(RLE data) = 68 bytes
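The two totals above can be checked with a few lines of Python. This is a sketch under the assumptions of the example only: a raw packet holds at most 63 bytes and costs one code byte plus its data, an RLE packet always costs two bytes, and every run fits in a single RLE packet. The function names are made up for illustration.

```python
MAX_RAW = 63  # longest raw packet, as in the example


def raw_cost(n):
    """Bytes needed to store n literal bytes: the data plus 1 code byte per packet."""
    packets = -(-n // MAX_RAW)  # ceiling division
    return n + packets


def encoded_size(segments, min_rle):
    """Total encoded size for a list of ("raw", n) / ("run", n) segments.

    Runs shorter than min_rle cannot use RLE and are merged into the
    surrounding raw data instead.
    """
    total = 0
    pending_raw = 0
    for kind, n in segments:
        if kind == "run" and n >= min_rle:
            total += raw_cost(pending_raw)  # flush literals seen so far
            pending_raw = 0
            total += 2                      # 1 code byte + 1 RLE data byte
        else:
            pending_raw += n
    return total + raw_cost(pending_raw)


example = [("raw", 63), ("run", 2), ("run", 6)]
print(encoded_size(example, min_rle=3))  # 69
print(encoded_size(example, min_rle=2))  # 68
```

The one-byte difference comes from the raw packet already being full at 63 bytes, so the two extra literals force a second raw packet.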
I remember reading about some compressor (BASIC 8 or iPaint?) that made a similar "mistake" with Copy. They never allowed 3 bytes for Copy, because normally it would not save any bytes (only break even)... but a similar example could be applied. Sometimes breaking even is good! If you have to add a raw(skip) packet instead, then you would lose a byte!
Well now hopefully you believe that it *can* happen, but if you're like I was, you're probably thinking it isn't very important / unlikely. All I can say is, after testing thousands of images, having a break-even code is better (in general) than extending a code by +1. I guess it is because the break-even case can happen on a semi-regular basis; but the chance that Big Blocks can be RLE/Copied will either be well inside the limits (for example 50 out of 63) or way outside the limits (like 100 of 63)... the chance it would be exactly the alternate +1 code (like 64 out of 63+1) is very very slim.
Don't get me wrong, your ideas aren't all bad. Like having different sizes for LongCopy would be good; I like your idea of 64 copy bytes! I wouldn't allocate code values for 128 and 256 because they are rare. When they do occur, you just need 2 or 4 of the CopyX64's... you still get amazing compression... MUCH more common however are the shorter lengths, so you should prioritize them.
Anyway, I wouldn't spend much time trying to tweak it until there is real data available from an encoder.
|
|