Post by VDC 8x2 on Mar 6, 2015 23:17:53 GMT
Here is some code for byte doubling. It was posted on the old board but not on this one yet.
So, without further ado:
reslo = $9b     ;low byte of answer
reshi = $9c     ;high byte of answer

*=$1300
; takes one byte and makes it into 2. doubling the pattern in it.
; takes .a and puts results into .a and .x
; low in .a and high in .x
start   sta reslo
        jsr Rolling
        sta reshi
        jsr Rolling
        ldx reshi
        rts

Rolling lda #%00010000
        clc
@loop   asl reslo
        rol
        bcc @loop
        ;4 bits in a now goto index
        tax
        lda Table,x
        rts

Table   byte $00,$03,$0c,$0f,$30,$33,$3c,$3f
        byte $c0,$c3,$cc,$cf,$f0,$f3,$fc,$ff
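It uses the traveling-bit technique for the loop: the 1 preloaded into bit 4 of .A reaches the carry on the fourth ROL, which ends the loop without a separate counter.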
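For readers who want to trace the loop by hand, here is a minimal Python sketch of what "Rolling" does to the accumulator and to `reslo`. It is only a model of the posted routine, not part of it; the function name `rolling` and the returned tuple are assumptions made for illustration.

```python
def rolling(reslo):
    """Model one call to Rolling. Returns (nibble in .A, updated reslo)."""
    a = 0b00010000                      # LDA #%00010000: the traveling bit
    while True:
        carry = (reslo >> 7) & 1        # ASL reslo: bit 7 moves into carry
        reslo = (reslo << 1) & 0xFF
        carry_out = (a >> 7) & 1        # ROL: bit 7 of .A moves into carry
        a = ((a << 1) & 0xFF) | carry   # ROL: old carry enters at bit 0
        if carry_out:                   # BCC @loop falls through here
            return a, reslo


nibble, rest = rolling(0xA5)
print(f"index=${nibble:02x} reslo=${rest:02x}")
```

The first call leaves the high nibble in .A as a table index and shifts the low nibble to the top of `reslo`, ready for the second call.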
Post by hydrophilic on Mar 9, 2015 6:45:55 GMT
Very interesting code... thanks for sharing! But... I am not sure what you mean by "byte double" from reading the code. Can you give examples? Or a downloadable .PRG / .BIN that we can test? Call me stoopid, but it seems like a sneaky nibble copy?... obviously I fail to understand.
Post by VDC 8x2 on Mar 9, 2015 12:52:00 GMT
Let's say you have a letter on the screen and you want it to be two cells wide. This routine doubles the bit pattern, preserving the shape of the letter.
It does that by expanding each nibble into a full byte, with every bit repeated twice.
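A worked example may help here. The following Python sketch doubles a byte bit by bit, which is the result the assembly routine is meant to produce; the function name `double_bits` and the sample row `%00011000` are illustrative assumptions, not taken from the original post.

```python
def double_bits(value):
    """Return (high, low): every bit of the input byte repeated twice."""
    wide = 0
    for bit in range(8):
        if value & (1 << bit):
            wide |= 0b11 << (2 * bit)   # one source pixel becomes two
    return wide >> 8, wide & 0xFF


high, low = double_bits(0b00011000)
print(f"{high:08b} {low:08b}")   # 00000011 11000000
```

The two set pixels in the middle of the source row become four set pixels spanning the boundary between the two output cells.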
Post by VDC 8x2 on Mar 9, 2015 20:52:45 GMT
Post by hydrophilic on Mar 11, 2015 7:40:01 GMT
Based on your description (older post) this is basically a 2x expansion... similar to converting a 320x200 VIC image into a 640x200 VDC image (but obviously only 8 pixels -> 16 pixels in this case).
Yeah, I see how that can relate to your (much anticipated) VDC Game!
Based on that, I can see why you would work on nibbles (each 4 pixels -> 8 pixels).
Now I tried your code (newest post)... so input is .A (8 bits) and output is .XA (16 bits)... cool, I understand more.
Based on my tests, each call to $130d ("Rolling") typically takes 60 cycles.
Easy optimization: remove CLC from "Rolling" (ASL sets the carry before ROL reads it, so the CLC has no effect)... now it takes 58 cycles per nibble (or [58+12]*2 + 6 = 146 cycles per byte). This easy optimization saves 4 cycles (about 4/146 = 2.7%).
Complex optimization:
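        sta $9b         ; save the input byte
        lsr             ; four LSRs move the high nibble to bits 0-3
        lsr
        lsr
        lsr
        jsr nib2byt     ; expand the high nibble
        sta $9c         ; keep the high byte of the result
        lda $9b
        and #$0f        ; isolate the low nibble
        jsr nib2byt     ; expand the low nibble, result stays in .A
        ldx $9c         ; high byte of the result in .X
        rts

nib2byt:
        tax
        lda n2bTab,x
        rts

n2bTab:
        .byte $00, $03, $0c, $0f
        .byte $30, $33, $3c, $3f
        .byte $c0, $c3, $cc, $cf
        .byte $f0, $f3, $fc, $ff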
So... each call to "nib2byt" = 6 [main] + 12 [JSR/RTS] = 18 cycles... and thus total "complex code" is 3 + 4*2 + _18_ + 3 + 3 + 2 + _18_ + 3 = 58 cycles
Note that 58 cycles (complex code) versus 146 cycles (simply optimized code) is about 60% fewer cycles, or roughly 2.5 times as fast! The code size is about the same.
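As a sanity check on the nibble-split approach, the lookup can be modeled in Python and compared against a plain bit-by-bit doubling for all 256 inputs. This is a sketch only: `N2B_TAB` copies the values of `n2bTab` from the post, while `expand` and `reference` are hypothetical helper names.

```python
# Same 16 values as n2bTab in the post.
N2B_TAB = [0x00, 0x03, 0x0C, 0x0F, 0x30, 0x33, 0x3C, 0x3F,
           0xC0, 0xC3, 0xCC, 0xCF, 0xF0, 0xF3, 0xFC, 0xFF]


def expand(value):
    """Model of the nibble-split routine: returns (.X, .A) = (high, low)."""
    high = N2B_TAB[value >> 4]      # four LSRs, then table lookup
    low = N2B_TAB[value & 0x0F]     # AND #$0f, then table lookup
    return high, low


def reference(value):
    """Bit-by-bit doubling, used only to cross-check the table."""
    wide = 0
    for bit in range(8):
        if value & (1 << bit):
            wide |= 0b11 << (2 * bit)
    return wide >> 8, wide & 0xFF


mismatches = [v for v in range(256) if expand(v) != reference(v)]
print("mismatches:", mismatches)
```

An empty mismatch list means the 16-entry table covers every input byte correctly.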
Finally, you can make UBER-FAST code, but with code/data size MUCH more than before... Imagine...
        ldy source
        lda lowTable,y
        ldx highTable,y
        rts
The UBER-FAST code only takes 3 + 4 + 4 = 11 cycles (raw) or 23 cycles (including JSR/RTS)... but it requires an extra 512 bytes (256 for 'lowTable' and 256 for 'highTable')
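Typing 512 table bytes by hand is error-prone, so one option is to generate them. The Python sketch below prints both tables as `.byte` lines; the labels `lowTable` and `highTable` come from the post, but the output layout (eight values per line, four-space indent) and the helper names are assumptions that may need adjusting for your assembler.

```python
def double_bits(value):
    """Return the 16-bit result of repeating every bit of a byte twice."""
    wide = 0
    for bit in range(8):
        if value & (1 << bit):
            wide |= 0b11 << (2 * bit)
    return wide


low_table = [double_bits(v) & 0xFF for v in range(256)]
high_table = [double_bits(v) >> 8 for v in range(256)]


def emit(label, table, per_line=8):
    """Format a table as assembler source text."""
    lines = [f"{label}:"]
    for i in range(0, len(table), per_line):
        row = ", ".join(f"${b:02x}" for b in table[i:i + per_line])
        lines.append(f"    .byte {row}")
    return "\n".join(lines)


print(emit("lowTable", low_table))
print(emit("highTable", high_table))
```

The output can be pasted into the source file below the lookup code.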
Which is best for you?? I can not say for sure! It all truly depends on "need for speed" versus "free memory" !! Sorry there is no definite answer... it all depends on your situation
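To put the speed-versus-memory trade-off side by side, the figures quoted in this post can be tabulated with a few lines of Python. The cycle counts are taken as stated above (the posts do not count JSR/RTS overhead identically for every variant, so treat the comparison as approximate); the variant names are labels chosen here for illustration.

```python
variants = {
    "traveling bit, no CLC": {"cycles": (58 + 12) * 2 + 6, "table_bytes": 16},
    "nibble lookup": {"cycles": 3 + 4 * 2 + 18 + 3 + 3 + 2 + 18 + 3, "table_bytes": 16},
    "full-byte lookup (raw)": {"cycles": 3 + 4 + 4, "table_bytes": 512},
}

baseline = variants["traveling bit, no CLC"]["cycles"]
savings = {name: 1 - v["cycles"] / baseline for name, v in variants.items()}

for name, v in variants.items():
    print(f"{name:24s} {v['cycles']:4d} cycles "
          f"{v['table_bytes']:4d} table bytes {savings[name]:6.1%} saved")
```

Each step down in cycle count is bought either with restructured code or with a larger table, which is why the answer depends on how much memory is free.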