I would guess 16*5 (multiply by 5, then by 16). Like maybe

        LDA #0
        STA high
        LDA low
        STA temp        ;save *1
        ASL
        ROL high        ;*2
        ASL
        ROL high        ;*4
        ;carry is clear
        ADC temp
        BCC +
        INC high        ;*5
+       ;now *16
        ASL
        ROL high        ;*2
        ASL
        ROL high        ;*4
        ASL
        ROL high        ;*8
        ASL
        ROL high        ;*16
        STA low
Well, for low-level stuff, where the multiply will presumably be called many times, like in a loop (or nested loops), you might want something faster. If you're multiplying an 8-bit number by 80, and you have 2 pages (512 bytes) of RAM available somewhere, then a look-up table might be better (definitely faster).

        ;takes 10 to 12 cycles
        ; .A contains value to multiply
        tay             ;index tables
        lda loTab80,y   ;low byte
        ldx hiTab80,y   ;high byte
For the tables, you would probably want to use .REPT / .FOR / .DO pseudo-ops, which vary by assembler.
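As one possible illustration, here is how the two tables might be generated in ca65 syntax. That choice is an assumption: the thread does not say which assembler is in use, and other assemblers spell these directives differently. The names loTab80 and hiTab80 are the ones used in the lookup code above; the loop counter I is just a placeholder.

```
; Sketch only, ca65 syntax assumed (.repeat with a counter, < and > byte operators).
; Each table has 256 entries, one per possible 8-bit input value.

loTab80:
.repeat 256, I
        .byte <(I * 80)         ; low byte of I*80
.endrepeat

hiTab80:
.repeat 256, I
        .byte >(I * 80)         ; high byte of I*80 (at most 20400 / 256 = 79)
.endrepeat
```

The largest entry is 255*80 = 20400, so every product fits in 16 bits and the two tables together take the 512 bytes mentioned above.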
Because the tables would use 512 bytes, you might want to think of ways to reuse them. For example, you could make the tables hold value*10 (instead of *80); then you would look up a value in the table and multiply the result by 8 with three shifts.
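A minimal sketch of that *10-then-*8 idea, assuming hypothetical tables loTab10 / hiTab10 that hold value*10 (these names are not from the thread) and the same 'high' temporary used in the first routine:

```
        ; .A contains value to multiply (0..255)
        ; loTab10 / hiTab10 are hypothetical tables holding value*10
        tay             ;index tables
        lda hiTab10,y
        sta high        ;high byte of value*10 (at most 9)
        lda loTab10,y   ;low byte of value*10
        asl
        rol high        ;*20
        asl
        rol high        ;*40
        asl
        rol high        ;*80
        ; result: low byte in .A, high byte in 'high'
```

By my count this is roughly 34 cycles with 'high' in zero page and no page crossings, so it sits between the pure table lookup and the shift-and-add routine. The trade-off is that the same 512 bytes can then serve any other code that needs value*10.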
I think *this* table method and the previously posted code are both rather simple, which is what you asked for. There are also generic n-bit multiply routines and fast-multiply routines, but they are more complex, and the fastest (general) multiply method uses 1024 bytes for tables.
I was just thinking, my code in a prior post only uses the .A register, so it is pretty good for loops, although a bit slow. The code above (in this post) uses all the general-purpose registers, which means you would need to save and restore the X and Y registers in the general case. But below is an alternate version of the table method that only uses .A. It is a bit slower than the table method using X and Y, but faster than adding code to save and restore X and Y. The main restriction is that the tables need to be page-aligned, because the code overwrites only the low byte of each LDA operand address with the index. Additional issues are that the code can't be in ROM and it is not thread-safe (both because it is self-modifying code).
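        ;takes 19 or 20 cycles (if 'high' is ZP or not; code not in ZP)
        ; .A contains value to multiply
        sta getLo+1     ;patch low byte of the operand at getLo
        sta getHi+1     ;patch low byte of the operand at getHi
getHi:  lda hiTab80     ;operand now points at hiTab80 + value
        sta high
getLo:  lda loTab80     ;operand now points at loTab80 + value
        ; result: low byte in .A, high byte in 'high'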
For comparison, it looks like the original post using only .A takes about 64 cycles, assuming all temporaries are in zero page (or 75 cycles if none are ZP).
Last Edit: Jul 11, 2014 19:31:04 GMT by hydrophilic: added comparison time
Nice idea, C128Man, but I think the OP was interested in speed. Calling the BASIC ROM is *much* slower. But if the goal is minimum code bytes, then calling ROM is a good idea. Thank you!
Thanks for the links, wegi! The a^2-b^2 code was my first thought too, but for a constant (like 80) a custom solution (above) is faster. But thanks for sharing the links... that method is superior in the general case!
I don't have a specific link... I wanted to publish an eBook about the C128 ROM years ago, but the technology at the time was too primitive (and now I'm busy with other things). So I can only suggest looking at some books on BombJack, like the Commodore 128 Programmers Reference Guide (128PRG) written by Commodore or perhaps Commodore BASIC 7.0 Internals.
In short, there is a set of JMP codes into BASIC ROM located in the $af00 page of ROM. These routines can do things like math, graphics, and running programs. Obviously you are interested in the math... Anyway, hope this helps!
By the way, the math routines are generally easy to use, but they are VERY difficult to debug, if you are using the built-in Machine Language MONITOR. This is because the MONITOR uses the floating-point-accumulator (FAC) for internal/temporary use. If you want to use the MONITOR while testing the BASIC ROM math code, then you should copy the results of FAC into a safe spot of RAM.