Post by jusalak on Jan 29, 2024 17:18:59 GMT
I decided to use CIA timers to measure the time taken by the CPU switch from Z80 to 8502. I created test programs 1, 2 and 3. They are identical except that programs 2 and 3 add 4 and 8 Z80 NOPs to the code, respectively. (And yes, it was the Z80 timer test that inspired this; thanks to its author.)
main .org $1c01
        .byte $0b,$1c,$0a,$00,$9e,$37,$31,$38,$31,$00,$00,$00 ;10 SYS 7181 (next-line link $1c0b)
        lda #$3e
        sta $ff00
        lda #$00            ;disable 2 MHz
        sta $d030
_loop1:                     ;wait for one frame
        lda $d011
        bpl _loop1
_loop2: lda $d011
        bmi _loop1
_loop3: lda $d011
        bpl _loop2
        sei
        lda #$0b            ;disable VIC screen
        sta $d011
        lda #$00
        sta $d01a           ;disable VIC interrupt
        lda #$7f
        sta $dc0d           ;disable CIA interrupt
        lda #$00
        sta $dc0e           ;stop CIA timer
        lda #$ff
        sta $dc04           ;load the timer with FFFFh
        sta $dc05
        lda #$c3
        sta $ffee           ;store JP instruction for Z80 mode start
        lda #<z80code       ;store lo-byte address
        sta $ffef
        lda #>z80code       ;and hi-byte address
        sta $fff0           ;of Z80 code
        lda #$b0            ;load Z80 configuration
        sta $d505           ;to MMU MCR register
        nop
        lda #$08            ;stop the timer
        sta $dc0e
        lda $dc04           ;load timer values and push them on the stack
        pha
        lda $dc05
        pha
        lda #$cf            ;store back RST instruction
        sta $ffee
        lda #$00            ;set MMU configuration to ROM
        sta $ff00
        cli
        jsr $e056           ;execute RUN/STOP-RESTORE sequence
        jsr $e109           ;IOINIT - initialize I/O, incl. CIA timers
        jsr $c000           ;initialize screen
        pla                 ;pull timer value from stack
        tax
        pla
        jsr $b89f           ;output timer value
        lda #$0d            ;insert one line
        jsr $ffd2
        jmp ($0a00)         ;return to BASIC
z80code
        .byte $01,$0e,$dc   ;ld bc,dc0eh
        .byte $3e,$19       ;ld a,19h ;start the timer
        .byte $ed,$79       ;out (c),a
        .byte $00,$00,$00,$00 ;4x nop ;added in test 3
        .byte $00,$00,$00,$00 ;4x nop ;added in tests 2,3
        .byte $c3,$e0,$ff   ;jp ffe0h - give control back to 8502
The program is rather simple: it pushes the timer value onto the stack, performs IOINIT to restore the CIA default values, then prints the final timer value to the screen. Now, the results: on a real PAL C128, the final timer values for tests 1, 2 and 3 are FFD6, FFCE and FFC6, respectively (there was no fluctuation). That makes 41 1 MHz clock ticks for test 1, 49 ticks for test 2 and 57 ticks for test 3. A Z80 NOP takes 4 Z80 cycles = 2 1 MHz cycles, so 4 Z80 NOPs make 8 1 MHz cycles, which matches the results nicely. For test 1, I counted the cycles between timer start and stop, including the Z80 code executed at ffe0:
JP FFE0h - 10 Z80 cycles - 5 1MHz cycles
DI - 4 Z80 cycles - 2 1MHz cycles
LD A,3Eh - 7 Z80 cycles - 3.5 1MHz cycles
LD (FF00h),A - 13 Z80 cycles - 6.5 1MHz cycles
LD BC,D505h - 10 Z80 cycles - 5 1MHz cycles
LD A,B1h - 7 Z80 cycles - 3.5 1MHz cycles
OUT (C),A - 12 Z80 cycles - 6 1MHz cycles
Z80 part: 31.5 1MHz cycles
NOP - 2 1MHz cycles
LDA #$08 - 2 1MHz cycles
STA $DC0E - 4 1MHz cycles
8502 part: 8 1MHz cycles
Total: 39.5 1MHz cycles
So by my calculation, 1.5 1MHz cycles are lost in switching the CPU (test 1 on real hardware: 41 measured - 39.5 calculated = 1.5).
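The tally above can be checked with a small Python sketch. The instruction lists are transcribed from the post, and the conversion of 2 Z80 T-states per 1 MHz cycle is the thread's own. The names `Z80_PART`, `CPU8502_PART` and `budget` are hypothetical helpers for illustration only.

```python
# Test 1 budget as tallied in the post. Z80 entries are T-states, 8502 entries
# are 1 MHz cycles; the thread converts at 2 Z80 T-states per 1 MHz cycle.
Z80_PART = [("JP FFE0h", 10), ("DI", 4), ("LD A,3Eh", 7), ("LD (FF00h),A", 13),
            ("LD BC,D505h", 10), ("LD A,B1h", 7), ("OUT (C),A", 12)]
CPU8502_PART = [("NOP", 2), ("LDA #$08", 2), ("STA $DC0E", 4)]


def budget(z80_part, cpu8502_part):
    """Return (Z80 part, 8502 part, total) in 1 MHz cycles."""
    z80 = sum(t for _, t in z80_part) / 2
    cpu = sum(c for _, c in cpu8502_part)
    return z80, cpu, z80 + cpu


z80, cpu, expected = budget(Z80_PART, CPU8502_PART)
measured = 0xFFFF - 0xFFD6   # real PAL C128, test 1 (timer started at FFFFh)
print(z80, cpu, expected)    # 31.5 8 39.5
print(measured - expected)   # 1.5
```

The 1.5-cycle remainder is simply the difference between the measurement and this paper budget. The sketch does not model where those cycles go.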
Z64K gives FFD9-FFDA for test 1, FFD1-FFD2 for test 2 and FFC9-FFCA for test 3. That makes 37-38, 45-46 and 53-54 cycles, respectively. There is a consistent 3-4 cycle difference between the Z64K and real hardware results: Z64K is somehow that much faster.
VICE gives FFD3, FFCB and FFC3 (44, 52 and 60 cycles), so it is consistently 3 cycles slower than a real C128. Interestingly, the difference is in the opposite direction from Z64K's.
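To double-check the arithmetic in the two paragraphs above, here is a minimal Python sketch (helper names are hypothetical). It uses the thread's convention: the timer is loaded with FFFFh, and elapsed cycles = FFFFh minus the final value. The sketch converts the reported values, confirms the 8-cycle step per 4 Z80 NOPs, and computes each emulator's offset from the real C128. Z64K's two alternating values are kept as a (min, max) range.

```python
def elapsed_cycles(timer_value: int) -> int:
    """Cycles counted by a CIA timer loaded with $FFFF (thread's convention)."""
    return 0xFFFF - timer_value


# Final timer values for tests 1-3 as (low, high); Z64K alternated by one count
RESULTS = {
    "real C128": [(0xFFD6, 0xFFD6), (0xFFCE, 0xFFCE), (0xFFC6, 0xFFC6)],
    "Z64K": [(0xFFD9, 0xFFDA), (0xFFD1, 0xFFD2), (0xFFC9, 0xFFCA)],
    "VICE": [(0xFFD3, 0xFFD3), (0xFFCB, 0xFFCB), (0xFFC3, 0xFFC3)],
}


def cycle_ranges(name):
    # A higher timer value means fewer elapsed cycles, so the ends swap
    return [(elapsed_cycles(hi), elapsed_cycles(lo)) for lo, hi in RESULTS[name]]


def offsets_vs_real(name):
    # Positive = more cycles (slower) than the real C128; negative = faster
    real = cycle_ranges("real C128")
    return [(e_lo - r_lo, e_hi - r_hi)
            for (e_lo, e_hi), (r_lo, r_hi) in zip(cycle_ranges(name), real)]


real = [lo for lo, _ in cycle_ranges("real C128")]
print(real)                                     # [41, 49, 57]
# Each block of 4 Z80 NOPs = 16 T-states = 8 cycles at 2 T-states per cycle
print([b - a for a, b in zip(real, real[1:])])  # [8, 8]
print(offsets_vs_real("Z64K"))                  # [(-4, -3), (-4, -3), (-4, -3)]
print(offsets_vs_real("VICE"))                  # [(3, 3), (3, 3), (3, 3)]
```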
test program 1
>01c00 00 0b 1c 0a 00 9e 37 31
>01c08 38 31 00 00 00 a9 3e 8d
>01c10 00 ff a9 00 8d 30 d0 ad
>01c18 11 d0 10 fb ad 11 d0 30
>01c20 f6 ad 11 d0 10 f6 78 a9
>01c28 0b 8d 11 d0 a9 00 8d 1a
>01c30 d0 a9 7f 8d 0d dc a9 00
>01c38 8d 0e dc a9 ff 8d 04 dc
>01c40 8d 05 dc a9 c3 8d ee ff
>01c48 a9 87 8d ef ff a9 1c 8d
>01c50 f0 ff a9 b0 8d 05 d5 ea
>01c58 a9 08 8d 0e dc ad 04 dc
>01c60 48 ad 05 dc 48 a9 cf 8d
>01c68 ee ff a9 00 8d 00 ff 58
>01c70 20 56 e0 20 09 e1 20 00
>01c78 c0 68 aa 68 20 9f b8 a9
>01c80 0d 20 d2 ff 6c 00 0a 01
>01c88 0e dc 3e 19 ed 79 c3 e0
>01c90 ff 00 00 00 00 00 00 00
test program 2
>01c00 00 0b 1c 0a 00 9e 37 31
>01c08 38 31 00 00 00 a9 3e 8d
>01c10 00 ff a9 00 8d 30 d0 ad
>01c18 11 d0 10 fb ad 11 d0 30
>01c20 f6 ad 11 d0 10 f6 78 a9
>01c28 0b 8d 11 d0 a9 00 8d 1a
>01c30 d0 a9 7f 8d 0d dc a9 00
>01c38 8d 0e dc a9 ff 8d 04 dc
>01c40 8d 05 dc a9 c3 8d ee ff
>01c48 a9 87 8d ef ff a9 1c 8d
>01c50 f0 ff a9 b0 8d 05 d5 ea
>01c58 a9 08 8d 0e dc ad 04 dc
>01c60 48 ad 05 dc 48 a9 cf 8d
>01c68 ee ff a9 00 8d 00 ff 58
>01c70 20 56 e0 20 09 e1 20 00
>01c78 c0 68 aa 68 20 9f b8 a9
>01c80 0d 20 d2 ff 6c 00 0a 01
>01c88 0e dc 3e 19 ed 79 00 00
>01c90 00 00 c3 e0 ff 00 00 00
test program 3
>01c00 00 0b 1c 0a 00 9e 37 31
>01c08 38 31 00 00 00 a9 3e 8d
>01c10 00 ff a9 00 8d 30 d0 ad
>01c18 11 d0 10 fb ad 11 d0 30
>01c20 f6 ad 11 d0 10 f6 78 a9
>01c28 0b 8d 11 d0 a9 00 8d 1a
>01c30 d0 a9 7f 8d 0d dc a9 00
>01c38 8d 0e dc a9 ff 8d 04 dc
>01c40 8d 05 dc a9 c3 8d ee ff
>01c48 a9 87 8d ef ff a9 1c 8d
>01c50 f0 ff a9 b0 8d 05 d5 ea
>01c58 a9 08 8d 0e dc ad 04 dc
>01c60 48 ad 05 dc 48 a9 cf 8d
>01c68 ee ff a9 00 8d 00 ff 58
>01c70 20 56 e0 20 09 e1 20 00
>01c78 c0 68 aa 68 20 9f b8 a9
>01c80 0d 20 d2 ff 6c 00 0a 01
>01c88 0e dc 3e 19 ed 79 00 00
>01c90 00 00 00 00 00 00 c3 e0
>01c98 ff 00 00 00 00 00 00 00
Dumps can be copied and pasted into the MONITOR in VICE and Z64K.
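The three Z80 > 8502 programs differ only in the Z80 stub at $1c87. The 8502 code loads #$87 and #$1c as the JP target, as the dumps show. The Python sketch below rebuilds the stub bytes from standard Z80 opcode encodings, so they can be compared with the dumps. `z80_to_8502_stub` is a hypothetical helper. The $19 control value is the one the post writes to $DC0E: start, one-shot, force load.

```python
# Standard Z80 opcode encodings used by the stub
LD_BC_NN = 0x01          # ld bc,nn (operand little-endian)
LD_A_N = 0x3E            # ld a,n
OUT_C_A = b"\xed\x79"    # out (c),a
NOP = 0x00               # nop
JP_NN = 0xC3             # jp nn (operand little-endian)


def z80_to_8502_stub(extra_nops: int) -> bytes:
    """Hypothetical helper rebuilding the z80code bytes of the Z80 > 8502 tests."""
    return (bytes([LD_BC_NN, 0x0E, 0xDC])   # ld bc,dc0eh (CIA 1 timer A control)
            + bytes([LD_A_N, 0x19])          # ld a,19h: start, one-shot, force load
            + OUT_C_A                        # out (c),a starts the timer
            + bytes([NOP] * extra_nops)      # 0, 4 or 8 padding NOPs
            + bytes([JP_NN, 0xE0, 0xFF]))    # jp ffe0h: hand control back to 8502


for test, nops in ((1, 0), (2, 4), (3, 8)):
    print(test, z80_to_8502_stub(nops).hex(" "))
```

For test 1 this prints `01 0e dc 3e 19 ed 79 c3 e0 ff`, the same bytes the first dump shows from $1c87 onwards. `bytes.hex` with a separator argument needs Python 3.8 or later.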
Post by jusalak on Jan 29, 2024 19:35:16 GMT
Attached is the converse 8502 > Z80 switch duration test. Test 2 and test 3 are the same as test 1, except that they have 4 and 8 Z80 NOPs added to the code. Tests 2 and 3 again run consistently 8 and 16 1 MHz cycles slower.
main .org $1c01
        .byte $0b,$1c,$0a,$00,$9e,$37,$31,$38,$31,$00,$00,$00 ;10 SYS 7181 (next-line link $1c0b)
        lda #$3e
        sta $ff00
        lda #$00            ;disable 2 MHz
        sta $d030
_loop1:                     ;wait for one frame
        lda $d011
        bpl _loop1
_loop2: lda $d011
        bmi _loop1
_loop3: lda $d011
        bpl _loop2
        sei
        lda #$0b            ;disable VIC screen
        sta $d011
        lda #$00
        sta $d01a           ;disable VIC interrupt
        lda #$7f
        sta $dc0d           ;disable CIA interrupt
        lda #$00
        sta $dc0e           ;stop CIA timer
        lda #$ff
        sta $dc04           ;load the timer with FFFFh
        sta $dc05
        lda #$c3
        sta $ffee           ;store JP instruction for Z80 mode start
        lda #<z80code       ;store lo-byte address
        sta $ffef
        lda #>z80code       ;and hi-byte address
        sta $fff0           ;of Z80 code
        lda #$19            ;start the timer
        sta $dc0e
        lda #$b0            ;load Z80 configuration
        sta $d505           ;to MMU MCR register
        nop
        lda $dc04           ;load timer values and push them on the stack
        pha
        lda $dc05
        pha
        lda #$cf            ;store back RST instruction
        sta $ffee
        lda #$00            ;set MMU configuration to ROM
        sta $ff00
        cli
        jsr $e056           ;execute RUN/STOP-RESTORE sequence
        jsr $e109           ;IOINIT - initialize I/O, incl. CIA timers
        jsr $c000           ;initialize screen
        pla                 ;pull timer value from stack
        tax
        pla
        jsr $b89f           ;output timer value
        lda #$0d            ;insert one line
        jsr $ffd2
        jmp ($0a00)         ;return to BASIC
z80code
        .byte $00,$00,$00,$00 ;4x nop ;added in test 3
        .byte $00,$00,$00,$00 ;4x nop ;added in tests 2,3
        .byte $01,$0e,$dc   ;ld bc,dc0eh
        .byte $3e,$08       ;ld a,08h ;stop the timer
        .byte $ed,$79       ;out (c),a
        .byte $c3,$e0,$ff   ;jp ffe0h ;give control back to 8502
This time there is fluctuation on a real C128. The first run after power-on or reset consistently takes 2 cycles longer than all subsequent runs (the values in parentheses below); otherwise there was no fluctuation.
Results:
test 1 - test 2 - test 3
Real PAL C128: (FFE4) FFE6 - (FFDC) FFDE - (FFD4) FFD6
Z64K: FFE4 - FFDC - FFD4
VICE: FFE7 - FFDF - FFD7
This translates to durations in cycles as follows:
test 1 - test 2 - test 3
Real PAL C128: (27) 25 - (35) 33 - (43) 41 1MHz cycles
Z64K: 27 - 35 - 43 1MHz cycles
VICE: 24 - 32 - 40 1MHz cycles
For test 1, I counted the expected cycles as follows:
lda #$b0 ; 2 1MHz cycles
sta $d505 ; 4 1MHz cycles
nop - 4 Z80 cycles > 2 1MHz cycles
jp <z80code> - 10 Z80 cycles > 5 1MHz cycles
ld bc,dc0eh - 10 Z80 cycles > 5 1MHz cycles
ld a,08h - 7 Z80 cycles > 3.5 1MHz cycles
out (c),a - 12 Z80 cycles > 6 1MHz cycles
Total: 27.5 1MHz cycles
Edit: included the Z80 NOP.
So it seems that in this case the real C128 is 0.5-2.5 cycles faster than calculated when switching from 8502 to Z80.
With regard to the emulators,
Z64K: (0-)2 cycles slower than real C128
VICE: 1(-3) cycles faster than real C128
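As a cross-check of the 27.5-cycle budget and the comparisons above, here is a short Python sketch. The instruction costs and timer values are transcribed from this post, and the 2-T-states-per-cycle conversion is the thread's. The dictionary names are hypothetical. A negative deviation means fewer cycles than calculated.

```python
# Test 1 of the 8502 -> Z80 direction, as tallied in the post.
# 8502 entries are 1 MHz cycles; Z80 entries are T-states (2 per 1 MHz cycle).
CPU8502 = [("LDA #$B0", 2), ("STA $D505", 4)]
Z80 = [("NOP", 4), ("JP <z80code>", 10), ("LD BC,DC0Eh", 10),
       ("LD A,08h", 7), ("OUT (C),A", 12)]

expected = sum(c for _, c in CPU8502) + sum(t for _, t in Z80) / 2
print(expected)  # 27.5

MEASURED = {   # final timer values for test 1 (timer started at FFFFh)
    "real C128, first run": 0xFFE4,
    "real C128": 0xFFE6,
    "Z64K": 0xFFE4,
    "VICE": 0xFFE7,
}
# Negative = fewer cycles than the paper budget
deviation = {name: (0xFFFF - v) - expected for name, v in MEASURED.items()}
for name, d in deviation.items():
    print(f"{name}: {d:+} cycles vs. calculated")
```

Subtracting the real C128 row from the emulator rows gives the "Z64K: (0-)2 cycles slower" and "VICE: 1(-3) cycles faster" figures quoted above.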
test program 1
>01c00 00 0b 1c 0a 00 9e 37 31
>01c08 38 31 00 00 00 a9 3e 8d
>01c10 00 ff a9 00 8d 30 d0 ad
>01c18 11 d0 10 fb ad 11 d0 30
>01c20 f6 ad 11 d0 10 f6 78 a9
>01c28 0b 8d 11 d0 a9 00 8d 1a
>01c30 d0 a9 7f 8d 0d dc a9 00
>01c38 8d 0e dc a9 ff 8d 04 dc
>01c40 8d 05 dc a9 c3 8d ee ff
>01c48 a9 87 8d ef ff a9 1c 8d
>01c50 f0 ff a9 19 8d 0e dc a9
>01c58 b0 8d 05 d5 ea ad 04 dc
>01c60 48 ad 05 dc 48 a9 cf 8d
>01c68 ee ff a9 00 8d 00 ff 58
>01c70 20 56 e0 20 09 e1 20 00
>01c78 c0 68 aa 68 20 9f b8 a9
>01c80 0d 20 d2 ff 6c 00 0a 01
>01c88 0e dc 3e 08 ed 79 c3 e0
>01c90 ff 00 00 00 00 00 00 00
test program 2
>01c00 00 0b 1c 0a 00 9e 37 31
>01c08 38 31 00 00 00 a9 3e 8d
>01c10 00 ff a9 00 8d 30 d0 ad
>01c18 11 d0 10 fb ad 11 d0 30
>01c20 f6 ad 11 d0 10 f6 78 a9
>01c28 0b 8d 11 d0 a9 00 8d 1a
>01c30 d0 a9 7f 8d 0d dc a9 00
>01c38 8d 0e dc a9 ff 8d 04 dc
>01c40 8d 05 dc a9 c3 8d ee ff
>01c48 a9 87 8d ef ff a9 1c 8d
>01c50 f0 ff a9 19 8d 0e dc a9
>01c58 b0 8d 05 d5 ea ad 04 dc
>01c60 48 ad 05 dc 48 a9 cf 8d
>01c68 ee ff a9 00 8d 00 ff 58
>01c70 20 56 e0 20 09 e1 20 00
>01c78 c0 68 aa 68 20 9f b8 a9
>01c80 0d 20 d2 ff 6c 00 0a 00
>01c88 00 00 00 01 0e dc 3e 08
>01c90 ed 79 c3 e0 ff 00 00 00
test program 3
>01c00 00 0b 1c 0a 00 9e 37 31
>01c08 38 31 00 00 00 a9 3e 8d
>01c10 00 ff a9 00 8d 30 d0 ad
>01c18 11 d0 10 fb ad 11 d0 30
>01c20 f6 ad 11 d0 10 f6 78 a9
>01c28 0b 8d 11 d0 a9 00 8d 1a
>01c30 d0 a9 7f 8d 0d dc a9 00
>01c38 8d 0e dc a9 ff 8d 04 dc
>01c40 8d 05 dc a9 c3 8d ee ff
>01c48 a9 87 8d ef ff a9 1c 8d
>01c50 f0 ff a9 19 8d 0e dc a9
>01c58 b0 8d 05 d5 ea ad 04 dc
>01c60 48 ad 05 dc 48 a9 cf 8d
>01c68 ee ff a9 00 8d 00 ff 58
>01c70 20 56 e0 20 09 e1 20 00
>01c78 c0 68 aa 68 20 9f b8 a9
>01c80 0d 20 d2 ff 6c 00 0a 00
>01c88 00 00 00 00 00 00 00 01
>01c90 0e dc 3e 08 ed 79 c3 e0
>01c98 ff 00 00 00 00 00 00 00
Post by jusalak on Jan 29, 2024 21:30:09 GMT
So, here is a summary of what an 8502 > Z80 > 8502 turnaround would take:
Theoretical: 27.5 + 39.5 = 67 cycles, or 68 if half-cycles are rounded up
Real C128 result: (27) 25 + 41 = (68) 66 cycles
Z64K result: 27 + 37-38 = 64-65 cycles
VICE result: 24 + 44 = 68 cycles
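The summary is the sum of the two one-way results. The Python sketch below adds them as (min, max) ranges so that the real C128's first-run value and Z64K's 37-38 fluctuation carry through. It also rounds each half-cycle leg up to reproduce the "68 if rounded up" figure. `round_trip` is a hypothetical helper, and every input is taken from the lines above.

```python
import math


def round_trip(to_z80, to_8502):
    """Add two (min, max) ranges of 1 MHz cycles for the two switch directions."""
    return (to_z80[0] + to_8502[0], to_z80[1] + to_8502[1])


theory = round_trip((27.5, 27.5), (39.5, 39.5))
print(theory)                              # (67.0, 67.0)
print(math.ceil(27.5) + math.ceil(39.5))   # 68 (each half-cycle rounded up)
print(round_trip((25, 27), (41, 41)))      # real C128 -> (66, 68)
print(round_trip((27, 27), (37, 38)))      # Z64K -> (64, 65)
print(round_trip((24, 24), (44, 44)))      # VICE -> (68, 68)
```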
Post by jusalak on Jan 29, 2024 22:06:51 GMT
So I decided to measure the duration of a simple 8502 > Z80 > 8502 turnaround, so that no Z80 OUT instruction introduces any ambiguity when toggling the timer.
main .org $1c01
        .byte $0b,$1c,$0a,$00,$9e,$37,$31,$38,$31,$00,$00,$00 ;10 SYS 7181 (next-line link $1c0b)
        lda #$3e
        sta $ff00
        lda #$00            ;disable 2 MHz
        sta $d030
_loop1:                     ;wait for one frame
        lda $d011
        bpl _loop1
_loop2: lda $d011
        bmi _loop1
_loop3: lda $d011
        bpl _loop2
        sei
        lda #$0b            ;disable VIC screen
        sta $d011
        lda #$00
        sta $d01a           ;disable VIC interrupt
        lda #$7f
        sta $dc0d           ;disable CIA interrupt
        lda #$00
        sta $dc0e           ;stop CIA timer
        lda #$ff
        sta $dc04           ;load the timer with FFFFh
        sta $dc05
        lda #$c3
        sta $ffee           ;store JP instruction for Z80 mode start
        lda #$e0            ;simple turnaround: JP target FFE0h
        sta $ffef
        lda #$ff
        sta $fff0
        lda #$19            ;start the timer
        sta $dc0e
        lda #$b0            ;load Z80 configuration
        sta $d505           ;to MMU MCR register
        nop
        lda #$08            ;stop the timer
        sta $dc0e
        lda $dc04           ;load timer values and push them on the stack
        pha
        lda $dc05
        pha
        lda #$cf            ;store back RST instruction
        sta $ffee
        lda #$00            ;set MMU configuration to ROM
        sta $ff00
        cli
        jsr $e056           ;execute RUN/STOP-RESTORE sequence
        jsr $e109           ;IOINIT - initialize I/O, incl. CIA timers
        jsr $c000           ;initialize screen
        pla                 ;pull timer value from stack
        tax
        pla
        jsr $b89f           ;output timer value
        lda #$0d            ;insert one line
        jsr $ffd2
        jmp ($0a00)         ;return to BASIC
And the results:
Real PAL C128: FFCF - 48 cycles
Z64K: FFD1 - 46 cycles
VICE: FFCF - 48 cycles
All the results above are stable, without fluctuation.
theoretical calculation:
lda #$b0 ; 2 1MHz cycles
sta $d505 ; 4 1MHz cycles
NOP - 4 Z80 cycles > 2 1MHz cycles
JP FFE0h - 10 Z80 cycles > 5 1MHz cycles
DI - 4 Z80 cycles > 2 1MHz cycles
LD A,3Eh - 7 Z80 cycles > 3.5 1MHz cycles
LD (FF00h),A - 13 Z80 cycles > 6.5 1MHz cycles
LD BC,D505h - 10 Z80 cycles > 5 1MHz cycles
LD A,B1h - 7 Z80 cycles > 3.5 1MHz cycles
OUT (C),A - 12 Z80 cycles > 6 1MHz cycles
NOP - 2 1MHz cycles
LDA #$08 - 2 1MHz cycles
STA $DC0E - 4 1MHz cycles
Total: 47.5 cycles; rounded up = 48 cycles
And the dump:
>01c00 00 0b 1c 0a 00 9e 37 31
>01c08 38 31 00 00 00 a9 3e 8d
>01c10 00 ff a9 00 8d 30 d0 ad
>01c18 11 d0 10 fb ad 11 d0 30
>01c20 f6 ad 11 d0 10 f6 78 a9
>01c28 0b 8d 11 d0 a9 00 8d 1a
>01c30 d0 a9 7f 8d 0d dc a9 00
>01c38 8d 0e dc a9 ff 8d 04 dc
>01c40 8d 05 dc a9 c3 8d ee ff
>01c48 a9 e0 8d ef ff a9 ff 8d
>01c50 f0 ff a9 19 8d 0e dc a9
>01c58 b0 8d 05 d5 ea a9 08 8d
>01c60 0e dc ad 04 dc 48 ad 05
>01c68 dc 48 a9 cf 8d ee ff a9
>01c70 00 8d 00 ff 58 20 56 e0
>01c78 20 09 e1 20 00 c0 68 aa
>01c80 68 20 9f b8 a9 0d 20 d2
>01c88 ff 6c 00 0a 00 00 00 00
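The simple-turnaround calculation in this post can be reproduced with a few lines of Python. The instruction costs are the ones listed above, and the 2-T-states-per-cycle conversion is the thread's. The constant names are hypothetical. The Z64K figure is the one measured in this post.

```python
import math

T_STATES_PER_CYCLE = 2  # the thread's conversion: 2 Z80 T-states = one 1 MHz cycle

BEFORE = [2, 4]                      # LDA #$B0, STA $D505 (8502 cycles)
Z80 = [4, 10, 4, 7, 13, 10, 7, 12]   # NOP, JP FFE0h, DI, LD A,3Eh,
                                     # LD (FF00h),A, LD BC,D505h, LD A,B1h, OUT (C),A
AFTER = [2, 2, 4]                    # NOP, LDA #$08, STA $DC0E (8502 cycles)

expected = sum(BEFORE) + sum(Z80) / T_STATES_PER_CYCLE + sum(AFTER)
print(expected, math.ceil(expected))  # 47.5 48

# Final timer values reported above; elapsed = FFFFh - value
MEASURED = {"real C128": 0xFFCF, "Z64K": 0xFFD1, "VICE": 0xFFCF}
cycles = {name: 0xFFFF - v for name, v in MEASURED.items()}
print(cycles)  # {'real C128': 48, 'Z64K': 46, 'VICE': 48}
```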
Post by nikoniko on Jan 30, 2024 4:06:07 GMT
Very interesting findings!
Post by willymanilly on Jan 30, 2024 8:22:04 GMT
Very interesting indeed! Z64K has been updated to match real hardware for the majority of the tests. The only one where Z64K doesn't behave the same as real hardware is the power-up case, where the first run on real hardware is off by 2 cycles. ==>"A first run after power-on or reset gives consistently 2 cycles slower values..."
I've confirmed all tests on my real hardware as well.
Post by willymanilly on Feb 1, 2024 18:59:32 GMT
Thanks for all these tests and your observations. It seems even/odd cycles do have an impact if an extra Z80 instruction is executed before switching to the 8502. I've updated Z64K to match this model, and all your tests, including the power-up one with the 2-cycle difference, now behave like real hardware!
Post by jusalak on Feb 4, 2024 16:39:08 GMT
Some updated analysis of the results:
EVEN 8502>Z80 TEST
LDA #$B0 ; 2 1MHz cycles
STA $D505 ; 4 1MHz cycles
NOP - 4 Z80 cycles > 2 1MHz cycles
JP <Z80code> - 10 Z80 cycles > 5 1MHz cycles
LD BC,DC0Eh - 10 Z80 cycles > 5 1MHz cycles
LD A,08h - 7 Z80 cycles > 3.5 1MHz cycles
4 x NOP - 16 Z80 cycles > 8 1MHz cycles
OUT (C),A - 12 Z80 cycles > 6 1MHz cycles
Total: 35.5 1MHz cycles
Measurement after previous ODD state: FFFFh - FFDBh = 24h = 36 1MHz cycles
Measurement after previous EVEN state: FFFFh - FFDDh = 22h = 34 1MHz cycles
ODD 8502>Z80 TEST
LDA #$B0 ; 2 1MHz cycles
STA $D505 ; 4 1MHz cycles
NOP - 4 Z80 cycles > 2 1MHz cycles
JP <Z80code> - 10 Z80 cycles > 5 1MHz cycles
LD BC,DC0Eh - 10 Z80 cycles > 5 1MHz cycles
XOR A - 4 Z80 cycles > 2 1MHz cycles
SCF - 4 Z80 cycles > 2 1MHz cycles
4 x RLA - 16 Z80 cycles > 8 1MHz cycles
OUT (C),A - 12 Z80 cycles > 6 1MHz cycles
Total: 36 1MHz cycles
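The ODD variant swaps `LD A,08h` plus 4 NOPs for `XOR A`, `SCF` and 4 x `RLA`. The small Python model below shows that this sequence still leaves A = 08h, the "stop timer" value written to $DC0E. It costs 24 T-states (a whole 12 cycles) instead of 23 (11.5 cycles), which is where the 36 vs. 35.5 totals come from. The helpers `rla` and `odd_load_a` are hypothetical. They model only A and the carry flag, using the standard Z80 semantics and 4-T-state timings of these instructions.

```python
def rla(a, carry):
    """Z80 RLA: rotate A left through carry (bit 7 -> carry, old carry -> bit 0)."""
    return ((a << 1) | carry) & 0xFF, (a >> 7) & 1


def odd_load_a():
    """Run XOR A / SCF / 4 x RLA and return (A, carry, T-states used)."""
    a, carry, t_states = 0, 0, 4        # XOR A: A = 0, carry cleared (4 T-states)
    carry, t_states = 1, t_states + 4   # SCF: carry set (4 T-states)
    for _ in range(4):                  # 4 x RLA (4 T-states each)
        a, carry = rla(a, carry)
        t_states += 4
    return a, carry, t_states


a, carry, odd_t = odd_load_a()
even_t = 7 + 4 * 4                      # LD A,08h + 4 x NOP
print(hex(a), carry, odd_t, even_t)     # 0x8 0 24 23
print(odd_t / 2, even_t / 2)            # 12.0 11.5
```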
Measurement after previous ODD state: FFFFh - FFDCh = 23h = 35 1MHz cycles
Measurement after previous EVEN state: FFFFh - FFDEh = 21h = 33 1MHz cycles
Z80>8502 TEST
JP FFE0h - 10 Z80 cycles - 5 1MHz cycles
DI - 4 Z80 cycles - 2 1MHz cycles
LD A,3Eh - 7 Z80 cycles - 3.5 1MHz cycles
LD (FF00h),A - 13 Z80 cycles - 6.5 1MHz cycles
LD BC,D505h - 10 Z80 cycles - 5 1MHz cycles
LD A,B1h - 7 Z80 cycles - 3.5 1MHz cycles
OUT (C),A - 12 Z80 cycles - 6 1MHz cycles
Z80 part: 31.5 1MHz cycles
NOP - 2 1MHz cycles
LDA #$08 - 2 1MHz cycles
STA $DC0E - 4 1MHz cycles
8502 part: 8 1MHz cycles
Total: 39.5 1MHz cycles
Measurement: FFFFh - FFD6h = 29h = 41 1MHz cycles
8502>Z80>8502 TEST
LDA #$B0 ; 2 1MHz cycles
STA $D505 ; 4 1MHz cycles
NOP - 4 Z80 cycles > 2 1MHz cycles
JP FFE0h - 10 Z80 cycles > 5 1MHz cycles
DI - 4 Z80 cycles - 2 1MHz cycles
LD A,3Eh - 7 Z80 cycles - 3.5 1MHz cycles
LD (FF00h),A - 13 Z80 cycles - 6.5 1MHz cycles
LD BC,D505h - 10 Z80 cycles - 5 1MHz cycles
LD A,B1h - 7 Z80 cycles - 3.5 1MHz cycles
OUT (C),A - 12 Z80 cycles - 6 1MHz cycles
NOP - 2 1MHz cycles
LDA #$08 - 2 1MHz cycles
STA $DC0E - 4 1MHz cycles
Total: 47.5 cycles
Measurement after previous ODD state: FFFFh - FFCFh = 30h = 48 1MHz cycles
Measurement after previous EVEN state: FFFFh - FFD1h = 2Eh = 46 1MHz cycles
The results speak for themselves: Z80 code execution beginning at the EVEN state has a 2-cycle edge over Z80 code execution beginning at the ODD state.
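The ODD/EVEN pattern across the three tests can be summarised with a short Python sketch. It uses only the 1 MHz cycle counts and calculated totals stated in this post. The names `MEASURED`, `CALCULATED` and `parity_gap` are hypothetical. For every test, the run after a previous ODD state takes exactly 2 cycles more than the run after a previous EVEN state.

```python
# 1 MHz cycle counts from the post: (after previous ODD state, after previous EVEN state)
MEASURED = {
    "EVEN 8502>Z80": (36, 34),
    "ODD 8502>Z80": (35, 33),
    "8502>Z80>8502": (48, 46),
}
CALCULATED = {"EVEN 8502>Z80": 35.5, "ODD 8502>Z80": 36, "8502>Z80>8502": 47.5}


def parity_gap(test):
    """Extra cycles when the run follows an ODD state rather than an EVEN one."""
    after_odd, after_even = MEASURED[test]
    return after_odd - after_even


for test, (after_odd, after_even) in MEASURED.items():
    calc = CALCULATED[test]
    print(f"{test}: calculated {calc}, after ODD {after_odd - calc:+}, "
          f"after EVEN {after_even - calc:+}, gap {parity_gap(test)}")
```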