|
Post by oziphantom on Nov 16, 2018 6:26:23 GMT
Do we know when and how the VDC DMA will operate? Does it only work in VBlank, or does it also tick along in HBlank? Can it use the extra bandwidth offered by double-pixel mode to interleave?
Some logic analyser traces would be really handy, I feel.
|
|
|
Post by oziphantom on Nov 16, 2018 7:40:31 GMT
Also, has anybody tested how the different versions handle overlapping copies? Are they overlap-safe, and does the copy always shift up, down, etc.?
|
|
|
Post by willymanilly on Nov 17, 2018 9:14:27 GMT
Do we know when and how the VDC DMA will operate? Does it only work in VBLank, does it also tick along in HBLank. Can it use the extra bandwidth offered by Double pixel mode to interleave?
Some Logic Analyser traces would be really handy I feel
I created the block fill test in the VDC split program to analyse aspects of this. It's still a work in progress and I plan on releasing a greatly improved test suite in the near future which will have options to automatically dump the results into a bin file. I've been using all this data to improve the low-level emulation of Z64K. Any additional results from other sources are always very welcome.
|
|
|
Post by oziphantom on Nov 23, 2018 4:48:43 GMT
I'm trying to move 32 bytes via DMA during VBlank, so I would expect it to take about 64 VDC cycles plus some overhead, but it seems to take over 120. Does anybody have a good heuristic for working out roughly how many clocks a DMA takes to complete during VBlank? If it's going to take that long for small copies, then it would be worth making an IRQ loading chain to do them. I just need a decent guess for the IRQ timer for when the DMA will complete.
|
|
|
Post by bjonte on Nov 23, 2018 21:07:05 GMT
My plan to tackle it is to break several times during the frame to do a bit of VDC work and continue later if it starts to take time.
|
|
|
Post by willymanilly on Nov 23, 2018 21:40:30 GMT
How long it takes depends on when you commence the DMA. Immediately after VBLANK is set, the VDC still requires VDC cycles to buffer the character data and the attribute data (only if attributes are enabled) for the first line of the next frame. The priority for how VDC cycles are consumed seems to be, in order:
- Bitmap data (every line in the display region, for both text and bitmap modes). Default 80. There seem to be a few extra cycles consumed here. I assume the VDC requires a few cycles to set up the memory pointers. I think this applies to the character and attribute fetches mentioned below as well...
- DRAM refresh (every scanline). Register 36 determines the number of cycles.
- Character data (text mode only). Initiated once every "character total height" scanlines, for the number of displayed rows + 1.
- Attribute data (text and bitmap modes, but only when attributes are enabled). Initiated after character data fetching has completed (text mode); otherwise initiated once every "character total height" scanlines, for the number of displayed rows + 1.
- VDC DMA (block write, block fill).
In the case where there are not enough VDC cycles to complete the buffering of character and attribute data before the next VDC row, the current fetch continues until complete. It does not automatically commence fetching the next row of characters. This ensures the VDC does not get stuck and frees up cycles for DMA transfers. It has the side effect of corrupting the display.
The above is simplified but gives you an idea. As discussed previously, the tests I'm doing on this are still a work in progress. I will update the observed model as new information is discovered.
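To make the priority order above concrete, here is a toy sketch in Python. It is not a measured model of the VDC: the function name, the per-line budget of 120 cycles, and every demand figure except the 80 bitmap fetches are made-up values chosen only to show how a lower-priority consumer such as DMA can be starved.

```python
# Toy illustration of the priority order listed above; NOT a measured VDC model.
PRIORITY = ["bitmap", "refresh", "character", "attribute", "dma"]

def allocate_cycles(budget, demand):
    """Grant cycles to each consumer in priority order until the budget runs out."""
    grants = {}
    remaining = budget
    for name in PRIORITY:
        want = demand.get(name, 0)
        grants[name] = min(want, remaining)
        remaining -= grants[name]
    return grants

# Hypothetical scanline: only the 80 bitmap fetches come from the post,
# the budget and the other demands are invented for illustration.
line = allocate_cycles(
    budget=120,
    demand={"bitmap": 80, "refresh": 5, "character": 40, "dma": 32},
)
print(line)
```

With these invented numbers the character fetch absorbs everything left after bitmap and refresh, so the DMA gets nothing on that line, which is consistent with a DMA taking longer when it starts right after VBLANK is set.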
|
|
|
Post by jmpff3d on Nov 24, 2018 0:53:46 GMT
Below, some text on VDC musings from 22 years ago .. in the hopes that this somehow adds a little to the topic at hand.
I suspect most of this (if not all of it) is old hat by now.
From: phdss@netins.net (Brett Tabke)
Newsgroups: comp.sys.cbm
Subject: Re: 2 VDC questions
Date: Fri, 06 Dec 1996 16:50:10 -0500
Organization: Professional Hybrid Development Software Systems
Eric Christopherson <Draximus@fab.net> wrote:
EC> 1. Is it possible to get a smaller color cell size than 8x2 with
EC> the 128's VDC? If so, how?
EC> 2. Is it possible to use raster interrupts with the VDC? If so, how?
EC> Eric Christopherson
Sorry Eric, AFAIK, it is a big *no* to both questions (a real bummer).
However you can test the vertical blanking period (VBP) and make some screen adjustments while the VDC is not actively displaying. With a fair to good routine you can sync the VBP and time out screen changes based on the previous VBP. By using a block copy routine to copy predefined shapes within VDC memory to the screen, you can perform fairly good looking sprite imitations. Certainly not as smooth, but block copy is quite fast and can move more data than a simple sprite size chunk.
I don't mean to spew a long reply at you, but here is part of an idea outline for a C=Hacking article I started but never quite got finished.
; -------------------------------------------------------------
Tricks to speeding up overall VDC display speed:
- Reduce the hb of the dram refresh rate (reg 36) to zero (there is still a hard coded low byte of the refresh rate). It results in a 20% overall speed increase. This increase can be seen in ready mode. List a long basic program and time it - then set the HB of the vdc refresh rate to zero and list the program again - instant bolt on speed improvement.
- Reduce the number of visible columns to around 60. Another 20%.
- Reduce the number of visible rows. (major increase).
The above 3 suggestions hold true for both text and graphics modes.
With the high byte dram refresh rate reduced to zero, it has been my experience that you can write or read 8 consecutive bytes to or from the VDC without checking the update status register! After 8 bytes the next byte written is lost.
Block Copying:
The usage of the block copy feature is incredibly fast and not to be underestimated. You can recopy the entire visible screen at near 10-11 frames a second. If you reduce the size of the screen and reduce the size of data segments to be copied (say 10cols by 10 rows or 80x80 pixels) you can approach 30 updates a second (very suitable for animation). Granted, if you restrict yourself to a 16k VDC setup, obviously there is going to be some updating from the 128 side, that would bring everything to a screeching halt.
Using block copy for speed, takes some experimenting. Here are some of the things I've learned through trial and error:
- Try to block copy data in multiples of 8. It takes just as long to copy 9 bytes as it does 15 bytes; but 8 bytes is much faster than 9 bytes. Evidently there is an internal bit mask operation where the VDC must calc the remainder off a multiple of 8.
- It takes longer to set up and copy non-consecutive columns than it does to copy the whole set of rows. For example, suppose you want to copy columns 5 through 10 on rows 15 to 20. The destination is the screen and the source is elsewhere in VDC memory. To copy the 5 rows by 5 columns would take 5 separate block copy commands, but if you were to copy the entire set of rows it would be quicker. This is because of all the overhead involved in setting up each of the 5 block copies. To copy all the rows all that is required is to keep resetting the number of bytes to copy.
To use this fact to your advantage, you can copy the screen ram out to a separate vdc location, update the ram there, and then block copy the whole chunk back to the screen in one fell swoop.
- When working with large numbers of rows to be scrolled via block copy routines, be careful not to copy too many at a time without updating the accompanying attribute ram. The screen will flicker with the text moving without the colors. (watch pocket writer 2)
- If speed is not critical, block copying during the vertical blanking period is much smoother and seamless; however you don't have long to work with.
- Block copying is close to 1 byte per cycle (+- 5% overhead).
- You don't have to wait for an operation to be completed before doing other things. Such as, you initiate a block copy or fill, and then immediately go do something else on the 128 side - no need to sit around and wait for the update status register to get ready. If you copy 255 bytes max, (my guess) is that it takes between 300-400 cycles to complete depending on the state of the vertical blanking period. 400 cycles of code at an average of 3-4 cycles per instruction is 100+ instructions of code you can execute elsewhere while the VDC does the block copy thing.
- Don't get caught in the WRITEREG lie. With 2 exceptions: you do not have to wait for update status to become ready while writing to a register. The first exception is that you can not be in the middle of a block fill or a block copy (must wait till it finishes), and two is you can't do consecutive writes to vdc ram.
If you are say, updating the current VDC ram address (reg18/19), then don't worry, write to reg 18 and rock out on 19 without checking the status reg.
The exception to the exception:
With the dram refresh high byte at zero, IRQ's off, and the VDC status register ready, you can write 8 consecutive bytes to vdc ram without checking the update status! Additionally; if you are doing other things in between the writes to ram (such as loading a cpu register with the byte to write) you can usually ignore the update status register entirely!
- Don't need more than one color? Switch the VDC into monocolor mode and watch all screen activity speed up easily 50%. It is breathtaking, spectacular, most awesome. If you need color, but it is static on the screen, then don't update attribute ram - just update text or gfx ram [In a word? ZED!].
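Two pieces of arithmetic from the list above can be sketched in Python. The helper names are invented for illustration. The first helper assumes the "multiples of 8" observation means a copy is charged as if its length were rounded up to the next multiple of 8, which is one reading of the post rather than a confirmed rule. The second reproduces the "400 cycles at 3-4 cycles per instruction" estimate using 3.5 as the average.

```python
def padded_length(n, multiple=8):
    """Round a byte count up to the next multiple of 8 (assumed cost bucket)."""
    return -(-n // multiple) * multiple

def instruction_budget(cycles, cycles_per_instruction=3.5):
    """Whole instructions that fit in `cycles` at an assumed average cost."""
    return int(cycles // cycles_per_instruction)

# 9 and 15 bytes land in the same bucket; 8 bytes stays in the smaller one.
print(padded_length(8), padded_length(9), padded_length(15))

# Instructions the CPU could run while a 400-cycle block copy completes.
print(instruction_budget(400))
```

Under this reading, trimming a 9-byte copy to 8 bytes saves a whole bucket, while growing it to 15 costs nothing extra.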
; ----------------------------------------------------------------
Brett
-- Brett Tabke | phdss@netins.net
|
|
|
Post by nikoniko on Nov 24, 2018 11:26:22 GMT
Below, some text on VDC musings from 22 years ago .. in the hopes that this somehow adds a little to the topic at hand.
I suspect most of this (if not all of it) is old hat by now.
- When working with large numbers of rows to be scrolled via block copy routines, be careful not to copy to many at a time without updating the accompanying attribute ram. The screen will flicker with the text moving with out the colors. (watch pocket writer 2)
You can use register 27 to your advantage in that case and set your row increment to twice the width of your text screen, then set your attributes start address to the first off-screen byte, creating an interleaved character and attribute setup. So in the case of an 80 column screen starting at $0000, set reg 27 to 80, reg 20 to 0, reg 21 to 80. That gives you 80 bytes of text followed by 80 bytes of attributes. Then when you use block copy to scroll the screen up, you'll be copying both characters and attributes close in time with little effort, greatly reducing the opportunities for flicker.
Prior to scrolling you can write the new line into off-screen memory so you can move it into view through the same series of copies. If you also delay scrolling until after the first row of characters is drawn, I think all block copies will happen behind the beam before it can catch up again and the display will be completely flicker-free.
In many cases though it may make better sense to use more VDC memory and scroll instantly by adjusting screen and attribute pointers. That also makes it simple to scroll in the other direction. Of course you have to accommodate jumps from one end of the screen buffer to the other but that's not complicated.
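The interleaved layout described above can be sketched as address arithmetic in Python. The function names are invented for illustration, and the sketch only plans the copies; it does not touch any VDC registers. It assumes a block copy moves at most 255 bytes per command (the figure used earlier in the thread) and that copying downward in memory over an overlapping region is safe, which the thread leaves unconfirmed.

```python
COLUMNS = 80
ROW_STRIDE = 2 * COLUMNS  # 80 character bytes followed by 80 attribute bytes

def char_row_address(row, base=0x0000):
    """Start of the character bytes for a screen row."""
    return base + row * ROW_STRIDE

def attr_row_address(row, base=0x0000):
    """Start of the attribute bytes for a screen row."""
    return char_row_address(row, base) + COLUMNS

def scroll_up_copies(rows, base=0x0000, max_chunk=255):
    """Plan (source, destination, length) copies moving rows 1..rows-1 up one row."""
    src = char_row_address(1, base)
    dst = char_row_address(0, base)
    remaining = (rows - 1) * ROW_STRIDE
    copies = []
    while remaining > 0:
        n = min(max_chunk, remaining)
        copies.append((src, dst, n))
        src += n
        dst += n
        remaining -= n
    return copies

copies = scroll_up_copies(25)
print(len(copies), copies[0], copies[-1])
```

Because characters and attributes sit next to each other in memory, each copy in the plan carries both, which is what keeps the text and its colors moving together.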
|
|
|
Post by oziphantom on Nov 24, 2018 11:34:52 GMT
I did some more samples. I'm doing 24 32-byte DMAs during VBlank; these are the times (start, end = duration):
225713727 225714023 = 296
225714026 225714327 = 301
225714330 225714627 = 297
225714630 225714925 = 295
225714928 225715233 = 305
225715236 225715533 = 297
225715536 225715837 = 301
225715840 225716137 = 297
Which is worse than what I had before. So for 32 bytes, during VBlank, on a double-pixel-wide screen showing 32 chars per line, 300 cycles seems like a good guess for the timer.
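As a quick check on the timer guess, the eight samples listed above can be summarised in Python. The timestamps are copied from the post; the variable names are invented for illustration, and nothing here measures hardware.

```python
# (start, end) timestamps for eight 32-byte DMAs, copied from the post above.
samples = [
    (225713727, 225714023),
    (225714026, 225714327),
    (225714330, 225714627),
    (225714630, 225714925),
    (225714928, 225715233),
    (225715236, 225715533),
    (225715536, 225715837),
    (225715840, 225716137),
]

durations = [end - start for start, end in samples]
mean = sum(durations) / len(durations)
worst = max(durations)
# Time between one DMA finishing and the next one starting.
gaps = [samples[i + 1][0] - samples[i][1] for i in range(len(samples) - 1)]

print(durations, mean, worst, gaps)
```

The mean sits just under 300 but the worst sample is slightly above it, so a timer set to exactly 300 would fire early for some copies; a margin above the worst observed value may be the safer choice.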
|
|