VOGONS


Emulating Area5150


Reply 80 of 113, by superfury

Rank l33t++
GloriousCow wrote on 2023-06-27, 17:15:
superfury wrote on 2023-06-27, 12:21:

I'm testing the demo again with UniPCemu's latest commit.

It's weird that you have full glyphs showing anywhere during these effects.
glyphs.PNG

If an entire character glyph is drawn that implies that R9 is 7 at some point, which it never should be during these effects.

They might also be leftover data in the CRT framebuffer from earlier operations.
I've just modified the framebuffer to be cleared after vertical retrace, so the CRTC starts rendering onto a blank canvas each frame. That might clear out some of those, unless they're rendered again. It also takes care of differing scanline lengths within the same frame: all scanlines are cleared afterwards, and the length cleared is that of the widest scanline drawn into the buffer.
Edit: If it was leftover data from previously rendered frames, it should now be properly cleared by the vertical retrace routine (which renders the framebuffer to the internal GPU framebuffer of the app itself).
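
For reference, a minimal sketch of the clearing scheme described above. It assumes a simple row-major framebuffer; `FrameBuffer`, `fb_note_scanline` and `fb_clear_after_retrace` are hypothetical names for illustration, not UniPCemu's actual code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FB_MAX_WIDTH  1024 /* hypothetical limits for this sketch */
#define FB_MAX_HEIGHT 512

typedef struct {
    uint32_t pixels[FB_MAX_HEIGHT][FB_MAX_WIDTH];
    uint32_t widest; /* widest scanline drawn this frame, in pixels */
    uint32_t rows;   /* number of scanlines drawn this frame */
} FrameBuffer;

/* Call once per finished scanline to track the area that was touched. */
void fb_note_scanline(FrameBuffer *fb, uint32_t width)
{
    assert(width <= FB_MAX_WIDTH);
    if (width > fb->widest) fb->widest = width;
    if (fb->rows < FB_MAX_HEIGHT) ++fb->rows;
}

/* Call after the frame has been presented during vertical retrace.
   Every drawn row is cleared up to the widest scanline of the frame. */
void fb_clear_after_retrace(FrameBuffer *fb)
{
    for (uint32_t y = 0; y < fb->rows; ++y)
        memset(fb->pixels[y], 0, fb->widest * sizeof(uint32_t));
    fb->widest = 0;
    fb->rows = 0;
}
```

Clearing only up to the widest scanline keeps the cost proportional to what was actually drawn, while still removing leftovers from shorter scanlines in the same frame.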

Author of the UniPCemu emulator.
UniPCemu Git repository
UniPCemu for Android, Windows, PSP, Vita and Switch on itch.io

Reply 81 of 113, by GloriousCow

Rank Member

Thanks to Trixter pointing out that my web version wasn't performing CGA wait states (whoops) the end credits now "work" in the browser.

I was afraid they were the most CPU intensive effect in the demo, and the effect keeps the CGA card in cycle-accurate mode a lot, so they don't run too well, and thus the sound is pretty bad, but hey. There's always more room for optimization.

skip straight to the credits:
https://dbalsom.github.io/martypc/web/areacreds.html

MartyPC: A cycle-accurate IBM PC/XT emulator | https://github.com/dbalsom/martypc

Reply 82 of 113, by superfury

Rank l33t++

Interestingly, running 8088 MPH with the latest changes (and the mentioned rendering improvements for clearing the framebuffer) to UniPCemu I see two things:
1. The credits once again hang (8088 MPH reports being 10% off at about 1510 cycles when starting up). I see it looping something infinitely (no screen updates/audio anymore though)?
2. The cycle-accurate part that races the beam became worse. It was showing, but kept being interrupted by black frames? Perhaps the CPU is too far out of sync (cycle-inaccurate), and that's now rendered properly (it used to be hidden by the framebuffer not clearing as intended)?

Running Area5150 once again with those latest changes, I already can see the houses part having two issues:
1. The first scanline contains weird jumbled pixels until the end of the scrolling effect. Those probably cover roughly double the display width area rendered.
2. All other extra data on the remaining (past first) scanlines is properly black.
So that was simply caused by the framebuffer not clearing old data properly it seems.

Perhaps the credits had the same issue, but I'll need to run it further to obtain that result.

Edit: 5150's 3D vector image part now works properly! 😁 At least that's fixed now (with the overscan fix). Interestingly, there's a static junk line at the bottom of the second 3D object though? It's at the bottom of the graphics area during that part and moves up and down (from the entire width of active display it seems) with the graphics part containing the 3D vector image?

Edit: Chaplin and co improved:

Attachment: UniPCemu_area5150_improvedchaplin.png (28.17 KiB). Chaplin and co improved on UniPCemu.

The noise in that image is indeed gone now, as I suspected (it was leftover rendered data)!
Edit: Made a little video of it running inside UniPCemu:
https://www.dropbox.com/s/ut57udqumf7oszn/Uni … mproved.7z?dl=0

The Chaplin part disappearing also shows something interesting (especially just before loading the next scene):
https://www.dropbox.com/s/l9bzhwi5ypy12x8/Uni … 2-14-36.7z?dl=0

The credits part also seemed to improve, showing a more regular pattern as well:

Attachment: 1666-UniPCemu_area5150credits_moreregularpattern1.png (46.82 KiB). Credits more regular pattern 1.
Attachment: 1668-UniPCemu_area5150_credits_moreregularpattern2.png (44.37 KiB). Credits more regular pattern 2.

Counting the sets of lines vertically, I clearly see a 5,5,5,5,4 pattern repeating (looking at the green scanlines, each one scanline blanked)? The set of 4 is always pulled left by the same amount? All those sets are always followed by a scanline of the same length, it seems.
Perhaps I'm on to something?

Last edited by superfury on 2023-06-28, 00:30. Edited 1 time in total.


Reply 83 of 113, by GloriousCow

Rank Member
superfury wrote on 2023-06-27, 23:22:

Edit: Chaplin and co improved:

wow, that's looking close. Just looks like the effect ISR is still firing a bit late.

getting this right is tricky, as the initial position of the effect ISR is determined by a succession of *8* different timer ISRs that schedule the next one in the chain. only reenigne could come up with something like this 😁


Reply 84 of 113, by superfury

Rank l33t++
GloriousCow wrote on 2023-06-28, 00:28:
superfury wrote on 2023-06-27, 23:22:

Edit: Chaplin and co improved:

wow, that's looking close. Just looks like the effect ISR is still firing a bit late.

getting this right is tricky, as the initial position of the effect ISR is determined by a succession of *8* different timer ISRs that schedule the next one in the chain. only reenigne could come up with something like this 😁

As a side note, 8088MPH reports 1517 cycles (10% from what it's expecting).
And the Area5150 credits have improved as well (though 8088MPH credits hangs when starting now). Didn't change much of the CPU, just interrupt handling performing a bit better?
And the only other thing after those that perhaps affects timing is the CGA CRTC precalcs being applied better (register 01h, maximum scanline register (9 on CGA), cursor start/end, start address and cursor location registers). Basically all I did was allow them to be checked in one go (together causing the other precalcs to all be executed if their value changes; see the FreeVGA docs for the equivalent registers updated in that case). It's basically exploiting the fact that the VGA precalcs work roughly the same, so it just writes to the equivalent VGA registers and asks the VGA precalcs to update instead (as the basic precalc behaviour is the same for all graphics cards, other than storage location and perhaps some specific additions on the CGA side). So with the exception of the row size (CGA-specific), all are just written to the VGA registers, which are then triggered to update their precalculated values from those (it's basically a translation routine).
Looking at the pixels in the credits screen capture (raw capture made during vertical retrace), most scanlines have their final (rightmost) pixel at location 548. Most of the ones shifted left end at location 530, some at 542, and one at 545.
I don't know if that means something. Perhaps Reenigne and co can shed some light on that?

I do know that other than the timings you guys gave for the INT handling in an earlier post, its behaviour should be exactly as was described. The retrieving of the IRQ from the PIC is instant though (it isn't timed). Other than what was mentioned (and the unknown cycle mentioned being the specific jump into the routine during INT3?) nothing is timed. The PIQ isn't stalled until the location mentioned (so it might be able to retrieve something at that point, stalling once a bus request arrives). Like I said, the basic PIC communication for IRQs is untimed (or you could call it instant, since the CPU isn't ticking any cycles on it while retrieving its interrupt vector from the PIC).

HLT is the same way btw (2 EU cycles when first executed and putting the CPU in the HLT state, then instantaneous once an IRQ arrives (0 cycles to read the IRQ vector from the PIC, with the mentioned INT timings starting at that point (except the extra cycle mentioned somewhere in here))).

Looking at the display failing it almost makes me think of the weird effect that the ET4000 chips had when the VRAM wasn't properly handling the aperture base address mapping (think the Tseng-specific ways of mapping the aperture combined with the values in the ports to specify extended memory locations). Almost like an AND-mask is applied to the data somehow (masking off upper bits or something related)? See This thread

Edit: Thinking about it, I think I see (just like with Chaplin) roughly the left 1/3 duplicated twice on the screen? Perhaps as a ghost of itself (interleaved onto scanlines) or perhaps just a straight duplicate? The plant with the red bud at the left side of the water seems to appear twice instead of just once?

Attachment: 1671-UniPCemu_5150_duplicatedplantsontheleftwaterside.png (28.87 KiB). Duplicated plant on the start of the scanlines.

It looks like it's displaying the same plant twice. First a lighter version, then a darker version?
The same thing seems to happen with the Chaplin image? Perhaps related?
Is it doing some specific trick that might cause this?


Reply 85 of 113, by superfury

Rank l33t++

OK. Just noticed something interesting while it's drawing the Ctrix greets part:
I see the text being drawn from left to right onto the first block (matching the left chaplin area range), then once again the same area (left border), followed by the remainder of the area where the sun is located until the right border?
Thinking about it, roughly the size of the black area at the right side of the image fits that size?
I might be onto something here.

Edit: Just confirmed on the second text row: first two blocks are appearing simultaneously (same blocks being drawn), followed by the left to right border of the "active display" (non-black area I mean in this case) being filled with remaining text.

Assuming the drawing always happens from the left border to the right border of the screen, that would mean some kind of weird mirroring writes on the first 2 blocks (each half of the black part), followed by correct writes from the left border to the right border (the green border if you can call it that).

Anyone having a clue what might be happening?

https://www.youtube.com/watch?v=fWDxdoRTZPc @ 9:52. That part seems to exhibit slightly different behaviour: The first line is drawn normally from left to right (although shifted left at the first left indent). The second text line draws | || E, followed by writing interleaved VERYONE at the start of the (scan)line (starting from the left border).
Edit: Those 3 flowers in a triangle on the left side (roughly the second block being drawn in parallel with the 1st leftmost block) seem to be missing also? Perhaps a code error in UniPCemu's CPU emulation (wrong image block being drawn)? It seems like a duplicate of the first block (although maybe a bit darker)?

Perhaps an incorrect transition from drawing the first block (with the first plant) to the second block (the three red roses (or something like that) in a triangle formation)?


Reply 86 of 113, by GloriousCow

Rank Member
superfury wrote on 2023-06-28, 01:42:

Assuming the drawing always happens from the left border to the right border of the screen, that would mean some kind of weird mirroring writes on the first 2 blocks (each half of the black part), followed by correct writes from the left border to the right border (the green border if you can call it that).

With the Chaplin/Wibble effect in particular, it's pretty much a static image that the effect fiddles around with by changing the start address before each scanline is drawn. The effect doesn't have to do anything to draw the graphics besides continually fiddle with the CRTC registers to set up each row, then the CGA card just scans out video memory for that row. The Lake effect is a bit different of course because it has to draw and erase the credits text and play PCM audio, but both effects use basically the same principle.
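
The principle described above can be illustrated with a minimal sketch. It is not MartyPC's or the demo's code: the names `scan_row` and `scan_frame` are hypothetical, the row is simplified to 80 character/attribute words, and wrapping the address at the end of a 16 KB memory is an assumption of the sketch.

```c
#include <assert.h>
#include <stdint.h>

#define VRAM_WORDS 8192u /* 16 KB of CGA memory as char/attribute words */
#define COLUMNS    80u

/* Scan out one character row: the CRTC just walks memory from the start
   address currently latched, so changing that address per row moves the
   picture without the CPU touching video memory. */
void scan_row(const uint16_t *vram, uint16_t start_address,
              uint16_t out[COLUMNS])
{
    uint16_t vma = start_address; /* video memory address counter */
    for (uint32_t col = 0; col < COLUMNS; ++col) {
        out[col] = vram[vma % VRAM_WORDS];
        ++vma;
    }
}

/* Render a frame whose rows each use their own start address, as an
   effect ISR racing the beam would arrange. */
void scan_frame(const uint16_t *vram, const uint16_t *row_start,
                uint32_t rows, uint16_t out[][COLUMNS])
{
    for (uint32_t row = 0; row < rows; ++row)
        scan_row(vram, row_start[row], out[row]);
}
```

With a static image in memory, the whole effect reduces to choosing the `row_start` values at the right moment, which is why the timing of the register writes matters so much.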

I recorded a video for you of the end/Lake effect being drawn, stepping through the effect one instruction at a time. We start at the entry into the effect ISR at CS:0400.
You can watch the raster beam go, since we are racing it, after all, and you can see the status of both the CGA's CRTC registers and some internal bookkeeping variables. Keep an eye on the horizontal and vertical character counters hcc and vcc, as well as 'vma', the video address generated by the CRTC.

About halfway through I remember to open the instruction history window. The number right before the mnemonic is the number of cycles that instruction took, if you want to compare to your own code.

https://www.youtube.com/watch?v=bT2fQpz0l1E

I would recommend using a youtube-downloader to grab the mp4 so you can scrub through it at your leisure in VLC. Of course, you could always just use MartyPC and repeat this exact process at your leisure.


Reply 87 of 113, by superfury

Rank l33t++

I've been checking out the MC6845 documentation a bit to check if something's missing wrt timings emulated in UniPCemu using the CGA.

Eventually found the Interlace Mode register. The documentation is a bit weird on that:

Interlace Mode Register (R8) - This 2 bit write-only
register controls the raster scan mode (see Figure 11).
When bit 0 and bit 1 are reset, or bit 0 is reset and bit 1
set, the non-interlace raster scan mode is selected. Two
interlace modes are available. Both are interlaced 2 fields
per frame. When bit 0 is set and bit 1 is reset, the interlace
sync raster scan mode is selected. Also when bit 0 and bit
1 are set, the interlace sync and video raster scan mode is selected.

What's the difference between bit 0 and bit 1 both being set and bit 0 being set without bit 1 being set?
It calls one "interlace sync raster scan mode" and the other "interlace sync and video raster scan mode". What's the difference between those two?
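
The bit combinations in the quoted datasheet text can be written down as a small decoder. This is only a sketch of the selection logic as quoted; `ScanMode` and `decode_r8` are hypothetical names, and it says nothing about how the two interlace modes differ in behaviour.

```c
#include <assert.h>
#include <stdint.h>

typedef enum {
    SCAN_NON_INTERLACE,
    SCAN_INTERLACE_SYNC,
    SCAN_INTERLACE_SYNC_AND_VIDEO
} ScanMode;

/* Decode the two low bits of R8 as the quoted datasheet text describes. */
ScanMode decode_r8(uint8_t r8)
{
    switch (r8 & 3u) {
    case 1u: return SCAN_INTERLACE_SYNC;           /* bit 0 set, bit 1 reset */
    case 3u: return SCAN_INTERLACE_SYNC_AND_VIDEO; /* bit 0 set, bit 1 set   */
    default: return SCAN_NON_INTERLACE;            /* bit 0 reset            */
    }
}
```

So bit 0 alone decides whether interlacing is on at all, and bit 1 only matters once bit 0 is set.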
It might be related to the weird behaviour UniPCemu has on the final left to right to left text movement on the final text-mode part of the demo? Although that might also be character height failing somehow?

Edit: Just looked it up: http://dunfield.classiccmp.org//r/6845.pdf
So basically, both interlace modes perform odd/even fields at MAP13. That doesn't change.
The difference is in the scanline address? In the non-video mode (01b) the row addresses increase every 2 scanlines, so on each second time horizontal total is reached (matching VGA MAP13 bit?). But in the video mode it increases on each horizontal total?
Does anyone know how this works?

Last edited by superfury on 2023-06-28, 20:04. Edited 1 time in total.


Reply 88 of 113, by VileR

Rank l33t
superfury wrote on 2023-06-28, 19:43:

What's the difference between bit 0 and bit 1 both being set and bit 0 being set without bit 1 being set?
It calls one "interlace sync raster scan mode" and the other "interlace sync and video raster scan mode". What's the difference between those two?
It might be related to the weird behaviour UniPCemu has on the final left to right to left text movement on the final text-mode part of the demo? Although that might also be character height failing somehow?

Maybe the Hitachi HD6845 datasheet explains it better - it's generally quite a bit more helpful (and wordy) than Motorola's: https://archive.org/details/bitsavers_hitachi … up?view=theater
(note also p. 178 on the comparison between the HD6845R and HD6845S; the "R" variant is the one seen on some IBM CGA cards, and it's equivalent to the MC6845.)

But no, that final part where it drops back to the DOS prompt doesn't use any of the interlace modes, although it does mix character heights (8 and 1, i.e. "max scanline" flipping between 7 and 0).

[ WEB ] - [ BLOG ] - [ TUBE ] - [ CODE ]

Reply 89 of 113, by superfury

Rank l33t++
VileR wrote on 2023-06-28, 20:03:
superfury wrote on 2023-06-28, 19:43:

What's the difference between bit 0 and bit 1 both being set and bit 0 being set without bit 1 being set?
It calls one "interlace sync raster scan mode" and the other "interlace sync and video raster scan mode". What's the difference between those two?
It might be related to the weird behaviour UniPCemu has on the final left to right to left text movement on the final text-mode part of the demo? Although that might also be character height failing somehow?

Maybe the Hitachi HD6845 datasheet explains it better - it's generally quite a bit more helpful (and wordy) than Motorola's: https://archive.org/details/bitsavers_hitachi … up?view=theater
(note also p. 178 on the comparison between the HD6845R and HD6845S; the "R" variant is the one seen on some IBM CGA cards, and it's equivalent to the MC6845.)

But no, that final part where it drops back to the DOS prompt doesn't use any of the interlace modes, although it does mix character heights (8 and 1, i.e. "max scanline" flipping between 7 and 0).

OK. So MAP13 still applies, which is correct behaviour.

The weird thing is the non-video vs video modes. Is there a VGA setting that does the same as that for those scanlines, or is it a new thing that needs to be added? Hmmm...
Edit: Just checked with the CGA compatibility tester tool 1.1 20160315. The mode seems to run in the interlaced Sync & Video mode (as Hitachi calls it)?

Every other scanline in the compatibility tester is black because the high RAM contains nothing to render?

Edit: Interestingly, even though the MAP13 bit (mapping the scanline counter to MA13) is set by putting the CGA in graphics mode, apparently it's MAP14 instead (corresponding to the VGA CRTC memory mode register bit 1, although using bit 0 of the scanline counter as its source), although that reads data out of range of writable memory? And according to some sources, MA14 isn't connected on the CGA (the chip does drive it though, as does UniPCemu) and said RAM area isn't reachable (perhaps that's why the memory area used by the CGA has such a small mask?).
UniPCemu does apply the 16KB wrap for memory at the CPU window level (and 4KB wrap on MDA), but doesn't apply any wrapping on the MA side. Perhaps that would need to be enforced on the CGA/MDA?
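
If wrapping were to be enforced on the CRTC side as well, it could look like the following minimal sketch. The 16 KB and 4 KB sizes are the ones mentioned above; `wrap_vram_offset` is a hypothetical helper working on byte offsets, not an existing UniPCemu function.

```c
#include <assert.h>
#include <stdint.h>

#define CGA_VRAM_BYTES 0x4000u /* 16 KB, as stated in the post */
#define MDA_VRAM_BYTES 0x1000u /* 4 KB, as stated in the post */

/* Wrap a byte offset generated on the CRTC side into the installed RAM.
   Masking works because both sizes are powers of two. */
static inline uint32_t wrap_vram_offset(uint32_t offset, uint32_t vram_bytes)
{
    assert(vram_bytes != 0 && (vram_bytes & (vram_bytes - 1u)) == 0);
    return offset & (vram_bytes - 1u);
}
```

For example, `wrap_vram_offset(0x4000u, CGA_VRAM_BYTES)` lands back on offset 0, so an address that runs past the end of memory reads from the start again instead of from unreachable RAM.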

Edit: As a side note, I think I might know why the cycle-accurate part of the demo is running so incredibly slowly. It's hammering the CGA CRT horizontal timing registers! And UniPCemu currently doesn't check for changes in those registers and just assumes that all timings need to be recalculated! So it then updates all horizontal AND vertical timing precalcs for the entire CRT lookup table (thousands of entries and all video precalcs), which might not be necessary if proper checks for updates are made. That's a new thing to add to UniPCemu.
Profiling reveals that over 80% of CPU time is just spent updating those CRTC precalcs, which is ridiculous!
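
A minimal sketch of what such update checks could look like, assuming the standard 6845 register layout (R0-R3 horizontal timing, R4-R9 vertical timing and character height, R10 and up cursor and start address). The `Crtc` struct, the flag values and `recalc_tables` are hypothetical stand-ins, not UniPCemu's actual code, and grouping R8/R9 with the vertical tables is an assumption.

```c
#include <assert.h>
#include <stdint.h>

#define RECALC_VERTICAL   2u /* hypothetical flag values for this sketch */
#define RECALC_HORIZONTAL 4u
#define CRTC_REGS 18u

typedef struct {
    uint8_t regs[CRTC_REGS];
    unsigned recalc_calls; /* counts how often the tables were rebuilt */
} Crtc;

/* Stand-in for the expensive table rebuild. */
static void recalc_tables(Crtc *c, unsigned what)
{
    (void)what;
    ++c->recalc_calls;
}

/* Which tables depend on a given 6845 register. */
static unsigned dirty_mask(uint8_t index)
{
    if (index <= 3u) return RECALC_HORIZONTAL; /* R0-R3: horizontal timing */
    if (index <= 9u) return RECALC_VERTICAL;   /* R4-R9: vertical timing   */
    return 0u; /* cursor and start address: no timing tables involved */
}

void crtc_write(Crtc *c, uint8_t index, uint8_t value)
{
    assert(index < CRTC_REGS);
    if (c->regs[index] == value) return; /* unchanged: nothing to do */
    c->regs[index] = value;
    unsigned what = dirty_mask(index);
    if (what) recalc_tables(c, what);
}
```

Writes that repeat the current value, or that only touch the start address or cursor registers, then cost nothing beyond the comparison.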

Interestingly, just as I was checking some more timings, I notice that the flashing image just before the chaplin part ends and after it's zoomed out into the center is a (mostly) correct rendering of the image, just before the red city area appears?

Attachment: 1676-Final frame just before the chaplin part is replaced with the next scene.png (43.98 KiB). Correct chaplin part just before the next scene replaces the image with the red city area.

The same at the start of the credits part (as far as I've noticed): for a bit a correct rendering of the upper half is seen, then replaced with the garbled one?


Reply 90 of 113, by superfury

Rank l33t++

Just was looking at the parts that make UniPCemu crawl to a halt, at the Chaplin effect. It looks like it's simply because it's hammering the horizontal timing registers, causing the entire table of registers to be updated about 6 times each scanline (multiply by amount of scanlines for a frame's timing, which is a lot to be calculating (4096 entries for a vertical timing update, 32768 entries for horizontal timing updates))?
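
To put a rough number on that, here is a small worked calculation. The table sizes and the "about 6 updates per scanline" come from the paragraph above; the 262 scanlines per frame is an assumption (a standard NTSC CGA frame), and the effect may well use a different count.

```c
#include <assert.h>
#include <stdint.h>

#define VERTICAL_ENTRIES    4096u  /* from the post */
#define HORIZONTAL_ENTRIES 32768u  /* from the post */
#define UPDATES_PER_LINE       6u  /* "about 6 times each scanline" */
#define LINES_PER_FRAME      262u  /* assumption: NTSC CGA frame */

/* Table entries rewritten per emulated frame if every register write
   rebuilds both tables. */
uint64_t entries_per_frame(void)
{
    uint64_t per_update = (uint64_t)VERTICAL_ENTRIES + HORIZONTAL_ENTRIES;
    return per_update * UPDATES_PER_LINE * LINES_PER_FRAME;
}
```

Under those assumptions that is 36864 entries per update and about 58 million entries per emulated frame, before any of the per-entry work is counted.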

I did make a quick note of the sequencer's x coordinate during each of them (taken at the point where the writes to the CGA's CRT registers happen in the hardware).
It's at tick T1 of said transfer.

Register   Loop 1   Loop 2   Loop 3   Loop 4   Loop 5
CRT 0      15       "        "        "        "
CRT 2      63       "        75       "        63
CRT 1      255      "        267      "        "
CRT 0      29d      28b      29d      "        "
CRT 1      42       54       "        42       "
CRT 2      90       a2       "        "        "

The quotes mean that the value is the same as the value to the left.
The numbers from top to bottom are in the order I see those CRT writes (at the T1 cycle, before the hardware ticks said cycle). From left to right are consecutive loops, it seems (perhaps scanlines, or maybe 2 scanlines after each other; I didn't check that precisely). I do see that the scanline counter never goes past 1.

It's the <>-forming effect. I do see that the first pattern is most common.
Interestingly, it always seems to execute that first one at horizontal raster position 0x15? The others seem to be mix and match (mostly) for the slots from top to bottom (go from top to bottom through the table, choosing one of the entries (most of the time) from the horizontal row, then mix and match).

Edit: Just looked at the INT8 (IRQ0) handler being started. It seems to always be at x coordinate 0x33C (decimal 828) on scanline 7 (both 0-based).
Edit: The IRQ INT call ends at sequencer clock 159(9Fh),8. Fetching/decoding at A2h, first instruction execution at AEh.


Reply 91 of 113, by GloriousCow

Rank Member
superfury wrote on 2023-06-28, 21:06:

So it then updates all horizontal AND vertical timing precalcs for the entire CRT lookup table (thousands of entries and all video precalcs), which might not be necessary if proper checks for updates are made. That's a new thing to add to UniPCemu.

I don't know what you're precalculating exactly but in theory you'd only need to recalculate if you switched from HCLK to LCLK (80 col mode to 40 col mode) or vice-versa which none of the effects do mid-effect. Chaplin/Wibble itself stays in 80 column text mode the whole time, so the timings are always the same.

I know you've put in a lot of work to make a combined CGA/EGA/VGA adapter, and it's impressive that you've managed it so far, but I wonder if you are causing yourself a bit of grief not treating the CGA as a unique device. A lot of the problems with emulating CGA go away if you treat it as a fixed (or dual) frequency device.


Reply 92 of 113, by superfury

Rank l33t++
GloriousCow wrote on 2023-06-29, 14:17:
superfury wrote on 2023-06-28, 21:06:

So it then updates all horizontal AND vertical timing precalcs for the entire CRT lookup table (thousands of entries and all video precalcs), which might not be necessary if proper checks for updates are made. That's a new thing to add to UniPCemu.

I don't know what you're precalculating exactly but in theory you'd only need to recalculate if you switched from HCLK to LCLK (80 col mode to 40 col mode) or vice-versa which none of the effects do mid-effect. Chaplin/Wibble itself stays in 80 column text mode the whole time, so the timings are always the same.

I know you've put in a lot of work to make a combined CGA/EGA/VGA adapter, and it's impressive that you've managed it so far, but I wonder if you are causing yourself a bit of grief not treating the CGA as a unique device. A lot of the problems with emulating CGA go away if you treat it as a fixed (or dual) frequency device.

Well, the main issue in this case is that it's not implementing the counters etc. as live comparators. Instead everything horizontal and vertical is stored in a big table, which is read in sequence to obtain the different triggers at various points on the scanline and the same for vertical timings (just a different table).

And each update to any register causes all those tables to be fully updated.
I can change those tables back to single point comparators, but that'll stress the entire system way more, as all those variables need to be checked each clock (instead of precalculating them, although that becomes very heavy on the host CPU when it's done a lot (each modification causes all those settings to be calculated and looped through on every possibility of the scanline or horizontal x coordinate on the precalcs)).
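
To make the trade-off concrete, here is a minimal sketch of both approaches for the horizontal side only. The `HTiming` fields and signal bits are simplified, hypothetical stand-ins (not the real UniPCemu structures): live comparators cost a few compares per clock, while the table costs one lookup per clock plus a full rebuild on every change.

```c
#include <assert.h>
#include <stdint.h>

#define SIG_ACTIVE 1u /* hypothetical signal bits for this sketch */
#define SIG_HSYNC  2u
#define SIG_TOTAL  4u

typedef struct {
    uint16_t h_displayed;  /* end of active display */
    uint16_t h_sync_start; /* sync pulse start */
    uint16_t h_sync_end;   /* sync pulse end (exclusive) */
    uint16_t h_total;      /* last position on the line */
} HTiming;

/* Live comparators: evaluated on every tick, never rebuilt. */
uint32_t status_live(const HTiming *t, uint16_t x)
{
    uint32_t s = 0;
    if (x < t->h_displayed) s |= SIG_ACTIVE;
    if (x >= t->h_sync_start && x < t->h_sync_end) s |= SIG_HSYNC;
    if (x == t->h_total) s |= SIG_TOTAL;
    return s;
}

/* Precalculated table: one lookup per tick, full rebuild per change. */
void status_build(const HTiming *t, uint32_t *table, uint32_t entries)
{
    for (uint32_t x = 0; x < entries; ++x)
        table[x] = status_live(t, (uint16_t)x);
}
```

Both give the same status for every position; which one is cheaper depends on how often the registers change relative to how many clocks are rendered between changes.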

Edit: Also the first instruction (a mov segreg one) enters its execution phase at sequencer x of 0xAE.

Basically, the CRTC and other render mechanisms are driven by this:

OPTINLINE uint_32 get_display(VGA_Type *VGA, SEQ_DATA *Sequencer, word Scanline, word x) //Get/adjust the current display part for the next pixel (going from 0-total on both x and y)!
{
    INLINEREGISTER uint_32 stat; //The status of the pixel!
    //We are a maximum of 4096x1024 size!
    Sequencer->currentScanline = Scanline; //What scanline!
    Scanline >>= VGA->precalcs.CRTCModeControlRegister_SLDIV; //Apply Scan Doubling on the row scan counter: we take effect on content (double scanning)!
    Sequencer->currentx = x; //What coordinate on the scanline!
    if (unlikely(Scanline > 0x7FFF)) Scanline = 0x7FFF; //Clip to the allowed range!
    if (unlikely(x >= 0x7FFF)) x = 0x7FFF; //Clip to the allowed range!
    Scanline &= 0x7FFF; //Range safety: mask to 15 bits (0-32767)!
    x &= 0x7FFF; //Range safety: mask to 15 bits (0-32767)!
    stat = VGA->CRTC.rowstatus[Scanline]; //Get row status!
    stat |= VGA->CRTC.colstatus[x]; //Get column status!
    stat |= VGA->precalcs.extrasignal; //Graphics mode etc. display status affects the signal too!
    stat |= (blanking<<VGA_SIGNAL_BLANKINGSHIFT); //Apply the current blanking signal as well!
    VGA_hblankstart = stat; //Save directly! Ignore the overflow!
    VGA_hblankstart >>= 7; //Shift into bit 1 to get the hblank status(small hack)!
    return stat; //Give the combined (OR'ed) status!
}

Those rowstatus, colstatus (extrasignal is only a single 32-bit integer) are precalculated like this:

//whatprecalcs: bit0: all timings, bit1: vertical timings, bit2: horizontal timings
void VGA_calcprecalcs_CRTC(void *useVGA, byte whatprecalcs) //Precalculate CRTC precalcs!
{
    VGA_Type *VGA = (VGA_Type *)useVGA; //The VGA to use!
    uint_32 current;
    byte charsize,textcharsize;
    //Column and row status for each pixel on-screen!
    charsize = getcharacterheight(VGA); //First, based on height!
    current = 0; //Init!
    if ((whatprecalcs & 3)) //bit0: all timings, bit1: vertical timings
    {
        for (; current < NUMITEMS(VGA->CRTC.rowstatus);) //All available resolutions!
        {
            VGA->CRTC.charrowstatus[current << 1] = current / charsize;
            VGA->CRTC.charrowstatus[(current << 1) | 1] = current % charsize;
            VGA->CRTC.rowstatus[current] = get_display_y(VGA, current); //Translate!
            ++current; //Next!
        }
    }

    //Horizontal coordinates!
    charsize = getcharacterwidth(VGA); //Now, based on width!
    textcharsize = gettextcharacterwidth(VGA); //Text character width instead!
    current = 0; //Init!
    word extrastatus;
    byte pixelrate=1;
    byte innerpixel;
    byte fetchrate=0; //Half clock fetch!
    byte pixelticked=0; //Pixel has been ticked?
    byte clockrate;
    byte firstfetch=1; //First fetch is ignored!
    byte graphicshalfclockrate = 0; //Graphics half clock rate!
    byte usegraphicsrate;
    usegraphicsrate = VGA->precalcs.graphicsmode; //Are we in graphics mode?
    clockrate = (((VGA->precalcs.ClockingModeRegister_DCR&1) | (CGA_DOUBLEWIDTH(VGA) ? 1 : 0))); //The clock rate to run the VGA clock at!
    byte theshift = 0;
    switch (VGA->precalcs.ClockingModeRegister_DCR)
    {
    case 0:
        theshift = 0; //Handle normally? VGA-compatible!
        break;
    case 1:
        theshift = 1; //Handle normally? VGA-compatible!
        break;
    case 3:
        theshift = 1; //Handle normally? VGA-incompatible!
        break;
    case 2: //Special mode?
        theshift = 0; //Handle normally? VGA-compatible!
        break;
    }

    if ((whatprecalcs & 5)) //bit0: all timings, bit2: horizontal timings
    {
        for (; current < NUMITEMS(VGA->CRTC.colstatus);)
        {
            VGA->CRTC.charcolstatus[current << 1] = current / charsize;
            VGA->CRTC.charcolstatus[(current << 1) | 1] = current % charsize; //Doesn't affect the rendering process itself!
            VGA->CRTC.textcharcolstatus[current << 1] = current / textcharsize;
            VGA->CRTC.textcharcolstatus[(current << 1) | 1] = innerpixel = current % textcharsize;
            if (usegraphicsrate) //Graphics mode is used? Don't use the extended text-mode sizes!
            {
                VGA->CRTC.textcharcolstatus[current << 1] = VGA->CRTC.charcolstatus[current << 1];
                innerpixel = (byte)(VGA->CRTC.textcharcolstatus[(current << 1) | 1] = VGA->CRTC.charcolstatus[(current << 1) | 1]);
            }
            VGA->CRTC.colstatus[current] = get_display_x(VGA, ((current >> theshift))); //Translate to display rate!

            //Determine some extra information!
            extrastatus = 0; //Initialise extra horizontal status!

            if (((VGA->registers->specialCGAflags | VGA->registers->specialMDAflags) & 1) && !CGA_DOUBLEWIDTH(VGA)) //Affect by 640x200/320x200 mode?
            {
                extrastatus |= 1; //Always render like we are asked, at full resolution single pixels!
                pixelticked = 1; //A pixel has been ticked!
            }
            else //Normal VGA?
            {
                if (++pixelrate > clockrate) //To read the pixel every or every other pixel(forced every clock in CGA normal mode)?
                {
                    extrastatus |= 1; //Reset for the new block/next pixel!
                    pixelrate = 0; //Reset!
                    pixelticked = 1; //A pixel has been ticked!
                }
                else
                {
                    pixelticked = 0; //Not ticked!
                }
            }

            if (pixelticked)
            {
                if (innerpixel == 0) //First pixel of a character(loading)?
                {
                    fetchrate = 0; //Reset fetching for the new character!
                }

                //Tick fetch rate!
                ++fetchrate; //Fetch ticking!
                if (usegraphicsrate) //Use 4 pixel clocking?
                {
                    if ((++graphicshalfclockrate & 3) == 1) goto tickdiv4; //Tick 1&5, use 4 clock division for pixels 1&5, ignoring character width completely!
                }
                else if (((fetchrate == 1) || (fetchrate == 5))) //Half clock rate? Tick clocks 1&5 out of 8 or 9+!
                {
                tickdiv4: //Graphics DIV4 clock!
                    if (!firstfetch) //Not the first fetch?
                    {
                        extrastatus |= 2; //Half pixel clock for division in graphics rates!
                    }
                    else --firstfetch; //Not the first fetch anymore!
                }
                pixelticked = 0; //Not ticked anymore!
            }

            if (current < NUMITEMS(VGA->CRTC.extrahorizontalstatus)) //Valid to increase?
            {
                extrastatus |= 4; //Allow increasing to prevent overflow if not allowed!
            }
            VGA->CRTC.extrahorizontalstatus[current] = extrastatus; //Extra status to apply!

            //Finished horizontal timing!
            ++current; //Next!
        }
    }
}

VGA_calcprecalcs_CRTC is the function that gets constantly hammered by the CGA register precalc write routine, causing the extreme slowdown to 2% of realtime speed instead of the usual 25% or higher.

Author of the UniPCemu emulator.
UniPCemu Git repository
UniPCemu for Android, Windows, PSP, Vita and Switch on itch.io

Reply 93 of 113, by GloriousCow

User metadata
Rank Member
Rank
Member
superfury wrote on 2023-06-29, 15:44:

I can change those tables back to single point comparators, but that'll stress the entire system way more, as all those variables need to be checked each clock

My original CGA was ticked at 14MHz and did all those comparisons for every pixel. Yeah, it was a bit heavy, but it still ran at 60FPS on my desktop.

You can clock the CRTC at the character clock instead of 14MHz, because that's what the CGA does; I don't know if that helps or if that's what you're already doing. That's at least 8 or 16 times faster, then. The Wibble/Lake effects really didn't like character clocking; they needed more precision, so I ended up needing a dynamic system where a CRTC write will tick the CGA at 14MHz until the next character clock boundary.
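
To make the dynamic clocking idea concrete, here is a rough sketch in C. It is not MartyPC's actual code; every name in it is made up for illustration, and it assumes 8 clocks of the 14MHz clock per character (80-column mode). Clocks pile up in a backlog and are normally consumed a whole character at a time; a CRTC write switches to single-clock stepping until the next character boundary.

```c
#include <stdint.h>

#define HCHAR_CLOCKS 8u /* assumption: 14MHz clocks per character in 80-column mode */

/* Hypothetical state, for illustration only */
typedef struct {
    uint64_t clock;           /* 14MHz clocks actually processed so far */
    unsigned backlog;         /* clocks received but not processed yet */
    int precise;              /* nonzero: step one 14MHz clock at a time */
    unsigned long slow_ticks; /* number of single-clock steps taken */
    unsigned long fast_ticks; /* number of whole-character steps taken */
} CgaSketch;

/* Consume as much of the backlog as the current stepping mode allows */
static void cga_drain(CgaSketch *c)
{
    while (c->backlog > 0) {
        if (c->precise) {
            /* Precise path: one 14MHz clock per step */
            c->clock++;
            c->backlog--;
            c->slow_ticks++;
            if ((c->clock % HCHAR_CLOCKS) == 0) {
                c->precise = 0; /* character boundary reached: back to fast stepping */
            }
        } else if (c->backlog >= HCHAR_CLOCKS) {
            /* Fast path: a whole character clock per step */
            c->clock += HCHAR_CLOCKS;
            c->backlog -= HCHAR_CLOCKS;
            c->fast_ticks++;
        } else {
            break; /* less than one character pending: wait for more clocks */
        }
    }
}

/* Feed elapsed 14MHz clocks to the card */
static void cga_run(CgaSketch *c, unsigned clocks)
{
    c->backlog += clocks;
    cga_drain(c);
}

/* A CRTC register write: catch up clock by clock so the write lands
   at the exact clock, then stay precise until the next character boundary */
static void cga_crtc_write(CgaSketch *c)
{
    c->precise = 1;
    cga_drain(c);
    /* ...the register value would be applied here... */
}
```

In this sketch the slow path only runs for a handful of clocks around each register write, so code that rarely touches the CRTC stays on the cheap character-clock path.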

MartyPC: A cycle-accurate IBM PC/XT emulator | https://github.com/dbalsom/martypc

Reply 94 of 113, by superfury

User metadata
Rank l33t++
Rank
l33t++
GloriousCow wrote on 2023-06-29, 16:43:
superfury wrote on 2023-06-29, 15:44:

I can change those tables back to single point comparators, but that'll stress the entire system way more, as all those variables need to be checked each clock

My original CGA was ticked at 14MHz and did all those comparisons for every pixel. Yeah, it was a bit heavy, but it still ran at 60FPS on my desktop.

You can clock the CRTC at the character clock instead of 14MHz, because that's what the CGA does; I don't know if that helps or if that's what you're already doing. That's at least 8 or 16 times faster, then. The Wibble/Lake effects really didn't like character clocking; they needed more precision, so I ended up needing a dynamic system where a CRTC write will tick the CGA at 14MHz until the next character clock boundary.

It is ticked at 14MHz directly on the CGA. The other video cards don't do that though (since they have their own clock crystal at different or configurable speeds).

So far I've managed to get most precalcs removed, with the calculations now done directly in the renderer itself.
The only one that's left so far is the horizontal timing precalcs for the extra horizontal status, which dictates the horizontal increases for active scanline rendering itself (again, using counters in parallel).
Edit: Just converted that into a live version as well, ticking its state every clock.
It should still work as before, but now without precalcs and run directly from the horizontal timings (at the pixel clock level). So the two horizontal clocks still run as before, but no longer use those big precalc buffers to check what action to take every clock cycle (stuff like the effects of the dot clock rate etc. on rendering pixels, instead of on the CRTC only).
One thing added now is that it supports (in theory) mid-scanline mode changes from text to graphics mode or the other way around, which now cause the mechanics that used to live in those buffers to fully reset (as if at the beginning of the scanline).
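
As a rough illustration of what "live" means here, the sketch below ticks a small state struct every clock instead of indexing a precalculated per-column table. It is a simplified stand-in, not the actual UniPCemu code: the struct, function and flag names are made up, the inner pixel is counted per output pixel rather than derived from the column number, and the first-fetch skipping and graphics-rate path from the loop above are left out.

```c
#include <stdint.h>

#define HSTATUS_PIXEL     1 /* a new pixel is output on this clock */
#define HSTATUS_HALFCLOCK 2 /* half-character clock (pixels 1 and 5 of a character) */

/* Hypothetical live horizontal state, for illustration only */
typedef struct {
    uint8_t clockrate;  /* 0 = a pixel every clock, 1 = a pixel every other clock */
    uint8_t charwidth;  /* pixels per character, e.g. 8 */
    uint8_t pixelrate;  /* divider counter towards the next pixel */
    uint8_t innerpixel; /* pixel position inside the current character */
    uint8_t fetchrate;  /* pixels ticked since the current character started */
} HorizontalState;

/* Reset the counters as if at the start of a scanline,
   e.g. after a mid-scanline text/graphics mode change */
static void horizontal_reset(HorizontalState *h)
{
    h->pixelrate = 0;
    h->innerpixel = 0;
    h->fetchrate = 0;
}

/* Advance one clock and return the status bits for that clock */
static uint8_t horizontal_tick(HorizontalState *h)
{
    uint8_t status = 0;
    if (++h->pixelrate > h->clockrate) { /* time for the next pixel? */
        h->pixelrate = 0;
        status |= HSTATUS_PIXEL;
        if (h->innerpixel == 0) { /* first pixel of a character */
            h->fetchrate = 0;
        }
        ++h->fetchrate;
        if ((h->fetchrate == 1) || (h->fetchrate == 5)) {
            status |= HSTATUS_HALFCLOCK;
        }
        if (++h->innerpixel >= h->charwidth) { /* character finished */
            h->innerpixel = 0;
        }
    }
    return status;
}
```

The cost moves from rebuilding a whole table on every register write to a few increments and compares per clock, which is the trade-off discussed above.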

So now the entire video card is running completely without precalcs. (There is still one live-running precalc, other than the attribute controller and DAC, but it doesn't affect the CGA in any way, since it's in single-pixel passthrough mode anyway, although it still uses attribute registers 0-F to set the CGA/MDA palette.) The DAC isn't used at all; the card simply renders to the single-line framebuffer, which is afterwards converted using Reenigne's NTSC algorithm or simply converted to RGB using my lookup table (with the new adjusted CGA colors from VileR's post at https://int10h.org/blog/2022/06/ibm-5153-colo … ue-cga-palette/).
No fancy scanline blurring yet though, may not ever make it (unless it's fast and easy to implement without large memory requirements on small memory devices like the PSP(-1000)).
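
For readers unfamiliar with the lookup-table path mentioned above, a minimal sketch of converting one scanline of 4-bit CGA color indices to RGB could look like this. The function and table names are made up, and the table holds the canonical RGBI values (with brown as 0xAA5500), not the adjusted colors from the linked post.

```c
#include <stdint.h>

/* Canonical 16-color RGBI palette as 0xRRGGBB (assumption: NOT the adjusted
   5153 colors from the linked post, just the commonly used values) */
static const uint32_t cga_rgbi_palette[16] = {
    0x000000, 0x0000AA, 0x00AA00, 0x00AAAA,
    0xAA0000, 0xAA00AA, 0xAA5500, 0xAAAAAA,
    0x555555, 0x5555FF, 0x55FF55, 0x55FFFF,
    0xFF5555, 0xFF55FF, 0xFFFF55, 0xFFFFFF
};

/* Convert one scanline of color indices to RGB; only the low 4 bits are used */
static void cga_scanline_to_rgb(const uint8_t *indices, uint32_t *rgb, unsigned width)
{
    unsigned x;
    for (x = 0; x < width; ++x) {
        rgb[x] = cga_rgbi_palette[indices[x] & 0x0F];
    }
}
```

A table this small (64 bytes) fits comfortably even on low-memory targets.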

Author of the UniPCemu emulator.
UniPCemu Git repository
UniPCemu for Android, Windows, PSP, Vita and Switch on itch.io

Reply 95 of 113, by superfury

User metadata
Rank l33t++
Rank
l33t++

https://www.dropbox.com/s/twtb9j8k4luiy0q/Uni … 3-45-03.7z?dl=0

Made a short capture over Microsoft RDP. The cycle-accurate slowdowns (they were just because it was hammering the CRT horizontal/vertical registers, causing precalcs to take up more than 80% of CPU time on the host CPU) are gone now! 😁 Also makes debugging way faster ofc.

Still not a speed horse at only 25%-30% realtime speed, but way, way better than 2%! 😁
And that's just the Visual Studio build I was using. GCC(MinGW) has better optimization, so that runs even faster!

Edit: And 8088MPH with the latest changes (although hanging at the credits, silent):
https://www.dropbox.com/s/q7cvediozlaeiyi/Uni … 0-11-48.7z?dl=0

Author of the UniPCemu emulator.
UniPCemu Git repository
UniPCemu for Android, Windows, PSP, Vita and Switch on itch.io

Reply 96 of 113, by mothergoose729

User metadata
Rank Oldbie
Rank
Oldbie

Hey Glorious Cow! I love this emulator already! I have been messing around with more accurate PC emulators lately and it's fascinating to appreciate all the nuances of these machines. Your emulator was very easy to get up and running! The ability to seamlessly toggle between RGB and composite graphics is freaking sweet, I love it!

I know this is an initial release, so obviously there are a lot of QOL and feature improvements to make from a usability standpoint. I used BIOS images from my PCem collection and it was hard to figure out what MartyPC needed exactly to boot. I eventually found it, but mostly just by copying and pasting tons of image files into the ROMs folder until something booted.

I have a couple of questions about the features the emulator supports right now:

1) Do you plan to support specific machine configurations like PCem does? (amount of memory, number of floppy drives and their size, etc.)
2) Is there a true full screen mode implemented?
3) Can you load floppies from other directories?

Reply 97 of 113, by superfury

User metadata
Rank l33t++
Rank
l33t++

Put up a video of the latest version of UniPCemu running with the vertical retrace timing fixes:
https://www.youtube.com/watch?v=kOoG0Orvk_M

Still not there entirely, but making progress on it.
Although the parts mentioned (the vertical scrolling part until the bob at the bottom of the houses image and the part before the dancing elephant) should work, assuming that it's not doing anything weird with horizontal timings?
Both seem to have issues with some frames being half-width instead of full width? Or perhaps the issue is something taking double the time in clocks for even or odd frames only (seeing as such a thing would cause such an effect)?

Author of the UniPCemu emulator.
UniPCemu Git repository
UniPCemu for Android, Windows, PSP, Vita and Switch on itch.io

Reply 98 of 113, by GloriousCow

User metadata
Rank Member
Rank
Member
mothergoose729 wrote on 2023-06-29, 23:27:

Hey Glorious Cow! I love this emulator already!

😀 I hope you don't mind, but I responded to you in a new thread. This thread is a bit of a mix between Area 5150 troubleshooting and MartyPC discussion, so I thought I'd separate things out.

MartyPC: A cycle-accurate IBM PC/XT emulator | https://github.com/dbalsom/martypc

Reply 99 of 113, by GloriousCow

User metadata
Rank Member
Rank
Member
superfury wrote on 2023-06-30, 01:31:

Still not there entirely, but making progress on it.
Although the parts mentioned (the vertical scrolling part until the bob at the bottom of the houses image and the part before the dancing elephant) should work, assuming that it's not doing anything weird with horizontal timings?
Both seem to have issues with some frames being half-width instead of full width? Or perhaps the issue is something taking double the time in clocks for even or odd frames only (seeing as such a thing would cause such an effect)?

How do you handle CGA mode byte 0x03? Both these effects use that trick. VileR explains how it works here: https://pcem-emulator.co.uk/phpBB3/viewtopic.php?t=3831
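
For reference, here is a minimal sketch of decoding the CGA mode control register (port 0x3D8), using the bit layout from the IBM CGA documentation; the struct and function names are made up. It only shows that 0x03 sets the 80-column bit and the graphics bit together, a combination that, as far as I know, none of the standard BIOS video modes use. What that does to the timing is what VileR's post explains.

```c
#include <stdint.h>

/* CGA mode control register bits (port 0x3D8) */
#define CGA_MODE_HRES_TEXT   0x01 /* 80-column text clocking */
#define CGA_MODE_GRAPHICS    0x02 /* graphics mode */
#define CGA_MODE_BW          0x04 /* black and white (color burst off) */
#define CGA_MODE_ENABLE      0x08 /* video output enabled */
#define CGA_MODE_HRES_GFX    0x10 /* 640x200 graphics */
#define CGA_MODE_BLINK       0x20 /* attribute bit 7 blinks instead of bright background */

/* Hypothetical decoded view of the mode byte, for illustration only */
typedef struct {
    int hres_text;
    int graphics;
    int bw;
    int enabled;
    int hres_gfx;
    int blink;
} CgaModeFlags;

static CgaModeFlags cga_decode_mode(uint8_t mode)
{
    CgaModeFlags f;
    f.hres_text = (mode & CGA_MODE_HRES_TEXT) != 0;
    f.graphics  = (mode & CGA_MODE_GRAPHICS) != 0;
    f.bw        = (mode & CGA_MODE_BW) != 0;
    f.enabled   = (mode & CGA_MODE_ENABLE) != 0;
    f.hres_gfx  = (mode & CGA_MODE_HRES_GFX) != 0;
    f.blink     = (mode & CGA_MODE_BLINK) != 0;
    return f;
}

/* True when both the 80-column bit and the graphics bit are set, as in 0x03 */
static int cga_mode_is_hres_plus_graphics(uint8_t mode)
{
    return (mode & (CGA_MODE_HRES_TEXT | CGA_MODE_GRAPHICS)) ==
           (CGA_MODE_HRES_TEXT | CGA_MODE_GRAPHICS);
}
```

An emulator that treats bits 0 and 1 as mutually exclusive (for example, by switching on a fixed list of known mode values) would miss this case.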

MartyPC: A cycle-accurate IBM PC/XT emulator | https://github.com/dbalsom/martypc