I've had some ideas in my head for a while now to make emulation better, so I thought I'd put them down on paper. I've made some progress towards a few of these.
1. The ability to drop frames. This is required for VF3. Currently, during the level-transition loading phases, we are drawing crazy amounts of garbage. There are some hacks in the renderer to stop it blowing up during these stages; without them it would probably crash. A few games will crash in rare circumstances if you underclock the CPU, because the renderer tries to render incomplete frames. Other games try to draw stuff they shouldn't during boot; I think Virtual-On is doing this.
- To make this work I rewrote the renderer so that the 3D graphics are independent of the 2D. Previously the two were composited into one image, which made drawing them independently basically impossible.
What's left?
- I just need to plug in the correct logic to make this work. Frames seem to be terminated when 0xC is written to the tilegen. For the 3D to kick off, 0x88 has to have been written to the 3D hardware, and some games write this several times per frame. I *think* it just marks the transfer as complete. Some games also write extra data after 0x88 has been written. I'm not sure whether this happens after the ping_pong bit flips, or whether these writes are double buffered at all. I am not 100% sure.
2. Move the start of the frame back to when v-blank fires. The hardware starts writing data for a new frame roughly 2/3 of the way through the current frame, in a sort of triple-buffered approach. To fix data straddling two frames, I shifted the end of the frame to where this ping_pong bit flips, but this is a bit of a kludge.
- To make this work we need to figure out exactly how the double buffered memory writes happen.
3. Some games, e.g. Scud Race and possibly The Lost World, update the tilegen mid-frame, which is perfectly legal. Scud has an animated car which is currently completely missing from emulation. Quite often the updates happen after the ping_pong bit flips, but in theory games could write anywhere in the frame.
- To fix this we should draw line by line whenever the tilegen hardware is being updated. The 3D and the 2D hardware cannot be drawing at the same time with OpenGL, so to make this work we'd have to buffer all the writes for the tilegen. All the efficiency of the GPU tilegen shader implementation basically goes out the window as soon as you try to draw line by line or have to buffer memory writes.
- I've actually nearly finished this. I ported the GLSL shader code, which was itself a port of Bart's original tilegen code, basically back to software. It just needs cleaning up a bit really.
4. Code cleanup. The project was, I assume, originally done mostly in Visual Studio. Bart later changed the project to use 2 spaces for tabs instead of 4, which for us Visual Studio peasants is really painful lol. I'd personally like to convert it back to 4 spaces per tab...
I'm sure there are plenty of other bits in the code to clean up.
5. What's missing?
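For item 1, here is a minimal sketch of the frame-dropping idea, assuming the behaviour described above holds: a write whose top byte is 0x88 marks the 3D data as complete, and a write to tilegen register 0xC ends the frame. `FrameCommitTracker` and its methods are hypothetical names, not existing Supermodel code. Nothing is submitted until 0xC arrives, so a frame the CPU never finished (e.g. when underclocked) is never rendered, and a frame that ends without the 3D having been kicked off is dropped.

```cpp
#include <cstdint>

// Hypothetical frame-commit tracker. Assumptions (from the discussion, not verified):
//  - a Real3D write with top byte 0x88 means the 3D data for the frame is complete;
//  - a write to tilegen register 0xC terminates the frame.
class FrameCommitTracker {
public:
    // Called for every write to the 3D hardware's command port.
    void OnReal3DCommand(uint32_t cmd) {
        if ((cmd >> 24) == 0x88) {
            m_3dKicked = true;   // may be written several times per frame; harmless
        }
    }

    // Called when a game writes tilegen register 0xC.
    // Returns true if the 3D for this frame should be rendered, false to drop it.
    bool OnTileGenFrameEnd() {
        bool render = m_3dKicked;
        if (render) {
            m_rendered++;
        } else {
            m_dropped++;
        }
        m_3dKicked = false;      // start tracking the next frame
        return render;
    }

    unsigned Rendered() const { return m_rendered; }
    unsigned Dropped() const { return m_dropped; }

private:
    bool m_3dKicked = false;
    unsigned m_rendered = 0;
    unsigned m_dropped = 0;
};
```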
Long term dev plan
Re: Long term dev plan
Converting to 4 spaces is fine (just please not tab characters lol). The project was originally always built with GCC, and I used UltraEdit to write it; Nik added MSVC support later.
I've forgotten quite a bit about the frame logic. I wish I had written up those emails we exchanged several years back when I ran a series of experiments on the VF3 board. Are we sure that 0xC terminates rendering? I seem to recall it is written around the time that 0x88000000 is, but it wasn't clear what these two mean.
I'm still pretty sure that ping pong memory is actually double-buffered and the other culling RAM region is not. Should this be emulated?
Lastly, were there changes to the tile renderer logic in GLSL or can we just revert back to the original CPU rendering code?
Re: Long term dev plan
I'm fairly sure 0xC is the last thing written in a frame. It makes sense that the tilegen controls the rendering, since it generates IRQ2. The ping_pong flip time is also written to the tilegen when games boot up, and that is really only relevant to the 3D hardware.
Re: Long term dev plan
We made one change to the tilegen code after it was implemented in GLSL: we swapped the scroll values for the odd/even lines, since we had them the wrong way round.
Writing to tilegen register 0xC tells the tilegen to signal the video board to flip the ping-pong buffers when it reaches the active video line number specified in register 0x8. Once that's done there's nothing left to do except wait for the decrementer exception, which is timed to fire at approximately the moment the ping-pong buffers flip.
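A rough model of the flip request as described, assuming register 0x8 simply holds the active line number and that any write to 0xC arms a single flip. The real register encodings aren't covered here, and `PingPongFlipScheduler` is a hypothetical name.

```cpp
#include <cstdint>

// Hypothetical model: writing tilegen register 0xC arms a ping-pong flip, which
// happens when the beam reaches the active line number held in register 0x8.
class PingPongFlipScheduler {
public:
    void WriteRegister(unsigned reg, uint32_t value) {
        if (reg == 0x8) {
            m_flipLine = value;   // assumption: raw line number, no masking/encoding
        } else if (reg == 0xC) {
            m_armed = true;       // value ignored in this sketch
        }
    }

    // Called once per active scanline; returns true on the line where the flip happens.
    bool OnScanline(uint32_t line) {
        if (m_armed && line == m_flipLine) {
            m_armed = false;
            m_bank ^= 1;          // swap which ping-pong half the renderer reads
            return true;
        }
        return false;
    }

    unsigned ActiveBank() const { return m_bank; }

private:
    uint32_t m_flipLine = 0;
    bool m_armed = false;
    unsigned m_bank = 0;
};
```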
Ping-pong memory is double-buffered; the rest of culling RAM is not. Writing to 0x8Cxxxxxx accesses culling RAM directly, while writing to 0x8Exxxxxx accesses the ping-pong buffer within culling RAM that is not being actively used to generate the next frame. Writing to 0x90xxxxxx, 0x94xxxxxx or 0x98xxxxxx only writes to texture RAM or polygon RAM if rendering has finished; otherwise it writes to the update buffer first. There are two update buffers, also stored within culling RAM, which are swapped at the same time as the ping-pong buffers to allow the video board to update polygon RAM and/or texture RAM for the current frame while also receiving updates for the next frame.
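The write routing above could be sketched as follows. It decodes only the top address byte as described in the post; the RAM sizes, the word-offset calculation and the class name are placeholders, and applying the queued updates in one go at the flip is a simplification of what the hardware does over the course of the frame.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical sketch of video board write routing. Sizes are placeholders.
class VideoBoardMemory {
public:
    static constexpr std::size_t kWords = 0x1000;   // placeholder, not the real RAM size

    VideoBoardMemory() : m_culling(kWords), m_vram(kWords) {
        for (auto& half : m_pingPong) half.assign(kWords, 0);
    }

    void Write(uint32_t addr, uint32_t data) {
        std::size_t word = ((addr & 0x00FFFFFF) >> 2) % kWords;  // wrapped to stay in bounds
        switch (addr >> 24) {
        case 0x8C: m_culling[word] = data; break;                 // direct, not double-buffered
        case 0x8E: m_pingPong[m_front ^ 1][word] = data; break;   // half not being used
        case 0x90: case 0x94: case 0x98:                          // polygon / texture RAM
            if (m_rendering) m_incoming.push_back({word, data});  // queue in update buffer
            else m_vram[word] = data;                             // rendering finished: direct
            break;
        default: break;
        }
    }

    // Flip: swap ping-pong halves and update buffers together, then apply the
    // updates received for the frame that is about to be rendered.
    void Flip() {
        m_front ^= 1;
        std::swap(m_incoming, m_current);
        for (auto [word, data] : m_current) m_vram[word] = data;
        m_current.clear();
        m_rendering = true;
    }

    void RenderFinished() { m_rendering = false; }

    uint32_t FrontPingPong(std::size_t word) const { return m_pingPong[m_front][word]; }
    uint32_t Vram(std::size_t word) const { return m_vram[word]; }

private:
    std::vector<uint32_t> m_culling;
    std::array<std::vector<uint32_t>, 2> m_pingPong;
    std::vector<uint32_t> m_vram;                  // stands in for polygon + texture RAM
    std::vector<std::pair<std::size_t, uint32_t>> m_incoming, m_current;
    unsigned m_front = 0;
    bool m_rendering = false;
};
```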
I think games only update culling RAM directly (via 0x8Cxxxxxx) when nothing is being rendered. I have a hunch that this only happens when rendering is disabled via JTAG; I'll have to check.
Re: Long term dev plan
Some additional dev priorities that I'd like to get to:
- Better racing controls for both analog sticks and digital pads. I have some ideas, just need to find time to start playing around.
- Spruce up documentation and web page.
- Add analytics. Given that the user base is much larger than I realized, it would be interesting to know how frequently they upgrade, what input settings they tend to use, as well as basic PC configuration. I don't think this would be too intrusive because it won't fingerprint users and at any rate, as a fully open source project, there would be a public dashboard to view these stats.
- Get threading working in netplay mode. I need to modify Musashi to use a context object to store its state rather than a global context as it does now.
- Integrated GUI -- imgui is probably the leading candidate.
- Rewrite ROM loading one more time to better conform to the way MAME ROM sets are actually distributed. I do still want it to identify ROMs by CRC (I find this feature super handy myself because I don't like upgrading my ROMs).
- Metal renderer for iOS support.
- Real3D Pro-1000 support.
- Scripting support. I think a lot of enhancements could be controlled via scripts. Lua would be easy but I recently worked on a project that used Lua and absolutely hated it. Might need to suck it up and use it but I'm going to be on the lookout for something that's similarly lightweight but with a better language.
- Enhancements: model dumping would be fun and is something I might actually do sooner rather than later. Texture and model packs are frequently requested but this would be too much of a burden on the rendering engine in terms of maintainability. I do wonder if at some point, when the engine stabilizes, it could be supported by defining special case code paths for texture mapping (supporting ARGB8 and RGB8 textures). *Replacing* textures and even VROM models would be easy and could be done outside of the engine by intercepting texture transfers in Real3D.cpp or via a scripting interface, but they'd have to conform to the Real3D format and I'm not sure how willing anyone would be to develop a workflow and custom exporters to support this.
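On the Musashi point: a context object helps because every piece of CPU state is passed explicitly, so two instances can run on separate threads without sharing globals. This is a hypothetical illustration only; `M68kContext`, `M68kExecute` and `RunSideBySide` are not Musashi's real API, and the "execution" is a placeholder loop.

```cpp
#include <cstdint>
#include <thread>

// Hypothetical CPU state object: everything a global-state core would keep in globals.
struct M68kContext {
    uint32_t pc = 0;
    uint32_t d[8] = {};        // data registers (unused by the placeholder loop)
    uint64_t cyclesRun = 0;
};

// Stand-in for an "execute N cycles" entry point that takes a context.
void M68kExecute(M68kContext& ctx, int cycles) {
    for (int i = 0; i < cycles; i++) {
        ctx.pc += 2;           // placeholder for fetching/executing an instruction
        ctx.cyclesRun++;
    }
}

// Two independent instances on two threads; safe because they share no state.
void RunSideBySide(M68kContext& a, M68kContext& b, int cycles) {
    std::thread ta([&] { M68kExecute(a, cycles); });
    std::thread tb([&] { M68kExecute(b, cycles); });
    ta.join();
    tb.join();
}
```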
Re: Long term dev plan
Any plans for Model 1 & 2?
Thanks for your great work!
Re: Long term dev plan
I ought to work on getting the simulated netboard to run in its own thread at some point. The multithreading code could probably do with a rewrite: it currently uses the CThread class, which is just a wrapper around the SDL thread functions, whereas C++11 onwards has multithreading support built in.
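As a sketch of what a C++11 replacement for `CThread` might look like, here is a hypothetical worker that runs one frame of a subsystem (such as the simulated netboard) each time the main thread signals it, using only `std::thread`, `std::mutex` and `std::condition_variable`. `FrameWorker` is an illustrative name and none of this is existing Supermodel code.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical per-frame worker thread built on standard C++11 primitives.
class FrameWorker {
public:
    explicit FrameWorker(std::function<void()> runFrame)
        : m_runFrame(std::move(runFrame)), m_thread([this] { Loop(); }) {}

    ~FrameWorker() {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_quit = true;
        }
        m_cv.notify_all();
        m_thread.join();
    }

    // Main thread: ask the worker to run one frame.
    void StartFrame() {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_pending = true;
        }
        m_cv.notify_all();
    }

    // Main thread: block until the worker has finished the frame.
    void WaitFrame() {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_cv.wait(lock, [this] { return !m_pending; });
    }

private:
    void Loop() {
        std::unique_lock<std::mutex> lock(m_mutex);
        for (;;) {
            m_cv.wait(lock, [this] { return m_pending || m_quit; });
            if (m_quit) return;
            lock.unlock();
            m_runFrame();              // do the work without holding the lock
            lock.lock();
            m_pending = false;
            m_cv.notify_all();
        }
    }

    std::function<void()> m_runFrame;
    std::mutex m_mutex;
    std::condition_variable m_cv;
    bool m_pending = false;
    bool m_quit = false;
    std::thread m_thread;              // declared last so it starts after the members above
};
```

The main thread would call `StartFrame()` at the start of a frame, run its own CPU work, then call `WaitFrame()` before presenting.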
If a Metal renderer is implemented, it could also be used on macOS, which would mean the existing OpenGL renderer would no longer be restricted to version 4.1 (the last version supported by Apple).
Re: Long term dev plan
Okay, so I've ported the shader code back to software. The basic shader code is not as efficient as Bart's original code. However, we only iterate over the 'active' layer according to the mask, so if the mask is the same for the whole layer we never even have to look at the alt layer, which gives us quite a speed-up. In benchmarking it's getting on for 2x as fast. I used Bart's idea of caching the palettes, which helps a lot with performance. I also tried to keep the code simple: the original tilegen code was hard to read, and I actually had to look back at the earliest versions of Supermodel to really get my head around how it worked. There is no attempt to be clever, such as writing each pixel only once; the buffers are cleared every frame. There isn't even code to check for out-of-bounds pixels, which might happen if half a tile overlaps the right-hand side; I just made the buffers wider (512 pixels).
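Roughly, the per-tile mask check and padded buffer described above amount to something like this. It's a sketch under assumptions: tiles are 8 pixels wide, the visible width is 496 pixels, `useAlt` stands in for the real per-line mask lookup (whose bit layout isn't reproduced here), and the fetch functions return tile rows already resolved through a cached palette. None of these names come from the actual code.

```cpp
#include <array>
#include <cstdint>
#include <functional>

constexpr int kVisibleWidth = 496;                    // Model 3 active display width
constexpr int kBufferWidth  = 512;                    // padded so partial tiles never overflow
constexpr int kTilesPerLine = kVisibleWidth / 8 + 1;  // 63 tiles covers any fine scroll (0-7)

using LineBuffer = std::array<uint32_t, kBufferWidth>;
using TileRow    = std::array<uint32_t, 8>;           // 8 palette-resolved pixels of one tile row

// Hypothetical: returns the row for tile column t (relative to the coarse scroll position).
using RowFetch = std::function<TileRow(int t)>;
// Hypothetical: true if the alternate layer is selected at this screen x.
using MaskFn   = std::function<bool(int screenX)>;

// Draws one scanline. The mask is checked once per tile (not per pixel) and only the
// selected layer is fetched. Visible pixel x ends up at line[x + fineScroll].
void DrawLayerLine(LineBuffer& line, int fineScroll, const MaskFn& useAlt,
                   const RowFetch& primary, const RowFetch& alt)
{
    for (int t = 0; t < kTilesPerLine; t++) {
        int screenX = t * 8 - fineScroll;             // left edge of this tile on screen
        bool fromAlt = useAlt(screenX < 0 ? 0 : screenX);
        TileRow row = fromAlt ? alt(t) : primary(t);
        for (int px = 0; px < 8; px++) {
            line[t * 8 + px] = row[px];               // highest index is 503: no bounds check
        }
    }
}
```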
There are a few minor differences. I only check the mask once per tile, instead of at every pixel. This should hopefully replicate this:
Display quirks
The same hardware that is used to render the background layer is used for the window layer. This causes a problem when the lower 3 bits of the horizontal scroll are nonzero between tiles that are adjacent to each other and come from different layers as specified by the mask bit.
The 8-pixel shift registers are only loaded once every 8 pixels. So in this case the shift register will still contain data from the left tile, which is shifted out into the right tile.
If the layer to the left has a scroll value of zero, then the shift register is empty. Pixels 1-7 of the layer to the right are filled in with the background color of the left tile instead.
Lastly, I made the code a drop-in replacement for the original. All the code currently lives in CTileGen::SyncSnapshots(), and that will want to be moved around a bit in the future:
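Code:
UINT32 CTileGen::SyncSnapshots(void)
{
    // Clear the draw surfaces (this should eventually happen at the start of v-blank)
    for (auto& s : m_drawSurface) {
        s->Clear();
    }

    // Draw all 384 active lines in one go (this should eventually be called
    // elsewhere, one line at a time, as the frame progresses)
    for (int i = 0; i < 384; i++) {
        DrawLine(i);
    }

    // Swap the freshly drawn surfaces with the read-only copies and hand
    // those to the 2D renderer
    for (int i = 0; i < 2; i++) {
        std::swap(m_drawSurface[i], m_drawSurfaceRO[i]);
    }
    Render2D->AttachDrawBuffers(m_drawSurfaceRO[0], m_drawSurfaceRO[1]);

    return UINT32(0);
}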
Clearing the surfaces should be moved to the start of v-blank, and line drawing should be called from the main thread at the right time. Then, at the end of the frame, OpenGL should be updated with the finished display surfaces. The surfaces are double buffered, but that might not be needed in the future.
But anyway, all the pieces of the puzzle are there, so we can correctly emulate how both of these chips drew simultaneously. I just need to finish the puzzle.
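The write-buffering approach from item 3 of the original plan, where tilegen writes are recorded so that the 2D can be drawn line by line while the 3D draws the whole frame, could be sketched like this. `LineScheduler`, `RecordWrite` and `DrawFrame` are hypothetical, and the sketch assumes `nextLine` is the first scanline not yet output when the write occurs, so the write affects that line onwards.

```cpp
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// A tilegen write tagged with the scanline it first affects.
struct TileGenWrite { int line; uint32_t addr; uint32_t data; };

class LineScheduler {
public:
    using ApplyFn = std::function<void(uint32_t addr, uint32_t data)>;
    using DrawFn  = std::function<void(int line)>;

    LineScheduler(int totalLines, ApplyFn apply, DrawFn draw)
        : m_lines(totalLines), m_apply(std::move(apply)), m_draw(std::move(draw)) {}

    // CPU side: record a tilegen write instead of applying it immediately.
    void RecordWrite(int nextLine, uint32_t addr, uint32_t data) {
        m_writes.push_back({nextLine, addr, data});
    }

    // End of frame: replay writes in order, applying each one before the line it affects.
    void DrawFrame() {
        std::size_t next = 0;
        for (int line = 0; line < m_lines; line++) {
            while (next < m_writes.size() && m_writes[next].line <= line) {
                m_apply(m_writes[next].addr, m_writes[next].data);
                next++;
            }
            m_draw(line);
        }
        // Writes made after the last visible line (e.g. during v-blank) still update state.
        for (; next < m_writes.size(); next++) {
            m_apply(m_writes[next].addr, m_writes[next].data);
        }
        m_writes.clear();
    }

private:
    int m_lines;
    ApplyFn m_apply;
    DrawFn m_draw;
    std::vector<TileGenWrite> m_writes;
};
```

Replaying at the end of the frame keeps all the 2D work in one place, at the cost of holding a frame's worth of writes; drawing each line from the main thread as it is reached would avoid the buffer but ties the tilegen to CPU timing.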