Frame timing

Technical discussion for those interested in Supermodel development and Model 3 reverse engineering. Prospective contributors welcome.
Forum rules
Keep it classy!

  • No ROM requests or links.
  • Do not ask to be a play tester.
  • Do not ask about release dates.
  • No drama!

Re: Frame timing

Postby Bart » Wed Aug 16, 2017 5:48 pm

Ian wrote:
2. Flush (88000000 = deaddead) and wait for acknowledgment by checking for the status bit flipping.
3. Compute time from when status bit flipped until next VBL.


Only briefly skimmed what you've read. But VBL is a period of time. You could wait for it twice in a single frame, once waiting for VBL to start, where the status bit would go from low to high. Then wait for it to end, where it would go back to low again. I presume as soon as it goes low again its currently drawing to the screen.


That's true, it is a period of time, but I haven't seen a system that triggers multiple interrupts during VBL (although it could exist). According to Charles MacDonald's documentation, the System 24 tile generator raises its IRQ signal for VBL on the very last active scan line and then it goes low at the end of that line. Depending on how the interrupt controller works, it will either trigger once (on the low -> high transition) or continue to trigger interrupts as long as the signal is high (e.g., if the IRQ handler exits before one scan line of time has elapsed, the IRQ exception will immediately be triggered again). The latter is what actually happens on the old System 24 hardware. You can see that at the very end of the IRQ 02 function, the PowerPC continually writes an ack register and waits for a signal to go low again before exiting the interrupt routine.

Therefore, I'm very confident that the game can only detect when VBL has started (well, there is probably a bit it can read from one of the tile generator registers that tells it whether it's currently in VBL or not).

When polling it looks like this at boot
000011111111111111111110 <- immediately stops polling when it flips to zero again


That's true -- you can actually see this in the code I pasted (the first snippet):

Code: Select all
  // Issue a flush command
  real3d_flush(2);  // 0x88000000 = 0xdeaddead

  // Wait for status bit to flip
  read_real3d_status();
  uint8_t old_status_bit = _real3d_status_bits[2];
  do
  {
    read_real3d_status();
  } while (_real3d_status_bits[2] != old_status_bit);


After the flush, it waits for the status bit to flip. Then, it starts a timer that counts until the beginning of the next VBL period. It looks to me like it is trying to time how long it takes to perform a buffer swap.

I can't be certain until I take a look at what it's doing with the decrementer register. However, it's very telling that the decrementer (which is just a downward-counting high precision timer that can also trigger an exception when it hits 0) is loaded in the IRQ 2 handler (VBL). I'll take a closer look at the main loop but my guess is that it waits for VBL, triggers a swap (and by now, the decrementer is beginning its count down because it was re-armed by the VBL IRQ handler), and then waits for the decrementer to reach 0 before resuming. Why it uses the decrementer instead of just polling is unknown to me. Not all the games use this scheme. Seems a bit over-engineered but theoretically it could be used to asynchronously push in the next completed frame of data. I haven't yet analyzed what the decrementer is being used for (but it's definitely being used for something).


All the games do this polling at boot waiting on status bit to flip like that.
I guessed they do it, as to start computing the frame from the moment the CRT starts drawing the first line


That or they're waiting to synchronize on the ping pong buffer swap. For example, if you want to clear all the memory out, you need to flip twice and keep track of which buffer you just wrote.
User avatar
Bart
Site Admin
 
Posts: 3086
Joined: Thu Sep 01, 2011 2:13 pm
Location: Reno, Nevada

Re: Frame timing

Postby Bart » Wed Aug 23, 2017 2:45 pm

I've now spent days and countless hours looking at this without any breakthroughs yet. Fighting Vipers 2 is particularly strange. Haven't delved too far into any other Model 3 games before but I don't recall them relying on the decrementer as a means of syncing to VBL. What's especially bizarre is that there is an actual wait_for_vbl() function in the game, used frequently during boot-up initialization, but then after it performs the frame timing calculation I found early on, it relies entirely on looking at the value of the decrementer to determine where it is relative to VBL.

I'm fairly confident now that the status bit is merely an indicator for which memory page is active. I also think that this buffering mechanism only affects the RAM region at 0x8Exxxxxx, and not the other one at 0x8Cxxxxxx that contains the culling nodes. I'm not entirely sure how games determine it is safe to write to 0x8Cxxxxxx. Without having looked at it yet, I'm going to speculate that they first write a dummy viewport to both pages of 8E ping-pong RAM (requiring two swap commands), then load up a bunch of nodes and resume rendering by updating the display lists and matrices in ping-pong RAM.

The reason Fighting Vipers 2 is running so slowly in attract mode (half speed) is that each iteration of the main application loop takes two frames (two VBLs). Why? Therein lies the mystery but today I mapped out the sequence of Real3D and frame-sync events that occur in each iteration of the loop. Most importantly, I've discovered two very interesting sync functions that I need to wrap my head around.

The main loop is just this:

Code: Select all
0x00038F94: 0x48000011  bl  0x00038FA4  <-- all DMA transfers to Real3D eventually happen in here, with a DEC loop wait at the beginning
0x00038F98: 0x4803B9FD  bl  0x00074994  <-- I don't know what this is but nothing hardware related happens. Maybe program logic.
0x00038F9C: 0x4BFD114D  bl  0x0000A0E8  <-- another DEC loop wait occurs here
0x00038FA0: 0x4BFFFFF4  b 0x00038F94


The third function A0E8 is some sort of frame sync function but it also does a *lot* more at the beginning. I'm not sure what but in addition to some floating point calculations it calls a number of subroutines. It also resets some internal state variable every other frame. I need to understand this better. The basic frame sync logic is peculiar and seems to work like this:

1. If the DEC value is *negative*, it has counted down past 0 and we are therefore well past the last VBL IRQ. Wait until it becomes positive, which indicates VBL happened because the first thing the VBL IRQ handler does is re-arm DEC with the value calibrated at boot-up.

2. Now that DEC is positive and a VBL has definitely occured, write the value 3 to 0xf118000c (a tilegen register whose purpose is unknown but that seems to often get written after VBL -- it is *not* the ack register, though, which is already known and handled in the IRQ routine itself).

3. Wait for DEC exception to happen. This occurs when DEC hits 0. The loop here detects this by monitoring a variable that is incremented by the DEC exception handler.

4. If the DEC count has reached 2, increment some other counter in RAM and clear the DEC count. My guess is this drives some sort of logic that updates at 30 Hz. Maybe the physics loop?

Once this is all done, the main loop goes back to the first function. This is a crucial routine and does a *lot* of stuff. All Real3D updates happen in there as well. I don't know if it also handles game logic but I don't think so given the sort of other things it does.

Code: Select all
0x00038FA4: 0x9421FFF0  stwu  r1,-0x10(r1)
0x00038FA8: 0x7C0802A6  mfspr r0,lr
0x00038FAC: 0x90010004  stw r0,0x04(r1)
0x00038FB0: 0x7C0C42E6  mftb  r0,tbl
0x00038FB4: 0x3D400058  li  r10,0x00580000
0x00038FB8: 0x816A0E28  lwz r11,0xE28(r10)
0x00038FBC: 0x3D200027  li  r9,0x00270000
0x00038FC0: 0x90094314  stw r0,0x4314(r9)
0x00038FC4: 0x3D20FE10  li  r9,0xFE100000
0x00038FC8: 0x380B0001  addi  r0,r11,0x01
0x00038FCC: 0x900A0E28  stw r0,0xE28(r10)
0x00038FD0: 0x7D6B00D0  neg r11,r11
0x00038FD4: 0x9969001C  stb r11,0x1C(r9)  ; update board LED
0x00038FD8: 0x48002469  bl  0x0003B440
0x00038FDC: 0x4BFE7C35  bl  0x00020C10
0x00038FE0: 0x3D200011  li  r9,0x00110000
0x00038FE4: 0x88090B20  lbz r0,0xB20(r9)
0x00038FE8: 0x2C800001  cmpi  cr1,0,r0,0x01
0x00038FEC: 0x41860034  bt  cr1[eq],0x00039020
0x00038FF0: 0x3D200058  li  r9,0x00580000
0x00038FF4: 0x38000000  li  r0,0x00000000
0x00038FF8: 0x98090F03  stb r0,0xF03(r9)
0x00038FFC: 0x3D200058  li  r9,0x00580000
0x00039000: 0x98090F02  stb r0,0xF02(r9)
0x00039004: 0x3D200058  li  r9,0x00580000
0x00039008: 0x98090F01  stb r0,0xF01(r9)
0x0003900C: 0x3D200058  li  r9,0x00580000
0x00039010: 0x98090F00  stb r0,0xF00(r9)
0x00039014: 0x3D200059  li  r9,0x00590000
0x00039018: 0x38000000  li  r0,0x00000000
0x0003901C: 0x90091D14  stw r0,0x1D14(r9)
0x00039020: 0x4BFE7A3D  bl  0x00020A5C
0x00039024: 0x4BFE82DD  bl  0x00021300
0x00039028: 0x48038D15  bl  0x00071D3C
0x0003902C: 0x3D200059  li  r9,0x00590000
0x00039030: 0x8009FD4C  lwz r0,-0x2B4(r9)
0x00039034: 0x70090C00  andi. r9,r0,0x0C00
0x00039038: 0x41820024  bt  cr0[eq],0x0003905C  ; this path is taken during attract mode...
0x0003903C: 0x3D200059  li  r9,0x00590000
0x00039040: 0x8809FD99  lbz r0,-0x267(r9)
0x00039044: 0x2880001F  cmpli cr1,0,r0,0x001F
0x00039048: 0x41850014  bt  cr1[gt],0x0003905C
0x0003904C: 0x4BFD15C9  bl  0x0000A614  <-- saves all regs, including FP
0x00039050: 0x4BFD1B75  bl  0x0000ABC4
0x00039054: 0x4BFD163D  bl  0x0000A690  <-- restores all regs
0x00039058: 0x48000028  b 0x00039080
0x0003905C: 0x4BFCBE81  bl  0x00004EDC
0x00039060: 0x4BFD15B5  bl  0x0000A614  ; save all regs
0x00039064: 0x4BFD1B61  bl  0x0000ABC4
0x00039068: 0x4BFD1629  bl  0x0000A690  ; restore all regs
0x0003906C: 0x480000BD  bl  0x00039128 <-- DMA transfers and status reg read here (also, an interesting routine at A424 saves TBL at a certain point into SPRG3)
                                           This function may also delay one frame occasionally, which explains why FV2 intro runs so slowly (30FPS)
                                           Also, function A068 is called here, which does some sort of DEC frame timing.
0x00039070: 0x4BFE947D  bl  0x000224EC
0x00039074: 0x4800024D  bl  0x000392C0
0x00039078: 0x4801CCE5  bl  0x00055D5C
0x0003907C: 0x4BFCBF89  bl  0x00005004 <-- more DMA transfers here (at 5220, 88000000 is written)
0x00039080: 0x80010004  lwz r0,0x04(r1)
0x00039084: 0x7C0803A6  mtspr lr,r0
0x00039088: 0x38210010  addi  r1,r1,0x10
0x0003908C: 0x4E800020  bclr  0x14,0


You can see some notes to myself there. The bottom is where things are interesting. Normally, during actual game play and parts of the attract mode where textures are *not* being uploaded, the sequence of events here goes something like this:

1. Function A068 is called. It's several levels down the call stack from function 39128, which is the top-level function for Real3D transfers to polygon RAM and, evidently, texture RAM. Not sure about culling RAM (8C). Synchronization of some sort is performed. This function is short but important. It also writes to 0xf118000c. And it does something especially interesting: it actually waits for only a portion of the DEC countdown. That is, it waits a very specific number of cycles.

2. DMA copies to 98xxxxxx are performed, still inside of function 39128.

3. Eventually, function 5004 is executed. Here, ping-pong memory is updated (8E) and a flush or swap is triggered by writing 88000000.

This sequence gets more interesting during texture uploads. In attract mode, textures are only uploaded using the VROM mechanism, which involves writing a special Real3D register with the address of a texture in VROM to load up. A swap command (88000000) happens after each. But here's the interesting part: the function at A068 is called before *each* VROM texture register write. It doesn't necessarily sync to the next frame but I think what is happening is that it may be checking to see whether there is sufficient time remaining to perform texture transfers, thereby limiting how many are done per frame. I'm going to test this idea tomorrow by manually patching up the code to try to change the cycle count and see if the game tries to perform more uploads per frame.

More tomorrow, hopefully.
User avatar
Bart
Site Admin
 
Posts: 3086
Joined: Thu Sep 01, 2011 2:13 pm
Location: Reno, Nevada

Re: Frame timing

Postby Ian » Wed Aug 23, 2017 3:06 pm

It doesn't necessarily sync to the next frame but I think what is happening is that it may be checking to see whether there is sufficient time remaining to perform texture transfers, thereby limiting how many are done per frame.


When I wrote the mipmap code. I unexpectedly found that mipmaps are often sent a frame later than the base textures. So technically there can exist a frame where models don't have complete mipmap textures.
Ian
 
Posts: 2044
Joined: Tue Feb 23, 2016 9:23 am

Re: Frame timing

Postby Bart » Wed Aug 23, 2017 5:51 pm

Ian wrote:
It doesn't necessarily sync to the next frame but I think what is happening is that it may be checking to see whether there is sufficient time remaining to perform texture transfers, thereby limiting how many are done per frame.


When I wrote the mipmap code. I unexpectedly found that mipmaps are often sent a frame later than the base textures. So technically there can exist a frame where models don't have complete mipmap textures.


That could be legitimately a result of them trying to send too many textures in the time they've allotted themselves or it could be an emulation bug. For example, the game's throttling code could end up waiting an entire frame because Supermodel is not handling Real3D updates correctly. Surely the hardware could handle several texture uploads per frame. But if the game is self-timing and thinks it's spending too much time on texture uploads, it could end up deferring them to the next frame.

You know, that timing routine I found on Wednesday August 16 (sub40c0()) executes right after that dummy scene is uploaded. Haven't yet checked but I wonder if that VROM model might actually be a very large one designed to simulate a relatively high or characteristic workload. Texture uploads would take significantly less time. Other game scenes could take more. I should dump some stats on that model tomorrow...
User avatar
Bart
Site Admin
 
Posts: 3086
Joined: Thu Sep 01, 2011 2:13 pm
Location: Reno, Nevada

Re: Frame timing

Postby Ian » Thu Aug 24, 2017 11:40 am

Bart,
would it help if you actually used the correct v-blank time?? Currently the one in the source is I assume pure guess work. You posted a link sometime ago to the system24 one and that has various times for v sync and v blank from what I recall. Would those be similar? If i recall they usde the same crazy 57.5 fps
Ian
 
Posts: 2044
Joined: Tue Feb 23, 2016 9:23 am

Re: Frame timing

Postby Bart » Thu Aug 24, 2017 12:17 pm

I've translated the two delay routines. It's clear they're trying to achieve some sort of precise frame timing. You may have been right all along about the Real3D being aware of frame timing as well. It may be that the Real3D only actually allows a swap to occur at a particular point in time and issuing a flush command may simply enqueue that swap to occur at the designated swapping time.

Here is suba068(). This one occurs somewhere before data is transferred to the Real3D, in the first of the three top-level main loop functions. Note that there is a flag there that causes the sync to only happen once per frame. When this function is called multiple times per frame (as occurs during texture uploads), it only does anything the first time around and bails out on subsequent calls.

Code: Select all
uint8_t _580e32;
uint32_t _580e70;

void suba068()
{
  if (_580e32 != 0)
    return;

  uint32_t start_time = read_tbl();

  // Value of DEC after 3300 cycles have elapsed (i.e., 3300 cycles since VBL
  // started). This value may not be coincidental. According to Charles'
  // System 24 doc, if the tile generator is operating at 424 lines per frame,
  // the breakdown is:
  //
  //   25 scanlines from /VSYNC high to /BLANK high (top border)
  //  384 scanlines from /BLANK high to /BLANK low (active display)
  //   11 scanlines from /BLANK low to /VSYNC low (bottom border)
  //    4 scanlines from /VSYNC low to /VSYNC high (vertical sync. pulse)
  //
  // On System 24, the interrupt happens on the last line and is "asserted on
  // the negative edge of H-sync before blanking is disabled and held for one
  // scanline (656 pixels) such that it is negated on the negative edge of H-
  // sync of the next scanline, line 384."
  //
  // Given:
  //
  //  - Assume 66 MHz bus frequency
  //  - TBR and DEC registers tick once every 4 bus cycles
  //  - Assume display rate of 57.52 Hz
  //  - Assume 424 scanlines per frame
  //
  // Then the number of timer ticks per scanline would be:
  //
  //  ((66e6 / 4) / 57.52) / 424 = 676.5
  //
  // The value 3300 corresponds to 4.8 scanlines. If the IRQ is triggered on
  // /VSYNC = 1 -> 0, they could be waiting out the vsync pulse and waiting for
  // the *start* of the next active frame.
  //

  uint32_t value_after_delay = _dec_reload_on_vbl - 3300;

  // Wait for VBL if decrementer is negative, which indicates that the allotted
  // time to do things after VBL is triggered (calibrated at start-up) has
  // already passed
  if (read_dec() <= 0)
  {
    uint8_t old_value = _vbl_count;
    while (_vbl_count == old_value)
      ;
  }

  // This seems to make sure at least 3300 cycles have elapsed since VBL
  while (read_dec() > value_after_delay)
    ;

  // How much time we wasted in this bullshit subroutine ;)
  _580e70 = read_tbl() - start_time;

  // Reset VBL count
  _vbl_count = 0;

  // Disables this routine the next time it is called. This is reset in
  // suba0e8() at the end of the main application loop.
  _580e32 = 1;
}


What really caught my eye here is the very precise value of 3300 that it is waiting for. Remember: DEC is simply a register that ticks down at the same rate as TBL ticks up. These high frequency counters are tied to the bus clock and tick every 4 bus cycles. On Model 3, the bus clock is 66 MHz across all steppings as far as we know. If you read my comment in the code, I suspect this value is trying to wait out the 4-line vsync pulse.

I also looked at the routine that is called at the *end* of the app loop. Mercifully, in attract mode, the bulk of this routine's code (which involves a bunch of confusing floating point arithmetic) is skipped over and we're left with this code path being executed:

Code: Select all
uint8_t _580e31;
uint32_t _274300;
uint32_t _274304;
uint32_t _27430c;
uint32_t _274314;
uint32_t _274318;

void suba0e8()
{
  sub39338();
  sub3b444();
  sub20db4();

  uint32_t time = read_tbl();
  _274318 = time;
  _274304 = time - _274314;
  _274300 = _274314 - _27430c;
  if ((_591d14 & 0x00400000) != 0)
  {
    // Some crazy stuff here that does not seem to be executed (at least not in
    // attract mode)
    // TODO: translate me?
  }

  // Wait for next VBL to start
  while (read_dec() < 0)
    ;

  // Unknown write to tilegen
  write32(0xf118000c, bswap32(3));

  // VBL has started, wait for decrementer exception to occur (wait for
  // calibrated time to pass)
  uint8_t old_value = _dec_count;
  while (_dec_count == old_value)
    ;
  if (_dec_count >= 2)
    _580e31++;
  _dec_count = 0;

  // Re-enable the wait loops in suba068()
  _580e32 = 0;
}


The interesting bit is at the end. This is causing a whole frame to be wasted.

I wonder if the actual intent of that strange calibration value (_dec_reload_on_vbl) is actually to time the workable part of a frame. The value may actually be very large and the game expects that after the main loop has run, it will still have some time left over. If not, and it ran over a bit, it syncs up to the next frame.

The two possible cases here are:

1. The function is entered before DEC has counted down to 0. In this case, it will simply wait until DEC has counted down to 0. I guess this is the case if the current frame's processing has finished "on schedule".

2. The function is entered after the DEC has counted down to 0, before VBL is triggered. This would seem to indicate that the CPU ran over schedule and we are perhaps already in the active display part of the frame. This code will then wait for VBL (first loop) *AND* then wait until DEC has expired, wasting an entire frame because then the next main loop iteration, the first sync function gets called.

What I'm still unsure of, embarrassingly, is where the game physics and logic get updated. I should probably time each of the three main loop subroutines and figure out where that's likely to occur. Maybe I was wrong and it actually happens first, before the Real3D transfers. The active display part of the frame is the longest, so you would think that game logic should ideally occur there, with VBL serving as a swap point for graphics (note that this would also introduce a full frame of graphics latency, but I think this is actually what happens on modern 3D GPUs as well).
User avatar
Bart
Site Admin
 
Posts: 3086
Joined: Thu Sep 01, 2011 2:13 pm
Location: Reno, Nevada

Re: Frame timing

Postby Bart » Thu Aug 24, 2017 12:18 pm

Ian wrote:Bart,
would it help if you actually used the correct v-blank time?? Currently the one in the source is I assume pure guess work. You posted a link sometime ago to the system24 one and that has various times for v sync and v blank from what I recall. Would those be similar? If i recall they usde the same crazy 57.5 fps


You are correct of course :) I just typed out a post as you wrote this. LMK what you think. But yes, I think accurate VBL is essential. Fighting Vipers 2 turns out to be an interesting case because of its peculiar reliance on precise cycle counting. I don't think many other Model 3 games do it this way.

EDIT: hit reload if you don't see two code snippets. I made a mistake in my post.
User avatar
Bart
Site Admin
 
Posts: 3086
Joined: Thu Sep 01, 2011 2:13 pm
Location: Reno, Nevada

Re: Frame timing

Postby Ian » Thu Aug 24, 2017 2:47 pm

What do you have to prod/change to get attract mode to work at the right speed?
Even if only by guess work
Ian
 
Posts: 2044
Joined: Tue Feb 23, 2016 9:23 am

Re: Frame timing

Postby Bart » Thu Aug 24, 2017 3:05 pm

Interesting discovery: I started manually modifying the value that the calibration function sets. At 68000 bus cycles (68000*6 CPU cycles), which corresponds to about 100 scanlines if my math is correct, FV2 attract mode starts to behave nearly correctly. The missing textures are now there, although they still pop in a few frames late, which means the value is still too low.

Note that VBL timing is not really that important if you think about it. The games don't seem to know when VBL actually ends nor do they care. What actually matters is 1) getting the whole number of cycles per frame correct (which we do already) and 2) making sure writes to 0x88000000 eat enough cycles.

As far as I can tell, the timing calibration routine isn't doing anything fancy. It is an odd bit of code if you look at it. I wouldn't have written it in such a convoluted way but I am now quite certain my interpretation is correct: it is trying to count how many cycles it takes for the status bit to flip. But wouldn't this be highly variable? The scene it has uploaded during the calibration procedure is trivial (TODO: still need to verify how large that single model is).

Perhaps this indicates that the Real3D really does lock to a particular frame rate and somehow defers swaps until VBL begins? In such a model, I would expect that the game wants to do its processing during the *active frame*.

Here's a theory I need to think about:

IRQ 02 and 01 both happen once per frame. No game ever seems to care about IRQ 01 but I have evidence from running my code on the real system that this IRQ happens as often as IRQ 02. I'll bet that one of them is tied to VBL start and the other to VBL end or something like that. Maybe the Real3D actually defers the page swap until VBL start?

In this case, IRQ 02 should be VBL end or something like that. The calibration procedure would wait until VBL end, trigger a swap, and then measure how long it took to actually occur. This value would give the active frame time, during which one would want to do all of their processing and uploads, given how relatively short VBL is.

The only thing that doesn't add up is that 3300 cycle delay. It *really* seems like the IRQ is triggered before the VBL pulse and that this code is waiting it out. Of course, the actual frame buffer flip can happen at any point before active scan begins -- at the start of VBL, end of VBL, or in the border regions.

As for the texture lag in FV2 -- either I need to try eating more cycles *or* the hardware responds differently to flushes depending on whether the texture FIFO is full or not. This wouldn't be hard to do nor would it be unusual: on flush, if the texture FIFO is full, upload the texture, else, wait until VBL to halt rendering and swap the buffer.

What do you think? I still need to clean up the code and give this a proper test. Having only looked at FV2 it may miserably fail elsewhere.
User avatar
Bart
Site Admin
 
Posts: 3086
Joined: Thu Sep 01, 2011 2:13 pm
Location: Reno, Nevada

Re: Frame timing

Postby Bart » Thu Aug 24, 2017 3:06 pm

Ian wrote:What do you have to prod/change to get attract mode to work at the right speed?
Even if only by guess work


Haha, we are almost in perfect sync today ;) I have to context switch now, though. I may not be able to resume my testing until Monday. Big deadline this Friday for some HoloLens stuff.
User avatar
Bart
Site Admin
 
Posts: 3086
Joined: Thu Sep 01, 2011 2:13 pm
Location: Reno, Nevada

PreviousNext

Return to The Dark Room

Who is online

Users browsing this forum: No registered users and 1 guest

cron