I've now spent days and countless hours looking at this without any breakthrough yet. Fighting Vipers 2 is particularly strange. I haven't delved too far into other Model 3 games, but I don't recall any of them relying on the decrementer (DEC) to sync to VBL. What's especially bizarre is that the game does have an actual wait_for_vbl() function, used frequently during boot-up initialization. But once it has performed the frame timing calculation I found early on, it relies entirely on the value of the decrementer to determine where it is relative to VBL.
I'm fairly confident now that the status bit is merely an indicator for which memory page is active. I also think that this buffering mechanism only affects the RAM region at 0x8Exxxxxx, and not the other one at 0x8Cxxxxxx that contains the culling nodes. I'm not entirely sure how games determine it is safe to write to 0x8Cxxxxxx. Without having looked at it yet, I'm going to speculate that they first write a dummy viewport to both pages of 8E ping-pong RAM (requiring two swap commands), then load up a bunch of nodes and resume rendering by updating the display lists and matrices in ping-pong RAM.
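To make the speculation concrete, here is a toy Python model of the 8E ping-pong RAM. It assumes two pages, CPU writes always landing in the inactive page, a swap command (the 88000000 write) exchanging them, and the status bit doing nothing more than reporting which page is active. The class and the DUMMY_VIEWPORT marker are hypothetical illustrations, not actual Real3D behavior or values.

```python
# Toy model of the 0x8E ping-pong RAM (hypothetical, for reasoning only).
# Assumptions: two pages; the Real3D renders from the "active" page while the
# CPU writes to the other; a swap command exchanges them; the status bit just
# mirrors which page is active.

class PingPongRAM:
    def __init__(self, size_words):
        self.pages = [[0] * size_words, [0] * size_words]
        self.active = 0  # page currently being rendered from

    def status_bit(self):
        # Assumption: status bit == index of the active page, nothing more.
        return self.active

    def cpu_write(self, offset, value):
        # CPU writes land in the inactive (back) page.
        self.pages[self.active ^ 1][offset] = value

    def swap(self):
        # Effect of the 0x88000000 write in this model: back page becomes active.
        self.active ^= 1


DUMMY_VIEWPORT = 0xDEAD  # placeholder marker, not a real Real3D value


def write_dummy_to_both_pages(ram, offset):
    """Speculated sequence: write the dummy viewport and swap, twice,
    so that both pages end up holding it (two swap commands total)."""
    for _ in range(2):
        ram.cpu_write(offset, DUMMY_VIEWPORT)
        ram.swap()
```

After `write_dummy_to_both_pages`, neither page references real display lists, which is what would make it safe (under this hypothesis) to load culling nodes into 8C before resuming normal updates.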
The reason Fighting Vipers 2 runs so slowly in attract mode (half speed) is that each iteration of the main application loop takes two frames (two VBLs). Why? Therein lies the mystery, but today I mapped out the sequence of Real3D and frame-sync events that occur in each iteration of the loop. Most importantly, I've discovered two very interesting sync functions that I need to wrap my head around.
The main loop is just this:
0x00038F94: 0x48000011 bl 0x00038FA4 <-- all DMA transfers to Real3D eventually happen in here, with a DEC loop wait at the beginning
0x00038F98: 0x4803B9FD bl 0x00074994 <-- I don't know what this is, but nothing hardware-related happens. Maybe program logic.
0x00038F9C: 0x4BFD114D bl 0x0000A0E8 <-- another DEC loop wait occurs here
0x00038FA0: 0x4BFFFFF4 b 0x00038F94
The third function, A0E8, is some sort of frame sync function, but it also does a *lot* more at the beginning. I'm not sure what, but in addition to some floating-point calculations it calls a number of subroutines. It also resets some internal state variable every other frame. I need to understand this better. The basic frame sync logic is peculiar and seems to work like this:
1. If the DEC value is *negative*, it has counted down past 0 and we are therefore well past the last VBL IRQ. Wait until it becomes positive, which indicates VBL happened because the first thing the VBL IRQ handler does is re-arm DEC with the value calibrated at boot-up.
2. Now that DEC is positive and a VBL has definitely occurred, write the value 3 to 0xf118000c (a tilegen register whose purpose is unknown but that often seems to get written after VBL; it is *not* the ack register, though, which is already known and handled in the IRQ routine itself).
3. Wait for the DEC exception to happen. This occurs when DEC hits 0. The loop here detects this by monitoring a variable that is incremented by the DEC exception handler.
4. If that DEC exception count has reached 2, increment some other counter in RAM and clear the DEC count. My guess is this drives some sort of logic that updates at 30 Hz. Maybe the physics loop?
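The four steps above can be sketched as a toy timing model in Python. The tick counts (VBL period, calibrated DEC reload value) are made up; only the ordering of events follows the description. The DEC exception is modeled as firing when DEC counts down past 0, and the VBL IRQ handler re-arms DEC as its first action. `Machine`, `frame_sync`, and the field names are hypothetical.

```python
# Toy timing model of the A0E8 frame-sync logic (all numbers hypothetical).

class Machine:
    def __init__(self, vbl_period, dec_reload):
        self.vbl_period = vbl_period  # ticks between VBL IRQs
        self.dec_reload = dec_reload  # DEC value calibrated at boot-up
        self.dec = dec_reload
        self.ticks = 0
        self.dec_count = 0            # RAM variable bumped by the DEC exception handler
        self.slow_counter = 0         # "some other counter in RAM" (maybe 30 Hz logic)
        self.tilegen_writes = 0       # writes of 3 to 0xF118000C

    def tick(self):
        self.ticks += 1
        prev, self.dec = self.dec, self.dec - 1
        if prev >= 0 and self.dec < 0:
            self.dec_count += 1       # DEC exception handler
        if self.ticks % self.vbl_period == 0:
            self.dec = self.dec_reload  # VBL IRQ handler re-arms DEC first thing


def frame_sync(m):
    # Step 1: negative DEC means we are well past the last VBL; wait for re-arm.
    while m.dec < 0:
        m.tick()
    # Step 2: a VBL has definitely occurred, so write 3 to 0xF118000C.
    m.tilegen_writes += 1
    # Step 3: wait until the DEC exception handler bumps its counter.
    seen = m.dec_count
    while m.dec_count == seen:
        m.tick()
    # Step 4: once two DEC exceptions have accumulated, bump the slow counter
    # and clear the DEC count.
    if m.dec_count >= 2:
        m.slow_counter += 1
        m.dec_count = 0
```

With `Machine(vbl_period=100, dec_reload=60)`, four calls to `frame_sync` span four VBLs but advance `slow_counter` only twice, which is the "every other frame" behavior guessed at in step 4.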
Once A0E8 returns, the main loop goes back to the first function, 38FA4. This is a crucial routine and does a *lot* of stuff. All Real3D updates happen in there as well. I don't know whether it also handles game logic, but I don't think so, given the other things it does.
0x00038FA4: 0x9421FFF0 stwu r1,-0x10(r1)
0x00038FA8: 0x7C0802A6 mfspr r0,lr
0x00038FAC: 0x90010004 stw r0,0x04(r1)
0x00038FB0: 0x7C0C42E6 mftb r0,tbl
0x00038FB4: 0x3D400058 li r10,0x00580000
0x00038FB8: 0x816A0E28 lwz r11,0xE28(r10)
0x00038FBC: 0x3D200027 li r9,0x00270000
0x00038FC0: 0x90094314 stw r0,0x4314(r9)
0x00038FC4: 0x3D20FE10 li r9,0xFE100000
0x00038FC8: 0x380B0001 addi r0,r11,0x01
0x00038FCC: 0x900A0E28 stw r0,0xE28(r10)
0x00038FD0: 0x7D6B00D0 neg r11,r11
0x00038FD4: 0x9969001C stb r11,0x1C(r9) ; update board LED
0x00038FD8: 0x48002469 bl 0x0003B440
0x00038FDC: 0x4BFE7C35 bl 0x00020C10
0x00038FE0: 0x3D200011 li r9,0x00110000
0x00038FE4: 0x88090B20 lbz r0,0xB20(r9)
0x00038FE8: 0x2C800001 cmpi cr1,0,r0,0x01
0x00038FEC: 0x41860034 bt cr1[eq],0x00039020
0x00038FF0: 0x3D200058 li r9,0x00580000
0x00038FF4: 0x38000000 li r0,0x00000000
0x00038FF8: 0x98090F03 stb r0,0xF03(r9)
0x00038FFC: 0x3D200058 li r9,0x00580000
0x00039000: 0x98090F02 stb r0,0xF02(r9)
0x00039004: 0x3D200058 li r9,0x00580000
0x00039008: 0x98090F01 stb r0,0xF01(r9)
0x0003900C: 0x3D200058 li r9,0x00580000
0x00039010: 0x98090F00 stb r0,0xF00(r9)
0x00039014: 0x3D200059 li r9,0x00590000
0x00039018: 0x38000000 li r0,0x00000000
0x0003901C: 0x90091D14 stw r0,0x1D14(r9)
0x00039020: 0x4BFE7A3D bl 0x00020A5C
0x00039024: 0x4BFE82DD bl 0x00021300
0x00039028: 0x48038D15 bl 0x00071D3C
0x0003902C: 0x3D200059 li r9,0x00590000
0x00039030: 0x8009FD4C lwz r0,-0x2B4(r9)
0x00039034: 0x70090C00 andi. r9,r0,0x0C00
0x00039038: 0x41820024 bt cr0[eq],0x0003905C ; this path is taken during attract mode...
0x0003903C: 0x3D200059 li r9,0x00590000
0x00039040: 0x8809FD99 lbz r0,-0x267(r9)
0x00039044: 0x2880001F cmpli cr1,0,r0,0x001F
0x00039048: 0x41850014 bt cr1[gt],0x0003905C
0x0003904C: 0x4BFD15C9 bl 0x0000A614 <-- saves all regs, including FP
0x00039050: 0x4BFD1B75 bl 0x0000ABC4
0x00039054: 0x4BFD163D bl 0x0000A690 <-- restores all regs
0x00039058: 0x48000028 b 0x00039080
0x0003905C: 0x4BFCBE81 bl 0x00004EDC
0x00039060: 0x4BFD15B5 bl 0x0000A614 ; save all regs
0x00039064: 0x4BFD1B61 bl 0x0000ABC4
0x00039068: 0x4BFD1629 bl 0x0000A690 ; restore all regs
0x0003906C: 0x480000BD bl 0x00039128 <-- DMA transfers and status reg read here (also, an interesting routine at A424 saves TBL at a certain point into SPRG3)
(This function may also occasionally delay by one frame, which explains why the FV2 intro runs so slowly, at 30 FPS.)
(Also, function A068 is called in here; it does some sort of DEC frame timing.)
0x00039070: 0x4BFE947D bl 0x000224EC
0x00039074: 0x4800024D bl 0x000392C0
0x00039078: 0x4801CCE5 bl 0x00055D5C
0x0003907C: 0x4BFCBF89 bl 0x00005004 <-- more DMA transfers here (at 5220, 88000000 is written)
0x00039080: 0x80010004 lwz r0,0x04(r1)
0x00039084: 0x7C0803A6 mtspr lr,r0
0x00039088: 0x38210010 addi r1,r1,0x10
0x0003908C: 0x4E800020 bclr 0x14,0
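For reference, the control flow of the function at 38FA4 in the listing above can be paraphrased in Python. This is a hand decompilation sketch, not emulator code: `mem` is a dict standing in for RAM, `calls` records which subroutines would run, and the function name `fn_38FA4` is just a label. The absolute addresses are computed from the `li`/offset pairs (e.g. -0x2B4 from 0x00590000 is 0x0058FD4C).

```python
# Hand paraphrase of 0x00038FA4 (hypothetical naming; addresses from the listing).

def fn_38FA4(mem, tbl, calls):
    mem[0x00274314] = tbl                        # mftb/stw: frame-start timestamp
    count = mem.get(0x00580E28, 0)
    mem[0x00580E28] = (count + 1) & 0xFFFFFFFF   # frame counter++
    mem[0xFE10001C] = (-count) & 0xFF            # neg + stb: board LED
    calls += ["3B440", "20C10"]
    if mem.get(0x00110B20, 0) != 1:              # cmpi ... 0x01 / bt cr1[eq]
        for addr in (0x00580F03, 0x00580F02, 0x00580F01, 0x00580F00):
            mem[addr] = 0
        mem[0x00591D14] = 0
    calls += ["20A5C", "21300", "71D3C"]
    flags = mem.get(0x0058FD4C, 0)               # lwz -0x2B4(0x00590000)
    level = mem.get(0x0058FD99, 0)               # lbz -0x267(0x00590000)
    if (flags & 0x0C00) != 0 and level <= 0x1F:  # andi. / cmpli 0x1F
        calls += ["A614", "ABC4", "A690"]        # save regs, ABC4, restore regs
        return
    # Path taken during attract mode: ends with the Real3D work.
    calls += ["4EDC", "A614", "ABC4", "A690",
              "39128", "224EC", "392C0", "55D5C", "5004"]
```

Written this way it is easy to see that 39128 (DMA and status read) and 5004 (ping-pong update and 88000000 write) are only reached on the attract-mode path.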
You can see some notes to myself in the disassembly. The interesting part is at the bottom. Normally, during actual gameplay and the parts of attract mode where textures are *not* being uploaded, the sequence of events goes something like this:
1. Function A068 is called. It's several levels down the call stack from function 39128, which is the top-level function for Real3D transfers to polygon RAM and, evidently, texture RAM (I'm not sure about culling RAM at 8C). It performs synchronization of some sort. This function is short but important. It also writes to 0xf118000c, and it does something especially interesting: it waits for only a portion of the DEC countdown. That is, it waits a very specific number of cycles.
2. DMA copies to 98xxxxxx are performed, still inside of function 39128.
3. Eventually, function 5004 is executed. Here, ping-pong memory is updated (8E) and a flush or swap is triggered by writing 88000000.
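The sequence above, including the partial DEC wait, can be sketched as follows. This assumes A068 simply spins until DEC has dropped to some fixed threshold; the real condition it tests hasn't been worked out, so `threshold`, the fake DEC reader, and all function names here are hypothetical.

```python
# Toy sketch of one Real3D update pass (hypothetical names and threshold).

def make_dec(start):
    """Return a fake DEC reader that counts down by one on every poll."""
    state = {"dec": start}

    def read_dec():
        state["dec"] -= 1
        return state["dec"]

    return read_dec


def wait_partial_dec(read_dec, threshold):
    """Spin until DEC reaches `threshold` instead of waiting for it to expire.
    Returns the number of extra polls spent waiting."""
    polls = 0
    while read_dec() > threshold:
        polls += 1
    return polls


def real3d_pass(read_dec, threshold, log):
    # 1. A068 (several levels below 39128): wait for part of the DEC countdown.
    wait_partial_dec(read_dec, threshold)
    log.append("A068 sync")
    # 2. DMA copies to 98xxxxxx, still inside 39128.
    log.append("DMA to 98xxxxxx")
    # 3. Function 5004: update ping-pong RAM (8E), then trigger the flush/swap.
    log.append("update 8Exxxxxx")
    log.append("write 88000000")
```

The key difference from A0E8 is that the wait ends at an intermediate DEC value rather than at DEC expiry, so where the pass lands in the frame depends on the chosen threshold.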
This sequence gets more interesting during texture uploads. In attract mode, textures are only uploaded using the VROM mechanism, which involves writing a special Real3D register with the address of a texture in VROM to load. A swap command (88000000) follows each one. But here's the interesting part: the function at A068 is called before *each* VROM texture register write. It doesn't necessarily sync to the next frame, but I think it may be checking whether there is sufficient time remaining to perform texture transfers, thereby limiting how many are done per frame. I'm going to test this idea tomorrow by patching the code to change the cycle count and seeing whether the game performs more uploads per frame.
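The time-budget hypothesis can be sketched like this. It assumes each VROM texture load costs a fixed number of DEC ticks and that the check before each load compares the remaining DEC budget against a safety margin. The costs, margins, and function names are invented purely to show the shape of the idea; nothing here is taken from the game code.

```python
# Sketch of the hypothesis: only start another texture load if enough of
# the frame's DEC budget remains. All numbers are made up.

def upload_textures(pending, dec_remaining, cost_per_upload, min_margin):
    uploaded = []
    while pending and dec_remaining - cost_per_upload >= min_margin:
        tex = pending.pop(0)
        # In the game (per the text): A068 check, write the VROM address to
        # the texture register, then a 88000000 swap command.
        uploaded.append(tex)
        dec_remaining -= cost_per_upload
    return uploaded, dec_remaining


def frames_to_upload(n_textures, frame_budget, cost_per_upload, min_margin):
    if frame_budget - cost_per_upload < min_margin:
        raise ValueError("no upload can ever fit in a frame")
    pending = list(range(n_textures))
    frames = 0
    while pending:
        upload_textures(pending, frame_budget, cost_per_upload, min_margin)
        frames += 1
    return frames
```

In this model, shrinking `min_margin` (the analogue of patching the cycle count) lets more loads fit per frame, e.g. 12 textures drop from 3 frames to 2. If the patch experiment shows the same effect, that would support the hypothesis.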
More tomorrow, hopefully.