PPC603e division by zero


Postby Spindizzi » Mon Feb 13, 2017 1:23 pm

Hello,
I found something that may be interesting (or not!)
In the PPC603e floating-point division instructions fdivx and fdivsx, there is an enormous number of divisions by zero (second operand register FPR(b) = 0) in many games.

I don't know why there are so many calls to fdivx and fdivsx with one or both operands (a, b) set to 0.
Bad disassembly?

So I added a small test for that case.

In magtruck, the attract demo cycle is currently:
demo (ok) -> ranking screen (currently a black screen) -> ocean demo (currently has a polygon bug) -> ranking screen ...
With the division-by-zero detection it becomes:
demo (ok) -> ranking screen (depends on the value forced into the result register, see the fdivsx code) -> ocean demo (ok) -> ranking screen ...

Code: Select all
static void ppc_fdivx(UINT32 op)
{
   UINT32 b = RB;
   UINT32 a = RA;
   UINT32 t = RT;

   CHECK_FPU_AVAILABLE();

   SET_VXSNAN(FPR(a), FPR(b));

   if (FPR(b).fd == 0.0)
   {
      // Experimental hack: the divisor is zero (this also matches -0.0),
      // so force the result to zero instead of dividing
      FPR(t).fd = 0.0;
   }
   else
   {
      FPR(t).fd = FPR(a).fd / FPR(b).fd;
   }

   set_fprf(FPR(t));
   if (RCBIT)
   {
      SET_CR1();
   }
}


Code: Select all
static void ppc_fdivsx(UINT32 op)
{
   UINT32 b = RB;
   UINT32 a = RA;
   UINT32 t = RT;

   CHECK_FPU_AVAILABLE();

   SET_VXSNAN(FPR(a), FPR(b));

   if (FPR(b).fd == 0.0)
   {
      // Experimental hack for a zero divisor. Observations on the magtruck ranking screen:
      // - forcing FPR(t) = 0 -> black screen
      // - leaving FPR(t) unchanged -> the screen blinks
      // - forcing FPR(t) = 0.08, 1 or another non-zero value -> the scene is visible but deformed/zoomed
      FPR(t).fd = 0.08;
   }
   else
   {
      // Round the double-precision quotient to single precision
      FPR(t).fd = (float)(FPR(a).fd / FPR(b).fd);
   }

   set_fprf(FPR(t));
   if (RCBIT)
   {
      SET_CR1();
   }
}
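
For comparison with the forced results above: with floating-point exceptions disabled, an IEEE 754 divide does not trap. A finite non-zero value divided by zero gives a signed infinity and raises the FPSCR zero-divide flag (ZX), while 0/0 gives a quiet NaN and raises the invalid-operation flag VXZDZ. The sketch below only illustrates those cases. `fdiv_result`, `ieee_fdiv` and the flag constants are hypothetical names, not Supermodel code, and the flag values are not the real FPSCR bit positions.

```c
/* Hypothetical flag values for illustration; not real FPSCR bit positions. */
enum { FLAG_NONE = 0, FLAG_ZX = 1, FLAG_VXZDZ = 2 };

typedef struct {
    double value;  /* result produced by the host FPU */
    int    flags;  /* exception flag the divide would raise */
} fdiv_result;

/* Assumes the host follows IEEE 754 for division (C Annex F behaviour). */
static fdiv_result ieee_fdiv(double a, double b)
{
    fdiv_result r;
    r.flags = FLAG_NONE;
    if (b == 0.0) {
        if (a == 0.0)
            r.flags = FLAG_VXZDZ;            /* 0/0: invalid operation */
        else if (a == a && a - a == 0.0)     /* a is finite and non-zero */
            r.flags = FLAG_ZX;               /* x/0: zero divide */
    }
    r.value = a / b;  /* +/-infinity for x/0, NaN for 0/0 */
    return r;
}
```

Under these assumptions, `ieee_fdiv(1.0, 0.0)` yields infinity with `FLAG_ZX`, and `ieee_fdiv(0.0, 0.0)` yields NaN with `FLAG_VXZDZ`, which matches the NaN values reported later in the thread.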


Maybe someone can investigate this further and better than I can...

++
Spindizzi
 
Posts: 50
Joined: Thu Nov 17, 2016 8:55 am
Location: France

Re: PPC603e division by zero

Postby Ian » Tue Feb 14, 2017 3:38 am

Hey, Bart and I had both been looking at the Magical Truck issue and had tried similar things, although I hadn't tried writing non-zero values in the fdivsx function. That's interesting :)

I was debugging some of the bad polys in Magical Truck and found that good polys were mixed with NaN values. I originally thought it could be a memory transfer bug, but the code uses DMA, which is extremely trivial. I then traced the bad values back to the CPU itself and then to those two div functions. Magical Truck also takes the square root of zero.

I wonder where those zero values come from. Is it actually the game code, or the result of some rounding function incorrectly pushing the values towards zero?

Also, what's the Ocean Hunter bug that is fixed by this? A screenshot would be interesting :)
Ian
 
Posts: 836
Joined: Tue Feb 23, 2016 9:23 am

Re: PPC603e division by zero

Postby Spindizzi » Tue Feb 14, 2017 6:53 am

Hi,
Forcing small non-zero values into the result of fdivsx shows the 3D scene, but distorted (focal length), and the camera viewpoint seems to be locked looking down.
I think you're right, and I'm pretty sure some code earlier on sets the values of registers a and b to 0, or there is some loss of precision. I don't know if you have noticed, but just after the Sega logo the 3D scene appears briefly and then blacks out.

I don't know if it is related, but in the fdivsx documentation I see:
"The result is rounded to single-precision under control of the floating-point rounding control field RN of the FPSCR and placed into frD."
During the ranking scene, RN (a 2-bit field, bits 30 and 31) alternates between 0b10 and 0b01.
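
A note on decoding RN: PowerPC numbers bits from the most significant end, so FPSCR bits 30 and 31 are the two least significant bits of the 32-bit register. The architecture defines the field as 00 = round to nearest, 01 = round toward zero, 10 = round toward +infinity, 11 = round toward -infinity. Reading the two values reported above as the 2-bit patterns 10 and 01, the game would be switching between rounding toward +infinity and rounding toward zero. The helper names in this sketch are hypothetical.

```c
#include <stdint.h>

/* PowerPC bit 0 is the MSB, so FPSCR[30:31] is the low two bits. */
static unsigned fpscr_rn(uint32_t fpscr)
{
    return fpscr & 0x3u;
}

/* Human-readable name for an RN value. */
static const char *rn_name(unsigned rn)
{
    switch (rn & 0x3u) {
    case 0:  return "round to nearest";
    case 1:  return "round toward zero";
    case 2:  return "round toward +infinity";
    default: return "round toward -infinity";
    }
}
```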

When I mentioned the ocean demo, I meant the Magical Truck attract scene with the shark, not Ocean Hunter -> no more bad polygons there.
Image

A small optimization: why test every time whether the PPC603e has an FPU (it's the same CPU on all board revisions, no?). Every call to CHECK_FPU_AVAILABLE() could be removed.
++
Spindizzi
 
Posts: 50
Joined: Thu Nov 17, 2016 8:55 am
Location: France

Re: PPC603e division by zero

Postby Ian » Tue Feb 14, 2017 11:03 am

It would probably be worth checking all the instructions before the div 0, to see where the 0 value actually came from.
It could well be some rounding/precision issue.
Ian
 
Posts: 836
Joined: Tue Feb 23, 2016 9:23 am

Re: PPC603e division by zero

Postby Ian » Tue Feb 14, 2017 5:02 pm

Ian
 
Posts: 836
Joined: Tue Feb 23, 2016 9:23 am

Re: PPC603e division by zero

Postby Bart » Tue Feb 14, 2017 10:15 pm

I'm skeptical the problem is really with the fdiv instructions. I think tracing the source of the data that produces bad values might yield more interesting results. Unfortunately, this is an extremely painful process. Nik's debugger is excellent but I don't think it supports breakpoints with register conditions (which is necessary here). Might not be too hard to add though...
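
As a rough idea of what a register-conditional breakpoint for this case could look like when hacked into an execution loop, here is a minimal sketch. Everything in it is hypothetical: `cpu_state`, `is_fdiv` and `hit_div_zero_breakpoint` are made-up names for illustration and are not part of Supermodel or Nik's debugger. The opcode decoding follows the PowerPC A-form encoding (primary opcode 63 for fdiv, 59 for fdivs, extended opcode 18, frB in bits 16-20).

```c
#include <stdint.h>

/* Hypothetical, stripped-down CPU state: only what the predicate needs. */
typedef struct {
    double fpr[32];   /* floating-point registers */
} cpu_state;

/* True for fdiv (primary 63) and fdivs (primary 59), extended opcode 18. */
static int is_fdiv(uint32_t op)
{
    uint32_t primary  = op >> 26;
    uint32_t extended = (op >> 1) & 0x1Fu;
    return (primary == 63u || primary == 59u) && extended == 18u;
}

/* Break only when a divide is about to run with a zero divisor. */
static int hit_div_zero_breakpoint(const cpu_state *cpu, uint32_t op)
{
    unsigned rb = (op >> 11) & 0x1Fu;   /* frB field: the divisor */
    return is_fdiv(op) && cpu->fpr[rb] == 0.0;
}
```

Calling such a predicate before each instruction is dispatched would let the loop stop (or log registers) only on the divides of interest instead of on every fdiv.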

In the meantime, I spent an hour looking at one of the first divide-by-zero errors in magtruck by manually inserting breakpoints in the PPC execution loop and printing out register values. I discovered something pretty interesting. Here is the code. I've put annotations where appropriate.

Code: Select all
0x0001A650: 0x3D600025   li   r11,0x00250000
0x0001A654: 0x392BF424   addi   r9,r11,-0xBDC   ; r9 = 0x24f424
0x0001A658: 0x3D600025   li   r11,0x00250000
0x0001A65C: 0x394BF424   addi   r10,r11,-0xBDC  ; r10 = 0x24f424
0x0001A660: 0xC009000C   lfs   f0,0x0C(r9)       ; f0 = 585.004 (40824807a0000000)
0x0001A664: 0xC1AA0000   lfs   f13,0x00(r10)     ; f13 = 585.004 (40824807a0000000)
0x0001A668: 0xEC006828   fsubs   f0,f0,f13       ; f0 = 0 (0)
0x0001A66C: 0x3D600025   li   r11,0x00250000
0x0001A670: 0x392BF424   addi   r9,r11,-0xBDC   ; r9 = 0x24f424
0x0001A674: 0x3D600025   li   r11,0x00250000
0x0001A678: 0x394BF424   addi   r10,r11,-0xBDC  ; r10 = 0x24f424
0x0001A67C: 0xC1A90010   lfs   f13,0x10(r9)      ; f13 = -83.8817 (c054f86e60000000)
0x0001A680: 0xC18A0004   lfs   f12,0x04(r10)     ; f12 = 15.5906 (402f2e6500000000)
0x0001A684: 0xEDAD6028   fsubs   f13,f13,f12     ; f13 = -99.4724 (c058de3b00000000)
0x0001A688: 0x3D600025   li   r11,0x00250000
0x0001A68C: 0x392BF424   addi   r9,r11,-0xBDC   ; r9 = 0x24f424
0x0001A690: 0x3D600025   li   r11,0x00250000
0x0001A694: 0x394BF424   addi   r10,r11,-0xBDC  ; r10 = 0x24f424
0x0001A698: 0xC1890014   lfs   f12,0x14(r9)      ; f12 = -903.486 (c08c3be3e0000000)
0x0001A69C: 0xC16A0008   lfs   f11,0x08(r10)     ; f11 = -903.486 (c08c3be3e0000000)
0x0001A6A0: 0xED8C5828   fsubs   f12,f12,f11     ; f12 = 0 (0)
0x0001A6A4: 0xFC200090   fmr   f1,f0             ; f1 = 0
0x0001A6A8: 0xFC406890   fmr   f2,f13            ; f2 = -99.4724
0x0001A6AC: 0xFC606090   fmr   f3,f12            ; f3 = 0
0x0001A6B0: 0x4812C015   bl   0x001466C4
...


Code: Select all
0x001466C4: 0x3821FFD0   addi   r1,r1,-0x30
0x001466C8: 0x38000000   li   r0,0x00000000
0x001466CC: 0xEC010072   fmuls   f0,f1,f1          ; f0 = 0
0x001466D0: 0xED0300FA   fmadds   f8,f3,f3,f0     ; f8 = f3 * f3 + f0 = 0 * 0 + 0 = 0
0x001466D4: 0x9001000C   stw   r0,0x0C(r1)
0x001466D8: 0xFD200890   fmr   f9,f1               ; f9 = 0
0x001466DC: 0xFD401090   fmr   f10,f2              ; f10 = -99.4724
0x001466E0: 0xFD601890   fmr   f11,f3              ; f11 = 0
0x001466E4: 0x7C8802A6   mfspr   r4,lr
0x001466E8: 0xFC204090   fmr   f1,f8               ; f1 = 0
0x001466EC: 0x4800CA45   bl   0x00153130
0x001466F0: 0xFCE00890   fmr   f7,f1               ; f7 = 0
0x001466F4: 0xEC2A42BA   fmadds   f1,f10,f10,f8   ; f1 = f10 * f10 + f8 = 9894.75
0x001466F8: 0x4800CA39   bl   0x00153130          ; this routine does not change f7; return value is in f1 (and it ends up being 0)
0x001466FC: 0xEC470824   fdivs   f2,f7,f1          ; f2 = 0 / 0 = NaN
...


Code: Select all
0x00153130: 0xFC010828   fsub   f0,f1,f1
0x00153134: 0xFC810000   fcmpu   cr1,f1,f0
0x00153138: 0x40850040   bf   cr1[gt],0x00153178
0x0015313C: 0x3C600015   li   r3,0x00150000
0x00153140: 0x38633180   addi   r3,r3,0x3180
0x00153144: 0xFC000834   frsqrte   f0,f1
0x00153148: 0xC0630004   lfs   f3,0x04(r3)
0x0015314C: 0xFC400032   fmul   f2,f0,f0
0x00153150: 0xC0A30000   lfs   f5,0x00(r3)
0x00153154: 0xFC8118B8   fmsub   f4,f1,f2,f3
0x00153158: 0xFCC50032   fmul   f6,f5,f0
0x0015315C: 0xFC8401B2   fmul   f4,f4,f6
0x00153160: 0xFC440132   fmul   f2,f4,f4
0x00153164: 0xFC0118B8   fmsub   f0,f1,f2,f3
0x00153168: 0xFCC50132   fmul   f6,f5,f4
0x0015316C: 0xFC0001B2   fmul   f0,f0,f6
0x00153170: 0xFC200072   fmul   f1,f0,f1
0x00153174: 0x4E800020   bclr   0x14,0
0x00153178: 0xEC210828   fsubs   f1,f1,f1
0x0015317C: 0x4E800020   bclr   0x14,0
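
Read as C, the routine at 0x00153130 looks like a square root built from a reciprocal square root estimate (frsqrte) refined by two Newton-Raphson steps, with an early exit for inputs that are not greater than zero. The sketch below follows the instruction order of the listing. The two constants loaded from 0x00153180 are not shown in the trace; -0.5 and 3.0 are assumptions, chosen because they turn the sequence into a standard Newton step. The `estimate` parameter is a hypothetical stand-in for the frsqrte result.

```c
/* Assumed constants: the trace does not show the values at 0x00153180. */
static const double C5 = -0.5;  /* lfs f5,0x00(r3) (assumption) */
static const double C3 = 3.0;   /* lfs f3,0x04(r3) (assumption) */

/* estimate: hypothetical stand-in for frsqrte, roughly 1/sqrt(x). */
static double sqrt_via_rsqrt(double x, double estimate)
{
    double y0, y1, y2;

    if (!(x > x - x))   /* fsub/fcmpu/bf: taken when x is not greater than 0 */
        return x - x;   /* fsubs f1,f1,f1: 0 for finite x, NaN otherwise */

    y0 = estimate;                          /* frsqrte f0,f1 */
    y1 = (x * (y0 * y0) - C3) * (C5 * y0);  /* first Newton-Raphson step */
    y2 = (x * (y1 * y1) - C3) * (C5 * y1);  /* second Newton-Raphson step */
    return y2 * x;                          /* x * (1/sqrt(x)) = sqrt(x) */
}
```

Under these assumptions, x = 9894.75 with a starting estimate of 0.01 returns roughly 99.47, while a starting estimate of 0 returns 0.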


There are two pairs of values in memory that are identical and whose difference ends up being 0:

0x24f424 + 0, 0x24f424 + 0xc
0x24f424 + 0x14, 0x24f424 + 0x8

My guess is that this is where the problem starts. Could these be values from the current and last frame? If our frame timing is off, they might have ended up the same.
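
To make the pairing concrete: if the six floats at 0x24f424 are read as two 3-component vectors (an assumption; the trace only shows the individual loads), the three fsubs instructions compute a component-wise difference. The `vec3` type and the variable names below are hypothetical; the values are the ones printed in the trace.

```c
typedef struct { float x, y, z; } vec3;

/* Component-wise difference, matching the three fsubs in the listing. */
static vec3 vec3_sub(vec3 a, vec3 b)
{
    vec3 d = { a.x - b.x, a.y - b.y, a.z - b.z };
    return d;
}

/* Values from the trace: offsets 0x00..0x08 and 0x0C..0x14 from 0x24f424. */
static const vec3 first  = { 585.004f,  15.5906f, -903.486f };
static const vec3 second = { 585.004f, -83.8817f, -903.486f };
```

`vec3_sub(second, first)` gives (0, about -99.47, 0). Only the middle component differs, so the sum of squares of the other two is exactly 0, which is the value passed to the first call at 0x00153130.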
Bart
Site Admin
 
Posts: 1827
Joined: Thu Sep 01, 2011 2:13 pm
Location: New York City

Re: PPC603e division by zero

Postby Ian » Wed Feb 15, 2017 3:33 am

I checked all the rounding modes and the code all looks fine.
The only thing I would have changed is in

INLINE INT64 smround_to_nearest(FPR f)

from
return -(INT64)(-f.fd + 0.5);
to
return (INT64)(f.fd - 0.5);

But the result is the same anyway.
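
The two forms agree because a C cast from floating point to an integer truncates toward zero, so negating before and after the cast has no effect. A small sketch, assuming the expression is only applied to negative inputs (the surrounding function body is not shown here) and using hypothetical function names:

```c
#include <stdint.h>

/* Original form: negate, round the positive magnitude, negate back. */
static int64_t round_neg_original(double x)
{
    return -(int64_t)(-x + 0.5);
}

/* Proposed form: subtract 0.5 and let the cast truncate toward zero. */
static int64_t round_neg_proposed(double x)
{
    return (int64_t)(x - 0.5);
}
```

Both return -3 for -2.7 and -2 for -1.5. Note that both forms round ties away from zero, whereas IEEE 754 round-to-nearest rounds ties to even: -2.5 gives -3 here but would give -2 under ties-to-even.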

If it is a timing issue, maybe we could test that by messing with the timing?
Ian
 
Posts: 836
Joined: Tue Feb 23, 2016 9:23 am

Re: PPC603e division by zero

Postby Bart » Wed Feb 15, 2017 8:39 am

Possibly. But when I say timing issue, I also mean IRQ timing. I'm starting to lose track a bit but I recall you had gotten the FMV intro in Scud Race Plus working? Also, note that Fighting Vipers 2 is one to pay careful attention to. Nik got it working by properly emulating PowerPC timing but noticed that it seemed to be behaving like a Step 1.x game timing-wise despite being a 2.x game. If FV2 timing is broken, game play ends up running at half speed or less.

Do you know how to use Nik's debugger (when Supermodel is compiled with SUPERMODEL_DEBUGGER defined)? You can set up memory watches. We can investigate how those areas of RAM I identified are modified during the relevant part of the attract mode.
Bart
Site Admin
 
Posts: 1827
Joined: Thu Sep 01, 2011 2:13 pm
Location: New York City

Re: PPC603e division by zero

Postby Ian » Wed Feb 15, 2017 9:47 am

Well, I did some experiments with when to render the database, and concluded there is basically no end-of-frame marker.
The 0x88 command just marks the end of the data section, i.e. graphics data etc. (you can see this from the SDK as well).
MAME renders after this command, which is why Sega Bass Fishing runs at about 1/6th of full speed: 0x88 is written about 6 times per frame.

I also did some experiments with the Real3D status bit. You can 'fix' the video sequence in Scud, and also fix Daytona on the edge, by messing with the timing, but then other games won't even boot. Games sync with this bit upon loading: they poll the bit until its state changes (the actual value doesn't matter). But I've no idea what it is syncing with. I assumed it was v-sync, but that seems not to be the case. Could it be v-sync / 2 for interlaced displays? That might even make sense. Interestingly, the 2D overlays are also affected by the status bit. When messing with the value, I got to a state where only half the text intro would load in Virtua Fighter, even though the rest of the game worked flawlessly.
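
The sync behaviour described above (poll until the bit changes, regardless of its value) amounts to waiting for an edge rather than a level. A minimal sketch, with a hypothetical `read_status` callback standing in for the real status register read, an arbitrary bit position, and a fake status source so the example can run on its own:

```c
#include <stdint.h>

typedef uint32_t (*read_status_fn)(void *ctx);

/* Spin until the masked bit differs from its value on entry.
   Returns the number of polls, or -1 if max_polls is reached first. */
static int wait_for_toggle(read_status_fn read_status, void *ctx,
                           uint32_t mask, int max_polls)
{
    uint32_t initial = read_status(ctx) & mask;
    int polls;
    for (polls = 1; polls <= max_polls; polls++) {
        if ((read_status(ctx) & mask) != initial)
            return polls;
    }
    return -1;
}

/* Fake status source for demonstration: flips the bit every `period` reads. */
typedef struct { int reads; int period; } fake_status;

static uint32_t fake_read(void *ctx)
{
    fake_status *s = (fake_status *)ctx;
    uint32_t bit = (uint32_t)((s->reads / s->period) & 1);
    s->reads++;
    return bit << 1;   /* arbitrary bit position for the example */
}
```

Because the loop compares against the value seen on entry, it does not matter whether the bit starts high or low. How often the emulator flips the bit then directly sets how long the game waits, which is consistent with timing changes fixing some games and breaking others.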

I'll have a tinker with it tonight and see if it makes any difference to mag truck
Ian
 
Posts: 836
Joined: Tue Feb 23, 2016 9:23 am

Re: PPC603e division by zero

Postby Bart » Wed Feb 15, 2017 1:58 pm

I recall the Real3D documentation implying that the device could be locked to certain frame rates but it isn't clear how this is achieved. You would think that it would take as much time as it needed to render, with the onus being on the programmer to ensure the scene database isn't too complex to render in 1/60th of a second.

The tilegen is definitely synced to vsync. That is to say, there is definitely an IRQ that is fired at every vblank. But the game can time itself in a number of different ways. It's a tricky problem...
Bart
Site Admin
 
Posts: 1827
Joined: Thu Sep 01, 2011 2:13 pm
Location: New York City
