Supermodel Forum

by **Spindizzi** » Mon Feb 13, 2017 1:23 pm

Hello,
I found something that may be interesting (or not!)
In the PPC603e floating-point division instructions fdivx and fdivsx, there is an enormous number of divisions by zero (second operand register FPR(b) = 0) in many games.

I don't know why there are so many calls to fdivx and fdivsx with one or both operands (a, b) set to 0.
Bad disassembly?

So I introduced a little test for that.

In magtruck, the attract demo cycle is currently:
demo (ok) -> ranking screen (currently a black screen for the scene) -> ocean demo (currently has a polygon bug) -> ranking screen ....
and with the detection of division by zero:
demo (ok) -> ranking screen (depends on the result written to the register, see the fdivsx code) -> ocean demo (ok) -> ranking screen ....

Code:

    static void ppc_fdivx(UINT32 op)
    {
        UINT32 b = RB;
        UINT32 a = RA;
        UINT32 t = RT;

        CHECK_FPU_AVAILABLE();

        SET_VXSNAN(FPR(a), FPR(b));

        if (FPR(b).fd == 0.0)
        {
            FPR(t).fd = 0.0; // force result to zero
        }
        else
        {
            FPR(t).fd = FPR(a).fd / FPR(b).fd;
        }

        set_fprf(FPR(t));

        if (RCBIT) {
            SET_CR1();
        }
    }

Code:

    static void ppc_fdivsx(UINT32 op)
    {
        UINT32 b = RB;
        UINT32 a = RA;
        UINT32 t = RT;

        CHECK_FPU_AVAILABLE();

        SET_VXSNAN(FPR(a), FPR(b));

        if (FPR(b).fd == 0.0)
        {
            // if I force result fpr(t)=0 -> black screen on the ranking screen in magtruck
            // if I don't put any result in register fpr(t) -> screen is blinking on the ranking screen magtruck
            // if I force result fpr(t)=0.08 or 1 (or other value) -> we see the scene but deformed/zoomed (magtruck)
            FPR(t).fd = 0.08;
        }
        else
        {
            FPR(t).fd = (float)(FPR(a).fd / FPR(b).fd);
        }

        set_fprf(FPR(t));

        if (RCBIT) {
            SET_CR1();
        }
    }
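For comparison with the forced values above: under IEEE 754, when floating-point exceptions are not trapped, a non-zero value divided by zero gives a signed infinity and 0 / 0 gives a NaN. A minimal sketch on host doubles follows; `classify_div` and the `DIV_*` names are made up for illustration and are not part of Supermodel.

```c
#include <math.h>

/* Hypothetical result classes, for illustration only. */
enum div_class { DIV_FINITE, DIV_POS_INF, DIV_NEG_INF, DIV_NAN };

/* Classify the IEEE 754 result of a / b without trapping. */
static enum div_class classify_div(double a, double b)
{
    volatile double q = a / b;   /* volatile: keep the division at run time */

    if (isnan(q))
        return DIV_NAN;          /* e.g. 0 / 0 */
    if (isinf(q))
        return q > 0 ? DIV_POS_INF : DIV_NEG_INF;   /* e.g. x / 0, x != 0 */
    return DIV_FINITE;
}
```

With this sketch, `classify_div(0.0, 0.0)` returns `DIV_NAN` and `classify_div(1.0, 0.0)` returns `DIV_POS_INF`, so neither case naturally produces 0 or 0.08.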

Maybe someone can investigate this further, and better than I can...

++

by **Ian** » Tue Feb 14, 2017 3:38 am

Hey, both Bart and I had actually been looking at the Magical Truck issue and had tried similar things, although I hadn't tried writing non-zero values in the fdivsx function. That's interesting.

I was debugging some of the bad polys in Magical Truck and found that good polys were mixed with NaN values. I originally thought it could be some memory transfer bug, but the code uses DMA, which is extremely trivial. I then traced the bad values back to the CPU itself and then to those two div functions. Magical Truck also does a square root of zero.

I wonder where those zero values come from. Is it actually the game code or as a result of some rounding function which is incorrectly pushing the values towards zero?

Also, what's the Ocean Hunter bug that is fixed by this? A screenshot would be interesting.

by **Spindizzi** » Tue Feb 14, 2017 6:53 am

Hi,
Pushing small non-zero values into the result of fdivsx shows the 3D scene, but distorted (focal length), and the camera viewpoint seems to be locked looking down.
I think you're right, and I'm pretty sure some code earlier on sets the values of registers a and b to 0, or there is some loss of precision. I don't know if you have noticed, but just after the Sega logo the 3D scene appears briefly and then blacks out.

I don't know if it is related, but in the fdivsx documentation I see:
"The result is rounded to single-precision under control of the floating-point rounding control field RN of the FPSCR and placed into frD."
During the ranking scene, RN (2 bits, bits 30 and 31) alternates between 0b10 and 0b01.

When I mentioned the ocean demo, I meant the Magical Truck attract scene with the shark, not Ocean Hunter: there are no more bad polygons there.

A little optimization: why always test whether the PPC603e has an FPU (it's the same CPU on all board revisions, no?). Every call to CHECK_FPU_AVAILABLE() could be removed.
++

by **Ian** » Tue Feb 14, 2017 11:03 am

It would probably be worth checking all the instructions before the div 0, to see where the 0 value actually came from.
It could well be some rounding/precision issue.

by **Ian** » Tue Feb 14, 2017 5:02 pm

dunno if this is any help
https://github.com/dolphin-emu/dolphin/ ... t.cpp#L352
https://github.com/dolphin-emu/dolphin/ ... ils.h#L105

by **Bart** » Tue Feb 14, 2017 10:15 pm

I'm skeptical the problem is really with the fdiv instructions. I think tracing the source of the data that produces bad values might yield more interesting results. Unfortunately, this is an extremely painful process. Nik's debugger is excellent but I don't think it supports breakpoints with register conditions (which is necessary here). Might not be too hard to add though...

In the meantime, I spent an hour looking at one of the first divide-by-zero errors in magtruck by manually inserting breakpoints in the PPC execution loop and printing out register values. I discovered something pretty interesting. Here is the code. I've put annotations where appropriate.

Code:

    0x0001A650: 0x3D600025  li      r11,0x00250000
    0x0001A654: 0x392BF424  addi    r9,r11,-0xBDC     ; r9 = 0x24f424
    0x0001A658: 0x3D600025  li      r11,0x00250000
    0x0001A65C: 0x394BF424  addi    r10,r11,-0xBDC    ; r10 = 0x24f424
    0x0001A660: 0xC009000C  lfs     f0,0x0C(r9)       ; f0 = 585.004 (40824807a0000000)
    0x0001A664: 0xC1AA0000  lfs     f13,0x00(r10)     ; f13 = 585.004 (40824807a0000000)
    0x0001A668: 0xEC006828  fsubs   f0,f0,f13         ; f0 = 0 (0)
    0x0001A66C: 0x3D600025  li      r11,0x00250000
    0x0001A670: 0x392BF424  addi    r9,r11,-0xBDC     ; r9 = 0x24f424
    0x0001A674: 0x3D600025  li      r11,0x00250000
    0x0001A678: 0x394BF424  addi    r10,r11,-0xBDC    ; r10 = 0x24f424
    0x0001A67C: 0xC1A90010  lfs     f13,0x10(r9)      ; f13 = -83.8817 (c054f86e60000000)
    0x0001A680: 0xC18A0004  lfs     f12,0x04(r10)     ; f12 = 15.5906 (402f2e6500000000)
    0x0001A684: 0xEDAD6028  fsubs   f13,f13,f12       ; f13 = -99.4724 (c058de3b00000000)
    0x0001A688: 0x3D600025  li      r11,0x00250000
    0x0001A68C: 0x392BF424  addi    r9,r11,-0xBDC     ; r9 = 0x24f424
    0x0001A690: 0x3D600025  li      r11,0x00250000
    0x0001A694: 0x394BF424  addi    r10,r11,-0xBDC    ; r10 = 0x24f424
    0x0001A698: 0xC1890014  lfs     f12,0x14(r9)      ; f12 = -903.486 (c08c3be3e0000000)
    0x0001A69C: 0xC16A0008  lfs     f11,0x08(r10)     ; f11 = -903.486 (c08c3be3e0000000)
    0x0001A6A0: 0xED8C5828  fsubs   f12,f12,f11       ; f12 = 0 (0)
    0x0001A6A4: 0xFC200090  fmr     f1,f0             ; f1 = 0
    0x0001A6A8: 0xFC406890  fmr     f2,f13            ; f2 = -99.4724
    0x0001A6AC: 0xFC606090  fmr     f3,f12            ; f3 = 0
    0x0001A6B0: 0x4812C015  bl      0x001466C4
    ...

Code:

    0x001466C4: 0x3821FFD0  addi    r1,r1,-0x30
    0x001466C8: 0x38000000  li      r0,0x00000000
    0x001466CC: 0xEC010072  fmuls   f0,f1,f1          ; f0 = 0
    0x001466D0: 0xED0300FA  fmadds  f8,f3,f3,f0       ; f8 = f3 * f3 + f0 = 0 * 0 + 0 = 0
    0x001466D4: 0x9001000C  stw     r0,0x0C(r1)
    0x001466D8: 0xFD200890  fmr     f9,f1             ; f9 = 0
    0x001466DC: 0xFD401090  fmr     f10,f2            ; f10 = -99.4724
    0x001466E0: 0xFD601890  fmr     f11,f3            ; f11 = 0
    0x001466E4: 0x7C8802A6  mfspr   r4,lr
    0x001466E8: 0xFC204090  fmr     f1,f8             ; f1 = 0
    0x001466EC: 0x4800CA45  bl      0x00153130
    0x001466F0: 0xFCE00890  fmr     f7,f1             ; f7 = 0
    0x001466F4: 0xEC2A42BA  fmadds  f1,f10,f10,f8     ; f1 = f10 * f10 + f8 = 9894.75
    0x001466F8: 0x4800CA39  bl      0x00153130        ; this routine does not change f7; return value is in f1 (and it ends up being 0)
    0x001466FC: 0xEC470824  fdivs   f2,f7,f1          ; f2 = 0 / 0 = NaN
    ...

Code:

    0x00153130: 0xFC010828  fsub    f0,f1,f1
    0x00153134: 0xFC810000  fcmpu   cr1,f1,f0
    0x00153138: 0x40850040  bf      cr1[gt],0x00153178
    0x0015313C: 0x3C600015  li      r3,0x00150000
    0x00153140: 0x38633180  addi    r3,r3,0x3180
    0x00153144: 0xFC000834  frsqrte f0,f1
    0x00153148: 0xC0630004  lfs     f3,0x04(r3)
    0x0015314C: 0xFC400032  fmul    f2,f0,f0
    0x00153150: 0xC0A30000  lfs     f5,0x00(r3)
    0x00153154: 0xFC8118B8  fmsub   f4,f1,f2,f3
    0x00153158: 0xFCC50032  fmul    f6,f5,f0
    0x0015315C: 0xFC8401B2  fmul    f4,f4,f6
    0x00153160: 0xFC440132  fmul    f2,f4,f4
    0x00153164: 0xFC0118B8  fmsub   f0,f1,f2,f3
    0x00153168: 0xFCC50132  fmul    f6,f5,f4
    0x0015316C: 0xFC0001B2  fmul    f0,f0,f6
    0x00153170: 0xFC200072  fmul    f1,f0,f1
    0x00153174: 0x4E800020  bclr    0x14,0
    0x00153178: 0xEC210828  fsubs   f1,f1,f1
    0x0015317C: 0x4E800020  bclr    0x14,0
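The routine at 0x00153130 has the shape of a square root built from a reciprocal square root estimate (frsqrte) followed by two Newton-Raphson refinements and a final multiply by the input. Below is a rough C sketch of that shape. The two constants loaded from 0x00153180 are not shown in the listing, so 3.0 and -0.5 are assumed here because they make each step a standard Newton-Raphson update; `rsqrt_estimate` is a well-known bit-level approximation used as a stand-in and is not what frsqrte computes.

```c
#include <stdint.h>
#include <string.h>

/* Stand-in for frsqrte: a rough 1/sqrt(x) estimate (a few percent error).
   NOT the PowerPC algorithm; it only supplies a low-precision start value. */
static double rsqrt_estimate(double x)
{
    uint64_t bits;

    memcpy(&bits, &x, sizeof bits);
    bits = UINT64_C(0x5FE6EB50C7B537A9) - (bits >> 1);
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Same structure as the listing. C3 and C5 are ASSUMED values. */
static double sqrt_via_rsqrt(double x)
{
    const double C3 = 3.0;    /* assumed constant at 0x04(r3) */
    const double C5 = -0.5;   /* assumed constant at 0x00(r3) */
    double y;

    if (!(x > 0.0))           /* bf cr1[gt]: input is not greater than zero */
        return x - x;         /* fsubs f1,f1,f1 */

    y = rsqrt_estimate(x);             /* frsqrte f0,f1 */
    y = (x * y * y - C3) * (C5 * y);   /* first refinement:  fmul, fmsub, fmul, fmul */
    y = (x * y * y - C3) * (C5 * y);   /* second refinement: fmul, fmsub, fmul, fmul */
    return y * x;                      /* fmul f1,f0,f1: x * (1/sqrt(x)) */
}
```

Under these assumptions the sketch returns 0 for an input of 0 and roughly 99.47 for an input of 9894.75, which may be useful to compare against what the emulated routine returns in f1.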

There are two pairs of values in memory that are identical and whose difference ends up being 0:

0x24f424 + 0, 0x24f424 + 0xc
0x24f424 + 0x14, 0x24f424 + 0x8

My guess is that this is where the problem starts. Could these be values from the current and last frame? If our frame timing is off, they might have ended up the same.
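To make the arithmetic in the trace concrete, here is a small sketch that replays the three subtractions and the x*x + z*z term with the values printed above. Treating the six floats at 0x24f424 as two 3-component vectors is an assumption made for illustration, and the `Vec3` names are hypothetical.

```c
/* Two 3-component vectors; this layout is an assumption, not confirmed. */
typedef struct { float x, y, z; } Vec3;

/* Component-wise difference, as done by the three fsubs instructions. */
static Vec3 vec3_sub(Vec3 a, Vec3 b)
{
    Vec3 d = { a.x - b.x, a.y - b.y, a.z - b.z };
    return d;
}

/* x*x + z*z, the quantity built by fmuls/fmadds at 0x001466CC-0x001466D0. */
static float horiz_len_sq(Vec3 d)
{
    return d.x * d.x + d.z * d.z;
}
```

With `a = {585.004f, -83.8817f, -903.486f}` (offsets 0x0C to 0x14) and `b = {585.004f, 15.5906f, -903.486f}` (offsets 0x00 to 0x08), the difference is about (0, -99.47, 0) and `horiz_len_sq` is exactly 0, which is the value later passed to the routine at 0x00153130.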

by **Ian** » Wed Feb 15, 2017 3:33 am

I checked all the rounding modes and the code all looks fine.
The only thing I would have changed was

INLINE INT64 smround_to_nearest(FPR f)

from
return -(INT64)(-f.fd + 0.5);
to
return (INT64)(f.fd - 0.5);

But the result is the same anyway.
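The two return expressions agree because a cast to an integer truncates toward zero, and truncation is symmetric about zero. A quick sketch to check this on a range of values; only the two return expressions come from the post, and the function names and test range are illustrative.

```c
#include <stdint.h>

/* The expression currently used, as quoted above. */
static int64_t round_expr_old(double x)
{
    return -(int64_t)(-x + 0.5);
}

/* The proposed replacement, as quoted above. */
static int64_t round_expr_new(double x)
{
    return (int64_t)(x - 0.5);
}

/* Returns 1 if both expressions agree on every multiple of 1/8 in [-100, 100]. */
static int round_exprs_agree(void)
{
    int i;

    for (i = -800; i <= 800; ++i) {
        double x = i / 8.0;
        if (round_expr_old(x) != round_expr_new(x))
            return 0;
    }
    return 1;
}
```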

If it is a timing issue, could we maybe test that by messing with the timing?

by **Bart** » Wed Feb 15, 2017 8:39 am

Possibly. But when I say timing issue, I also mean IRQ timing. I'm starting to lose track a bit but I recall you had gotten the FMV intro in Scud Race Plus working? Also, note that Fighting Vipers 2 is one to pay careful attention to. Nik got it working by properly emulating PowerPC timing but noticed that it seemed to be behaving like a Step 1.x game timing-wise despite being a 2.x game. If FV2 timing is broken, game play ends up running at half speed or less.

Do you know how to use Nik's debugger (when Supermodel is compiled with SUPERMODEL_DEBUGGER defined)? You can set up memory watches. We can investigate how those areas of RAM I identified are modified during the relevant part of the attract mode.

by **Ian** » Wed Feb 15, 2017 9:47 am

Well
I did some experiments with when to render the database and concluded there is basically no end-of-frame marker.
The 0x88 just marks the end of the data section, i.e. graphics data etc. (you can see this from the SDK as well).
MAME renders after this command, which is why Sega Bass Fishing runs at about 1/6th of the speed: 0x88 is written about 6 times per frame.

I also did some experiments with the Real3D status bit. You can 'fix' the video sequence in Scud, and also fix Daytona on the edge, by messing with the timing, but other games won't even boot. Games sync with this bit upon loading. They will poll the bit until the state changes (the actual value doesn't matter). But I've no idea what it is syncing with. I assumed it was v-sync, but that seems not to be the case. It could be v-sync / 2 for interlaced displays? That might even make sense. Interestingly, the 2D overlays are also affected by the status bit. When messing with the value, I had it so only half the text intro would load in Virtua Fighter, even though the rest of the game worked flawlessly.

I'll have a tinker with it tonight and see if it makes any difference to mag truck

by **Bart** » Wed Feb 15, 2017 1:58 pm

I recall the Real3D documentation implying that the device could be locked to certain frame rates but it isn't clear how this is achieved. You would think that it would take as much time as it needed to render, with the onus being on the programmer to ensure the scene database isn't too complex to render in 1/60th of a second.

The tilegen is definitely synced to vsync. That is to say, there is definitely an IRQ that is fired at every vblank. But the game can time itself in a number of different ways. It's a tricky problem...

PPC603e division by zero
