My working theory is that the hardware must be doing a depth-only pass on translucent polys. By translucent I mean PolyAlpha(), not TextureAlpha(); texture alpha is a different issue.
So it could be doing something like:
Pass 1: render opaque polygons normally.
Pass 2: render translucent polys, but only write depth values.
Pass 3: render translucent polys normally.
The Z-buffer would already be filled by pass 2, so in pass 3 only the topmost translucent polygons would render.
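
To make the three passes concrete, here is a rough single-pixel model of the idea. It is only a sketch, not emulator or hardware code: the `Fragment` class, the `render_three_pass` function, the single-channel colour and the "equal or closer" depth test in pass 3 are all my own assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    depth: float  # smaller value = closer to the camera
    color: float  # single-channel intensity, to keep the model short
    alpha: float  # 1.0 = opaque, below 1.0 = translucent (poly alpha)


def blend(dst, src, alpha):
    """Standard 'source over' alpha blend of src onto dst."""
    return src * alpha + dst * (1.0 - alpha)


def render_three_pass(fragments, background=0.0):
    """Shade one pixel from fragments given in draw order (hypothetical model)."""
    color, depth = background, float("inf")

    # Pass 1: opaque fragments with the usual depth test and depth write.
    for f in fragments:
        if f.alpha >= 1.0 and f.depth < depth:
            color, depth = f.color, f.depth

    # Pass 2: translucent fragments write depth only, no colour.
    for f in fragments:
        if f.alpha < 1.0 and f.depth < depth:
            depth = f.depth

    # Pass 3: translucent fragments write colour. The test is assumed to be
    # "equal or closer", so only the fragment that won pass 2 gets through.
    for f in fragments:
        if f.alpha < 1.0 and f.depth <= depth:
            color = blend(color, f.color, f.alpha)

    return color


wall = Fragment(depth=3.0, color=0.4, alpha=1.0)   # opaque scenery
back = Fragment(depth=2.0, color=0.2, alpha=0.5)   # far side of the balloon
front = Fragment(depth=1.0, color=1.0, alpha=0.5)  # near side of the balloon

back_first = render_three_pass([wall, back, front])
front_first = render_three_pass([wall, front, back])
```

In this model `back_first` and `front_first` come out the same, because only the nearest translucent fragment ever blends over the opaque scene, whatever order the polys are submitted in.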
This is what we have currently (without the stencil hack):

The left side looks fully opaque because the back side renders first, then the front of the balloon renders on top and the two blend together. On the right side the front renders first, so the back fails the depth test and never draws.
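
The order dependence is easy to reproduce with a toy single-pixel model. This is just a sketch under my own assumptions (the name `render_single_pass`, the tuple layout and the single-channel colour are made up for illustration), with translucent fragments both depth tested and depth written in one pass:

```python
def blend(dst, src, alpha):
    """Standard 'source over' alpha blend of src onto dst."""
    return src * alpha + dst * (1.0 - alpha)


def render_single_pass(fragments, background=0.0):
    """Shade one pixel from (depth, color, alpha) tuples given in draw order.

    Smaller depth is closer. Every fragment is depth tested and, if it
    passes, blended and written to the depth buffer (hypothetical model).
    """
    color, depth = background, float("inf")
    for frag_depth, frag_color, alpha in fragments:
        if frag_depth < depth:                       # depth test
            color = blend(color, frag_color, alpha)  # colour write
            depth = frag_depth                       # depth write
    return color


back = (2.0, 0.2, 0.5)   # far side of the balloon
front = (1.0, 1.0, 0.5)  # near side of the balloon

left = render_single_pass([back, front])   # back drawn first: both layers blend
right = render_single_pass([front, back])  # front drawn first: back is rejected
```

Here `left` and `right` differ even though the geometry is identical, which is the same split the screenshot shows.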
This is what the real hardware does (this is the best quality capture I could find):

Looks to me like the entire balloon is semi-transparent. That is probably only possible by either depth sorting each individual poly, which seems extremely unlikely, or doing a depth-only pass first.
Using the stencil test instead of a Z pass results in this monster:

The shading is wrong because on the left you only see the back side, and on the right you only see the front. So this can't be the solution that was used.
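
For comparison, here is a toy model of why a stencil-only approach picks the wrong layer. It assumes the stencil hack boils down to "the first translucent fragment to reach a pixel marks it, and later translucent fragments are rejected"; that is my simplification for illustration, and `render_stencil_once` is a made-up name, not the actual implementation.

```python
def blend(dst, src, alpha):
    """Standard 'source over' alpha blend of src onto dst."""
    return src * alpha + dst * (1.0 - alpha)


def render_stencil_once(fragments, background=0.0):
    """Shade one pixel from translucent (color, alpha) tuples in draw order.

    Assumed behaviour: the first fragment blends and sets the stencil bit,
    every later fragment fails the stencil test (hypothetical model).
    """
    color, stencil_set = background, False
    for frag_color, alpha in fragments:
        if stencil_set:
            continue  # stencil test fails: pixel already blended once
        color = blend(color, frag_color, alpha)
        stencil_set = True
    return color


back = (0.2, 0.5)   # far side of the balloon
front = (1.0, 0.5)  # near side of the balloon

left = render_stencil_once([back, front])   # only the back side is visible
right = render_stencil_once([front, back])  # only the front side is visible
```

Each pixel blends exactly once, so the balloon is uniformly translucent, but which side you see depends purely on draw order. A depth-only pass would instead always pick the nearest layer.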