There are 3 frame buffers: 1 opaque and 2 for the transparent layers. Each one depth tests separately.
So..
496×384 × 4 bytes per pixel is about 762k
And each one also needs a depth buffer. Depth buffer is probably 24 bits with an 8-bit stencil packed in, so 4 bytes per pixel too.
So
496×384 × 4 bytes per pixel, so another 762k
(762k × 2) × 3 frame buffers is about 4.5meg of RAM. That's not even counting memory for page flipping. Potentially you'd need another 1.5meg for a final two buffers for composition.
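For anyone who wants to check the sums, here's a rough Python sketch of the same back-of-envelope budget. Every figure in it is a guess from this post (resolution, 4 bytes per pixel for both color and depth/stencil, 3 layers, 2 composition buffers), and the function names are just made up for illustration. It says nothing about how the real hardware lays out memory.

```python
# Back-of-envelope memory budget. All constants are assumptions from the post,
# not confirmed hardware specs.
WIDTH, HEIGHT = 496, 384
BYTES_PER_PIXEL = 4   # 32-bit color; also assumed for 24-bit depth + 8-bit stencil
LAYERS = 3            # 1 opaque + 2 transparent


def surface_bytes(width=WIDTH, height=HEIGHT, bpp=BYTES_PER_PIXEL):
    """Size of one full-screen buffer in bytes."""
    return width * height * bpp


def layer_total_bytes(layers=LAYERS):
    """Each layer is assumed to have one color buffer and one depth/stencil buffer."""
    return surface_bytes() * 2 * layers


def composition_bytes(buffers=2):
    """Extra color-only buffers assumed for final composition / page flipping."""
    return surface_bytes() * buffers


if __name__ == "__main__":
    print(f"one buffer:         {surface_bytes():>9,} bytes")
    print(f"3 layers w/ depth:  {layer_total_bytes():>9,} bytes")
    print(f"2 composition bufs: {composition_bytes():>9,} bytes")
    print(f"grand total:        {layer_total_bytes() + composition_bytes():>9,} bytes")
```

With those assumptions one buffer is 761,856 bytes, the three layers with depth come to 4,571,136 bytes, and the two composition buffers add 1,523,712 more, so roughly 6.1 million bytes all in.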
I mean it's possible the architecture doesn't work like this at all. But the result is perfect for every game. Seems an overly expensive solution.