Tilegen inside a fragment shader


Postby Ian » Wed Sep 06, 2023 6:17 am

Why?
Because drawing is kind of slow on the CPU, and a GPU is a massive SIMD monster that can probably eat more data than you can throw at it. Originally I was thinking about writing a compute shader to do this, because compute shaders can write to arbitrary memory locations, but in the end I figured we could do it in a fragment shader.
Image

With older OpenGL (2.1) you could really only do floating point math in the shader. There was very little integer math other than maybe a loop counter, which might get unrolled anyway.
With GL 3+ you can do full integer maths (%, ^, |, &, etc.) with signed and unsigned data types.

How do we map the memory to the GPU?
Well, in OpenGL we either get uniform memory or texture memory. The VRAM is way too big to pass as uniform memory to the shader, so it must be mapped as a texture. How do we map VRAM as a texture? The bottom part of the memory is exactly 1 MB, which maps to a 512x512 texture with 4 bytes per pixel (512 x 512 x 4 = 1,048,576 bytes). To index the VRAM texture, we have to generate texture coordinates from the memory locations. I use this simple function:

   ivec2 GetVRamCoords(int offset)
   {
      return ivec2(offset % 512, offset / 512);
   }


That essentially maps a 1D offset to a 2D array of fixed width/height. The palette data works the same way, but with a different texture. To read the texture I use the texelFetch function, which takes integer coordinates and skips the bilinear/trilinear and other possible filtering modes associated with reading textures.
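For anyone wanting to sanity-check the mapping on the CPU, here is a minimal C++ sketch of the same idea. The `TexCoord` struct, `kVramTexWidth` constant, and `GetVRamOffset` inverse are made-up names for illustration only. It assumes, as the shader does, that the offset is a 32-bit word index (byte address / 4) rather than a byte address.

```cpp
// Hypothetical CPU-side mirror of the shader's GetVRamCoords.
struct TexCoord { int x; int y; };

constexpr int kVramTexWidth = 512;   // texels per row, as in the post

// wordOffset is a 32-bit word index (byte address / 4), not a byte address.
constexpr TexCoord GetVRamCoords(int wordOffset)
{
    return TexCoord{ wordOffset % kVramTexWidth, wordOffset / kVramTexWidth };
}

// Inverse mapping (illustration only): texel coordinate back to word index.
constexpr int GetVRamOffset(TexCoord c)
{
    return c.y * kVramTexWidth + c.x;
}
```

With this, byte address 0xF8000 (the layer 0 tile data base used by the shader) becomes word 0x3E000, which lands at the start of texture row 496.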

The register values can be passed to the shader as an array of integer uniforms. Uniforms are great because conditional logic based on them can be optimised away, since every pixel in the draw call evaluates the same logic.

Here is what it looks like. This is drawing layer 2 directly to the frame buffer in a 496x384 window as a test.

Image

This is the shader code
// Vertex shader
static const char s_vertexShaderSourceNew[] = R"glsl(

   #version 410 core

   void main(void)
   {
      const vec4 vertices[] = vec4[](vec4(-1.0, -1.0, 0.0, 1.0),
                              vec4(-1.0,  1.0, 0.0, 1.0),
                              vec4( 1.0, -1.0, 0.0, 1.0),
                              vec4( 1.0,  1.0, 0.0, 1.0));

      gl_Position = vertices[gl_VertexID % 4];   
   }

   )glsl";

// Fragment shader
static const char s_fragmentShaderSourceNew[] = R"glsl(

   #version 410 core

   //layout(origin_upper_left) in vec4 gl_FragCoord;

   // inputs
   uniform usampler2D vram;         // texture 512x512
   uniform usampler2D palette;         // texture 128x256   - actual dimensions don't matter too much, but we have to stay within the max texture width/height, so we can't have 1 giant 1D array
   uniform uint regs[32];
   uniform int layerNumber;

   // outputs
   out vec4 fragColor;

   ivec2 GetVRamCoords(int offset)
   {
      return ivec2(offset % 512, offset / 512);
   }

   ivec2 GetPaletteCoords(int offset)
   {
      return ivec2(offset % 128, offset / 128);
   }

   uint GetLineMask(int yCoord)
   {
      uint shift         = (layerNumber<2) ? 16u : 0u;                           // need to check this, we could be endian swapped so could be wrong
      uint maskPolarity   = ((layerNumber & 1) > 0) ? 0xFFFFu : 0x0000u;
      int index         = (0xF7000 / 4) + yCoord;

      ivec2 coords      = GetVRamCoords(index);
      uint mask         = ((texelFetch(vram,coords,0).r >> shift) & 0xFFFFu) ^ maskPolarity;

      return mask;
   }

   bool GetPixelMask(int xCoord, int yCoord)
   {
      uint lineMask = GetLineMask(yCoord);
      uint maskTest = 1u << (15-(xCoord/32));      // unsigned literal, so no implicit int -> uint conversion is needed

      return (lineMask & maskTest) != 0u;
   }

   int GetLineScrollValue(int layerNum, int yCoord)
   {
      int index = ((0xF6000 + layerNum * 0x400) / 4) + (yCoord /2);
      int shift = (yCoord % 2) * 16;                                    // double check this

      ivec2 coords = GetVRamCoords(index);
      return int((texelFetch(vram,coords,0).r >> shift) & 0xFFFFu);
   }

   int GetTileNumber(int xCoord, int yCoord, int xScroll, int yScroll)
   {
      int xIndex = ((xCoord + xScroll) / 8) & 0x3F;
      int yIndex = ((yCoord + yScroll) / 8) & 0x3F;
      
      return (yIndex*64) + xIndex;
   }

   int GetTileData(int layerNum, int tileNumber)
   {
      int addressBase = (0xF8000 + (layerNum * 0x2000)) / 4;
      int offset = tileNumber / 2;                     // two tiles per 32bit word
      int shift = (1 - (tileNumber % 2)) * 16;            // triple check this

      ivec2 coords = GetVRamCoords(addressBase+offset);
      uint data = (texelFetch(vram,coords,0).r >> shift) & 0xFFFFu;

      return int(data);
   }

   int GetVFine(int yCoord, int yScroll)
   {
      return (yCoord + yScroll) & 7;
   }

   int GetHFine(int xCoord, int xScroll)
   {
      return (xCoord + xScroll) & 7;
   }

   // register data
   bool LineScrollMode     (int layerNum)   { return (regs[0x60/4 + layerNum] & 0x8000u) != 0u; }
   int  GetHorizontalScroll(int layerNum)   { return int(regs[0x60/4 + layerNum] & 0x3FFu); }
   int  GetVerticalScroll  (int layerNum)   { return int((regs[0x60/4 + layerNum] >> 16) & 0x1FFu); }
   int  LayerPriority      ()               { return int((regs[0x20/4] >> 8) & 0xFu); }
   bool LayerIs4Bit        (int layerNum)   { return (regs[0x20/4] & (1u << (12 + layerNum))) != 0u; }
   bool LayerEnabled       (int layerNum)   { return (regs[0x60/4 + layerNum] & 0x80000000u) != 0u; }
   bool LayerSelected      (int layerNum)   { return (LayerPriority() & (1 << layerNum)) == 0; }

   float Int8ToFloat(uint c)
   {
      if((c & 0x80u) > 0u) {      // this is a bit harder in GLSL. Top bit means negative number, so sign-extend to 32 bits by setting bits 8-31
         return float(int(c | 0xFFFFFF00u)) / 128.0;
      }
      else {
         return float(c) / 127.0;
      }
   }

   vec4 GetColourOffset(int layerNum, vec4 colour) 
   {
      uint offsetReg = regs[(0x40/4) + layerNum/2];

      vec4 c;
      c.b = Int8ToFloat((offsetReg >>16) & 0xFFu);
      c.g = Int8ToFloat((offsetReg >> 8) & 0xFFu);
      c.r = Int8ToFloat((offsetReg >> 0) & 0xFFu);
      c.a = 0.0;

      colour += c;

      return clamp(colour,0.0,1.0);      // clamp is probably not needed
   }

   vec4 Int16ColourToVec3(uint colour)
   {
      uint alpha = (colour>>15);      // top bit is alpha. 1 means clear, 0 opaque
      alpha = ~alpha;               // invert
      alpha = alpha & 0x1u;         // mask bit
      
      vec4 c;
      c.r = float((colour >> 0 ) & 0x1F) / 31.0;
      c.g = float((colour >> 5 ) & 0x1F) / 31.0;
      c.b = float((colour >> 10) & 0x1F) / 31.0;
      c.a = float(alpha) / 1.0;

      c.rgb *= c.a;      // multiply by alpha value, this will push transparent to black, no branch needed
      
      return c;
   }

   vec4 GetColour(int paletteOffset)
   {
      ivec2 coords = GetPaletteCoords(paletteOffset);
      uint colour = texelFetch(palette,coords,0).r;

      return Int16ColourToVec3(colour);      // each colour is only 16bits, but occupies 32bits
   }

   vec4 Draw4Bit(int tileData, int hFine, int vFine)
   {      
      // Tile pattern offset: each tile occupies 32 bytes when using 4-bit pixels (offset of tile pattern within VRAM)
      int patternOffset = ((tileData & 0x3FFF) << 1) | ((tileData >> 15) & 1);
      patternOffset *= 32;
      patternOffset /= 4;

      // Upper color bits; the lower 4 bits come from the tile pattern
      int paletteIndex = tileData & 0x7FF0;

      ivec2 coords = GetVRamCoords(patternOffset+vFine);
      uint pattern = texelFetch(vram,coords,0).r;
      pattern = (pattern >> ((7-hFine)*4)) & 0xFu;         // get the pattern for our horizontal value

      return GetColour(paletteIndex | int(pattern));
   }

   vec4 Draw8Bit(int tileData, int hFine, int vFine)
   {
      // Tile pattern offset: each tile occupies 64 bytes when using 8-bit pixels
      int patternOffset = tileData & 0x3FFF;
      patternOffset *= 64;
      patternOffset /= 4;

      // Upper color bits
      int paletteIndex = tileData & 0x7F00;

      // each read is 4 pixels
      int offset = hFine / 4;

      ivec2 coords = GetVRamCoords(patternOffset+(vFine*2)+offset);      // 8-bit pixels, each line is two words
      uint pattern = texelFetch(vram,coords,0).r;

      pattern = (pattern >> ((3-(hFine%4))*8)) & 0xFFu;               // shift out the bits we want for this pixel

      return GetColour(paletteIndex | int(pattern));
   }
   
   void main()
   {
      ivec2 pos = ivec2(gl_FragCoord.xy);

      int tileNumber = GetTileNumber(pos.x,pos.y,0,0);
      int hFine = GetHFine(pos.x,0);
      int vFine = GetVFine(pos.y,0);

      int tileData = GetTileData(layerNumber,tileNumber);

      if(LayerIs4Bit(layerNumber)) {
         fragColor = Draw4Bit(tileData,hFine,vFine);
      }
      else {
         fragColor = Draw8Bit(tileData,hFine,vFine);
      }
   }

   )glsl";


What's left? Well, most of the code is pretty much there. I just need to plug in the scroll values and masking. Maybe we can generate 1 layer of A and A' with a single draw pass?
The colour offsets can also be directly applied in the shader, so no need to pre-calculate these. This will simplify the logic in the tilegen class a lot.
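As a rough illustration of the signed 8-bit colour offset conversion, here is a small C++ sketch. It assumes the same scaling as the shader's Int8ToFloat (negative values divided by 128, positive values by 127). Sign-extending an 8-bit value to 32 bits means setting bits 8 to 31, which is the same as OR-ing with 0xFFFFFF00; the sketch uses the arithmetic equivalent (subtract 256) to stay portable.

```cpp
#include <cstdint>

// Hypothetical CPU-side version of the colour offset conversion.
// c holds an 8-bit two's complement value in its low byte.
constexpr float Int8ToFloat(std::uint32_t c)
{
    // If bit 7 is set the value is negative: subtracting 256 gives the same
    // result as sign-extending with (c | 0xFFFFFF00) and reading it as signed.
    return (c & 0x80u)
        ? static_cast<float>(static_cast<int>(c & 0xFFu) - 256) / 128.0f
        : static_cast<float>(c & 0xFFu) / 127.0f;
}
```

Under these assumptions 0x80 maps to -1.0, 0x7F maps to 1.0, and 0xFF (that is, -1) maps to -1/128.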

Upsides/Downsides?

Downside is fragment shaders are much harder to debug than CPU-side code. Emulating possible quirks of the tilegen, such as the fact the shift register(?) is reloaded only every 8 pixels, might be harder in a fragment shader and might require some ugly logic, as every pixel in the fragment shader is basically independent. Also, if we emulated drawing 1 line at a time we would start to lose the speed benefits of doing it on the GPU, as each draw call has a cost.
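One possible way to soften the debugging problem is to keep CPU-side copies of the pure bit-twiddling helpers and check them against known values. The sketch below is only an illustration of that idea, not code from Supermodel: `Extract4BitPixel` and `Extract8BitPixel` are made-up names that isolate the shift/mask step from Draw4Bit and Draw8Bit.

```cpp
#include <cstdint>

// Same arithmetic as the shader's GetTileNumber: 64x64 tiles of 8x8 pixels.
constexpr int GetTileNumber(int xCoord, int yCoord, int xScroll, int yScroll)
{
    return ((((yCoord + yScroll) / 8) & 0x3F) * 64)
         +  (((xCoord + xScroll) / 8) & 0x3F);
}

// 4-bit mode: one 32-bit word holds a full 8-pixel line, leftmost pixel in
// the top nibble.
constexpr std::uint32_t Extract4BitPixel(std::uint32_t patternWord, int hFine)
{
    return (patternWord >> ((7 - hFine) * 4)) & 0xFu;
}

// 8-bit mode: one 32-bit word holds 4 pixels; the shader picks the word with
// hFine / 4, so only hFine % 4 matters here.
constexpr std::uint32_t Extract8BitPixel(std::uint32_t patternWord, int hFine)
{
    return (patternWord >> ((3 - (hFine % 4)) * 8)) & 0xFFu;
}
```

For example, with a pattern word of 0x12345678 the 4-bit pixels for hFine 0 to 7 come out as 1, 2, 3, ... 8, which makes any endian or shift-direction mistake obvious.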

Upside?
So fast might as well be free, at least on a non ghetto gpu from this decade :p

Re: Tilegen inside a fragment shader

Postby Bart » Wed Sep 06, 2023 12:29 pm

This is really cool! I'm hoping it works with both legacy and new engines and perhaps can be left configurable. It's always a bit dicey having two implementations of something lying around because, like the old engine, they will naturally start to diverge, but the tilegen is mostly solved except for the window mask thing. For those ghetto GPUs, having the CPU version around as a fallback might be useful.

Re: Tilegen inside a fragment shader

Postby Ian » Wed Sep 06, 2023 2:06 pm

Yes it'll work with the legacy engine :)
Currently it's just a proof of concept. I could also probably lower the required GLSL version to maybe 330. I'd have to check.

But with any luck most people would get a speed up even on lower end systems.

Re: Tilegen inside a fragment shader

Postby Ian » Fri Sep 08, 2023 3:10 pm

Not at my PC currently, but I saw MAME was doing this with the line scroll values:

int rowscroll = BYTE_REVERSE16(rowscroll_ram[((layer * 0x200) + y) ^ NATIVE_ENDIAN_VALUE_LE_BE(3,0)]) & 0x7fff;
if (rowscroll & 0x100)
    rowscroll |= ~0x1ff;

Basically, if bit 8 of the line scroll value is set (0x100, i.e. the 9-bit value is 256 or more), it sign-extends the value so it becomes negative. So it is treating the values as signed. I wonder if that makes any difference to the maths.
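To make the effect of that snippet concrete, here is a small standalone C++ sketch of the same 9-bit sign extension. `SignExtendRowScroll` is a made-up name, and the input is assumed to already be limited to 9 bits (the MAME line masks with 0x7fff, so this is a simplification).

```cpp
// Hypothetical helper: treat a 9-bit scroll value as two's complement.
// Bit 8 (0x100) is the sign bit; if set, fill every bit above bit 8 with 1s.
constexpr int SignExtendRowScroll(int rowscroll)
{
    return (rowscroll & 0x100) ? (rowscroll | ~0x1ff) : rowscroll;
}
```

So 0x0FF stays 255, 0x100 becomes -256 and 0x1FF becomes -1. Masking the result back to 9 bits returns the original value, so if the final position is wrapped to a 512-pixel-wide layer with a mask, the signed and unsigned interpretations should land on the same pixel.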

Re: Tilegen inside a fragment shader

Postby Bart » Sat Sep 09, 2023 7:38 pm

I think it would but I don't know why they do this. Is this code also present in the System 24 and Model 2 tile generator logic? Model 2 and 3 use the System 24 tile generator, which was a 2D system that would have made heavy use of scrolling.

Re: Tilegen inside a fragment shader

Postby Ian » Sat Sep 09, 2023 8:29 pm

Mame is doing this for system 24

uint16_t *hscrtb = tile_ram.get() + 0x4000 + 0x200*layer;

hscr = (-hscrtb[y]) & 0x1ff;

It makes them negative. I'm not sure what that even does to an unsigned number. Maybe it just makes them negative because scrolling happens in the left direction.
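On what the negation does to an unsigned value: in C++ a uint16_t operand is promoted to int before the unary minus is applied (on platforms where int is wider than 16 bits), so `-hscrtb[y]` is an ordinary negative int, and the `& 0x1ff` then keeps the low 9 bits of its two's complement form. A minimal sketch, with `NegatedScroll` as a made-up name:

```cpp
#include <cstdint>

// Hypothetical helper reproducing the expression (-hscrtb[y]) & 0x1ff.
// raw is promoted to int, negated, then wrapped into the range 0..511.
constexpr int NegatedScroll(std::uint16_t raw)
{
    return (-raw) & 0x1ff;
}
```

The result is effectively (512 - raw) modulo 512: a raw value of 8 becomes 504, which is the same as scrolling by -8 on a 512-pixel-wide layer. That fits the guess that the negation is only there to flip the scroll direction.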

Re: Tilegen inside a fragment shader

Postby Ian » Sat Sep 09, 2023 8:56 pm

Rolling start apparently does work correctly on mame

https://youtu.be/F1ttfo3KUYw

Go to 6mins 41 seconds. The game eventually hangs but no scroll issues present

Re: Tilegen inside a fragment shader

Postby gm_matthew » Sun Sep 10, 2023 3:43 am

Ian wrote:Rolling start apparently does work correctly on mame

https://youtu.be/F1ttfo3KUYw

Go to 6mins 41 seconds. The game eventually hangs but no scroll issues present


The line scrolling code in MAME is broken as it's scrolling the "ROLLING START" banner twice as much as it should, scrolling it 16 pixels per frame when it should only be scrolling 8. This is not a timing issue; I've tested it myself in MAME 0.220 and if I use the debugger to edit the layer A' scrolling table and scroll certain lines left by an extra 8 pixels, they instead scroll an extra 16:

Image

I'd test the scroll registers as well but unfortunately I can't get them to work properly in the debugger since MAME isn't set up to actually read them.

Re: Tilegen inside a fragment shader

Postby Ian » Sun Sep 10, 2023 4:15 am

Hm, well that sucks :)
I tested with signed maths also and it didn't make any difference, so the error is definitely not here.

Re: Tilegen inside a fragment shader

Postby Bart » Sun Sep 10, 2023 8:25 pm

I need to take a look at this again at some point. Matthew: I'll look around the area of your patch but do you happen to have the code that updates the window layer (I call it 'stencil' in TileGen.cpp but it's actually 'window' in Sega terminology, similar to the window layer on Sega Genesis) disassembled for reference?
