[C] Understanding an interesting piece of code

[C] Understanding an interesting piece of code

Postby Pix3l » 14 Jan 2023, 21:29

Hello everyone!

Some times ago I found an interesting bomberman clone written in C with some cool graphics effects in it.
Trying to understand how it achieved those effects, I put up togheter the minimum code required to reproduce them, but still have trouble understanding some very obscure (to me) pieces...
Especially this routine:

{l Code}: {l Select All Code}
void Rotator(void)
{
   u32   ix, iy;
   u16   px, py, px2, py2, incx, incy;
   u32   nMultiplier;
   u8   *pScr, *pPic;

   SDL_LockSurface(screen);

   // Multiplicateur (Zoom).
    #if RZ_TYPE   == 0
        nMultiplier = 0x08000 + ((0x0060 + pCos[(gRoto.nAngle + 192) & 0xFF]/2) << 9);
    #else
        nMultiplier = 0x08000 + ((0x0060 + pCos[gRoto.nZoom]/2) << 9);
    #endif

   // Val sin et cos (Roto).
   incx = (pCos[gRoto.nAngle] * nMultiplier) >> 16;
   incy = (pSin[gRoto.nAngle] * nMultiplier) >> 16;

   // Point de départ.
   // x' = x cos f - y sin f
   // y' = y cos f + x sin f
   px = (128 << 8) - ((ROTO_CTR_X * incx) - (ROTO_CTR_Y * incy));
   py = (128 << 8) - ((ROTO_CTR_Y * incx) + (ROTO_CTR_X * incy));

   // Loop.
   pScr = (u8*)screen->pixels;
   pPic = (u8*)pRotoPic->pixels;
   pPic += (gRoto.nScroll & 63) << 8;   // Scroll.
   s32   nPitchDiff = screen->pitch - SCREEN_WIDTH;
   
   for (iy = 0; iy < SCREEN_HEIGHT; iy++)
   {
      px2 = px;
      py2 = py;

      for (ix = 0; ix < SCREEN_WIDTH; ix++)
      {
         *pScr++ = *(pPic + ((py2 >> 8) << 8) + (px2 >> 8));

         px2 += incx;
         py2 += incy;
      }
      pScr += nPitchDiff;

      //incx++;      // Déformation.

      px -= incy;
      py += incx;
   }

    #if RZ_TYPE   == 0
        gRoto.nAngle += 1;
    #else
        // Déplacement des rotations, zooms, scrolls...
        gRoto.nAngCnt--;
        gRoto.nAngCnt &= 63;
        if (gRoto.nAngCnt == 0)
        {
            gRoto.nAngVar = (gRoto.nAngVar + 1) & 3;
            gRoto.nAngCnt = (rand() & 63) | 16;
        }
        if (gRoto.nAngVar == 1) gRoto.nAngle++;
        else if (gRoto.nAngVar == 3) gRoto.nAngle--;
        else gRoto.nScroll++;

        gRoto.nZoomCnt--;
        gRoto.nZoomCnt &= 63;
        if (gRoto.nZoomCnt == 0)
        {
            gRoto.nZoomVar = (gRoto.nZoomVar + 1) & 1;
            gRoto.nZoomCnt = (rand() & 63) | 32;
        }
        if (gRoto.nZoomVar == 0) gRoto.nZoom++;
    #endif

   SDL_UnlockSurface(screen);

}


I can't understand some numbers come from, like in
nMultiplier = 0x08000 + ((0x0060 + pCos[(gRoto.nAngle + 192) & 0xFF]/2) << 9);
or
px = (128 << 8) - ((ROTO_CTR_X * incx) - (ROTO_CTR_Y * incy));
, are them some kind of mathematical constants?

I've tracked down the author website (in french) and found a simplistic explanations about some of the operations, but I can't completly understand them all.

Does anyone have some tips about this code?
I attached the full example to this post.
Attachments
roto.zip
(4.92 KiB) Downloaded 238 times
Last edited by Pix3l on 15 Jan 2023, 22:02, edited 1 time in total.
http://www.pix3lworkshop.altervista.org/ - Your 8bit choice since 2006!
User avatar
Pix3l
 
Posts: 55
Joined: 10 Sep 2010, 21:00
Location: Italy

Re: [C] Understanding an interesting piece of code

Postby drummyfish » 15 Jan 2023, 13:03

You haven't picked easy code to understand :) It's and efficient oldschool code (judging from C89 and old SDL), the guy is using sin/cos lookup tables and fixed point math and some other hacks with pointers that help efficiency -- this is good but it'll be hard to understand and decypher, especially with French comments etc. I've been briefly trying to decypher it and got some idea how it works, but TBH I would just rewrite it if I were to do this myself, it would be faster than trying to understand it, the code is not extremely well documented. If you want to dig through it, take a look at affine transformations and transform matrices (he's not using matrices per se, but he's effectively implementing the same effect), take a look at bitwise operators and how they're used (e.g. & is used to mask out bits and can be used e.g. as fast modulo, bitwise shifts << and >> are used for fast multiplication by powers of two and utilized with fixed point math, | is used to set bits, i.e. combine different flags etc.). Sorry I can't be of more help, but you can perhaps find some help on my wiki (e.g. https://www.tastyfish.cz/lrs/fixed_point.html, https://www.tastyfish.cz/lrs/bit_hack.html etc.).
User avatar
drummyfish
 
Posts: 448
Joined: 29 Jul 2018, 20:30
Location: Moravia

Re: [C] Understanding an interesting piece of code

Postby Pix3l » 15 Jan 2023, 22:20

Thanks drummyfish for your reply :]

I forgot to include the link to the author's website in the previous post.
What I can tell you, is that, this example use a circle of 256 degrees, instead of 360 (for simplicity he says), and like you pointed, he precalculate all the cos and sin around.

Anyway, I agree with you that will be faster rewriting all of this code than trying to understand, but I was just curious.
The rotation and zooming effect applied to the image in this example, it's quite amazing for me to see it doesn't use some external libraries to achieve these effects.
I'd just like to understand better how this thing works, mainly for personal interest.

Anyway, many thanks for your time and your suggestions, your blog seems very interesting, I will take an eye on it soon :]
http://www.pix3lworkshop.altervista.org/ - Your 8bit choice since 2006!
User avatar
Pix3l
 
Posts: 55
Joined: 10 Sep 2010, 21:00
Location: Italy

Re: [C] Understanding an interesting piece of code

Postby drummyfish » 16 Jan 2023, 20:44

I like that attitude, most people would just do it with some library and not even care how it works. The principle is the same as mode 7 rendering, look that up for details.

Here I've written basically the same thing with my tiny C engine (https://codeberg.org/drummyfish/saf) -- maybe this will be better understandable; it also doesn't use any libraries for the transformation and it only uses fixed point. The code shows the idea without trying to be hardcore optimized because that obscures it (for speed you should e.g. write directly to frame buffer etc.).

Image

Here is the code (compile by putting saf.h in the directory of the source and do e.g. gcc -DSAF_PLATFORM_SDL2=1 tmp.c -lSDL2):

{l Code}: {l Select All Code}
#define SAF_PROGRAM_NAME "test"

#include "saf.h"

uint8_t image[1026] = {
0x20,0x20,0xb6,0x92,0x92,0xb6,0x4e,0x49,0xb6,0xb6,0x92,0x49,0x9a,0xb6,0x92,0x92,
0x92,0x92,0x92,0x92,0x49,0xb6,0x92,0x76,0x92,0x92,0x92,0x92,0x6d,0x6d,0x49,0x76,
0x92,0x92,0x92,0x92,0x92,0x51,0x4a,0x20,0xb6,0xf3,0x6d,0x49,0x92,0xb6,0x92,0x92,
0x92,0x92,0x92,0x92,0x49,0x92,0xb6,0x92,0x92,0xb6,0x92,0xb6,0x92,0x4d,0x6d,0x72,
0x76,0xb6,0x31,0x31,0x31,0x6d,0x49,0x49,0x72,0x92,0x49,0x49,0x6d,0x6d,0x92,0x6d,
0x92,0x6d,0x92,0x6d,0x28,0x49,0x49,0xb6,0x92,0x6d,0x51,0x6d,0x31,0x49,0x4d,0x49,
0x49,0x31,0x92,0x92,0x92,0x92,0x49,0x6d,0x6d,0x6d,0x44,0x44,0x6d,0x92,0x6d,0x4e,
0x49,0x49,0x69,0x44,0x49,0x49,0x09,0x49,0x25,0x31,0x49,0x49,0x00,0x6d,0x92,0x92,
0x92,0x92,0x76,0x92,0x92,0x92,0x49,0xdb,0xdb,0xdb,0xb6,0xdb,0x92,0x92,0xdb,0xdb,
0xb6,0xdb,0xb6,0xdb,0xdb,0xdb,0xdb,0xdb,0xb6,0xdb,0xb6,0xdb,0xdb,0x6d,0x92,0x92,
0x92,0xb6,0x92,0x92,0x92,0x92,0x6d,0xdb,0xb6,0xb6,0xb6,0x92,0x92,0xb6,0xb6,0x92,
0xb6,0x92,0xb6,0xb6,0x92,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,0x6d,0x92,0x92,
0x92,0x36,0x36,0x92,0x76,0x92,0x6d,0xdb,0xb6,0x92,0x92,0x92,0x92,0xb6,0xb6,0x92,
0x92,0xb6,0x92,0xb6,0x92,0xb6,0x92,0xb6,0x92,0xb6,0x92,0xb6,0xb6,0x49,0xb6,0x92,
0x92,0x92,0x92,0x92,0x92,0x6d,0x6d,0xdb,0xb6,0xb6,0xb6,0x92,0xb6,0xb6,0xb6,0xb6,
0xb6,0xb6,0x76,0xb6,0x76,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,0x51,0x92,0x92,
0x92,0x92,0x6d,0x92,0x92,0x6d,0x6d,0xdb,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,
0xb6,0xb6,0x92,0xb6,0xb6,0xb6,0xb6,0x9a,0x92,0xb6,0x92,0xb6,0xdb,0x6d,0x36,0x92,
0x92,0x92,0x49,0x49,0x49,0x24,0x6d,0xdb,0x92,0x92,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,
0xb6,0xb6,0x92,0xb6,0x92,0xb6,0x92,0xb6,0xb6,0xb6,0x92,0xb6,0xb6,0x51,0x92,0x2a,
0x25,0x31,0x6d,0x49,0x49,0x49,0x6d,0xdb,0xb6,0x92,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,
0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,0x92,0x9a,0xb6,0xb6,0xb6,0x6d,0x49,0x6d,
0x92,0x6d,0xb6,0x92,0xb6,0x6d,0x6d,0xdb,0xb6,0x92,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,
0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,0x6d,0x6d,0x92,
0x92,0xb6,0x92,0x92,0x92,0x92,0x6d,0xb6,0x92,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,
0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,0x92,0xb6,0xb6,0xb6,0x31,0x6d,0x92,
0x92,0x92,0x92,0x92,0x92,0x6d,0x6d,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,
0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,0x92,0xb6,0xb6,0xb6,0xb6,0xb6,0x49,0x6d,0x92,
0x92,0x92,0x92,0xb6,0x92,0x4d,0x6d,0xdb,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,
0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,0x92,0xb6,0xb6,0x6d,0x6d,0xb6,
0x92,0xb6,0x92,0xb6,0x92,0x6d,0x6d,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,
0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,0xdb,0x49,0x6d,0x92,
0x92,0xb6,0x92,0x92,0x92,0x05,0x6d,0xdb,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,
0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,0xdb,0x49,0x6d,0x92,
0x92,0x92,0xb6,0x92,0x92,0x00,0x6d,0xdb,0xb6,0x95,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,
0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,0xdb,0x6d,0x6d,0xb6,
0xb6,0x92,0x6d,0x6d,0x6d,0x24,0x6d,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,
0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,0x92,0xb6,0xb6,0x6d,0x4d,0x6d,
0x6d,0x51,0x36,0x92,0x6d,0x4d,0x6d,0xdb,0x92,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,
0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,0x6d,0x24,0x6d,
0x6d,0x92,0x92,0x92,0x92,0x6d,0x6d,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,
0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,0x95,0xb6,0x92,0xb6,0x92,0x6d,0xb6,
0x92,0x92,0xb6,0x92,0xb6,0x6d,0x6d,0xdb,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,
0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,0x6d,0x49,0x92,
0x92,0x92,0x92,0xb6,0xb6,0x76,0x6d,0xdb,0x92,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,
0x92,0xb6,0xb6,0x92,0xb6,0xb6,0xb6,0xb6,0xb6,0x92,0xb6,0x92,0xb6,0x6d,0x6d,0xb6,
0x92,0x92,0x76,0xb6,0x92,0x92,0x6d,0xb6,0xb6,0x92,0xb6,0xb6,0xb6,0xdb,0x92,0xb6,
0xb6,0xb6,0xb6,0x95,0xb6,0xb6,0xb6,0x92,0x92,0xb6,0x92,0xb6,0xb6,0x92,0x6d,0x92,
0xb6,0xb6,0x92,0x92,0x92,0x92,0x6d,0xdb,0xb6,0xb6,0xb6,0x92,0xb6,0xb1,0xb6,0xb6,
0xb6,0x92,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,0x95,0xb6,0x92,0xb6,0x6d,0x6d,0x9a,
0x92,0xb6,0x92,0xb6,0xb6,0x6d,0x6d,0xdb,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,0xb6,
0xb6,0xb6,0xb6,0xb6,0xb6,0xdb,0xb6,0xb6,0xb6,0x92,0xb6,0xb6,0xdb,0x92,0x6d,0x92,
0xb6,0xb6,0x6d,0x6d,0x6d,0x31,0x4e,0x92,0x6d,0x92,0x6d,0x92,0x6d,0x92,0x92,0x6d,
0x92,0x4d,0x6d,0x6d,0x4d,0x92,0x4d,0x6d,0x92,0x6d,0x49,0x49,0x31,0x4d,0x6d,0x92,
0x92,0x51,0x92,0x92,0x92,0x92,0x49,0x4d,0x25,0x05,0x49,0x49,0x49,0x49,0x49,0x49,
0x25,0x6d,0x6d,0x49,0x49,0x44,0x31,0x6d,0x49,0x6d,0x6d,0x92,0x6d,0x92,0x49,0x31,
0x6d,0xb6,0xb6,0x92,0xb6,0xb6,0x49,0x6d,0x92,0x92,0x92,0x44,0x92,0x92,0x92,0x92,
0x92,0x92,0x92,0x92,0x49,0x92,0x92,0x92,0x92,0x92,0x92,0x92,0x92,0xb6,0x49,0x92,
0xb6,0xb6,0x76,0x76,0x92,0xb6,0x92,0x6d,0xb6,0xb6,0x92,0x49,0x92,0x92,0x6d,0x92,
0x92,0x92,0x92,0x6d,0x92,0x92,0x92,0x92,0x92,0x92,0x92,0x92,0x92,0xf3,0x49,0x92,
0x9a,0x92,0x92,0x92,0x92,0x92,0x92,0x89,0xb6,0xb6,0x92,0x28,0x92,0x92,0x92,0x6d,
0x92,0x92,0x92,0x6d,0x6d,0x92,0x92,0x92,0x92,0x6d,0x92,0x92,0x6d,0xb6,0x6d,0xb6,
0x92,0x93,0x92,0x36,0x92,0x92,0x92,0x49,0xb6,0xd2,0x92,0x49,0x92,0x92,0x92,0x92,
0x92,0x76,0x92,0x6d,0x6d,0xb6,0x92,0x92,0x92,0x92,0x92,0x92,0x6d,0xdb,0x49,0x92,
0x92,0x92};

void SAF_init(void)
{
}

uint8_t imagePixel(int x, int y)
{
  // wrap-around the coordinates so that they're inside the image
  return image[2 + (((unsigned int) x) % 32) + (((unsigned int) y) % 32) * 32];
}

#define FIXED_POINT_SCALE 512

void drawBackground(int centerX, int centerY, int scale, int rotation)
{
  int stepX = (scale * SAF_sin(rotation)) / 256, // traversal direction vector
      stepY = (scale * SAF_cos(rotation)) / 256;

  centerX *= FIXED_POINT_SCALE; // convert to fixed point
  centerY *= FIXED_POINT_SCALE;

  for (uint8_t y = 0; y < SAF_SCREEN_HEIGHT; ++y)
  {
    int pictureX = centerX, pictureY = centerY;

    for (uint8_t x = 0; x < SAF_SCREEN_WIDTH; ++x) // draw the line
    {
      SAF_drawPixel(x,y,
        imagePixel(pictureX / FIXED_POINT_SCALE,pictureY / FIXED_POINT_SCALE));

      pictureX += stepX;
      pictureY += stepY;
    }
 
    centerX += stepY; // step in 90 degree direction to the next line
    centerY -= stepX;
  }
}
 
int x = 0, y = 0, s = FIXED_POINT_SCALE * 2, r = 0;
int dx = 0, dy = 0, ds = 0, dr = 0;

uint8_t SAF_loop(void)
{
  SAF_clearScreen(255);

  drawBackground(x,y,s,r);

  x += dx;
  y += dy;
  s += ds;
  r += dr;

  if (SAF_frame() % 64 == 0)
  {
    dx = -2 + SAF_random() % 4;
    dy = -2 + SAF_random() % 4;
    dr = -2 + SAF_random() % 4;
    ds = -8 + SAF_random() % 16;
  }

  if (s < FIXED_POINT_SCALE)
    ds = 5;
  else if (s > FIXED_POINT_SCALE * 3)
    ds = -5;

  return 1;
}
User avatar
drummyfish
 
Posts: 448
Joined: 29 Jul 2018, 20:30
Location: Moravia

Re: [C] Understanding an interesting piece of code

Postby freem » 26 Jan 2023, 18:10

Let me translate the comments:

* "// Multiplicateur (Zoom)." -> "multiplier (zoom)" obviously.
* "// Val sin et cos (Roto)." -> "sin and cos value (roto)"
* "// Point de départ." -> "starting point"
* "// Déformation." -> hm... don't know english word for this... basically, modifying the shape. Transforming, except it's not scaling, rotating, nor translating which are technically transformations, too.
* "// Déplacement des rotations, zooms, scrolls..." -> "moving rotations, zooms, scrolls..."

I doubt this helps though. If there's questions about french stuff, I can certainly help anyway. As for trigonometry and old optimisation tricks, I have some knowledge and understanding, but overall my math level is not very high, and I usually need to focus more than a bit to do 3D transformations, as well as browsing again bookmarks and whatnot to remember what those things (dot products, for example) are used for.
Note that those old tricks were mostly useful in the past for several reasons:

* old CPUs didn't necessarily had a unit dedicated to floating point numbers
* they didn't even had instructions to run parallel instructions like we now have, that is, something like: " for( struct foo const* bar = start; bar != end; ++bar ){ bar.i++; bar.j++; bar.z++; }" can nowadays run all the for loop content in parallel, and it's even better when doing some loop unrolling (have to measure on a case by case though).
* there was obviously a single core
* and having a GPU was not the norm
* there were no SRAM caches in processors, they accessed DRAM modules directly (SRAM: static ram. Cost an eye, but is fast as hell. Used mostly for CPU caches: L1, L2, L3, etc. DRAM: dynamic ram, cost nothing, but is rather slow. This is what you have in those DDR RAM, obviously).

I said those tricks were useful in the past... they still are, but not much (not much, means they still are) on normal desktops or laptops. They can bring you benefits if you write code for micro-controllers though.

Nowadays, for laptops/desktops, a good way to get some speed is by taking care about your memory layout (avoiding cache misses can really help, and this actually means avoiding padding in structures, and using types as small as possible, not to save RAM, but to save bandwidth).
Of course, avoiding cosine, sine or square root calculations is still important, and you might want to avoid doing square roots when you want to compare distances. As always, reducing number of instructions used will lead to improved performances. Will this be worth it though, is a different problem. What can make optimisations not worth it is when it makes the code hard to maintain, while not giving much, if any, performance improvements because the time spent in a routine only represents 0.01% of total time.

It's quite interesting how many people seem to not care much about structures padding or size, and then complains about bad performances...
Having highly specialized structures would allow for both (usually) easier to maintain and faster code.
Easier to maintain, because no need to worry what all those members are for, so you only bother about less stuff, and faster because this means members are close in memory, allowing CPU to store more of them in memory lines (which are usually 64 bytes only, mind you... on modern systems, it means only 8 pointers, so I think it's quite fun to use pointers to address elements in arrays that won't have more than 64K elements: an unsigned short index would save 75% size here... but I'm out of topic now I think?).
Close data in memory also allows compilers to use sse instruction set, sometimes, which is about doing things in parallel. And people sometimes forcefully use SSE instructions, because no, compilers don't know how to optimise everything (really, loop unrolling can notably give pretty impressive perf boosts, but again: measure before and after, so that you can know if it's worth it).
freem
BN Moderator
 
Posts: 106
Joined: 18 May 2015, 19:38

Re: [C] Understanding an interesting piece of code

Postby drummyfish » 26 Jan 2023, 21:04

True, though if you write for microcontrollers things like cache friendliness may not matter because simple MCUs often don't have memory caches -- in the end it all depends on the platform. My approach is to always optimize for the weakest platform I want to support -- if I make it run fast on a weak embedded device, it will definitely run fast on desktop, even if the code isn't optimimal for desktop. On my wiki I keep a list and quick-how-to on optimization I have learned so far: https://www.tastyfish.cz/lrs/optimization.html.
User avatar
drummyfish
 
Posts: 448
Joined: 29 Jul 2018, 20:30
Location: Moravia

Who is online

Users browsing this forum: No registered users and 1 guest