First of all, the PS1 version of this:

Code: Select all

```
GP0 = (0x20<<24) | 0xFFFF00);
GP0 = (50<<16)|(120 & 0xFFFF));
GP0 = (130<<16)|(240 & 0xFFFF));
GP0 = (210<<16)|(100 & 0xFFFF));
```

Anyway, enough about the PS1, time for the N64.

A lot of code for the N64 just uses the Nintendo microcode. I have no idea how their interface works, though, so I thought I'd learn how the RDP works.

There are 3 things you'll need.

- RDP_COMMANDS.pdf
- Nintendo_Ultra64_Programming_Manual_and_Addendums.pdf - namely page 196, absolutely mandatory if you aren't using Fill mode
- The important note that Sync Load is 0x26 and NOT 0x31. I don't know if we actually need this note just yet.

Here we are using 320x240 32bpp RGBA, because, well, that's what I've been using.

We are also going to be banging stuff into our RDP buffer one 32-bit word at a time.

I'm also assuming you know how to bang commands into the RDP. When doing it from the main CPU, you need to ensure that it's all 8-byte-aligned, and don't forget to flush!

...or just use the 0xA uncached range like I do because I'm not sure how you flush the cache on this thing just yet.

FLAT-SHADED TRIANGLE

Here's the setup:

- Set Color Image to where you want it, width field 320-1, 32bpp, RGBA
- Set Scissor to (0,0) -> (320,240)
- Set Other Modes: Use "Fill" mode, set bits 35:32 to 0xF (it's just a requirement), everything else should be 0
- Set Fill Color to the colour you want in 0xRRGGBBAA format

Now, let's get some points!

Code: Select all

```
int32_t x1 = 120, y1 = 50;
int32_t x2 = 240, y2 = 130;
int32_t x3 = 100, y3 = 210;
```

You will need to order the points from top to bottom - that is, y1 <= y2 <= y3. I am guessing that your function already does this.

So what's required in order to draw a triangle? Well... page 15 of RDP_COMMANDS.pdf will help you here.

Basically, you have a major line, and two minor lines, and they split at a point.

So let's take a look - all X and DX values are s.15.16 fixed point, all Y values are s.11.2 fixed point, and that side flag is 1 bit.

- YH: The Y position where the triangle starts. (Simple enough, it's Y1.)
- YM: The Y position where the second minor line starts. (It's Y2! This is looking good.)
- YL: The Y position where the triangle ends. minor line starts. (Y3. Piece of cake.)
- XH: The X position where the major line starts. (That's just X1. Easy.)
- XL: The X position where the second minor line starts. (X2. Pretty straightforward)
- XM: The X position where the first minor line starts. (OK, a bit weird, but it's Y1 again. Wait, where's X3?)
- DXHDY: The change in X per Y along the major line. (Uh oh.)
- DXMDY: The change in X per Y along the first minor line. (Erm...)
- DXLDY: The change in X per Y along the second minor line. (...help?)
- Finally, which side the minor split point is on. You can either obtain this from the gradients, or by actually calculating the X of the major line at Y2. The former is easier, but the latter will be beneficial for gouraud shading. (AAAAAAAAAAAAAAAAAAAAAAAAAAAA)

Code: Select all

```
int32_t xgrad12 = ((x2-x1)<<16)/(y2-y1); // DXMDY
int32_t xgrad13 = ((x3-x1)<<16)/(y3-y1); // DXHDY
int32_t xgrad23 = ((x3-x2)<<16)/(y3-y2); // DXLDY
```

The Y division can be done using a LUT and a multiply if you prefer to go that way... and if you do go that way you can cut out the left shift too.

Anyway, the last thing we need to work out is whether we should put the minor on the left or the right. If it goes on the left, LFT=1. If it goes on the right, LFT=0.

We can just compare gradients for this.

And here's how you add a triangle!

Code: Select all

```
rdp_push(
(0x08<<24) | (xgrad13 < xgrad12 ? (1<<23) : 0) | ((y3<<2)&0x7FF),
(((y2<<2)&0x7FF)<<16) | ((y1<<2)&0x7FF));
rdp_push(x2<<16, xgrad23);
rdp_push(x1<<16, xgrad13);
rdp_push(x1<<16, xgrad12);
```

Oh wait, it did.

Anyway, everything should be really easy now.

GOURAUD-SHADED TRIANGLE

Here's the setup:

- Set Color Image to where you want it, width field 320-1, 32bpp, RGBA
- Set Scissor to (0,0) -> (320,240)
- Set Other Modes: Use "1-cycle" mode, set bits 35:32 to 0xF (it's just a requirement), everything else should be 0
- Set Combine Mode: RGB should be sub_A=8, sub_B=8, mul=16, add=4. Alpha can be anything but might as well use sub_A=7, sub_B=7, mul=7, add=6. Set cycle0 and cycle1 values to be the same as we are in 1-cycle mode.

A note on combine mode. Page 196 of that Ultra64 manual explains what each number means (page 198 of the actual PDF, but page 196 on the printed document). To actually set the mode it's a nuisance as it's all a jumbled mess but here's some code that'll get you nicely set up for 1-cycle mode:

Code: Select all

```
int cmb_RA = 8 & 15;
int cmb_RB = 8 & 15;
int cmb_RC = 16 & 31;
int cmb_RD = 4 & 7;
int cmb_LA = 7 & 7;
int cmb_LB = 7 & 7;
int cmb_LC = 7 & 7;
int cmb_LD = 7 & 7;
uint32_t cmb_0 = (0x3C<<24);
uint32_t cmb_1 = 0;
cmb_0 |= (cmb_RA<<20) | (cmb_RA<< 5);
cmb_0 |= (cmb_LA<<12) | (cmb_LC<< 9);
cmb_0 |= (cmb_RC<<15) | (cmb_RC<< 0);
cmb_1 |= (cmb_LA<<21) | (cmb_LC<<18);
cmb_1 |= (cmb_RB<<28) | (cmb_RB<<24);
cmb_1 |= (cmb_RD<<15) | (cmb_RD<< 6);
cmb_1 |= (cmb_LB<<12) | (cmb_LB<< 3);
cmb_1 |= (cmb_LD<< 9) | (cmb_LD<< 0);
rdp_push(cmb_0, cmb_1);
```

Let's get some colours!

Code: Select all

```
int32_t cr1 = 0x00000000;
int32_t cg1 = 0x00000000;
int32_t cb1 = 0x00FF0000;
int32_t ca1 = 0x00000000;
int32_t cr2 = 0x00000000;
int32_t cg2 = 0x00FF0000;
int32_t cb2 = 0x00000000;
int32_t ca2 = 0x00000000;
int32_t cr3 = 0x00FF0000;
int32_t cg3 = 0x00000000;
int32_t cb3 = 0x00000000;
int32_t ca3 = 0x00000000;
```

Anyway, they're all in s.15.16 format. I think. The fractional part is 16 bits anyway, and you do want to end up with 0xFF and not 0x1 for the integer part.

There are 3 different gradients per colour we have here.

- DRDX: Change of R per X
- DRDY: Change of R per Y
- DRDE: Change of R along major edge

Code: Select all

```
int32_t crde = (cr3-cr1)/(y3-y1);
int32_t cgde = (cg3-cg1)/(y3-y1);
int32_t cbde = (cb3-cb1)/(y3-y1);
int32_t cade = (ca3-ca1)/(y3-y1);
```

Well you just need R when we reach Y2 on the major line at Y2! .. wait, how do you get that?

Well you just need the X on the major line at Y2! .. dammit, will this ever end?

...as a matter of fact, yes! Here's how I calculate it:

Code: Select all

```
int32_t xmajm = (x1<<16)+(y2-y1)*xgrad13;
int32_t xmajmp = xmajm>>16;
```

Next up, we need to get R:

Code: Select all

```
int32_t crmajm = cr1+((cr3-cr1)*(y2-y1))/(y3-y1);
int32_t cgmajm = cg1+((cg3-cg1)*(y2-y1))/(y3-y1);
int32_t cbmajm = cb1+((cb3-cb1)*(y2-y1))/(y3-y1);
int32_t camajm = ca1+((ca3-ca1)*(y2-y1))/(y3-y1);
```

Code: Select all

```
int32_t crmajm = cr1+crde*(y2-y1);
int32_t cgmajm = cg1+cgde*(y2-y1);
int32_t cbmajm = cb1+cbde*(y2-y1);
int32_t camajm = ca1+cade*(y2-y1);
```

This then brings us to DRDX, where we need to slide from whatever's on the major line to whatever should be at X2,Y2:

Code: Select all

```
int32_t crdx = (cr2-crmajm)/(x2-xmajmp);
int32_t cgdx = (cg2-cgmajm)/(x2-xmajmp);
int32_t cbdx = (cb2-cbmajm)/(x2-xmajmp);
int32_t cadx = (ca2-camajm)/(x2-xmajmp);
```

Well, you seem to get subtle distortion if you get it wrong. At least as far as I know. I keep thinking it's something to do with the scissor test. But you should always calculate this one properly anyway.

Here's what I have:

Code: Select all

```
int32_t crdy = crde + crdx*(x3-x1)/(y3-y1);
int32_t cgdy = cgde + cgdx*(x3-x1)/(y3-y1);
int32_t cbdy = cbde + cbdx*(x3-x1)/(y3-y1);
int32_t cady = cade + cadx*(x3-x1)/(y3-y1);
```

Anyway, let's build this triangle!

For the edge coefficients, the only thing different here is we use 0x0C instead of 0x08:

Code: Select all

```
rdp_push(
(0x0C<<24) | (xgrad13 < xgrad12 ? (1<<23) : 0) | ((y3<<2)&0x7FF),
(((y2<<2)&0x7FF)<<16) | ((y1<<2)&0x7FF));
rdp_push(x2<<16, xgrad23);
rdp_push(x1<<16, xgrad13);
rdp_push(x1<<16, xgrad12);
```

Here's pretty much how I do it:

Code: Select all

```
rdp_push((cr1 &0xFFFF0000)|(((uint32_t)cg1 )>>16), (cb1 &0xFFFF0000)|(((uint32_t)ca1 )>>16));
rdp_push((crdx&0xFFFF0000)|(((uint32_t)cgdx)>>16), (cbdx&0xFFFF0000)|(((uint32_t)cadx)>>16));
rdp_push((cr1 <<16)|(cg1 &0x0000FFFF), (cb1 <<16)|(ca1 &0x0000FFFF));
rdp_push((crdx<<16)|(cgdx&0x0000FFFF), (cbdx<<16)|(cadx&0x0000FFFF));
rdp_push((crde&0xFFFF0000)|(((uint32_t)cgde)>>16), (cbde&0xFFFF0000)|(((uint32_t)cade)>>16));
rdp_push((crdy&0xFFFF0000)|(((uint32_t)cgdy)>>16), (cbdy&0xFFFF0000)|(((uint32_t)cady)>>16));
rdp_push((crde<<16)|(cgde&0x0000FFFF), (cbde<<16)|(cade&0x0000FFFF));
rdp_push((crdy<<16)|(cgdy&0x0000FFFF), (cbdy<<16)|(cady&0x0000FFFF));
```

Don't ask me why SGI interleaved the values here, because I don't know, and I also don't see how it would even be faster.

Anyway, that's about as far as I am through this mess. But if you've been observant, you may notice that you can do a gradient rectangle. It looks a bit like this:

Code: Select all

```
rdp_push((0x0C<<24) | (1<<23) | (0<<19) | (0<<16) | ((239<<2)<<0), ((239<<2)<<16) | ((0<<2)<<0));
rdp_push(319<<16, 0); // XL (lower)
rdp_push(0<<16, 0); // XH (major)
rdp_push(319<<16, 0); // XM (upper)
rdp_push(0x00000000, 0x00000000);
rdp_push(0x00000000, 0x00000000);
rdp_push(0x80008000, 0x80008000);
rdp_push(0x00000000, 0xCC000000);
rdp_push(0x00000001, 0x00000000);
rdp_push(0x00000001, 0x00000000);
rdp_push(0x00001000, 0x00000000);
rdp_push(0x00001000, 0x00000000);
```