How To Triangle

Discuss RCP-related matter here.
Post Reply
User avatar
GreaseMonkey
Posts: 2
Joined: Fri Jul 15, 2016 4:33 am

How To Triangle

Post by GreaseMonkey » Fri Jul 15, 2016 8:02 pm

Here's a quick explanation.

First of all, the PS1 version of this:

Code: Select all

GP0 = (0x20<<24) | 0xFFFF00);
GP0 = (50<<16)|(120 & 0xFFFF));
GP0 = (130<<16)|(240 & 0xFFFF));
GP0 = (210<<16)|(100 & 0xFFFF));
Got that? Good. Now you can make 2D games that have triangles in them. But if you want to make 3D stuff, you'll have to know a bit more math than that.

Anyway, enough about the PS1, time for the N64.
A lot of code for the N64 just uses the Nintendo microcode. I have no idea how their interface works, though, so I thought I'd learn how the RDP works.

There are 3 things you'll need.
  • RDP_COMMANDS.pdf
  • Nintendo_Ultra64_Programming_Manual_and_Addendums.pdf - namely page 196, absolutely mandatory if you aren't using Fill mode
  • The important note that Sync Load is 0x26 and NOT 0x31. I don't know if we actually need this note just yet.
Unfortunately I cannot find a definition for a left-major triangle as when I start to look things up it ends up talking about the anatomy of a human body. So for all intents and purposes, when it says left-major, it means "put all the minor lines to the left of the major".

Here we are using 320x240 32bpp RGBA, because, well, that's what I've been using.
We are also going to be banging stuff into our RDP buffer one 32-bit word at a time.

I'm also assuming you know how to bang commands into the RDP. When doing it from the main CPU, you need to ensure that it's all 8-byte-aligned, and don't forget to flush!
...or just use the 0xA uncached range like I do because I'm not sure how you flush the cache on this thing just yet.

FLAT-SHADED TRIANGLE

Here's the setup:
  • Set Color Image to where you want it, width field 320-1, 32bpp, RGBA
  • Set Scissor to (0,0) -> (320,240)
  • Set Other Modes: Use "Fill" mode, set bits 35:32 to 0xF (it's just a requirement), everything else should be 0
  • Set Fill Color to the colour you want in 0xRRGGBBAA format
Use 0x08 for your triangle mode, and only use the edge coefficients.

Now, let's get some points!

Code: Select all

int32_t x1 = 120, y1 = 50;
int32_t x2 = 240, y2 = 130;
int32_t x3 = 100, y3 = 210;
...yeah, those'll do.

You will need to order the points from top to bottom - that is, y1 <= y2 <= y3. I am guessing that your function already does this.

So what's required in order to draw a triangle? Well... page 15 of RDP_COMMANDS.pdf will help you here.
Basically, you have a major line, and two minor lines, and they split at a point.

So let's take a look - all X and DX values are s.15.16 fixed point, all Y values are s.11.2 fixed point, and that side flag is 1 bit.
  • YH: The Y position where the triangle starts. (Simple enough, it's Y1.)
  • YM: The Y position where the second minor line starts. (It's Y2! This is looking good.)
  • YL: The Y position where the triangle ends. minor line starts. (Y3. Piece of cake.)
  • XH: The X position where the major line starts. (That's just X1. Easy.)
  • XL: The X position where the second minor line starts. (X2. Pretty straightforward)
  • XM: The X position where the first minor line starts. (OK, a bit weird, but it's Y1 again. Wait, where's X3?)
  • DXHDY: The change in X per Y along the major line. (Uh oh.)
  • DXMDY: The change in X per Y along the first minor line. (Erm...)
  • DXLDY: The change in X per Y along the second minor line. (...help?)
  • Finally, which side the minor split point is on. You can either obtain this from the gradients, or by actually calculating the X of the major line at Y2. The former is easier, but the latter will be beneficial for gouraud shading. (AAAAAAAAAAAAAAAAAAAAAAAAAAAA)
And now I'm going to make this look easy.

Code: Select all

int32_t xgrad12 = ((x2-x1)<<16)/(y2-y1); // DXMDY
int32_t xgrad13 = ((x3-x1)<<16)/(y3-y1); // DXHDY
int32_t xgrad23 = ((x3-x2)<<16)/(y3-y2); // DXLDY
By all means feel free to use better rounding.

The Y division can be done using a LUT and a multiply if you prefer to go that way... and if you do go that way you can cut out the left shift too.

Anyway, the last thing we need to work out is whether we should put the minor on the left or the right. If it goes on the left, LFT=1. If it goes on the right, LFT=0.

We can just compare gradients for this.

And here's how you add a triangle!

Code: Select all

rdp_push(
  (0x08<<24) | (xgrad13 < xgrad12 ? (1<<23) : 0) | ((y3<<2)&0x7FF),
  (((y2<<2)&0x7FF)<<16) | ((y1<<2)&0x7FF));
rdp_push(x2<<16, xgrad23);
rdp_push(x1<<16, xgrad13);
rdp_push(x1<<16, xgrad12);
See? That didn't require any gradient calculations or other fancy linear algebra tricks, did it?

Oh wait, it did.

Anyway, everything should be really easy now.

GOURAUD-SHADED TRIANGLE

Here's the setup:
  • Set Color Image to where you want it, width field 320-1, 32bpp, RGBA
  • Set Scissor to (0,0) -> (320,240)
  • Set Other Modes: Use "1-cycle" mode, set bits 35:32 to 0xF (it's just a requirement), everything else should be 0
  • Set Combine Mode: RGB should be sub_A=8, sub_B=8, mul=16, add=4. Alpha can be anything but might as well use sub_A=7, sub_B=7, mul=7, add=6. Set cycle0 and cycle1 values to be the same as we are in 1-cycle mode.
Use 0x0C for your triangle mode, and use the edge coefficients followed by the shade coefficients.

A note on combine mode. Page 196 of that Ultra64 manual explains what each number means (page 198 of the actual PDF, but page 196 on the printed document). To actually set the mode it's a nuisance as it's all a jumbled mess but here's some code that'll get you nicely set up for 1-cycle mode:

Code: Select all

int cmb_RA = 8 & 15;
int cmb_RB = 8 & 15;
int cmb_RC = 16 & 31;
int cmb_RD = 4 & 7;
int cmb_LA = 7 & 7;
int cmb_LB = 7 & 7;
int cmb_LC = 7 & 7;
int cmb_LD = 7 & 7;
uint32_t cmb_0 = (0x3C<<24);
uint32_t cmb_1 = 0;
cmb_0 |= (cmb_RA<<20) | (cmb_RA<< 5);
cmb_0 |= (cmb_LA<<12) | (cmb_LC<< 9);
cmb_0 |= (cmb_RC<<15) | (cmb_RC<< 0);
cmb_1 |= (cmb_LA<<21) | (cmb_LC<<18);
cmb_1 |= (cmb_RB<<28) | (cmb_RB<<24);
cmb_1 |= (cmb_RD<<15) | (cmb_RD<< 6);
cmb_1 |= (cmb_LB<<12) | (cmb_LB<< 3);
cmb_1 |= (cmb_LD<< 9) | (cmb_LD<< 0);
rdp_push(cmb_0, cmb_1);
Also the mode we're using, in case you're wondering, is (0 - 0) * 0 + shade for RGB, and (0 - 0) * 0 + 1 for Alpha.

Let's get some colours!

Code: Select all

int32_t cr1 = 0x00000000;
int32_t cg1 = 0x00000000;
int32_t cb1 = 0x00FF0000;
int32_t ca1 = 0x00000000;

int32_t cr2 = 0x00000000;
int32_t cg2 = 0x00FF0000;
int32_t cb2 = 0x00000000;
int32_t ca2 = 0x00000000;

int32_t cr3 = 0x00FF0000;
int32_t cg3 = 0x00000000;
int32_t cb3 = 0x00000000;
int32_t ca3 = 0x00000000;
... yeah OK we're calculating the alpha here but it's only for completeness.

Anyway, they're all in s.15.16 format. I think. The fractional part is 16 bits anyway, and you do want to end up with 0xFF and not 0x1 for the integer part.

There are 3 different gradients per colour we have here.
  • DRDX: Change of R per X
  • DRDY: Change of R per Y
  • DRDE: Change of R along major edge
...yeah there's a bit of redundancy. Oddly enough, the easiest thing to calculate here is DRDE:

Code: Select all

int32_t crde = (cr3-cr1)/(y3-y1);
int32_t cgde = (cg3-cg1)/(y3-y1);
int32_t cbde = (cb3-cb1)/(y3-y1);
int32_t cade = (ca3-ca1)/(y3-y1);
DRDY can just be calculated in terms of DRDE and DRDX... wait, where's DRDX?

Well you just need R when we reach Y2 on the major line at Y2! .. wait, how do you get that?

Well you just need the X on the major line at Y2! .. dammit, will this ever end?

...as a matter of fact, yes! Here's how I calculate it:

Code: Select all

int32_t xmajm = (x1<<16)+(y2-y1)*xgrad13;
int32_t xmajmp = xmajm>>16;
xmajm is in s.15.16 format. But we want the X in pixels in an integer form at the moment so xmajmp gets us that.

Next up, we need to get R:

Code: Select all

int32_t crmajm = cr1+((cr3-cr1)*(y2-y1))/(y3-y1);
int32_t cgmajm = cg1+((cg3-cg1)*(y2-y1))/(y3-y1);
int32_t cbmajm = cb1+((cb3-cb1)*(y2-y1))/(y3-y1);
int32_t camajm = ca1+((ca3-ca1)*(y2-y1))/(y3-y1);
... wait that's awful, let's tidy that one up...

Code: Select all

int32_t crmajm = cr1+crde*(y2-y1);
int32_t cgmajm = cg1+cgde*(y2-y1);
int32_t cbmajm = cb1+cbde*(y2-y1);
int32_t camajm = ca1+cade*(y2-y1);
And my sincere apologies to anyone who had to try and decipher the earlier version.

This then brings us to DRDX, where we need to slide from whatever's on the major line to whatever should be at X2,Y2:

Code: Select all

int32_t crdx = (cr2-crmajm)/(x2-xmajmp);
int32_t cgdx = (cg2-cgmajm)/(x2-xmajmp);
int32_t cbdx = (cb2-cbmajm)/(x2-xmajmp);
int32_t cadx = (ca2-camajm)/(x2-xmajmp);
And finally, DRDY. Which is more like DRD-why. Well why?
Well, you seem to get subtle distortion if you get it wrong. At least as far as I know. I keep thinking it's something to do with the scissor test. But you should always calculate this one properly anyway.

Here's what I have:

Code: Select all

int32_t crdy = crde + crdx*(x3-x1)/(y3-y1);
int32_t cgdy = cgde + cgdx*(x3-x1)/(y3-y1);
int32_t cbdy = cbde + cbdx*(x3-x1)/(y3-y1);
int32_t cady = cade + cadx*(x3-x1)/(y3-y1);
I haven't confirmed it as correct, but it seems to work.

Anyway, let's build this triangle!

For the edge coefficients, the only thing different here is we use 0x0C instead of 0x08:

Code: Select all

rdp_push(
  (0x0C<<24) | (xgrad13 < xgrad12 ? (1<<23) : 0) | ((y3<<2)&0x7FF),
  (((y2<<2)&0x7FF)<<16) | ((y1<<2)&0x7FF));
rdp_push(x2<<16, xgrad23);
rdp_push(x1<<16, xgrad13);
rdp_push(x1<<16, xgrad12);
As for the shade coefficients... they like to interleave these. Basically, you have the top halves of 4 numbers, then the bottom halves, then this repeats itself exactly once.

Here's pretty much how I do it:

Code: Select all

rdp_push((cr1 &0xFFFF0000)|(((uint32_t)cg1 )>>16), (cb1 &0xFFFF0000)|(((uint32_t)ca1 )>>16));
rdp_push((crdx&0xFFFF0000)|(((uint32_t)cgdx)>>16), (cbdx&0xFFFF0000)|(((uint32_t)cadx)>>16));
rdp_push((cr1 <<16)|(cg1 &0x0000FFFF), (cb1 <<16)|(ca1 &0x0000FFFF));
rdp_push((crdx<<16)|(cgdx&0x0000FFFF), (cbdx<<16)|(cadx&0x0000FFFF));

rdp_push((crde&0xFFFF0000)|(((uint32_t)cgde)>>16), (cbde&0xFFFF0000)|(((uint32_t)cade)>>16));
rdp_push((crdy&0xFFFF0000)|(((uint32_t)cgdy)>>16), (cbdy&0xFFFF0000)|(((uint32_t)cady)>>16));
rdp_push((crde<<16)|(cgde&0x0000FFFF), (cbde<<16)|(cade&0x0000FFFF));
rdp_push((crdy<<16)|(cgdy&0x0000FFFF), (cbdy<<16)|(cady&0x0000FFFF));
The uint32_t typecast is to ensure that it does a logical shift, as if it does an arithmetic shift you'll trash a whole field.
Don't ask me why SGI interleaved the values here, because I don't know, and I also don't see how it would even be faster.

Anyway, that's about as far as I am through this mess. But if you've been observant, you may notice that you can do a gradient rectangle. It looks a bit like this:

Code: Select all

rdp_push((0x0C<<24) | (1<<23) | (0<<19) | (0<<16) | ((239<<2)<<0), ((239<<2)<<16) | ((0<<2)<<0));
rdp_push(319<<16, 0); // XL (lower)
rdp_push(0<<16, 0); // XH (major)
rdp_push(319<<16, 0); // XM (upper)

rdp_push(0x00000000, 0x00000000);
rdp_push(0x00000000, 0x00000000);
rdp_push(0x80008000, 0x80008000);
rdp_push(0x00000000, 0xCC000000);
rdp_push(0x00000001, 0x00000000);
rdp_push(0x00000001, 0x00000000);
rdp_push(0x00001000, 0x00000000);
rdp_push(0x00001000, 0x00000000);
Special thanks to MarathonMan for making CEN64, marshallh for confirming that my tests actually work on real hardware (even though this specific version of the code isn't tested... but I *did* send a working gouraud triangle!), and angrylion for the RDP emulator that MarathonMan didn't have to write.

User avatar
MarathonMan
Site Admin
Posts: 691
Joined: Fri Oct 04, 2013 4:49 pm

Re: How To Triangle

Post by MarathonMan » Sat Jul 16, 2016 12:04 pm

This is probably the most through, detailed, and open documentation on how to render triangles with the RDP. Cheers. :D

User avatar
izy
Posts: 25
Joined: Tue Jun 02, 2015 11:34 am

How not To Mistake

Post by izy » Sun Jul 17, 2016 10:14 am

Let me ruin the party :D There is a missing link. You turned this:

Code: Select all

    int32_t crmajm = cr1+(cr3-cr1)*(y2-y1)/(y3-y1);
    int32_t cgmajm = cg1+(cg3-cg1)*(y2-y1)/(y3-y1);
    int32_t cbmajm = cb1+(cb3-cb1)*(y2-y1)/(y3-y1);
    int32_t camajm = ca1+(ca3-ca1)*(y2-y1)/(y3-y1);
into this:

Code: Select all

    int32_t crde = (cr3-cr1)/(y3-y1);
    int32_t cgde = (cg3-cg1)/(y3-y1);
    int32_t cbde = (cb3-cb1)/(y3-y1);
    int32_t cade = (ca3-ca1)/(y3-y1);
    int32_t crmajm = cr1+crde*(y2-y1); // cr1+(cr3-cr1)/(y3-y1)*(y2-y1);
    int32_t cgmajm = cg1+cgde*(y2-y1); // cg1+(cg3-cg1)/(y3-y1)*(y2-y1);
    int32_t cbmajm = cb1+cbde*(y2-y1); // cb1+(cb3-cb1)/(y3-y1)*(y2-y1);
    int32_t camajm = ca1+cade*(y2-y1); // ca1+(ca3-ca1)/(y3-y1)*(y2-y1);
In C integer multiplications will usually keep the same bit-width of the input operands causing a division(mod) to be performed (a*b mod 2^max(bits a, bits b)) and integer divisions will always lose information to begin with. That is often the desired behavior and i actually don't know what are the possible values that can be assumed by cr3, cr1, y3, y1, y2, y1 etc.... but it doesn't seem to me that this could be the case (whichever of the two version is the right one, they ain't the same thing).
K=cr1+(cr3-cr1)*(y2-y1)/(y3-y1);
Z=cr1+(cr3-cr1)/(y3-y1)*(y2-y1);
K=Q+A*B/C;
Z=Q+A/C*B;
Assume that for some reasons you get the following Q, A, B and C. Q=0, A=129, B=4, C=4 (and assume that they are unsigned 8-bits numbers)
K=10000001b*4/4
_K=00000100b/4 // the high part is lost ¹
__K=00000001b
___K=1
Z=10000001b/4*4
_Z=00100000b*4 // the low part gets truncated ¹
__Z=10000000b
___Z=128

¹ and it won't come back

Integer and floating-points multiplications, divisions and shifts (again multiplications and divisions by b^N where b is the base of the numeral system) cannot be usually exchanged at all. Although maybe there is no problem in some cases, i don't know what are the possible ranges of all inputs here.
The usual way to solve the issue is to perform first the multiplications (with a type cast) and later the divisions.
OK=(uint16_t)10000001b*4/4
OK=0000000010000001b*4/4
OK=0000001000000100b/4
OK=0000000010000001b
OK=10000001b // 8-bit truncation ok
OK=129

Just in case, instructions such as pmuldq+_mm_mul_epi32 (SSE4.1) for signed 32-bits ints and pmuludq+_mm_mul_epu32 (SSE2) for unsigned 32-bits ints do the conversion automatically (they discard the high 32-bits part of the two 64-bits inputs).

Code: Select all

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
int main(void)
{
    uint32_t A=2147483649u, B=4, C=4, K, Z, OK;
    K=A*B/C;
    Z=A/C*B;
    printf("K = %u\n", K);
    printf("Z = %u\n", Z);
    OK=(uint64_t)A*B/C;
    printf("O = %u ok\n", OK);
    return 0;
}
gcc main.c && ./a.out 
K = 1
Z = 2147483648
O = 2147483649 ok
.... after turning the uint*_t into signed int*_t
K = 1
Z = -2147483644
O = -2147483647 ok

User avatar
GreaseMonkey
Posts: 2
Joined: Fri Jul 15, 2016 4:33 am

Re: How not To Mistake

Post by GreaseMonkey » Sun Jul 17, 2016 6:13 pm

Thanks for the pointers, izy.

Here's something I will definitely point out though:
izy wrote:You turned this:

Code: Select all

    int32_t crmajm = cr1+(cr3-cr1)*(y2-y1)/(y3-y1);
    int32_t cgmajm = cg1+(cg3-cg1)*(y2-y1)/(y3-y1);
    int32_t cbmajm = cb1+(cb3-cb1)*(y2-y1)/(y3-y1);
    int32_t camajm = ca1+(ca3-ca1)*(y2-y1)/(y3-y1);
into this:

Code: Select all

    int32_t crde = (cr3-cr1)/(y3-y1);
    int32_t cgde = (cg3-cg1)/(y3-y1);
    int32_t cbde = (cb3-cb1)/(y3-y1);
    int32_t cade = (ca3-ca1)/(y3-y1);
    int32_t crmajm = cr1+crde*(y2-y1); // cr1+(cr3-cr1)/(y3-y1)*(y2-y1);
    int32_t cgmajm = cg1+cgde*(y2-y1); // cg1+(cg3-cg1)/(y3-y1)*(y2-y1);
    int32_t cbmajm = cb1+cbde*(y2-y1); // cb1+(cb3-cb1)/(y3-y1)*(y2-y1);
    int32_t camajm = ca1+cade*(y2-y1); // ca1+(ca3-ca1)/(y3-y1)*(y2-y1);
The latter code is actually better for this, because you get what the colour *really* is at that specific point. The RDP cannot magically guess what c[rgba]3 is exactly.

There's a closely related issue on the GBA which very blatantly crops up when implementing the "Mode 7" effect, it's explained in 20.4 and 20.5 here: http://www.coranac.com/tonc/text/mode7.htm
Basically, if your calculations are more accurate than the hardware in the wrong places, you can actually end up with worse results.

If you want something more accurate you'd want to try something like this:

Code: Select all

int32_t crde = (2*(cr3-cr1)+(y3-y1))/(2*(y3-y1));
int32_t crdx = (2*(cr2-crmajm)+x2-xmajmp)/(2*(x2-xmajmp));
c[rgba]dy may still benefit from accuracy improvements though. I'm just doing it the "easy way" here, because I'm not actually sure when it's used and it doesn't appear to be used in the case where the whole triangle is on the screen - I think it really only gets used when the major line is clipped. If I'm right about that, then the most accurate c[rgba]dy values should be calculated at the intersection with the major line and the appropriate side of the screen.

User avatar
Narann
Posts: 154
Joined: Mon Jun 16, 2014 4:25 pm
Contact:

Re: How To Triangle

Post by Narann » Mon Jul 25, 2016 9:07 am

Thanks, those are also very useful and very well commented.

Post Reply

Who is online

Users browsing this forum: No registered users and 1 guest