Optimizing the RDP
- MarathonMan
- Site Admin
- Posts: 692
- Joined: Fri Oct 04, 2013 4:49 pm
Optimizing the RDP
AIO has made some excellent points about optimizing the RDP. I have taken some of his algorithms and my own approaches and gotten some performance boosts that are starting to become quite convincing.
With a single-threaded build, the speedup over master is about 2 VI/s on my desktop at the moment. You can download a build with the RDP optimizations here:
http://downloads.cen64.com/cen64-linux6 ... mental_rdp
http://downloads.cen64.com/cen64-linux6 ... mental_rdp
http://downloads.cen64.com/cen64-win64- ... al_rdp.exe
http://downloads.cen64.com/cen64-win64- ... al_rdp.exe
The optimizations do not support SSE2 or SSSE3 builds right now. I may look into this in the future.
- Attachments
- angrylion-rdp_2.png (43.56 KiB)
Re: Optimizing the RDP
That's great progress, thanks for sharing! 

- Snowstorm64
- Posts: 303
- Joined: Sun Oct 20, 2013 8:22 pm
Re: Optimizing the RDP
This is exciting! Do you think these optimizations in the end will yield enough boost so that games can be run at 60 VI/s with the single-threaded build?
OS: Debian GNU/Linux Jessie (8.0)
CPU: Intel i7 4770K @ 3.5 GHz
Build: AVX (compiled from git)
Re: Optimizing the RDP
Snowstorm64 wrote: Do you think these optimizations in the end will yield enough boost so that games can be run at 60 VI/s with the single-threaded build?
A lot of 2D games should be able to run full speed after optimizing. It seems that games which use a lot of rectangles also use the RSP less (Yoshi's Story, Bangaioh, Tower and Shaft, etc.), so multi-threading won't be necessary for many 2D games.
There are certain games that appear to have a variable frame rate, that seem to have 0 chance of running full speed. I'm starting to wonder if these games actually frame-skip (and how often), on the console. I'm thinking maybe the reason they have no chance of full speed is because HLE emulators may be running some of these games at a higher speed. Even after factoring in frame-skip though, some games will still have practically no chance of ever running full speed. I'd appreciate it if someone could confirm how much the N64 frame skips in some of these games with seemingly variable frame rate. Some games I have in mind are Vigilante 8, Goldeneye, and Star Wars Ep 1 Racer.
- MarathonMan
- Site Admin
- Posts: 692
- Joined: Fri Oct 04, 2013 4:49 pm
Re: Optimizing the RDP
Snowstorm64 wrote: This is exciting! Do you think these optimizations in the end will yield enough boost so that games can be run at 60 VI/s with the single-threaded build?
I'll let you know when I'm able to use two threads to run things at 60 VI/s, let alone one.
In all seriousness, AIO is correct. Some 2D titles (Rampage: World Tour) already run at 60 VI/s for me. OTOH, SPLiT's Nacho demo is a different story...
- Snowstorm64
- Posts: 303
- Joined: Sun Oct 20, 2013 8:22 pm
Re: Optimizing the RDP
AIO wrote: I'd appreciate it if someone could confirm how much the N64 frame skips in some of these games with seemingly variable frame rate. Some games I have in mind are Vigilante 8, Goldeneye, and Star Wars Ep 1 Racer.
I think Banjo-Kazooie has that too, and maybe also other Rare games.
On the other hand, Super Mario 64 seems to be more performant with these RDP optimizations and with -multithread option enabled (~5 VI/s boost, average is 50-60 VI/s in most levels, with peaks = 80 VI/s and drops = 40 VI/s). I have to say SM64 is quite playable even at 50 VI/s, at least for me.

- MarathonMan
- Site Admin
- Posts: 692
- Joined: Fri Oct 04, 2013 4:49 pm
Re: Optimizing the RDP
Super Smash Bros. also seems to have gotten a nice boost, at least for me.
- MarathonMan
- Site Admin
- Posts: 692
- Joined: Fri Oct 04, 2013 4:49 pm
Re: Optimizing the RDP
Update 06/23/2016
Looks like a movdqa alignment issue crept into the Linux builds at some point? There is now a 5+% VI/s gain over master.
Overall, performance is not much higher than in the last posting, but it is better in low-VI/s areas.
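For context on the alignment issue mentioned above: movdqa is the SSE2 aligned 128-bit move, and it faults if its memory operand is not 16-byte aligned, whereas movdqu accepts any address. Below is a minimal C sketch of the difference using the standard SSE2 intrinsics. It is not CEN64's code, and the helper names are hypothetical.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: adds up the eight 16-bit lanes of a vector. */
static int32_t sum_lanes(__m128i v) {
  int16_t lanes[8];
  _mm_storeu_si128((__m128i *)lanes, v); /* unaligned store, safe anywhere */
  int32_t sum = 0;
  for (int i = 0; i < 8; i++)
    sum += lanes[i];
  return sum;
}

/* Aligned load (the movdqa form): p must be 16-byte aligned. */
static int32_t sum_aligned(const int16_t *p) {
  assert(((uintptr_t)p & 15) == 0); /* catch misalignment before the load */
  return sum_lanes(_mm_load_si128((const __m128i *)p));
}

/* Unaligned load (the movdqu form): p may point anywhere. */
static int32_t sum_unaligned(const int16_t *p) {
  return sum_lanes(_mm_loadu_si128((const __m128i *)p));
}
```

With a buffer declared as `_Alignas(16) int16_t buf[16]`, passing `buf` or `buf + 8` to `sum_aligned` is fine, while `buf + 1` is only valid for `sum_unaligned`.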
- Attachments
- angrylion-rdp_3.png (43.04 KiB)
- Snowstorm64
- Posts: 303
- Joined: Sun Oct 20, 2013 8:22 pm
Re: Optimizing the RDP
With latest commits, SM64 has just gained another +1 VI/s boost.
But SSB64 is now broken...
EDIT: Paper Mario and Mario Party are causing a segfault too.
EDIT2: This is the faulty commit.
EDIT3: Wrong commit, I have updated the link.

- MarathonMan
- Site Admin
- Posts: 692
- Joined: Fri Oct 04, 2013 4:49 pm
Re: Optimizing the RDP
Snowstorm64 wrote: But SSB64 is now broken...
Looks a-ok to me... can you please provide more info?*
EDIT: Unless it's a segfault... I know the cause of that and just caught it a bit ago myself.
- Snowstorm64
- Posts: 303
- Joined: Sun Oct 20, 2013 8:22 pm
Re: Optimizing the RDP
Looks like I didn't make it in time to edit the message...
However, is this the commit you're referring to as the cause?

- MarathonMan
- Site Admin
- Posts: 692
- Joined: Fri Oct 04, 2013 4:49 pm
Re: Optimizing the RDP
Snowstorm64 wrote: Looks like I didn't make it in time to edit the message... However, is this the commit you're referring to as the cause?
That commit causes a segfault sometimes, yep.
Re: Optimizing the RDP
Snowstorm64 wrote: Do you think these optimizations in the end will yield enough boost so that games can be run at 60 VI/s with the single-threaded build?
What games do you have in mind? I'm willing to profile and examine a few popular games that are very slow, to see what can be done. I'm not really concerned about games like Star Fox, SM64, OOT, etc., because those are relatively lightweight games tbh. Those can already be full speed once optimizations are done. Although I may, at some point, profile the explosions in Star Fox again so that the VI/s don't drop.
- Nintendo Maniac 64
- Posts: 185
- Joined: Fri Oct 04, 2013 11:37 pm
Re: Optimizing the RDP
AIO wrote: I'd appreciate it if someone could confirm how much the N64 frame skips in some of these games with seemingly variable frame rate. Some games I have in mind are Vigilante 8, Goldeneye, and Star Wars Ep 1 Racer.
I don't own Vigilante 8, and I never noticed frame skipping when I last played it 10-15 years ago (but I was much less sensitive to such things), but anyone who's done 3-4 player splitscreen in GoldenEye with any sort of explosive weapon in a level with exploding scenery will know very well that GoldenEye has an extremely variable framerate - it can and will drop down to what has to be like 5fps when things are really blowing up.
And to clarify, the game does not slow down (like some NES games) but rather will become choppy (like most PC games), thereby implying an intentionally variable framerate - it's very noticeable when you're trying to aim your rocket launcher at the attacker(s) causing the mini WW3 around you and one moment your rocket launcher is pointing slightly to the right and then, half a second later when the next frame finally arrives, your giant bazooka barrel is pointing clear across your entire screen to the far left.
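The distinction drawn above, slowdown versus choppiness, can be sketched in C. This is a toy model with made-up frame costs, not a measurement of GoldenEye or any other game; the function names and the 33 ms tick are assumptions for illustration. A fixed-step game advances by one tick per rendered frame, so slow frames make the game itself run slow. A variable-step game advances by the real time each frame took, so it keeps pace but looks choppy.

```c
#include <stddef.h>

/* Fixed-step model: every rendered frame advances the game by tick_ms,
 * no matter how long that frame took to render. */
static long game_ms_fixed_step(size_t frames, long tick_ms) {
  return (long)frames * tick_ms;
}

/* Variable-step model: every rendered frame advances the game by the
 * real time that frame took to render. */
static long game_ms_variable_step(const long *frame_cost_ms, size_t frames) {
  long total = 0;
  for (size_t i = 0; i < frames; i++)
    total += frame_cost_ms[i];
  return total;
}
```

For frame costs of 33, 33, 200 and 200 ms, 466 ms of real time pass. The variable-step game also advances 466 ms, in four visible jumps, while the fixed-step game advances only 132 ms, which is what slowdown looks like.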
CEN64 Forum's resident straight-male kuutsundere
(just "tsundere" makes people think of "Shana clones" *shivers*)
CPU+iGPU: Pentium G3258 @ 4.6GHz/1.281v
dGPU: Radeon HD5870 1GB
RAM: Vengeance 1600 4x4GB
OS: Windows 7
- Snowstorm64
- Posts: 303
- Joined: Sun Oct 20, 2013 8:22 pm
Re: Optimizing the RDP
AIO wrote: What games do you have in mind? I'm willing to profile and examine a few popular games that are very slow, to see what can be done. I'm not really concerned about games like Star Fox, SM64, OOT, etc., because those are relatively lightweight games tbh. Those can already be full speed once optimizations are done. Although I may, at some point, profile the explosions in Star Fox again so that the VI/s don't drop.
Other than the games that are mentioned in this thread (like Vigilante 8), I could think of F-Zero X (the cartridge port of the Expansion Kit, because the N64 version isn't working right now), the Clock Town part in Majora's Mask, and the world hub in Mario Party. There's also Banjo-Kazooie, Doom 64, Star Wars Episode 1 - Racer (this is especially slow!), and Iggy's Reckin' Balls (although I don't think this is a popular game, nor is it slow, but there's a scene after the end of the level where a bunch of colorful explosions makes the VI/s drop, like in Star Fox 64). All those games, except Iggy's Reckin' Balls, rarely pass the 50 VI/s point on my PC, even with -multithread on.
Re: Optimizing the RDP
Nintendo Maniac 64 wrote: I don't own Vigilante 8, and I never noticed frame skipping when I last played it 10-15 years ago (but I was much less sensitive to such things), but anyone who's done 3-4 player splitscreen in GoldenEye with any sort of explosive weapon in a level with exploding scenery will know very well that GoldenEye has an extremely variable framerate - it can and will drop down to what has to be like 5fps when things are really blowing up. And to clarify, the game does not slow down (like some NES games) but rather will become choppy (like most PC games), thereby implying an intentionally variable framerate - it's very noticeable when you're trying to aim your rocket launcher at the attacker(s) causing the mini WW3 around you and one moment your rocket launcher is pointing slightly to the right and then, half a second later when the next frame finally arrives, your giant bazooka barrel is pointing clear across your entire screen to the far left.
I'm glad you brought up Goldeneye. I'm guessing some of these games run really poorly with Angrylion's partly because they also ran poorly on the console.
Snowstorm64 wrote: Other than the games that are mentioned in this thread (like Vigilante 8), I could think of F-Zero X (the cartridge port of the Expansion Kit, because the N64 version isn't working right now), the Clock Town part in Majora's Mask, and the world hub in Mario Party. There's also Banjo-Kazooie, Doom 64, Star Wars Episode 1 - Racer (this is especially slow!), and Iggy's Reckin' Balls (although I don't think this is a popular game, nor is it slow, but there's a scene after the end of the level where a bunch of colorful explosions makes the VI/s drop, like in Star Fox 64). All those games, except Iggy's Reckin' Balls, rarely pass the 50 VI/s point on my PC, even with -multithread on.
Mario Party should be full speed once the optimizations are done. That game isn't too intensive. F-Zero is going to be tougher. I hardly ever tested Iggy's, Banjo, or Doom 64. I haven't tested Clock Town, but I'm sure Zelda MM can run full speed after applying more optimizations.
Star Wars Episode 1 - Racer is another one of those games that may have frameskip on console. This needs to be investigated. Interestingly, some parts of DK64 may also have frameskip (like the part where he misses the vine on most emulators).
- Snowstorm64
- Posts: 303
- Joined: Sun Oct 20, 2013 8:22 pm
Re: Optimizing the RDP
AIO wrote: Mario Party should be full speed once the optimizations are done. That game isn't too intensive. F-Zero is going to be tougher. I hardly ever tested Iggy's, Banjo, or Doom 64. I haven't tested Clock Town, but I'm sure Zelda MM can run full speed after applying more optimizations. Star Wars Episode 1 - Racer is another one of those games that may have frameskip on console. This needs to be investigated. Interestingly, some parts of DK64 may also have frameskip (like the part where he misses the vine on most emulators).
True, Mario Party isn't that intensive, but it becomes slow in that particular place I mentioned before, the world hub with the tube, the bank, the raft and some other buildings. I cannot think of any other similar places where the VI/s drops, though.
Zelda MM is a bit more intensive but it's playable enough; however, it slows down more in Clock Town, especially in the south sector, where it can drop to around 40 VI/s.
Re: Optimizing the RDP
Snowstorm64 wrote: True, Mario Party isn't that intensive, but it becomes slow in that particular place I mentioned before, the world hub with the tube, the bank, the raft and some other buildings. I cannot think of any other similar places where the VI/s drops, though. Zelda MM is a bit more intensive but it's playable enough; however, it slows down more in Clock Town, especially in the south sector, where it can drop to around 40 VI/s.
In some cases it's hard to speculate because, on one hand, there are a lot of optimizations that even I haven't done yet. At the same time, idk exactly how much slower it will be after achieving cycle accuracy. I tested that world hub scene and it has a lot of room for improvement.
An optimized dynarec will even allow you to run games much faster (especially those 2D games). I'll try profiling Clock Town sometime this week.
- Nintendo Maniac 64
- Posts: 185
- Joined: Fri Oct 04, 2013 11:37 pm
Re: Optimizing the RDP
AIO wrote: Star Wars Episode 1 - Racer is another one of those games that may have frameskip on console. This needs to be investigated. Interestingly, some parts of DK64 may also have frameskip (like the part where he misses the vine on most emulators).
I could probably check both of these since I have both games and my N64 is even hooked up and the like, though you might have to wait at least 2 days before I get results.
- MarathonMan
- Site Admin
- Posts: 692
- Joined: Fri Oct 04, 2013 4:49 pm
Re: Optimizing the RDP
AIO wrote: Interestingly, some parts of DK64 may also have frameskip (like the part where he misses the vine on most emulators).
This one instance is not frameskip - it's related to memory and DMA timings. If I fiddle with the memory latency in CEN64, I can get DK to grab the vine.
Re: Optimizing the RDP
Snowstorm64 wrote: Zelda MM is a bit more intensive but it's playable enough; however, it slows down more in Clock Town, especially in the south sector, where it can drop to around 40 VI/s.
I tried running around Clock Town today and the game doesn't seem intensive tbh. I'm honestly surprised you don't get full speed with multi-threading. I profiled and saw that it largely used functions I haven't bothered optimizing yet, which is good news I guess, since that means there is a lot of room for improvement.
Nintendo Maniac 64 wrote: I could probably check both of these since I have both games and my N64 is even hooked up and the like, though you might have to wait at least 2 days before I get results.
Nice! That would be cool if you tested. I'm patient, so you can take your time.
MarathonMan wrote: This one instance is not frameskip - it's related to memory and DMA timings. If I fiddle with the memory latency in CEN64, I can get DK to grab the vine.
I can't say I am sure, but it seems like that part of the game is running extra slow. When using counter factor 1 or 2 in 1964, he misses the vine and the frame rate during that scene seems good. But if I use CF 3, the game runs at a slower framerate during that scene, but DK doesn't miss the vine. When I watched a YouTube video, it seemed that the console also has a bad frame rate in that scene.
- Nintendo Maniac 64
- Posts: 185
- Joined: Fri Oct 04, 2013 11:37 pm
Re: Optimizing the RDP
AIO wrote: Nice! That would be cool if you tested. I'm patient, so you can take your time.
From what I can tell, in both SWEp1R and DK64, the gameplay itself slows down when the framerate drops, thereby implying a non-variable framerate.
However, I think both games might have a variable framerate when running above what seems to be 20fps. I'm less confident that DK64 does this, but I'm pretty sure SWEp1R does.
- MarathonMan
- Site Admin
- Posts: 692
- Joined: Fri Oct 04, 2013 4:49 pm
Re: Optimizing the RDP
Update 07/10/2016
Thought of some more ideas today. Now 7-8% faster over master.
- Attachments
- angrylion-rdp_4.png (40.69 KiB)
- Nintendo Maniac 64
- Posts: 185
- Joined: Fri Oct 04, 2013 11:37 pm
Re: Optimizing the RDP
MarathonMan wrote: 07/10/2016
This is going to seem incredibly off-topic... MarathonMan, maybe I'm thinking of a completely different guy, but I thought you were a native of South America? I say this because only someone native to the US and/or its territories (and maybe Canada or Mexico) would use the M/D/Y format (AFAIK it's Y/M/D or D/M/Y everywhere else).
- MarathonMan
- Site Admin
- Posts: 692
- Joined: Fri Oct 04, 2013 4:49 pm
Re: Optimizing the RDP
Nintendo Maniac 64 wrote: [MarathonMan wrote: 07/10/2016] This is going to seem incredibly off-topic... MarathonMan, maybe I'm thinking of a completely different guy, but I thought you were a native of South America? I say this because only someone native to the US and/or its territories (and maybe Canada or Mexico) would use the M/D/Y format (AFAIK it's Y/M/D or D/M/Y everywhere else).
http://orig04.deviantart.net/95cb/f/201 ... 8flm4d.jpg
- Nintendo Maniac 64
- Posts: 185
- Joined: Fri Oct 04, 2013 11:37 pm
Re: Optimizing the RDP
MarathonMan wrote: http://orig04.deviantart.net/95cb/f/201 ... 8flm4d.jpg
Hey, I myself am a native of northeast Ohio.

- Snowstorm64
- Posts: 303
- Joined: Sun Oct 20, 2013 8:22 pm
Re: Optimizing the RDP
With today's commits:
(Okay, I admit I have cheated a bit with the time trial mode, but hey, it's still good)
- MarathonMan
- Site Admin
- Posts: 692
- Joined: Fri Oct 04, 2013 4:49 pm
Re: Optimizing the RDP
Have you noticed any problem with frameskipping? It looks like games are sometimes cutting frames (e.g., SPLiT's Nacho demo). I have to compare to the console and verify.
- Snowstorm64
- Posts: 303
- Joined: Sun Oct 20, 2013 8:22 pm
Re: Optimizing the RDP
MarathonMan wrote: Have you noticed any problem with frameskipping? It looks like games are sometimes cutting frames (e.g., SPLiT's Nacho demo). I have to compare to the console and verify.
I haven't tried too many games, but I believe what you are saying about frameskipping could be true. But it's hard for me to compare the overclocked 60 Hz versions on CEN64 with the standard 50 Hz versions of the same games on my N64...