angrylion RDP plugin almost threaded

MarathonMan
Site Admin
Posts: 692
Joined: Fri Oct 04, 2013 4:49 pm

angrylion RDP plugin almost threaded

Post by MarathonMan » Sat Feb 06, 2016 5:22 pm

(Hopefully) I will be able to stabilize this. It isn't working for any first party ROMs yet, but when it does work, the results are quite fantastic.

I have an experimental branch where:
-multithread will use RSP/VI for one core, RDP for a second, and VR4300/AI/PI/SI for the third.

A fourth thread is used for rendering the framebuffer to the window, but that's already merged into everything. :)
Attachment: trithread.png (601.9 KiB)
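
The three-way split described in this post could be sketched roughly as below, assuming POSIX threads. This is only an illustration: `device_group`, `run_group` and `run_all_groups` are hypothetical names, and the loop body is a placeholder rather than CEN64's actual device-stepping code.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical grouping of emulated devices per host thread, as listed in the post. */
enum { GROUP_RSP_VI, GROUP_RDP, GROUP_VR4300_AI_PI_SI, NUM_GROUPS };

struct device_group {
  const char *name;
  unsigned long cycles_run; /* stand-in for real device state */
};

/* Placeholder body: a real emulator would step its devices here. */
static void *run_group(void *opaque) {
  struct device_group *group = opaque;
  for (int i = 0; i < 1000; i++)
    group->cycles_run++;
  return NULL;
}

/* Starts one host thread per group and waits for all of them.
 * Returns 0 on success, -1 if a thread could not be created. */
static int run_all_groups(struct device_group *groups) {
  pthread_t threads[NUM_GROUPS];
  int started = 0, status = 0;

  assert(groups != NULL);
  for (int i = 0; i < NUM_GROUPS; i++) {
    if (pthread_create(&threads[i], NULL, run_group, &groups[i]) != 0) {
      status = -1;
      break;
    }
    started++;
  }
  for (int i = 0; i < started; i++)
    pthread_join(threads[i], NULL);
  return status;
}
```

Each thread only touches its own group here, so no locking is needed; the hard part in a real emulator is the state the groups share, which this sketch deliberately leaves out.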

Snowstorm64
Posts: 303
Joined: Sun Oct 20, 2013 8:22 pm

Re: angrylion RDP plugin almost threaded

Post by Snowstorm64 » Sat Feb 06, 2016 5:27 pm

Nice! How much does it impact the emulator's accuracy (and performance)?
OS: Debian GNU/Linux Jessie (8.0)
CPU: Intel i7 4770K @ 3.5 GHz
Build: AVX (compiled from git)

MarathonMan
Site Admin
Posts: 692
Joined: Fri Oct 04, 2013 4:49 pm

Re: angrylion RDP plugin almost threaded

Post by MarathonMan » Sat Feb 06, 2016 5:37 pm

Snowstorm64 wrote:Nice! How much does it impact the emulator's accuracy (and performance)?
Accuracy: well, it boots like... nothing. So quite bad right now lol.

Huge performance impact. I can hit 60VI/s in ROMs that I never could before on my ultrabook.

MarathonMan
Site Admin
Posts: 692
Joined: Fri Oct 04, 2013 4:49 pm

Re: angrylion RDP plugin almost threaded

Post by MarathonMan » Sat Feb 06, 2016 7:14 pm

Seems to work for Star Fox 64, too:

https://www.youtube.com/watch?v=9BR9JLGEzDI

Snowstorm64
Posts: 303
Joined: Sun Oct 20, 2013 8:22 pm

Re: angrylion RDP plugin almost threaded

Post by Snowstorm64 » Sat Feb 06, 2016 8:27 pm

Super Mario 64 works too; its graphics are a bit unstable, but it's playable. I managed to play it overclocked (up to almost double the speed!) and get two stars before the game crashed. :D
Attachments: SM64multithreaded.png (80.42 KiB), SM64multithreaded2.png (83.6 KiB), SM64multithreaded3.png (99.11 KiB)
Last edited by Snowstorm64 on Sat Feb 06, 2016 8:31 pm, edited 3 times in total.
OS: Debian GNU/Linux Jessie (8.0)
CPU: Intel i7 4770K @ 3.5 GHz
Build: AVX (compiled from git)

Snowstorm64
Posts: 303
Joined: Sun Oct 20, 2013 8:22 pm

Re: angrylion RDP plugin almost threaded

Post by Snowstorm64 » Sat Feb 06, 2016 8:28 pm

Attachments: SM64multithreaded4.png (58.18 KiB), SM64multithreaded5.png (63.73 KiB)
OS: Debian GNU/Linux Jessie (8.0)
CPU: Intel i7 4770K @ 3.5 GHz
Build: AVX (compiled from git)

iwasaperson
Posts: 49
Joined: Tue Apr 22, 2014 12:50 am

Re: angrylion RDP plugin almost threaded

Post by iwasaperson » Sat Feb 06, 2016 8:43 pm

Getting 90 VI/s on my 6600K. We may need a frame limiter now.

MarathonMan
Site Admin
Posts: 692
Joined: Fri Oct 04, 2013 4:49 pm

Re: angrylion RDP plugin almost threaded

Post by MarathonMan » Sat Feb 06, 2016 8:47 pm

Snowstorm64 wrote:Super Mario 64 works too; its graphics are a bit unstable, but it's playable. I managed to play it overclocked (up to almost double the speed!) and get two stars before the game crashed. :D
Nice find!! I am at 60VI/s on this game as well!
iwasaperson wrote:Getting 90 VI/s on my 6600K. We may need a frame limiter now.
First world problems... :p

Turn on v-sync in the meantime? My system will lock at 60VI/s because my display configuration is set up for 60 Hz.

I can't really debug a frame limiter, because I literally cannot get it to go over 60VI/s anyways...

iwasaperson
Posts: 49
Joined: Tue Apr 22, 2014 12:50 am

Re: angrylion RDP plugin almost threaded

Post by iwasaperson » Sat Feb 06, 2016 8:50 pm

MarathonMan wrote:
Snowstorm64 wrote:Super Mario 64 works too; its graphics are a bit unstable, but it's playable. I managed to play it overclocked (up to almost double the speed!) and get two stars before the game crashed. :D
Nice find!! I am at 60VI/s on this game as well!
iwasaperson wrote:Getting 90 VI/s on my 6600K. We may need a frame limiter now.
First world problems... :p

Turn on v-sync in the meantime? My system will lock at 60VI/s because my display configuration is set up for 60 Hz.

I can't really debug a frame limiter, because I literally cannot get it to go over 60VI/s anyways...
Using a CRT at 93Hz with Linux (OSS Intel drivers), so that's not an option.

MarathonMan
Site Admin
Posts: 692
Joined: Fri Oct 04, 2013 4:49 pm

Re: angrylion RDP plugin almost threaded

Post by MarathonMan » Sat Feb 06, 2016 8:52 pm

iwasaperson wrote:Using a CRT at 93Hz with Linux (OSS Intel drivers), so that's not an option.
Ah I see.

And I rescind my comment - I can debug it; I'll just have to debug it in a headless mode.

tl;dr: on the TODO list it goes.
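
For reference, a frame limiter of the kind discussed above might look something like this minimal sketch. It assumes a POSIX system with `clock_nanosleep`; the `frame_limiter` struct and the `limiter_*` functions are hypothetical names, not anything from CEN64.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <errno.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Hypothetical limiter state: the absolute time the next VI may be shown. */
struct frame_limiter {
  struct timespec next;
  long frame_ns; /* nanoseconds per VI, e.g. 1e9 / 60 */
};

/* Advances a timespec by ns nanoseconds, keeping tv_nsec in [0, 1e9). */
static void timespec_add_ns(struct timespec *t, long ns) {
  assert(ns >= 0);
  t->tv_sec += ns / NSEC_PER_SEC;
  t->tv_nsec += ns % NSEC_PER_SEC;
  if (t->tv_nsec >= NSEC_PER_SEC) {
    t->tv_nsec -= NSEC_PER_SEC;
    t->tv_sec++;
  }
}

static void limiter_init(struct frame_limiter *fl, unsigned vi_per_sec) {
  assert(vi_per_sec > 0);
  fl->frame_ns = NSEC_PER_SEC / vi_per_sec;
  clock_gettime(CLOCK_MONOTONIC, &fl->next);
  timespec_add_ns(&fl->next, fl->frame_ns);
}

/* Call once per VI: sleeps until the deadline, then schedules the next one.
 * If the emulator is running slow, the deadline is already in the past and
 * the sleep returns immediately. */
static void limiter_wait(struct frame_limiter *fl) {
  /* An absolute deadline avoids drift; retry if a signal interrupts the sleep. */
  while (clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &fl->next, NULL) == EINTR)
    ;
  timespec_add_ns(&fl->next, fl->frame_ns);
}
```

Because it only depends on the monotonic clock, a limiter like this can be exercised without a window at all, which fits the headless debugging idea mentioned above.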

iwasaperson
Posts: 49
Joined: Tue Apr 22, 2014 12:50 am

Re: angrylion RDP plugin almost threaded

Post by iwasaperson » Sat Feb 06, 2016 9:15 pm

MarathonMan wrote: Ah I see.

And I rescind my comment - I can debug it; I'll just have to debug it in a headless mode.

tl;dr: on the TODO list it goes.
Sounds good.
Really impressed with the spike in progress lately.

MarathonMan
Site Admin
Posts: 692
Joined: Fri Oct 04, 2013 4:49 pm

Re: angrylion RDP plugin almost threaded

Post by MarathonMan » Sat Feb 06, 2016 10:12 pm

iwasaperson wrote:
MarathonMan wrote: Ah I see.

And I rescind my comment - I can debug it; I'll just have to debug it in a headless mode.

tl;dr: on the TODO list it goes.
Sounds good.
Really impressed with the spike in progress lately.
Thanks! :mrgreen:

Snowstorm64
Posts: 303
Joined: Sun Oct 20, 2013 8:22 pm

Re: angrylion RDP plugin almost threaded

Post by Snowstorm64 » Sun Feb 07, 2016 12:51 pm

I wonder if at this point we could afford a tighter sync on the various components, in order to achieve better accuracy, or not...Maybe an option to set the looseness of the sync?
OS: Debian GNU/Linux Jessie (8.0)
CPU: Intel i7 4770K @ 3.5 GHz
Build: AVX (compiled from git)

asiga
Posts: 24
Joined: Fri May 30, 2014 5:35 pm

Re: angrylion RDP plugin almost threaded

Post by asiga » Sun Feb 07, 2016 5:51 pm

Has anyone tried World Driver Championship? IIRC, Angrylion RDP could emulate it, at least partially.

MarathonMan
Site Admin
Posts: 692
Joined: Fri Oct 04, 2013 4:49 pm

Re: angrylion RDP plugin almost threaded

Post by MarathonMan » Mon Feb 08, 2016 2:04 am

Snowstorm64 wrote:I wonder if at this point we could afford a tighter sync on the various components, in order to achieve better accuracy, or not...Maybe an option to set the looseness of the sync?
Yes, it is absolutely possible to tighten the sync on higher end systems.

It's probably cheap to detect the user's hardware and set this accordingly, or something like that.

Narann
Posts: 154
Joined: Mon Jun 16, 2014 4:25 pm
Contact:

Re: angrylion RDP plugin almost threaded

Post by Narann » Mon Feb 08, 2016 6:24 pm

Good job!

I was wondering: with threading in place, is there any hope of getting numbers on how much computing time each thread group takes?

I'm interested in the RDP runtime cost.

Thanks in advance! :)

MarathonMan
Site Admin
Posts: 692
Joined: Fri Oct 04, 2013 4:49 pm

Re: angrylion RDP plugin almost threaded

Post by MarathonMan » Mon Feb 08, 2016 10:54 pm

No, I haven't profiled it at all.

I can say that when I went from RCP thread + VR4300 thread to the "tri-thread" solution, the Mario head in the Super Mario 64 intro went from about 45 VI/s to ~60 VI/s. So quite a leap.

Nintendo Maniac 64
Posts: 185
Joined: Fri Oct 04, 2013 11:37 pm

Re: angrylion RDP plugin almost threaded

Post by Nintendo Maniac 64 » Tue Feb 09, 2016 11:33 pm

Hey, maybe N64 overclocking can actually be useful in Cen64 now. :P

Anyway, if the previous 2-thread multithreading could be called the equivalent of bsnes's "balanced" profile, then perhaps this would be the equivalent of bsnes's "performance" profile?
CEN64 Forum's resident straight-male kuutsundere
(just "tsundere" makes people think of "Shana clones" *shivers*)

CPU+iGPU: Pentium G3258 @ 4.6GHz/1.281v
dGPU: Radeon HD5870 1GB
RAM: Vengeance 1600 4x4GB
OS: Windows 7

tony971
Posts: 15
Joined: Sun Feb 01, 2015 1:02 pm

Re: angrylion RDP plugin almost threaded

Post by tony971 » Tue Feb 16, 2016 3:40 pm


Narann
Posts: 154
Joined: Mon Jun 16, 2014 4:25 pm
Contact:

Re: angrylion RDP plugin almost threaded

Post by Narann » Tue Feb 16, 2016 3:51 pm

Unfortunately, this would not really help Cen64 (nor the angrylion plugin), as it only relies on OpenGL for window rendering. Vulkan would not improve performance.

asiga
Posts: 24
Joined: Fri May 30, 2014 5:35 pm

Re: angrylion RDP plugin almost threaded

Post by asiga » Tue Feb 16, 2016 6:10 pm

Narann wrote:Unfortunately, this would not really help Cen64 (nor the angrylion plugin), as it only relies on OpenGL for window rendering. Vulkan would not improve performance.
OpenGL and Vulkan are different concepts. GPUs are no longer graphics accelerators, but massively parallel SIMD machines, designed for general purpose computing provided it fits a massively parallel flow. Yes, Vulkan has a graphics/gaming flavor, but it's much more low-level than any other GPU API.

The bad news about Vulkan is that it comes at a very bad moment: tech companies no longer wish to establish standards, but proprietary APIs. Apple fights for Metal and has even invented its own language (Swift). NVIDIA pushes CUDA and tries to pretend OpenCL doesn't exist. AMD pushes Vulkan, which is mainly based on Mantle (an AMD API), and that's not good: I'm not sure how friendly other vendors are going to be toward an API coming from AMD.

Bad times for standards :cry:

Narann
Posts: 154
Joined: Mon Jun 16, 2014 4:25 pm
Contact:

Re: angrylion RDP plugin almost threaded

Post by Narann » Tue Feb 16, 2016 6:37 pm

I know what Vulkan is; it does not change the fact that using it in Cen64 is pointless.

Snowstorm64
Posts: 303
Joined: Sun Oct 20, 2013 8:22 pm

Re: angrylion RDP plugin almost threaded

Post by Snowstorm64 » Tue Feb 16, 2016 10:15 pm

Well, it's true that Angrylion RDP is software rendering, and CEN64's VI is the only place here that uses OpenGL (and it's quite simple and minimal), along with the backends in the os directory. Still, I wonder how well a Vulkan-based RDP (different from Angrylion's) would do against the software rendering, though...
OS: Debian GNU/Linux Jessie (8.0)
CPU: Intel i7 4770K @ 3.5 GHz
Build: AVX (compiled from git)

MarathonMan
Site Admin
Posts: 692
Joined: Fri Oct 04, 2013 4:49 pm

Re: angrylion RDP plugin almost threaded

Post by MarathonMan » Wed Feb 17, 2016 3:55 am

I think it would be quite bad still. You would need a way to effectively map the GPU's view of memory and the CPU's view of memory in the same spot.

It doesn't matter how flashy your API is, there's still a lot of latency that has to be accounted for somewhere.

Narann
Posts: 154
Joined: Mon Jun 16, 2014 4:25 pm
Contact:

Re: angrylion RDP plugin almost threaded

Post by Narann » Wed Feb 17, 2016 4:54 pm

Snowstorm64 wrote:Well, it's true that Angrylion RDP is software rendering, and CEN64's VI is the only place here that uses OpenGL (and it's quite simple and minimal), along with the backends in the os directory. Still, I wonder how well a Vulkan-based RDP (different from Angrylion's) would do against the software rendering, though...
The problem is what accuracy you want to achieve using the GPU (I'm not talking about cycle accuracy, more pixel/depth accuracy). If you rely on the GPU's internal rasterizer (the one exposed by Vulkan/OpenGL), you are clearly doing HLE and will fight against it to get consistent results.

If you don't rely on it, you (roughly) have to write shaders to emulate the N64 rasterizer, write the result to a texture (your emulated framebuffer) and rasterize the texture. Emulating the N64 rasterizer on the CPU is tough. While emulating the N64 rasterizer using SPIR-V would be possible, what would you expect to gain?

Better resolution? Maybe, but this time you will fight against the "original" values to keep the increased resolution consistent. It can be tricky, but it's still possible (it's what an HLE RDP does). What if the R4300 CPU is supposed to modify the framebuffer image (some games use this for fog effects) and this image is not in native resolution?

Performance? Not sure: the written image will be in GPU memory, meaning you would have to retrieve it back to CPU RAM to continue R4300 emulation. Whatever the framebuffer resolution is (native or not), you will have to wait for the framebuffer to be back in CPU memory before allowing the R4300 to modify it. Original N64 hardware does not have to wait for anything, as the memory is unified (both CPU and RDP share the same memory).

So while it's possible, I'm not sure it's worth the effort. I would rely on CPU threads to improve rasterizer performance, each thread rendering a few lines/tiles of pixels to the framebuffer, and use Vulkan/OpenGL to reproduce the CRT screen and, why not, VI emulation. This would provide a true N64 rendering experience.

But if you go for an HLE RDP, Vulkan is definitely the way to go. By design, Vulkan should avoid the classic fight against erratic OpenGL implementations. You could also write your own internal low-level SPIR-V generator, giving you more control compared to vendor-specific GLSL implementation behavior (I don't know how fixed-point values can be handled; I should check this in the SPIR-V specs).

Don't get me wrong: you will still have to fight against the GPU rasterizer, because what the GPU rasterizer provides (bigger resolution, etc.) will obviously hit (not so) corner cases. Depth and framebuffer read/write access from the CPU are the dark beasts of HLE. All of them can be solved, as usual, using hacks.

My 2 cts.

wareya
Posts: 16
Joined: Tue May 19, 2015 5:44 pm

Re: angrylion RDP plugin almost threaded

Post by wareya » Thu Feb 18, 2016 1:02 am

Just emulate the entire system on GPU /s

Snowstorm64
Posts: 303
Joined: Sun Oct 20, 2013 8:22 pm

Re: angrylion RDP plugin almost threaded

Post by Snowstorm64 » Thu Feb 18, 2016 11:57 am

Narann wrote:-cut-
What about LLE? Isn't z64gl supposed to be low level hardware rendering, and thus able to emulate the graphics without the problems that HLE encounters, like you said before? Or am I wrong? If someone makes a Vulkan-based LLE RDP, could this be more accurate than anything else except software rendering?
OS: Debian GNU/Linux Jessie (8.0)
CPU: Intel i7 4770K @ 3.5 GHz
Build: AVX (compiled from git)

Narann
Posts: 154
Joined: Mon Jun 16, 2014 4:25 pm
Contact:

Re: angrylion RDP plugin almost threaded

Post by Narann » Thu Feb 18, 2016 3:22 pm

Snowstorm64 wrote:What about LLE? Isn't z64gl supposed to be low level hardware rendering, and thus able to emulate the graphics without the problems that HLE encounters, like you said before? Or am I wrong? If someone makes a Vulkan-based LLE RDP, could this be more accurate than anything else except software rendering?
I should check again, but from what I remember, z64gl is not an LLE RDP per se but an HLE RDP relying on an LLE RSP.

Actually, the original release post confirms this:
it is mainly an RDP emulator implemented in OpenGL. Contrary to usual graphics plugins, it doesn't emulate the RSP part, so it requires a functionnal RSP emulator plugin to give any results.
What does it change? Once again, the rasterizer is not "accurate" (in the sense of "properly emulated") because it relies on the local GPU's one. I would be interested to know if the red point works in Pokemon Snap with z64gl, as this game needs a properly emulated depth buffer.

On real hardware, the RSP and RDP work together, but most HLE RDP plugins (e.g. Rice Video) "catch" some "RDP-related" RSP GBI commands (e.g. the matrix stack) and emulate them the "high level" way (e.g. emulating the matrix stack with true IEEE 754 floats). This leads to inaccurate (but very fast) matrix results.
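
To make the float-versus-fixed-point point concrete, here is a small sketch assuming an s15.16 fixed-point format (which is how N64 matrices are usually described). The `fix16` helpers are hypothetical, and the truncating rounding is an assumption; the real microcode may round differently.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t fix16; /* s15.16: sign + 15 integer bits, 16 fraction bits */

/* Converts to fixed point, truncating toward zero. */
static fix16 fix16_from_double(double v) {
  assert(v > -32768.0 && v < 32768.0);
  return (fix16)(v * 65536.0);
}

static double fix16_to_double(fix16 v) {
  return (double)v / 65536.0;
}

/* Multiplies in 64 bits, then drops the extra 16 fraction bits.
 * Division truncates toward zero (an assumption, see above). */
static fix16 fix16_mul(fix16 a, fix16 b) {
  return (fix16)(((int64_t)a * (int64_t)b) / 65536);
}
```

With these helpers, 0.1 is stored as 6553/65536, and `fix16_mul` of that value with itself gives 655/65536 (about 0.009995) rather than 0.01. Small differences of this kind are what accumulate when a plugin swaps fixed-point matrix math for floats.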

There is no sharp boundary between HLE and LLE, so don't get confused by those definitions, especially in the case of the Reality CoProcessor, which is composed of multiple components (RCP = RSP + RDP + <other things actually>).

So, if you use Vulkan to do an LLE plugin "à la z64gl" (aka: "needs an LLE RSP to work but is not an LLE RDP per se"), you will definitely have more control but will not avoid the hacks here and there.

If you use Vulkan to do a "true" LLE RDP plugin (à la angrylion), you will not benefit from the API because of everything discussed above (which are not problems related to the API but to how CPU+GPU works).

The only situation where you "could" actually write an LLE RDP plugin using the local GPU is on unified memory architectures, writing bare-metal (architecture-specific) GPU code (the Raspberry Pi with its VC4 is a good example). So you treat the hardware as fixed, but your code will never work elsewhere. You don't rely on any API; you write directly to GPU registers (as if you were writing a driver...). This is a massive amount of work (I have no idea if it would even work) and... yeah, nobody would ever do this, but if you have to write an LLE RDP on a console, this is the way to go. This leads to the situation where (IMHO) an LLE RDP plugin relying on a proper CPU threading model could be far easier to write and maintain while keeping good performance.

asiga
Posts: 24
Joined: Fri May 30, 2014 5:35 pm

Re: angrylion RDP plugin almost threaded

Post by asiga » Thu Feb 18, 2016 5:47 pm

wareya wrote:Just emulate the entire system on GPU /s
Indeed (well, maybe not the whole system, but the RDP could be 100% implemented on the GPU in a 100% pixel-exact way). The fact that the rest of the N64 "emulators" use fixed-operation OpenGL and completely trash the N64 experience has created a generally bad opinion about using GPUs for emulation. However, GPUs can be programmed nowadays just like you write C, and you can even get IEEE fp accuracy compliance if you ask for it. The RDP is designed for per-pixel operations and per-vertex operations, so the GPU's massive parallelism fits the scenario. Yes, and with 100% pixel accuracy. But of course not using OpenGL.

Narann
Posts: 154
Joined: Mon Jun 16, 2014 4:25 pm
Contact:

Re: angrylion RDP plugin almost threaded

Post by Narann » Thu Feb 18, 2016 6:42 pm

asiga wrote:well, maybe not the whole system, but the RDP could be 100% implemented on the GPU in a 100% pixel-exact way).
If it's even possible, it will be quite hard. For example, the depth value takes multiple fixed-point formats along the RDP pipeline. Simulating this can be tricky even on the CPU. Why would you bother to hack this on the GPU side? What would you gain compared to the CPU? Performance? I already explained above why that's not relevant.
asiga wrote:The fact that the rest of the N64 "emulators" use fixed-operation OpenGL and completely trash the N64 experience has created a generally bad opinion about using GPUs for emulation.
No, the fact that the N64 architecture is very different from any modern console's does :mrgreen: . Trust me, if the RDP could be emulated easily using the GPU, everyone would jump on this solution.
asiga wrote:However, GPUs can be programmed nowadays just like you write C
Writing a kernel (or a shader) in C is not complicated, yes, but that doesn't mean the complexity is gone: you still have to handle CPU<=>GPU memory. Something you don't have to do in a CPU-only solution.
asiga wrote:you can even get IEEE fp accuracy compliance if you ask for it.
For proper RDP emulation you actually never want that.
asiga wrote:The RDP is designed for per-pixel operations
True
asiga wrote:and per-vertex operations
False :mrgreen: : RDP commands actually never deal with vertices directly (RSP commands do); they use "rasterizer coefficients" computed by the RSP. The RDP has no idea where each vertex is.
asiga wrote:so the GPU's massive parallelism fits the scenario. Yes, and with 100% pixel accuracy. But of course not using OpenGL.
But the original question remains: what do you expect from using the GPU? Why is the GPU so good? Performance? Once again, if you want accurate, hack-free results you need native resolution, so the number of pixels to compute is not that high. Plus, doing so, you will have to sync GPU and CPU memory. While the amount of data is not big, simply accessing GPU memory in a non-unified memory architecture can decrease performance a lot.

TLDR: While possible, it's complex for virtually no benefit.

asiga
Posts: 24
Joined: Fri May 30, 2014 5:35 pm

Re: angrylion RDP plugin almost threaded

Post by asiga » Sat Feb 20, 2016 6:18 am

Narann wrote: But the original question remains: what do you expect from using the GPU? Why is the GPU so good? Performance? Once again, if you want accurate, hack-free results you need native resolution, so the number of pixels to compute is not that high. Plus, doing so, you will have to sync GPU and CPU memory. While the amount of data is not big, simply accessing GPU memory in a non-unified memory architecture can decrease performance a lot.

TLDR: While possible, it's complex for virtually no benefit.
Well, I'm not going to try to convince anybody about the benefits of general purpose computing on GPU. Even Intel is jumping on this bandwagon by applying GPU concepts and design to their Xeon Phi (Knights Landing/Knights Hill/etc) HPC products.

Your comments show you hold a position against exploiting GPUs for accelerating general purpose algorithms. I could argue that all current-generation (and previous-generation) GPUs fully support running general purpose C algorithms. For sure, GPU<>CPU memory transfers can ruin your performance, but once you have the program and data running on the GPU, transfers can be minimized to the amount really needed. Also note that the N64 framebuffer is just a few MB. Just detect any write/read by the CPU in the framebuffer address range, and do the transfer only when needed, using a fast memory mode. Note that games where the CPU doesn't access the framebuffer might not even need any CPU<>GPU transfer at all (except for vertex data; yes, sorry, that's the RSP, which, by the way, would benefit even more from a GPU implementation).

Anyway, as I said, you do have a position, and I'm not going to try to convince you. One reason for choosing your way is that it's a bad time for standards, as I said in a previous post. For example, Pixar is in the process of adding GPU acceleration to RenderMan (note that RenderMan really needs full C programmability; it cannot be done with OpenGL or "shaders"), but they're studying how to do so, and I guess it's because there's currently a fight of standards in this area and they wish to make a wise choice in terms of future support (CUDA vs OpenCL vs Metal vs C++ translation engines; see GPUOpen by AMD).
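
The "transfer only when the CPU touches the framebuffer" idea from this post could be sketched as below. Everything here is hypothetical (`fb_tracker`, the hook names, the address window), and the counters merely stand in for real GPU readbacks and uploads; it is a bookkeeping illustration, not a working GPU backend.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct fb_tracker {
  uint32_t base, size;  /* framebuffer window in emulated RDRAM */
  bool gpu_copy_newer;  /* GPU rendered since the last readback */
  bool cpu_copy_newer;  /* CPU wrote since the last upload */
  unsigned readbacks, uploads; /* counters standing in for real transfers */
};

static void fb_tracker_init(struct fb_tracker *t, uint32_t base, uint32_t size) {
  assert(size > 0);
  t->base = base;
  t->size = size;
  t->gpu_copy_newer = t->cpu_copy_newer = false;
  t->readbacks = t->uploads = 0;
}

/* Unsigned wraparound turns the subtraction into a single range check. */
static bool fb_contains(const struct fb_tracker *t, uint32_t addr) {
  return (uint32_t)(addr - t->base) < t->size;
}

/* Called when the GPU-side RDP is about to draw. */
static void fb_on_gpu_render(struct fb_tracker *t) {
  if (t->cpu_copy_newer) { /* push pending CPU writes to the GPU first */
    t->uploads++;
    t->cpu_copy_newer = false;
  }
  t->gpu_copy_newer = true;
}

/* Called on every emulated CPU load/store. */
static void fb_on_cpu_access(struct fb_tracker *t, uint32_t addr, bool is_write) {
  if (!fb_contains(t, addr))
    return;
  if (t->gpu_copy_newer) { /* this is where the CPU would stall on a readback */
    t->readbacks++;
    t->gpu_copy_newer = false;
  }
  if (is_write)
    t->cpu_copy_newer = true;
}
```

The readback branch is where the latency raised earlier in the thread would be paid: a game that never touches the window never stalls, while one that reads it every frame stalls every frame.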

Nintendo Maniac 64
Posts: 185
Joined: Fri Oct 04, 2013 11:37 pm

Re: angrylion RDP plugin almost threaded

Post by Nintendo Maniac 64 » Sun Feb 21, 2016 8:53 pm

Narann wrote:The only situation where you "could" actually write an LLE RDP plugin using the local GPU is on unified memory architectures, writing bare-metal (architecture-specific) GPU code (the Raspberry Pi with its VC4 is a good example).
...so basically HSA on a modern AMD APU?
CEN64 Forum's resident straight-male kuutsundere
(just "tsundere" makes people think of "Shana clones" *shivers*)

CPU+iGPU: Pentium G3258 @ 4.6GHz/1.281v
dGPU: Radeon HD5870 1GB
RAM: Vengeance 1600 4x4GB
OS: Windows 7

Narann
Posts: 154
Joined: Mon Jun 16, 2014 4:25 pm
Contact:

Re: angrylion RDP plugin almost threaded

Post by Narann » Mon Feb 22, 2016 2:57 am

Nintendo Maniac 64 wrote:...so basically HSA on a modern AMD APU?
It could be a good candidate yes!

Nintendo Maniac 64
Posts: 185
Joined: Fri Oct 04, 2013 11:37 pm

Re: angrylion RDP plugin almost threaded

Post by Nintendo Maniac 64 » Mon Feb 22, 2016 7:07 pm

Narann wrote:
Nintendo Maniac 64 wrote:...so basically HSA on a modern AMD APU?
It could be a good candidate yes!
The only thing is that MarathonMan has previously stated that Cen64 is so latency-sensitive that SMT is actually faster than two separate CPU cores, so the likes of an on-die GPU (even with shared memory and all) would be slower still due to even worse latency.
CEN64 Forum's resident straight-male kuutsundere
(just "tsundere" makes people think of "Shana clones" *shivers*)

CPU+iGPU: Pentium G3258 @ 4.6GHz/1.281v
dGPU: Radeon HD5870 1GB
RAM: Vengeance 1600 4x4GB
OS: Windows 7

Narann
Posts: 154
Joined: Mon Jun 16, 2014 4:25 pm
Contact:

Re: angrylion RDP plugin almost threaded

Post by Narann » Mon Feb 22, 2016 7:50 pm

The only thing is that MarathonMan has previously stated that Cen64 is so latency-sensitive that SMT is actually faster than two separate CPU cores
In which situation?

Not sure I get the point. What was MarathonMan trying to do with the CPU cores? r4300i on one core + RSP on another? Or use multiple cores for r4300i emulation? The first situation should improve performance; the second one can actually be slower, yes.

In the LLE RDP situation, what I suggest (aka: what I'm trying to achieve) is to use one core for a particular set of framebuffer lines. So roughly: 4 cores for a 320x240 framebuffer means Core0 emulates the RDP on lines 0 to 59, Core1 on 60 to 119, Core2 on 120 to 179, Core3 on 180 to 239.

Very roughly, it's as if you were running four angrylion RDP plugins, each of them rendering a particular set of lines. Because there are no dependencies between pixels, you should be able to improve performance.
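
The line split described above can be sketched as a small helper. The names `line_range` and `lines_for_worker` are hypothetical, and the sketch assumes each worker would replay the same RDP command list restricted to its own band of lines.

```c
#include <assert.h>
#include <stddef.h>

/* Half-open range of scanlines [first, end) owned by one worker. */
struct line_range {
  size_t first, end;
};

/* Splits `height` lines across `workers` as evenly as possible; the first
 * (height % workers) workers each get one extra line. */
static struct line_range lines_for_worker(size_t height, size_t workers,
                                          size_t index) {
  struct line_range r;
  size_t base, extra;

  assert(workers > 0 && index < workers);
  base = height / workers;
  extra = height % workers;
  r.first = index * base + (index < extra ? index : extra);
  r.end = r.first + base + (index < extra ? 1 : 0);
  return r;
}
```

For 240 lines and 4 workers this gives [0, 60), [60, 120), [120, 180) and [180, 240), with no line rendered twice.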

MarathonMan
Site Admin
Posts: 692
Joined: Fri Oct 04, 2013 4:49 pm

Re: angrylion RDP plugin almost threaded

Post by MarathonMan » Mon Feb 22, 2016 11:01 pm

SMT does result in lower latency and still is not good enough [if you want to sync the cores every cycle and remain 100% accurate].

Multithreading as it's implemented now only works due to the fact that I found a way to sacrifice a very small amount of accuracy for a disproportionally large amount of parallelism.
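
The thread doesn't say how this trade-off is actually made in CEN64, so the following is only one generic way to loosen synchronization, not MarathonMan's method: each worker simulates a batch of cycles and then meets the others at a barrier. It assumes POSIX barriers are available (they are on Linux), and `sync_worker`, `run_with_interval` and `sync_interval` are hypothetical names.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define MAX_WORKERS 8

struct sync_worker {
  pthread_barrier_t *barrier;  /* shared by all workers */
  unsigned long total_cycles;  /* cycles each worker must simulate */
  unsigned long sync_interval; /* cycles between rendezvous; 1 = lockstep */
  unsigned long cycles_done;
  unsigned long syncs;
};

static void *sync_worker_run(void *opaque) {
  struct sync_worker *w = opaque;
  while (w->cycles_done < w->total_cycles) {
    unsigned long left = w->total_cycles - w->cycles_done;
    unsigned long batch = left < w->sync_interval ? left : w->sync_interval;
    w->cycles_done += batch; /* stand-in for stepping a device `batch` cycles */
    pthread_barrier_wait(w->barrier); /* wait for the other workers */
    w->syncs++;
  }
  return NULL;
}

/* Runs `count` workers that rendezvous every `sync_interval` cycles. */
static int run_with_interval(struct sync_worker *workers, unsigned count,
                             unsigned long total_cycles,
                             unsigned long sync_interval) {
  pthread_barrier_t barrier;
  pthread_t threads[MAX_WORKERS];

  assert(count > 0 && count <= MAX_WORKERS && sync_interval > 0);
  if (pthread_barrier_init(&barrier, NULL, count) != 0)
    return -1;
  for (unsigned i = 0; i < count; i++) {
    workers[i] = (struct sync_worker){&barrier, total_cycles, sync_interval, 0, 0};
    /* A partial start would deadlock the barrier, so treat failure as fatal. */
    if (pthread_create(&threads[i], NULL, sync_worker_run, &workers[i]) != 0)
      abort();
  }
  for (unsigned i = 0; i < count; i++)
    pthread_join(threads[i], NULL);
  pthread_barrier_destroy(&barrier);
  return 0;
}
```

With an interval of 1, every cycle pays for a cross-thread rendezvous; with an interval of 64, the same 1000 cycles need only 16. That interval is the kind of knob a "looseness of the sync" option could expose.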

Nintendo Maniac 64
Posts: 185
Joined: Fri Oct 04, 2013 11:37 pm

Re: angrylion RDP plugin almost threaded

Post by Nintendo Maniac 64 » Mon Feb 22, 2016 11:52 pm

MarathonMan wrote:Multithreading as it's implemented now only works due to the fact that I found a way to sacrifice a very small amount of accuracy for a disproportionally large amount of parallelism.
So basically what bsnes's "Balanced" core does with regards to performance vs accuracy.

(I do realize bsnes is single-threaded though)
CEN64 Forum's resident straight-male kuutsundere
(just "tsundere" makes people think of "Shana clones" *shivers*)

CPU+iGPU: Pentium G3258 @ 4.6GHz/1.281v
dGPU: Radeon HD5870 1GB
RAM: Vengeance 1600 4x4GB
OS: Windows 7

wareya
Posts: 16
Joined: Tue May 19, 2015 5:44 pm

Re: angrylion RDP plugin almost threaded

Post by wareya » Mon Mar 07, 2016 4:34 pm

bsnes is single-core, but it does use "threads" in a sense - cooperative multithreading. It's basically a linear state machine, but with the interleaving done by "inner" code instead of the "outer" code, sort of.

Nintendo Maniac 64
Posts: 185
Joined: Fri Oct 04, 2013 11:37 pm

Re: angrylion RDP plugin almost threaded

Post by Nintendo Maniac 64 » Tue Mar 08, 2016 12:32 am

wareya wrote:bsnes is single-core
Which is why I said "I do realize bsnes is single-threaded though".

My point was the idea of a large speedup with minimal sacrifice in accuracy (as far as I can tell, the only real-world difference other than performance between single and multi-threaded in Cen64 is that some games fail to boot).
CEN64 Forum's resident straight-male kuutsundere
(just "tsundere" makes people think of "Shana clones" *shivers*)

CPU+iGPU: Pentium G3258 @ 4.6GHz/1.281v
dGPU: Radeon HD5870 1GB
RAM: Vengeance 1600 4x4GB
OS: Windows 7

MarathonMan
Site Admin
Posts: 692
Joined: Fri Oct 04, 2013 4:49 pm

Re: angrylion RDP plugin almost threaded

Post by MarathonMan » Tue Jul 12, 2016 12:47 am

I nearly stabilized this commit. There is the occasional total freeze after a few minutes of play with some games, but I'm confident that I'll be able to figure it out. I'll toss up a YouTube video in a bit.

Getting 60VI/s on these titles:
Goldeneye 007
Vigilante 8
Mario Tennis (on court, not in menus)
Banjo Kazooie
Super Smash Bros.

etc...

EDIT: Enjoy! https://www.youtube.com/watch?v=Jy8IOxcj8r4

Narann
Posts: 154
Joined: Mon Jun 16, 2014 4:25 pm
Contact:

Re: angrylion RDP plugin almost threaded

Post by Narann » Tue Jul 12, 2016 6:31 am

That's truly impressive! o_O

I'm surprised SSB lags so much; it was supposed to be a graphically cheap game that should actually be fast.

MarathonMan
Site Admin
Posts: 692
Joined: Fri Oct 04, 2013 4:49 pm

Re: angrylion RDP plugin almost threaded

Post by MarathonMan » Tue Jul 12, 2016 12:20 pm

Narann wrote:I'm surprised SSB lags so much; it was supposed to be a graphically cheap game that should actually be fast.
I think I was only recording at 30FPS :oops:

It's definitely running 1:1 FPS to VI/s.

iwasaperson
Posts: 49
Joined: Tue Apr 22, 2014 12:50 am

Re: angrylion RDP plugin almost threaded

Post by iwasaperson » Tue Jul 12, 2016 1:38 pm

MarathonMan wrote:I nearly stabilized this commit. There is the occasional total freeze after a few minutes of play with some games, but I'm confident that I'll be able to figure it out. I'll toss up a YouTube video in a bit.

Getting 60VI/s on these titles:
Goldeneye 007
Vigilante 8
Mario Tennis (on court, not in menus)
Banjo Kazooie
Super Smash Bros.

etc...

EDIT: Enjoy! https://www.youtube.com/watch?v=Jy8IOxcj8r4
Wow. Single threaded runs about the same for me on my 6600K (OCed 200MHz with turbo boost), so a multithreaded RDP would probably run full speed all the time.

Also, I've noticed that CEN64 wants to use the JACK audio server. What advantages does this have over ALSA for emulation? I already use JACK for audio production, so I just have it on whenever I'm running CEN64 anyway.

A couple of other questions: why does CEN64 seem to run slower at the save select screen than in the actual 3D parts, in both OoT and Majora's Mask? Also, when will the VI filter be added as an option? The dithering is getting annoying, and the libretro guys already figured out how to do it with shaders: https://github.com/libretro/common-shad ... 4-vifilter

EDIT: https://ipfs.pics/ipfs/QmbZkhSpC3NDSKeo ... qXfikp7CNY
What's going on here? I know my ROM is fine since I got it from the GoodSet and it works perfectly on my EverDrive. I also instantly died when the intro finished.

Snowstorm64
Posts: 303
Joined: Sun Oct 20, 2013 8:22 pm

Re: angrylion RDP plugin almost threaded

Post by Snowstorm64 » Tue Jul 12, 2016 2:07 pm

iwasaperson wrote: Also, I've noticed that CEN64 wants to use the JACK audio server. What advantages does this have over ALSA for emulation? I already use JACK for audio production, so I just have it on whenever I'm running CEN64 anyway.
CEN64 uses OpenAL, not JACK. I had some issues with that a while ago; it turned out that OpenAL Soft wasn't configured properly. I fixed it using the tool "alsoft-conf" (which you can find in the repository under the same name) and pointed it to the backend I use (PulseAudio, but it can also be JACK or ALSA).
iwasaperson wrote: EDIT: https://ipfs.pics/ipfs/QmbZkhSpC3NDSKeo ... qXfikp7CNY
What's going on here? I know my ROM is fine since I got it from the GoodSet and it works perfectly on my EverDrive. I also instantly died when the intro finished.
This is because the FlashRAM save isn't loaded into CEN64; Majora's Mask needs it in order to work properly. To load it, launch CEN64 from a shell with something like this:

Code:

cen64 -flash tlozmajorasmask.fla pifdata.bin tlozmajorasmask.z64
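If you have several FlashRAM games, a small wrapper can save some typing. This is only a sketch: the function name `cen64_flash_cmd` is made up, and it assumes you keep the save next to the ROM with the same base name and a `.fla` extension, as in the command above. It prints the command instead of running it, so you can check it first.

```shell
#!/bin/sh
# Hypothetical helper (not part of CEN64): print the launch command for a
# ROM, deriving the FlashRAM save name from the ROM file name.
cen64_flash_cmd() {
    pif=$1                  # PIF ROM image, e.g. pifdata.bin
    rom=$2                  # game ROM, e.g. tlozmajorasmask.z64
    save="${rom%.*}.fla"    # strip the last extension, append .fla
    printf 'cen64 -flash %s %s %s\n' "$save" "$pif" "$rom"
}

cen64_flash_cmd pifdata.bin tlozmajorasmask.z64
```

The printed line is for display only; file names containing spaces would need quoting before pasting it into a shell.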
OS: Debian GNU/Linux Jessie (8.0)
CPU: Intel i7 4770K @ 3.5 GHz
Build: AVX (compiled from git)

iwasaperson
Posts: 49
Joined: Tue Apr 22, 2014 12:50 am

Re: angrylion RDP plugin almost threaded

Post by iwasaperson » Tue Jul 12, 2016 3:05 pm

Snowstorm64 wrote:
iwasaperson wrote: Also, I've noticed that CEN64 wants to use the JACK audio server. What advantages does this have over ALSA for emulation? I already use JACK for audio production, so I just have it on whenever I'm running CEN64 anyway.
CEN64 uses OpenAL, not JACK. I had some issues with that a while ago; it turned out that OpenAL Soft wasn't configured properly. I fixed it using the tool "alsoft-conf" (which you can find in the repository under the same name) and pointed it at the backend I use (PulseAudio, but it can also be JACK or ALSA).
It already works just fine without JACK running. I guess it falls back to ALSA in that case. Also, alsoft-conf doesn't give JACK as an option; it's just using ALSA ATM.
iwasaperson wrote: EDIT: https://ipfs.pics/ipfs/QmbZkhSpC3NDSKeo ... qXfikp7CNY
What's going on here? I know my ROM is fine since I got it from the GoodSet and it works perfectly on my EverDrive. I also instantly died when the intro finished.
This is because the FlashRAM save isn't loaded into CEN64; Majora's Mask needs it in order to work properly. To load it, launch CEN64 from a shell with something like this:

Code:

cen64 -flash tlozmajorasmask.fla pifdata.bin tlozmajorasmask.z64
That worked. Thanks.

Nintendo Maniac 64
Posts: 185
Joined: Fri Oct 04, 2013 11:37 pm

Re: angrylion RDP plugin almost threaded

Post by Nintendo Maniac 64 » Tue Jul 19, 2016 5:24 pm

MarathonMan wrote:Getting 60VI/s on these titles:
Goldeneye 007
Super Smash Bros.
Just how much CPU headroom is left over (if any) on your particular CPU model? I'm particularly interested in whether overclocking leaves enough headroom for 60VI/s to still be possible even if CEN64 is compiled to run at overclocked N64 speeds (125MHz).
CEN64 Forum's resident straight-male kuutsundere
(just "tsundere" makes people think of "Shana clones" *shivers*)

CPU+iGPU: Pentium G3258 @ 4.6GHz/1.281v
dGPU: Radeon HD5870 1GB
RAM: Vengeance 1600 4x4GB
OS: Windows 7

CluelessGuy
Posts: 2
Joined: Sun Jul 24, 2016 1:55 am

Re: angrylion RDP plugin almost threaded

Post by CluelessGuy » Mon Jul 25, 2016 3:41 am

Can someone break this down for me? I'm not super tech savvy. Does this mean CEN64 will now take advantage of systems with multiple cores, whereas before it was only using one? I've got an 18-core machine, with each core clocked at around 2.3 GHz. Should I expect very good performance moving forward with this update?

asiga
Posts: 24
Joined: Fri May 30, 2014 5:35 pm

Re: angrylion RDP plugin almost threaded

Post by asiga » Mon Jul 25, 2016 6:50 am

CluelessGuy wrote:Can someone break this down for me? I'm not super tech savvy. Does this mean CEN64 will now take advantage of systems with multiple cores, whereas before it was only using one? I've got an 18-core machine, with each core clocked at around 2.3 GHz. Should I expect very good performance moving forward with this update?
I'm also curious about how many cores CEN64 can use, because I'm going to buy a new machine next month and, although it may seem excessive to some, CEN64 could play a role in deciding how many cores my machine has.

MarathonMan
Site Admin
Posts: 692
Joined: Fri Oct 04, 2013 4:49 pm

Re: angrylion RDP plugin almost threaded

Post by MarathonMan » Mon Jul 25, 2016 7:16 am

CluelessGuy wrote:Can someone break this down for me? I'm not super tech savvy. Does this mean CEN64 will now take advantage of systems with multiple cores, whereas before it was only using one? I've got an 18-core machine, with each core clocked at around 2.3 GHz. Should I expect very good performance moving forward with this update?
Yes, it can take advantage of up to ~3-4 cores. However, CEN64 prefers a few fast cores over lots of lower-clocked ones. A 2.3 GHz 18-core CPU is not ideal for high performance.
asiga wrote:I'm also curious about how many cores can be used by CEN64
I don't see it really scaling beyond a quad core. Only 3 of the cores are really "busy" (the fourth just handles the GUI, renders the screen, etc.). For the best performance, you want a true quad core, though -- there's a notable difference between a dual-core hyperthreaded CPU and one that actually has 4 cores.

In the future, I hope to use fewer cores once I finish the rewrite. The rule of thumb is probably thus: get the highest clocked quad core you can find. If you're on a budget, get the highest clocked dual core with hyperthreading you can find.

CluelessGuy
Posts: 2
Joined: Sun Jul 24, 2016 1:55 am

Re: angrylion RDP plugin almost threaded

Post by CluelessGuy » Mon Jul 25, 2016 8:27 am

MarathonMan wrote:
CluelessGuy wrote:Can someone break this down for me? I'm not super tech savvy. Does this mean CEN64 will now take advantage of systems with multiple cores, whereas before it was only using one? I've got an 18-core machine, with each core clocked at around 2.3 GHz. Should I expect very good performance moving forward with this update?
Yes, it can take advantage of up to ~3-4 cores. However, CEN64 prefers a few fast cores over lots of lower-clocked ones. A 2.3 GHz 18-core CPU is not ideal for high performance.
asiga wrote:I'm also curious about how many cores can be used by CEN64
I don't see it really scaling beyond a quad core. Only 3 of the cores are really "busy" (the fourth just handles the GUI, renders the screen, etc.). For the best performance, you want a true quad core, though -- there's a notable difference between a dual-core hyperthreaded CPU and one that actually has 4 cores.

In the future, I hope to use fewer cores once I finish the rewrite. The rule of thumb is probably thus: get the highest clocked quad core you can find. If you're on a budget, get the highest clocked dual core with hyperthreading you can find.

Thanks for the information and all of the hard work on this emulator. I've been watching the project for a long time and I'm so glad to see your progress on it.
