Short burst of progress...

MarathonMan
Site Admin
Posts: 692
Joined: Fri Oct 04, 2013 4:49 pm

Post by MarathonMan » Tue Oct 21, 2014 9:41 am

Short burst of progress:
  • Wrote the scalar unit, CP0, etc. of the new RSP core this weekend. Completely redesigned, using the same optimization techniques that proved advantageous to the VR4300. Due to the RSP's smaller instruction set on the scalar side, there may be an even greater increase in performance over what the VR4300 saw. I have an additional technique that I haven't tried that I may be able to apply to the SSE vector operations, too. Even without that, the new core is shaping up to pummel both the performance and compatibility of the old core.
  • Did a handful of micro-optimizations to the VR4300 core and main loop of CEN64, which resulted in a >5% improvement on my two test machines (different architectures, speeds, etc.).
  • CEN64 now simulates MCI (multi-cycle instruction) interlocks properly. Heavy FPU-based ROMs especially benefited in both accuracy and performance after I began simulating the interlock.
  • Merged angrylion's newest batch of RDP fixes and optimizations, which also resulted in a performance boost. Unfortunately, as shown by cen64-backport, this seems to have broken some ROMs, such as DK64. Others, like Zelda: OoT, run noticeably faster.
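For readers unfamiliar with the term, here is a minimal sketch of what simulating an MCI interlock involves. It assumes a single shared multi-cycle unit (for example, an FPU divider) and made-up occupancy values; `struct insn` and `count_cycles` are hypothetical names, not CEN64's code. An instruction that needs the unit while it is still busy stalls until the unit frees up, while unrelated instructions keep issuing in the meantime.

```c
#include <stddef.h>

/* One simulated instruction. `occupancy` is how many cycles it keeps a
 * shared multi-cycle unit busy (illustrative values, not real VR4300
 * latencies); 0 means the instruction does not use that unit. */
struct insn {
  unsigned occupancy;
};

/* Count cycles for a straight-line instruction stream, modeling an MCI
 * interlock: an instruction needing the busy unit stalls until it frees. */
unsigned long count_cycles(const struct insn *prog, size_t n) {
  unsigned long cycles = 0;
  unsigned busy = 0; /* cycles left before the multi-cycle unit is free */

  for (size_t i = 0; i < n; i++) {
    if (prog[i].occupancy > 0) {
      cycles += busy;               /* interlock: account for the whole stall at once */
      cycles += 1;                  /* issue cycle */
      busy = prog[i].occupancy - 1; /* unit stays busy after issuing */
    } else {
      cycles += 1;                  /* independent work overlaps the busy unit */
      if (busy > 0)
        busy--;
    }
  }
  return cycles;
}
```

With these illustrative numbers, the stream `{4}, {0}, {4}` takes 5 cycles instead of 3, because the second divide waits two cycles for the first one to drain.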
I'd also like to take the time to thank:
  • krom: for his N64 (and in particular, RSP) test programs. Having the source to a bunch of small, targeted test programs saved me hours during the rewrite of the RSP this weekend. His triangle rotation/RDP programs also helped me determine that implementing MCI would speed up simulation considerably in some cases.
  • beannaich: whose works have mostly yet to be unveiled... but who has a sick new website design that is nearing completion. 8-) Most notably, his website design includes a compatibility tracker/list that will make the efforts of the testing community a little easier, I hope! :D
  • Snowstorm64: for pointing out my sloppy/broken code changes and compatibility problems with the new core. Using the most recent list, I might have determined the cause of another *large* bug that has been sitting latent in the core for quite some time.

Snowstorm64
Posts: 303
Joined: Sun Oct 20, 2013 8:22 pm

Post by Snowstorm64 » Tue Oct 21, 2014 1:58 pm

I'm glad I could help with that. :D All exciting news, except for the RDP regressions, but I'm sure we will overcome those sooner or later.
OS: Debian GNU/Linux Jessie (8.0)
CPU: Intel i7 4770K @ 3.5 GHz
Build: AVX (compiled from git)

chriztr
Posts: 38
Joined: Sun Oct 06, 2013 4:15 pm

Post by chriztr » Tue Oct 21, 2014 2:41 pm

Great!
(Attachment: 1404767434723.jpg)

The Extremist
Posts: 29
Joined: Sun Nov 03, 2013 6:11 pm
Location: Canadian Prairie

Post by The Extremist » Thu Oct 23, 2014 2:14 am

Swell!

:D

Nacho
Posts: 66
Joined: Thu Nov 07, 2013 9:25 am

Post by Nacho » Fri Oct 24, 2014 4:26 pm

Awesome!

Did you crush the cache bugs that were affecting the keyboard?

Aside from that, are you going to rewrite the RDP/RSP from scratch, or are you going to reuse the existing code?

Thoughts about sound?

End of the interview :P
Testing CEN64 on: Intel Core i5 520M 2.4 GHz. SSE2 SSE3 SSE4.1 SSE4.2 SSSE3, but no AVX. Ubuntu Linux

MarathonMan
Site Admin
Posts: 692
Joined: Fri Oct 04, 2013 4:49 pm

Post by MarathonMan » Sat Oct 25, 2014 12:02 pm

I found a bug in uncached reads and writes, though I don't think it'll fix the input bug. I have an inkling about what that 'cache' bug may be, though it's not a cache bug per se. It's more of an issue with timing or a race condition.

I'm using the MAME/angrylion RDP as before; the RSP is being rewritten from the ground up. I've actually already finished the RSP scalar unit, CP0 & DMAs, etc. The only things left are vector reads and writes (LWC2/SWC2) and vector operations (both of which I have started). In the last release, the RSP was hampering performance, and this rewrite is going to drastically accelerate things if everything falls into place.

On Linux, with VR4300 + a partially-implemented RSP, CEN64 is only 64KiB. :)

Hopefully after I get the RSP working with the new core, I will have time to look at audio. :D

Breadwinka
Posts: 54
Joined: Fri Oct 04, 2013 11:35 pm

Post by Breadwinka » Sat Oct 25, 2014 3:51 pm

MarathonMan wrote:Hopefully after I get the RSP working with the new core, I will have time to look at audio. :D
Audio *Swoons*

juef
Posts: 31
Joined: Sun Oct 27, 2013 10:19 pm

Post by juef » Sat Oct 25, 2014 9:43 pm

MarathonMan, your posts are sooooooo good to read! :D Thanks for keeping us updated and for the great work!

iwasaperson
Posts: 49
Joined: Tue Apr 22, 2014 12:50 am

Post by iwasaperson » Sat Oct 25, 2014 10:11 pm

MarathonMan wrote:I found a bug in uncached reads and writes, though I don't think it'll fix the input bug. I have an inkling about what that 'cache' bug may be, though it's not a cache bug per se. It's more of an issue with timing or a race condition.

I'm using the MAME/angrylion RDP as before; the RSP is being rewritten from the ground up. I've actually already finished the RSP scalar unit, CP0 & DMAs, etc. The only things left are vector reads and writes (LWC2/SWC2) and vector operations (both of which I have started). In the last release, the RSP was hampering performance, and this rewrite is going to drastically accelerate things if everything falls into place.

On Linux, with VR4300 + a partially-implemented RSP, CEN64 is only 64KiB. :)

Hopefully after I get the RSP working with the new core, I will have time to look at audio. :D
If Angrylion's RDP is pixel-accurate, why doesn't it have AA like my real N64? Very noticeable in the intro to OoT on the hills. Is that just a side-effect of composite video, or is there something missing? Can't wait for audio, btw.

EDIT: This does AA too, and it's also based on Angrylion's plugin: http://forum.pj64-emu.com/showthread.php?t=4422

EDIT2: Comparison: https://imgur.com/a/6o8Ra

MarathonMan
Site Admin
Posts: 692
Joined: Fri Oct 04, 2013 4:49 pm

Post by MarathonMan » Sun Oct 26, 2014 12:09 am

iwasaperson wrote:If Angrylion's RDP is pixel-accurate, why doesn't it have AA like my real N64? Very noticeable in the intro to OoT on the hills. Is that just a side-effect of composite video, or is there something missing?
I stripped out the VI filters because there was a lot of Windows-specific code that I didn't understand. What you're seeing is literally the raw, unfiltered, un-antialiased output from the framebuffer, which is why you see all the dithering artifacts and whatnot. Ordinarily, all of that would get post-processed before being output to the TV, but that isn't emulated yet.

EDIT: Actually I lied. What you see is almost the raw output. CEN64 emulates the VI scaling in a non-accurate way. But other than the scaling, the image is unmodified from what lies in the framebuffer.
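As an illustration of the kind of simplified scaling described above, here is a minimal nearest-neighbour line scaler. It assumes a per-pixel step in unsigned fixed point with 10 fractional bits, which is how the VI's horizontal scale register is commonly documented (2.10 format); `scale_line_nearest` is a hypothetical helper, and this is not how CEN64 actually scales. It corresponds to the "replicate pixels, no interpolate" behavior rather than the interpolating resampler the VI normally uses.

```c
#include <stddef.h>
#include <stdint.h>

/* Nearest-neighbour horizontal resample of one scanline.
 * scale_2_10 / offset_2_10: fixed point with 10 fractional bits (assumed
 * layout). Assumes src_width > 0. */
void scale_line_nearest(uint32_t *dst, size_t dst_width,
                        const uint32_t *src, size_t src_width,
                        uint32_t scale_2_10, uint32_t offset_2_10) {
  for (size_t i = 0; i < dst_width; i++) {
    uint32_t pos = offset_2_10 + (uint32_t)i * scale_2_10; /* source x in 2.10 */
    size_t idx = pos >> 10;                                 /* drop the fraction */
    if (idx >= src_width)
      idx = src_width - 1;                                  /* clamp at the right edge */
    dst[i] = src[idx];                                      /* replicate, no interpolation */
  }
}
```

A step of `0x200` (0.5) doubles a line's width by repeating each pixel; `0x400` (1.0) copies it unchanged.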

iwasaperson
Posts: 49
Joined: Tue Apr 22, 2014 12:50 am

Post by iwasaperson » Sun Oct 26, 2014 8:15 pm

MarathonMan wrote:
iwasaperson wrote:If Angrylion's RDP is pixel-accurate, why doesn't it have AA like my real N64? Very noticeable in the intro to OoT on the hills. Is that just a side-effect of composite video, or is there something missing?
I stripped out the VI filters because there was a lot of Windows-specific code that I didn't understand. What you're seeing is literally the raw, unfiltered, un-antialiased output from the framebuffer, which is why you see all the dithering artifacts and whatnot. Ordinarily, all of that would get post-processed before being output to the TV, but that isn't emulated yet.

EDIT: Actually I lied. What you see is almost the raw output. CEN64 emulates the VI scaling in a non-accurate way. But other than the scaling, the image is unmodified from what lies in the framebuffer.
Didn't know. All in due time, I guess.

Narann
Posts: 154
Joined: Mon Jun 16, 2014 4:25 pm

Post by Narann » Tue Oct 28, 2014 9:25 am

MarathonMan: You are awesome!
MarathonMan wrote:I stripped out the VI filters because there was a lot of Windows-specific code that I didn't understand. What you're seeing is literally the raw, unfiltered, un-antialiased output from the framebuffer, which is why you see all the dithering artifacts and whatnot. Ordinarily, all of that would get post-processed before being output to the TV, but that isn't emulated yet.
Oh yes, please leave it pixel perfect. We will add shaders later.

@iwasaperson

What you see IS anti-aliased, but with the N64's own anti-aliasing (which, unlike on current GPUs, does not always affect the whole image). It's noticeable in the top-left part (the edge to the left of the particle):

Image

The other picture is just a blurred version of the first one.
MarathonMan wrote:EDIT: Actually I lied. What you see is almost the raw output. CEN64 emulates the VI scaling in a non-accurate way. But other than the scaling, the image is unmodified from what lies in the framebuffer.
And it should stay like this. IMHO, only a GPU shader should post-process the output.

Keep up the good work!

MarathonMan
Site Admin
Posts: 692
Joined: Fri Oct 04, 2013 4:49 pm

Post by MarathonMan » Tue Oct 28, 2014 10:49 am

Narann wrote:IMHO, only a GPU shader should post-process the output.
Yes, that's definitely what I hope to do some day. I have never written a shader before, but it doesn't look too daunting to write shader code.

Unfortunately, there is a lot of branching in angrylion's VI rendering code:
https://code.google.com/p/angrylions-st ... o.cpp#1369

Narann
Posts: 154
Joined: Mon Jun 16, 2014 4:25 pm

Post by Narann » Tue Oct 28, 2014 2:51 pm

MarathonMan wrote:Yes, that's definitely what I hope to do some day. I have never written a shader before, but it doesn't look too daunting to write shader code.
lol, you wrote a cycle-accurate emulator. You shouldn't be scared of a tiny GLSL shader. :D
MarathonMan wrote:Unfortunately, there is a lot of branching in angrylion's VI rendering code:
https://code.google.com/p/angrylions-st ... o.cpp#1369
I don't get the "Unfortunately", or the relation between GLSL and angrylion's code.

1) Angry's code is intended to reproduce RDP behavior, right? That has nothing to do with post-process effects.
2) I guess all this branching is there because the RDP is a very complicated state machine with no clear hardware spec (am I right?)

For the GLSL stuff: CEN64 will render a 320x240 framebuffer in RAM (not VRAM). This will be uploaded to the GPU (using glBufferSubData, I guess), and a GLSL pixel shader will be used to apply scaling and post effects on the fly.
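The CPU-side half of this pipeline might look like the sketch below, which expands the N64's 16-bit framebuffer pixels (5 bits each of red, green and blue, plus 1 alpha/coverage bit) to 8 bits per channel before upload. It assumes the pixels have already been byte-swapped into host order, since RDRAM is big-endian; `rgba5551_to_rgba8888` is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/* Expand a 5-bit channel to 8 bits by replicating its top bits (31 -> 255). */
static uint8_t expand5(unsigned x) {
  return (uint8_t)((x << 3) | (x >> 2));
}

/* Convert `count` host-order RGBA5551 pixels to tightly packed RGBA8888. */
void rgba5551_to_rgba8888(uint8_t *dst, const uint16_t *src, size_t count) {
  for (size_t i = 0; i < count; i++) {
    uint16_t p = src[i];
    dst[4 * i + 0] = expand5((p >> 11) & 0x1fu); /* red   [15:11] */
    dst[4 * i + 1] = expand5((p >> 6) & 0x1fu);  /* green [10:6]  */
    dst[4 * i + 2] = expand5((p >> 1) & 0x1fu);  /* blue  [5:1]   */
    dst[4 * i + 3] = (p & 1u) ? 0xff : 0x00;     /* alpha/coverage [0] */
  }
}
```

The result could be uploaded with `glTexSubImage2D` using `GL_RGBA` and `GL_UNSIGNED_BYTE`. OpenGL can also accept the 16-bit data directly with the `GL_UNSIGNED_SHORT_5_5_5_1` pixel type, which would skip this loop entirely.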

MarathonMan
Site Admin
Posts: 692
Joined: Fri Oct 04, 2013 4:49 pm

Post by MarathonMan » Tue Oct 28, 2014 5:28 pm

Narann wrote:I don't get the "Unfortunately", or the relation between GLSL and angrylion's code.

1) Angry's code is intended to reproduce RDP behavior, right? That has nothing to do with post-process effects.
2) I guess all this branching is there because the RDP is a very complicated state machine with no clear hardware spec (am I right?)
1) angrylion wrote an RDP/VI filter all-in-one plugin. His plugin's VI portion mimics the console's post-processing effects (that "muddy" image look that a lot of people hate).

2) RDP won't be doable in GLSL. I'm assuming you mean VI -- that's what I linked. Branching in GLSL code quickly results in huge performance loss, no? His algorithms, while very much correct, are a lot of if/else chains. Not saying they can't be folded, as I haven't looked, but folding something that complex is definitely hard.
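To make "folding" concrete, here is a small example in C. The divot filter is commonly described as taking the median of three horizontally adjacent pixels per color component (an assumption here; angrylion's code is the reference). Written as an if/else chain it branches on the data, but the same result can be folded into min/max operations, which GLSL provides as built-in `min()`/`max()` functions. The function names are hypothetical.

```c
#include <stdint.h>

/* Branchy median of three: the if/else-chain style that maps poorly to GPUs. */
uint8_t median3_branchy(uint8_t a, uint8_t b, uint8_t c) {
  if (a > b) {
    if (b > c) return b; /* a > b > c */
    if (a > c) return c; /* a > c >= b */
    return a;            /* c >= a > b */
  }
  if (a > c) return a;   /* b >= a > c */
  if (b > c) return c;   /* b > c >= a */
  return b;              /* c >= b >= a */
}

static uint8_t min_u8(uint8_t x, uint8_t y) { return x < y ? x : y; }
static uint8_t max_u8(uint8_t x, uint8_t y) { return x > y ? x : y; }

/* Same result folded into min/max only:
 * median = max(min(a, b), min(max(a, b), c)). */
uint8_t median3_folded(uint8_t a, uint8_t b, uint8_t c) {
  return max_u8(min_u8(a, b), min_u8(max_u8(a, b), c));
}
```

Both versions agree on every input; only the folded one avoids data-dependent branches.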

Narann
Posts: 154
Joined: Mon Jun 16, 2014 4:25 pm

Post by Narann » Tue Oct 28, 2014 6:33 pm

We're not understanding each other. ^^'

1) "angrylion wrote an RDP/VI filter all-in-one plugin. His plugin's VI portion mimics the console's post-processing effects": What do you mean by post-processing effects? Dithering? (Isn't that actually part of the RDP? Isn't the VI only for scaling and sync?) If we write a CRT GLSL shader, this effect will actually look very nice and would perfectly reproduce what an N64 would output on a real CRT TV (which is the point, no?).
2) "RDP won't be doable in GLSL": Of course it won't! That's why I suggest putting the GLSL stuff outside the "N64 result".
3) "I'm assuming you mean VI": Not necessarily. The VI could (should?) be done in software (it's only scaling and sync, right?). The only thing a GLSL shader should take is a flat bitmap. I see the GLSL stuff mainly as a way to reproduce the CRT effect, not as N64 emulation, IMHO.
4) "Branching in GLSL code quickly results in huge performance loss, no?": Yes, that's why I would not suggest doing the RDP or any VI filter in GLSL (not to mention that you can't get a true N64 output using GLSL, because of all the various effects the RDP can do that GLSL can't, like thin Z depth and anti-aliasing (aka coverage) control). In my Rice RDP rewrite, the branching is actually a list of compiled shaders ("programs" in GLSL terminology) linked to a set of "RDP options". I compile a shader each time I encounter a new set of RDP options. But this is definitely not a good approach; I do it because I have no other choice.
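The "compile a shader per new set of options" approach described in point 4 boils down to a cache keyed by the packed option bits. A minimal sketch with open addressing, where `compile_program_for` is a hypothetical stand-in for generating, compiling and linking a GLSL program (here it only returns a fake non-zero handle) and all other names are likewise hypothetical:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_SLOTS 256u /* power of two; a full cache is reported, not grown */

struct program_cache {
  uint64_t keys[CACHE_SLOTS];     /* packed RDP (or VI) option bits */
  unsigned programs[CACHE_SLOTS]; /* program handle for each key */
  bool used[CACHE_SLOTS];
  unsigned compiles;              /* how many programs were "compiled" */
};

/* Hypothetical stand-in for building a program specialized for `options`. */
static unsigned compile_program_for(struct program_cache *c, uint64_t options) {
  (void)options;
  return ++c->compiles; /* fake, non-zero handle */
}

/* Return the program for `options`, compiling it on first use only.
 * Returns 0 if the cache is full. */
unsigned get_program(struct program_cache *c, uint64_t options) {
  size_t start = (size_t)((options * 0x9E3779B97F4A7C15ull) >> 56); /* hash */
  for (size_t probe = 0; probe < CACHE_SLOTS; probe++) {
    size_t slot = (start + probe) & (CACHE_SLOTS - 1u); /* linear probing */
    if (!c->used[slot]) {
      c->used[slot] = true;
      c->keys[slot] = options;
      c->programs[slot] = compile_program_for(c, options);
      return c->programs[slot];
    }
    if (c->keys[slot] == options)
      return c->programs[slot];
  }
  return 0;
}
```

Each distinct option set is compiled only once; repeated lookups just return the cached handle.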

TL;DR: Be "N64 perfect" as far as you can using only the CPU (because only the CPU can truly reproduce RDP behavior), and output the perfect framebuffer the N64 would give to the video converter (the one that totally breaks the image and that the HDMI hack tries to skip). This framebuffer should be 320x240 or 640x240 depending on the VI registers (I don't remember exactly). Use a GLSL shader only to scale up the image and reproduce the subtle result a real CRT TV would give. So GLSL is only a "CRT emulator".

Not sure if I'm clear. Don't hesitate to tell me if I'm talking bullshit, but trust me: you can't reproduce the RDP's behaviour with any GPU shading language (GLSL, HLSL, Cg). Only a CPU RDP emulator can produce a true N64 result. :)

MarathonMan
Site Admin
Posts: 692
Joined: Fri Oct 04, 2013 4:49 pm

Post by MarathonMan » Tue Oct 28, 2014 8:49 pm

1) I'm far from experienced on the RDP/VI portion of things, but as far as I understand it the RDP only puts the dithering noise in. The VI then 'blurs' the dithering noise to create a smooth gradient.

For a fact, I do know that the VI does far more than scale and display the image. There is definitely lots of filtering that it applies to the image. There's at least a divot filter, a second AA pass of some sort, and a brightness adjustment that the console applies to the image, FWIW. Concerning the brightness adjustment, it is absolutely necessary (I think even you will agree if you look at Star Fox 64, for example, which is very 'dark', especially at the main menu).
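The interplay described in point 1 (the RDP adds dither noise, the VI smooths it out) can be illustrated with a toy example: quantize an 8-bit channel to 5 bits using the standard 4x4 Bayer ordered-dither matrix, then average a 4x4 tile back to 8 bits. The matrix is a generic choice that may not match the RDP's own dither patterns, and the tile average is a crude stand-in for the VI's actual dither filter; all function names are hypothetical.

```c
#include <stdint.h>

/* Standard 4x4 Bayer threshold matrix (each value 0..15 appears once). */
static const uint8_t bayer4[4][4] = {
  { 0,  8,  2, 10},
  {12,  4, 14,  6},
  { 3, 11,  1,  9},
  {15,  7, 13,  5},
};

/* "RDP side": quantize an 8-bit channel to 5 bits with ordered dithering.
 * The threshold is scaled to 0..7, the range of the 3 bits being dropped. */
uint8_t dither_to_5bit(uint8_t v, unsigned x, unsigned y) {
  unsigned t = bayer4[y & 3u][x & 3u] >> 1;
  unsigned q = ((unsigned)v + t) >> 3;
  return (uint8_t)(q > 31u ? 31u : q);
}

/* "VI side" stand-in: expand each dithered pixel of a flat 4x4 tile back to
 * 8 bits and average them, recovering the level the dither pattern encoded. */
unsigned tile_average_8bit(uint8_t v) {
  unsigned sum = 0;
  for (unsigned y = 0; y < 4; y++)
    for (unsigned x = 0; x < 4; x++)
      sum += (unsigned)dither_to_5bit(v, x, y) << 3;
  return sum / 16u;
}
```

Individual pixels only take two adjacent 5-bit levels, yet the tile average recovers the original 8-bit value exactly (for inputs up to 248, where no clamping occurs). That is the sense in which filtering "blurs" dithering back into a smooth gradient.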

2) Glad we agree. :D

3) I was hoping that the above-mentioned divot filter, AA pass, and brightness adjustment, ??, etc. could be done in GLSL in addition to the CRT filter. The CRT filter can certainly be done in GLSL, but I was hoping the other stuff could be offloaded as well. If not, it's likely that I'll try to offload it to another processor core or something, since it's not really possible to emulate scanlines in a time-accurate manner on LCDs anyway, AFAIK.

Narann
Posts: 154
Joined: Mon Jun 16, 2014 4:25 pm

Post by Narann » Tue Oct 28, 2014 11:21 pm

1) I wasn't aware that the VI blurs the dithering. But you are right! And yes, gamma is part of the equation!


        [1:0]   pixel_size
                  0: blank (no data, no sync)
                  1: reserved
                  2: 5/5/5/3 ("16" bit - really 18 bit)
                  3: 8/8/8/8 (32 bit)
        [2]     gamma_dither_enable (normally on, unless "special effect")
        [3]     gamma_enable (normally on, unless MPEG/JPEG)
        [4]     divot_enable (normally on if antialiased, unless decal lines)
        [5]     vbus_clock_enable (off always)
        [6]     serrate (always on if interlaced, off if not)
        [7]     test_mode (for diagnostics, not used in normal operation)
        [9:8]   aa_mode[1:0] (anti-alias mode)
          0: aa & resamp (always fetch extra lines)
          1: aa & resamp (fetch extra lines if needed)
          2: resamp only (treat as all fully covered)
          3: neither (replicate pixels, no interpolate)
        [11]    kill_we (for diagnostics, not used in normal operation)
        [15:12] pixel_advance (always 3 for optimal operation)
        [16]    dither_filter_enable (normally on for 16 bit, off for 32 bit)
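The bit layout quoted above maps directly onto a decoder like the following. The listing doesn't name the register, so `struct vi_status` and `vi_status_decode` are hypothetical names; the field positions are taken straight from the listing.

```c
#include <stdint.h>

/* Fields of the VI status/control register, per the bit listing above. */
struct vi_status {
  unsigned pixel_size;           /* [1:0]   0 blank, 1 reserved, 2 = 5/5/5/3, 3 = 8/8/8/8 */
  unsigned gamma_dither_enable;  /* [2]     */
  unsigned gamma_enable;         /* [3]     */
  unsigned divot_enable;         /* [4]     */
  unsigned vbus_clock_enable;    /* [5]     */
  unsigned serrate;              /* [6]     */
  unsigned test_mode;            /* [7]     */
  unsigned aa_mode;              /* [9:8]   0..3, see listing */
  unsigned kill_we;              /* [11]    */
  unsigned pixel_advance;        /* [15:12] */
  unsigned dither_filter_enable; /* [16]    */
};

/* Extract bits hi..lo (inclusive) of v. */
static unsigned bits(uint32_t v, unsigned hi, unsigned lo) {
  return (unsigned)((v >> lo) & ((1u << (hi - lo + 1u)) - 1u));
}

struct vi_status vi_status_decode(uint32_t reg) {
  struct vi_status s;
  s.pixel_size = bits(reg, 1, 0);
  s.gamma_dither_enable = bits(reg, 2, 2);
  s.gamma_enable = bits(reg, 3, 3);
  s.divot_enable = bits(reg, 4, 4);
  s.vbus_clock_enable = bits(reg, 5, 5);
  s.serrate = bits(reg, 6, 6);
  s.test_mode = bits(reg, 7, 7);
  s.aa_mode = bits(reg, 9, 8);
  s.kill_we = bits(reg, 11, 11);
  s.pixel_advance = bits(reg, 15, 12);
  s.dither_filter_enable = bits(reg, 16, 16);
  return s;
}
```

For example, `0x320E` decodes to 16-bit pixels with gamma and gamma dither enabled, `aa_mode` 2 (resample only), and `pixel_advance` 3.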
I'm still wondering whether an accurate CPU VI filter would be simpler to write than a GLSL one.
MarathonMan wrote:3) I was hoping that the above-mentioned divot filter, AA pass, and brightness adjustment, ??, etc. could be done in GLSL in addition to the CRT filter.
IMHO, the question is: would a GLSL shader be as "accurate" as a CPU process? I mean, CEN64 is so close to what the N64 hardware delivers. I would prioritize an accurate solution over a performance compromise.

About all these VI filters/passes: do we have any accurate docs on how they behave? I guess it's all analog, no? If so, it's only based on perception, so angrylion's code is the only thing we have (short of spending time with electronic equipment...).
MarathonMan wrote:The CRT filter can certainly be done in GLSL, but I was hoping the other stuff could be offloaded as well. If not, it's likely that I'll try to offload it to another processor core or something, since it's not really possible to emulate scanlines in a time-accurate manner on LCDs anyway, AFAIK.
Let's forget the CRT stuff for now: it seems the VI filter should be doable in GLSL (if we don't have any in-depth info about how it works). What they call VI AA is just resampling. I have no idea how the VI dither filter works. What seems interesting is that (once again: if we don't have any docs) the VI filter should be doable in a "one line result" (no branch) GLSL shader, or the way I did it with Rice (a shader compiled on the fly for each new set of VI options).

MarathonMan
Site Admin
Posts: 692
Joined: Fri Oct 04, 2013 4:49 pm

Post by MarathonMan » Wed Oct 29, 2014 12:18 am

Narann wrote:IMHO, the question is: would a GLSL shader be as "accurate" as a CPU process? I mean, CEN64 is so close to what the N64 hardware delivers. I would prioritize an accurate solution over a performance compromise.
Hm, this never seemed to bother me. I'm of the camp that as long as it works, and produces the correct result, doesn't matter how it gets computed. If I could speed up RSP with AVX (256-bit vector operations), I would... but I don't see how to, so I stick with SSE (128-bit vector operations), similar to what the RSP had.
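The fit between SSE and the RSP comes from the RSP's vector registers being 128 bits wide, each holding eight 16-bit lanes. Below is a minimal sketch of a lane-wise signed saturating add, with a portable fallback when SSE2 isn't available. It deliberately ignores details of the real RSP VADD instruction, such as adding the carry bits from VCO and updating the accumulator; the function names are hypothetical.

```c
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Portable reference: signed saturating add across eight 16-bit lanes. */
void vadd_sat_ref(int16_t dst[8], const int16_t a[8], const int16_t b[8]) {
  for (int i = 0; i < 8; i++) {
    int32_t sum = (int32_t)a[i] + b[i];
    dst[i] = (int16_t)(sum > INT16_MAX ? INT16_MAX
                       : sum < INT16_MIN ? INT16_MIN : sum);
  }
}

/* Same operation on one 128-bit register when SSE2 is available. */
void vadd_sat(int16_t dst[8], const int16_t a[8], const int16_t b[8]) {
#if defined(__SSE2__)
  __m128i va = _mm_loadu_si128((const __m128i *)a);
  __m128i vb = _mm_loadu_si128((const __m128i *)b);
  _mm_storeu_si128((__m128i *)dst, _mm_adds_epi16(va, vb)); /* all 8 lanes at once */
#else
  vadd_sat_ref(dst, a, b);
#endif
}
```

Note that 256-bit integer operations like this one need AVX2; first-generation AVX only widened floating-point operations to 256 bits.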
Let's forget the CRT stuff for now: it seems the VI filter should be doable in GLSL (if we don't have any in-depth info about how it works). What they call VI AA is just resampling. I have no idea how the VI dither filter works. What seems interesting is that (once again: if we don't have any docs) the VI filter should be doable in a "one line result" (no branch) GLSL shader, or the way I did it with Rice (a shader compiled on the fly for each new set of VI options).
This is promising. :)

mckimiaklopa
Posts: 1
Joined: Wed Oct 29, 2014 6:37 am

Post by mckimiaklopa » Wed Oct 29, 2014 7:01 pm

Why not do both?
Add an optional CPU-based VI filter for accuracy (it already works accurately in angrylion's plugin, right?) and an optional GPU-based VI shader for performance. Once the shader becomes as accurate as the CPU-based filter, the CPU one can be safely removed.

A cycle-accurate emulator would not be perfect without pixel-perfect graphics, in my opinion.
The VI filter is a must for pixel-perfect accuracy, since it is a filter (adjusts brightness, applies AA, applies a blur, blends the dithering, etc.) applied by the N64 itself before the image is displayed on a TV (not really sure, though).

Without the VI filter, games will look darker (Star Fox 64), the dithering will be much more obvious than it is supposed to be, and everything will be much more jaggy, even if you apply bilinear filtering or a CRT shader, or just plug the emulator into a CRT. Adding filters like SuperEagle, FXAA, brz, etc. would just be cheating (though it would be fine to add support for other filters and shaders anyway, as long as the VI filter is properly emulated).

No VI filter = not pixel perfect (but very close, though)

But I am sure you already know the things I posted. So take your time learning how to implement it; and of course, the other parts of CEN64 are more important than the graphics, I suppose.

Keep up the good work

MarathonMan
Site Admin
Posts: 692
Joined: Fri Oct 04, 2013 4:49 pm

Post by MarathonMan » Wed Oct 29, 2014 8:51 pm

mckimiaklopa wrote:Why not do both?
Time. :p

Development time for indie projects like this is really hard to come by for me anymore. So I like to plan things out as much as possible as to increase my efficiency when I do find the time to work on things.

I think I may have figured out how to push the frame data to another thread (along w/ the VI registers) so that would allow me to move some more stuff off the main path as well as provide another core's worth of computing power for fumbling with filters.
mckimiaklopa wrote:Keep up the good work
Thanks!

Snowstorm64
Posts: 303
Joined: Sun Oct 20, 2013 8:22 pm

Post by Snowstorm64 » Thu Oct 30, 2014 12:20 pm

MarathonMan wrote: Time. :p

Development time for indie projects like this is really hard to come by for me anymore. So I like to plan things out as much as possible as to increase my efficiency when I do find the time to work on things.

I think I may have figured out how to push the frame data to another thread (along w/ the VI registers) so that would allow me to move some more stuff off the main path as well as provide another core's worth of computing power for fumbling with filters.
Well, the most important things have already been done, right? Currently CEN64, along with the old RSP and RDP, works almost fine with many games. At this point, when the new RSP is rolled out, I think these things still won't be implemented:
  • CIC chips other than 6102 (so: 6101, 6103, 6105, 6106)
  • EEPROM, SRAM, FlashRAM support
  • ControllerPak emulation
  • Audio emulation
  • RumblePak emulation
  • VI Filter
They don't look like hard tasks (except the VI filter, and maybe audio), right? I hope so... There are also the VRU, Real-Time Clock access, TransferPak, etc., but I doubt those things will ever be implemented at all. However, I think we're close to having a cycle-accurate N64 emulator that is capable of running almost all games fine. :D

MarathonMan
Site Admin
Posts: 692
Joined: Fri Oct 04, 2013 4:49 pm

Post by MarathonMan » Sat Nov 01, 2014 10:07 pm

Snowstorm64 wrote:Well, the most important things have already been done, right? Currently CEN64, along with the old RSP and RDP, works almost fine with many games. At this point, when the new RSP is rolled out, I think these things still won't be implemented:

...

They don't look like hard tasks (except the VI filter, and maybe audio), right? I hope so... There are also the VRU, Real-Time Clock access, TransferPak, etc., but I doubt those things will ever be implemented at all. However, I think we're close to having a cycle-accurate N64 emulator that is capable of running almost all games fine. :D
I guess, in some ways. There's still a major bug or two lying around, too.

The poor-ish performance is still bugging me, though. I'm constantly exploring new avenues of optimization, which is another major time sink.

As an example, as of now, the Unix build of CEN64 is now multi-threaded (and the Windows build is broken :P). All the GL operations are done in a separate thread, but even a (rather large) feat like this yields only a 5-10% performance boost for most things that I measured.

EDIT: I haven't rebased on the cen64-backport build in forever, either... oops. At this point I should probably just blow out the old one.

Narann
Posts: 154
Joined: Mon Jun 16, 2014 4:25 pm

Post by Narann » Sat Nov 01, 2014 10:23 pm

MarathonMan wrote:The poor-ish performance is still bugging me, though. I'm constantly exploring new avenues of optimization, which is another major time sink.
Is dynamic recompilation an option?
MarathonMan wrote:As an example, as of now, the Unix build of CEN64 is now multi-threaded (and the Windows build is broken :P). All the GL operations are done in a separate thread, but even a (rather large) feat like this yields only a 5-10% performance boost for most things that I measured.
Interesting! I guess you don't create a new thread at each frame? Do you think there is any internal part of N64 hardware that could run in two separate threads? If so, what would be the benefits?

MarathonMan
Site Admin
Posts: 692
Joined: Fri Oct 04, 2013 4:49 pm

Post by MarathonMan » Sat Nov 01, 2014 10:34 pm

Narann wrote:Is dynamic recompilation an option?
Unfortunately, no. At least, it hasn't been successful when I've tried it. Each simulated instruction has the potential to stall, raise exceptions, etc. and you'd just end up polluting the dynarec buffers with swaths of inlined code (or branches) when performing all the necessary checks to remain accurate. You also have to cycle around each core/controller in the system in round-robin fashion to be cycle-accurate, whereas in HLE dynarec you can just execute one context for hundreds/thousands of instructions at a time.
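A minimal sketch of the round-robin stepping described above, assuming the commonly cited clocks of 93.75 MHz for the VR4300 and 62.5 MHz for the RCP (a 3:2 ratio). `struct unit`, `vr4300_cycle` and `rcp_cycle` are hypothetical stand-ins; the point is that no unit runs ahead by more than a cycle or two, which is exactly what a block-at-a-time dynarec gives up.

```c
#include <stdint.h>

/* Hypothetical per-unit state; a real core would hold pipeline registers etc. */
struct unit {
  uint64_t cycles;
};

static void vr4300_cycle(struct unit *cpu) { cpu->cycles++; } /* one CPU pipeline step */
static void rcp_cycle(struct unit *rcp) { rcp->cycles++; }    /* one RSP/RDP/VI step */

/* Advance the system by `rcp_cycles` RCP cycles. The VR4300 runs 1.5x as
 * fast as the RCP, so it gets 3 steps for every 2 RCP steps, interleaved
 * so that neither side drifts ahead of the other. */
void run_round_robin(struct unit *cpu, struct unit *rcp, uint64_t rcp_cycles) {
  for (uint64_t i = 0; i < rcp_cycles; i++) {
    vr4300_cycle(cpu);
    if (i & 1)
      vr4300_cycle(cpu); /* extra CPU step every other RCP cycle -> 3:2 */
    rcp_cycle(rcp);
  }
}
```

Running 1000 RCP cycles advances the CPU by 1500 cycles, interleaved step by step.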
Narann wrote:Interesting! I guess you don't create a new thread at each frame? Do you think there is any internal part of N64 hardware that could run in two separate threads? If so, what would be the benefits?
Nope; the console just calls os_render_frame, which just copies the frame to-be-rendered to a buffer and signals the events/rendering thread. I was thinking about deferring the VI filtering to the second rendering thread as well, but I'm not even sure how something seemingly promising like that would work if the VI registers are updated halfway through a frame.

TBH, the only reason I could accurately get around using another thread to render frames is because CEN64 doesn't have to worry about scanlines. If I had to emulate scanlines, using a separate rendering thread might have negatively affected performance due to all the locking to maintain synchronization.
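A minimal sketch of the hand-off described above, using POSIX threads: the emulation thread copies the finished frame together with a snapshot of the VI registers into a shared mailbox and signals the render thread, which can then run filters without touching emulator state. All names (`frame_mailbox`, `mailbox_submit`, `mailbox_take`) and sizes are hypothetical, and an unconsumed frame is simply overwritten rather than queued; this illustrates the idea, not CEN64's `os_render_frame`.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define FB_PIXELS (320 * 240) /* hypothetical fixed frame size */
#define VI_REGS 14            /* snapshot of the VI register file */

struct frame_mailbox {
  pthread_mutex_t lock;
  pthread_cond_t ready;
  bool pending, quit;
  uint32_t vi_regs[VI_REGS];
  uint32_t pixels[FB_PIXELS];
};

void mailbox_init(struct frame_mailbox *mb) {
  pthread_mutex_init(&mb->lock, NULL);
  pthread_cond_init(&mb->ready, NULL);
  mb->pending = mb->quit = false;
}

/* Emulation thread: copy the frame plus VI registers, signal, return. */
void mailbox_submit(struct frame_mailbox *mb, const uint32_t *pixels,
                    const uint32_t *vi_regs) {
  pthread_mutex_lock(&mb->lock);
  memcpy(mb->pixels, pixels, sizeof mb->pixels);
  memcpy(mb->vi_regs, vi_regs, sizeof mb->vi_regs);
  mb->pending = true; /* an unconsumed older frame is simply overwritten */
  pthread_cond_signal(&mb->ready);
  pthread_mutex_unlock(&mb->lock);
}

/* Render thread: block until a frame arrives (true) or shutdown (false). */
bool mailbox_take(struct frame_mailbox *mb, uint32_t *pixels, uint32_t *vi_regs) {
  pthread_mutex_lock(&mb->lock);
  while (!mb->pending && !mb->quit)
    pthread_cond_wait(&mb->ready, &mb->lock);
  bool got = mb->pending;
  if (got) {
    memcpy(pixels, mb->pixels, sizeof mb->pixels);
    memcpy(vi_regs, mb->vi_regs, sizeof mb->vi_regs);
    mb->pending = false;
  }
  pthread_mutex_unlock(&mb->lock);
  return got;
}

/* Wake the render thread and tell it to exit. */
void mailbox_quit(struct frame_mailbox *mb) {
  pthread_mutex_lock(&mb->lock);
  mb->quit = true;
  pthread_cond_broadcast(&mb->ready);
  pthread_mutex_unlock(&mb->lock);
}
```

Snapshotting the VI registers alongside the pixels is what lets a render thread filter a frame while the emulator keeps changing the live registers; it does not by itself handle registers written mid-frame.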

Snowstorm64
Posts: 303
Joined: Sun Oct 20, 2013 8:22 pm

Post by Snowstorm64 » Sun Nov 02, 2014 9:30 am

MarathonMan wrote: I guess, in some ways. There's still a major bug or two lying around, too.

The poor-ish performance is still bugging me, though. I'm constantly exploring new avenues of optimization, which is another major time sink.

As an example, as of now, the Unix build of CEN64 is now multi-threaded (and the Windows build is broken :P). All the GL operations are done in a separate thread, but even a (rather large) feat like this yields only a 5-10% performance boost for most things that I measured.

EDIT: I haven't rebased on the cen64-backport build in forever, either... oops. At this point I should probably just blow out the old one.
Well, that's still huge. Even without the new RSP and the multithread, this version, when running Namco Museum, has already gained ~4 VI/s compared to cen64-backport, bringing the average to ~54 VI/s! :) I cannot wait for dat MK64 at 60 VI/s. :D

Narann
Posts: 154
Joined: Mon Jun 16, 2014 4:25 pm

Post by Narann » Sun Nov 02, 2014 4:34 pm

Thanks for the info, MarathonMan.
MarathonMan wrote:Nope; the console just calls os_render_frame, which just copies the frame to-be-rendered to a buffer and signals the events/rendering thread. I was thinking about deferring the VI filtering to the second rendering thread as well, but I'm not even sure how something seemingly promising like that would work if the VI registers are updated halfway through a frame.
After some checking, I realize the last CPU step is to transfer the FB from memory to the VI. The VI then transfers this "digital data" to the video DAC, which converts it to an analog signal for the TV. This means the VI is still a fully digital process (I thought it was an analog process). So you have a point: VI registers could be updated during VI processing. But a pragmatic question here:
- If so, what would be the impact on the final image? I don't know the VI rendering pattern, but you would get a weird image, with part of it generated with some VI options and the rest with other VI options. I guess modifying VI registers during VI processing is more a side effect than an intentional rendering feature. I guess in real games, VI registers are not modified in game but at some special steps (entering menus, loading, game start, etc...). I'm just supposing; I have never debugged VI usage on a real N64 myself.
- Is there a way to get the VI output back for CPU/RDP modifications? I guess not. If we have the guarantee that this output data will never be reused in any way (except by the horrible video DAC we don't want), I wonder whether it has any impact on the "cycle accurate" stuff. I really wonder what the point of emulating "changing the VI registers during rendering" would be.
- Does the VI send any digital event once it finishes its transfer to the video DAC? What is the purpose of such an event? Is it propagated to the CPU loop? (Do games know when "VI to DAC is over"?) I guess neither the OS nor the game is aware of the relation between the VI and the DAC (but this has to be confirmed).

TL;DR: What would be the side effect of changing the VI registers during VI processing on real hardware? If it's just a screen blink between two game states, without any internal modifications (except to the registers themselves, of course), we could consider not emulating "changing VI registers during VI processing" (so our "VI processing" would not have such "atomic" behavior) and have a monolithic VI filter emulation in another thread.
TBH, the only reason I could accurately get around using another thread to render frames is because CEN64 doesn't have to worry about scanlines. If I had to emulate scanlines, using a separate rendering thread might have negatively affected performance due to all the locking to maintain synchronization.
Why would CEN64 have to emulate scanlines? You mean "emulate the video DAC" (which is part of the N64 hardware)? lol, if there is no reason to keep this part, leave it out; we really don't want it. :D

MarathonMan
Site Admin
Posts: 692
Joined: Fri Oct 04, 2013 4:49 pm

Post by MarathonMan » Mon Nov 03, 2014 9:47 am

Narann wrote:- If so, what would be the impact on the final image? I don't know the VI rendering pattern, but you would get a weird image, with part of it generated with some VI options and the rest with other VI options. I guess modifying VI registers during VI processing is more a side effect than an intentional rendering feature. I guess in real games, VI registers are not modified in game but at some special steps (entering menus, loading, game start, etc...). I'm just supposing; I have never debugged VI usage on a real N64 myself.
Yep, likely just a side effect. But a side effect that I want to emulate. ;) One of CEN64's purposes is to serve as a development tool. If a developer accidentally does something like update a VI register midway through a frame and it has an impact, CEN64 should recreate that effect rather than mask it and leave the user surprised when he/she tries it on an actual console.
Narann wrote:- Is there a way to get the VI output back for CPU/RDP modifications? I guess not. If we have the guarantee that this output data will never be reused in any way (except by the horrible video DAC we don't want), I wonder whether it has any impact on the "cycle accurate" stuff. I really wonder what the point of emulating "changing the VI registers during rendering" would be.
I kinda have another "see if we can cheat" approach like this in another part of CEN64 that works pretty well. It's worth a shot.
Narann wrote:- Does the VI send any digital event once it finishes its transfer to the video DAC? What is the purpose of such an event? Is it propagated to the CPU loop? (Do games know when "VI to DAC is over"?) I guess neither the OS nor the game is aware of the relation between the VI and the DAC (but this has to be confirmed).
I only understand the VI on a very primitive level. :(
Narann wrote:TL;DR: What would be the side effect of changing the VI registers during VI processing on real hardware? If it's just a screen blink between two game states, without any internal modifications (except to the registers themselves, of course), we could consider not emulating "changing VI registers during VI processing" (so our "VI processing" would not have such "atomic" behavior) and have a monolithic VI filter emulation in another thread.
This is also an approach that I might take. I dislike it only because it feels like, "FU -- if you don't play by my inaccurate rules then you must suffer the toll of horrible performance!"
Narann wrote:
MarathonMan wrote: TBH, the only reason I could accurately get around using another thread to render frames is because CEN64 doesn't have to worry about scanlines. If I had to emulate scanlines, using a separate rendering thread might have negatively affected performance due to all the locking to maintain synchronization.
Why would CEN64 have to emulate scanlines? You mean "emulate the Video DAC" (which is part of the N64 hardware)? lol if there is no reason to keep this part, leave it out; we really don't want it. :D
Yes, that's what I meant. In order to accurately emulate a DAC, you would need to be able to drive the output signal, which is a virtually impossible task even on a realtime system. So CEN64 just shovels the frame into the video card at the end of a frame, like any sane emulator.

Narann
Posts: 154
Joined: Mon Jun 16, 2014 4:25 pm

Re: Short burst of progress...

Post by Narann » Mon Nov 03, 2014 1:16 pm

MarathonMan wrote: Yep, likely just a side effect. But a side effect that I want to emulate. ;) One of CEN64's purposes is to serve as a development tool. If a developer accidentally does something like update a VI register midway through a frame and it has an impact, CEN64 should recreate that effect rather than mask it and leave the user surprised when he/she tries it on an actual console.
This makes sense. This also means: you need an atomic CPU VI filter. :D
I was wondering: does modifying the VI registers during processing generate concurrent access (and so a hardware crash)? I guess VI processing (like any other) is threaded, but I have no idea how the N64 deals with register access concurrency.
But having a cycle-accurate VI filter (lol this is insane) does not solve everything. Do we have any idea of how the N64 renders its pixels? I mean, the order in which the VI processes pixels. Simple scanline?
MarathonMan wrote: This is also an approach that I might take. I dislike it only because it feels like, "FU -- if you don't play by my inaccurate rules then you must suffer the toll of horrible performance!"
I would love CEN64 to have an accurate VI filter, but from my perspective, after some digging into how the N64 VI filter works, you will definitely get bad performance.

And we're not even talking about the RDP lol.

Anyway, a cycle-accurate VI filter is possible but would mean emulating electronic circuits. Do we really want that?

gamax92
Posts: 22
Joined: Mon Oct 28, 2013 2:07 pm

Re: Short burst of progress...

Post by gamax92 » Mon Nov 03, 2014 8:29 pm

Why don't we just do what Visual 6502 has done, but for every N64 component, then it'd be super accurate!

oh, because of how horrible the lag would be ...

Nintendo Maniac 64
Posts: 185
Joined: Fri Oct 04, 2013 11:37 pm

Re: Short burst of progress...

Post by Nintendo Maniac 64 » Mon Nov 03, 2014 11:07 pm

MarathonMan wrote:it's not really possible to emulate scanlines in a time-accurate manner on LCDs anyways AFAIK.
No love for OLED? While it's just recently become available and isn't quite fully baked for general PC use*, we shouldn't rule things out just because the incumbent technology can't handle it.



*the only OLED options currently are TVs with poor input lag and only 60Hz input, reference monitors with crazy price tags, or head-mounted displays that aren't designed for 2D images.
CEN64 Forum's resident straight-male kuutsundere
(just "tsundere" makes people think of "Shana clones" *shivers*)

CPU+iGPU: Pentium G3258 @ 4.6GHz/1.281v
dGPU: Radeon HD5870 1GB
RAM: Vengeance 1600 4x4GB
OS: Windows 7

MarathonMan
Site Admin
Posts: 692
Joined: Fri Oct 04, 2013 4:49 pm

Re: Short burst of progress...

Post by MarathonMan » Tue Nov 04, 2014 4:57 pm

Narann wrote:Does modifying the VI registers during processing generate concurrent access (and so a hardware crash)? I guess VI processing (like any other) is threaded, but I have no idea how the N64 deals with register access concurrency.
No, that would be an enormous HW bug. The registers basically control a state machine that's constantly DMA-ing small blocks from RDRAM. More likely than not, the registers either update midway through the current frame, or the next frame (if SGI was kind enough to buffer the writes until the next frame).

Two big optimizations coming (unfortunately, because of intrinsic limitations, these heavily/totally favor GCC):
  • RSP accumulator registers (HI, MD, LO) are now cached in xmm13-15 on x86_64 and xmm5-7 on IA32. Instead of spilling these registers out to memory and reloading them for multiplies and other operations, CEN64 now just holds a few of the host processor's (otherwise unused) vector registers hostage throughout execution and reserves them for this purpose. This helps multiply-heavy code (triangle-heavy scenes) a good deal by lowering cache traffic and instruction counts. This is essentially statically allocated register caching, inspired by dynarecs.
  • Hot/cold section and code size optimizations. A lot of CEN64 functions are now attributed either 'hot' or 'cold'. More notably, 'cold' functions are compiled using the fewest instructions possible and put away with other cold functions in a rarely-used section of the binary. Reset exception handlers and initialization code (creating a GL window, other component initialization) are all shoved into this cold section. This allowed me to bump optimization to -O3 (all targets) and enable -maccumulate-outgoing-args (IA32/x86_64).
Namco Museum is now sitting at a pretty 60VI/s on a stock i7-4770. I only have one or two ROMs that boot and are unable to maintain 60VI/s on this machine... so RSP, here we come. :D

Narann
Posts: 154
Joined: Mon Jun 16, 2014 4:25 pm

Re: Short burst of progress...

Post by Narann » Tue Nov 04, 2014 5:28 pm

MarathonMan wrote:No, that would be an enormous HW bug. The registers basically control a state machine that's constantly DMA-ing small blocks from RDRAM. More likely than not, the registers either update midway through the current frame, or the next frame (if SGI was kind enough to buffer the writes until the next frame).
This was my thought, thanks for the info! I guess it's not trivial to create such behavior on a CPU (emulating N64 HW threading lol).
MarathonMan wrote:Namco Museum is now sitting at a pretty 60VI/s on a stock i7-4770. I only have one or two ROMs that boot and are unable to maintain 60VI/s on this machine... so RSP, here we come. :D
Good to read this! :D

Snowstorm64
Posts: 303
Joined: Sun Oct 20, 2013 8:22 pm

Re: Short burst of progress...

Post by Snowstorm64 » Tue Nov 04, 2014 6:12 pm

MarathonMan wrote: Two big optimizations coming (unfortunately, because of intrinsic limitations, these heavily/totally favor GCC):
Not an ideal situation, but since it's an open source compiler that is available on every platform, it shouldn't be a problem. :)
MarathonMan wrote: Namco Museum is now sitting at a pretty 60VI/s on a stock i7-4770. I only have one or two ROMs that boot and are unable to maintain 60VI/s on this machine... so RSP, here we come. :D
Awesome! Have you ever tried Dig-Dug? It's less demanding, resource-wise, than the others in Namco Museum 64; I think it could even overspeed here... I hope that a VI limiter will be implemented after the new RSP has been rolled out. :D
OS: Debian GNU/Linux Jessie (8.0)
CPU: Intel i7 4770K @ 3.5 GHz
Build: AVX (compiled from git)

MarathonMan
Site Admin
Posts: 692
Joined: Fri Oct 04, 2013 4:49 pm

Re: Short burst of progress...

Post by MarathonMan » Wed Nov 05, 2014 10:30 am

Snowstorm64 wrote:Awesome! Have you ever tried Dig-Dug? It's less demanding, resource-wise, than the others in Namco Museum 64; I think it could even overspeed here... I hope that a VI limiter will be implemented after the new RSP has been rolled out. :D
I've just been keeping V-sync enabled on my graphics card (monitor refreshes at 60Hz). If you do this, a limiter isn't necessary as the graphics component will take care of it for you.

But yes, a limiter is definitely possible in the future, especially with the incorporation of the render/events thread.

Nintendo Maniac 64
Posts: 185
Joined: Fri Oct 04, 2013 11:37 pm

Re: Short burst of progress...

Post by Nintendo Maniac 64 » Wed Nov 05, 2014 5:23 pm

MarathonMan wrote:But yes, a limiter is definitely possible in the future, especially with the incorporation of the render/events thread.
Also, vsync increases input lag, which is already going to be higher than on the original console.

Then there are things like G-Sync and Adaptive-Sync, but I don't think CEN64 supports fullscreen yet anyway...
Last edited by Nintendo Maniac 64 on Sun Nov 09, 2014 11:22 pm, edited 1 time in total.

beannaich
Posts: 149
Joined: Mon Oct 21, 2013 2:43 pm

Re: Short burst of progress...

Post by beannaich » Thu Nov 06, 2014 8:49 pm

Nintendo Maniac 64 wrote:
MarathonMan wrote:But yes, a limiter is definitely possible in the future, especially with the incorporation of the render/events thread.
Also, vsync increases input lag, which is already going to be higher than on the original console.

Then there are things like G-Sync and Adaptive-Sync, but I don't think CEN64 supports fullscreen yet anyway...
Cool story.
