New core/version 0.3

News from administrators.
krom
Posts: 72
Joined: Sat Oct 05, 2013 2:19 am

Re: New core/version 0.3

Post by krom » Tue Aug 05, 2014 2:13 pm

MarathonMan wrote:It's all timing differences:

cen64-backport:

Code: Select all

 18 // Currently used a fixed value...
 19 #define MEMORY_CODE_CYCLE_DELAY 50
 20 #define MEMORY_DATA_CYCLE_DELAY 0
 21 #define ICACHE_ACCESS_DELAY 50
This reminds me of the bus timing differences in SNES emulation, that is needed to get lots of cycle timing intensive SNES games running correctly...
Would it be possible to brute force / tweak these numbers, to get the demo mgc_2011.z64 to run without the famed "FAIL." message on the current cen64-backport code?
Or will this system be superseded by a more accurate per-instruction-type cycle timing / delay?

If it is possible to tweak these numbers to get perfect N64 cycle timing, and get cycle timing intensive roms to correctly run, I would be happy to spend a good deal of time to try and find the correct numbers =D

MarathonMan
Site Admin
Posts: 692
Joined: Fri Oct 04, 2013 4:49 pm

Re: New core/version 0.3

Post by MarathonMan » Tue Aug 05, 2014 5:07 pm

krom wrote:
MarathonMan wrote:It's all timing differences:

cen64-backport:

Code: Select all

 18 // Currently used a fixed value...
 19 #define MEMORY_CODE_CYCLE_DELAY 50
 20 #define MEMORY_DATA_CYCLE_DELAY 0
 21 #define ICACHE_ACCESS_DELAY 50
This reminds me of the bus timing differences in SNES emulation, that is needed to get lots of cycle timing intensive SNES games running correctly...
Would it be possible to brute force / tweak these numbers, to get the demo mgc_2011.z64 to run without the famed "FAIL." message on the current cen64-backport code?
Or will this system be superseded by a more accurate per-instruction-type cycle timing / delay?

If it is possible to tweak these numbers to get perfect N64 cycle timing, and get cycle timing intensive roms to correctly run, I would be happy to spend a good deal of time to try and find the correct numbers =D
Eventually I'd like to have CEN64 model everything -- the interfaces and CPUs to their fullest -- including the RAC. Once the RAC and RDRAM ICs are simulated properly, those constant/fixed delays can be removed.

Until then, yeah, it should be possible to fiddle with those values to get things booting for individual ROMs. Hopefully, there's one setting that gets something close enough for all ROMs for now, but who knows if that exists or not.

krom
Posts: 72
Joined: Sat Oct 05, 2013 2:19 am

Re: New core/version 0.3

Post by krom » Tue Aug 05, 2014 5:16 pm

MarathonMan wrote:Eventually I'd like to have CEN64 model everything -- the interfaces and CPUs to their fullest -- including the RAC. Once the RAC and RDRAM ICs are simulated properly, those constant/fixed delays can be removed.
Sounds great to me, cheers for the good explanation of things to come =D
MarathonMan wrote:Until then, yeah, it should be possible to fiddle with those values to get things booting for individual ROMs. Hopefully, there's one setting that gets something close enough for all ROMs for now, but who knows if that exists or not.
Cool, I might try to code up some demos to test instruction timings to try to home in on the correct numbers. It will be interesting to see if we can find one setting that correctly boots everything =D

MarathonMan
Site Admin
Posts: 692
Joined: Fri Oct 04, 2013 4:49 pm

Re: New core/version 0.3

Post by MarathonMan » Tue Aug 05, 2014 5:33 pm

krom wrote:Cool, I might try to code up some demos to test instruction timings to try to home in on the correct numbers. It will be interesting to see if we can find one setting that correctly boots everything =D
I already did something that just fills and invalidates instruction cache and data cache lines on the VR4300... see here: http://cen64.com/viewtopic.php?f=15&t=35#p379

Instruction cache: 52 pclocks/line (@ 93.75 million pclocks/sec = 55.02 MiB/s)
Data cache: 48 pclocks/line (@ 93.75 million pclocks/sec = 25.01 MiB/s)

were my findings. Let's see if you get something similar! :D
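
A minimal sketch of the arithmetic behind the instruction cache figure, assuming a 32-byte instruction cache line (the line size is not stated in this post, and the function name is made up for illustration):

```c
#include <assert.h>

/* Bytes per MiB. */
#define BYTES_PER_MIB (1024.0 * 1024.0)

/* Fill bandwidth in MiB/s, given the clock rate in Hz, the measured cost
 * of one line fill in pclocks, and the line size in bytes.
 * Hypothetical helper, not part of CEN64. */
static double fill_rate_mib_per_sec(double pclock_hz,
                                    double pclocks_per_line,
                                    double bytes_per_line) {
  assert(pclocks_per_line > 0.0);
  /* Lines filled per second, times bytes per line, scaled to MiB. */
  return (pclock_hz / pclocks_per_line) * bytes_per_line / BYTES_PER_MIB;
}
```

With 93.75e6 pclocks/sec, 52 pclocks/line and the assumed 32 bytes/line, this comes out at roughly 55.02 MiB/s, which agrees with the instruction cache number above.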

krom
Posts: 72
Joined: Sat Oct 05, 2013 2:19 am

Re: New core/version 0.3

Post by krom » Tue Aug 05, 2014 5:47 pm

MarathonMan wrote:I already did something that just fills and invalidates instruction cache and data cache lines on the VR4300...
Heh cool, cheers for showing me a link to your findings, I will try to check it all out =D

I was thinking of doing a lame test that just counts how many CPU ADD operations a real N64 can compute (within a single VI frame's timing), and displays the count followed by a "PASS" or "FAIL" depending on the count matching the known correct figure.
Then I will on the same screen pass or fail every other CPU opcode... I could then do a similar test for the CP1 opcodes =D

So I do not know if I will get exact figures like yours for the pclocks/sec, but I should hopefully be able to check if individual instructions are computing in their correct individual cycle timings =D

Snowstorm64
Posts: 303
Joined: Sun Oct 20, 2013 8:22 pm

Re: New core/version 0.3

Post by Snowstorm64 » Tue Aug 05, 2014 6:41 pm

Then why shouldn't we do a thing like this:
[Image: GIF showing the so-called Laplace orbital resonance]

Don't take me for a fool, but IMHO it's the perfect example of something that could provide us with a temporary fix for the issue we are discussing. If we want to obtain a beautiful, harmonious system, we should make the instruction cache/data cache ratio something like 52:48, 26:24 or even better 13:12.

So if we do this:

Code: Select all

#define MEMORY_CODE_CYCLE_DELAY 26
#define MEMORY_DATA_CYCLE_DELAY 24
#define ICACHE_ACCESS_DELAY 26
(Forgive me, but I don't know a lot about these values :P)

The result? We don't have timing issues either for the Fire Demo by LaC (PD) and Soncrap Intro by RedboX (PD), although at the cost of slower performance.

I said 26:24 because I have tried 13:12 before, and the result was that Fire Demo didn't have issues, while Soncrap Intro was only partially fixed (the text in the top half still flickers).
OS: Debian GNU/Linux Jessie (8.0)
CPU: Intel i7 4770K @ 3.5 GHz
Build: AVX (compiled from git)

MarathonMan
Site Admin
Posts: 692
Joined: Fri Oct 04, 2013 4:49 pm

Re: New core/version 0.3

Post by MarathonMan » Tue Aug 05, 2014 7:02 pm

Snowstorm64 wrote:So if we do this:

Code: Select all

#define MEMORY_CODE_CYCLE_DELAY 26
#define MEMORY_DATA_CYCLE_DELAY 24
#define ICACHE_ACCESS_DELAY 26
You will want to use values like this... once I implement the DCACHE. :P Then, it will run at the "right" speed.

Without the DCACHE, every miss will incur a 25 cycle penalty (yes, 25... not 24... I need to fix that!). This is far too much as there are a lot of reads and writes to the stack, which is going to get cached for most accesses. The result is an incredibly slow framerate.

krom
Posts: 72
Joined: Sat Oct 05, 2013 2:19 am

Re: New core/version 0.3

Post by krom » Wed Aug 06, 2014 6:41 am

I have made 2 CPU & CP1 instruction timing demos:
https://github.com/PeterLemon/N64/tree/ ... TIMINGNTSC
https://github.com/PeterLemon/N64/tree/ ... TIMINGNTSC

Here are some notes about the demos:
All the tests were timed on NTSC N64 hardware, & I am sure the figures would be different for a PAL N64 System.
I always wait for the N64 Vertical Counter to be zero for the start of each test, then while the test is running I wait for the Vertical Counter to reach #512 before printing each result.
As there is a branch followed with a counting addi instruction for every loop of the test, this affects the overall final count of each test to be lower than one would expect.

The real N64 hardware can fluctuate in some of the timings of instructions, so I have chosen the exact numbers that came up the most on each run, to try and get the most correct figures.
The largest fluctuation seems to only be +1 to -1 on CPU, & +2 to -2 on CP1, & repeated running of the demos on hardware will always pass the test for any given instruction on one of those runs.
(I only get between 1 to 4 failed instructions on the CPU test when running on real hardware on any single test run).

The figures output from the N64 hardware are quite different from the output of the current cen64, which fails all of the CPU/CP1 timing tests.
I think we need to account for the CPU division & multiply instructions, how they write to the 2 HI & LO registers, to maybe account for their wild differences from the normal instructions timings.
Because right now they take a similar time to compute as the faster normal instructions in cen64.

Also maybe the Vertical counter emulation could be off a little, which could affect the results.

We must also make sure we are counting cycles of branch & delay slot instructions correctly, as this would affect the results too.

P.S If you want I can AND out the lowest 4-bits (nibble) of each tests count to always "PASS" correctly on all CPU/CP1 instructions every time on real hardware... But this would be only accurate to within 15 instruction counts.
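
A minimal sketch of what that masked comparison could look like in C (the demos themselves are written in assembly; the function and macro names here are made up for illustration):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Number of low bits ignored when comparing counts (one nibble). */
#define IGNORED_LOW_BITS 4u

/* Returns true when the measured count matches the expected count once
 * the lowest nibble of both has been ANDed out. Hypothetical helper. */
static bool count_matches_masked(uint32_t measured, uint32_t expected) {
  assert(IGNORED_LOW_BITS < 32u);
  const uint32_t mask = ~((UINT32_C(1) << IGNORED_LOW_BITS) - 1u);
  return (measured & mask) == (expected & mask);
}
```

Note that masking compares 16-count buckets rather than a +/-15 window, so two counts that straddle a bucket boundary (e.g. 0x123F and 0x1240) still compare as different even though they are only 1 apart.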

Hope this helps =D

MarathonMan
Site Admin
Posts: 692
Joined: Fri Oct 04, 2013 4:49 pm

Re: New core/version 0.3

Post by MarathonMan » Wed Aug 06, 2014 9:27 am

Thank you, krom! Those tests are going to be very difficult to pass because most of the latency is coming from loading the VI # each pass of the loop! :P No promises on these ones for awhile anyways. ;)

To correctly pass this test will require precise timing of the bus arbitration logic, RAC/RDRAM, and VI state machine, I would think...

krom
Posts: 72
Joined: Sat Oct 05, 2013 2:19 am

Re: New core/version 0.3

Post by krom » Wed Aug 06, 2014 11:42 pm

Heh no probs, I wanted to make tests like this for ages so I have got them out the way now so I can work on more interesting stuff =D
It will be amazing when cen64 gets to the level of accuracy to start passing these tests.

P.S. I was surprised by some of the results, e.g. the square roots in the CP1 test perform quite fast and are comparable in speed to simpler instructions like addition!
(In my experience, square root is usually one of the slowest FPU instructions on many CPUs)

Narann
Posts: 154
Joined: Mon Jun 16, 2014 4:25 pm

Re: New core/version 0.3

Post by Narann » Thu Aug 07, 2014 9:02 am

krom wrote:P.S. I was surprised by some of the results, e.g. the square roots in the CP1 test perform quite fast and are comparable in speed to simpler instructions like addition!
(In my experience, square root is usually one of the slowest FPU instructions on many CPUs)
A hardware lookup table? O_O

MarathonMan
Site Admin
Posts: 692
Joined: Fri Oct 04, 2013 4:49 pm

Re: New core/version 0.3

Post by MarathonMan » Thu Aug 07, 2014 1:23 pm

krom wrote:P.S. I was surprised by some of the results, e.g. the square roots in the CP1 test perform quite fast and are comparable in speed to simpler instructions like addition!
(In my experience, square root is usually one of the slowest FPU instructions on many CPUs)
This is because of all the hard-to-emulate-things going on. Each time you read from the VI MMIO address space, there's a lot of things that happen over many cycles as a result:
  • The request makes its way through the MIPS interface.
  • It waits until its access is arbitrated by the bus.
  • The VI state machine forms a response.
  • The VI response waits until it is arbitrated by the bus.
  • The response makes its way back through the MIPS interface.
Oh yeah, and then after all that, an LDI interlock fault is present for a cycle, unless the LW is followed by a NOP (and even then, you're still out an extra cycle). Everything that has to occur for a simple read of a VI register means two things: there is a lot of legwork that CEN64 has to do to simulate things properly, and there's a big fat interlock in the VR4300 that's going to block the pipeline while it all occurs.

So, the reason you are seeing such a small difference in total instructions executed/scanline (?) is that the CPU doesn't spend the majority of its time computing the instructions -- it spends it doing VI accesses -- and thus, it all gets amortized. If you execute SQRT instructions in a very tight, cached loop you will see that a SQRT instruction takes about 29 cycles for floats and 58 for doubles (compared to ~3 for both floats and doubles when doing additions).
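
To make the amortization concrete, here is a rough sketch. The per-iteration overhead is a made-up parameter standing in for the VI read and loop bookkeeping (the real figure is not given in this thread); only the ~3 and ~29 cycle latencies come from the numbers above.

```c
#include <assert.h>
#include <stdint.h>

/* How many loop iterations fit in a cycle budget when every iteration
 * pays a fixed overhead (e.g. the VI register read) plus the latency of
 * the instruction under test. Hypothetical model, not measured data. */
static uint32_t iterations_in_budget(uint32_t budget_cycles,
                                     uint32_t overhead_cycles,
                                     uint32_t latency_cycles) {
  assert(overhead_cycles + latency_cycles > 0);
  return budget_cycles / (overhead_cycles + latency_cycles);
}
```

With an assumed overhead of 200 cycles, a 3-cycle add and a 29-cycle sqrt cost 203 vs. 229 cycles per iteration, so the two counts differ by only about 13%. With zero overhead the same two latencies differ by almost 10x, which is what a tight cached loop would expose.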

krom
Posts: 72
Joined: Sat Oct 05, 2013 2:19 am

Re: New core/version 0.3

Post by krom » Thu Aug 07, 2014 2:02 pm

Ah cool, cheers for the great explanation of the actual flow of the N64 state machine & bus =D
Your figures for the very tight cached loop make lots more sense.
Also thanks for sharing the actual ~cycle counts as I now have a much better idea of the speeds of the float & double instructions now!

I'll try to make much better tests in the future to try to show these sorts of results.
I'll possibly use interrupt timing & blocks of the same instruction with no branching (if that would suffice)...

@Narann Cheers for the link about doing square roots using a lookup table, I had heard of the famous "magic number" (Quake algorithm) style before, but not the lookup table way!

MarathonMan
Site Admin
Posts: 692
Joined: Fri Oct 04, 2013 4:49 pm

Re: New core/version 0.3

Post by MarathonMan » Thu Aug 07, 2014 2:19 pm

krom wrote:I'll try to make much better tests in the future to try to show these sorts of results.
I'll possibly use interrupt timing & blocks of the same instruction with no branching (if that would suffice)...

@Narann Cheers for the link about doing square roots using a lookup table, I had heard of the famous "magic number" (Quake algorithm) style before, but not the lookup table way!
Not sure what you mean by interrupt timing (?) unless you mean on each VI interrupt. For the cycle counts and other latencies I usually unroll a loop with the instruction and sample from CP0 $9 (Count) around the loop:

Code: Select all

uint32_t startTicks = getTicks(); // mfc0 rt, $9 (CP0 Count)
for (i = 0; i < ITERATIONS; i++)
  __asm__ __volatile__("sqrt.s $f0, $f2\n\t" ::: "$f0");
uint32_t endTicks = getTicks(); // mfc0 rt, $9 (CP0 Count)

// Count increments at half the pclock, so double it.
uint32_t pcycles = (endTicks - startTicks) * 2;
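
As a host-side sketch of the bookkeeping around that loop (helper names are made up for illustration, and the loop's own branch overhead is not subtracted here), the two Count samples can be turned into a per-iteration figure like this:

```c
#include <assert.h>
#include <stdint.h>

/* Elapsed pclocks between two Count samples. Unsigned subtraction keeps
 * the result correct if Count wrapped around once between the samples.
 * Count ticks at half the pclock rate, hence the doubling. */
static uint32_t elapsed_pcycles(uint32_t start_ticks, uint32_t end_ticks) {
  return (end_ticks - start_ticks) * 2u;
}

/* Average pclocks per loop iteration (instruction plus loop overhead). */
static double pcycles_per_iteration(uint32_t start_ticks,
                                    uint32_t end_ticks,
                                    uint32_t iterations) {
  assert(iterations > 0);
  return (double)elapsed_pcycles(start_ticks, end_ticks) / iterations;
}
```

The doubling can overflow 32 bits if the measured interval is longer than 2^31 ticks, so this only suits short measurement windows.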

krom
Posts: 72
Joined: Sat Oct 05, 2013 2:19 am

Re: New core/version 0.3

Post by krom » Thu Aug 07, 2014 2:38 pm

MarathonMan wrote:Not sure what you mean by interrupt timing (?) unless you mean on each VI interrupt. For the cycle counts and other latencies I usually unroll a loop with the instruction and sample from CP0 $9 (Count) around the loop
Yep I meant the VI interrupt as this would enable me to cut out the VI accesses from the old test, but your way sounds much better!

Nintendo Maniac 64
Posts: 185
Joined: Fri Oct 04, 2013 11:37 pm

Re: New core/version 0.3

Post by Nintendo Maniac 64 » Wed Aug 13, 2014 10:53 pm

Phew, I just read the last 20 pages all in one sitting.

--------
beannaich wrote: This sounds similar to the "rounding error" in the RSP audio sampling register, where you can never get a true 32 KHz sampling rate.
Your use of "rounding error" makes it sound like using a true 32000Hz is actually a possibility, but wouldn't that result in emulation timing issues?

(although it would be useful for an N64 music player :P)

--------
MarathonMan wrote:If anyone has a Pentium 4, they'll probably be rubbing their hands together though. That, and maybe K8 users... ;)
Does the new core not require SSSE3 like the earlier cores did? Because both Netburst and K8 predate that instruction...

--------
Snowstorm64 wrote:Some demo like Plasma Demo have seen a huge boost performance, like 30 VI/s, making it run at 85 VI/s. That's a bit overkill! :lol:
Not for an overclocked N64. ;) Overclocked N64s are in fact preferred for 4-player Smash Bros. tournament matches:
http://www.ssbwiki.com/Tournament_legal_(SSB) wrote:Overclocked n64's are preferred as there is lag otherwise in doubles.
--------
MarathonMan wrote: I'll have to look into getting some kind of system setup that forces VSync or something...
Considering the likes of G-Sync and Adaptive-Sync, I'd highly recommend not using vsync but rather using a simple VI/s limiter.

Also a simple VI/s limiter would give the opportunity to manually set the limiter to speeds higher or lower than 60 VI/s - see the Dolphin emulator for example.
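
A minimal sketch of such a limiter, assuming the caller supplies a monotonic timestamp in nanoseconds and does the actual sleeping; none of these names exist in CEN64:

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC UINT64_C(1000000000)

/* Hypothetical limiter state: the frame period and the next deadline. */
struct vi_limiter {
  uint64_t period_ns;
  uint64_t next_deadline_ns;
};

/* vi_per_sec is user-configurable: 60 for stock NTSC, higher or lower
 * to run fast or slow. */
static void vi_limiter_init(struct vi_limiter *l, uint32_t vi_per_sec,
                            uint64_t now_ns) {
  assert(vi_per_sec > 0);
  l->period_ns = NSEC_PER_SEC / vi_per_sec;
  l->next_deadline_ns = now_ns + l->period_ns;
}

/* Call once per VI. Returns how long the caller should sleep (in ns)
 * and advances the deadline for the following frame. */
static uint64_t vi_limiter_wait_ns(struct vi_limiter *l, uint64_t now_ns) {
  uint64_t wait_ns = 0;
  if (now_ns < l->next_deadline_ns) {
    wait_ns = l->next_deadline_ns - now_ns;
    l->next_deadline_ns += l->period_ns;
  } else {
    /* Running late: do not try to catch up, restart from now. */
    l->next_deadline_ns = now_ns + l->period_ns;
  }
  return wait_ns;
}
```

Because the limiter only depends on a clock and a target rate, it stays independent of the display's refresh behaviour, which is the point of preferring it over vsync.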

--------
MarathonMan wrote:The uop cache that I've been trying to optimize around is only present in SNB (i*-2xxx cores), IVB (i*-3xxx) and Haswell (i*-4xxx). If you don't have any of these, you probably want a binary that's a little more unrolled.
It would seem that AMD's Steamroller CPU architecture contains a similar feature as well:
http://www.anandtech.com/show/6201/amd-details-its-3rd-gen-steamroller-architecture/2 wrote: Although AMD doesn’t like to call it a cache, Steamroller now features a decoded micro-op queue. As x86 instructions are decoded into micro-ops, the address and decoded op are both stored in this queue. Should a fetch come in for an address that appears in the queue, Steamroller’s front end will power down the decode hardware and simply service the fetch request out of the micro-op queue. This is similar in nature to Sandy Bridge’s decoded uop cache, however it is likely smaller. AMD wasn’t willing to disclose how many micro-ops could fit in the queue, other than to say that it’s big enough to get a decent hit rate.
--------
Wouldn't these tests give different results on an overclocked N64?
CEN64 Forum's resident straight-male kuutsundere
(just "tsundere" makes people think of "Shana clones" *shivers*)

CPU+iGPU: Pentium G3258 @ 4.6GHz/1.281v
dGPU: Radeon HD5870 1GB
RAM: Vengeance 1600 4x4GB
OS: Windows 7

krom
Posts: 72
Joined: Sat Oct 05, 2013 2:19 am

Re: New core/version 0.3

Post by krom » Thu Aug 14, 2014 1:11 am

Nintendo Maniac 64 wrote:Wouldn't these tests give different results on an overclocked N64?
Yes they would give different results on an overclocked N64 (The number of executed instructions in each loop would rise as the clock rate increases)
Also you would see different results on a stock PAL N64 which has a slightly underclocked CPU compared to NTSC N64 hardware...
These tests were pretty crappy anyway for actual N64 instruction timing, I'll do some better tests using what I have learnt from MarathonMan.

I will only produce timing tests that pass on stock N64 units, as I do not possess a working overclocked N64.
I have already tried to overclock a PAL N64 unit, but it crashes after about 20 seconds of booting MarioKart64... NTSC N64's are much better for overclocking if anyone is reading this!

NTSC timing is easier for me to test as I bought a NTSC N64 flash cart.
Once I have written a better timing test for the stock NTSC N64 timing, I can then produce a PAL timing example, as I have a working Doctor64 setup for my PAL N64 units.

beannaich
Posts: 149
Joined: Mon Oct 21, 2013 2:43 pm

Re: New core/version 0.3

Post by beannaich » Thu Aug 14, 2014 8:00 pm

Nintendo Maniac 64 wrote:Your use of "rounding error" makes it sound like using a true 32000Hz is actually a possibility, but wouldn't that result in emulation timing issues?
I know it wasn't a proper use of rounding error, hence the quotation marks. But it does involve rounding, and precision problems, as I'll now painstakingly show:

Code: Select all

93,750,000 / 32,000 = 2,929.6875 <-- divider needed for true 32 KHz sample rate.

Since only integer dividers can be specified, let's try by rounding down to 2,929:

93,750,000 / 2,929 = ~32,007 Hz

Damn, that didn't quite work, let's try rounding up to 2,930:

93,750,000 / 2,930 = ~31,996 Hz

Closer, but still not a true 32 KHz.
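
The same arithmetic as a small C sketch, using the 93.75 MHz clock from the figures above (function names are made up for illustration):

```c
#include <assert.h>
#include <stdint.h>

/* Sample rate actually produced by an integer divider. */
static double actual_rate_hz(uint32_t clock_hz, uint32_t divider) {
  assert(divider > 0);
  return (double)clock_hz / divider;
}

/* Integer divider obtained by rounding clock/target to nearest. */
static uint32_t rounded_divider(uint32_t clock_hz, uint32_t target_hz) {
  assert(target_hz > 0);
  return (clock_hz + target_hz / 2u) / target_hz;
}
```

For a 32,000 Hz target, rounded_divider picks 2,930, and actual_rate_hz for 2,929 and 2,930 lands just above 32,007 Hz and just below 31,997 Hz respectively, so neither integer divider hits 32 KHz exactly.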

krom
Posts: 72
Joined: Sat Oct 05, 2013 2:19 am

Re: New core/version 0.3

Post by krom » Fri Aug 15, 2014 11:22 pm

I have completed my N64 RDP textured triangle tests:
https://github.com/PeterLemon/N64/tree/ ... reTriangle
https://github.com/PeterLemon/N64/tree/ ... reTriangle

They are visually identical to my textured rectangle demos, but are built up instead using 2 triangles in place of each rectangle.
Next I will be working on shaded & Z-buffered triangle types =D

MarathonMan
Site Admin
Posts: 692
Joined: Fri Oct 04, 2013 4:49 pm

Re: New core/version 0.3

Post by MarathonMan » Wed Aug 20, 2014 2:33 pm

I have added partial support for ARM (some ROMs run). FPU intrinsics are needed before the ARM support is on par with x86_64. The ARM port will use NEON in the same way that SSSE3 is used, to provide a means of "hardware acceleration" for the RSP functionality. In fact, I have already written some RSP functions using NEON intrinsics.

ARM currently uses GL as a backend for rendering, but I should have EGL worked in shortly.

Along with the ARM port comes full-blown IA-32 (x86, 32-bit) support. IA-32 will require SSE2 for the time being (and soon, SSSE3).

I've also begun rewriting the RSP pipeline in my spare time.
Nintendo Maniac 64 wrote:
MarathonMan wrote:If anyone has a Pentium 4, they'll probably be rubbing their hands together though. That, and maybe K8 users... ;)
Does the new core not require SSSE3 like the earlier cores did? Because both Netburst and K8 predate that instruction...

Right now, everything requires SSE2 (which is part of the x86_64 ISA). The RSP intrinsics that I'm about to commit will use the SSSE3 intrinsics that the old code did to reduce the amount of potential variability in the switch to the new core. However, the way the new core is written makes it substantially easier (trivial, almost) to rewrite the very small pieces that are currently SSSE3-ized. Initial releases will still require SSSE3, though.

krom
Posts: 72
Joined: Sat Oct 05, 2013 2:19 am

Re: New core/version 0.3

Post by krom » Thu Aug 21, 2014 4:01 am

MarathonMan wrote:I have added partial support for ARM (some ROMs run). FPU intrinsics are needed before the ARM support is on par with x86_64. The ARM port will use NEON in the same way that SSSE3 is used, to provide a means of "hardware acceleration" for the RSP functionality. In fact, I have already written some RSP functions using NEON intrinsics.
Great work, it will be amazing to check out the code once it is finished.
It's great that you have taken the time to work out which intrinsics will best serve all the N64 hardware functions, on the various CPU's targeted by cen64.
MarathonMan wrote:I've also begun rewriting the RSP pipeline in my spare time.
It will be cool to see the code changes =D
Will the RSP rewrite increase the (already fast) speed of the RSP emulation in cen64?

I have been working on the ZBuffer portion of the RDP pipeline. I've made some test demos to show how to set up the ZBuffer, and draw Triangles & Rectangles using Per-Pixel & Per-Primitive ZBuffering:
https://github.com/PeterLemon/N64/tree/ ... gle320x240

Narann
Posts: 154
Joined: Mon Jun 16, 2014 4:25 pm

Re: New core/version 0.3

Post by Narann » Thu Aug 21, 2014 7:22 am

Cool! :o

Just a question: Why do you need to Sync_Pipe between each triangle? What will happen if you don't do that?

krom
Posts: 72
Joined: Sat Oct 05, 2013 2:19 am

Re: New core/version 0.3

Post by krom » Thu Aug 21, 2014 9:49 am

Narann wrote:Just a question: Why do you need to Sync_Pipe between each triangle? What will happen if you don't do that?
Hi Narann, good question I'll try to explain...
The RDP fills triangles from top to bottom on the screen, & from left to right on each scanline...
If I remove every Sync_Pipe from that exact demo & run it on cen64 everything will look perfect because the RDP emulation always finishes each triangle before starting the next.
However, if you run that same demo on real hardware, you will see that the last ~32 pixels drawn in every triangle are the wrong color, as the RDP loads in the next triangle's "blend color" before it has finished drawing the current triangle.

I need to wait using the Sync_Pipe which stalls the pipeline, until preceding primitives completely finish, & I set the triangle blend color after this to make sure every triangle is correctly drawn.
I do something similar in my texture demos, where I need to "Sync_Tile" to make sure the texture has been fully read to correctly display each primitive.

Hope this helps =D

teres
Posts: 19
Joined: Fri Apr 11, 2014 9:44 am

Re: New core/version 0.3

Post by teres » Thu Aug 21, 2014 11:41 am

MarathonMan wrote:I have added partial support for ARM (some ROMs run).
Nice, but I don't even want to know how many FPS (here: frames per semester) you're getting. (Moore's law FTW though.)

Btw, hardware acceleration is nice and all, but is there a "clean", arch agnostic code path too, or is that somewhere towards the bottom of the TODO list for now?

MarathonMan
Site Admin
Posts: 692
Joined: Fri Oct 04, 2013 4:49 pm

Re: New core/version 0.3

Post by MarathonMan » Thu Aug 21, 2014 12:01 pm

krom wrote:
MarathonMan wrote:I've also begun rewriting the RSP pipeline in my spare time.
It will be cool to see the code changes =D
Will the RSP rewrite increase the (already fast) speed of the RSP emulation in cen64?
I'm applying many of the same techniques that I used with the rewrite of the VR4300 (and proved to help), so I would hope so.

More importantly I've been going through the RSP with a fine toothed comb in the rewrite now that I have something that works. I think I have found a couple bugs that may be the source of the weird black lines that appeared in the overhead of Link's room in OoT and such... that's what I'm more interested in. :D
teres wrote:
MarathonMan wrote:I have added partial support for ARM (some ROMs run).
Nice, but I don't even want to know how many FPS (here: frames per semester) you're getting. (Moore's law FTW though.)

Btw, hardware acceleration is nice and all, but is there a "clean", arch agnostic code path too, or is that somewhere towards the bottom of the TODO list for now?
It's not fast. My test system is a Samsung Exynos 5250 (Cortex-A15 @ 1.7GHz). Currently getting ~5-10VI/s or so. However, I've disabled the CPU's loop buffer due to a hardware bug in my silicon (?) and there's no graphical acceleration. If I resize the window and make it very small, the VI/s jumps up considerably. I'm crossing my fingers that hardware-accelerated EGL will speed things up a good deal.

One of my primary motivations for doing the port was to get CEN64 working on as many architectures and compilers as possible as it helps to expose bugs that may not be seen when working primarily in one environment.

Snowstorm64
Posts: 303
Joined: Sun Oct 20, 2013 8:22 pm

Re: New core/version 0.3

Post by Snowstorm64 » Thu Aug 21, 2014 1:44 pm

This is very cool, MarathonMan! :D I wonder what benefits NEON can offer compared to SSE2/3.

Excellent work, krom, as usual!
OS: Debian GNU/Linux Jessie (8.0)
CPU: Intel i7 4770K @ 3.5 GHz
Build: AVX (compiled from git)

Narann
Posts: 154
Joined: Mon Jun 16, 2014 4:25 pm

Re: New core/version 0.3

Post by Narann » Thu Aug 21, 2014 4:41 pm

Thanks krom, I was thinking about something like this.

Nacho
Posts: 66
Joined: Thu Nov 07, 2013 9:25 am

Re: New core/version 0.3

Post by Nacho » Thu Aug 21, 2014 5:17 pm

The new commits broke Namco Museum 64. It exits with a segfault. Both cen64 and cen64-backport

Anyway, two general questions.... (maybe a bit naïve, but...)

1. Is there any compatibility list with commercial ROMs? Which commercial ROMs are booting?
2. Is there support for controller?
Testing CEN64 on: Intel Core i5 520M 2.4 GHz. SSE2 SSE3 SSE4.1 SSE4.2 SSSE3, but no AVX. Ubuntu Linux

Snowstorm64
Posts: 303
Joined: Sun Oct 20, 2013 8:22 pm

Re: New core/version 0.3

Post by Snowstorm64 » Thu Aug 21, 2014 7:05 pm

Nacho wrote:The new commits broke Namco Museum 64. It exits with a segfault. Both cen64 and cen64-backport

Anyway, two general questions.... (maybe a bit naïve, but...)

1. Is there any compatibility list with commercial ROMs? Which commercial ROMs are booting?
2. Is there support for controller?
I didn't make a compatibility list because of the lack of TLB support and RDP implementation; also, CEN64 doesn't offer controller support. Without these, it's useless to write up any compatibility list, IMHO.

If I recall correctly, with cen64-backport a lot of third-party games like Rayman 2, Iggy's Reckin' Balls, Turok, etc. and a very few Nintendo games like Mario Party, Mario Golf, Super Mario 64, Sin and Punishment, Animal Forest can boot successfully, but with controller pak errors or crashes.
OS: Debian GNU/Linux Jessie (8.0)
CPU: Intel i7 4770K @ 3.5 GHz
Build: AVX (compiled from git)

MarathonMan
Site Admin
Posts: 692
Joined: Fri Oct 04, 2013 4:49 pm

Re: New core/version 0.3

Post by MarathonMan » Thu Aug 21, 2014 7:53 pm

Nacho wrote:The new commits broke Namco Museum 64. It exits with a segfault. Both cen64 and cen64-backport

Anyway, two general questions.... (maybe a bit naïve, but...)

1. Is there any compatibility list with commercial ROMs? Which commercial ROMs are booting?
2. Is there support for controller?
Thanks, looks like 7116582a6b87862a1cc20a596ec21570e5630c0e broke that.

No commercial ROMs other than Namco Museum should boot until the RSP/RDP are emulated.

Controller support is reliant on interfacing with input libraries with the various OS-es, which I haven't done yet. I'm still undecided as to how to handle mapping buttons on controllers. Probably best to do it dynamically, which results in a lot of questions...

beannaich
Posts: 149
Joined: Mon Oct 21, 2013 2:43 pm

Re: New core/version 0.3

Post by beannaich » Thu Aug 21, 2014 10:22 pm

MarathonMan wrote:Controller support is reliant on interfacing with input libraries with the various OS-es, which I haven't done yet. I'm still undecided as to how to handle mapping buttons on controllers. Probably best to do it dynamically, which results in a lot of questions...
In all my years of emulation, I have never found a good solution to the input problem. It always feels awkward and shitty, and I never liked doing it.

PS: Builds are completing successfully again :) got the deployment scheme worked out yet? :)

MarathonMan
Site Admin
Posts: 692
Joined: Fri Oct 04, 2013 4:49 pm

Re: New core/version 0.3

Post by MarathonMan » Thu Aug 21, 2014 10:25 pm

beannaich wrote:PS: Builds are completing successfully again :) got the deployment scheme worked out yet? :)
Not yet... I haven't had internet access for the better part of the past two weeks almost now.

Tonight was catching up on fixing bug reports and merging a handful of branches together. :)

Snowstorm64
Posts: 303
Joined: Sun Oct 20, 2013 8:22 pm

Re: New core/version 0.3

Post by Snowstorm64 » Fri Aug 22, 2014 8:47 am

Mario Kart 64 now boots successfully and runs with CEN64-backport's last commit! ;)

EDIT: also Blast Corps!
Last edited by Snowstorm64 on Fri Aug 22, 2014 8:57 am, edited 1 time in total.
OS: Debian GNU/Linux Jessie (8.0)
CPU: Intel i7 4770K @ 3.5 GHz
Build: AVX (compiled from git)

MarathonMan
Site Admin
Posts: 692
Joined: Fri Oct 04, 2013 4:49 pm

Re: New core/version 0.3

Post by MarathonMan » Fri Aug 22, 2014 8:53 am

Snowstorm64 wrote:Mario Kart 64 now boots successfully and runs with CEN64-backport's last commit! ;)
Hmm... never even realized it was broken before, heh. OoT has been working fine for quite some time, and that's been my primary test ROM.

Snowstorm64
Posts: 303
Joined: Sun Oct 20, 2013 8:22 pm

Re: New core/version 0.3

Post by Snowstorm64 » Fri Aug 22, 2014 9:18 am

Does OoT work for you? For me it has never managed to successfully boot with cen64-backport.

Also, I report that Blast Corps now boots, but it crashes at main screen.
OS: Debian GNU/Linux Jessie (8.0)
CPU: Intel i7 4770K @ 3.5 GHz
Build: AVX (compiled from git)

MarathonMan
Site Admin
Posts: 692
Joined: Fri Oct 04, 2013 4:49 pm

Re: New core/version 0.3

Post by MarathonMan » Fri Aug 22, 2014 9:26 am

I haven't added the CIC detection/handling code yet, so it assumes a 6102. If you want to run ROMs that use a different CIC, change the values set at the top of si/controller.c:

Code: Select all

// Initializes the SI.
int si_init(struct si_controller *si,
  struct bus_controller *bus, const uint8_t *rom) {
  si->bus = bus;
  si->rom = rom;

  si->ram[0x26] = 0x3F;
  si->ram[0x27] = 0x3F;
  return 0;
}
6101:

Code: Select all

  si->ram[0x25] = 0x04;
  si->ram[0x26] = 0x3F;
  si->ram[0x27] = 0x3F;
6102:

Code: Select all

  si->ram[0x26] = 0x3F;
  si->ram[0x27] = 0x3F;
6103:

Code: Select all

  si->ram[0x26] = 0x78;
  si->ram[0x27] = 0x3F;
6105:

Code: Select all

  si->ram[0x26] = 0x91;
  si->ram[0x27] = 0x3F;
6106:

Code: Select all

  si->ram[0x26] = 0x85;
  si->ram[0x27] = 0x3F;
For OoT, you'd want 6105.
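
Until CIC detection exists, the per-chip values above could be gathered into a single lookup table instead of being edited by hand. The following is only a rough sketch under stated assumptions: `enum cic_type`, `struct cic_seed`, `cic_seeds` and `cic_apply_seed` are hypothetical names that are not part of cen64-backport, and the function writes into a plain byte buffer rather than the real `struct si_controller`. Offset 0x25 is only written for the 6101, mirroring the snippets above.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical CIC identifiers; not part of cen64-backport.
enum cic_type { CIC_6101, CIC_6102, CIC_6103, CIC_6105, CIC_6106, CIC_COUNT };

// Bytes written to ram[0x25..0x27] for each CIC, copied from the post above.
// write_b25 is set only for the 6101, the one chip whose snippet touches 0x25.
struct cic_seed {
  const char *name;
  int write_b25;
  uint8_t b25, b26, b27;
};

const struct cic_seed cic_seeds[CIC_COUNT] = {
  [CIC_6101] = { "6101", 1, 0x04, 0x3F, 0x3F },
  [CIC_6102] = { "6102", 0, 0x00, 0x3F, 0x3F },
  [CIC_6103] = { "6103", 0, 0x00, 0x78, 0x3F },
  [CIC_6105] = { "6105", 0, 0x00, 0x91, 0x3F },
  [CIC_6106] = { "6106", 0, 0x00, 0x85, 0x3F },
};

// Writes the seed bytes for 'type' into 'ram' (at least 0x28 bytes long).
// Returns 0 on success, -1 on a bad argument.
int cic_apply_seed(uint8_t *ram, size_t ram_size, enum cic_type type) {
  if (ram == NULL || ram_size < 0x28 || (unsigned) type >= CIC_COUNT)
    return -1;

  const struct cic_seed *seed = &cic_seeds[type];

  if (seed->write_b25)
    ram[0x25] = seed->b25;

  ram[0x26] = seed->b26;
  ram[0x27] = seed->b27;
  return 0;
}
```

With something like this, `si_init` would call `cic_apply_seed(si->ram, sizeof(si->ram), CIC_6105)` for OoT, assuming `si->ram` is a byte array of sufficient size.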

Snowstorm64
Posts: 303
Joined: Sun Oct 20, 2013 8:22 pm

Re: New core/version 0.3

Post by Snowstorm64 » Fri Aug 22, 2014 9:32 am

Thank you! I'll experiment with these changes later. :D
OS: Debian GNU/Linux Jessie (8.0)
CPU: Intel i7 4770K @ 3.5 GHz
Build: AVX (compiled from git)

MarathonMan
Site Admin
Posts: 692
Joined: Fri Oct 04, 2013 4:49 pm

Re: New core/version 0.3

Post by MarathonMan » Fri Aug 22, 2014 9:36 am

BTW, Blast Corps probably requires TLB support:

https://code.google.com/p/mupen64plus/i ... ail?id=280

Snowstorm64
Posts: 303
Joined: Sun Oct 20, 2013 8:22 pm

Re: New core/version 0.3

Post by Snowstorm64 » Fri Aug 22, 2014 10:35 am

Well, I have some interesting results:

-Diddy Kong Racing: For some reason it doesn't complain about the Controller Pak and it works, but it has some weird issues (I guess it's a Z-buffer issue or an RSP one, I can't tell. MarathonMan, can you investigate this?)
-Donkey Kong 64: It complains that the Expansion Pak is not inserted. Well, dang.
-Jet Force Gemini: Looks like it still has the same timing issues, but they are less severe.
-Some games suffer from timing issues, like Super Smash Bros., Pokémon Snap, Blast Corps, etc., resulting in a freeze... :( (I can tell it's related to cycles and timings, because changing the value of MEMORY_WORD_DELAY produces different results. For example, with a value like 12, Blast Corps no longer freezes at the main screen, while Super Smash Bros. freezes much earlier.)
-These are little things, but games like Excitebike 64 and Pokémon Stadium have managed to show the N64 logo! :D
OS: Debian GNU/Linux Jessie (8.0)
CPU: Intel i7 4770K @ 3.5 GHz
Build: AVX (compiled from git)

MarathonMan
Site Admin
Posts: 692
Joined: Fri Oct 04, 2013 4:49 pm

Re: New core/version 0.3

Post by MarathonMan » Fri Aug 22, 2014 1:09 pm

I still haven't implemented all the required cache operations, so that is likely the reason for the freezing.

MarathonMan
Site Admin
Posts: 692
Joined: Fri Oct 04, 2013 4:49 pm

Re: New core/version 0.3

Post by MarathonMan » Sat Aug 23, 2014 5:03 pm

Snowstorm64 wrote:Well, I have some interesting results:

-Donkey Kong 64: It complains that the Expansion Pak is not inserted. Well, dang.
Fixed -- system defaults to 8MiB now. I fixed the MFC1 assertion. DK64 now runs for a few seconds, and looks substantially better than it did with the old core. Likely needs some more cache work to progress further.

Snowstorm64
Posts: 303
Joined: Sun Oct 20, 2013 8:22 pm

Re: New core/version 0.3

Post by Snowstorm64 » Sat Aug 23, 2014 7:07 pm

MarathonMan wrote:
Snowstorm64 wrote:Well, I have some interesting results:

-Donkey Kong 64: It complains that the Expansion Pak is not inserted. Well, dang.
Fixed -- system defaults to 8MiB now. I fixed the MFC1 assertion. DK64 now runs for a few seconds, and looks substantially better than it did with the old core. Likely needs some more cache work to progress further.
That's great news! :D

DK64 looks much nicer, indeed! But now Super Mario 64 doesn't boot anymore... I guess it is still lacking TLBR support.

EDIT: Report for Excitebike 64:

Code: Select all

Unimplemented instruction: BREAK [0x000101CD] @ 0xFFFFFFFF8000EB48
cen64-6103-debug: /data/emulators/cen64-backport/vr4300/functions.c:794: VR4300_INV: Assertion `0 && "Unimplemented instruction encountered."' failed.
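
For anyone reading the log above: the word 0x000101CD has a zero major opcode (SPECIAL) and a function field of 0x0D, which is the MIPS BREAK encoding, so the decoder identified the instruction correctly and simply has no handler for it. A minimal sketch of that decoding follows; the helper names `mips_is_break` and `mips_break_code` are hypothetical and are not taken from cen64.

```c
#include <stdint.h>

#define MIPS_OPCODE_SPECIAL 0x00u  // Major opcode, bits 31..26.
#define MIPS_FUNCT_BREAK    0x0Du  // Function field, bits 5..0.

// Returns 1 if 'iw' encodes BREAK (SPECIAL opcode, funct 0x0D), else 0.
int mips_is_break(uint32_t iw) {
  return (iw >> 26) == MIPS_OPCODE_SPECIAL &&
         (iw & 0x3Fu) == MIPS_FUNCT_BREAK;
}

// Extracts the 20-bit code field (bits 25..6). The CPU ignores it; software
// can use it to pass information to the exception handler.
uint32_t mips_break_code(uint32_t iw) {
  return (iw >> 6) & 0xFFFFFu;
}
```

For the word in the log, `mips_is_break(0x000101CD)` returns 1 and `mips_break_code(0x000101CD)` returns 0x407.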
OS: Debian GNU/Linux Jessie (8.0)
CPU: Intel i7 4770K @ 3.5 GHz
Build: AVX (compiled from git)

beannaich
Posts: 149
Joined: Mon Oct 21, 2013 2:43 pm

Re: New core/version 0.3

Post by beannaich » Sat Aug 23, 2014 8:06 pm

MarathonMan wrote:Fixed -- system defaults to 8MiB now. I fixed the MFC1 assertion. DK64 now runs for a few seconds, and looks substantially better than it did with the old core. Likely needs some more cache work to progress further.
This is what DK64 looks like on Windows:
Image

... :( I tried both the z64 and n64 formats. I also tried Legend of Zelda, The - Ocarina of Time (USA), with the same results.

Also, if I don't specify any command line args, I don't get any error message as the source would indicate:

Code: Select all

  if (argc < 3) {
    printf("%s <pifrom.bin> <rom>\n", argv[0]);
    return 255;
  }
I get no such message at all.

MarathonMan
Site Admin
Posts: 692
Joined: Fri Oct 04, 2013 4:49 pm

Re: New core/version 0.3

Post by MarathonMan » Sat Aug 23, 2014 8:12 pm

Is that cen64.git or cen64-backport.git? It needs to be the latter. Also, I've never tried the RSP/RDP in Windows, so...
beannaich wrote:Also, if I don't specify any command line args, I don't get any error message as the source would indicate:

Code: Select all

  if (argc < 3) {
    printf("%s <pifrom.bin> <rom>\n", argv[0]);
    return 255;
  }
I get no such message at all.
That issue I'm aware of. Some fancy-pants Windows console stuff needs to be written.

beannaich
Posts: 149
Joined: Mon Oct 21, 2013 2:43 pm

Re: New core/version 0.3

Post by beannaich » Sat Aug 23, 2014 8:14 pm

MarathonMan wrote:Is that cen64.git or cen64-backport.git? It needs to be the latter. Also, I've never tried the RSP/RDP in Windows, so...
Oh, :D. Would have been good to know. Want me to change the CI server to cen64-backport.git?

MarathonMan
Site Admin
Posts: 692
Joined: Fri Oct 04, 2013 4:49 pm

Re: New core/version 0.3

Post by MarathonMan » Sat Aug 23, 2014 8:15 pm

beannaich wrote:
MarathonMan wrote:Is that cen64.git or cen64-backport.git? It needs to be the latter. Also, I've never tried the RSP/RDP in Windows, so...
Oh, :D. Would have been good to know. Want me to change the CI server to cen64-backport.git?
cen64-backport won't build in Windows without a couple patches. Somebody mentioned how to do it a few pages back.

Hopefully, cen64-backport won't be needed for much longer. It's just a test to see what ROMs are doing; I'd hold off.

Nacho
Posts: 66
Joined: Thu Nov 07, 2013 9:25 am

Re: New core/version 0.3

Post by Nacho » Sun Aug 24, 2014 6:27 pm

MarathonMan wrote:
Snowstorm64 wrote:Mario Kart 64 now boots successfully and runs with cen64-backport's latest commit! ;)
Hmm... never even realized it was broken before, heh. OoT has been working fine for quite some time, and that's been my primary test ROM.
OoT works for me, but after a while in the intro it freezes with "BadVAddr: 0x0000000000000000" and the yellow bar.

But that's not relevant, I guess, until the data cache is fully implemented.
Testing CEN64 on: Intel Core i5 520M 2.4 GHz. SSE2 SSE3 SSE4.1 SSE4.2 SSSE3, but no AVX. Ubuntu Linux

MarathonMan
Site Admin
Posts: 692
Joined: Fri Oct 04, 2013 4:49 pm

Re: New core/version 0.3

Post by MarathonMan » Sun Aug 24, 2014 8:17 pm

It is implemented now... there's some coherency bug or something somewhere that I haven't been able to track down.

EDIT: Finally found the bug I've been looking for... for ages. A two-liner was the reason why ROMs would randomly freeze and the BadVAddr message would show up all the time.

Figures. :lol:

Snowstorm64
Posts: 303
Joined: Sun Oct 20, 2013 8:22 pm

Re: New core/version 0.3

Post by Snowstorm64 » Sun Aug 24, 2014 9:23 pm

Super Mario 64 is now dead, but at least we got Star Fox 64 booting for the first time with the new core. Oddly enough, SM64 Shindou Edition still works too.

EDIT: Animal Forest crashes almost immediately at boot and shows a red line. I wonder if the OoT debug trick also works with this game...
OS: Debian GNU/Linux Jessie (8.0)
CPU: Intel i7 4770K @ 3.5 GHz
Build: AVX (compiled from git)

krom
Posts: 72
Joined: Sat Oct 05, 2013 2:19 am

Re: New core/version 0.3

Post by krom » Sun Aug 24, 2014 9:38 pm

Goemon's Great Adventure is booting for the first time, and it looks really great!

Amazing work MarathonMan =D
