Thinking about multi-threading...


How many cores do you have?

Cores   Votes   Share
1       1       2%
2       10      21%
3       0       No votes
4       34      72%
6       1       2%
8       1       2%
10      0       No votes
12      0       No votes
14      0       No votes
16      0       No votes
 
Total votes: 47

MarathonMan
Site Admin
Posts: 692
Joined: Fri Oct 04, 2013 4:49 pm

Thinking about multi-threading...

Post by MarathonMan » Fri Jan 30, 2015 10:52 am

Please answer the poll.

Do NOT include hyper-threaded/SMT cores. (E.g., a Core i7 2600 has 4 cores, even though it has 8 threads. A Core i5 2500 has 4 cores as well.)

This will give me an idea of what's available in terms of headroom for design purposes. I'm particularly interested to see if anyone is still rocking single-core CPUs since the SSE2 builds seem to be quasi-popular.

Snowstorm64
Posts: 303
Joined: Sun Oct 20, 2013 8:22 pm

Re: Thinking about multi-threading...

Post by Snowstorm64 » Fri Jan 30, 2015 11:46 am

I have an Intel i7 4770K, so 4 cores.

I don't want to be mean towards those users, but I wonder whether it is worth dropping support for systems with a single-core CPU, since those aren't powerful enough to run CEN64 at an acceptable speed and likely never will be, and focusing instead on making CEN64 work at its best with 2 CPU cores?
OS: Debian GNU/Linux Jessie (8.0)
CPU: Intel i7 4770K @ 3.5 GHz
Build: AVX (compiled from git)

MarathonMan
Site Admin
Posts: 692
Joined: Fri Oct 04, 2013 4:49 pm

Re: Thinking about multi-threading...

Post by MarathonMan » Fri Jan 30, 2015 12:09 pm

True, but... I thought the same thing with SSE2 vs. SSSE3 and look where that got me. ;)

If I am able to figure out multi-threading, I'm not expecting to sport a 2x speedup (or anything near that range), so leaving the single-core functionality attached somehow is probably desirable.

OldGnashburg
Posts: 91
Joined: Tue Nov 19, 2013 3:00 pm
Location: Sherwood Park, Alberta, Canada: A place with free universal healthcare, and lots and lots of oil.

Re: Thinking about multi-threading...

Post by OldGnashburg » Fri Jan 30, 2015 1:18 pm

What kind of Multi Threading do you have in mind?
Gnash, Gnash, Gnash...

ShadowFX
Posts: 86
Joined: Sat Oct 05, 2013 2:08 am
Location: The Netherlands

Re: Thinking about multi-threading...

Post by ShadowFX » Fri Jan 30, 2015 3:01 pm

Using 4 cores here ;)
"Change is inevitable; progress is optional"

OS: Windows 10 Pro x64
Specs: Intel Core i7-7700K @ 4.2GHz, 16GB DDR4-RAM, NVIDIA GeForce GTX 1080 Ti
Main build: AVX (official)

klarthailerion
Posts: 4
Joined: Fri Oct 04, 2013 7:20 pm

Re: Thinking about multi-threading...

Post by klarthailerion » Fri Jan 30, 2015 7:55 pm

HTPC: i5-2300 @ 2.8 GHz - 4 cores
laptop: i7-4790 @ 3.6 GHz - 4 cores

Why does my wife's laptop used for writing her dissertation have a better processor than my primary computer, anyway?

siggie
Posts: 5
Joined: Mon Nov 04, 2013 2:23 pm

Re: Thinking about multi-threading...

Post by siggie » Sat Jan 31, 2015 5:11 am

klarthailerion wrote:laptop: i7-4790 @ 3.6 GHz - 4 cores
Are you sure it's this type? I thought the 4790 was only available as a desktop CPU.

ShadowFX
Posts: 86
Joined: Sat Oct 05, 2013 2:08 am
Location: The Netherlands

Re: Thinking about multi-threading...

Post by ShadowFX » Sat Jan 31, 2015 7:22 am

siggie wrote: Are you sure it's this type? I thought the 4790 was only available as a desktop CPU.
To my knowledge, it is only available in desktops.
Also, as far as I can see, there is no mobile CPU stock clocked at 3.6GHz.

Source: http://en.wikipedia.org/wiki/List_of_In ... processors
"Change is inevitable; progress is optional"

OS: Windows 10 Pro x64
Specs: Intel Core i7-7700K @ 4.2GHz, 16GB DDR4-RAM, NVIDIA GeForce GTX 1080 Ti
Main build: AVX (official)

MarathonMan
Site Admin
Posts: 692
Joined: Fri Oct 04, 2013 4:49 pm

Re: Thinking about multi-threading...

Post by MarathonMan » Sat Jan 31, 2015 10:29 am

OldGnashburg wrote:What kind of Multi Threading do you have in mind?
I'll probably start by deferring VI filters to another thread or something at first.

Eventually something based on this:
http://en.wikipedia.org/wiki/Software_t ... nal_memory

Basically: it's too hard to synchronize the state of the console a la bsnes at N64 frequencies (the locking contention is, and probably will remain, too much for modern processors). So to "loosen" the coherency, you can have individual devices predict whether or not they will be interrupted in the future. In the event you guess wrong, you "roll back" to an earlier state and "replay" the section where you guessed wrong, now knowing the correct outcome, and proceed forward.

OldGnashburg
Posts: 91
Joined: Tue Nov 19, 2013 3:00 pm
Location: Sherwood Park, Alberta, Canada: A place with free universal healthcare, and lots and lots of oil.

Re: Thinking about multi-threading...

Post by OldGnashburg » Sat Jan 31, 2015 12:23 pm

Wouldn't that result in stuttering or data corruption? What makes it seamless?
Gnash, Gnash, Gnash...

beannaich
Posts: 149
Joined: Mon Oct 21, 2013 2:43 pm

Re: Thinking about multi-threading...

Post by beannaich » Sat Jan 31, 2015 1:38 pm

OldGnashburg wrote:Wouldn't that result in stuttering or data corruption? What makes it seamless?
You keep records of the system state at key points, so one would only have to roll back to the last "key" point, and continue from there after resolving whatever condition is necessary.

For example, say you have a CPU that is interrupted by a graphics processor at 60Hz for frame synchronization (VSync). Then you'd record the state of the GPU at each frame interrupt event, and the CPU every X cycles, where X is the amount of time in CPU time between each interrupt you're expecting. Then the two can seamlessly roll back any changes to the last frame event, and synchronize 60 times a second, instead of 93.75 million times a second. :D

The issue with this approach is the memory consumption needed to record all these snapshots. But emulators like http://www.exodusemulator.com have used this to provide major speedups to their cycle-accurate implementations.
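
A minimal sketch of the checkpoint-and-replay idea described above, in C. Everything here is hypothetical and drastically simplified: `toy_cpu`, `toy_cpu_step`, and `toy_cpu_run_frame` are illustrative names, not CEN64 or Exodus code, and a real emulator would have to snapshot far more state. It only shows the shape of "guess, run ahead, roll back to the last key point if the guess was wrong".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a device's state. */
struct toy_cpu {
  uint64_t cycles;  /* cycles executed so far */
  uint32_t acc;     /* stand-in for registers, pipeline state, etc. */
};

/* 93.75 MHz CPU clock / 60 Hz frame interrupt = 1,562,500 cycles. */
enum { CYCLES_PER_FRAME = 93750000 / 60 };

/* One cycle of made-up work; an interrupt resets the accumulator. */
static void toy_cpu_step(struct toy_cpu *cpu, bool irq_pending) {
  cpu->acc = irq_pending ? 0 : cpu->acc + 1;
  cpu->cycles++;
}

/* Runs one frame speculatively, guessing that no interrupt arrives.
 * irq_at is the cycle (within the frame) at which an interrupt really
 * arrived, or -1 if none did; in a threaded design this would only be
 * known after synchronizing with the other device.
 * Returns true if the guess was wrong and the frame was replayed. */
static bool toy_cpu_run_frame(struct toy_cpu *cpu, int64_t irq_at) {
  struct toy_cpu checkpoint = *cpu;  /* snapshot at the "key" point */
  int64_t i;

  assert(irq_at < CYCLES_PER_FRAME);

  for (i = 0; i < CYCLES_PER_FRAME; i++)
    toy_cpu_step(cpu, false);        /* speculate: no interrupt */

  if (irq_at < 0)
    return false;                    /* guess was right; keep the work */

  *cpu = checkpoint;                 /* roll back */
  for (i = 0; i < CYCLES_PER_FRAME; i++)
    toy_cpu_step(cpu, i == irq_at);  /* replay with the real interrupt */

  return true;
}
```

The memory cost mentioned above shows up in `checkpoint`: here it is two integers, but for a whole console it would be every piece of state that can change between two key points.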

Nacho
Posts: 66
Joined: Thu Nov 07, 2013 9:25 am

Re: Thinking about multi-threading...

Post by Nacho » Sat Jan 31, 2015 1:43 pm

Then, with that approach, the more cores the host processor has, the higher the speed of the simulation?

I mean, an 8-core processor would perform faster than a 4-core one, assuming the same clock speed. Right?
Testing CEN64 on: Intel Core i5 520M 2.4 GHz. SSE2 SSE3 SSE4.1 SSE4.2 SSSE3, but no AVX. Ubuntu Linux

beannaich
Posts: 149
Joined: Mon Oct 21, 2013 2:43 pm

Re: Thinking about multi-threading...

Post by beannaich » Sat Jan 31, 2015 2:08 pm

Nacho wrote: Then, with that approach, the more cores the host processor has, the higher the speed of the simulation?

I mean, an 8-core processor would perform faster than a 4-core one, assuming the same clock speed. Right?
More cores (and specifically, more threads) just allow you to run more components on their own threads, which should allow for better throughput. But synchronization is still needed, and costly, so the increase might be great, or it might be negligible, depending on the circumstances.

OldGnashburg
Posts: 91
Joined: Tue Nov 19, 2013 3:00 pm
Location: Sherwood Park, Alberta, Canada: A place with free universal healthcare, and lots and lots of oil.

Re: Thinking about multi-threading...

Post by OldGnashburg » Sat Jan 31, 2015 4:51 pm

To add to that, I do not believe MarathonMan would be willing to create several multi-threaded variations of CEN64, each supporting computers with a different number of cores. Almost all CPUs that could run CEN64 at an almost reasonable speed are quad-core. I may be generalizing (so forgive my bias), but most modern Intel i5s and almost all i7s, which are becoming more mainstream, have 4 cores, and most people (well, at least middle-class Westerners like me) are IMHO buying mid-range multimedia or (very) low-end gaming PCs and laptops. It will only be a matter of time until everybody has 4 cores. It just doesn't make sense to have multiple versions threaded for 2, 4, 6 and 8 core CPUs when the minimum requirement would be 4 cores, which is also the more mainstream configuration.
Gnash, Gnash, Gnash...

Nacho
Posts: 66
Joined: Thu Nov 07, 2013 9:25 am

Re: Thinking about multi-threading...

Post by Nacho » Sat Jan 31, 2015 5:16 pm

Well, I don't believe that the number of threads/cores would be hardcoded in CEN64.
So to "loosen" the coherency, you can basically have individual devices predict whether or not they will be interrupted or not in the future.
So CEN64 would launch as many devices as possible, deciding the number programmatically based on the host machine. There's no need for different versions.

So...

Single core: no threads, and CEN64 will be as fast as a snail. But I doubt any single-core CPU will ever be able to run CEN64 at 60 VI/s.
2 cores: 2 devices running in parallel?
4 cores: 3 devices and 1 thread for sync?
8 cores: 7 devices and 1 thread for sync?
etc.
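
A rough sketch in C of picking the split at run time rather than shipping one build per core count. The `thread_plan` struct and both functions are hypothetical names used only for illustration, and the mapping simply mirrors the guesses in the list above; it is not how CEN64 decides anything. `sysconf(_SC_NPROCESSORS_ONLN)` is available on Linux and most Unix-like systems, and it counts logical processors, so SMT threads are included.

```c
#include <assert.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical description of how to split the emulator across threads. */
struct thread_plan {
  unsigned device_threads;  /* threads that each run emulated devices */
  bool sync_thread;         /* whether one extra thread only synchronizes */
};

/* Mirrors the list above: 1 core runs everything in one thread, 2 or 3
 * cores get one device thread per core, and 4 or more cores reserve one
 * thread for synchronization. */
static struct thread_plan plan_threads(long host_cores) {
  struct thread_plan plan = { 1, false };

  assert(host_cores >= 1);

  if (host_cores >= 4) {
    plan.device_threads = (unsigned) host_cores - 1;
    plan.sync_thread = true;
  } else if (host_cores >= 2) {
    plan.device_threads = (unsigned) host_cores;
  }

  return plan;
}

/* Asks the OS how many logical processors are online right now. */
static struct thread_plan plan_threads_for_host(void) {
  long n = sysconf(_SC_NPROCESSORS_ONLN);
  return plan_threads(n < 1 ? 1 : n);
}
```

In practice the number of useful device threads is capped by how many emulated components can actually run independently, so an 8-core host would not automatically get 7 busy threads.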

So, what MarathonMan is trying to do is decide whether an STM approach is interesting enough to spend time developing. I would say it's worth it. But I have no real experience coding things like CEN64, so I'm only guessing.
Testing CEN64 on: Intel Core i5 520M 2.4 GHz. SSE2 SSE3 SSE4.1 SSE4.2 SSSE3, but no AVX. Ubuntu Linux

AIO
Posts: 51
Joined: Wed Nov 05, 2014 4:56 pm

Re: Thinking about multi-threading...

Post by AIO » Sat Jan 31, 2015 5:50 pm

MarathonMan wrote:I'll probably start by deferring VI filters to another thread or something at first.
Speaking of VI filters, Angrylion's applies them every frame, even though most games are not 60 fps. Is there a feasible way to implement automatic frame skip for VI filters?

MarathonMan
Site Admin
Posts: 692
Joined: Fri Oct 04, 2013 4:49 pm

Re: Thinking about multi-threading...

Post by MarathonMan » Sun Feb 01, 2015 2:43 am

I'm not an expert in the VI, but:

From an HLE perspective, you might be able to get away with just checking if the VI_ORIGIN didn't change and not re-rendering/re-computing the frame.

From a cycle-accurate perspective, you have to simulate the memory transactions that the VI is doing anyways, so it doesn't make sense to do something like that from my perspective.
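
A small sketch of the HLE-style check described above, in C. The `vi_skip_state` struct and `vi_should_render` function are hypothetical names for illustration; neither comes from CEN64 or Angrylion's plugin. The idea is only to remember the last VI_ORIGIN value and skip the filter pass when it has not moved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bookkeeping for the frame-skip heuristic. */
struct vi_skip_state {
  uint32_t last_origin;  /* VI_ORIGIN seen at the previous vertical interrupt */
  bool have_last;        /* false until the first frame has been seen */
};

/* Returns true when the VI filters should run for this frame, i.e. when
 * the framebuffer address differs from the one presented last time. */
static bool vi_should_render(struct vi_skip_state *state, uint32_t vi_origin) {
  bool changed = !state->have_last || vi_origin != state->last_origin;

  state->last_origin = vi_origin;
  state->have_last = true;
  return changed;
}
```

This is a heuristic, not a guarantee: a game that redraws into the same buffer without changing VI_ORIGIN would be shown a stale frame.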

Snowstorm64
Posts: 303
Joined: Sun Oct 20, 2013 8:22 pm

Re: Thinking about multi-threading...

Post by Snowstorm64 » Sun Feb 01, 2015 5:51 pm

Well, I think that's enough, with 30 votes. The results are clear, some multithreading love would be good for CEN64 and its users.
OS: Debian GNU/Linux Jessie (8.0)
CPU: Intel i7 4770K @ 3.5 GHz
Build: AVX (compiled from git)

AIO
Posts: 51
Joined: Wed Nov 05, 2014 4:56 pm

Re: Thinking about multi-threading...

Post by AIO » Sun Feb 01, 2015 6:39 pm

MarathonMan wrote:I'm not an expert in the VI, but:

From an HLE perspective, you might be able to get away with just checking if the VI_ORIGIN didn't change and not re-rendering/re-computing the frame.
Thanks! Tested it and seems like a decent solution for my standards :D .
MarathonMan wrote: From a cycle-accurate perspective, you have to simulate the memory transactions that the VI is doing anyways, so it doesn't make sense to do something like that from my perspective.
Ya, it's not even safe to assume it will work perfectly for every game. I just like to experiment with things :) .

Do you think RSP yielding would be easier to implement with multi-threading?

Nintendo Maniac 64
Posts: 185
Joined: Fri Oct 04, 2013 11:37 pm

Re: Thinking about multi-threading...

Post by Nintendo Maniac 64 » Sat Feb 07, 2015 6:34 pm

MarathonMan wrote:I'm particularly interested to see if anyone is still rocking single-core CPUs since the SSE2 builds seem to be quasi-popular.
MarathonMan wrote:I thought the same thing with SSE2 vs. SSSE3 and look where that got me. ;).
I would imagine that the popularity of the SSE2 builds is just because AMD didn't add SSSE3 until Bobcat and Bulldozer; all multi-core K8 and K10 CPUs only support SSE3 (K10 also supports SSE4a).

...speaking of SSE4a, I wonder if that could be used at all to help K10's lack of SSSE3.
CEN64 Forum's resident straight-male kuutsundere
(just "tsundere" makes people think of "Shana clones" *shivers*)

CPU+iGPU: Pentium G3258 @ 4.6GHz/1.281v
dGPU: Radeon HD5870 1GB
RAM: Vengeance 1600 4x4GB
OS: Windows 7

MarathonMan
Site Admin
Posts: 692
Joined: Fri Oct 04, 2013 4:49 pm

Re: Thinking about multi-threading...

Post by MarathonMan » Sat Feb 07, 2015 9:49 pm

Nintendo Maniac 64 wrote:...speaking of SSE4a, I wonder if that could be used at all to help K10's lack of SSSE3.
Nope. The useful aspect of SSSE3 is pshufb, which is not present in SSE4a (nor is there a similar substitute).
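
For readers unfamiliar with the instruction, here is a plain-C model of what the 128-bit form of pshufb computes. The function name `pshufb_model` is made up for this sketch; it is a scalar illustration of the instruction's semantics, not code from CEN64. Each output byte is picked from an arbitrary position in the source under control of a per-byte index that is only known at run time, and a set high bit in the index forces the output byte to zero.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of the 128-bit pshufb: for each of the 16 lanes,
 * the low 4 bits of the control byte select a source byte, and
 * bit 7 of the control byte zeroes the lane instead. */
static void pshufb_model(uint8_t dst[16], const uint8_t src[16],
                         const uint8_t ctl[16]) {
  uint8_t out[16];
  unsigned i;

  for (i = 0; i < 16; i++)
    out[i] = (ctl[i] & 0x80) ? 0 : src[ctl[i] & 0x0F];

  /* Copy at the end so that dst may alias src, as it does in hardware. */
  memcpy(dst, out, sizeof(out));
}
```

The loop is what a single pshufb does in one instruction; without it, an arbitrary run-time byte permutation has to be built from several fixed-pattern shuffles, shifts, and masks.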

Merrep
Posts: 2
Joined: Sun Jan 25, 2015 8:09 am

Re: Thinking about multi-threading...

Post by Merrep » Sun Feb 08, 2015 9:25 am

Snowstorm64 wrote:Well, I think that's enough, with 30 votes. The results are clear, some multithreading love would be good for CEN64 and its users.
Agree. I think the key difference vs the SSE2 decision is that CEN64 optimised for 4 cores would still probably run on a 1, 2, or 3 core CPU (albeit with additional performance penalties), whereas an AVX build will never be able to run on a SSE2 processor.

I think there are three groups of people interested in CEN64:
  1. Probably the largest group: those who want to be able to play accurately emulated N64 games at full speed, with hardware sufficient to cope with a well optimised cycle-accurate emulator.
  2. People with a casual interest in the N64 emulation scene, who want to see what new developments are underway and probably would like to run the game at full speed, but lack the hardware to do it, and therefore will probably stick to Mupen etc for their actual gameplay needs.
  3. Those with either an academic interest in emulation and the N64 platform, or people who demand accurate emulation regardless of the speed (e.g. people creating TASes).
Even with well-written multithreading and the incredible optimisations MarathonMan manages to keep making, it seems extremely unlikely that anything inferior to an i5 is ever going to be able to run the games at full speed.

Group 1 will all be grateful for optimisation for four cores, as they will all have them by definition. Group 2 may be upset, as they'll probably see some loss of performance, but they never had a hope of reaching 60 VI/s anyway, and the emulator will still at least function, for interest's sake. Group 3 may well have sufficient computing power to run CEN64 at full speed, and even those who don't likely value accuracy over speed in any case.

Whilst my practical experience of parallel computing is somewhat limited, I suspect that an STM approach will require a reasonably extensive restructuring of CEN64 in order to appropriately divide the workload between an appropriate number of threads, and optimising this for a different number of threads would probably be rather a lot of work (as opposed to having to implement some processor instructions in multiple different ways, then setting some compiler flags for instruction set extensions). It would make maintenance a nightmare, likely introduce further bugs, and slow the pace of development.

Personally I think optimising for a nearly 6 year old CPU is extremely reasonable, and compromising the speed of development to support dying hardware that doesn't have a hope in hell of running the emulator at close to reasonable speed would be a great shame. Supporting <SSE4.1 was rather generous!

Anyway, first post and just wanted to say thanks for all the incredible work MarathonMan. After so many years of complete stagnation with N64 emulation, it's amazing to see someone with such incredible skills not just taking an interest, but going to such extraordinary lengths to create an emulator which thoroughly eclipses everything that has come before it. Good job!

MarathonMan
Site Admin
Posts: 692
Joined: Fri Oct 04, 2013 4:49 pm

Re: Thinking about multi-threading...

Post by MarathonMan » Sun Feb 08, 2015 11:44 am

I agree with just about all of your points.
Merrep wrote:it seems extremely unlikely that anything inferior to an i5 is ever going to be able to run the games at full speed.
Yep, my "goal" has always been to target a 3GHz or so processor. I'd call that a win in my book.
Merrep wrote:Anyway, first post and just wanted to say thanks for all the incredible work MarathonMan. After so many years of complete stagnation with N64 emulation, it's amazing to see someone with such incredible skills not just taking an interest, but going to such extraordinary lengths to create an emulator which thoroughly eclipses everything that has come before it. Good job!
Thanks for the kind words! Hopefully I will be able to see this thing through. :D

EDIT: Just for fun, I've attached a patch that demonstrates the "upper bound" on what kind of performance a multi-threaded CEN64 would probably realize. The patch is designed for POSIX threads, and thus won't work on Windows platforms. It basically runs the RCP in one thread, and the VR4300 in another. They synchronize every 2,000 RCP clock cycles (or 3,000 VR4300 pcycles).

Ironically, the patch actually seems very stable. Just sacrifices accuracy under the hood. :D


diff --git a/device/device.c b/device/device.c
index 542e788..3f3d8ed 100644
--- a/device/device.c
+++ b/device/device.c
@@ -148,22 +148,67 @@ void device_run(struct cen64_device *device) {
   fpu_set_state(saved_fpu_state);
 }
 
+#include <pthread.h>
+pthread_mutex_t rcp_sync_lock = PTHREAD_MUTEX_INITIALIZER;
+pthread_cond_t rcp_sync_cv = PTHREAD_COND_INITIALIZER;
+bool rcp_ready;
+bool vr4300_ready;
+
+// Continually cycles the device until... forever!
+static void *device_rcp_spin(void *opaque) {
+  struct cen64_device *device = (struct cen64_device *) opaque;
+  unsigned i;
+
+  while (1) {
+    for (i = 0; i < 2000; i++) {
+      rsp_cycle(&device->rsp);
+      vi_cycle(&device->vi);
+    }
+
+    pthread_mutex_lock(&rcp_sync_lock);
+
+    if (!vr4300_ready) {
+      rcp_ready = true;
+      pthread_cond_wait(&rcp_sync_cv, &rcp_sync_lock);
+      rcp_ready = false;
+      pthread_mutex_unlock(&rcp_sync_lock);
+    }
+
+    else {
+      pthread_mutex_unlock(&rcp_sync_lock);
+      pthread_cond_signal(&rcp_sync_cv);
+    }
+  }
+
+  return NULL;
+}
+
 // Continually cycles the device until setjmp returns.
 int device_spin(struct cen64_device *device) {
-  if (setjmp(device->bus.unwind_data))
-    return 1;
+  pthread_t rcp_thread;
+  unsigned i;
 
-  while (1) {
-    unsigned i;
+  rcp_ready = false;
+  vr4300_ready = false;
+  pthread_create(&rcp_thread, NULL, device_rcp_spin, device);
 
-    for (i = 0; i < 2; i++) {
+  while (1) {
+    for (i = 0; i < 3000; i++)
       vr4300_cycle(&device->vr4300);
-      rsp_cycle(&device->rsp);
-      vi_cycle(&device->vi);
 
+    pthread_mutex_lock(&rcp_sync_lock);
+
+    if (!rcp_ready) {
+      vr4300_ready = true;
+      pthread_cond_wait(&rcp_sync_cv, &rcp_sync_lock);
+      vr4300_ready = false;
+      pthread_mutex_unlock(&rcp_sync_lock);
     }
 
-    vr4300_cycle(&device->vr4300);
+    else {
+      pthread_mutex_unlock(&rcp_sync_lock);
+      pthread_cond_signal(&rcp_sync_cv);
+    }
   }
 
   return 0;
diff --git a/rsp/cp0.c b/rsp/cp0.c
index e9fd1b8..a99d255 100644
--- a/rsp/cp0.c
+++ b/rsp/cp0.c
@@ -54,7 +54,8 @@ uint32_t rsp_read_cp0_reg(struct rsp *rsp, unsigned src) {
 
   switch(src) {
     case RSP_CP0_REGISTER_SP_RESERVED:
-      if (!rsp->regs[RSP_CP0_REGISTER_SP_RESERVED]) {
+      if (!pthread_mutex_trylock(&rsp->rsp_semaphore)) {
+      //if (!rsp->regs[RSP_CP0_REGISTER_SP_RESERVED]) {
         rsp->regs[RSP_CP0_REGISTER_SP_RESERVED] = 1;
         return 0;
       }
@@ -187,8 +188,10 @@ void rsp_write_cp0_reg(struct rsp *rsp, unsigned dest, uint32_t rt) {
       break;
 
     case RSP_CP0_REGISTER_SP_RESERVED:
-      if (rt == 0)
+      if (rt == 0) {
+        pthread_mutex_unlock(&rsp->rsp_semaphore);
         rsp->regs[RSP_CP0_REGISTER_SP_RESERVED] = 0;
+      }
 
       break;
 
diff --git a/rsp/cpu.c b/rsp/cpu.c
index 736768f..64edebf 100644
--- a/rsp/cpu.c
+++ b/rsp/cpu.c
@@ -27,6 +27,7 @@ static void rsp_connect_bus(struct rsp *rsp, struct bus_controller *bus) {
 
 // Releases memory acquired for the RSP component.
 void rsp_destroy(struct rsp *rsp) {
+  pthread_mutex_destroy(&rsp->rsp_semaphore);
   arch_rsp_destroy(rsp);
 }
 
@@ -37,6 +38,7 @@ int rsp_init(struct rsp *rsp, struct bus_controller *bus) {
   rsp_cp0_init(rsp);
   rsp_pipeline_init(&rsp->pipeline);
 
+  pthread_mutex_init(&rsp->rsp_semaphore, NULL);
   return arch_rsp_init(rsp);
 }
 
diff --git a/rsp/cpu.h b/rsp/cpu.h
index b004be3..f04a9bd 100644
--- a/rsp/cpu.h
+++ b/rsp/cpu.h
@@ -14,6 +14,7 @@
 #include "os/dynarec.h"
 #include "rsp/cp2.h"
 #include "rsp/pipeline.h"
+#include <pthread.h>
 
 enum rsp_register {
   RSP_REGISTER_R0, RSP_REGISTER_AT, RSP_REGISTER_V0,
@@ -64,6 +65,7 @@ struct rsp {
   // TODO: Only for IA32/x86_64 SSE2; sloppy?
   struct dynarec_slab vload_dynarec;
   struct dynarec_slab vstore_dynarec;
+  pthread_mutex_t rsp_semaphore;
 };
 
 cen64_cold int rsp_init(struct rsp *rsp, struct bus_controller *bus);
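
For comparison with the hand-rolled wait/signal logic in the patch above, here is a sketch of a more conventional rendezvous built on the same POSIX primitives. The `rendezvous` struct and its functions are hypothetical names, not part of CEN64. As far as I can tell, the patch calls `pthread_cond_wait` without re-checking a condition afterwards, so a spurious wakeup, or one thread finishing its next batch before the other has woken up, could let the two sides drift apart by more than one batch; waiting in a loop on a generation counter is the usual way to rule that out.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical reusable meeting point for a fixed number of threads. */
struct rendezvous {
  pthread_mutex_t lock;
  pthread_cond_t cv;
  unsigned parties;     /* threads that must arrive before anyone leaves */
  unsigned waiting;     /* threads that have arrived in this round */
  unsigned generation;  /* incremented once per completed round */
};

static void rendezvous_init(struct rendezvous *r, unsigned parties) {
  assert(parties > 0);

  pthread_mutex_init(&r->lock, NULL);
  pthread_cond_init(&r->cv, NULL);
  r->parties = parties;
  r->waiting = 0;
  r->generation = 0;
}

/* Blocks until all parties have called this function for the current
 * round. The last thread to arrive starts the next round and wakes the
 * others; everyone else waits until the generation number changes. */
static void rendezvous_wait(struct rendezvous *r) {
  unsigned arrival_generation;

  pthread_mutex_lock(&r->lock);
  arrival_generation = r->generation;

  if (++r->waiting == r->parties) {
    r->waiting = 0;
    r->generation++;
    pthread_cond_broadcast(&r->cv);
  } else {
    /* The loop makes spurious wakeups harmless. */
    while (r->generation == arrival_generation)
      pthread_cond_wait(&r->cv, &r->lock);
  }

  pthread_mutex_unlock(&r->lock);
}
```

With `parties` set to 2, the RCP thread would call `rendezvous_wait` after each batch of 2,000 cycles and the VR4300 thread after each batch of 3,000 pcycles. This only bounds how far the threads drift; it does nothing about the unsynchronized accesses to shared device state between meeting points.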

OldGnashburg
Posts: 91
Joined: Tue Nov 19, 2013 3:00 pm
Location: Sherwood Park, Alberta, Canada: A place with free universal healthcare, and lots and lots of oil.

Re: Thinking about multi-threading...

Post by OldGnashburg » Sun Feb 08, 2015 2:11 pm

I have a question about accuracy. Assuming CEN64 works well multi-threaded, will it still be cycle-accurate (in the same way bsnes/Higan is cycle-accurate)? And assuming you write your own pixel- AND cycle-accurate RDP (which IMHO has to happen; I would actually beg for it), will it still be 99.98% (or so) accurate to the console? I am not being critical of your work at all (your work is amazing); however, if one is going to make a cycle-accurate emulator, cycle accuracy and accuracy in general must take priority over everything else.
Gnash, Gnash, Gnash...

Snowstorm64
Posts: 303
Joined: Sun Oct 20, 2013 8:22 pm

Re: Thinking about multi-threading...

Post by Snowstorm64 » Sun Feb 08, 2015 2:45 pm

MarathonMan wrote: Just sacrifices accuracy under the hood. :D
What kind of accuracy do you mean? Cycle accuracy?

EDIT: I have tried the patch, and the games that I have tested (MK64, SM64, Majora's Mask, etc.) yield some performance improvements, and that is great, but CEN64 may hang occasionally.
OS: Debian GNU/Linux Jessie (8.0)
CPU: Intel i7 4770K @ 3.5 GHz
Build: AVX (compiled from git)

MarathonMan
Site Admin
Posts: 692
Joined: Fri Oct 04, 2013 4:49 pm

Re: Thinking about multi-threading...

Post by MarathonMan » Sun Feb 08, 2015 3:52 pm

OldGnashburg wrote: I have a question about accuracy. Assuming CEN64 works well multi-threaded, will it still be cycle-accurate (in the same way bsnes/Higan is cycle-accurate)? And assuming you write your own pixel- AND cycle-accurate RDP (which IMHO has to happen; I would actually beg for it), will it still be 99.98% (or so) accurate to the console? I am not being critical of your work at all (your work is amazing); however, if one is going to make a cycle-accurate emulator, cycle accuracy and accuracy in general must take priority over everything else.
No, that'll still be a ways off. The RDRAM controller, bus arbitration logic, and a slew of other things aren't even remotely cycle-accurate ATM... they just use approximations associated with each access. Dolphin is looking at doing something similar to create the illusion of cycle-accuracy, and CEN64 currently does the same (until the pipelines for everything are implemented as in hardware, at which point the guesstimates can be replaced with the result of the model).
Snowstorm64 wrote:
MarathonMan wrote: Just sacrifices accuracy under the hood. :D
What kind of accuracy do you mean? Cycle accuracy?
Correct.
Snowstorm64 wrote:and that is great, but CEN64 may hang occasionally.
That's expected, as the patch is essentially a giant hack that relies on the coherency of the host CPU to keep this in a quasi-synchronized state. Think of the patch as a case-study to determine the feasibility of multi-threading CEN64. Especially with my limited amount of developer time, I like to quantify completely radical ideas before I go ahead and spend hours upon hours implementing something that might yield peanuts in return. :)

Nintendo Maniac 64
Posts: 185
Joined: Fri Oct 04, 2013 11:37 pm

Re: Thinking about multi-threading...

Post by Nintendo Maniac 64 » Sun Feb 08, 2015 4:06 pm

Merrep wrote:it seems extremely unlikely that anything inferior to an i5 is ever going to be able to run the games at full speed.
My Pentium G3258 takes offense to that. :P
CEN64 Forum's resident straight-male kuutsundere
(just "tsundere" makes people think of "Shana clones" *shivers*)

CPU+iGPU: Pentium G3258 @ 4.6GHz/1.281v
dGPU: Radeon HD5870 1GB
RAM: Vengeance 1600 4x4GB
OS: Windows 7

Breadwinka
Posts: 54
Joined: Fri Oct 04, 2013 11:35 pm

Re: Thinking about multi-threading...

Post by Breadwinka » Sun Feb 08, 2015 6:51 pm

Nintendo Maniac 64 wrote:
Merrep wrote:it seems extremely unlikely that anything inferior to an i5 is ever going to be able to run the games at full speed.
My Pentium G3258 takes offense to that. :P

Well, to be fair, the G3258 is a great CPU; you just have to OC it, and you get i5-like performance. Haswell at 4.5 GHz on the stock cooler for less than $100 is crazy.

OldGnashburg
Posts: 91
Joined: Tue Nov 19, 2013 3:00 pm
Location: Sherwood Park, Alberta, Canada: A place with free universal healthcare, and lots and lots of oil.

Re: Thinking about multi-threading...

Post by OldGnashburg » Sun Feb 08, 2015 7:45 pm

Speaking of overclocking CPUs, does anybody know how I can overclock my quad-core Intel i7-2670QM (2.2 to 3.1 GHz)? Is there a program I can use? I looked up overclocking my processor (i7-2670QM) but can't find anything. Can anybody give me some pointers? My laptop is an ASUS K53SD DS-71.

EDIT:
@ MarathonMan, do you have any time frame in which you will be finished multithreading, and have everything in place for starting on Cycle Accuracy? Any basic description on how this will work?
Gnash, Gnash, Gnash...

Breadwinka
Posts: 54
Joined: Fri Oct 04, 2013 11:35 pm

Re: Thinking about multi-threading...

Post by Breadwinka » Sun Feb 08, 2015 8:45 pm

OldGnashburg wrote: Speaking of overclocking CPUs, does anybody know how I can overclock my quad-core Intel i7-2670QM (2.2 to 3.1 GHz)? Is there a program I can use? I looked up overclocking my processor (i7-2670QM) but can't find anything. Can anybody give me some pointers? My laptop is an ASUS K53SD DS-71.

EDIT:
@ MarathonMan, do you have any time frame in which you will be finished multithreading, and have everything in place for starting on Cycle Accuracy? Any basic description on how this will work?
You would have to go into the BIOS, but most of the time laptops cannot be overclocked.

Nintendo Maniac 64
Posts: 185
Joined: Fri Oct 04, 2013 11:37 pm

Re: Thinking about multi-threading...

Post by Nintendo Maniac 64 » Sun Feb 08, 2015 9:04 pm

I think he's looking to keep his CPU in its highest turbo pstate rather than overclocking it.
Breadwinka wrote: Well, to be fair, the G3258 is a great CPU; you just have to OC it, and you get i5-like performance. Haswell at 4.5 GHz on the stock cooler for less than $100 is crazy.
Exactly! My use of emulation was one of the main reasons I got it over an AMD processor - more cores wouldn't have done me any good.
CEN64 Forum's resident straight-male kuutsundere
(just "tsundere" makes people think of "Shana clones" *shivers*)

CPU+iGPU: Pentium G3258 @ 4.6GHz/1.281v
dGPU: Radeon HD5870 1GB
RAM: Vengeance 1600 4x4GB
OS: Windows 7

OldGnashburg
Posts: 91
Joined: Tue Nov 19, 2013 3:00 pm
Location: Sherwood Park, Alberta, Canada: A place with free universal healthcare, and lots and lots of oil.

Re: Thinking about multi-threading...

Post by OldGnashburg » Sun Feb 08, 2015 10:55 pm

Nope, I actually want to raise the base clock and turbo clock speeds (and whatever else there is) to the highest level (and as long as my computer can handle it) while still being safe. Is there a program I can use to do this (get past the bios and overclock)? If so, how would I do this? I'm hoping for a safe base clock of 2.8 to 3.0 GHz and a Turbo Clock of around 3.6 to 3.8 GHz. Can anybody with experience help me out? Of course I might have to get a cooling pad for my laptop.
Gnash, Gnash, Gnash...

Nintendo Maniac 64
Posts: 185
Joined: Fri Oct 04, 2013 11:37 pm

Re: Thinking about multi-threading...

Post by Nintendo Maniac 64 » Sun Feb 08, 2015 10:59 pm

I'm like 99.99% sure the answer is no.

However, you'd be better off to ask on an actual PC enthusiast forum like Overclock.net or similar.
CEN64 Forum's resident straight-male kuutsundere
(just "tsundere" makes people think of "Shana clones" *shivers*)

CPU+iGPU: Pentium G3258 @ 4.6GHz/1.281v
dGPU: Radeon HD5870 1GB
RAM: Vengeance 1600 4x4GB
OS: Windows 7

Narann
Posts: 154
Joined: Mon Jun 16, 2014 4:25 pm

Re: Thinking about multi-threading...

Post by Narann » Fri Jan 15, 2016 5:15 pm

I thought about the deadlocks we were talking about yesterday.

I realize it's maybe because the angrylion plugin doesn't emulate interrupts properly.

I will try to explain more clearly here, from my humble knowledge:

Disclaimer: I'm not a threading expert.

On real hardware, the CPU and RCP run independently and only rely on a few interrupts.

This means the program running on the CPU is already supposed to handle deadlocks and race conditions, or (I guess) it would crash on real HW too. That's the purpose of all the Sync<suff> macros provided by the GBI: to stop the CPU until a particular interrupt is free (like a hardware mutex, I guess).

My proposal is to not synchronize threads after a fixed number of cycles (like the patch I've seen, which checks at an arbitrary moment) but to detect "this is the interrupt I was waiting for!".

There are many ways to do this. The RCP could inform the CPU thread when a particular interrupt is triggered. The CPU thread then checks whether the interrupt the CPU is waiting on has changed. If so, the CPU thread continues. If not, it keeps waiting for the next RCP interrupt change.
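
One way the "RCP informs the CPU thread" variant could look with POSIX threads, as a sketch only. The `irq_line` struct and the three functions are hypothetical names invented for this example; they do not exist in CEN64 or in any RDP plugin, and the bit values used with them would be whatever the emulator assigns to each interrupt source.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical shared interrupt state between the RCP and CPU threads. */
struct irq_line {
  pthread_mutex_t lock;
  pthread_cond_t changed;
  uint32_t pending;  /* bitmask of interrupts raised but not yet consumed */
};

static void irq_line_init(struct irq_line *line) {
  pthread_mutex_init(&line->lock, NULL);
  pthread_cond_init(&line->changed, NULL);
  line->pending = 0;
}

/* Called from the RCP thread when it raises one or more interrupts. */
static void irq_raise(struct irq_line *line, uint32_t bits) {
  pthread_mutex_lock(&line->lock);
  line->pending |= bits;
  pthread_cond_broadcast(&line->changed);
  pthread_mutex_unlock(&line->lock);
}

/* Called from the CPU thread: sleeps until at least one interrupt in
 * `mask` is pending, then consumes and returns the matching bits.
 * Interrupts outside the mask stay pending for a later wait. */
static uint32_t irq_wait(struct irq_line *line, uint32_t mask) {
  uint32_t hit;

  pthread_mutex_lock(&line->lock);

  while ((line->pending & mask) == 0)
    pthread_cond_wait(&line->changed, &line->lock);

  hit = line->pending & mask;
  line->pending &= ~hit;
  pthread_mutex_unlock(&line->lock);
  return hit;
}
```

The CPU thread sleeps instead of polling every N cycles, and wakes only when the interrupt it cares about arrives. Whether an emulated CPU can afford to block like this, rather than keep executing instructions while it waits, is a separate design question that this sketch does not answer.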

Another way (the blind way) is to assume the CPU knows what it is doing: if the CPU asks the RDP to load a particular texture into its TMEM (LoadBlock/LoadTile), then run this as a thread. If the program is properly written, it will eventually run through a SyncTile, which could be translated as "now, wait for the thread".

If we could identify those interrupts and access them in an "atomic" way (I'm not a threading expert, but I mean: "access them in a way that doesn't cause a race condition"), this would mean we could rely on the "race management" already present in the program, because the N64 had the same problem.

Most RDP plugins emulate the interrupt at the FullSync RDP command (example here) and call the zilmar spec CheckInterrupt(), while all the SyncPipe/Full/Tile handlers are empty. This is not enough, but because it hasn't been a problem before, no RDP plugin actually emulates interrupts properly.

And this is why you have to do the weird "run N cycles then check" loop.

In a perfect world, it's not up to CEN64 to fix its threading problem. CEN64 should focus on interrupts, and RDP plugins should focus on emulating interrupts properly.

Hope it's clear enough. :)

MarathonMan
Site Admin
Posts: 692
Joined: Fri Oct 04, 2013 4:49 pm

Re: Thinking about multi-threading...

Post by MarathonMan » Fri Jan 15, 2016 10:42 pm

Yes, this is along the lines of what the problem is... I became quite certain of that when I was thinking about it yesterday...

Though not necessarily limited to the RCP, there are a lot of mask registers lying around where the RCP thread and VR4300 thread are probably stepping on each other. The easiest way to remedy the problem is to use atomic primitives to control those flags.
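As a rough illustration of "atomic primitives to control those flags", here is a sketch using C11 `<stdatomic.h>`. The helper names (`flags_set`, `flags_clear`, `flags_test`) and the bit assignments are assumptions made up for the example; this is not how CEN64 actually implements it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Illustrative bit assignments only (assumption). */
#define FLAG_SP (1u << 0)
#define FLAG_DP (1u << 5)

/* Set bits in a shared flag register as one atomic read-modify-write,
 * so a concurrent update from the other thread cannot be lost. */
static void flags_set(_Atomic uint32_t *reg, uint32_t bits) {
  atomic_fetch_or(reg, bits);
}

/* Clear bits atomically. */
static void flags_clear(_Atomic uint32_t *reg, uint32_t bits) {
  atomic_fetch_and(reg, ~bits);
}

/* Read the register once and return the bits selected by `mask`. */
static uint32_t flags_test(_Atomic uint32_t *reg, uint32_t mask) {
  assert(mask != 0);
  return atomic_load(reg) & mask;
}
```

With a plain `reg |= bits`, the load and the store are separate steps, so two threads updating different bits of the same register can overwrite each other. The atomic versions close that window.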

I also realized that my synchronization of the two cores is probably done in a less-than-ideal way right now and I have an idea how to fix that as well.

MarathonMan
Site Admin
Posts: 692
Joined: Fri Oct 04, 2013 4:49 pm

Re: Thinking about multi-threading...

Post by MarathonMan » Sat Jan 16, 2016 5:19 pm

The synchronization problem has been fixed. The performance actually seems (ever so) slightly higher than before the fix.

I have been walking around Kokiri Forest for some time now without issues.

The sound also seems to be more reliable, but still sounds horrendous. There are definitely bugs lurking around somewhere.

Unfortunately, even though both the RCP and VR4300 thread show ~60% utilization of the CPU cores each, it seems to hover at about 40-45 VI/s on my machine. The OpenGL thread is chewing up around 20% of the core that is shared with the VR4300, so I wonder if that is causing some of the issues.
Attachments
Multithreading.png (180.82 KiB)

MarathonMan
Site Admin
Posts: 692
Joined: Fri Oct 04, 2013 4:49 pm

Re: Thinking about multi-threading...

Post by MarathonMan » Sat Jan 16, 2016 8:39 pm

Now that the atomicity issue is fixed, I found that I could also just do away with the core syncing altogether.

Most ROMs have a race condition that this can trigger early on, but after the ROMs have started, things usually hit 50-60 VI/s.

Of course, without syncing the cores, you sacrifice a good deal of accuracy.
Attachments
sm64.png (147.54 KiB)

max_power
Posts: 6
Joined: Sat Oct 05, 2013 6:01 am

Re: Thinking about multi-threading...

Post by max_power » Thu Jan 21, 2016 10:03 am

How often do you have to sync the cores anyway? That is, how fast can some sync event, or any change of shared state, propagate from, say, the VR4300 to the RCP?

Suppose that takes 10 VR cycles. Couldn't you then record every such event in the VR thread for 10 cycles and, if any happened in that batch, tell the RCP thread to look at the record and apply these sync events on the cycle they're supposed to appear there? That should cut down syncing the threads from every simulated cycle to every 10 (or whatever it actually is).
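To make the batching idea concrete, here is a minimal sketch in one direction only (VR4300 to RCP). Everything in it is hypothetical: the 10-cycle window, the `sync_event` layout and the function names are assumptions for illustration, and events are assumed to be recorded in cycle order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WINDOW_CYCLES 10u /* assumed propagation delay, in cycles */
#define MAX_EVENTS 16u

/* One change of shared state, stamped with the cycle it occurred on. */
struct sync_event {
  uint32_t cycle; /* offset within the window, 0..WINDOW_CYCLES-1 */
  uint32_t reg;   /* index of the (hypothetical) shared register */
  uint32_t value;
};

struct sync_batch {
  struct sync_event events[MAX_EVENTS];
  size_t count;
};

/* VR4300 thread: log an event. Returns 0 on success, -1 if the log is full. */
static int batch_record(struct sync_batch *b, uint32_t cycle,
                        uint32_t reg, uint32_t value) {
  assert(cycle < WINDOW_CYCLES);
  if (b->count == MAX_EVENTS)
    return -1;
  b->events[b->count++] = (struct sync_event){cycle, reg, value};
  return 0;
}

/* RCP thread: step through the window and apply each logged event on the
 * cycle it was stamped with. Returns the number of events consumed. */
static size_t batch_replay(struct sync_batch *b, uint32_t *regs,
                           size_t num_regs) {
  size_t next = 0;

  for (uint32_t cycle = 0; cycle < WINDOW_CYCLES; cycle++) {
    while (next < b->count && b->events[next].cycle == cycle) {
      const struct sync_event *e = &b->events[next++];
      if (e->reg < num_regs)
        regs[e->reg] = e->value;
    }
    /* ...one simulated RCP cycle would run here... */
  }
  b->count = 0; /* the log is empty again for the next window */
  return next;
}
```

The two threads would then only need to hand over the batch once per window instead of synchronizing on every simulated cycle.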

On the other hand, that's so obvious that somebody would've already thought of that ;)
So there's probably a good reason why it doesn't work, or is irrelevant. Could someone enlighten me?

MarathonMan
Site Admin
Posts: 692
Joined: Fri Oct 04, 2013 4:49 pm

Re: Thinking about multi-threading...

Post by MarathonMan » Fri Jan 22, 2016 1:52 am

Transactional simulation (or at least that's what I call it) is something I looked at. The problems are mainly:
- Interrupts are bidirectional. The RCP can be interrupted, or the RCP can interrupt the VR4300. Who wins? Bus arbitration will tell you that. But that means you have to now simulate the bus arbitration. But bus arbitration is dependent upon who 'rings in' first in most cases. Who rings in first? Well, we can record when each core puts out a signal and compare all that at the sync point, and then check over it, and then possibly roll back, and ... you get the picture. The amount of work you put in, and the amount of ...
- Memory required! OK, so let's say you figured out that the VR4300 should have been interrupted 3 cycles ago. Well, maybe you're having a bad day and it updated a TLB entry, flushed a cache line, and overwrote some registers in the past few cycles. You need to checkpoint the entire state of the processor in order to roll back to any arbitrary cycle in the past. Of course, you need not make a complete copy of the state, but then you have the overhead of tagging each state update and walking back the tags on a transactional-esque abort.

I thought about these things for many an hour. :)

Also: the sync overhead is barely manageable at ~1,000 cycles. Now, keep in mind that you have to do this 62,500,000/1,000 = 62,500 times a second, or > 1,000 times each VI (60 VI/s, so 16.666... ms per VI). Dropping that to 10 cycles, which increases the overhead by a factor of 100, would be torturous. If you loosen up the sync window to a million cycles or so (which seems to bork many games randomly), the VI/s are so much higher and there is so much less jitter overall.
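The arithmetic above can be checked with a few lines of C. The constants are the figures quoted in the post (62,500,000 cycles per second, 60 VI/s); the function names are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

#define CYCLES_PER_SEC 62500000u /* figure quoted in the post */
#define VI_PER_SEC 60u

/* Number of sync points per second for a given sync window (in cycles). */
static uint32_t syncs_per_second(uint32_t window_cycles) {
  assert(window_cycles > 0);
  return CYCLES_PER_SEC / window_cycles;
}

/* Number of sync points per VI, rounded down. */
static uint32_t syncs_per_vi(uint32_t window_cycles) {
  return syncs_per_second(window_cycles) / VI_PER_SEC;
}
```

A 1,000-cycle window gives 62,500 syncs per second, or about 1,041 per VI. A 10-cycle window gives 6,250,000 per second, or about 104,166 per VI, which is the factor of 100 mentioned above. A 1,000,000-cycle window needs only 62 per second, roughly one per VI.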

izy
Posts: 25
Joined: Tue Jun 02, 2015 11:34 am

Re: Thinking about multi-threading...

Post by izy » Mon Feb 15, 2016 1:16 pm

MarathonMan wrote: the sync overhead
How is that done?

Edit: I can't find that code in the patch http://forums.cen64.com/viewtopic.php?f=5&p=2406#p1867

izy
Posts: 25
Joined: Tue Jun 02, 2015 11:34 am

Re: Thinking about multi-threading...

Post by izy » Tue Feb 16, 2016 12:58 pm

I see that the multithreading code has been updated now.
I don't know if that code was written that way for any particular reason, such as to leave room for new features in the future.
At the moment it seems to me that the code is only used to create a thread barrier?
Using a mutex (a locking mechanism to get exclusive access to variables) and a condition variable (to wait for updates to variables) could be the easiest way to implement a barrier that works everywhere, even though it likely isn't the best way.
Pthreads supports thread barriers natively, and they can be implemented better in the library. It could be faster to implement a barrier using pthreads... just by calling pthread_barrier_wait (too easy to use, I won't give an example of this). Another possible way to create a barrier is by using native kernel-backed objects: a futex or a simple eventfd can be used to implement a barrier.
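For reference, a barrier built from a mutex and a condition variable might look something like the sketch below. It is written against the POSIX mutex/CV calls purely for illustration; `struct barrier`, `barrier_init` and `barrier_wait` are hypothetical names, and this is not the code in CEN64.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct barrier {
  pthread_mutex_t lock;
  pthread_cond_t all_arrived;
  unsigned parties;    /* number of threads that must arrive */
  unsigned waiting;    /* threads that have arrived in this round */
  unsigned generation; /* bumped each time the barrier opens */
};

static void barrier_init(struct barrier *b, unsigned parties) {
  assert(parties > 0);
  pthread_mutex_init(&b->lock, NULL);
  pthread_cond_init(&b->all_arrived, NULL);
  b->parties = parties;
  b->waiting = 0;
  b->generation = 0;
}

/* Block until `parties` threads have called barrier_wait.
 * Returns 1 in the thread that arrived last, 0 in all others. */
static int barrier_wait(struct barrier *b) {
  int last = 0;

  pthread_mutex_lock(&b->lock);
  unsigned gen = b->generation;
  if (++b->waiting == b->parties) {
    /* Last arrival: reset for the next round and release everyone. */
    b->waiting = 0;
    b->generation++;
    pthread_cond_broadcast(&b->all_arrived);
    last = 1;
  } else {
    /* The generation check guards against spurious wake-ups and lets
     * the barrier be reused immediately for the next round. */
    while (gen == b->generation)
      pthread_cond_wait(&b->all_arrived, &b->lock);
  }
  pthread_mutex_unlock(&b->lock);
  return last;
}
```

In an emulator with two core threads, each thread would call `barrier_wait` at the end of its sync window on a barrier initialized with `parties` set to 2.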

MarathonMan
Site Admin
Posts: 692
Joined: Fri Oct 04, 2013 4:49 pm

Re: Thinking about multi-threading...

Post by MarathonMan » Wed Feb 17, 2016 3:51 am

The problem with pthread_barrier_wait is that I'm placing an implicit dependency on pthreads. :(

I actually tried playing with barriers in place of a mutex/CV and the difference was too small to notice, so I chose the portable/simple route.
