The new version will feature dynamically recompiled code and multi-threading, as I have thought of a way to make use of both. Ironically, the multi-threading may not provide a performance boost by itself, due to the overhead of synchronization across cores, but it is an inherent requirement of the way I am using dynamically recompiled code.
Another thing I am working on during the rewrite is becoming more Windows-friendly. The new version compiles with MSVC without any quirks or adjustments. I've also chosen to switch to CMake for the build system so that GNU make is no longer needed. So, those who are locked into the clutches of Microsoft can now rejoice.
So far, I have written the dynamic recompiler and tested the feasibility of using it in the way that I intend. The dynamic recompiler (hereafter ECG, for emulator code generator) is a reusable code generator for emulators that is designed to generate code for a wide variety of targets (though I've only implemented an x86_64 backend at the moment). Instead of writing x86_64 ASM, emulator developers can use the ECG API to generate native machine code for any backend that ECG supports, without anything in their code that locks them into a specific architecture. Thus, CEN64 will be ARM-ready from the get-go (and ready for anything else, should it crop up). I'll hopefully post it on GitHub in the coming days for developers who wish to play with it.
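To make the backend-neutral idea a bit more concrete, here is a rough sketch in C. It is not ECG's actual API (which hasn't been posted yet); every name in it (codebuf, backend, gen_return_constant, and so on) is made up for illustration. The only hard facts it relies on are two x86_64 encodings: mov eax, imm32 is B8 followed by the little-endian immediate, and ret is C3.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical code buffer; ECG's real types are not shown in the post. */
struct codebuf {
  uint8_t bytes[64];
  size_t len;
};

/* Hypothetical backend interface: one function pointer per abstract operation. */
struct backend {
  void (*emit_load_imm32)(struct codebuf *cb, uint32_t value);
  void (*emit_return)(struct codebuf *cb);
};

/* Append one byte, silently dropping it if the buffer is full. */
static void put8(struct codebuf *cb, uint8_t b) {
  if (cb->len < sizeof(cb->bytes))
    cb->bytes[cb->len++] = b;
}

/* x86_64: mov eax, imm32 -> B8 followed by the immediate, little-endian. */
static void x64_load_imm32(struct codebuf *cb, uint32_t value) {
  int i;

  put8(cb, 0xB8);
  for (i = 0; i < 4; i++)
    put8(cb, (uint8_t)(value >> (8 * i)));
}

/* x86_64: ret -> C3. */
static void x64_return(struct codebuf *cb) {
  put8(cb, 0xC3);
}

/* The only architecture-specific object the frontend ever sees. */
const struct backend x64_backend = { x64_load_imm32, x64_return };

/* Frontend code: emits "return a constant" without naming any architecture. */
void gen_return_constant(const struct backend *be, struct codebuf *cb,
                         uint32_t value) {
  cb->len = 0;
  be->emit_load_imm32(cb, value);
  be->emit_return(cb);
}
```

Calling gen_return_constant(&x64_backend, &cb, 0x12345678) fills cb with the six bytes B8 78 56 34 12 C3. Supporting another target in this sketch would mean supplying another struct backend; gen_return_constant itself would not change.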
CEN64 is currently dispatch-limited. Each cycle, it uses a table to look up which kind of instruction it should execute, much like a switch statement, and then calls that function. This, as it turns out, is wreaking absolute havoc on most CPU pipelines and is probably my biggest reason for a rewrite. Instead, ECG will be used to dynamically inline these instructions into the C pipeline code. Each instruction will result in native machine code being generated from the following template. The branches that were once unpredictable are essentially removed and replaced with two calls whose targets never change, which the BTBs (branch target buffers) and RAS (return address stack) can easily take care of.
Code:
r4300i_instruction:
    call r4300i_writeback_and_datacache;
    <inlined, recompiled MIPS code>
    call r4300i_decode_and_ifetch;
    if pipeline_stalled jmp r4300i_instruction
    #
    # Branches generate code to move to a different
    # cycle block using ECG here. Otherwise omitted.
    #
    if external_interrupts_pending ret
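For contrast, the table lookup that the template replaces looks roughly like the following in C. This is a minimal sketch, not CEN64's actual code: struct cpu, op_table, and cpu_step are illustrative names, and only two MIPS opcodes (ADDIU, 0x09, and ORI, 0x0D) are filled in.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative CPU state, not CEN64's real structure. */
struct cpu {
  uint32_t regs[32];
};

typedef void (*op_handler)(struct cpu *cpu, uint32_t iw);

/* ADDIU rt, rs, imm: add the sign-extended 16-bit immediate. */
static void op_addiu(struct cpu *cpu, uint32_t iw) {
  unsigned rs = (iw >> 21) & 0x1F;
  unsigned rt = (iw >> 16) & 0x1F;
  uint32_t imm = iw & 0xFFFF;

  if (imm & 0x8000)
    imm |= 0xFFFF0000u; /* sign-extend */

  if (rt != 0) /* register 0 is hardwired to zero */
    cpu->regs[rt] = cpu->regs[rs] + imm;
}

/* ORI rt, rs, imm: OR with the zero-extended 16-bit immediate. */
static void op_ori(struct cpu *cpu, uint32_t iw) {
  unsigned rs = (iw >> 21) & 0x1F;
  unsigned rt = (iw >> 16) & 0x1F;

  if (rt != 0)
    cpu->regs[rt] = cpu->regs[rs] | (iw & 0xFFFF);
}

/* Table indexed by the 6-bit primary opcode; unlisted entries stay NULL. */
static const op_handler op_table[64] = {
  [0x09] = op_addiu,
  [0x0D] = op_ori,
};

/* One dispatch step. The indirect call through op_table is the branch
 * whose target changes from instruction to instruction. */
void cpu_step(struct cpu *cpu, uint32_t iw) {
  op_handler handler = op_table[iw >> 26];

  if (handler != NULL)
    handler(cpu, iw);
}
```

Every call to cpu_step funnels through the same indirect call site, so the predictor sees one branch with a target that keeps changing. In the template above, the instruction body is inlined and the two remaining calls always go to the same functions.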