New core/version 0.3

News from administrators.
Snowstorm64
Posts: 303
Joined: Sun Oct 20, 2013 8:22 pm

Re: New core/version 0.3

Post by Snowstorm64 » Fri Jul 25, 2014 2:10 pm

Yes, in the Dextrose Xtralife demo, if I resize a little more, black borders appear around the VI window. However, that slight distortion is, well, unnoticeable there. It is noticeable, for example, if you run krom's CPUOR.N64 test (although there you cannot see black borders...): exactly where the word "ORI" is, you should see a duplicated line.
OS: Debian GNU/Linux Jessie (8.0)
CPU: Intel i7 4770K @ 3.5 GHz
Build: AVX (compiled from git)

MarathonMan
Site Admin
Posts: 692
Joined: Fri Oct 04, 2013 4:49 pm

Re: New core/version 0.3

Post by MarathonMan » Fri Jul 25, 2014 3:03 pm

I'll have to look at that ROM and compare it to the TV then...

Snowstorm64
Posts: 303
Joined: Sun Oct 20, 2013 8:22 pm

Re: New core/version 0.3

Post by Snowstorm64 » Fri Jul 25, 2014 3:30 pm

There is no need to check the TV; the image itself is perfectly fine. I meant that the window doesn't set the correct resolution (it's 640x480, whereas some demos/tests like krom's CPUOR test seem to run at a lower resolution, like 1 pixel less on one side, causing that slight distortion). I thought the window would resize itself to match the running ROM's native resolution, but instead, if I'm guessing correctly, it is always set to 640x480.

MarathonMan
Site Admin
Posts: 692
Joined: Fri Oct 04, 2013 4:49 pm

Re: New core/version 0.3

Post by MarathonMan » Fri Jul 25, 2014 4:36 pm

The window should always be 640x480 as that's what the TV outputs.

If the output is garbled, and the window is the correct resolution, then it is a problem with the VI scaling algorithm. Which is totally possible, since it's currently using OpenGL hardware acceleration to do it (i.e.: it's a hack!).

krom
Posts: 72
Joined: Sat Oct 05, 2013 2:19 am

Re: New core/version 0.3

Post by krom » Fri Jul 25, 2014 5:13 pm

Snowstorm64 wrote:Awesome! Will you also work on the RDP?
Well, aside from the many graphical RDP demos I want to do in my RDP directory, I could do CPUTest/RDP bit-perfect tests by saving the videoram from a real N64 to the test ROM & comparing each byte to the emulated videoram. This would provide similar bit-perfect tests for the RDP =D
I'll start on these after I have uploaded all the RDP/Triangle demos I am working on...
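
(Editor's sketch, not code from krom's repository: the byte-by-byte comparison described above could look roughly like this in C. The function name and the idea of holding both dumps in memory at once are assumptions made for illustration.)

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: compares a video RAM dump taken on real hardware
 * against the emulator's video RAM. Returns the offset of the first byte
 * that differs, or len if the two dumps are identical. */
size_t first_mismatch(const uint8_t *hw_dump, const uint8_t *emu_dump,
                      size_t len) {
  size_t i;

  for (i = 0; i < len; i++) {
    if (hw_dump[i] != emu_dump[i])
      return i;
  }

  return len;
}
```

A test ROM could report a pass when the return value equals len, and otherwise print the offset so the failing pixel can be located.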
MarathonMan wrote:I will use these CPU tests when porting to IA32 (DMULTU and DDIVU don't compile right now), as well as for ARM
Sweet, if you need any help on the ARM side, I have done lots of funky stuff on the Raspberry Pi ARM FPU & Vector Unit!
MarathonMan wrote:The RSP assembler may give you some troubles... let me know if it's not working out, and check the output of a program with only one or two instructions before scratching your head.
Cheers MarathonMan, I am gonna start with real small steps just to get a feel for what is going on in the RSP. I am really glad I know the author of the assembler if I need any help =D
Snowstorm64 wrote:I have noticed that CEN64, when launched, has a incorrect resolution, resulting in a slight screen distortion, but if I try to resize a bit, this problem disappears. I don't know if it's CEN64 itself or my desktop environment (GNOME) that may have some weird settings. Maybe making CEN64's windows unresizable could fix this issue...
I should have really mentioned this earlier, ever since you asked me to put screenshots on my github...
I noticed exactly the same problem, e.g cen64 displays a few extra scanlines for all my 640x480 demos, I edited the screenshots on my github to not show these flaws.
If you run the same demos in MESS N64 driver & take a screen shot pressing F12, you will notice all of my demos save a screen shot that is 640x474 (6 pixels off the correct height "480" I am going for) with no distortions.
If you run any of my 320x240 demos in MESS and take a screen shot, the picture resolution is 320x237 (3 pixels off the correct height)...
The reason for this is that cen64 uses a static 640x480 OpenGL screen resolution, which blits the N64 videoram using OpenGL textures Nearest-Neighbor pixel resizing, this mode just adds scanlines to make the picture scale up correctly.
I may have a bug in my NTSC 320x240 & 640x480 setup, so I will look into this, but I have used exact figures from the official Nintendo documentation to make these screen resolution modes...

If my code is correct, then the N64 HW is a little off on its Y resolutions, and we need to set a static screen size of 640x474 in the cen64 OpenGL display, which should show perfect pixels with Nearest-Neighbor OpenGL pixels.
Another way to fix this would be to use OpenGL texture Linear mode to scale the pixels, which should create smoother transitions from each scanline to the next.
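
(Editor's sketch: a small C model of why nearest-neighbour scaling from 474 rows to a 480-row window repeats scanlines. It uses a simple integer mapping and ignores the exact rounding rules of a real OpenGL implementation, so treat it as an illustration only. The function names are made up.)

```c
/* Source row sampled for destination row dst_y when src_h rows are
 * stretched to dst_h rows with nearest-neighbour sampling. */
unsigned nn_source_row(unsigned dst_y, unsigned src_h, unsigned dst_h) {
  return (dst_y * src_h) / dst_h;
}

/* Counts destination rows that show the same source row as the row
 * directly above them, i.e. the visibly duplicated scanlines. */
unsigned count_duplicated_rows(unsigned src_h, unsigned dst_h) {
  unsigned duplicates = 0;
  unsigned y;

  for (y = 1; y < dst_h; y++) {
    if (nn_source_row(y, src_h, dst_h) == nn_source_row(y - 1, src_h, dst_h))
      duplicates++;
  }

  return duplicates;
}
```

With these assumptions, stretching 474 rows to 480 repeats six rows, while a window whose height matches the source repeats none.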

I'll look into all of this and update you guys with my findings =D

PS: I can confirm all X resolutions are working correctly in cen64; only the Y scanlines are a problem right now...

beannaich
Posts: 149
Joined: Mon Oct 21, 2013 2:43 pm

Re: New core/version 0.3

Post by beannaich » Sat Jul 26, 2014 10:33 am

krom wrote:PS: I can confirm all X resolutions are working correctly in cen64; only the Y scanlines are a problem right now...
I wonder if it has anything to do with interlaced video? I work in the broadcast industry, and you'd be stunned at the amount of problems 525 scanline NTSC causes. Could it be that internally the N64 adjusts for the 2 fields in some way which causes little edge cases on the vertical axis?

This sounds similar to the "rounding error" in the RSP audio sampling register, where you can never get a true 32 KHz sampling rate.
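
(Editor's sketch: assuming the register in question is an integer clock divider, the rounding effect can be shown with a few lines of C. The 48681812 Hz NTSC clock figure is an assumption taken from community documentation, not from this thread, and the function names are hypothetical.)

```c
#include <stdint.h>

/* Assumed NTSC audio clock in Hz (not stated in the thread). */
#define NTSC_AUDIO_CLOCK_HZ 48681812u

/* Integer divider that gets closest to the wanted sample rate. */
uint32_t dac_divider(uint32_t clock_hz, uint32_t wanted_hz) {
  return (clock_hz + wanted_hz / 2) / wanted_hz;
}

/* Sample rate actually produced by a given divider, truncated to whole Hz. */
uint32_t actual_rate(uint32_t clock_hz, uint32_t divider) {
  return clock_hz / divider;
}
```

Under that assumption, the nearest divider for 32000 Hz is 1521, which gives roughly 32006 Hz, and the next divider up gives roughly 31985 Hz, so an exact 32 kHz is not reachable.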

Or, it could just be a bug in the code somewhere :D

MarathonMan
Site Admin
Posts: 692
Joined: Fri Oct 04, 2013 4:49 pm

Re: New core/version 0.3

Post by MarathonMan » Sat Jul 26, 2014 10:37 am

I'm not at all experienced with the VI/RDP yet... maybe krom will know more. :P

Found another bug that prevented bcopy() and memcpy() from working correctly.

EDIT: krom, would you be so kind as to write tests for LWL/LWR/SWL/SWR? There are quite a few corner cases that I'm still worried about...

MarathonMan
Site Admin
Posts: 692
Joined: Fri Oct 04, 2013 4:49 pm

Re: New core/version 0.3

Post by MarathonMan » Sat Jul 26, 2014 2:40 pm

Bump.

Barring any bugs in LWL/LWR, once LDL/LDR and maybe SDL/SDR are implemented, commercial ROMs like Namco 64 should start booting.

krom
Posts: 72
Joined: Sat Oct 05, 2013 2:19 am

Re: New core/version 0.3

Post by krom » Sun Jul 27, 2014 2:15 am

MarathonMan wrote:krom, would you be so kind as to write tests for LWL/LWR/SWL/SWR? There are quite a few corner cases that I'm still worried about...
Yeah sure I'll get to work on it right away! Please tell me if there are any specific edge cases that you want me to test, & I'll get those worked into the testrom =D
beannaich wrote:I wonder if it has anything to do with interlaced video? I work in the broadcast industry, and you'd be stunned at the amount of problems 525 scanline NTSC causes. Could it be that internally the N64 adjusts for the 2 fields in some way which causes little edge cases on the vertical axis?
Yes it could be that, also as the N64 uses antialiasing which has a set amount of pixels blurred between each scanline that could produce an extra scanline at the end etc...
In all my tests the official N64 documented figures for 320x240 & 640x480 screen setups for NTSC produce 320x237 & 640x474 screens respectively in commercial ROMs, e.g. a screenshot from Mario64 in MESS creates a 320x237 resolution picture file etc...
Of course MESS might have buggy code to work out the N64 screen scanline height, so I can only think of one way to get to the bottom of this:
1. Make an RDP hardware test where I place a different colour on every even scanline, colour coded so I can see each one is displaying correctly.
2. Use my hacked Sony Trinitron flat screen CRT TV, and check the colour of the last scanline displayed in the NTSC field using a real N64 running the test.
3. Save out the video RAM of the N64 in its entirety, and check exactly how it has placed each scanline in accordance with its RDP antialiased output.
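
(Editor's sketch of the idea behind steps 1 and 3, written as plain C on the CPU side rather than as RDP commands. It assumes a 16-bit framebuffer and encodes the line number directly in the pixel value. All names are hypothetical, and the effect of VI antialiasing on the displayed picture is not modelled.)

```c
#include <stdint.h>

/* Hypothetical encoding: line number in the upper 15 bits, low bit set
 * so that line 0 is still distinguishable from an untouched (zero) pixel. */
uint16_t scanline_colour(unsigned line) {
  return (uint16_t)((line << 1) | 1u);
}

/* Step 1: give every even scanline its own colour, leave odd lines black. */
void fill_test_pattern(uint16_t *fb, unsigned width, unsigned height) {
  unsigned x, y;

  for (y = 0; y < height; y++)
    for (x = 0; x < width; x++)
      fb[y * width + x] = (y % 2 == 0) ? scanline_colour(y) : 0;
}

/* Step 3: scan a dump and return the last even line that still carries
 * its expected colour, or -1 if none does. */
int last_coded_line(const uint16_t *fb, unsigned width, unsigned height) {
  int last = -1;
  unsigned y;

  for (y = 0; y < height; y += 2)
    if (fb[y * width] == scanline_colour(y))
      last = (int)y;

  return last;
}
```

Comparing the value returned for the real hardware dump with the one expected from the documented height would show how many scanlines actually survive.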

PS I don't want people to think I am an expert on the RDP as I still have so much to learn!! And the only way I can learn is to produce my RDP tests to find out what the heck is going on =D
(Hence me releasing all my source code, as I want other people to join in with testing the RDP, as there are much more clever people than me that can help out!)

The Extremist
Posts: 29
Joined: Sun Nov 03, 2013 6:11 pm
Location: Canadian Prairie

Re: New core/version 0.3

Post by The Extremist » Sun Jul 27, 2014 2:32 am

krom wrote:3. Save out the video RAM of the N64 in its entirety, and check exactly how it has placed each scanline in accordance with its RDP antialiased output.
People going to these lengths for accuracy is what I love to see! :)

krom
Posts: 72
Joined: Sat Oct 05, 2013 2:19 am

Re: New core/version 0.3

Post by krom » Sun Jul 27, 2014 3:13 am

The Extremist wrote:People going to these lengths for accuracy is what I love to see!
Heh it's nothing really, I love doing tests like this to work out stuff =D

I even helped in a similar way with the SNES emulator bsnes & made a Mode7 test to check which out of 2 known algorithms was the correct bit perfect representation of Mode7 on real hardware.
I used Mario Kart Mario Track 1 1024x1024 Mode7 picture data, & pumped in random numbers to all Mode7 registers, this created a crazy picture with certain patterns in the 2 known algorithms.
I then tested the same demo on real SNES hardware, & saw which of the algorithms was correct, this meant the author of bsnes (byuu) could delete the wrong algorithm from the bsnes source code =D

MarathonMan
Site Admin
Posts: 692
Joined: Fri Oct 04, 2013 4:49 pm

Re: New core/version 0.3

Post by MarathonMan » Sun Jul 27, 2014 7:27 am

Namco 64 booting and waiting for input. This confirms that a large majority of libultra is working as intended.

Now to put on the brakes for a little bit, do some cleanup, input handling, Windows lovin' and bug hunting...
Attachment: namco64.png

Snowstorm64
Posts: 303
Joined: Sun Oct 20, 2013 8:22 pm

Re: New core/version 0.3

Post by Snowstorm64 » Sun Jul 27, 2014 8:51 am

Great! It's like 2012 all over again! :P (Wasn't Namco Museum 64 the first commercial ROM to be booted with success on CEN64?)

You may already know this, but Mario Tennis and other games still require the SDL instruction. But I guess now isn't the time for it.

Code:

Unimplemented instruction: SDL [0xB0C10000] @ 0xFFFFFFFF803000EC
cen64-debug: /data/emulators/cen64/vr4300/functions.c:716: VR4300_INV: Assertion `0 && "Unimplemented instruction encountered."' failed.
EDIT: Wait, why does Namco Museum 64 tell me it has detected a corrupt controller pak? Isn't it supposed to not find the controller pak at this stage?

krom
Posts: 72
Joined: Sat Oct 05, 2013 2:19 am

Re: New core/version 0.3

Post by krom » Sun Jul 27, 2014 9:40 am

Well done MarathonMan, great to see such progress in a short space of time =D

I have completed the word Load/Store CPU tests you requested, you can find them here:
https://github.com/PeterLemon/N64/tree/ ... /LOADSTORE
cen64 passes all the tests =D

MarathonMan
Site Admin
Posts: 692
Joined: Fri Oct 04, 2013 4:49 pm

Re: New core/version 0.3

Post by MarathonMan » Sun Jul 27, 2014 12:18 pm

Snowstorm64 wrote:EDIT: Wait, why does Namco Museum 64 tell me it has detected a corrupt controller pak? Isn't it supposed to not find the controller pak at this stage?
It's just a bit that can be flipped in the PIF response. It's currently set to indicate a controller pak present.
krom wrote:I have completed the word Load/Store CPU tests you requested, you can find them here:
https://github.com/PeterLemon/N64/tree/ ... /LOADSTORE
cen64 passes all the tests =D
Excellent! :D

However, do these check unaligned accesses? That's the interesting part... :shock:
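
(Editor's sketch of what an unaligned word load built from LWL and LWR does on a big-endian MIPS, for readers unfamiliar with the corner cases. This is a 32-bit model only: sign extension into the 64-bit register and address exceptions are left out, and none of this is cen64 source.)

```c
#include <stdint.h>

/* Reads the aligned big-endian word that starts at addr (addr % 4 == 0). */
static uint32_t read_be32(const uint8_t *mem, uint32_t addr) {
  return ((uint32_t)mem[addr] << 24) | ((uint32_t)mem[addr + 1] << 16) |
         ((uint32_t)mem[addr + 2] << 8) | (uint32_t)mem[addr + 3];
}

/* LWL: merges the bytes from addr up to the end of its aligned word into
 * the most significant side of rt; the remaining low bytes of rt are kept. */
uint32_t lwl(const uint8_t *mem, uint32_t addr, uint32_t rt) {
  unsigned shift = (addr & 3u) * 8u;
  uint32_t word = read_be32(mem, addr & ~3u);
  uint32_t mask = 0xFFFFFFFFu << shift;

  return (rt & ~mask) | (word << shift);
}

/* LWR: merges the bytes from the start of the aligned word up to addr into
 * the least significant side of rt; the remaining high bytes are kept. */
uint32_t lwr(const uint8_t *mem, uint32_t addr, uint32_t rt) {
  unsigned shift = (3u - (addr & 3u)) * 8u;
  uint32_t word = read_be32(mem, addr & ~3u);
  uint32_t mask = 0xFFFFFFFFu >> shift;

  return (rt & ~mask) | (word >> shift);
}

/* The usual pairing: LWL at addr, then LWR at addr + 3. */
uint32_t load_unaligned(const uint8_t *mem, uint32_t addr) {
  uint32_t rt = lwl(mem, addr, 0);
  return lwr(mem, addr + 3u, rt);
}
```

A test ROM would want to exercise all four byte offsets, and also check that the untouched bytes of the destination register really are preserved.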

krom
Posts: 72
Joined: Sat Oct 05, 2013 2:19 am

Re: New core/version 0.3

Post by krom » Sun Jul 27, 2014 1:28 pm

MarathonMan wrote:However, do these check unaligned accesses? That's the interesting part...
You are right, I have not checked different unaligned accesses... I'll update the demos with different unaligned access tests, straight after I check out the new build of cen64 =D

MarathonMan
Site Admin
Posts: 692
Joined: Fri Oct 04, 2013 4:49 pm

Re: New core/version 0.3

Post by MarathonMan » Sun Jul 27, 2014 8:37 pm

Converted a good portion of the FPU to SSE/SSE2 (no more x87)!

The binary size actually remained fairly constant. Three reasons:
  • The MIPS FPU instructions have been incredibly challenging for me to understand and they've been implemented partially wrong all this time. MIPS handling of SNaN vs QNaN appears to differ ever-so-slightly from x86. The way they word things in the manual could certainly be improved, too...
  • I found some SSE/SSE2 intrinsics that I wasn't aware of before.
  • I merged some of the exception-handling code to reduce size.
And another slow clap for krom, whose tool saved me from a lot of aggravation during this whole process. :lol:
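
(Editor's sketch: one commonly cited SNaN/QNaN difference is the meaning of the top fraction bit. On x86 a set bit marks a quiet NaN, while older MIPS designs use the opposite convention. Whether this is the exact difference MarathonMan ran into is not stated in the post, and the assumption that the VR4300 follows the older MIPS convention should be checked against the manual. Function names are made up.)

```c
#include <stdbool.h>
#include <stdint.h>

/* True if the single-precision bit pattern is any kind of NaN:
 * exponent all ones and a non-zero fraction. */
bool f32_is_nan(uint32_t bits) {
  return (bits & 0x7F800000u) == 0x7F800000u && (bits & 0x007FFFFFu) != 0;
}

/* x86 convention: top fraction bit set means quiet. */
bool f32_is_quiet_x86(uint32_t bits) {
  return f32_is_nan(bits) && (bits & 0x00400000u) != 0;
}

/* Assumed legacy MIPS convention: top fraction bit clear means quiet. */
bool f32_is_quiet_legacy_mips(uint32_t bits) {
  return f32_is_nan(bits) && (bits & 0x00400000u) == 0;
}
```

Under these assumptions the same bit pattern can be quiet on the host and signaling on the guest, which is the kind of mismatch an emulator has to paper over.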

Narann
Posts: 154
Joined: Mon Jun 16, 2014 4:25 pm

Re: New core/version 0.3

Post by Narann » Sun Jul 27, 2014 9:26 pm

MarathonMan wrote:The MIPS FPU instructions have been incredibly challenging for me to understand and they've been implemented partially wrong all this time. MIPS handling of SNaN vs QNaN appears to differ ever-so-slightly from x86. The way they word things in the manual could certainly be improved, too...
Do you have a doc that lists information that is not in the manual? The point would be, in the end, to have a "MIPS 4300 companion" that gathers your notes. Just an idea to save people from digging the same tunnel as you in the future. :)
MarathonMan wrote:I found some SSE/SSE2 intrinsics that I wasn't aware of before.
And do those do the job better (more accurately) than x87?

MarathonMan
Site Admin
Posts: 692
Joined: Fri Oct 04, 2013 4:49 pm

Re: New core/version 0.3

Post by MarathonMan » Sun Jul 27, 2014 9:59 pm

I've just been using the manual for 99% of the time. I will eventually have to do some of my own tests, but I have yet to do that.

x87 has the ability to be just as accurate if you fumble around with the status word (which I wasn't doing). The accuracy gained will, unfortunately, be largely undetected as the only difference is some incredibly small rounding error.

If anyone has a Pentium 4, they'll probably be rubbing their hands together though. That, and maybe K8 users... ;)

MarathonMan
Site Admin
Posts: 692
Joined: Fri Oct 04, 2013 4:49 pm

Re: New core/version 0.3

Post by MarathonMan » Mon Jul 28, 2014 12:08 am

Bump: Just found a huge optimization. All ROMs that currently run are now hitting a consistent 60VI/s on my 3.2GHz (stock) Haswell. :D

I think it's safe to say that the new core wallops the old one in terms of efficiency.

krom
Posts: 72
Joined: Sat Oct 05, 2013 2:19 am

Re: New core/version 0.3

Post by krom » Mon Jul 28, 2014 12:40 am

MarathonMan wrote:I think it's safe to say that the new core wallops the old one in terms of efficiency.
Great news MarathonMan, can't wait to see how it performs on my setup!!
Your hard work is really paying off =D

Narann
Posts: 154
Joined: Mon Jun 16, 2014 4:25 pm

Re: New core/version 0.3

Post by Narann » Mon Jul 28, 2014 1:02 am

All I have to say is:

Image

Can I ask what is the optimization? I guess it's not the gcc one. Maybe here?

Anyway, that's very good to hear! :D

Snowstorm64
Posts: 303
Joined: Sun Oct 20, 2013 8:22 pm

Re: New core/version 0.3

Post by Snowstorm64 » Mon Jul 28, 2014 5:07 am

Hooray! Excellent job, MarathonMan! You are our hero! :D ROMs are running beautifully on my 3.5 GHz i7!

However, I discovered another bug that I may have missed (EDIT: the reason is that this demo wasn't working properly before; it was freezing after 1 second): in the Soncrap Intro by RedboX (PD) demo, the text blinks, but with the old core this doesn't happen. The debug output doesn't show anything useful.
Last edited by Snowstorm64 on Mon Jul 28, 2014 7:27 am, edited 1 time in total.

MarathonMan
Site Admin
Posts: 692
Joined: Fri Oct 04, 2013 4:49 pm

Re: New core/version 0.3

Post by MarathonMan » Mon Jul 28, 2014 7:16 am

@Narann: Yeah, it was the second one... the new core's approach of reducing indirect branch overhead went hand in hand with that one. All my machines are executing > 3 x86 instructions/cycle on average during emulation now. :shock:

EDIT: Confirmed timing issue.

EDIT 2: LOL, I goofed up somewhere big time. Easy fix though.

Snowstorm64
Posts: 303
Joined: Sun Oct 20, 2013 8:22 pm

Re: New core/version 0.3

Post by Snowstorm64 » Mon Jul 28, 2014 1:55 pm

Speaking of timing issue, is the new core flexible enough for games with unusual timing cycle e.g. DK64?

EDIT: I have noticed that you didn't get rid of assembly code completely; the compare instructions in particular still use assembly code. Is there a reason for it?

MarathonMan
Site Admin
Posts: 692
Joined: Fri Oct 04, 2013 4:49 pm

Re: New core/version 0.3

Post by MarathonMan » Mon Jul 28, 2014 4:43 pm

Hopefully... can't see without a RDP plugin. ;)

There are differences between MIPS and x86 FPUs. The difference between ucomiss and comiss (floating point comparison instructions) on x86, for example, is that the former only signals an invalid-operation exception for SNaNs, whereas the latter signals it for either SNaNs or QNaNs; both report any NaN operand as unordered. MIPS... well, at least MIPS III, doesn't care whether it's QNaN or SNaN when it comes to ordering; if it's NaN, it's unordered. Moreover, MIPS says that for several instructions, if the operands are unordered, the condition resulting from the comparison is true sometimes, and other times false. In x86-land, if something's unordered, then it's going to set all the equivalent bits to true.

tl;dr: The inline assembly is an unfortunate requirement for hardware-acceleration of MIPS instructions on x86 due to semantic differences in the ISAs.
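
(Editor's sketch of the "true sometimes, false other times" behaviour: in the MIPS compare instructions, the low three bits of the condition field select which of the relations less, equal and unordered make the condition true. The C below models only that selection. The fourth bit, which controls whether a QNaN raises an invalid-operation exception, is ignored, and this is not the cen64 implementation.)

```c
#include <math.h>
#include <stdbool.h>

/* Low three bits of the MIPS compare condition field. */
enum {
  COND_UNORDERED = 1,
  COND_EQUAL = 2,
  COND_LESS = 4
};

/* Returns the condition result for comparing fs with ft. Exactly one of
 * less, equal, unordered (or none, when fs > ft) holds for any pair. */
bool mips_c_cond(float fs, float ft, unsigned cond) {
  bool unordered = isnan(fs) || isnan(ft);
  bool equal = !unordered && fs == ft;
  bool less = !unordered && fs < ft;

  return ((cond & COND_LESS) && less) ||
         ((cond & COND_EQUAL) && equal) ||
         ((cond & COND_UNORDERED) && unordered);
}
```

For example, with a NaN operand the "equal" condition (2) is false while the "unordered or equal" condition (3) is true, so a single set of host flags cannot be reused blindly for every MIPS compare.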

krom
Posts: 72
Joined: Sat Oct 05, 2013 2:19 am

Re: New core/version 0.3

Post by krom » Tue Jul 29, 2014 12:23 am

Hi MarathonMan, I just tried out the latest commit & it works really fast in windows too, massive speedups =D

I needed to change 1 line of source to get it to compile in my Mingw64 setup:
In /os/windows/gl_window.c near the end, change the line:

Code:

void os_poll_events(struct gl_window *gl_window) {
To:

Code:

void os_poll_events(struct bus_controller *bus, struct gl_window *gl_window) {
Hope this helps GCC windows users out there who want to test right away =D

Also here is a tutorial for windows GCC users to compile cen64 if you are having trouble using cmake:
1. Copy the file in the root of the source directory: common.h.in to a file called common.h
2. Edit the file common.h near the end, change the line:

Code:

#cmakedefine DEBUG_MMIO_REGISTER_ACCESS
To:

Code:

//#define DEBUG_MMIO_REGISTER_ACCESS
(You can uncomment this define if you want the functionality)

3. Create a file called makefile in the root of the source directory, with this text inside:

Code:

cen64 : cen64.c
	gcc -o cen64 cen64.c device.c ai/controller.c bus/controller.c bus/memorymap.c common/debug.c os/windows/gl_window.c os/windows/main.c pi/controller.c rdp/cpu.c rdp/interface.c ri/controller.c rsp/cpu.c rsp/interface.c si/controller.c vi/controller.c vr4300/cp0.c vr4300/cp1.c vr4300/cpu.c vr4300/dcache.c vr4300/decoder.c vr4300/fault.c vr4300/functions.c vr4300/icache.c vr4300/interface.c vr4300/opcodes.c vr4300/pipeline.c vr4300/segment.c -O3 -s -DNDEBUG -I. -I"./os/unix/fpu/x86_64" -flto -flto-partition=none -fdata-sections -ffunction-sections -funsafe-loop-optimizations -finline-limit=512 -march=native -mwindows -lopengl32 -lws2_32
Type make to compile your fresh cen64 =D
Last edited by krom on Thu Jul 31, 2014 2:17 pm, edited 1 time in total.

Nacho
Posts: 66
Joined: Thu Nov 07, 2013 9:25 am

Re: New core/version 0.3

Post by Nacho » Tue Jul 29, 2014 6:25 am

Well... I feel kinda dumb asking this, but when I try Plasma Demo or Fire Demo by LaC, I get hundreds of
...
bus_read_word: Failed to access: 0x00A10D78
bus_read_word: Failed to access: 0x00A10D7C
bus_read_word: Failed to access: 0x00A10D80
bus_read_word: Failed to access: 0x00A10D84
bus_read_word: Failed to access: 0x00A10D88
bus_read_word: Failed to access: 0x00A10D8C
bus_read_word: Failed to access: 0x00A10D90
bus_read_word: Failed to access: 0x00A10D94
bus_read_word: Failed to access: 0x00A10D98
bus_read_word: Failed to access: 0x00A10D9C
...
What's wrong? Did I download the wrong .v64 file?

cen64 was compiled in Release mode.
Testing CEN64 on: Intel Core i5 520M 2.4 GHz. SSE2 SSE3 SSE4.1 SSE4.2 SSSE3, but no AVX. Ubuntu Linux

Snowstorm64
Posts: 303
Joined: Sun Oct 20, 2013 8:22 pm

Re: New core/version 0.3

Post by Snowstorm64 » Tue Jul 29, 2014 6:44 am

MarathonMan wrote:Hopefully... can't see without a RDP plugin. ;)
tl;dr: The inline assembly is an unfortunate requirement for hardware-acceleration of MIPS instructions on x86 due to semantic differences in the ISAs.
That is a shame. I guess there's nothing to do to get over the differences...

Nacho wrote:What's wrong? Did I download the wrong .v64 file?
Maybe you need to convert the demo to .z64 format, because CEN64 is big-endian like the N64 hardware, and all ROMs must be in big-endian format (.z64; .v64 is byte-swapped) in order to run on it.
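
(Editor's sketch of the conversion: a .v64 image is usually described as a .z64 image with every pair of bytes swapped, so swapping them back is enough. The header check relies on the common first word 0x80371240, which is an assumption that holds for most, but not necessarily all, ROMs. Function names are made up.)

```c
#include <stddef.h>
#include <stdint.h>

/* Returns non-zero if the image starts with the byte-swapped form of the
 * common 0x80371240 header word, i.e. it looks like a .v64 dump. */
int looks_byteswapped(const uint8_t *rom, size_t len) {
  return len >= 4 && rom[0] == 0x37 && rom[1] == 0x80 &&
         rom[2] == 0x40 && rom[3] == 0x12;
}

/* Swaps every pair of bytes in place, turning .v64 data into .z64 data
 * (and back again if applied twice). A trailing odd byte is left alone. */
void swap_byte_pairs(uint8_t *rom, size_t len) {
  size_t i;

  for (i = 0; i + 1 < len; i += 2) {
    uint8_t tmp = rom[i];
    rom[i] = rom[i + 1];
    rom[i + 1] = tmp;
  }
}
```

A loader could call looks_byteswapped() on the first four bytes and run swap_byte_pairs() over the whole image before handing it to the emulator.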

Nacho
Posts: 66
Joined: Thu Nov 07, 2013 9:25 am

Re: New core/version 0.3

Post by Nacho » Tue Jul 29, 2014 7:33 am

Snowstorm64 wrote: Maybe you need to convert the demo to .z64 format, because CEN64 is big-endian like the N64 hardware, and all ROMs must be in big-endian format (.z64; .v64 is byte-swapped) in order to run on it.
Tried the .z64 file of plasma demo. Works fine ;) Thanks

Sintendo
Posts: 25
Joined: Thu Oct 31, 2013 9:11 am

Re: New core/version 0.3

Post by Sintendo » Tue Jul 29, 2014 9:38 am

Snowstorm64 wrote:
MarathonMan wrote:Hopefully... can't see without a RDP plugin. ;)
tl;dr: The inline assembly is an unfortunate requirement for hardware-acceleration of MIPS instructions on x86 due to semantic differences in the ISAs.
That is a shame. I guess there's nothing to do to get over the differences...
It should be possible to write a software floating point backend that exactly matches MIPS floating points using integer arithmetic, but that would also be much slower.

MarathonMan
Site Admin
Posts: 692
Joined: Fri Oct 04, 2013 4:49 pm

Re: New core/version 0.3

Post by MarathonMan » Tue Jul 29, 2014 10:19 am

Sintendo wrote:It should be possible to write a software floating point backend that exactly matches MIPS floating points using integer arithmetic, but that would also be much slower.
Bingo. I've been trying to make OS/architecture/etc. components as modular as possible in this regard.

Narann
Posts: 154
Joined: Mon Jun 16, 2014 4:25 pm

Re: New core/version 0.3

Post by Narann » Tue Jul 29, 2014 5:57 pm

I just saw you chose to "return by value".
Is it just me, or can creating a variable at each call be costly, compared with allocating the value once (before all calls) and using a reference?

Return value optimization (RVO) relies only on the compiler. Isn't it more efficient to use explicit references?

I just ask.

Also, reading this I realize you use struct struct_name every time you need a struct somewhere. I humbly advise defining your structs using typedef struct and suffixing their names with _t. In our case: struct_name_t, or, in the commit's case: gl_window_t.

This is a well-known, simple and strict convention (typedef struct = _t) that makes the code easier to read (IMHO).

What do you think of this? Is there any advantage to write/read struct blabla instead of blabla_t?

As there is no way to review your code directly on the repo, I do it here. :)

beannaich
Posts: 149
Joined: Mon Oct 21, 2013 2:43 pm

Re: New core/version 0.3

Post by beannaich » Tue Jul 29, 2014 7:35 pm

Narann wrote:As there is no way to review your code directly on the repo, I do it here. :)
I would like to request that a stash server and some sort of CI server be set up for automated builds. I am willing to pay the $20 it would cost to set this up.

MarathonMan
Site Admin
Posts: 692
Joined: Fri Oct 04, 2013 4:49 pm

Re: New core/version 0.3

Post by MarathonMan » Wed Jul 30, 2014 12:05 am

Narann wrote:Return value optimization (RVO) relies only on the compiler. Isn't it more efficient to use explicit references?
The forced use of non-inlined assembly on the Windows side was the key decision here. Because cl.exe doesn't allow x64 assembly to be inlined, you have to call a function. Since you're making a call, it's cheaper to just return through the (free) ABI return register instead of having the caller pass an additional pointer parameter on the stack and having the callee write through it. Thoughts?
Narann wrote:This is a well-known, simple and strict convention (typedef struct = _t) that makes the code easier to read (IMHO).
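
(Editor's sketch of the two calling styles being compared. The function names are hypothetical and the bodies are trivial on purpose; the point is only the shape of the interface, not how cen64 implements its FPU helpers.)

```c
#include <stdint.h>

/* Style 1: the result comes back in the ABI's return register. */
uint32_t add_by_value(uint32_t a, uint32_t b) {
  return a + b;
}

/* Style 2: the caller passes a pointer and the callee writes through it,
 * which costs one extra argument and one store to memory. */
void add_by_pointer(uint32_t a, uint32_t b, uint32_t *result) {
  *result = a + b;
}
```

For a result that fits in a register, the first form needs no extra argument and no memory write, which matches the reasoning in the post.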

What do you think of this? Is there any advantage to write/read struct blabla instead of blabla_t?
I actually used to use that style, but have found myself aligning more with the kernel programming style guide lately, which specifically advises against this practice! In addition to what it mentions, the typedef scenario starts to get really hairy when you need to do forward references (IMO), because you need to redeclare the typedef of the forward reference as well (or use the original type).

Not my thing, I guess. ;)
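
(Editor's sketch of the forward-reference point: with plain struct tags, a header can mention a type by pointer after a one-line declaration, without a typedef having to be visible first. The struct names gl_window and bus_controller appear in this thread; the fields and the device_sketch type are invented for illustration.)

```c
/* Forward declarations: enough to declare pointers to these types. */
struct gl_window;
struct bus_controller;

/* Hypothetical aggregate that only needs pointers, so the full
 * definitions do not have to be included here. */
struct device_sketch {
  struct bus_controller *bus;
  struct gl_window *window;
};

/* Full definitions, which would normally live in their own headers.
 * The fields are made up. */
struct gl_window {
  int width;
  int height;
};

struct bus_controller {
  int unused;
};

int window_area(const struct gl_window *window) {
  return window->width * window->height;
}
```

With the typedef style, each header that wants to use gl_window_t before its definition has to repeat the typedef or fall back to the struct tag anyway.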
beannaich wrote:I would like to request that a stash server and some sort of CI server be set up for automated builds. I am willing to pay the $20 it would cost to set this up.
Automated builds are something I've been wanting to get going for a long time. Combined with regression tests... that'd be killer.

I (briefly) looked at Stash, but was really turned off by the user limit. 10 users is probably sufficient for now, but if it ever went over that I think I'd go broke ($1,800!). Moreover, I could put together automated Linux builds in a matter of minutes now that I'm using CMake and managing my own git server. Though I'm not so sure how MSVC builds would work. Anyone have any experience with that? I could use mingw64-gcc, but...

I'm also definitely not opposed to a bug-tracker, either. Seems like a lot of people are just using github for this anymore? I used Mantis before at my former employers, but that seems a little overkill for a project of this size.

beannaich
Posts: 149
Joined: Mon Oct 21, 2013 2:43 pm

Re: New core/version 0.3

Post by beannaich » Wed Jul 30, 2014 12:41 am

MarathonMan wrote:Automated builds are something I've been wanting to get going for a long time. Combined with regression tests... that'd be killer.

I (briefly) looked at Stash, but was really turned off by the user limit. 10 users is probably sufficient for now, but if it ever went over that I think I'd go broke ($1,800!). Moreover, I could put together automated Linux builds in a matter of minutes now that I'm using CMake and managing my own git server. Though I'm not so sure how MSVC builds would work. Anyone have any experience with that? I could use mingw64-gcc, but...

I'm also definitely not opposed to a bug-tracker, either. Seems like a lot of people are just using github for this anymore? I used Mantis before at my former employers, but that seems a little overkill for a project of this size.
For self-hosting git, Stash is very nice. We use it at work, because we have to. GitHub and Bitbucket work well too. Any one of those three can be easily integrated with a CI server through the hooks API. I'd host a Bamboo server for Windows MSVC builds if you'd like.

For bug tracking, github or bitbucket are the easiest (and free). But many tools exist for this as well.

bsmiles32
Posts: 8
Joined: Tue Jul 29, 2014 1:58 pm

Re: New core/version 0.3

Post by bsmiles32 » Wed Jul 30, 2014 2:11 am

Narann wrote:This is a well-known, simple and strict convention (typedef struct = _t) that makes the code easier to read (IMHO).
I used that style for some time, but I learned recently that it can conflict with POSIX names as "_t" is a reserved suffix:
http://pubs.opengroup.org/onlinepubs/00 ... 02_02.html

MarathonMan
Site Admin
Posts: 692
Joined: Fri Oct 04, 2013 4:49 pm

Re: New core/version 0.3

Post by MarathonMan » Wed Jul 30, 2014 8:21 am

beannaich wrote:I'd host a Bamboo server for Windows MSVC builds if you'd like.
That'd be awesome! I'd host it here, too, but I'm guessing that Bamboo needs to be run on top of a Windows server...? I know very little about web programming, tools, and the like...

krom
Posts: 72
Joined: Sat Oct 05, 2013 2:19 am

Re: New core/version 0.3

Post by krom » Wed Jul 30, 2014 8:56 am

I have released my 1st RDP/Triangle demos / tests on my github:
16-Bit: https://github.com/PeterLemon/N64/tree/ ... gle320x240
32-Bit: https://github.com/PeterLemon/N64/tree/ ... gle320x240
This is a culmination of 10 years work!! I wanted to make this type of demo at the start of 2004.
After about 1000 tests across the years, on my Doctor64 followed by using cen64 & my N64 flash cart =D
So thank you MarathonMan for making this possible, as without the old cen64 to test with, it would have taken me much longer =D

I have tried to make it simple enough for anyone to get an understanding of triangle setup on the N64,
I have included python scripts to calculate triangles depending on if you want Left or Right Major directions.
I have only done 2D filled triangle demos (No Z-Buffer) using the N64 fill mode, the simplest triangles, but they have the highest fill rate on the N64 hardware.
So if you want to statically play high polygon 3D video frames with pre-filled colors, or make a Starblade type shooter on rails these could be the triangles for you =D
They will also serve as great tests for MarathonMan, when he starts to add the RDP back into the new cen64.

Next I will make a quick 2D/3D Lib using the main N64 CPU to make a "Rotate Triangle" demo,
which will enable me to check I have my triangle slope calculations from my python scripts correct for all triangles.
Then I will experiment with converting the CPU Lib into vector maths & offloading it onto the RSP.
I'll also make loads more N64 triangle demos showing off all the different hardware modes with shading textures & Z-Buffer too =D

Narann
Posts: 154
Joined: Mon Jun 16, 2014 4:25 pm

Re: New core/version 0.3

Post by Narann » Wed Jul 30, 2014 11:29 am

I read the Kernel Coding Style - Chapter 5: Typedefs and they have valid arguments: typedef _t is for types with zero portability that require low-level accessors; struct is for... structures of portable types. All of this becomes more and more valid the bigger the project is.

Today I got a new strong point against this typedef _t approach, thanks MarathonMan!

I had never heard about the Linux Kernel Coding Style before (even if it's obvious it has existed for years), and (I haven't read it all yet, but) it seems to be a relevant choice for an ambitious C project.

MarathonMan wrote:In addition to what it mentions, the typedef scenario starts to get really hairy when you need to do forward references (IMO), because you need to redeclare the typedef of the forward reference as well (or use the original type).
I guess a way to deal with that is to have a strong header organization. This way you include only one header and have every typedef. But good point anyway, once again, on big projects (like Cen64) this problem can appear.
MarathonMan wrote:Automated builds are something I've been wanting to get going for a long time. Combined with regression tests.... that'd be killer.
I'm sure everybody here knows this better than me, but I'll just say it:
Low level unit tests: A set of unit tests per function (i.e. test the function).
Result oriented unit tests: Particularly relevant for hardware emulators: Apply a series of complex operations to the real hardware and dump the results (registers, memory, etc.). Apply the same series on the emulator and compare. I'm not sure whether the N64 hardware allows dumping everything (TMEM and so on).

On this subject, the appleseed render engine had to deal with the same thing. I strongly suggest you read this thread: using Travis-CI, Nicholas was also able to add regression tests to it.

About auto build. Maybe this could be a good read. If you choose to use server side commit-builds (hugh!)
MarathonMan wrote:I'm also definitely not opposed to a bug-tracker, either. Seems like a lot of people are just using github for this anymore?
As your project is not on Github anymore, there is no point in using it just for bug tracking.
MarathonMan wrote:I used Mantis before at my former employers, but that seems a little overkill for a project of this size.
I also used it in the past and I tend to agree (even if it's certainly the best in its category). You have Trac and Redmine. Both of them provide a (bad) wiki and a (good) roadmap. I also suggest disabling registration on the bug tracker and syncing forum registration with it.

EDIT: I forgot Lighthouse which seems to be simple.

As an aside, you also have the Kanban approach. Not sure if it's a good thing for an open source project, but I tend to like "dev communication using a board". You see who is doing what.

@krom This is fantastic! QoQ

I may seem a little boring, but seriously: libdragon (the main/only open source lib for N64 development) lacks triangle drawing support (because, as you have seen for years, this is not easy at all). If one day you want to make all the knowledge you have gathered usable by everybody, I strongly suggest you try to improve libdragon instead of creating a new lib (its code is well written). A lot of people in the N64 dev scene would love this (and if you became a libdragon contributor, we could reduce the fragmentation a separate lib of yours could create).

If you became a libdragon contributor, your test ROMs could be integrated, and libdragon could become the main open source SDK for the N64.

It's your free time, so it's up to you, but this choice would be good for the N64 open source scene. I'm so afraid that the "new movement" of cycle-accurate N64 emulators will follow the horrible path N64 HLE took in the past, which led to a massive level of fragmentation when the best option was to bring talented people together instead of having them work alone.

Anyway, I'm very impressed, keep your code this clean. I read some "specific" parts of your ROMs and it's very straightforward.

To unit test the RDP, one approach could be to dump the framebuffer produced by your ROMs on N64 hardware (using a GameShark, I guess), store the dumps in the unit test set and compare them against the Cen64 framebuffer.
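
The comparison step of such a test could look roughly like the sketch below. This is only an illustration: framebuffer_diff_count() is a hypothetical helper, not part of Cen64, and it assumes both buffers hold 16-bit pixels of the same width and height (as in a 16-bit 320x240 demo).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: counts the 16-bit pixels that differ between a
 * reference dump (taken on real hardware) and the emulator's framebuffer.
 * Both buffers are assumed to hold width * height pixels. */
size_t framebuffer_diff_count(const uint16_t *reference, const uint16_t *actual,
                              size_t width, size_t height) {
  size_t mismatches = 0;
  size_t i;

  assert(reference != NULL && actual != NULL);

  for (i = 0; i < width * height; i++) {
    if (reference[i] != actual[i])
      mismatches++;
  }

  return mismatches;
}
```

A regression test would then pass when the count is 0, and the count itself gives a rough idea of how far off a failing frame is.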

MarathonMan
Site Admin
Posts: 692
Joined: Fri Oct 04, 2013 4:49 pm

Re: New core/version 0.3

Post by MarathonMan » Wed Jul 30, 2014 5:58 pm

Narann wrote:I guess a way to deal with that is to have a strong header organization. This way you include only one header and have every typedef. But good point anyway, once again, on big projects (like Cen64) this problem can appear.
Sometimes, not even good header organization is good enough:

object_x.h:

Code: Select all

#ifndef OBJECT_X_H
#define OBJECT_X_H

struct object_x {
   ...
   struct object_y *y_ptr;
   ...
};

#endif
object_y.h:

Code: Select all

#ifndef OBJECT_Y_H
#define OBJECT_Y_H

struct object_y {
   ...
   struct object_x *x_ptr;
   ...
};

#endif
You would need a forward reference. :shock:
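
For the typedef variant of this, a minimal sketch of the forward reference might look like the following. The names object_x_t, object_y_t and object_link() are made up for the example and are not taken from CEN64; in a real tree the two typedef lines would probably live in a small shared header that both object_x.h and object_y.h include.

```c
#include <assert.h>
#include <stddef.h>

/* Forward references: both typedef names exist before either struct body.
 * (Hypothetical names, for illustration only.) */
typedef struct object_x object_x_t;
typedef struct object_y object_y_t;

struct object_x {
  int id;
  object_y_t *y_ptr; /* pointer to a type that is still incomplete here */
};

struct object_y {
  int id;
  object_x_t *x_ptr;
};

/* Hypothetical helper: wires the two objects to each other. */
void object_link(object_x_t *x, object_y_t *y) {
  assert(x != NULL && y != NULL);
  x->y_ptr = y;
  y->x_ptr = x;
}
```

The cost is exactly the one mentioned earlier in the thread: the typedef has to be repeated (or centralised) wherever the forward reference is needed.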
Narann wrote:About auto build. Maybe this could be a good read. If you choose to use server side commit-builds (hugh!)
I was using mingw64-gcc for the "old" CEN64 builds. While it does produce fast binaries, the binaries get statically linked against GNU libc (I think, at least... I'm guessing). I might definitely do something like this as a stopgap measure, though...

Thanks for the bug tracking systems, btw... I will look into each of those!

Snowstorm64
Posts: 303
Joined: Sun Oct 20, 2013 8:22 pm

Re: New core/version 0.3

Post by Snowstorm64 » Thu Jul 31, 2014 5:07 am

What is this branch "cache" supposed to do? It seems to sink the VI/s count by ~25, although I see hardly any difference in 'real' speed.

Yes, I restored the VI/s function. Edit the file vi/controller.c, add '#include <time.h>' to the top, and then go to line 68, where you can replace the part with this:

Code: Select all

    if (vi->frame_count++ == 9) {
      float vis = CLOCKS_PER_SEC * (float) 10 / (clock() - vi->start_time);

      printf("VI/s: %.2f\n", vis);
      vi->start_time = clock();
      vi->frame_count = 0;
    }
(Although I don't know if this method is correct, it seems to be credible, so it works for me :P)
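
The arithmetic in the snippet above can also be pulled out into a small function, which makes it easier to check on its own. This is just a sketch: vi_rate() is a hypothetical name, and the zero-elapsed guard is an addition that is not in the original snippet.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical helper: frames per second for `frames` frames rendered
 * between two clock() readings. Returns 0.0 when no ticks have elapsed,
 * so the caller never divides by zero. */
double vi_rate(unsigned frames, clock_t start, clock_t now) {
  clock_t elapsed = now - start;

  assert(frames > 0);

  if (elapsed <= 0)
    return 0.0;

  return (double) frames * (double) CLOCKS_PER_SEC / (double) elapsed;
}
```

In the controller this would be called as something like vi_rate(10, vi->start_time, clock()). One caveat: standard C defines clock() as processor time used by the program, so on systems where the emulator sleeps or blocks on VSync the result may differ from a wall-clock VI/s figure.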
OS: Debian GNU/Linux Jessie (8.0)
CPU: Intel i7 4770K @ 3.5 GHz
Build: AVX (compiled from git)

MarathonMan
Site Admin
Posts: 692
Joined: Fri Oct 04, 2013 4:49 pm

Re: New core/version 0.3

Post by MarathonMan » Thu Jul 31, 2014 8:57 am

Snowstorm64 wrote:What is this branch "cache" supposed to do? It seems to sink the VI/s count by ~25, although I see hardly any difference in 'real' speed.
It implements the instruction caches. Since the caches are now present, more work can be performed and simulated performance drops.

BUT:

There should be an additional CMake option -- "VR4300_BUSY_WAIT_DETECTION" -- that was also introduced alongside the caches. It currently defaults to OFF because it's experimental. However, if you turn it on, CEN64 will try to detect the busy wait loops that occur in libultra (i.e., the sections of libultra or other commercial ROMs that sit in an infinite loop and perform no work). The only way to break out of these loops is an interrupt. They're very common to see at the end of a frame when the ROM has computed all that it wants to compute for that frame.

Thus, when I'm able to detect such conditions, I can special case the simulation to check for very few things each cycle -- huge performance boost with no reduction in accuracy.
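
To give an idea of what one such idle loop looks like at the instruction level, here is a rough sketch. It is not CEN64's actual detection logic: is_self_branch_idle_loop() is a hypothetical function that only recognises the simplest case, a MIPS BEQ comparing a register with itself, branching back to its own address, with a NOP in the delay slot.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check for the simplest busy-wait pattern:
 *   loop: beq rX, rX, loop   (always taken, offset -1 targets the branch itself)
 *         nop                (delay slot does no work)
 * Real detection would need to cover more patterns than this one. */
bool is_self_branch_idle_loop(uint32_t branch_word, uint32_t delay_slot_word) {
  uint32_t opcode = branch_word >> 26;          /* bits 31..26 */
  uint32_t rs = (branch_word >> 21) & 0x1F;     /* bits 25..21 */
  uint32_t rt = (branch_word >> 16) & 0x1F;     /* bits 20..16 */
  uint32_t offset = branch_word & 0xFFFF;       /* signed 16-bit word offset */

  return opcode == 0x04        /* BEQ */
      && rs == rt              /* comparison is always true */
      && offset == 0xFFFF      /* -1: target = PC + 4 - 4 = PC */
      && delay_slot_word == 0; /* NOP */
}
```

Once a loop like this is recognised, nothing but an interrupt can change what the CPU does next, which is why the per-cycle work can be cut down safely.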
Snowstorm64 wrote:(Although I don't know if this method is correct, it seems to be credible, so it works for me :P)
That is the correct way to get VI/s. I need to do more work on the OS side of things before restoring it without GLFW...

krom
Posts: 72
Joined: Sat Oct 05, 2013 2:19 am

Re: New core/version 0.3

Post by krom » Thu Jul 31, 2014 1:53 pm

Firstly I would like to say how impressed I am, I just compiled the newest cen64 with the new busy wait detection for windows, and lots of demos are running full speed now (60 VI/s)!!
MarathonMan wrote:I was using mingw64-gcc for the "old" CEN64 builds. While it does produce fast binaries, the binaries get statically linked against GNU libc (I think, at least... I'm guessing). I might definitely do something like this as a stopgap measure, though...
You will be happy to know, that if you use the makefile from my mingw compile tutorial I wrote on this topic & edit the common.h file:

Code: Select all

#cmakedefine VR4300_BUSY_WAIT_DETECTION
To:

Code: Select all

#define VR4300_BUSY_WAIT_DETECTION
It produces a 64-bit windows binary that is only 90Kb in size, so it does not statically link the GNU libc anymore =D

There are a few reasons I think the cen64 windows build should be compiled by mingw rather than MSVC for our official builds:
1. It will produce faster binaries than MSVC.
2. It will produce more compatible binaries across windows platforms, e.g. newer MSVC versions compile 64-bit executables that do not run on Windows XP 64 etc.
3. As the Linux build uses GCC, it makes cross platform stability better to use the same C compiler across all O.S binaries.

Anyway I thought I would share my thoughts on the subject, hope this helps =D

** EDIT
Also I just noticed that you are using the compile option "-fuse-linker-plugin", which I think is not needed in the new cen64...
The old cen64 (as you know!) was like lots of libs being linked together for the different portions of the emu code, and that option was needed to compile & link correctly all the stuff together.
Now that cen64 is much cleaner, I do not need it anymore to compile my mingw64 builds of the newest cen64, so I think you can omit it from the CMakeLists.txt file =D

Snowstorm64
Posts: 303
Joined: Sun Oct 20, 2013 8:22 pm

Re: New core/version 0.3

Post by Snowstorm64 » Thu Jul 31, 2014 2:36 pm

MarathonMan wrote: It implements the instruction caches. Since the caches are now present, more work can be performed and simulated performance drops.

BUT:

There should be an additional CMake option -- "VR4300_BUSY_WAIT_DETECTION" -- that was also introduced alongside the caches. It currently defaults to OFF because it's experimental. However, if you turn it on, CEN64 will try to detect the busy wait loops that occur in libultra (i.e., the sections of libultra or other commercial ROMs that sit in an infinite loop and perform no work). The only way to break out of these loops is an interrupt. They're very common to see at the end of a frame when the ROM has computed all that it wants to compute for that frame.

Thus, when I'm able to detect such conditions, I can special case the simulation to check for very few things each cycle -- huge performance boost with no reduction in accuracy.

That is the correct way to get VI/s. I need to do more work on the OS side of things before restoring it without GLFW...
Thank you for the confirmation. :D As for the new option, a round of applause for you! Some demos like the Plasma Demo have seen a huge performance boost, around 30 VI/s, making them run at 85 VI/s. That's a bit overkill! :lol: Even krom's demos are running at a stable 69.5-70 VI/s on my PC (the two exceptions are Julia and Mandelbrot, which are running at 40 VI/s). You should add a 60 VI/s limit for when commercial ROMs become bootable, otherwise they will be a bit... unplayable. (I'm looking at MK64, which in certain cases was running at nearly 50 VI/s with the old core) ;)

Also, MarathonMan, I discovered another issue: Try to boot a non-existent ROM. It opens the window and CEN64 runs anyway, although neither should happen.
OS: Debian GNU/Linux Jessie (8.0)
CPU: Intel i7 4770K @ 3.5 GHz
Build: AVX (compiled from git)

Narann
Posts: 154
Joined: Mon Jun 16, 2014 4:25 pm

Re: New core/version 0.3

Post by Narann » Thu Jul 31, 2014 2:44 pm

Snowstorm64 wrote:Also, MarathonMan, I discovered another issue: Try to boot a non-existent ROM. It opens the window and CEN64 runs anyway, although neither should happen.
In what way is it different from the N64 behavior? :P

MarathonMan
Site Admin
Posts: 692
Joined: Fri Oct 04, 2013 4:49 pm

Re: New core/version 0.3

Post by MarathonMan » Thu Jul 31, 2014 3:49 pm

krom wrote:There are a few reasons I think the cen64 windows build should be compiled by mingw rather than MSVC for our official builds:
1. It will produce faster binaries than MSVC.
2. It will produce more compatible binaries across windows platforms, e.g. newer MSVC versions compile 64-bit executables that do not run on Windows XP 64 etc.
3. As the Linux build uses GCC, it makes cross platform stability better to use the same C compiler across all O.S binaries.
Excellent points. I'm totally convinced, especially with the 91KiB figure.

I'm definitely still keeping around Clang/MSVC support for the purposes of debugging and running the code through different compilers, though.

The really nice feature about mingw64 is that I can actually build binaries for both Linux and Windows anytime the git server on cen64.com gets a push (see the link that Narann mentioned yesterday). I'll still likely keep MSVC builds around somehow, as beannaich has been helping to set up some CI system w/ build agents.
Snowstorm64 wrote:Thank you for the confirmation. :D As for the new option, a round of applause for you! Some demos like the Plasma Demo have seen a huge performance boost, around 30 VI/s, making them run at 85 VI/s. That's a bit overkill! :lol: Even krom's demos are running at a stable 69.5-70 VI/s on my PC (the two exceptions are Julia and Mandelbrot, which are running at 40 VI/s). You should add a 60 VI/s limit for when commercial ROMs become bootable, otherwise they will be a bit... unplayable. (I'm looking at MK64, which in certain cases was running at nearly 50 VI/s with the old core) ;)

Also, MarathonMan, I discovered another issue: Try to boot a non-existent ROM. It opens the window and CEN64 runs anyway, although neither should happen.
Two things:

1) I have been running with VSync on (and my monitor is 60Hz)... so I'll have to look into getting some kind of system setup that forces VSync or something...

2) I fixed that bug, a handful of others, and have another wave of optimizations that I'll push later...

krom
Posts: 72
Joined: Sat Oct 05, 2013 2:19 am

Re: New core/version 0.3

Post by krom » Thu Jul 31, 2014 4:07 pm

MarathonMan wrote:Excellent points. I'm totally convinced, especially with the 91KiB figure.
Sweet I am glad you saw my point of view =D
BTW the binary is only 90kb (I edited my post)!
P.S did you check out my edit above about the GCC compile option "-fuse-linker-plugin"? I think it makes sense to not use it...

MarathonMan
Site Admin
Posts: 692
Joined: Fri Oct 04, 2013 4:49 pm

Re: New core/version 0.3

Post by MarathonMan » Thu Jul 31, 2014 4:29 pm

krom wrote:
MarathonMan wrote:Excellent points. I'm totally convinced, especially with the 91KiB figure.
Sweet I am glad you saw my point of view =D
BTW the binary is only 90kb (I edited my post)!
P.S did you check out my edit above about the GCC compile option "-fuse-linker-plugin"? I think it makes sense to not use it...
I did not see the edit --

You are most certainly correct! I'll push that fix along with a barrage of others tonight. Thank you!

Narann
Posts: 154
Joined: Mon Jun 16, 2014 4:25 pm

Re: New core/version 0.3

Post by Narann » Thu Jul 31, 2014 5:04 pm

I'm just jumping in on the "it could be great to support multiple compilers" point.

Maybe it's not time for that yet, but when the time comes to ship binary bundles to end users, I suggest you choose one compiler to support builds for (bugfixes, performance, etc.) and, if you want (debugging is a valid point), add compiling/project support for other compilers on the dev side.

The point is:

1) To avoid making a Windows user choose between different versions depending on the compiler:
  • Cen64_1.0.0-mingw.zip
  • Cen64_1.0.0-clang.zip
  • Cen64_1.0.0-msvc10.zip
  • Cen64_1.0.0-msvc12.zip
  • Cen64_1.0.0-msvc9.zip
2) To keep the code consistent and avoid a lot of #pragma or "compiler abstraction layer" headers* depending on compilers.

*"Compiler abstraction layer" headers are an inefficient practice on the performance side (IMHO) because they assume every compiler works the same way and only some keywords change.

The same goes for performance: some compilers will be more efficient with certain options/combinations, and if you support them all (gcc/clang/msvc9-10-12), you could be tempted to "just add one pragma here" to win 2 VI/s for a specific build (and sometimes even to write two different functions doing exactly the same thing in two different manners).

It's up to you, but when I read the nice Cen64 code (seriously, I love reading the various commits, even if I don't understand them all), it would be sad to make it dirty with "compiler branching" stuff. Even focusing on one compiler can be a problem depending on the version (gcc 4.4 -> 4.9).

Anyway, as Cen64 is quite low level (no C++ templates everywhere), there is hope that the points I'm raising above will not be problematic.

Maybe you have already thought about this, but the choice of the "main compiler" (the one you will focus performance-sensitive work on) is not trivial. To be honest, I would also push for choosing one version of gcc to focus on, and maybe updating this version every year.

There are better devs than me here, so I'm sure the right choice will be made, but as the point hasn't been discussed, I'm just sharing my humble two cents on the "multi compiler" point. :)
