Cache fill rates

MarathonMan
Site Admin
Posts: 692
Joined: Fri Oct 04, 2013 4:49 pm

Cache fill rates

Post by MarathonMan » Tue Nov 19, 2013 11:13 pm

I've measured cache read (load + invalidate) rates:
  • Instruction cache: 26 pclocks/line (@ 93.75 million pclocks/sec = 110.04 MiB/s)
  • Data cache: 24 pclocks/line (@ 93.75 million pclocks/sec = 59.60 MiB/s)
These come off the actual console. No other copies were occurring during this time (the RCP was idle).

The data cache line (16 bytes) is half the size of the instruction cache line (32 bytes), which is why the throughput figures differ so much even though the per-line times are close.
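As a worked example of how a pclocks/line figure turns into MiB/s, here is a minimal sketch in C. The function and macro names are made up for illustration; the line sizes (32 bytes for the instruction cache, 16 bytes for the data cache) and the 93.75 MHz pclock are the values assumed in this thread.

```c
#include <assert.h>

/* Assumed cache geometry and clock, as discussed in the thread. */
#define ICACHE_LINE_BYTES 32.0
#define DCACHE_LINE_BYTES 16.0
#define PCLOCK_HZ 93750000.0 /* 93.75 million pclocks/sec */

/* Fill throughput in MiB/s when one line is filled every pclocks_per_line. */
double fill_rate_mib_per_sec(double pclocks_per_line, double line_bytes)
{
    assert(pclocks_per_line > 0.0 && line_bytes > 0.0);
    double lines_per_sec = PCLOCK_HZ / pclocks_per_line;
    return lines_per_sec * line_bytes / (1024.0 * 1024.0);
}
```

For instance, fill_rate_mib_per_sec(26.0, ICACHE_LINE_BYTES) comes out at roughly 110.04 MiB/s.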

These values seem sound to me, but if anyone can confirm similar figures that would be great.

juef
Posts: 31
Joined: Sun Oct 27, 2013 10:19 pm

Re: Cache fill rates

Post by juef » Wed Nov 20, 2013 8:45 am

I gladly would, but I'm not sure how to actually take such a measurement. I do have access to a 64drive and 3 different N64 consoles.

Re: Cache fill rates

Post by MarathonMan » Wed Nov 20, 2013 10:54 am

You need to write code to measure it, I would think. At least that's how I did it. :D

It might be published or already discussed somewhere else, which is why I posted the question.

Re: Cache fill rates

Post by juef » Wed Nov 20, 2013 2:13 pm

Ahhh, never mind then, that's out of my league. Sorry!

Devin
Posts: 14
Joined: Sun Oct 27, 2013 12:58 am

Re: Cache fill rates

Post by Devin » Wed Nov 20, 2013 4:57 pm

This looks related, but I'm no programmer so I can't say for sure.
http://www.dragonminded.com/n64dev/libd ... l#_details

Re: Cache fill rates

Post by MarathonMan » Wed Nov 20, 2013 8:53 pm

I'm glad you posted that, Devin. I scanned over the page and realized I'd forgotten that CP0 increments Count on every other cycle. :shock:

This means that the caches are half as fast as I expected them to be:
  • Instruction cache: 52 pclocks/line (@ 93.75 million pclocks/sec = 55.02 MiB/s)
  • Data cache: 48 pclocks/line (@ 93.75 million pclocks/sec = 29.80 MiB/s)
:oops:

Man, I knew they said RDRAM was slow... but holy smokes!
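
For anyone repeating the measurement, here is a small sketch of the conversion that was missed the first time around. It assumes, as described above, that the CP0 Count register advances once every two pclocks; the function name is hypothetical.

```c
#include <stdint.h>

/* Convert two CP0 Count samples into elapsed pclocks.
 * Assumption: Count ticks once per two pclocks. */
uint64_t count_delta_to_pclocks(uint32_t start, uint32_t end)
{
    /* Unsigned subtraction stays correct across a single 32-bit wraparound. */
    uint32_t ticks = end - start;
    return (uint64_t)ticks * 2u;
}
```

A raw Count difference of 13 therefore corresponds to 26 pclocks, not 13.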

beannaich
Posts: 149
Joined: Mon Oct 21, 2013 2:43 pm

Re: Cache fill rates

Post by beannaich » Fri Nov 22, 2013 2:22 am

What's the average hit-rate for the instruction/data caches?

Also, if CP1 increases the counter register every other cycle, how can you determine cycle counts precisely? You might be off by one in all your measurements :P

Re: Cache fill rates

Post by MarathonMan » Fri Nov 22, 2013 9:45 am

beannaich wrote:What's the average hit-rate for the instruction/data caches?
Depends on the code being run... for the benchmark I used I artificially just filled and invalidated cache lines, so 100% miss.
beannaich wrote:Also, if CP1 increases the counter register every other cycle, how can you determine cycle counts precisely? You might be off by one in all your measurements :P
CP0?

I did 100k fills and averaged the time, so if my measurements are a cycle or two off, it's due to some cache operation interlock or something. :D

Re: Cache fill rates

Post by beannaich » Sat Nov 23, 2013 1:36 am

MarathonMan wrote:Depends on the code being run... for the benchmark I used I artificially just filled and invalidated cache lines, so 100% miss.
I was very careful to word it as "what's the average" :P I wanted to get at what the impact of accurate cache timings would be. Perhaps you could keep track for debug builds so we can see what commercial games get? I think it would be cool to know how efficiently some games were programmed. Maybe output the hit rates along with frequency in the console?
MarathonMan wrote:CP0?

I did 100k fills and averaged the time, so if my measurements are a cycle or two off, it's due to some cache operation interlock or something. :D
Yeah, CP0, my bad. The testing procedure seems valid to me :D

Re: Cache fill rates

Post by MarathonMan » Sat Nov 23, 2013 8:54 am

beannaich wrote:I was very careful to word it as "what's the average" :P I wanted to get at what the impact of accurate cache timings would be. Perhaps you could keep track for debug builds so we can see what commercial games get? I think it would be cool to know how efficiently some games were programmed. Maybe output the hit rates along with frequency in the console?
Right, but even in this context, there is no average. libultra will likely have very little influence on the instruction and data cache hit rates. It will certainly have an impact, just a nominal one. Most of the time, the caches are going to be dedicated to holding game data and functions. If a game uses a lot of indirect branches and function pointers, that game will likely observe a very poor instruction cache hit rate.

With instruction caches, you should be seeing at least 87.5% hit rates, as the line size is 32 bytes (8 instruction words) and sequential code is the norm. There are also loops and other control flow structures, which will increase this hit rate. The data caches are a crapshoot though; hopefully you'd see something at least 50%+, but I have no idea.
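
The sequential-code estimate above can be sketched as a one-liner: with purely straight-line fetches, the first access to each line misses and the remaining accesses to that line hit. This is only a back-of-the-envelope model (no branches, no loops, no conflicts), and the function name is made up for illustration.

```c
#include <assert.h>

/* Best-case hit rate for purely sequential accesses:
 * one miss per line, then hits on the rest of the line. */
double sequential_hit_rate(unsigned line_bytes, unsigned access_bytes)
{
    assert(access_bytes > 0);
    assert(line_bytes >= access_bytes && line_bytes % access_bytes == 0);
    unsigned accesses_per_line = line_bytes / access_bytes;
    return (double)(accesses_per_line - 1) / (double)accesses_per_line;
}
```

With 4-byte instruction words, a line holding 4 words gives 75% and a line holding 8 words gives 87.5%.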

Re: Cache fill rates

Post by beannaich » Mon Nov 25, 2013 6:28 pm

beannaich wrote:Perhaps you could keep track for debug builds so we can see what commercial games get? I think it would be cool to know how efficiently some games were programmed. Maybe output the hit rates along with frequency in the console?
I still think this would be cool to see
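
A rough idea of what such bookkeeping could look like in an emulator debug build: a toy direct-mapped cache model that only tracks tags and counts hits and misses. Everything here (struct, function names, constants) is hypothetical and not taken from any emulator; the geometry assumes the 8 KB, 16-byte-line data cache mentioned in this thread.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_LINE_BYTES 16u /* assumed data cache line size */
#define MODEL_LINES 512u     /* 8 KiB / 16 bytes per line */

struct cache_model {
    uint32_t tag[MODEL_LINES];
    uint8_t valid[MODEL_LINES];
    uint64_t hits, misses;
};

/* Record one access; returns 1 on a hit, 0 on a miss (the line is then filled). */
int cache_model_access(struct cache_model *c, uint32_t addr)
{
    uint32_t line = addr / MODEL_LINE_BYTES;
    uint32_t index = line % MODEL_LINES; /* direct mapped: one slot per index */
    uint32_t tag = line / MODEL_LINES;

    if (c->valid[index] && c->tag[index] == tag) {
        c->hits++;
        return 1;
    }
    c->valid[index] = 1;
    c->tag[index] = tag;
    c->misses++;
    return 0;
}

/* Fraction of recorded accesses that hit. */
double cache_model_hit_rate(const struct cache_model *c)
{
    uint64_t total = c->hits + c->misses;
    assert(total > 0);
    return (double)c->hits / (double)total;
}
```

Calling cache_model_access on every load and store and printing cache_model_hit_rate periodically would give the kind of per-game figure asked for here, at some cost in emulation speed.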

Exophase
Posts: 5
Joined: Mon Dec 02, 2013 1:40 pm

Re: Cache fill rates

Post by Exophase » Mon Dec 02, 2013 2:20 pm

Very interesting to see hard measurements. For years the N64 has had a reputation for terrible RDRAM latency hurting CPU performance. Of those 48 cycles, "only" 8-9 are spent in the cache miss state machine, so it really is pretty awful. I think the tiny cache line size of 16 bytes was poorly matched against this huge latency, and with the dcache being both direct mapped and small-ish at 8KB, its miss rate was probably very high. It also doesn't help that critical word first is only at 8-byte granularity.

Was this test done with loads? I bet the situation would be even worse with a tight loop of store misses, which would move the dirty lines to the write buffer first.. then the next store would probably first stall waiting for the write buffer to drain..

Re: Cache fill rates

Post by MarathonMan » Mon Dec 02, 2013 2:53 pm

Exophase wrote:Was this test done with loads? I bet the situation would be even worse with a tight loop of store misses, which would move the dirty lines to the write buffer first.. then the next store would probably first stall waiting for the write buffer to drain..
Unfortunately, yes. I used CP0 to invalidate and then fill in two separate steps. Just to be safe, I also alternated between two different addresses such that they both map to the same set.

Fortunately, the VR4300 has a (3x32/64 bit?) write buffer to help mitigate some of that loss... but yes, they probably never anticipated the VR4300 being used with such high-latency RAM, given the line sizes.

Code:

  /* Hammer on the address in ptr; CP0 register $9 is Count. */
  __asm__ __volatile__("\tmfc0 %0, $9\n\tnop" : "=r"(start));

  while (counter > 0) {
    /* Load to fill the line, then Hit_Invalidate_D (cache op 0x11) to evict it. */
    __asm__ __volatile__("\tlw $1, 00000(%0)\n\tnop\n\tcache 0x11, 00000(%0)\n\tnop" :: "r"(ptr) : "$1");
    /* Same again 32768 bytes further on, which maps to the same data cache line. */
    __asm__ __volatile__("\tlw $1, 32768(%0)\n\tnop\n\tcache 0x11, 32768(%0)\n\tnop" :: "r"(ptr) : "$1");

    counter--;
  }

  __asm__ __volatile__("\tmfc0 %0, $9\n\tnop" : "=r"(end));

counter was kept in a register.
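
To see why the two offsets in the loop (0 and 32768) collide, here is a minimal sketch of the index calculation for a direct-mapped cache. It assumes the 8 KB data cache with 16-byte lines discussed in this thread; the names and the example base address are illustrative only.

```c
#include <stdint.h>

#define DCACHE_SIZE_BYTES 8192u /* assumed: 8 KB data cache */
#define DCACHE_LINE_SIZE 16u    /* assumed: 16-byte lines */
#define DCACHE_NUM_LINES (DCACHE_SIZE_BYTES / DCACHE_LINE_SIZE) /* 512 */

/* Which cache line an address lands on in a direct-mapped cache. */
uint32_t dcache_line_index(uint32_t addr)
{
    return (addr / DCACHE_LINE_SIZE) % DCACHE_NUM_LINES;
}

/* Nonzero if both addresses compete for the same cache line. */
int maps_to_same_line(uint32_t a, uint32_t b)
{
    return dcache_line_index(a) == dcache_line_index(b);
}
```

Any two addresses that differ by a multiple of 8192 bytes share an index under this model, so ptr and ptr + 32768 always compete for the same line.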

Re: Cache fill rates

Post by Exophase » Mon Dec 02, 2013 4:18 pm

Ah I see, I hadn't even considered using the cache instruction; I would have just marched through a big array. Did you measure the performance of the loop w/o the loads and subtract that from the total time? It seems like the cache instructions operating on dcache can take two cycles, although it's not totally clear to me what the conditions for this are.

Re: Cache fill rates

Post by MarathonMan » Mon Dec 02, 2013 8:47 pm

Exophase wrote:Did you measure the performance of the loop w/o the loads and subtract that from the total time?
Yes, I ran the exact same function with NOPs in place of the LOAD and CACHE instructions and subtracted the time it took from the time the instrumented version took. I think you're correct about the CACHE instruction taking two cycles; however, I believe a CP0/DC stall will only arise if the CACHE instruction is in the EX stage while the LOAD is in the DC stage, hence my attempt to separate the two with a NOP.
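
The baseline subtraction and averaging described above might be sketched like this. It is only an illustration: the function name is hypothetical, and it assumes Count ticks once every two pclocks.

```c
#include <assert.h>
#include <stdint.h>

/* Average pclocks per line fill.
 * instrumented_ticks: Count delta for the loop with LOAD + CACHE.
 * baseline_ticks:     Count delta for the same loop with NOPs instead.
 * fills:              total number of line fills performed.
 * Assumption: Count ticks once per two pclocks. */
double pclocks_per_fill(uint32_t instrumented_ticks, uint32_t baseline_ticks,
                        uint32_t fills)
{
    assert(fills > 0);
    assert(instrumented_ticks >= baseline_ticks);
    uint32_t net_ticks = instrumented_ticks - baseline_ticks;
    return 2.0 * (double)net_ticks / (double)fills;
}
```

Averaging over many fills is what keeps the one-tick granularity of Count from mattering much: an error of a tick or two is spread across every fill in the run.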

Of course, I could be wrong. :) I can validate this at a later time by testing using a different method, perhaps the one you mentioned. For now, however, the delays I estimated using the aforementioned routine were enough to uncover some exception handling bugs due to some of my generalizations and misunderstandings. They also appear to be accurate enough to prevent at least some ROMs (OoT) from crashing in odd instances that are mentioned in another thread on the boards, so at the least I'd like to think I'm in the ballpark range.

Of course, these figures are probably inaccurate when the RCP is performing DMAs and other events. It'll never be truly accurate until everything comes full circle.

vexiant
Posts: 3
Joined: Thu Dec 12, 2013 4:16 pm

Re: Cache fill rates

Post by vexiant » Fri Dec 13, 2013 2:12 pm

Devin wrote:This looks related, but I'm no programmer so I can't say for sure.
http://www.dragonminded.com/n64dev/libd ... l#_details
Ah yes, N64 homebrew. B)
