Where's my runtime going?

MarathonMan
Site Admin
Posts: 692
Joined: Fri Oct 04, 2013 4:49 pm

Post by MarathonMan » Thu May 07, 2015 7:56 pm

Results were collected using perf on Debian Jessie on an i7-4558U. For each ROM, I just let it run for a few minutes without pressing any buttons.

Since GCC optimizes aggressively (lots of inlining, with LTO enabled), a lot of runtime ends up attributed to symbols under the "device_*" classifier; this is essentially unattributable to any particular component. However, it's safe to assume that a large portion of it is divided between the RSP, VR4300, RDP display list decoding, and VI. Even so, there are still some interesting results (at least compared to what I had expected):

Mario Kart 64:
RSP (10.08%) -- perf report --stdio | grep -i rsp | awk -F% '{sum += $1} END { print sum; }'
VR4300 (11.2%) -- perf report --stdio | grep -i vr4300 | awk -F% '{sum += $1} END { print sum; }'
Device (44.56%) -- perf report --stdio | grep -i device | awk -F% '{sum += $1} END { print sum; }'
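
Each figure above takes a separate pass over the report. As a sketch (not necessarily how these numbers were produced), the same sums can be computed in a single awk pass. It assumes sample lines begin with the overhead percentage, as in the listings in this thread. One difference from three independent greps: here a line is counted only once, under the first pattern it matches (rsp, then vr4300, then device), whereas separate greps would count a line matching several patterns more than once. The function name summarize is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: sum `perf report --stdio` overhead per classifier in a
# single awk pass. Each line is counted once, under the first pattern it
# matches (rsp, then vr4300, then device); everything else goes to "Other".
summarize() {
  awk -F% '
    # Skip perf header/comment lines and lines without a percentage.
    /^[ \t]*#/ || !/%/ { next }
    {
      line = tolower($0)      # case-insensitive, like grep -i
      # With -F%, $1 is the overhead number, e.g. "    8.46".
      if (line ~ /rsp/)         rsp += $1
      else if (line ~ /vr4300/) vr  += $1
      else if (line ~ /device/) dev += $1
      else                      rest += $1
    }
    END {
      printf "RSP %.2f\nVR4300 %.2f\nDevice %.2f\nOther %.2f\n", rsp, vr, dev, rest
    }'
}

# Usage: perf report --stdio | summarize
```

The "Other" bucket lumps together RDP work and host libraries, which is the same remainder the filtered listings below break down by symbol.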

Remaining stuff is mostly RDP (~27.12%):

Code:

perf report --stdio | grep -iv device | grep -iv rsp | grep -iv vr4300

     8.46%    cen64  cen64                  [.] render_spans_1cycle_notexel1.lto_priv.145          
     7.21%    cen64  cen64                  [.] render_spans_1cycle_notex.lto_priv.144             
     3.65%    cen64  cen64                  [.] texture_pipeline_cycle.constprop.2                 
     2.48%    cen64  cen64                  [.] fetch_texel_quadro.lto_priv.9                      
     1.53%    cen64  i965_dri.so            [.] 0x00000000000f64fc                                 
     1.49%    cen64  cen64                  [.] fbwrite_16                                         
     1.37%    cen64  cen64                  [.] fbread_16                                          
     0.80%    cen64  cen64                  [.] edgewalker_for_loads.constprop.1                   
     0.71%    cen64  cen64                  [.] edgewalker_for_prims.lto_priv.122                  
     0.48%    cen64  i965_dri.so            [.] 0x00000000000f6516                                 
     0.39%    cen64  cen64                  [.] fbfill_16                                          
     0.38%    cen64  i965_dri.so            [.] 0x00000000000f6509                                 
     0.30%    cen64  cen64                  [.] bus_read_word                                      
     0.22%    cen64  cen64                  [.] fetch_texel_entlut_quadro.lto_priv.10              
     0.14%    cen64  libc-2.19.so           [.] __memcpy_sse2_unaligned                            
     0.13%    cen64  cen64                  [.] get_dither_nothing.lto_priv.151                    
     0.11%    cen64  cen64                  [.] write_dp_regs                                      
     0.10%    cen64  i965_dri.so            [.] 0x0000000000105298                                 
     0.10%    cen64  cen64                  [.] rgb_dither_nothing.lto_priv.142
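
The filtered listing also contains host-side shared libraries such as i965_dri.so (the Intel Mesa driver) and libc. For Mario Kart 64, the ~27.12% figure appears to match the sum of the cen64-binary symbols in this listing minus bus_read_word. A hypothetical helper that splits the remainder along those lines might look like this; it assumes perf's default columns (overhead, command, shared object, symbol), and treating every non-bus_* symbol in the cen64 binary as RDP work is an assumption, not a rule.

```shell
#!/bin/sh
# Hypothetical helper: split what is left after removing the device, RSP and
# VR4300 lines into RDP work (cen64 binary, excluding bus_* helpers; an
# assumption) and host shared libraries (i965_dri.so, libc, ...).
# Assumes the default perf columns: overhead, command, shared object, [.], symbol.
rdp_share() {
  grep -iv -e device -e rsp -e vr4300 |
  awk '
    /^[ \t]*#/ { next }                              # skip perf header lines
    $1 !~ /%$/ { next }                              # keep only sample lines
    $3 == "cen64" && $NF !~ /^bus_/ { rdp += $1 }    # "8.46%" converts to 8.46
    $3 != "cen64" { host += $1 }
    END { printf "RDP %.2f\nHost libraries %.2f\n", rdp, host }'
}

# Usage: perf report --stdio | rdp_share
```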

Super Smash Bros.
RSP (9.22%) -- perf report --stdio | grep -i rsp | awk -F% '{sum += $1} END { print sum; }'
VR4300 (8.41%) -- perf report --stdio | grep -i vr4300 | awk -F% '{sum += $1} END { print sum; }'
Device (35.68%) -- perf report --stdio | grep -i device | awk -F% '{sum += $1} END { print sum; }'

Remaining stuff is mostly RDP (~40.94%):

Code:

perf report --stdio | grep -iv device | grep -iv rsp | grep -iv vr4300

    13.81%    cen64  cen64                  [.] render_spans_1cycle_notexel1.lto_priv.145         
     7.02%    cen64  cen64                  [.] texture_pipeline_cycle.constprop.2                
     3.44%    cen64  cen64                  [.] render_spans_1cycle_notex.lto_priv.144            
     3.13%    cen64  cen64                  [.] fetch_texel_entlut_quadro.lto_priv.10             
     2.94%    cen64  cen64                  [.] fetch_texel_quadro.lto_priv.9                     
     1.89%    cen64  cen64                  [.] rgb_dither_complete.lto_priv.141                  
     1.83%    cen64  cen64                  [.] render_spans_2cycle_notexel1.lto_priv.139         
     1.56%    cen64  cen64                  [.] fbwrite_16                                        
     1.47%    cen64  cen64                  [.] fbread_16                                         
     1.11%    cen64  i965_dri.so            [.] 0x00000000000f64fc                                
     1.06%    cen64  cen64                  [.] edgewalker_for_prims.lto_priv.122                 
     0.81%    cen64  cen64                  [.] render_spans_2cycle_notex.lto_priv.146            
     0.67%    cen64  cen64                  [.] edgewalker_for_loads.constprop.1                  
     0.66%    cen64  cen64                  [.] fbfill_16                                         
     0.65%    cen64  cen64                  [.] get_dither_only.lto_priv.150                      
     0.35%    cen64  i965_dri.so            [.] 0x00000000000f6516                                
     0.33%    cen64  cen64                  [.] bus_read_word                                     
     0.28%    cen64  i965_dri.so            [.] 0x00000000000f6509                                
     0.13%    cen64  cen64                  [.] fbread2_16                                        
     0.12%    cen64  libc-2.19.so           [.] __memcpy_sse2_unaligned                           
     0.12%    cen64  libc-2.19.so           [.] memset                                            
     0.11%    cen64  cen64                  [.] bus_write_word.constprop.8                        
     0.08%    cen64  cen64                  [.] write_dp_regs

Zelda: Ocarina of Time
RSP (8.62%) -- perf report --stdio | grep -i rsp | awk -F% '{sum += $1} END { print sum; }'
VR4300 (9.92%) -- perf report --stdio | grep -i vr4300 | awk -F% '{sum += $1} END { print sum; }'
Device (37.51%) -- perf report --stdio | grep -i device | awk -F% '{sum += $1} END { print sum; }'

Remaining stuff is mostly RDP (~38.25%):

Code:

perf report --stdio | grep -iv device | grep -iv rsp | grep -iv vr4300

     8.97%    cen64  cen64                  [.] render_spans_2cycle_notexelnext.lto_priv.147      
     5.57%    cen64  cen64                  [.] render_spans_2cycle_notexel1.lto_priv.139         
     5.39%    cen64  cen64                  [.] texture_pipeline_cycle.constprop.2                
     3.45%    cen64  cen64                  [.] fetch_texel_quadro.lto_priv.9                     
     3.13%    cen64  cen64                  [.] fetch_texel_entlut_quadro.lto_priv.10             
     2.92%    cen64  cen64                  [.] texture_pipeline_cycle.constprop.3                
     1.48%    cen64  cen64                  [.] rgb_dither_complete.lto_priv.141                  
     1.37%    cen64  i965_dri.so            [.] 0x00000000000f64fc                                
     0.97%    cen64  cen64                  [.] fbwrite_16                                        
     0.95%    cen64  cen64                  [.] edgewalker_for_loads.constprop.1                  
     0.93%    cen64  cen64                  [.] fbread2_16                                        
     0.75%    cen64  cen64                  [.] render_spans_1cycle_notexel1.lto_priv.145         
     0.68%    cen64  cen64                  [.] edgewalker_for_prims.lto_priv.122                 
     0.58%    cen64  cen64                  [.] render_spans_1cycle_notex.lto_priv.144            
     0.49%    cen64  cen64                  [.] fbfill_16                                         
     0.43%    cen64  i965_dri.so            [.] 0x00000000000f6516                                
     0.41%    cen64  cen64                  [.] get_dither_only.lto_priv.150                      
     0.37%    cen64  cen64                  [.] bus_read_word                                     
     0.35%    cen64  i965_dri.so            [.] 0x00000000000f6509                                
     0.21%    cen64  cen64                  [.] render_spans_2cycle_complete.lto_priv.148         
     0.15%    cen64  libc-2.19.so           [.] __memcpy_sse2_unaligned                           
     0.14%    cen64  cen64                  [.] fbread_16                                         
     0.10%    cen64  cen64                  [.] bus_write_word.constprop.8                    

Super Mario 64
RSP (14.17%) -- perf report --stdio | grep -i rsp | awk -F% '{sum += $1} END { print sum; }'
VR4300 (10.51%) -- perf report --stdio | grep -i vr4300 | awk -F% '{sum += $1} END { print sum; }'
Device (48.04%) -- perf report --stdio | grep -i device | awk -F% '{sum += $1} END { print sum; }'

Remaining stuff is mostly RDP (~21.67%):

Code:

perf report --stdio | grep -iv device | grep -iv rsp | grep -iv vr4300

     8.18%    cen64  cen64                  [.] render_spans_1cycle_notexel1.lto_priv.145         
     3.97%    cen64  cen64                  [.] texture_pipeline_cycle.constprop.2                
     2.87%    cen64  cen64                  [.] fetch_texel_quadro.lto_priv.9                     
     1.32%    cen64  cen64                  [.] render_spans_2cycle_notexel1.lto_priv.139         
     1.13%    cen64  cen64                  [.] rgb_dither_complete.lto_priv.141                  
     1.09%    cen64  i965_dri.so            [.] 0x00000000000f64fc                                
     0.98%    cen64  cen64                  [.] edgewalker_for_prims.lto_priv.122                 
     0.82%    cen64  cen64                  [.] fbwrite_16                                        
     0.78%    cen64  cen64                  [.] fbread_16                                         
     0.65%    cen64  cen64                  [.] edgewalker_for_loads.constprop.1                  
     0.48%    cen64  cen64                  [.] bus_read_word                                     
     0.34%    cen64  i965_dri.so            [.] 0x00000000000f6516                                
     0.32%    cen64  cen64                  [.] render_spans_1cycle_notex.lto_priv.144            
     0.29%    cen64  cen64                  [.] get_dither_only.lto_priv.150                      
     0.28%    cen64  i965_dri.so            [.] 0x00000000000f6509                                
     0.24%    cen64  cen64                  [.] fbfill_16                                         
     0.12%    cen64  cen64                  [.] bus_write_word.constprop.8                        
     0.12%    cen64  cen64                  [.] write_dp_regs                                     
     0.11%    cen64  libc-2.19.so           [.] __memcpy_sse2_unaligned                           
     0.09%    cen64  libc-2.19.so           [.] memset                                            
     0.07%    cen64  i965_dri.so            [.] 0x0000000000105298                                
     0.07%    cen64  i965_dri.so            [.] 0x00000000000f6505                                
     0.07%    cen64  i965_dri.so            [.] 0x00000000000f6540                                
     0.07%    cen64  i965_dri.so            [.] 0x000000000010528a                                
     0.07%    cen64  cen64                  [.] fbread2_16          

LaC's "fire" demo (no RSP/RDP):

Code:

perf report --stdio

    75.33%    cen64  cen64                  [.] device_spin.lto_priv.19                            
     7.29%    cen64  cen64                  [.] VR4300_LOAD_STORE                                  
     4.44%    cen64  cen64                  [.] vr4300_cycle_slow_ex.lto_priv.46                   
     2.02%    cen64  cen64                  [.] VR4300_ADDIU_LUI_SUBIU                             
     1.50%    cen64  i965_dri.so            [.] 0x00000000000f64fc                                 
     1.12%    cen64  cen64                  [.] VR4300_ADDU_SUBU                                   
     0.84%    cen64  cen64                  [.] VR4300_SLL_SLLV                                    
     0.49%    cen64  i965_dri.so            [.] 0x00000000000f6516                                 
     0.43%    cen64  cen64                  [.] VR4300_DCB                                         
     0.39%    cen64  i965_dri.so            [.] 0x00000000000f6509                                 
     0.39%    cen64  cen64                  [.] VR4300_ANDI_ORI_XORI                               
     0.33%    cen64  cen64                  [.] vr4300_cycle_slow_dc.lto_priv.45                   
     0.30%    cen64  cen64                  [.] VR4300_AND_OR_XOR                                  
     0.28%    cen64  cen64                  [.] VR4300_BEQ_BEQL_BNE_BNEL_BWDETECT                  
     0.23%    cen64  cen64                  [.] bus_read_word                                      
     0.16%    cen64  cen64                  [.] bus_write_word.constprop.8                         
     0.15%    cen64  cen64                  [.] VR4300_SLTIU                                       
     0.12%    cen64  libc-2.19.so           [.] __memcpy_sse2_unaligned               

Pokémon Snap
RSP (10.66%) -- perf report --stdio | grep -i rsp | awk -F% '{sum += $1} END { print sum; }'
VR4300 (6.01%) -- perf report --stdio | grep -i vr4300 | awk -F% '{sum += $1} END { print sum; }'
Device (25.77%) -- perf report --stdio | grep -i device | awk -F% '{sum += $1} END { print sum; }'

Remaining stuff is mostly RDP (~53.91%):

Code:

perf report --stdio | grep -iv device | grep -iv rsp | grep -iv vr4300

    27.20%    cen64  cen64                  [.] render_spans_2cycle_notexel1.lto_priv.139          
    10.21%    cen64  cen64                  [.] texture_pipeline_cycle.constprop.2                 
     4.97%    cen64  cen64                  [.] fetch_texel_entlut_quadro.lto_priv.10              
     2.83%    cen64  cen64                  [.] fetch_texel_quadro.lto_priv.9                      
     1.94%    cen64  cen64                  [.] rgb_dither_complete.lto_priv.141                   
     1.54%    cen64  cen64                  [.] fbread2_16                                         
     1.37%    cen64  cen64                  [.] edgewalker_for_prims.lto_priv.122                  
     1.14%    cen64  cen64                  [.] fbwrite_16                                         
     0.83%    cen64  cen64                  [.] fbfill_16                                          
     0.72%    cen64  cen64                  [.] get_dither_only.lto_priv.150                       
     0.62%    cen64  i965_dri.so            [.] 0x00000000000f64fc                                 
     0.55%    cen64  cen64                  [.] edgewalker_for_loads.constprop.1                   
     0.36%    cen64  cen64                  [.] render_spans_2cycle_notexelnext.lto_priv.147       
     0.25%    cen64  cen64                  [.] bus_read_word                                      
     0.18%    cen64  i965_dri.so            [.] 0x00000000000f6516                                 
     0.17%    cen64  libc-2.19.so           [.] memset                                             
     0.17%    cen64  cen64                  [.] render_spans_1cycle_notexel1.lto_priv.145          
     0.16%    cen64  i965_dri.so            [.] 0x00000000000f6509                                 
     0.15%    cen64  cen64                  [.] render_spans_2cycle_notex.lto_priv.146             
     0.13%    cen64  cen64                  [.] texture_pipeline_cycle.constprop.3                 
     0.11%    cen64  cen64                  [.] render_spans_1cycle_notex.lto_priv.144             
     0.10%    cen64  cen64                  [.] bus_write_word.constprop.8                         
     0.07%    cen64  cen64                  [.] write_dp_regs                                      
     0.05%    cen64  libc-2.19.so           [.] __memcpy_sse2_unaligned

Welp... I didn't realize the RDP was putting that much of a damper on performance in some instances.

OldGnashburg
Posts: 91
Joined: Tue Nov 19, 2013 3:00 pm
Location: Sherwood Park, Alberta, Canada: A place with free universal healthcare, and lots and lots of oil.

Re: Where's my runtime going?

Post by OldGnashburg » Sat May 09, 2015 11:48 pm

What does this mean for you and CEN64?
Gnash, Gnash, Gnash...

ShadowFX
Posts: 86
Joined: Sat Oct 05, 2013 2:08 am
Location: The Netherlands

Re: Where's my runtime going?

Post by ShadowFX » Sun May 10, 2015 3:10 am

For starters, it meant a new build :)
"Change is inevitable; progress is optional"

OS: Windows 10 Pro x64
Specs: Intel Core i7-7700K @ 4.2GHz, 16GB DDR4-RAM, NVIDIA GeForce GTX 1080 Ti
Main build: AVX (official)

Narann
Posts: 154
Joined: Mon Jun 16, 2014 4:25 pm

Re: Where's my runtime going?

Post by Narann » Wed May 13, 2015 3:04 pm

I'm not that surprised, actually. The RDP (angrylion) code is wonderfully nice and is almost RDP documentation by itself. However, IIRC, many operations could easily be vectorized. The whole thing could even be multithreaded (like a tile renderer) at the cost of losing atomic accuracy (though I'm not even sure there is any documentation on the RDP's rendering pattern, actually). IIRC, there are no locking operations during RDP processing; everything operates on one pixel at a time. This would mean you could basically reach a 4x or 8x speedup.

But this is only from my perspective. There may be some dark corner of the RDP I don't know about.
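
As a rough sanity check on the 4x or 8x figure, Amdahl's law gives the whole-program speedup S when only a fraction p of the runtime is accelerated by a factor s. The sketch below takes p from the RDP shares in the first post (about 0.539 for Pokémon Snap, about 0.217 for Super Mario 64) and assumes, loosely, that the RDP portion scales perfectly while everything else stays unchanged:

```latex
S(p, s) = \frac{1}{(1 - p) + \dfrac{p}{s}}

% Pokemon Snap, p \approx 0.539
S(0.539, 4) \approx 1.68, \quad S(0.539, 8) \approx 1.89, \quad \lim_{s \to \infty} S \approx 2.17

% Super Mario 64, p \approx 0.217
S(0.217, 4) \approx 1.19, \quad S(0.217, 8) \approx 1.23, \quad \lim_{s \to \infty} S \approx 1.28
```

Under these assumptions, a 4x to 8x faster RDP would translate into a smaller overall gain, largest in the RDP-heavy games.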
