Question regarding SSE2, SSE4.1, AVX, etc

Nacho
Posts: 66
Joined: Thu Nov 07, 2013 9:25 am

Question regarding SSE2, SSE4.1, AVX, etc

Post by Nacho » Sat Jan 03, 2015 2:35 pm

So.... As far as I know, CEN64 implements different vectorization methods depending on the different capabilities of the CPU. Right?

Then, the same opcode is written twice for, let's say, SSE2 and AVX?

When testing, should we try with different compiler options, in order to chase bugs?
Testing CEN64 on: Intel Core i5 520M 2.4 GHz. SSE2 SSE3 SSE4.1 SSE4.2 SSSE3, but no AVX. Ubuntu Linux

AIO
Posts: 51
Joined: Wed Nov 05, 2014 4:56 pm

Re: Question regarding SSE2, SSE4.1, AVX, etc

Post by AIO » Sat Jan 03, 2015 3:56 pm

Nacho wrote: So.... As far as I know, CEN64 implements different vectorization methods depending on the different capabilities of the CPU. Right?

Then, the same opcode is written twice for, let's say, SSE2 and AVX?

When testing, should we try with different compiler options, in order to chase bugs?
It's highly unlikely that a bug would be present in only one version. For convenience, I'd say stick with the best option for your own hardware. The code is thoroughly tested beforehand, so there's no need for you to test different versions just to chase bugs.

However, perhaps there are other options you can test.

MarathonMan
Site Admin
Posts: 692
Joined: Fri Oct 04, 2013 4:49 pm

Re: Question regarding SSE2, SSE4.1, AVX, etc

Post by MarathonMan » Sat Jan 03, 2015 4:13 pm

Nacho wrote: Then, the same opcode is written twice for, let's say, SSE2 and AVX?
SSE instructions have two specifiable operands (pblendvb and some others have three operands, but the mask is implicitly %xmm0, etc.). Anyway, in general, SSE operations take the form

Code:

(a) op (b) -> (a)
That is, one of the source registers is overwritten with the result. This makes scheduling a bit harder, and sometimes requires inserting dummy copy instructions to avoid destroying values. Suppose we need to preserve the values of a and b, but also need to perform an operation on them:

Code:

(a) mov -> (c)
(c) op (b) -> (c)
AVX instructions solve the dummy copy problem by taking three operands, instead of two:

Code:

(a) op (b) -> (c)
Usually, the AVX builds just elide the copies that are mandatory in SSE2; the algorithms are otherwise the same.

The differences between the remaining SSE variants? SSE4.1 builds and up use the pblendvb instruction heavily to perform a vectorized `if (mask) select scalar a, else select scalar b`. SSSE3 builds and up use the pshufb instruction all over the code to do everything from shuffling scalars to performing rotates/shifts by variable amounts.
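To make the vectorized select a bit more concrete, here is a minimal sketch in C intrinsics. It is not taken from CEN64; the helper name `select_bytes` is hypothetical, and the and/andnot/or sequence used on the SSE2 path is only an assumption about how a build without pblendvb might do the same job. The mask bytes are assumed to be 0x00 or 0xFF, as a vector compare would produce.

```c
#include <stdint.h>

#if defined(__SSE4_1__)
#include <smmintrin.h>   /* SSE4.1: _mm_blendv_epi8 (pblendvb) */
#elif defined(__SSE2__)
#include <emmintrin.h>   /* SSE2: pand / pandn / por */
#endif

/* Hypothetical helper, not CEN64 code:
 * out[i] = (mask[i] is 0xFF) ? a[i] : b[i], for 16 byte lanes.
 * Mask bytes are assumed to be all-zeros or all-ones. */
static void select_bytes(const uint8_t mask[16], const uint8_t a[16],
                         const uint8_t b[16], uint8_t out[16])
{
#if defined(__SSE4_1__) || defined(__SSE2__)
    __m128i vm = _mm_loadu_si128((const __m128i *)mask);
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
#if defined(__SSE4_1__)
    /* One instruction: take the byte from va where the mask byte's
     * high bit is set, otherwise take it from vb. */
    __m128i vr = _mm_blendv_epi8(vb, va, vm);
#else
    /* Three instructions: (mask & a) | (~mask & b). */
    __m128i vr = _mm_or_si128(_mm_and_si128(vm, va),
                              _mm_andnot_si128(vm, vb));
#endif
    _mm_storeu_si128((__m128i *)out, vr);
#else
    /* Portable fallback so the sketch also builds on non-x86 targets. */
    for (int i = 0; i < 16; i++)
        out[i] = (mask[i] & 0x80) ? a[i] : b[i];
#endif
}
```

Which path gets compiled depends on the compiler flags (for example `-msse4.1` with GCC or Clang), which is roughly the sense in which the same operation ends up written more than once for different instruction sets. The copy elision in AVX builds does not show up at this level, since with intrinsics the compiler does the register allocation and decides whether a move is needed.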

Nacho
Posts: 66
Joined: Thu Nov 07, 2013 9:25 am

Re: Question regarding SSE2, SSE4.1, AVX, etc

Post by Nacho » Sat Jan 03, 2015 6:00 pm

Great explanation! :D I was curious about how those fine pitch optimizations and tricks were implemented.
