Monday, 19 August 2013

Personal SSE library

Ok, so I've been using operator overloading with some of the SSE/AVX
intrinsics to facilitate their usage in more trivial situations where
vector processing is useful. The class definition looks something like
this:
#define Float16a float __attribute__((__aligned__(16)))
class sse
{
    private:
        __m128 vec __attribute__((__aligned__(16)));
        Float16a *temp;
    public:
        //=================================================================
        sse();
        sse(float *value);
        //=================================================================
        void operator + (float *param);
        void operator - (float *param);
        void operator * (float *param);
        void operator / (float *param);
        void operator % (float *param);
        void operator ^ (int number);
        void operator = (float *param);
        void operator == (float *param);
        void operator += (float *param);
        void operator -= (float *param);
        void operator *= (float *param);
        void operator /= (float *param);
};
With each individual function bearing a resemblance to:
void sse::operator + (float *param)
{
    vec = _mm_add_ps(vec, _mm_load_ps(param));
    _mm_store_ps(temp, vec);
}
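For comparison, here is a minimal sketch of a different shape for such a wrapper. It is not the class above: the operators return a new value instead of storing through temp on every call, which may let the compiler keep intermediate results in registers. The name Vec4 and its store method are hypothetical, and the unaligned load/store intrinsics are an assumption made only so the example accepts any float array.

```cpp
#include <xmmintrin.h>  // SSE intrinsics: __m128, _mm_add_ps, ...

// Hypothetical wrapper (Vec4 is not the class from the post): it holds only
// the register value and returns results by value, so no store happens per
// operator call.
struct Vec4 {
    __m128 v;

    Vec4() : v(_mm_setzero_ps()) {}
    explicit Vec4(__m128 x) : v(x) {}
    // Unaligned load, so any array of four floats is acceptable here.
    explicit Vec4(const float* p) : v(_mm_loadu_ps(p)) {}

    // Write the four lanes out once, when the result is actually needed.
    void store(float* out) const { _mm_storeu_ps(out, v); }
};

inline Vec4 operator+(Vec4 a, Vec4 b) { return Vec4(_mm_add_ps(a.v, b.v)); }
inline Vec4 operator-(Vec4 a, Vec4 b) { return Vec4(_mm_sub_ps(a.v, b.v)); }
inline Vec4 operator*(Vec4 a, Vec4 b) { return Vec4(_mm_mul_ps(a.v, b.v)); }
inline Vec4 operator/(Vec4 a, Vec4 b) { return Vec4(_mm_div_ps(a.v, b.v)); }

// Compound assignment updates the left operand in place and returns it.
inline Vec4& operator+=(Vec4& a, Vec4 b) {
    a.v = _mm_add_ps(a.v, b.v);
    return a;
}
```

With this shape an expression such as (Vec4(a) + Vec4(b)) * Vec4(c) touches memory only for the three loads and for one final store(out). Whether that accounts for the slowdown described here is something only measurement can confirm.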
Thus far I have had few problems writing the code, but I have run into some
performance problems: compared with fairly trivial scalar code, the SSE/AVX
code takes a significant performance hit. I know that this type of code can
be difficult to profile, but I'm not really even sure what exactly the
bottleneck is. If there are any pointers that can be thrown at me, it would
be appreciated.
Note that this is just a personal project that I'm writing to further my own
knowledge of SSE/AVX, so replacing this with an external library would not
be much of a help.
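As a starting point for narrowing down where the time goes, here is a minimal timing sketch that uses only the standard library and the SSE header. The function names scalar_add, sse_add and time_ms are hypothetical, and element-wise addition is assumed as the workload because it matches the operator + shown above. This is a rough wall-clock comparison, not a substitute for a real profiler.

```cpp
#include <chrono>
#include <cstddef>
#include <xmmintrin.h>

// Scalar baseline: out[i] = a[i] + b[i].
void scalar_add(const float* a, const float* b, float* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) out[i] = a[i] + b[i];
}

// SSE version: four floats per iteration, with a scalar loop for the tail.
void sse_add(const float* a, const float* b, float* out, std::size_t n) {
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);  // unaligned loads: no alignment assumed
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
    for (; i < n; ++i) out[i] = a[i] + b[i];
}

// Average wall-clock milliseconds per call of fn over reps repetitions.
template <typename Fn>
double time_ms(Fn fn, int reps) {
    auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < reps; ++r) fn();
    auto stop = std::chrono::steady_clock::now();
    std::chrono::duration<double, std::milli> elapsed = stop - start;
    return elapsed.count() / reps;
}
```

Calling time_ms with a lambda around each function, on arrays large enough to matter, gives two numbers to compare. Two caveats apply. An optimizing compiler may auto-vectorize the scalar loop, so the "scalar" baseline can already be SIMD code. The compiler may also drop work whose result is never read, so the output array should be used after timing.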
