SIMD

Posted: Sun Oct 18, 2009 2:46 pm
by bram
Hi there,

Has anyone investigated the viability of adding SIMD computation to Chipmunk?
Frequently, 4-way SIMD is used to process XYZW in one go, which of course does not really apply to a 2D system.

However, a better use of 4-way SIMD is the SoA (structure of arrays) approach.
In a 2D particle system, for example, you would store the x-coords of all the particles in one array, and do the same for the y-coords, x-velocities, y-velocities, etc.
A single SIMD instruction will then process 4 particles at once.
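
To make that concrete, here is a rough sketch of a position update over an SoA particle set. It is plain portable C rather than real SIMD intrinsics: the inner loop over 4 lanes stands in for one 4-wide vector instruction, and a compiler may or may not auto-vectorize it. The names particles_soa, integrate_soa and NUM_PARTICLES are made up for illustration and are not part of Chipmunk.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_PARTICLES 1024 /* hypothetical size, kept a multiple of 4 */

/* SoA layout: one contiguous array per field (hypothetical struct). */
struct particles_soa {
    float x[NUM_PARTICLES];
    float y[NUM_PARTICLES];
    float vel_x[NUM_PARTICLES];
    float vel_y[NUM_PARTICLES];
};

/* Advance positions by dt, handling 4 particles per outer iteration.
 * The 4-lane inner loop is what a single 4-wide SIMD multiply-add
 * would do on real vector hardware. */
static void integrate_soa(struct particles_soa *p, size_t count, float dt)
{
    assert(count % 4 == 0); /* assumption: no leftover particles */
    for (size_t i = 0; i < count; i += 4) {
        for (size_t lane = 0; lane < 4; ++lane) {
            p->x[i + lane] += p->vel_x[i + lane] * dt;
            p->y[i + lane] += p->vel_y[i + lane] * dt;
        }
    }
}
```

Because each field is contiguous, the 4 values for one lane group sit next to each other in memory, which is what lets a vector load pick them up in one go.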

To apply this to Chipmunk:
When I profile my app with Shark for iPhone, I see that I spend 50% of the time in cpArbiterApplyImpulse.
I expect numContacts is typically one or two here, but if it were frequently 4 or more, a good tactic would be to process 4 contacts in one go.

Bram

PS: A good intro to SoA versus AoS is in IBM's CBE programming tutorial:
https://www-01.ibm.com/chips/techlib/te ... A80061F788

In short, it comes down to this:

struct particle { float x, y, z, vel_x, vel_y, vel_z; };
struct particle particles[1024];

This array of structures is typically much slower to process with SIMD than:

float x[1024];
float y[1024];
float z[1024];
float vel_x[1024];
float vel_y[1024];
float vel_z[1024];

because in the latter, you can always process 4 particles simultaneously using 4-way SIMD.
E.g., testing whether particles are below the ground plane z=0 can be done in parallel with a vector compare.
A SIMD xyzw notation will not help you in that case, but SoA does.
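
As a sketch of that ground-plane test, again in portable C rather than actual intrinsics: each lane gets an all-ones or all-zero mask, which mimics what a 4-wide vector compare returns. The function name count_below_ground and the mask array are assumptions for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Test z < 0 for 4 particles per outer iteration.
 * mask[i] becomes 0xFFFFFFFF if particle i is below the ground plane,
 * 0 otherwise, like the per-lane result of a SIMD compare.
 * Returns how many particles are below the plane. */
static size_t count_below_ground(const float *z, uint32_t *mask, size_t count)
{
    size_t below = 0;
    assert(count % 4 == 0); /* assumption: no leftover particles */
    for (size_t i = 0; i < count; i += 4) {
        for (size_t lane = 0; lane < 4; ++lane) {
            mask[i + lane] = (z[i + lane] < 0.0f) ? 0xFFFFFFFFu : 0u;
            below += mask[i + lane] & 1u;
        }
    }
    return below;
}
```

With the AoS layout the four z values would be 24 bytes apart, so they would have to be gathered into one vector register before the compare could run.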

Bram

Re: SIMD

Posted: Thu Oct 22, 2009 3:31 pm
by ShiftZ
I suspect nobody will do that better than you ;)