cpHastySpace parallelization, data race?
Posted: Mon Jun 27, 2016 9:58 am
Chipmunk habitat:
I am interested in parallelizing the chipmunk impulse solver by using cpHastySpace in my code and wanted to clarify its use and a potential data race condition.
1) The cpHastySpace implementation divides the solver iterations among threads, hence the comment in the code that roughly 50 or more iterations is the minimum at which a cpHastySpace pays off. Could you briefly outline the scenarios in which one would want more than 50 iterations? I am using a space with >5k bodies, joined in pairs by 4 constraints (two bodies comprise a "cell" that should act as one rigid body via the 4 constraints between them). For example, could one increase the iteration count rather than decrease the dt time step to squelch overlap?
2) Looking at the work function for a thread in cpHastySpace.c, it seems that more than one thread can access the same body->v field via the cpArbiterApplyImpulse() -> apply_impulse() call sequence, where the latter performs a non-atomic read-modify-write on body->v. Is this an "innocent" race condition if the worst case is merely missing an impulse added by another thread? The race is bound to happen, but perhaps not frequently, so you would only lose an iteration or two on a few bodies each time step? (I can see this being a good compromise versus an expensive mutex.)
3) I have not yet profiled the impulse solver within cpSpaceStep(). For a large number of bodies (>5k, hence >10k constraints), all of which are crowded together (i.e., very few cells not touching another cell, hence a large number of arbiters), would you estimate that most of the time is spent in the solver iterations, even at the default 10 iterations?
Thanks for any info!
--winkle