I wasn't focusing so much on that as on the fact that the profiler seems to indicate the overhead is coming from Chipmunk stuff running outside of the callback itself (that is, not called by it).
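A minimal sketch of how that split can be checked, assuming the profiler is Python's built-in cProfile (the post doesn't say which one was used). Here `callback` and `fake_step` are hypothetical stand-ins for a collision callback and for one step of the space; they are not Chipmunk functions.

```python
import cProfile
import io
import pstats


def callback():
    # Hypothetical stand-in for a Python collision callback.
    return sum(i * i for i in range(1000))


def fake_step():
    # Hypothetical stand-in for one step of the space, which calls
    # back into Python several times per step.
    for _ in range(50):
        callback()


profiler = cProfile.Profile()
profiler.enable()
for _ in range(20):
    fake_step()
profiler.disable()

# Collect the report in a string so it can be inspected or printed.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats()
# Show which functions called the callback.
stats.print_callers("callback")
report = stream.getvalue()
print(report)
```

In the report, `tottime` is time spent in a function's own body and `cumtime` also includes everything it calls, so a large gap between the two for the stepping function, beyond what the callback accounts for, points at work done outside the callback.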
Edit: related, though not the reason I made the topic: after reading your post, I've just realised that part of the code in the pre-solve callback would work just as well in a begin callback. I had considered moving it before and, for some reason, decided to leave it alone. (I'm sure I'll find out why at a later date, after a few hours of debugging...) The rest only does its work on every fourth step of the space ATM, so it's not too bad.
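The arrangement described above (one-off work in a begin callback, the rest throttled to every fourth step) can be sketched without any physics library. Everything here is hypothetical: `ThrottledHandler`, its method names and the counters are made up for illustration, and the real callbacks would take whatever arguments the binding passes in.

```python
class ThrottledHandler:
    """Hypothetical helper: one-off work in begin, heavy work every Nth step."""

    def __init__(self, every=4):
        self.every = every
        self.step_count = 0
        self.begin_calls = 0
        self.heavy_calls = 0

    def on_step(self):
        # Call once per step of the space, before the callbacks fire.
        self.step_count += 1

    def begin(self, *args):
        # Runs once when two shapes first touch: the place for setup
        # that does not need repeating on every step.
        self.begin_calls += 1
        return True

    def pre_solve(self, *args):
        # Runs on every step while the shapes stay in contact, but the
        # expensive part only happens on every `every`-th step.
        if self.step_count % self.every == 0:
            self.heavy_calls += 1
        return True


# Simulate one contact that lasts for 12 steps.
handler = ThrottledHandler(every=4)
handler.begin()
for _ in range(12):
    handler.on_step()
    handler.pre_solve()

print(handler.begin_calls, handler.heavy_calls)  # 1 3
```

The callback still gets invoked on every step, so this only trims the work done inside it; the cost of the call itself stays.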
While it would be nice to rewrite some of the intensive stuff in C, I think it's all quite spread out over a few functions, and a lot of the stuff taking the longest interacts with instances of Python classes I've written. I haven't done bindings between languages before, but something tells me that might make it tricky. I'll look into it, though; performance hasn't been a priority up to now, but it makes sense to look into things before I end up at the point where it'll take a lot of refactoring to get good performance.