Possible causes for this error?

Official forum for the Chipmunk2D Physics Library.
snichols
Posts: 53
Joined: Mon Nov 12, 2012 9:20 pm
Contact:

Re: Possible causes for this error?

Post by snichols »

Yeah, so, here's the latest. If you haven't guessed, reworking my explosion handling improves things, but doesn't fix it completely. :shock:

Each of my cpBody instances is wrapped with a C++ / Lua object. I'm using a simple object pooling system to recycle the C++ / Lua instances as needed. When a cpBody is removed from the space, I add my C++ / Lua instance to a free list for later reuse.

It seems that this recycling is causing some odd behavior. Specifically, arbiters retain pointers to removed / freed cpBody instances. I'm unsure why this is, but I'm digging through the code looking for a reason. I don't see any clear place where arbiters are cleaned up when a body is removed from the space. Collision persistence seems to allow arbiters to persist and, as such, potentially stale pointers might also persist.

I'll keep digging. :)

steve
snichols
Posts: 53
Joined: Mon Nov 12, 2012 9:20 pm
Contact:

Re: Possible causes for this error?

Post by snichols »

Even more info...

1. Remove a body from the space and free it.

2. It's possible for other bodies in the space (likely sleeping) to have arbiters still pointing to the removed / freed body.

This code was used to detect this problem after removing / freeing a body:

Code: Select all

void cpSpaceDoesBodyHaveArbiters( cpSpace *space, cpBody *body )
{
	if(!space || !body)
		return;

	for(int i=0; i<space->bodies->num; i++)
	{
		cpBody *TheBody = (cpBody*)space->bodies->arr[i];
		CP_BODY_FOREACH_ARBITER(TheBody, arb) cpAssertHard(arb->body_a != body && arb->body_b != body, "Body removed with active arbiters in the space!");
	}
}
It seems that removing a body ought to remove it from all of the arbiters in the space that are related to it. Any advice for a fix?

steve
snichols
Posts: 53
Joined: Mon Nov 12, 2012 9:20 pm
Contact:

Re: Possible causes for this error?

Post by snichols »

A bit more info... turning off the sleeping algorithm makes this problem seemingly go away. Similarly, never freeing cpBody instances with the sleeping algorithm turned on makes the problem go away.

steve
User avatar
slembcke
Site Admin
Posts: 4166
Joined: Tue Aug 14, 2007 7:13 pm
Contact:

Re: Possible causes for this error?

Post by slembcke »

Oh. So are you forgetting to remove the shapes attached to your body before freeing it? That would be a big problem. You'd have dangling pointers for the body still set on the shapes. That *might* be able explain what is going on here. The arbiters are removed when the shapes are.

I really want to make an assertion for freeing a body with shapes still attached to it, but unfortunately it's a pretty common practice to free all the memory for a space en-masse. I think it would end up being more of an annoyance than a help.

Here's something to try. Try zeroing out the memory when it's freed. That would probably help track down any dangling pointers a bit faster.
Can't sleep... Chipmunks will eat me...
Check out our latest projects! -> http://howlingmoonsoftware.com/wordpress/
snichols
Posts: 53
Joined: Mon Nov 12, 2012 9:20 pm
Contact:

Re: Possible causes for this error?

Post by snichols »

I'm almost 100% certain I'm freeing and removing all shapes before removing the body. I also added these asserts in cpBodyFree:

Code: Select all

cpAssertSoft(body->arbiterList == NULL, "Body freed with arbiters!");
cpAssertSoft(body->constraintList == NULL, "Body freed with constraints!");
cpAssertSoft(body->shapeList == NULL, "Body freed with shapes!");
cpAssertSoft(body->space == NULL, "Body freed with space!");
Wouldn't a body with shapes still have the shapeList pointer assigned?

steve
User avatar
slembcke
Site Admin
Posts: 4166
Joined: Tue Aug 14, 2007 7:13 pm
Contact:

Re: Possible causes for this error?

Post by slembcke »

Yeah, those variables hold the beginning of the linked lists. Hmm. I really have no idea what is going on here then. I'm having a hard time coming up with some weird edge cases or incorrect API usages for test cases that could possible cause that bug.
Can't sleep... Chipmunks will eat me...
Check out our latest projects! -> http://howlingmoonsoftware.com/wordpress/
snichols
Posts: 53
Joined: Mon Nov 12, 2012 9:20 pm
Contact:

Re: Possible causes for this error?

Post by snichols »

I'll see what I can do to get a repro case outside of my game. Will give more info when I have it.

steve
snichols
Posts: 53
Joined: Mon Nov 12, 2012 9:20 pm
Contact:

Re: Possible causes for this error?

Post by snichols »

So, I've been digging through the Chipmunk code and it's still not clear to me when arbiters are cleaned up. I'm definitely seeing arbiters with stale pointers persisting in the system. They don't persist long, which makes a solid repro case hard to pin down. I'm 99% sure I'm cleaning things up properly.

It looks like arbiters are detached from their body in the function cpArbiterUnthread.

cpArbiterUnthread is called on all arbiters that have two non-sleeping bodies attached at the beginning of cpSpaceStep. When I turn off the sleeping algorithm the problem stops... so I'll wager this cpArbiterUnthread is solving the problem in that case.

cpArbiterUnthread also called in cachedArbiterFilters if the arbiter to be filtered matches the context bodies and shapes. cachedArbiterFilters is called exclusively by cpSpaceFilterArbiters. cpSpaceFilterArbiters is called when removing shapes.

Assuming my understanding is correct, there are only a few options here:

1. The shape's body has been cleared before removing it from the space (not likely because cpSpaceRemoveSpace would crash with a NULL dereference).

2. cpSpaceFilterArbiters isn't touching all arbiters, thus the filter isn't executing on the arbiters correctly.

3. cachedArbitersFilter is missing some edge case that allows an arbiter to remain in the graph.

4. cpArbiterUnthread isn't removing the arbiter from the graph properly in all cases.

I'll keep digging in on my repro cases. If you could let me know if my assumptions / understanding is flawed, that'd be helpful.

steve
snichols
Posts: 53
Joined: Mon Nov 12, 2012 9:20 pm
Contact:

Re: Possible causes for this error?

Post by snichols »

Some more details here. I've been instrumenting Chipmunk with all kinds of handy logging / validation code. Here's what I've found.

It turns out that cpSpaceUncacheArbiter doesn't properly cleanup the pointer structure created by flood filling components. When removing shapes of a body that has an arbiter that's been uncached, those arbiters aren't unthreaded. So, we end up with a dangling pointer to a freed body which causes all manner of possible issues.

Consider this case in my logging: (cpArbiter *)0xadad5e0 has a body_a pointer to (cpBody *)0xaccbc80 which has been freed. The relevant sections of logging that apply to this case are below.

Code: Select all

Sat Dec 08 17:34:52 2012:4000:cpSpaceDeactivateBody: 0x0ACCBC80
Sat Dec 08 17:34:52 2012:4000:	cpSpaceUncacheArbiter: 0x0ADAD5E0
Sat Dec 08 17:34:52 2012:4000:	cpSpaceUncacheArbiter: 0x0ADADED0
Sat Dec 08 17:34:52 2012:4000:cpSpaceDeactivateBody done
Here we see the arbiter being uncached. Soon after the game removes the body / shapes for (cpBody *)0xACCBC80. Consider this log of the cpSpaceFilterArbiters behavior in that case:

Code: Select all

Sat Dec 08 17:34:53 2012:4000:cpSpaceFilterArbiters: body=0x0ACCBC80, shape=0x0ACCBD60
Sat Dec 08 17:34:53 2012:4000:	testing arbiter 0x0ADAD370 (body_a=0x0AD20A70, shape_a=0x0AE81B00, body_b=0x0AD17250, shape_b=0x0AD17330)
Sat Dec 08 17:34:53 2012:4000:	testing arbiter 0x0ADAD850 (body_a=0x0AD26BF8, shape_a=0x0AE67130, body_b=0x0AE81050, shape_b=0x0AE81130)
Sat Dec 08 17:34:53 2012:4000:	testing arbiter 0x0ADAC0C0 (body_a=0x0AD19798, shape_a=0x0AD19878, body_b=0x0AD17BE0, shape_b=0x0AD17CC0)
Sat Dec 08 17:34:53 2012:4000:	testing arbiter 0x0ADACFC8 (body_a=0x0AD18D38, shape_a=0x0AD18E18, body_b=0x0AD20E20, shape_b=0x0AD20F00)
Sat Dec 08 17:34:53 2012:4000:	testing arbiter 0x0ADAD510 (body_a=0x0AD19798, shape_a=0x0AD19878, body_b=0x0AD17758, shape_b=0x0AD17838)
Sat Dec 08 17:34:53 2012:4000:	testing arbiter 0x0ADAD6B0 (body_a=0x0AE81050, shape_a=0x0AE81130, body_b=0x0AE6D480, shape_b=0x0AE6D728)
Sat Dec 08 17:34:53 2012:4000:	testing arbiter 0x0ADAD030 (body_a=0x0AE88370, shape_a=0x0AE713A0, body_b=0x0AE88B20, shape_b=0x0AE88C00)
Sat Dec 08 17:34:53 2012:4000:	testing arbiter 0x0ADAD8B8 (body_a=0x0AE832D0, shape_a=0x0AE6D2A0, body_b=0x0AE730B8, shape_b=0x0AE73198)
Sat Dec 08 17:34:53 2012:4000:	testing arbiter 0x0ADAE070 (body_a=0x0AD17488, shape_a=0x0AD188B0, body_b=0x0AD167F0, shape_b=0x0AD168D0)
Sat Dec 08 17:34:53 2012:4000:	testing arbiter 0x0ADAC670 (body_a=0x0AD17758, shape_a=0x0AD17838, body_b=0x0AD21150, shape_b=0x0AD21230)
Sat Dec 08 17:34:53 2012:4000:	testing arbiter 0x0ADADD30 (body_a=0x0AD20E20, shape_a=0x0AD20F00, body_b=0x0AD19288, shape_b=0x0AD19368)
Sat Dec 08 17:34:53 2012:4000:	testing arbiter 0x0ADAE0D8 (body_a=0x0AE88DE0, shape_a=0x0AE88EC0, body_b=0x0AE88860, shape_b=0x0AE88940)
Sat Dec 08 17:34:53 2012:4000:	testing arbiter 0x0ADAD308 (body_a=0x0AD20A70, shape_a=0x0AE81B00, body_b=0x0AD18D38, shape_b=0x0AD18E18)
Sat Dec 08 17:34:53 2012:4000:	testing arbiter 0x0ADACF60 (body_a=0x0AD29928, shape_a=0x0ACCFAA0, body_b=0x0AE82C10, shape_b=0x0AE82CF0)
Sat Dec 08 17:34:53 2012:4000:	testing arbiter 0x0ADAD988 (body_a=0x0AD17BE0, shape_a=0x0AD17CC0, body_b=0x0AD167F0, shape_b=0x0AD168D0)
Sat Dec 08 17:34:53 2012:4000:	testing arbiter 0x0ADAC1F8 (body_a=0x0AD26BF8, shape_a=0x0AE67130, body_b=0x0AE82C10, shape_b=0x0AE82CF0)
Sat Dec 08 17:34:53 2012:4000:	testing arbiter 0x0ADAD920 (body_a=0x0AE7F488, shape_a=0x0AE7F568, body_b=0x0AE62940, shape_b=0x0AE62A20)
Sat Dec 08 17:34:53 2012:4000:	testing arbiter 0x0ADACB50 (body_a=0x0ACCDB30, shape_a=0x0ACCDC48, body_b=0x0AE62D80, shape_b=0x0AE63360)
Sat Dec 08 17:34:53 2012:4000:	testing arbiter 0x0ADAD440 (body_a=0x0AE81050, shape_a=0x0AE81130, body_b=0x0AE82C10, shape_b=0x0AE82CF0)
Sat Dec 08 17:34:53 2012:4000:	testing arbiter 0x0ADACA80 (body_a=0x0AD21150, shape_a=0x0AD21230, body_b=0x0AD17488, shape_b=0x0AD188B0)
Sat Dec 08 17:34:53 2012:4000:	testing arbiter 0x0ADAD1D0 (body_a=0x0AD17250, shape_a=0x0AD17330, body_b=0x0AD19798, shape_b=0x0AD19878)
Sat Dec 08 17:34:53 2012:4000:	testing arbiter 0x0ADAD2A0 (body_a=0x0AE88370, shape_a=0x0AE713A0, body_b=0x0AE88860, shape_b=0x0AE88940)
Sat Dec 08 17:34:53 2012:4000:	testing arbiter 0x0ADADA58 (body_a=0x0ACCDB30, shape_a=0x0ACCDC48, body_b=0x0AE7F488, shape_b=0x0AE7F568)
Sat Dec 08 17:34:53 2012:4000:	testing arbiter 0x0ADAD3D8 (body_a=0x0AE832D0, shape_a=0x0AE6D2A0, body_b=0x0AE82C10, shape_b=0x0AE82CF0)
Sat Dec 08 17:34:53 2012:4000:	testing arbiter 0x0ADAC330 (body_a=0x0AD26BF8, shape_a=0x0AE67130, body_b=0x0AE730B8, shape_b=0x0AE73198)
Sat Dec 08 17:34:53 2012:4000:done
Notice that the arbiter 0x0adad5e0 isn't found during the filter. Which, of course, makes sense because it was just removed by the uncache call. This leaves the arbiter in a bad state with stale pointers to a soon-to-be-freed body.

It seems to me that anytime an arbiter is getting destroyed that it should clean up its threading information so that this dangling pointer issue can be avoided. It's a rather subtle timing issue though that may not show up in many cases. Our game has a bunch of objects and recycles them like crazy.

I'm happy to add more debug logging to further track this down. I'll look for a fix on my own, but any feedback from you would be appreciated. Please find a ZIP file of the log file attached.

steve
Attachments
arbiter-bug.zip
(35.06 KiB) Downloaded 394 times
snichols
Posts: 53
Joined: Mon Nov 12, 2012 9:20 pm
Contact:

Re: Possible causes for this error?

Post by snichols »

To be crystal clear, the dangling pointer is in the arbiter and the arbiter is on the body's linked list but not in the space's arbiter hash set or arbiter list.

steve
Post Reply

Who is online

Users browsing this forum: No registered users and 10 guests