Possible causes for this error?

Official forum for the Chipmunk2D Physics Library.
slembcke
Site Admin
Posts: 4166
Joined: Tue Aug 14, 2007 7:13 pm

Re: Possible causes for this error?

Post by slembcke »

Oof. I'll have to take a look at this tomorrow or maybe Monday. Looks like there is a lot to digest.
Can't sleep... Chipmunks will eat me...
Check out our latest projects! -> http://howlingmoonsoftware.com/wordpress/
snichols
Posts: 53
Joined: Mon Nov 12, 2012 9:20 pm

Re: Possible causes for this error?

Post by snichols »

Any ideas on this yet? :)

steve

Re: Possible causes for this error?

Post by slembcke »

Looking into this now. We'll see if I can figure anything out. :-\

Re: Possible causes for this error?

Post by snichols »

Feel free to contact me directly if you need more info: snichols326 at gmail dot com

Re: Possible causes for this error?

Post by slembcke »

*Aurgh*

I cannot for the life of me figure out how that could happen without forgetting to remove the shapes for the body first. Though you said you threw in the assertion to check that the body's shapeList was NULL.
To be crystal clear: the dangling pointer is in the arbiter, and the arbiter is on the body's linked list but not in the space's arbiter hash set or arbiter list?
To be even more clear: it's on the cpBody->arbiterList but not in the cpSpace.cachedArbiters set or the cpSpace.arbiters array? Even after all the shapes have been removed for that body? The only way I can see that happening is if the bodies aren't getting woken up when their shapes are removed. I don't see how that could happen unless something else were wrong with the contact graph, and I don't see how that could be wrong without other terrible issues coming up constantly. :(

Do you have any idea which chunk of your code is triggering the bug? I mean, there haven't been very many issues with the sleeping code, and a lot of people have used it for a couple of years now. So it must take a pretty specific sequence of events to make the bug possible. It would be easier to speculate if I had something to work from instead of trying to work everything out backwards.

Re: Possible causes for this error?

Post by snichols »

Yeah, this is a messy problem to repro reliably. I can easily reproduce the problem, but I don't have a specific sequence of events that causes it, so I haven't managed to make it happen in a "clean room" setup yet.

Here's the sequence of events I'm following. I'm basically creating game objects and adding them to the space, like this:

1. Create a body.

2. Create shapes and store them in a local vector for later use. The shapes aren't added to the body yet, but their body pointer is assigned.

3. Wait for the object to be added to the simulation. When the game object is added:

   a. cpSpaceAddBody

   b. cpSpaceAddShape for each shape stored above.

4. Wait for the object to be removed from the simulation. When the object is removed:

   a. cpSpaceRemoveShape for each shape stored above.

   b. cpSpaceRemoveBody
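A minimal model of that add/remove ordering, for reference. Everything in it (Body, Shape, GameObject, space_add_shape and so on) is a hypothetical stand-in for Chipmunk's bookkeeping, not the Chipmunk API; it only encodes the invariant being relied on, namely that a body has no shapes attached when it is removed.

```c
#include <assert.h>

/* Hypothetical stand-ins for the space's bookkeeping; not Chipmunk types. */
typedef struct { int in_space; int shapes_attached; } Body;
typedef struct { Body *body; int in_space; } Shape;

#define MAX_SHAPES 4

typedef struct {
    Body body;
    Shape shapes[MAX_SHAPES];  /* created up front, added to the space later */
    int num_shapes;
} GameObject;

static void space_add_body(Body *b) { b->in_space = 1; }
static void space_add_shape(Shape *s) { s->in_space = 1; s->body->shapes_attached++; }
static void space_remove_shape(Shape *s) { s->in_space = 0; s->body->shapes_attached--; }

static void space_remove_body(Body *b) {
    /* The invariant the thread relies on: no shapes left on the body. */
    assert(b->shapes_attached == 0 && "remove the body's shapes first");
    b->in_space = 0;
}

/* Create the body and its shapes; shapes know their body but aren't in the space yet. */
static void object_create(GameObject *obj, int num_shapes) {
    assert(num_shapes <= MAX_SHAPES);
    obj->body = (Body){0, 0};
    obj->num_shapes = num_shapes;
    for (int i = 0; i < num_shapes; i++)
        obj->shapes[i] = (Shape){&obj->body, 0};
}

/* Added to the simulation: body first, then each stored shape. */
static void object_add(GameObject *obj) {
    space_add_body(&obj->body);
    for (int i = 0; i < obj->num_shapes; i++)
        space_add_shape(&obj->shapes[i]);
}

/* Removed from the simulation: each stored shape first, then the body. */
static void object_remove(GameObject *obj) {
    for (int i = 0; i < obj->num_shapes; i++)
        space_remove_shape(&obj->shapes[i]);
    space_remove_body(&obj->body);
}
```

Swapping the two steps in object_remove would trip the assert, which is the same check the poster added against the body's shape list.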

I did indeed add asserts to check that the body's shape list is NULL, and they never fire. I'm definitely removing shapes before the body.

I'm also at a loss as to how this could happen. My memory manager may be more aggressive at reusing pointers than some? Dunno. I'm sure it has to be something dumb somewhere. :) I just let the game run for a while and my validation code gets hit. Oh, let me explain my validation code:

I've added optional "magic numbers" to the beginning of cpBody and cpArbiter instances. When they're initialized, I assign the magic number; when they're freed or recycled, I clear it. I've also added body and arbiter validation functions. Finally, I added a cpSpaceValidate function that steps through all bodies and arbiters, calling their validation functions. The relevant code is:


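#include "chipmunk_private.h" // Chipmunk internals: cpArrayContains, cpHashSetFind, CP_HASH_PAIR

// CP_VALIDATE_MAGIC_NUMBER is a custom macro; its definition isn't shown here.
// cpBodyValidate is defined first so cpArbiterValidate can call it without a prototype.
void cpBodyValidate(cpBody *body)
{
	CP_VALIDATE_MAGIC_NUMBER(body, 0xdeadbeef);
}

void cpArbiterValidate(cpArbiter *arb)
{
	CP_VALIDATE_MAGIC_NUMBER(arb, 0xdeaddead);
	cpBodyValidate(arb->body_a);
	cpBodyValidate(arb->body_b);
}

// Validates every active arbiter and body, and checks that each arbiter on a
// body's list is still known to the space.
void cpSpaceValidate(cpSpace *space)
{
#ifdef CP_DETAILED_VALIDATION
	if(space == NULL)
		return;

	cpArray *bodies = space->bodies;
	cpArray *arbiters = space->arbiters;
	cpHashSet *cachedArbiters = space->cachedArbiters;

	for(int i=0; i<arbiters->num; i++){
		cpArbiter *arb = (cpArbiter *)arbiters->arr[i];
		cpArbiterValidate(arb);
	}

	for(int i=0; i<bodies->num; i++){
		cpBody *TheBody = (cpBody *)bodies->arr[i];
		cpBodyValidate(TheBody);
		CP_BODY_FOREACH_ARBITER(TheBody, arb){
			cpArbiterValidate(arb);

			// The cachedArbiters set's equality callback compares the lookup key
			// against a {shape a, shape b} pair, so pass a shape pair, not the arbiter.
			cpShape *shape_pair[] = {arb->a, arb->b};
			cpHashValue arbHashID = CP_HASH_PAIR((cpHashValue)arb->a, (cpHashValue)arb->b);
			cpAssertSoft(cpArrayContains(arbiters, arb) || cpHashSetFind(cachedArbiters, arbHashID, shape_pair) == arb,
				"Arbiter is on body but not in space?");
		}
	}
#endif
}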
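The definition of CP_VALIDATE_MAGIC_NUMBER isn't shown in the thread, so here is a minimal, self-contained sketch of the same magic-number idea. The names (Body, BODY_MAGIC, VALIDATE_MAGIC, body_recycle) are hypothetical, and this version returns a flag instead of asserting so it can be checked directly. It only makes sense while the memory stays readable (for example, objects recycled through a pool); reading a field after a real free() is undefined behavior.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical magic value; the post uses 0xdeadbeef for bodies. */
#define BODY_MAGIC UINT32_C(0xdeadbeef)

typedef struct {
    uint32_t magic;   /* first field, set on init and cleared on recycle */
    double mass;
} Body;

/* Hypothetical stand-in for CP_VALIDATE_MAGIC_NUMBER: nonzero if obj looks valid. */
#define VALIDATE_MAGIC(obj, value) ((obj) != NULL && (obj)->magic == (value))

static void body_init(Body *body, double mass) {
    body->magic = BODY_MAGIC;
    body->mass = mass;
}

/* Called when the body goes back to a pool: the memory stays readable but no longer validates. */
static void body_recycle(Body *body) {
    memset(body, 0, sizeof *body);
}
```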

Based on your comments, I added a check to cpSpaceValidate that every arbiter on a body's list is in either the cachedArbiters hash set or the arbiters array. That assert failed when running the following code:
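As far as I can tell from the Chipmunk 6 sources, cachedArbiters is keyed by the arbiter's two shapes, and its equality callback compares a {shape a, shape b} key against the stored arbiter in either order. The toy below (hypothetical names: HASH_PAIR, ArbiterSet, set_find) sketches that kind of unordered-pair set; it is not Chipmunk's cpHashSet. It shows why a lookup has to hash and compare the shape pair: the hash is symmetric, and the equality check accepts the pair in either order.

```c
#include <stddef.h>
#include <stdint.h>

typedef uintptr_t HashValue;

/* Symmetric pair hash with an arbitrary multiplier: XOR commutes, so (a, b) and (b, a) match. */
#define HASH_COEF ((HashValue)3344921057u)
#define HASH_PAIR(a, b) (((HashValue)(a) * HASH_COEF) ^ ((HashValue)(b) * HASH_COEF))

typedef struct { int id; } Shape;
typedef struct { Shape *a, *b; } Arbiter;

/* Equality between a shape-pair key and a stored arbiter, in either order. */
static int pair_matches(Shape *const pair[2], const Arbiter *arb) {
    return (pair[0] == arb->a && pair[1] == arb->b) ||
           (pair[0] == arb->b && pair[1] == arb->a);
}

#define NUM_BINS 16

typedef struct Bin { Arbiter *arb; struct Bin *next; } Bin;
typedef struct { Bin *bins[NUM_BINS]; } ArbiterSet;

/* The caller supplies the bin node so the sketch needs no allocation. */
static void set_insert(ArbiterSet *set, Bin *node, Arbiter *arb) {
    size_t i = (size_t)(HASH_PAIR(arb->a, arb->b) % NUM_BINS);
    node->arb = arb;
    node->next = set->bins[i];
    set->bins[i] = node;
}

/* Look up by shape pair: hash the pair, then compare the pair against each candidate. */
static Arbiter *set_find(const ArbiterSet *set, Shape *a, Shape *b) {
    Shape *const pair[2] = {a, b};
    for (Bin *n = set->bins[HASH_PAIR(a, b) % NUM_BINS]; n != NULL; n = n->next)
        if (pair_matches(pair, n->arb)) return n->arb;
    return NULL;
}
```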


void AnimatedPhysicsObject::OnRemoveFromPhysicsLayer()
{
	assert(_Layer != NULL);

	cpBody *TheBody = _Body;
	cpSpace *TheSpace = _Layer->GetSpace();

	cpSpaceValidate(TheSpace);

	for(ShapeList::iterator it=_Shapes.begin(); it!=_Shapes.end(); ++it)
	{
		cpShape *TheShape = *it;
		cpSpaceRemoveShape(TheSpace, TheShape);
	}

	cpSpaceValidate(TheSpace);

	PhysicsObject::OnRemoveFromPhysicsLayer();

	cpSpaceValidate(TheSpace);
}

Validation failed on the second call, right after the shapes were removed. My guess is that cpSpaceRemoveShape uncached the arbiter while it was still on the body. Either way, I'm definitely getting into this bad state where the body has the arbiter but, as far as the space is concerned, the arbiter doesn't exist.

I'll continue to instrument the code and validate things until I narrow it down more.

steve

Re: Possible causes for this error?

Post by snichols »

Here's an interesting question, IMO. I'm adding more logging to cpSpaceComponent, specifically to cpSpaceActivateBody, and I'm seeing the following code:


		CP_BODY_FOREACH_ARBITER(body, arb){
			cpBody *bodyA = arb->body_a;
			if(body == bodyA || cpBodyIsStatic(bodyA)){
				int numContacts = arb->numContacts;
				cpContact *contacts = arb->contacts;

I'm wondering why this code only processes arbiters where body == body_a (or body_a is static) instead of checking whether the body matches either of the arbiter's bodies. Just wondering. :)
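For context on that question: each arbiter sits on both of its bodies' arbiter lists at once, so a loop over every body's list reaches each arbiter twice. One plausible reading of the body_a check, offered as an assumption rather than a confirmed reason from the library author, is that it makes each arbiter get handled from exactly one side. The sketch uses hypothetical types in which one arbiter node carries a separate "next" link per body (Chipmunk's own list is, as far as I know, walked with cpArbiterNext and also keeps prev links); it counts visits with and without the body_a filter.

```c
#include <stddef.h>

typedef struct Body Body;
typedef struct Arbiter Arbiter;

/* One arbiter node carries a separate "next" link for each of its two bodies. */
struct Arbiter {
    Body *body_a, *body_b;
    Arbiter *next_a, *next_b;
};

struct Body { Arbiter *arbiter_list; };

/* Follow the link that belongs to this body's list. */
static Arbiter *arbiter_next(const Arbiter *arb, const Body *body) {
    return arb->body_a == body ? arb->next_a : arb->next_b;
}

/* Push onto the front of this body's list using the matching link. */
static void body_push_arbiter(Body *body, Arbiter *arb) {
    if (arb->body_a == body) arb->next_a = body->arbiter_list;
    else arb->next_b = body->arbiter_list;
    body->arbiter_list = arb;
}

#define FOREACH_ARBITER(body, var) \
    for (Arbiter *var = (body)->arbiter_list; var != NULL; var = arbiter_next(var, (body)))

/* Walks every body's list; with only_body_a set, each arbiter is counted from one side only. */
static int count_visits(Body *const *bodies, int n, int only_body_a) {
    int visits = 0;
    for (int i = 0; i < n; i++) {
        FOREACH_ARBITER(bodies[i], arb) {
            if (!only_body_a || arb->body_a == bodies[i]) visits++;
        }
    }
    return visits;
}
```

With two arbiters shared among three bodies, the unfiltered walk counts 4 visits and the filtered walk counts 2.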

steve

Re: Possible causes for this error?

Post by snichols »

Take a look at the attached log for another snapshot of this bug. Body 0x0AD51B20 has an arbiter 0x0AE0DFD0 that's not in the space. I skipped those asserts and let the crash play out. Hopefully the information on when shapes and bodies are added and removed will be more helpful.

Simply search the log for "0x0AD51FB0" and you'll see the lifetime of that body. Once it's freed, the app crashes.

steve
Attachments
arbiter-bug-4.zip
(15.05 KiB)

Re: Possible causes for this error?

Post by snichols »

Okay, here's something that might help shed light on this. It seems very much related to the value passed to cpSpaceSetSleepTimeThreshold. My default value is 1 second. When I reduce it to 0.1, the bug happens very quickly. When I increase it to, say, 600, the problem stops.

So it seems that something about how objects go to sleep under this threshold affects the issue.
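A rough model of why a smaller threshold would surface the bug faster: a body's idle time grows while its kinetic energy stays below an idle threshold and resets otherwise, and a component may sleep once every body in it has been idle for at least the sleep time threshold. The sketch is modeled loosely on that behavior with hypothetical names (update_idle_time, component_can_sleep, steps_until_sleep), not Chipmunk's code. At a fixed 1/60 s step, a resting body may sleep after about 6 steps with a 0.1 s threshold, about 60 steps with 1 s, and about 36,000 steps with 600 s, so a short threshold produces far more sleep/wake transitions for a bug in that path to hit.

```c
/* Idle time grows while kinetic energy stays at or below idle_energy, and resets otherwise. */
static double update_idle_time(double idle_time, double kinetic_energy,
                               double idle_energy, double dt) {
    return kinetic_energy > idle_energy ? 0.0 : idle_time + dt;
}

/* A component may sleep only when every body in it has been idle long enough. */
static int component_can_sleep(const double *idle_times, int n, double sleep_threshold) {
    for (int i = 0; i < n; i++)
        if (idle_times[i] < sleep_threshold) return 0;
    return 1;
}

/* Number of fixed steps before a motionless single-body component may sleep, or -1. */
static int steps_until_sleep(double dt, double sleep_threshold, int max_steps) {
    double idle_time = 0.0;
    for (int step = 1; step <= max_steps; step++) {
        idle_time = update_idle_time(idle_time, 0.0, 1e-6, dt);
        if (component_can_sleep(&idle_time, 1, sleep_threshold)) return step;
    }
    return -1;
}
```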
So, I dug into the way that area of the code works a bit more. I think I've found a nasty bug. I hope I get a cookie! :)

In cpSpaceComponent, there's a neat section of code that checks for bodies being idle too long. If a body is found to be idle, this code executes:


cpArrayPush(space->sleepingComponents, body);
CP_BODY_FOREACH_COMPONENT(body, other) cpSpaceDeactivateBody(space, other);

The code expects cpSpaceDeactivateBody to remove entries from the space's bodies array, so it doesn't advance its index into that array after deactivating a component. The problem is that cpSpaceDeactivateBody isn't limited to removing bodies that come after the current one in the array; it might remove ones earlier in the list. When that happens, bodies are skipped and never processed.
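A small demonstration of that hazard, assuming the array removal fills the hole with the last element (which I believe is how cpArrayDeleteObj works, but treat that as an assumption). Everything here (IntArray, delete_value, TRIGGER, EARLIER) is hypothetical. visit_in_place mimics a loop that doesn't advance its index after a removal: when the removal also hits an earlier slot, the former last element lands behind the index and is never visited. visit_deferred is one generic alternative that collects removals during the walk and compacts afterwards; it is not the poster's (unshown) fix or Chipmunk's.

```c
#define CAP 16          /* values stored in the array must be < CAP */
#define TRIGGER 2       /* visiting this value removes it...         */
#define EARLIER 0       /* ...and this value, which sits earlier     */

typedef struct { int items[CAP]; int num; } IntArray;

/* Unordered delete: the last element fills the removed slot. */
static void delete_value(IntArray *arr, int value) {
    for (int i = 0; i < arr->num; i++) {
        if (arr->items[i] == value) {
            arr->items[i] = arr->items[--arr->num];
            return;
        }
    }
}

/* Removes during the walk and skips the increment, trusting that only slot i changed. */
static void visit_in_place(IntArray *arr, int *visited) {
    for (int i = 0; i < arr->num;) {
        int v = arr->items[i];
        visited[v] = 1;
        if (v == TRIGGER) {
            delete_value(arr, TRIGGER);
            delete_value(arr, EARLIER);
            continue;
        }
        i++;
    }
}

/* Marks removals during the walk and compacts afterwards, so every element is visited. */
static void visit_deferred(IntArray *arr, int *visited) {
    int doomed[CAP] = {0};
    for (int i = 0; i < arr->num; i++) {
        int v = arr->items[i];
        visited[v] = 1;
        if (v == TRIGGER) {
            doomed[TRIGGER] = 1;
            doomed[EARLIER] = 1;
        }
    }
    int out = 0;
    for (int i = 0; i < arr->num; i++)
        if (!doomed[arr->items[i]]) arr->items[out++] = arr->items[i];
    arr->num = out;
}
```

Starting from {0, 1, 2, 3, 4}, visit_in_place never visits 3, while visit_deferred visits all five values and leaves {1, 3, 4}.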

I've implemented a fix for this locally but it's not solving my dangling arbiter pointer problem. I'll keep digging though!

steve
Last edited by snichols on Mon Dec 10, 2012 10:41 pm, edited 1 time in total.

Re: Possible causes for this error?

Post by snichols »

I've attached one more log with even more details. I'm now logging when arbiters are initialized and pooled. In this instance:

1. body (0x0ADBCE30) has an arbiter in its list (0x0AE6F028) that's not in the space.

2. The shapes and bodies in this arbiter are: body_a=0x0ADBD2C0, body_b=0x0ADBCE30, a=0x0ADBD3A0, b=0x0ADBCF10

3. Right before the crash at the end of the log, body (0x0ADBD2C0) is removed from the space. This leaves a dangling pointer in the arbiter.

4. Earlier in the log, body (0x0ADBD2C0) is put to sleep because of idle time. This causes arbiter (0x0AE6F028) to be uncached. This means it is no longer in the hash set or the space's arbiter list.

5. Right after this, body (0x0ADBCE30) is activated. It's the other body in the bad arbiter, the one that owns the dangling pointer. While this body is activating, the log shows the arbiter being processed, but nothing happens because of the "body == body_a" check: in this case, body is actually body_b.
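Steps 4 and 5 can be replayed in a toy model whose uncache/restore conditions mirror the excerpted `body == bodyA || cpBodyIsStatic(bodyA)` check. The types and functions (Body, Arbiter, deactivate_body, activate_body, link_arbiter) are hypothetical. The model assumes, as the log suggests, that body_a goes to sleep and body_b is then activated on its own; it doesn't try to explain how the two ended up handled separately.

```c
#include <stddef.h>

typedef struct Body Body;

typedef struct {
    Body *body_a, *body_b;
    int in_space;               /* 1 while the space still tracks the arbiter */
} Arbiter;

struct Body {
    int is_static;
    Arbiter *arbiters[4];       /* simplified per-body arbiter list (no bounds checks) */
    int num_arbiters;
};

/* Only the side that "owns" the arbiter acts on it: body_a, or body_b when body_a is static. */
static int owns_arbiter(const Body *body, const Arbiter *arb) {
    return body == arb->body_a || arb->body_a->is_static;
}

/* Going to sleep: owned arbiters are uncached but stay on the body lists. */
static void deactivate_body(Body *body) {
    for (int i = 0; i < body->num_arbiters; i++)
        if (owns_arbiter(body, body->arbiters[i])) body->arbiters[i]->in_space = 0;
}

/* Waking up: owned arbiters are put back; arbiters owned by the other body are skipped. */
static void activate_body(Body *body) {
    for (int i = 0; i < body->num_arbiters; i++)
        if (owns_arbiter(body, body->arbiters[i])) body->arbiters[i]->in_space = 1;
}

/* Put the arbiter on both bodies' lists. */
static void link_arbiter(Arbiter *arb) {
    arb->body_a->arbiters[arb->body_a->num_arbiters++] = arb;
    arb->body_b->arbiters[arb->body_b->num_arbiters++] = arb;
}
```

After deactivate_body(body_a) followed by activate_body(body_b), the arbiter is still on body_b's list but no longer tracked by the space, which is the state the validation code reports.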

I patiently await your response. :)

steve
Attachments
arbiter-bug-5.zip
(10.71 KiB)