[Devel] Re: Re: Hang with fair cgroup scheduler (reproducer is attached.)
Dmitry Adamushko
dmitry.adamushko at gmail.com
Fri Dec 14 11:51:28 PST 2007
> [ ... ]
>
> [<a0000001002e0480>] rb_erase+0x300/0x7e0
> [<a000000100076290>] __dequeue_entity+0x70/0xa0
> [<a000000100076300>] set_next_entity+0x40/0xa0
> [<a0000001000763a0>] set_curr_task_fair+0x40/0xa0
> [<a000000100078d90>] sched_move_task+0x2d0/0x340
> [<a000000100078e20>] cpu_cgroup_attach+0x20/0x40
>
> [ ... ]
argh... it's a consequence of the 'current is not kept within the tree" indeed.
When sched_move_task() is called for the 'current' (running on another CPU),
we get the following:
...
running = task_running(rq, tsk);
on_rq = tsk->se.on_rq;
if (on_rq) {
dequeue_task(rq, tsk, 0);
if (unlikely(running))
tsk->sched_class->put_prev_task(rq, tsk);
}
[1] tsk->sched_class->put_prev_task() actually _inserts_ 'tsk' back
into the cfs_rq of its _old_ group :
set_task_cfs_rq(tsk, task_cpu(tsk));
[2] now task.se->cfs_rq gets changed
if (on_rq) {
if (unlikely(running))
tsk->sched_class->set_curr_task(rq);
[3] and now, tsk->sched_class->set_curr_task(rq) _removes_ the
'current' from the tree... but this tree belongs to the _new_ group
(the task is still within the 'old_group->cfs_rq->rb_tree') ---> oops!
enqueue_task(rq, tsk, 0);
}
Anyway, I have to admit that this problem is a consequence of the
special-case treatment for the 'current' by
'dequeue/enqueue_task()'... it makes the interface less transparent
indeed.
/me thinking on how to get it fixed (e.g. set_task_cfs_rq() might take
care of it) or just get this special-case issue removed (have to check
whether we lose anything in this case)... sigh.
--
Best regards,
Dmitry Adamushko
_______________________________________________
Containers mailing list
Containers at lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers
More information about the Devel
mailing list