[Devel] Re: [PATCH 14/15] Destroy pid namespace on init's death
sukadev at us.ibm.com
sukadev at us.ibm.com
Thu Aug 2 10:20:33 PDT 2007
Oleg Nesterov [oleg at tv-sign.ru] wrote:
| On 08/02, sukadev at us.ibm.com wrote:
| >
| > Oleg Nesterov [oleg at tv-sign.ru] wrote:
| > | On 07/31, sukadev at us.ibm.com wrote:
| > | >
| > | > Oleg Nesterov [oleg at tv-sign.ru] wrote:
| > | > | >
| > | > | > @@ -925,9 +926,10 @@ fastcall NORET_TYPE void do_exit(long co
| > | > | > if (unlikely(!tsk->pid))
| > | > | > panic("Attempted to kill the idle task!");
| > | > | > if (unlikely(tsk == task_child_reaper(tsk))) {
| > | > | > - if (task_active_pid_ns(tsk) != &init_pid_ns)
| > | > | > - task_active_pid_ns(tsk)->child_reaper =
| > | > | > - init_pid_ns.child_reaper;
| > | > | > + if (pid_ns != &init_pid_ns) {
| > | > | > + zap_pid_ns_processes(pid_ns);
| > | > | > + pid_ns->child_reaper = init_pid_ns.child_reaper;
|
| OOPS. I didn't notice this before, but this is not right too (regardless
| of multi-threaded init problems).
|
| We should not "reset" ->child_reaper here, we may have exiting tasks
| which will re-parent their ->children to global init.
|
| No, we are still /sbin/init of this namespace even if we are exiting,
| ->child_reaper should point to us, at least until zap_pid_ns_processes()
| completes.
Yes, we are resetting the reaper _after_ zap_pid_ns_processes() completes
right ? (all other processes in the namespace must have exited).
|
| > | > Our current definition of is_container_init() and task_child_reaper()
| > | > refer only to the main-thread of the container-init (since they check
| > | > for pid_t == 1)
| > |
| > | Yes.
| >
| > This means that we cannot have a check like "tsk == task_child_reaper(tsk)"
| > to properly detect the child reaper process right ?
|
| Yes, we should use "tsk->group_leader == task_child_reaper(tsk)"
Ok.
|
| > Its basically a very dumb question - How do we detect a container_init()
| > in the multi-threaded case ?
|
| Good point. I think is_container_init(tsk) needs a fix:
|
| - pid = task_pid(tsk);
| + pid = task_pid(tsk->group_leader);
|
Ok.
| Or, perhaps better, change the callers to use tsk->group_leader.
Ok.
|
| > Should we use "task->tgid == 1" ?
|
| No, no, "task->tgid == 1" means "global" init.
Grr. I got that mixed up bw my implm and Pavel's :-) task->pid and
task->tgid referred to "active-pid-ns pid" in mine.
|
| > IOW to identify if the last thread of a child reaper is exiting, should we
| > check "task->tgid == 1" and the "group_dead" flag in do_exit() ?
|
| See above, but yes, as I said before I think we should do this under
| the "if (group_dead)" check below.
|
| > | > If the main thread is exiting, but is not the last thread in the
| > | > group, should we let it exit and let the next thread in the group
| > | > the reaper of the pid ns ?
| > |
| > | We can, but why? The main thread's task_struct can't go away until all
| > | sub-threads exit. Its ->nsproxy will be NULL, but this doesn't matter.
| >
| > After the main thread exits task_child_reaper() would still refer to
| > the main thread right ? So when one of the other processes in the
| > namespace calls forget_original_parent(), it would reparent the process
| > to the main thread - no ? The main thread still has a valid task_struct,
| > but it has exited and cannot adapt children...
|
| Yes it can't, and yes, this is somewhat against the rules.
|
| But, afaics, this should work. Because do_wait() from the alive sub-thread
| still can reap the child, note that do_wait() iterates over all sub-threads
| ->children lists. Please note also that do_notify_parent() uses group
| signal, so it will wake up some alive sub-thread.
|
| This is wrong for the "normal" process (because when the last thread exits
| main_thread->children is lost), but this seems to be OK for the /sbin/init,
| exactly because we are doing zap_pid_ns_processes().
|
| Sukadev, may I ask you to add a fat comment about this in your patch?
Sure.
|
| > BTW, are there any actual users of multi-threaded init ? Or is this
| > something that can be considered outside the "core" patchset and
| > addressed soon, but separately like the signalling-container-init issue ?
|
| Well, I don't know. Please also see the reply to Kirill's message...
|
| Oleg.
More information about the Devel
mailing list