[Devel] Re: How much of a mess does OpenVZ make? ; ) Was: What can OpenVZ do?
Serge E. Hallyn
serue at us.ibm.com
Fri Mar 13 09:35:31 PDT 2009
Quoting Cedric Le Goater (legoater at free.fr):
>
> > No, what you're suggesting does not suffice.
>
> probably. I'm still trying to understand what you mean below :)
>
> Man, I hate these hierarchical pid_ns. One level would have been enough,
> just one vpid attribute in 'struct pid*'
Well I don't mind - temporarily - saying that nested pid namespaces
are not checkpointable. It's just that if we're going to need a new
syscall anyway, then why not go ahead and address the whole problem?
It's not hugely more complicated, and seems worth it.
> > Call (5591,3,1) the task known as 5591 in the init_pid_ns, 3 in a child
> > pid_ns, and 1 in a grandchild pid_ns created from there. Now assume we
> > are checkpointing tasks T1=(5592,1), and T2=(5594,3,1).
> >
> > We don't care about the first number in the tuples, so they will be
> > random numbers after the recreate.
>
> yes.
>
> > But we do care about the second numbers.
>
> yes very much, and we need a way to set these numbers in alloc_pid()
>
> > But specifying CLONE_NEWPID while recreating the process tree
> > in userspace does not allow you to specify the 3 in (5594,3,1).
>
> I haven't looked closely at hierarchical pid namespaces, but as we're
> using an array of pids indexed by the pidns level, I don't see why
> it shouldn't be possible. You might be right.
>
> Anyway, I think that some CLONE_NEW* flags should be forbidden. Daniel
> should soon send a little patch for the ns_cgroup restricting the clone
> flags being used in a container.
Uh, that feels a bit over the top. We want to make this
uncheckpointable (if it remains so), not prevent the whole action.
After all I may be running a container which I don't plan on ever
checkpointing, and inside that container running a job which I do
want to migrate.
So depending on whether we're doing the Dave or the rest-of-the-world
way :), we either clear_bit(pidns->may_checkpoint) on the parent
pid_ns when a child is created, or we walk every task being
checkpointed and make sure they each are in the same pid_ns. Doesn't
that suffice?
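
To make the two options concrete, here is a rough userspace sketch, not
kernel code. The struct and function names (pid_ns_sketch, task_sketch,
create_child_ns, all_in_same_ns) are made up for illustration, and a
plain bool stands in for the may_checkpoint bit mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a pid namespace; not the kernel's struct. */
struct pid_ns_sketch {
    int level;                      /* 0 for the initial namespace */
    bool may_checkpoint;            /* assumed flag, cleared on nesting */
    struct pid_ns_sketch *parent;
};

/* Hypothetical stand-in for a task: only the namespace it lives in. */
struct task_sketch {
    const char *name;
    struct pid_ns_sketch *ns;
};

/*
 * Option 1: when a child namespace is created, mark the parent as
 * no longer checkpointable. The child itself starts out checkpointable.
 */
void create_child_ns(struct pid_ns_sketch *parent, struct pid_ns_sketch *child)
{
    assert(parent != NULL && child != NULL);
    child->level = parent->level + 1;
    child->may_checkpoint = true;
    child->parent = parent;
    parent->may_checkpoint = false;
}

/*
 * Option 2: at checkpoint time, walk every task in the set and refuse
 * unless they all live in the same pid namespace.
 */
bool all_in_same_ns(const struct task_sketch *tasks, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        if (tasks[i].ns != tasks[0].ns)
            return false;
    }
    return true;
}
```

With T1=(5592,1) placed in the child namespace and T2=(5594,3,1) in the
grandchild, the first option leaves the child marked uncheckpointable
and the second rejects the pair because the two tasks sit in different
namespaces. In both cases the nested setup still runs; it just cannot
be checkpointed.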
-serge