[CRIU] [PATCH 2/2] restore: correctly restore cgroup mounts inside a container
Pavel Emelyanov
xemul at virtuozzo.com
Thu Mar 24 07:11:31 PDT 2016
On 03/24/2016 05:00 PM, Tycho Andersen wrote:
> On Thu, Mar 24, 2016 at 10:16:30AM +0300, Pavel Emelyanov wrote:
>> On 03/24/2016 12:41 AM, Tycho Andersen wrote:
>>> Before the nsroot= mount option, we were just getting lucky because the
>>> cgroup superblocks "matched" when inspecting them from userspace, so we
>>> were actually getting a bind mount from the host when migrating from within
>>> cgroup namespaces.
>>>
>>> Instead, let's actually do a new (i.e. not a bind mount) for cgroup
>>> namespaces. For this, we need two things:
>>>
>>> 1. to prepare the cgroup namespace (and thus the cgroups) before the mount
>>> ns, so when the mount() occurrs it is relative to the right cgroup path.
>>>
>>> 2. not reject cgroup filesystems with no root. A cgroup ns mount looks
>>> like:
>>>
>>> 223 222 0:22 /lxc/unpriv /sys/fs/cgroup/systemd rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,xattr,release_agent=/lib/systemd/systemd-cgroups-agent,name=systemd,nsroot=/lxc/unpriv
>>>
>>> i.e. it has /lxc/unpriv as its root, and thus doesn't look rooted to CRIU.
>>> Let's allow cgroup mounts to be unrooted so we can deal with this.
>>
>> Am I right that the only problem here is that criu doesn't support mounting
>> of anything that doesn't have root mount? If so, the correct fix would be
>> to support _this_ by creating a temporary root mount, then umounting it at
>> the very end.
>
> Yes. One problem with creating this temporary mount is that any mount
> of cgroupfs is created relative to the task's current cgroup ns root.
> So we'd have to create this temporary mount, restore cgroup ns, and
> then do the real mount restore, which would effectively just be
> another completely new and unrelated mount. I don't see what the
> temporary root mount buys us in this case.
Ah, I see. The mountpoint that is normally non-root mount turns out to
effectively be such in case we call mount() from inside the cgroup ns, right?
> Tycho
>
>>> Signed-off-by: Tycho Andersen <tycho.andersen at canonical.com>
>>> ---
>>> criu/cr-restore.c | 18 +++++++++---------
>>> criu/mount.c | 5 ++++-
>>> 2 files changed, 13 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/criu/cr-restore.c b/criu/cr-restore.c
>>> index ffd2f01..967d5fa 100644
>>> --- a/criu/cr-restore.c
>>> +++ b/criu/cr-restore.c
>>> @@ -1608,6 +1608,15 @@ static int restore_task_with_children(void *_arg)
>>> goto err;
>>> }
>>>
>>> + /*
>>> + * Call this _before_ forking to optimize cgroups
>>> + * restore -- if all tasks live in one set of cgroups
>>> + * we will only move the root one there, others will
>>> + * just have it inherited.
>>> + */
>>> + if (prepare_task_cgroup(current) < 0)
>>> + goto err;
>>> +
>>> /* Restore root task */
>>> if (current->parent == NULL) {
>>> if (restore_finish_stage(CR_STATE_RESTORE_NS) < 0)
>>> @@ -1640,15 +1649,6 @@ static int restore_task_with_children(void *_arg)
>>> if (prepare_mappings())
>>> goto err;
>>>
>>> - /*
>>> - * Call this _before_ forking to optimize cgroups
>>> - * restore -- if all tasks live in one set of cgroups
>>> - * we will only move the root one there, others will
>>> - * just have it inherited.
>>> - */
>>> - if (prepare_task_cgroup(current) < 0)
>>> - goto err;
>>> -
>>> if (prepare_sigactions() < 0)
>>> goto err;
>>>
>>> diff --git a/criu/mount.c b/criu/mount.c
>>> index be79999..614a89a 100644
>>> --- a/criu/mount.c
>>> +++ b/criu/mount.c
>>> @@ -734,7 +734,7 @@ static int validate_mounts(struct mount_info *info, bool for_dump)
>>> */
>>> ret = m->need_plugin ? 0 : -ENOTSUP;
>>>
>>> - if (ret < 0) {
>>> + if (ret < 0 && m->fstype->code != FSTYPE__CGROUP) {
>>> if (ret == -ENOTSUP)
>>> pr_err("%d:%s doesn't have a proper root mount\n",
>>> m->mnt_id, m->mountpoint);
>>> @@ -2453,6 +2453,9 @@ static bool can_mount_now(struct mount_info *mi)
>>> if (mi->external)
>>> goto shared;
>>>
>>> + if (mi->fstype->code == FSTYPE__CGROUP)
>>> + return true;
>>> +
>>> /*
>>> * We're the slave peer:
>>> * - Make sure the master peer is already mounted
>>>
>>
> .
>
More information about the CRIU
mailing list