[CRIU] [PATCH] Attempt to restore cgroups
Pavel Emelyanov
xemul at parallels.com
Wed Jul 2 00:00:48 PDT 2014
On 07/02/2014 01:08 AM, Tycho Andersen wrote:
> Hi Pavel,
>
> On Wed, Jul 02, 2014 at 12:12:10AM +0400, Pavel Emelyanov wrote:
>>
>> We have the /proc/pid/mountinfo parsing routine ready. Can it be re-used
>> for this purpose?
>
> Ah, actually I didn't notice this. I can use that instead.
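For the record, spotting cgroup mounts there is cheap. Just an
illustration, not our actual parser (the function name is made up):

#include <stdio.h>
#include <string.h>

/* Print every cgroup mount found in /proc/self/mountinfo.
 * In mountinfo the filesystem type comes right after the
 * " - " separator, so cgroup mounts match " - cgroup ". */
static void list_cgroup_mounts(void)
{
	char line[4096];
	FILE *f = fopen("/proc/self/mountinfo", "r");

	if (!f)
		return;
	while (fgets(line, sizeof(line), f))
		if (strstr(line, " - cgroup "))
			printf("%s", line);
	fclose(f);
}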
>
>>> + switch(mtype) {
>>> + /* ignore co-mounted cgroups */
>>> + case EXACT_MATCH :
>>> + goto out;
>>> + case PARENT_MATCH :
>>> + list_add_tail(&ncd->siblings, &match->children);
>>> + match->n_children++;
>>> + break;
>>> + case NO_MATCH :
>>> + list_add_tail(&ncd->siblings, &current_controller->heads);
>>> + current_controller->n_heads++;
>>> + break;
>>
>> If we have two directories -- /foo and /foo/bar -- and find the latter one
>> first, then both /foo and /foo/bar would just be added into the
>> controller->heads list, but only /foo should be; /foo/bar should go into
>> /foo's ->children.
>
> Isn't this handled by ftw() doing a preorder traversal?
It is, but since you call add_cgroup() per-task it may happen that the
first task you meet lives in /foo/bar and the second -- in /foo.
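So on NO_MATCH we'd also have to walk the already-collected heads and
move any entry that the new dir prefixes into its ->children. The only
subtle bit is the prefix test itself; a toy version (names made up,
not the patch's code):

#include <stdbool.h>
#include <string.h>

/* True iff @parent is a cgroup-path ancestor of @child:
 * "/foo" covers "/foo/bar" but must not match "/foobar". */
static bool is_cg_prefix(const char *parent, const char *child)
{
	size_t n = strlen(parent);

	return strncmp(parent, child, n) == 0 && child[n] == '/';
}

With that, adding /foo after /foo/bar just means moving every head for
which is_cg_prefix(new->path, head->path) holds into the new dir's list.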
>>
>> AFAIU you call collect_cgroup() for every new cgset met in order to
>> construct the tree of directories potentially seen by the processes we
>> dump. But since dump_sets() verifies that all such cgroups are subsets of
>> the init task's one, would it be easier just to scan the tree down
>> starting from the init task's dirs?
>
> Could be. Actually I wasn't sure whether that would catch everything,
> but since it does, I can change it to do that instead.
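Roughly what I mean is a plain walk down from the init task's dir,
something like this (sketch only -- the callback is invented, and real
code would record the paths instead of printing them):

#define _XOPEN_SOURCE 500
#include <ftw.h>
#include <stdio.h>

/* nftw() visits directories in pre-order, so parents always come
 * before their children -- no reparenting needed. */
static int collect_dir(const char *path, const struct stat *st,
		       int type, struct FTW *ftwbuf)
{
	if (type == FTW_D)
		printf("cgroup dir: %s\n", path);
	return 0;
}

/* e.g. walk_cgroups("/sys/fs/cgroup/memory/lxc/ct1") */
static int walk_cgroups(const char *init_dir)
{
	return nftw(init_dir, collect_dir, 64, FTW_PHYS);
}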
>
>>
>> Some minor thing, that can be done later, but still.
>>
>> In OpenVZ we restore all the limits only after all tasks are restored
>> and pushed into their cgroups. This is done for two reasons.
>>
>> First, some controllers allow situations where the usage is higher than
>> the limit. E.g. kmemcg will be such: this can happen when tasks eat
>> memory and the admin then lowers the limit. Kmem is mostly unshrinkable,
>> and such a cgroup will (well, may) just live on and fail all new
>> allocations. Restoring inside a pre-limited cgroup would be impossible.
>>
>> And the 2nd reason is -- if we set a too strict limit, e.g. on cpu, this
>> may slow down the restore process significantly. It's much better to
>> restore within some larger limits and then tighten them.
>
> Ok, this makes sense, I can try to work it into the patch. I mostly
> just added these as examples of how I envisioned things would be
> serialized.
>
> Thanks for the other comments as well, I will fix those and re-post.
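To illustrate the second phase (the helper name is made up, not our
actual code) -- once every task sits in its cgroup, the dumped values
are simply written back:

#include <stdio.h>

/* Write one dumped property back, e.g.
 *   set_cgroup_prop("/sys/fs/cgroup/memory/ct1",
 *                   "memory.limit_in_bytes", "536870912");
 */
static int set_cgroup_prop(const char *dir, const char *prop,
			   const char *value)
{
	char path[4096];
	FILE *f;

	snprintf(path, sizeof(path), "%s/%s", dir, prop);
	f = fopen(path, "w");
	if (!f)
		return -1;
	fprintf(f, "%s", value);
	return fclose(f) ? -1 : 0;
}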
Cool! Thanks, Tycho!