[CRIU] [PATCH] Attempt to restore cgroups
Pavel Emelyanov
xemul at parallels.com
Wed Jul 2 00:00:48 PDT 2014
On 07/02/2014 01:08 AM, Tycho Andersen wrote:
> Hi Pavel,
>
> On Wed, Jul 02, 2014 at 12:12:10AM +0400, Pavel Emelyanov wrote:
>>
>> We have the /proc/pid/mountinfo parsing routine ready. Can it be re-used
>> for this purpose?
>
> Ah, actually I didn't notice this. I can use that instead.
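For the record, spotting cgroup mounts there is cheap. Just an
illustration, not our actual parser (the function name is made up):

#include <stdio.h>
#include <string.h>

/* Print every cgroup mount found in /proc/self/mountinfo.
 * In mountinfo the filesystem type comes right after the
 * " - " separator, so cgroup mounts match " - cgroup ". */
static void list_cgroup_mounts(void)
{
	char line[4096];
	FILE *f = fopen("/proc/self/mountinfo", "r");

	if (!f)
		return;
	while (fgets(line, sizeof(line), f))
		if (strstr(line, " - cgroup "))
			printf("%s", line);
	fclose(f);
}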
>
>>> + switch(mtype) {
>>> + /* ignore co-mounted cgroups */
>>> + case EXACT_MATCH :
>>> + goto out;
>>> + case PARENT_MATCH :
>>> + list_add_tail(&ncd->siblings, &match->children);
>>> + match->n_children++;
>>> + break;
>>> + case NO_MATCH :
>>> + list_add_tail(&ncd->siblings, &current_controller->heads);
>>> + current_controller->n_heads++;
>>> + break;
>>
>> If we have two directories -- /foo and /foo/bar -- and find the latter one
>> first, then both /foo and /foo/bar would just be added into the
>> controller->heads list, but only /foo should be; /foo/bar should go into
>> /foo's ->children.
>
> Isn't this handled by ftw() doing a preorder traversal?
It is, but since you call add_cgroup() per-task it may happen that the
first task you meet lives in /foo/bar and the second -- in /foo.
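So on NO_MATCH we'd also have to walk the already-collected heads and
move any entry that the new dir prefixes into its ->children. The only
subtle bit is the prefix test itself; a toy version (names made up,
not the patch's code):

#include <stdbool.h>
#include <string.h>

/* True iff @parent is a cgroup-path ancestor of @child:
 * "/foo" covers "/foo/bar" but must not match "/foobar". */
static bool is_cg_prefix(const char *parent, const char *child)
{
	size_t n = strlen(parent);

	return strncmp(parent, child, n) == 0 && child[n] == '/';
}

With that, adding /foo after /foo/bar just means moving every head for
which is_cg_prefix(new->path, head->path) holds into the new dir's list.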
>>
>> AFAIU you call collect_cgroup() for every new cgset met in order to
>> construct the tree of directories potentially seen by the processes we
>> dump. But since dump_sets() verifies that all such cgroups are subsets of
>> the init task's one, would it be easier just to scan the tree down
>> starting from the init task's dirs?
>
> Could be. Actually I wasn't sure whether that would catch everything,
> but since it does, I can change it to do that instead.
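Roughly what I mean is a plain walk down from the init task's dir,
something like this (sketch only -- the callback is invented, and real
code would record the paths instead of printing them):

#define _XOPEN_SOURCE 500
#include <ftw.h>
#include <stdio.h>

/* nftw() visits directories in pre-order, so parents always come
 * before their children -- no reparenting needed. */
static int collect_dir(const char *path, const struct stat *st,
		       int type, struct FTW *ftwbuf)
{
	if (type == FTW_D)
		printf("cgroup dir: %s\n", path);
	return 0;
}

/* e.g. walk_cgroups("/sys/fs/cgroup/memory/lxc/ct1") */
static int walk_cgroups(const char *init_dir)
{
	return nftw(init_dir, collect_dir, 64, FTW_PHYS);
}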
>
>>
>> Some minor thing, that can be done later, but still.
>>
>> In OpenVZ we restore all the limits only after all tasks are restored
>> and pushed into their cgroups. This is done for two reasons.
>>
>> First, some controllers allow situations where the usage is higher than
>> the limit. E.g. kmemcg will be such: this can happen when tasks eat
>> memory and the admin then lowers the limit. Kmem is mostly unshrinkable,
>> and such a cgroup will (well, may) just live on and fail all new
>> allocations. Restoring inside a pre-limited cgroup would be impossible.
>>
>> And the 2nd reason is -- if we set a too strict limit, e.g. on cpu, this
>> may slow down the restore process significantly. It's much better to
>> restore within some larger limits and then tighten them.
>
> Ok, this makes sense, I can try to work it into the patch. I mostly
> just added these as examples of how I envisioned things would be
> serialized.
>
> Thanks for the other comments as well, I will fix those and re-post.
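To illustrate the second phase (the helper name is made up, not our
actual code) -- once every task sits in its cgroup, the dumped values
are simply written back:

#include <stdio.h>

/* Write one dumped property back, e.g.
 *   set_cgroup_prop("/sys/fs/cgroup/memory/ct1",
 *                   "memory.limit_in_bytes", "536870912");
 */
static int set_cgroup_prop(const char *dir, const char *prop,
			   const char *value)
{
	char path[4096];
	FILE *f;

	snprintf(path, sizeof(path), "%s/%s", dir, prop);
	f = fopen(path, "w");
	if (!f)
		return -1;
	fprintf(f, "%s", value);
	return fclose(f) ? -1 : 0;
}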
Cool! Thanks, Tycho!