[CRIU] [PATCH 00/12] Introduce custom per-task service fds placement
Kirill Tkhai
ktkhai at virtuozzo.com
Thu Dec 28 10:08:03 MSK 2017
On 28.12.2017 03:13, Andrei Vagin wrote:
> On Wed, Dec 27, 2017 at 10:30:26PM +0300, Kirill Tkhai wrote:
>> On 27.12.2017 20:14, Andrei Vagin wrote:
>>> On Tue, Dec 26, 2017 at 06:45:46PM +0300, Kirill Tkhai wrote:
>>>> Currently, service fds are bounded to rlimit(RLIMIT_NOFILE).
>>>> This brings problems with memory usage, because service fds
>>>> forces kernel to allocate big files_struct for every task,
>>>> and the container consumes much more memory, than it used
>>>> before dump. In some situations this even may lead to restore
>>>> fail because of enough memory absence in cgroup or on node.
>>>>
>>>> This patchset reworks service fds placement, and places
>>>> the service fds in custom place for every task in dependence
>>>> of task's biggest fd used on dump. This reduces kernel-allocated
>>>> files_struct in significant way, and make container use less
>>>> memory.
>>>>
>>>> The patchset consists of two parts. Patches [1-4/12] reworks
>>>> ctl tty crutch we used before, and makes ctl tty to restore
>>>> in generic file engine way. This change is good enough by its
>>>> own, and it has to be made separately earlier, but now is
>>>> the time to stop delaying, and to remove all the crutch, because
>>>> this is need for the second part.
>>>>
>>>> The second part of patchset places service fds in dependence
>>>> of task's used fds as described in start of topic. See patches
>>>> [5-12/12] for details of that.
>>>>
>>>> Patch [12/12] actually makes service fds "per-task placed".
>>>>
>>>
>>> Where is a test?
>>
>> I'll be happy to implement a test scenario you'll suggested.
>> What will you recommend?
>
> Create a few processes with different file rlimits, set file descriptors
> so that criu has to move srevice fd-s on retore, etc...
Yeah, it has a sense.
>>
>>> [root at fc24 criu]# python test/zdtm.py run -t zdtm/static/env00
>>> === Run 1/1 ================ zdtm/static/env00
>>>
>>> ========================== Run zdtm/static/env00 in h ==========================
>>> Start test
>>> ./env00 --pidfile=env00.pid --outfile=env00.out --envname=ENV_00_TEST
>>> Run criu dump
>>> Unable to kill 36: [Errno 3] No such process
>>> Run criu restore
>>> =[log]=> dump/zdtm/static/env00/36/1/restore.log
>>> ------------------------ grep Error ------------------------
>>> (00.016768) 36: Collect fdinfo pid=36 fd=1006 id=0x5
>>> (00.016783) 36: Collect fdinfo pid=36 fd=1007 id=0x5
>>> (00.016798) 36: Collect fdinfo pid=36 fd=1008 id=0x5
>>> (00.016813) 36: Collect fdinfo pid=36 fd=1009 id=0x5
>>> (00.016828) 36: Error (criu/files.c:915): Too big FD number to restore 1010
>>> (00.016864) Error (criu/cr-restore.c:2452): Restoring FAILED.
>>
>> See below.
>>
>>> ------------------------ ERROR OVER ------------------------
>>> ################# Test zdtm/static/env00 FAIL at CRIU restore ##################
>>> ##################################### FAIL #####################################
>>> [root at fc24 criu]# git diff
>>> diff --git a/criu/crtools.c b/criu/crtools.c
>>> index 5f005387d..f18c295d8 100644
>>> --- a/criu/crtools.c
>>> +++ b/criu/crtools.c
>>> @@ -244,7 +244,7 @@ static void rlimit_unlimit_nofile_self(void)
>>> new.rlim_cur = kdat.sysctl_nr_open;
>>> new.rlim_max = kdat.sysctl_nr_open;
>>>
>>> - if (prlimit(getpid(), RLIMIT_NOFILE, &new, NULL)) {
>>> + if (0 && prlimit(getpid(), RLIMIT_NOFILE, &new, NULL)) {
>>
>> It's not a valid change, because this unlimited PRLIMIT_NOFILE value
>> set in the above line is used in init_service_fd() to calculate potentially
>> possible service fds limit.
>>
>> So, the check "Too big FD number" just says that you can't restore a fd,
>> if the system does not allow to create such a number whatever you do.
>
> Pls, write a comment before prlimit() which explains why do we need
> this.
OK
>>
>>> pr_perror("rlimir: Can't setup RLIMIT_NOFILE for self");
>>> return;
>>> } else
>>> diff --git a/test/zdtm/static/env00.c b/test/zdtm/static/env00.c
>>> index 1feabfa9f..0083d3fa0 100644
>>> --- a/test/zdtm/static/env00.c
>>> +++ b/test/zdtm/static/env00.c
>>> @@ -21,6 +21,15 @@ int main(int argc, char **argv)
>>> exit(1);
>>> }
>>>
>>> + while (1) {
>>> + if (dup(1) == -1)
>>> + break;
>>> + }
>>> + close(10);
>>> + close(11);
>>> + close(12);
>>> + close(13);
>>> +
>>> test_daemon();
>>> test_waitsig();
>>>
>>>
>>>
>>>> https://travis-ci.org/tkhai/criu/builds/321817890
>>>> ---
>>>>
>>>> Kirill Tkhai (12):
>>>> files: Close ctl tty via generic engine
>>>> files: Move prepare_ctl_tty() to criu/tty.c
>>>> files: Move CTL_TTY_OFF fixup to generic file engine
>>>> files: Kill unused CTL_TTY_OFF leftovers
>>>> files: Rename service_fd_rlim_cur to service_fd_base_cur
>>>> files: Count inh_fd_max
>>>> files: Pass pstree_item argument to clone_service_fd()
>>>> files: Close old service fd in clone_service_fd()
>>>> files: Do setup_newborn_fds() later
>>>> files: Refactor clone_service_fd()
>>>> files: Prepare clone_service_fd() for overlaping ranges.
>>>> files: Make tasks set their own service_fd_base
>>>>
>>>>
>>>> criu/cr-restore.c | 14 +++--
>>>> criu/files.c | 45 +++-------------
>>>> criu/include/files.h | 3 +
>>>> criu/include/servicefd.h | 4 -
>>>> criu/include/tty.h | 2 -
>>>> criu/include/util.h | 2 +
>>>> criu/tty.c | 127 +++++++++++++++++++++++++++++++++++++++++++---
>>>> criu/util.c | 101 ++++++++++++++++++++++++++-----------
>>>> images/fdinfo.proto | 1
>>>> 9 files changed, 214 insertions(+), 85 deletions(-)
>>>>
>>>> --
>>>> Signed-off-by: Kirill Tkhai <ktkhai at virtuozzo.com>
>>
>> Kirill
More information about the CRIU
mailing list