[CRIU] [PATCH 00/12] Introduce custom per-task service fds placement

Kirill Tkhai ktkhai at virtuozzo.com
Thu Dec 28 10:59:54 MSK 2017


On 28.12.2017 08:26, Andrei Vagin wrote:
> On Tue, Dec 26, 2017 at 06:45:46PM +0300, Kirill Tkhai wrote:
>> Currently, service fds are bounded to rlimit(RLIMIT_NOFILE).
>> This brings problems with memory usage, because service fds
>> forces kernel to allocate big files_struct for every task,
>> and the container consumes much more memory, than it used
>> before dump. In some situations this even may lead to restore
>> fail because of enough memory absence in cgroup or on node.
>>
>> This patchset reworks service fds placement, and places
>> the service fds in custom place for every task in dependence
>> of task's biggest fd used on dump. This reduces kernel-allocated
>> files_struct in significant way, and make container use less
>> memory.
>>
>> The patchset consists of two parts. Patches [1-4/12] reworks
>> ctl tty crutch we used before, and makes ctl tty to restore
>> in generic file engine way. This change is good enough by its
>> own, and it has to be made separately earlier, but now is
>> the time to stop delaying, and to remove all the crutch, because
>> this is need for the second part.
>>
>> The second part of patchset places service fds in dependence
>> of task's used fds as described in start of topic. See patches
>> [5-12/12] for details of that.
>>
>> Patch [12/12] actually makes service fds "per-task placed".
>>
>> https://travis-ci.org/tkhai/criu/builds/321817890
> 
> I have another idea. Currently we have all problems with descriptors,
> because we want to restore them too early. We can do a trick like we do
> for mappings. We can restore descriptors and don't care about their
> numbers, and then after retoring evething, we can move them to
> required numbers.
> 
> This trick has to simplify a lot of place. For example, we will not need
> find_unused_fd().

This will work, but keep in mind the following negative moments.

1)In most cases this won't give memory usage decrease in comparison
to the variant in this patchset, because of fdtable allocation scheme
chosen in kernel (see alloc_fdtable()). You'll just get an improvement
when service fd array falls near the boundary of alignment (refer to
alloc_fdtable()). Not a big probability. We better reduce the probability
even more just via decreasing the number of service fds. What service fds
do we obligatory have to have as service fds? It seems at least half of
them (LAZY_PAGES_SK_OFF, NS_FD_OFF, CGROUP_YARD, SELF_STDIN_OFF,... etc)
can simply be moved to fdstore. We don't need them as service fds. What
else? CTL_TTY_OFF will be killed by this patchset. ROOT_FD_OFF also seems
to be good to go to fdstore, but I'm not sure we even use it on restore.
Anyway, at least 50% of service fds will feel themselves comfortable in
fdstore.

2)Some file descriptors types may require specific fd on time of restore.
I remember at least eventpolls and their view in /proc/pid/fdinfo/*. Maybe
somebody else. Moving to another scheme makes their after-restore-view
different to now.

3)Currently, we have files ordered by fd in rst_info::fds. In generic
case (and I see this in my system) a process open them sequentially
and new fd is obtained one by one. criu even doesn't have to call dup()
for them as they open with right numbers. In case of free sfds placement,
they just occupy the small fd numbers and criu will have to call dup()
for almost every file descriptor. Not good for performance. Yeah, then
we can move sfds a little bit up to fit chosen generic case, but it
almost the same thing as to have sfds above max used fd.

My point of view: this will have small advantages and big disadvantages.

Kirill
 
>> ---
>>
>> Kirill Tkhai (12):
>>       files: Close ctl tty via generic engine
>>       files: Move prepare_ctl_tty() to criu/tty.c
>>       files: Move CTL_TTY_OFF fixup to generic file engine
>>       files: Kill unused CTL_TTY_OFF leftovers
>>       files: Rename service_fd_rlim_cur to service_fd_base_cur
>>       files: Count inh_fd_max
>>       files: Pass pstree_item argument to clone_service_fd()
>>       files: Close old service fd in clone_service_fd()
>>       files: Do setup_newborn_fds() later
>>       files: Refactor clone_service_fd()
>>       files: Prepare clone_service_fd() for overlaping ranges.
>>       files: Make tasks set their own service_fd_base
>>
>>
>>  criu/cr-restore.c        |   14 +++--
>>  criu/files.c             |   45 +++-------------
>>  criu/include/files.h     |    3 +
>>  criu/include/servicefd.h |    4 -
>>  criu/include/tty.h       |    2 -
>>  criu/include/util.h      |    2 +
>>  criu/tty.c               |  127 +++++++++++++++++++++++++++++++++++++++++++---
>>  criu/util.c              |  101 ++++++++++++++++++++++++++-----------
>>  images/fdinfo.proto      |    1 
>>  9 files changed, 214 insertions(+), 85 deletions(-)
>>
>> --
>> Signed-off-by: Kirill Tkhai <ktkhai at virtuozzo.com>


More information about the CRIU mailing list