[CRIU] [PATCH 1/1] Use choose_service_fd_base in the root process.

Kirill Tkhai ktkhai at virtuozzo.com
Fri Aug 3 17:21:25 MSK 2018


On 03.08.2018 16:48, Radoslaw Burny wrote:
> Replying to Kirill's comment here, since I didn't get it in my inbox.

Sorry, I've replied to list only.

>> This chooses service fd base for main criu process, usernsd process
> (possibly there are other helpers).
>> I'm not sure, these guys never create many temporary fds before they
> install a service fd:
>>
> 
> 
> 
> 
>> criu main task:
>>    cached_fd1 = open()
>>    cached_fd2 = open()
>>    ...
>>    cached_fdn = open()
>>
>>    install_service_fd() -> rewrites cached_fdn.
>>
>> It's need to review criu main task & friends code and check there are no
> such the situations.
>> Also, if we are going to do this way, we will have to implement some
> things to detect such
>> the problems in the future code.
>>
>> Really restored tasks (init and its children) do not create many
> temporary fd by design,
>> and we check their fd leak by tests (in zdtm.py), so there is no such the
> problem.
> 
> Thanks, I wasn't aware of that. So the main CRIU process is effectively
> relying on a high
> service_fd_base to avoid collisions with temporary fds, right?

I hadn't reviewed this fact, and this is the thing we should to do to implement
such the improvement. The collisions are possible (there were one or couple
problems of this sort with ordinary restored tasks, AFAIR).

> In that case, we could still try to select a base by using a large enough
> pad.
> In install_service_fd, we could then do:
>   BUG_ON(close(new_fd) == 0);
> (such a safety check may be generally useful, regardless of my changes).

Yeah, this could be useful. But keep in mind, that currently, install_service_fd()
may overwrite existing fd. So, we should find all the places, where this may
happen, treewide, and to add direct close() there:

	close(fd); /* Comment while we override the fd */
	install_service_fd();

Then we will be able to add the check in install_service_fd().

I think we should do that one time, nobody has just done this yet.

> What do you think? Even with RLIMIT_NOFILE of few billions, the fork delay
> is just few hundred ms
> - so if it turns out that my change gets very complex, it might not be
> worth it.

Hm, in case of the fork delay is 0.1s, this certainly have a sense.

> On Fri, Aug 3, 2018 at 3:12 AM Andrei Vagin <avagin at virtuozzo.com> wrote:
> 
>> Kirill, could you review this patch?
>>
>> On Thu, Aug 02, 2018 at 02:53:10PM +0200, Radoslaw Burny wrote:
>>> choose_service_fd_base selects lowest possible numbers for service fds
>>> to keep fd table of the criu process small, making fork() faster.
>>> However, this method was only used in child processes, not the root CRIU
>>> process - thus forks from root were still slow. This fixes the issue.
>>>
>>> Sorry for slightly unreadable patch - the code move was necessary due to
>>> changes in the inter-function dependencies. The key part of the diff is
>>> this line:
>>>   service_fd_base = choose_service_fd_base(NULL);
>>>
>>> Signed-off-by: Radoslaw Burny <rburny at google.com>
>>> ---
>>>  criu/util.c | 66 ++++++++++++++++++++++++++---------------------------
>>>  1 file changed, 33 insertions(+), 33 deletions(-)
>>>
>>> diff --git a/criu/util.c b/criu/util.c
>>> index ab4d89aa..114756a6 100644
>>> --- a/criu/util.c
>>> +++ b/criu/util.c
>>> @@ -441,27 +441,6 @@ static int service_fd_base;
>>>  /* Id of current process in shared fdt */
>>>  static int service_fd_id = 0;
>>>
>>> -int init_service_fd(void)
>>> -{
>>> -     struct rlimit64 rlimit;
>>> -
>>> -     /*
>>> -      * Service FDs are those that most likely won't
>>> -      * conflict with any 'real-life' ones
>>> -      */
>>> -
>>> -     if (syscall(__NR_prlimit64, getpid(), RLIMIT_NOFILE, NULL,
>> &rlimit)) {
>>> -             pr_perror("Can't get rlimit");
>>> -             return -1;
>>> -     }
>>> -
>>> -     service_fd_rlim_cur = (int)rlimit.rlim_cur;
>>> -     service_fd_base = service_fd_rlim_cur;
>>> -     BUG_ON(service_fd_base < SERVICE_FD_MAX);
>>> -
>>> -     return 0;
>>> -}
>>> -
>>>  static int __get_service_fd(enum sfd_type type, int service_fd_id)
>>>  {
>>>       return service_fd_base - type - SERVICE_FD_MAX * service_fd_id;
>>> @@ -560,20 +539,20 @@ static void move_service_fd(struct pstree_item
>> *me, int type, int new_id, int ne
>>>
>>>  static int choose_service_fd_base(struct pstree_item *me)
>>>  {
>>> -     int nr, real_nr, fdt_nr = 1, id = rsti(me)->service_fd_id;
>>> +     int nr = -1, real_nr, fdt_nr = 1, id = rsti(me)->service_fd_id;
>>>
>>> -     if (rsti(me)->fdt) {
>>> -             /* The base is set by owner of fdt (id 0) */
>>> -             if (id != 0)
>>> -                     return service_fd_base;
>>> -             fdt_nr = rsti(me)->fdt->nr;
>>> +     if (me != NULL) {
>>> +             if (rsti(me)->fdt) {
>>> +                     /* The base is set by owner of fdt (id 0) */
>>> +                     if (id != 0)
>>> +                             return service_fd_base;
>>> +                     fdt_nr = rsti(me)->fdt->nr;
>>> +             }
>>> +             /* Now find process's max used fd number */
>>> +             if (!list_empty(&rsti(me)->fds))
>>> +                     nr = list_entry(rsti(me)->fds.prev,
>>> +                                     struct fdinfo_list_entry,
>> ps_list)->fe->fd;
>>>       }
>>> -     /* Now find process's max used fd number */
>>> -     if (!list_empty(&rsti(me)->fds))
>>> -             nr = list_entry(rsti(me)->fds.prev,
>>> -                             struct fdinfo_list_entry, ps_list)->fe->fd;
>>> -     else
>>> -             nr = -1;
>>>
>>>       nr = max(nr, inh_fd_max);
>>>       /*
>>> @@ -607,6 +586,27 @@ static int choose_service_fd_base(struct
>> pstree_item *me)
>>>       return nr;
>>>  }
>>>
>>> +int init_service_fd(void)
>>> +{
>>> +     struct rlimit64 rlimit;
>>> +
>>> +     /*
>>> +      * Service FDs are those that most likely won't
>>> +      * conflict with any 'real-life' ones
>>> +      */
>>> +
>>> +     if (syscall(__NR_prlimit64, getpid(), RLIMIT_NOFILE, NULL,
>> &rlimit)) {
>>> +             pr_perror("Can't get rlimit");
>>> +             return -1;
>>> +     }
>>> +
>>> +     service_fd_rlim_cur = (int)rlimit.rlim_cur;
>>> +     service_fd_base = choose_service_fd_base(NULL);
>>> +     BUG_ON(service_fd_base < SERVICE_FD_MAX);
>>> +
>>> +     return 0;
>>> +}
>>> +
>>>  int clone_service_fd(struct pstree_item *me)
>>>  {
>>>       int id, new_base, i, ret = -1;
>>> --
>>> 2.18.0.597.ga71716f1ad-goog
>>>
>>> _______________________________________________
>>> CRIU mailing list
>>> CRIU at openvz.org
>>> https://lists.openvz.org/mailman/listinfo/criu
>>
> 


More information about the CRIU mailing list