[Devel] [PATCH] spfs: Main process wakes and kills its children on exit

Stanislav Kinsburskiy skinsbursky at virtuozzo.com
Tue Jan 23 13:24:15 MSK 2018



23.01.2018 11:19, Kirill Tkhai пишет:
> On 23.01.2018 13:18, Stanislav Kinsburskiy wrote:
>>
>>
>> 23.01.2018 11:17, Kirill Tkhai пишет:
>>> On 23.01.2018 13:14, Stanislav Kinsburskiy wrote:
>>>>
>>>>
>>>> 23.01.2018 11:09, Kirill Tkhai пишет:
>>>>> On 23.01.2018 13:07, Stanislav Kinsburskiy wrote:
>>>>>>
>>>>>>
>>>>>> 23.01.2018 10:51, Kirill Tkhai пишет:
>>>>>>> On 23.01.2018 12:48, Stanislav Kinsburskiy wrote:
>>>>>>>> Please, see a couple of nits below
>>>>>>>>
>>>>>>>> 23.01.2018 10:41, Kirill Tkhai пишет:
>>>>>>>>> Stanislav Kinsburskiy says:
>>>>>>>>>
>>>>>>>>> "SPFS manager has a special "--exit-with-spfs" options, which is used by CRIU.
>>>>>>>>>  The idea of the option is simple: force SPFS manager to exit, when it has some
>>>>>>>>>  SPFS processes among its children (i.e. spfs was mounted at least once),
>>>>>>>>>  but all these processes have exited for whatever reason (which usually happens
>>>>>>>>>  when restore has failed and spfs mounts where unmounted).
>>>>>>>>>  Although it works in overall (main SPFS manager process exits), its children
>>>>>>>>>  (responsible to SPFS replacement) may wait on FUTEX for "release" command
>>>>>>>>>  for corresponding SPFS mount and thus never stop until they are killed".
>>>>>>>>>
>>>>>>>>> 1 spfs-manager
>>>>>>>>> 2   \_ spfs
>>>>>>>>> 3   \_ spfs-manager
>>>>>>>>> 4   \_ spfs
>>>>>>>>> 5   \_ spfs-manager
>>>>>>>>>
>>>>>>>>> 2 and 3 are pair of a mount, and 4 and 5 are pair of another mount.
>>>>>>>>> The patch makes spfs-manager 1 kill 3 in case of 2 exited.
>>>>>>>>>
>>>>>>>>> https://jira.sw.ru/browse/PSBM-80055
>>>>>>>>>
>>>>>>>>> Signed-off-by: Kirill Tkhai <ktkhai at virtuozzo.com>
>>>>>>>>> ---
>>>>>>>>>  manager/context.c |    4 ++++
>>>>>>>>>  manager/spfs.c    |    1 +
>>>>>>>>>  2 files changed, 5 insertions(+)
>>>>>>>>>
>>>>>>>>> diff --git a/manager/context.c b/manager/context.c
>>>>>>>>> index 1eb37c9..4464a23 100644
>>>>>>>>> --- a/manager/context.c
>>>>>>>>> +++ b/manager/context.c
>>>>>>>>> @@ -53,6 +53,9 @@ static void cleanup_spfs_mount(struct spfs_manager_context_s *ctx,
>>>>>>>>>  		/* SPFS master was killed. We need to release the reference */
>>>>>>>>>  		spfs_release_mnt(info);
>>>>>>>>>  
>>>>>>>>> +	if (killed || WEXITSTATUS(status))
>>>>>>>>> +		kill(info->replacer, SIGKILL);
>>>>>>>>> +
>>>>>>>>
>>>>>>>> There is "if (killed)" check above.
>>>>>>>> Could you please move this hunk there?
>>>>>>>
>>>>>>> There is logical OR (||) in the hunk. How should I move it to unconditional check?
>>>>>>>
>>>>>>
>>>>>> Ah, ok. Then let the check be as it is.
>>>>>> Could you please add warning message for this kill with the process pid being killed and the reason why (spfs was killed of exited with error)?
>>>>>
>>>>> Maybe we should call spfs_release_mnt() in case of exit status != 0 too?
>>>>>
>>>>
>>>> Well, this makes sense.
>>>> If spfs exited with non-zero result (either it was killed or exited due to some error), then there is no need in replacer. And there is no need in the reference.
>>>> So, probably this check you proposed should be used for both spfs_release_mnt(info) and kill(info->replacer, SIGKILL).
>>>
>>> Are you OK with this change sent in the only patch, or should we use one more patch?
>>>
>>
>> I think one patch is OK. Only mention the change explicitly in the patch description, please.
> 
> Ok, then what about spfs_cleanup_env(info, killed)? It also has killed argument
> 

Well, this is also true...
Maybe the right thing here would be to have 2 patches, and the first one should be something like this (also rename variable to reflect the change):


diff --git a/manager/context.c b/manager/context.c
index 1eb37c9..756971b 100644
--- a/manager/context.c
+++ b/manager/context.c
@@ -45,12 +45,12 @@ const char *mgr_ovz_id(void)
 static void cleanup_spfs_mount(struct spfs_manager_context_s *ctx,
                               struct spfs_info_s *info, int status)
 {
-       bool killed = WIFSIGNALED(status);
+       bool failed = WIFSIGNALED(status) || WEXITSTATUS(status);
 
        pr_debug("removing info %s from the list\n", info->mnt.id);
 
-       if (killed)
-               /* SPFS master was killed. We need to release the reference */
+       if (failed)
+               /* SPFS master has failed. We need to release the reference */
                spfs_release_mnt(info);
 
        info->dead = true;
@@ -59,7 +59,7 @@ static void cleanup_spfs_mount(struct spfs_manager_context_s *ctx,
        if (unlink(info->socket_path))
                pr_perror("failed to unlink %s", info->socket_path);
 
-       spfs_cleanup_env(info, killed);
+       spfs_cleanup_env(info, failed);
 
        close_namespaces(info->ns_fds);
 }
diff --git a/manager/spfs.c b/manager/spfs.c
index 3e0f667..99845b1 100644
--- a/manager/spfs.c
+++ b/manager/spfs.c
@@ -409,9 +409,9 @@ int spfs_prepare_env(struct spfs_info_s *info, const char *proxy_dir)
        return err ? err : res;
 }
 
-static int __spfs_cleanup_env(struct spfs_info_s *info, bool killed)
+static int __spfs_cleanup_env(struct spfs_info_s *info, bool failed)
 {
-       if (killed && umount(info->work_dir)) {
+       if (failed && umount(info->work_dir)) {
                pr_perror("failed to umount %s", info->work_dir);
                return -errno;
        }
@@ -423,7 +423,7 @@ static int __spfs_cleanup_env(struct spfs_info_s *info, bool killed)
        return 0;
 }
 
-int spfs_cleanup_env(struct spfs_info_s *info, bool killed)
+int spfs_cleanup_env(struct spfs_info_s *info, bool failed)
 {
        int err, res;
        unsigned orig_ns_mask;
@@ -432,7 +432,7 @@ int spfs_cleanup_env(struct spfs_info_s *info, bool killed)
        if (res)
                return res;
 
-       err = __spfs_cleanup_env(info, killed);
+       err = __spfs_cleanup_env(info, failed);
 
        res = leave_spfs_context(info, orig_ns_mask);
 




More information about the Devel mailing list