[CRIU] [PATCH] restore: Serialize access to last_pid
Andrey Vagin
avagin at gmail.com
Sat Nov 16 21:10:14 MSK 2019
On Fri, Nov 15, 2019 at 05:33:34PM +0300, Cyrill Gorcunov wrote:
> When we do clone threads in a later stage of
> restore procedure it may race with helpers
> which do call clone_noasan by self.
>
> In particular when restoring mount points
> we do use try_remount_writable which is called
> via call_helper_process and produce own fork.
>
> Thus to eliminate the race which happens that
> later lets allocate clone_noasan context via
> shared memory on restore and then once
> processes are forked we pass @last_pid_mutex
> to the clone_noasan engine and all subsequent
> calls to clone_noasan will be serialized with
> threads creation.
>
> To be honest I have no clue right now how to
> provide some kind of test for this stuff. Our
> QA team reported the issue as
>
> | (07.398000) pie: 20768: Error (criu/pie/restorer.c:1823): Unable to create a thread 20770: 20775
> | (07.398013) pie: 20768: Error (criu/pie/restorer.c:1828): Reread last_pid 20775
>
> where we wrote 20769 into last_pid inside pie thread
> restore , it failed and re-reading the last_pid returned
> 20775 instead of expecting value, which means there is
> other forks happen inbetween without taking last_pid
> lock.
>
> Also I'm in big doubt this architectural solution, don't
> like it much, it was rather a fast fix which address
> the problem. So any other ideas are welcome.
>
> Signed-off-by: Cyrill Gorcunov <gorcunov at gmail.com>
> ---
> criu/clone-noasan.c | 63 ++++++++++++++++++++++++++++++++++++-
> criu/cr-restore.c | 6 ++++
> criu/include/clone-noasan.h | 5 +++
> 3 files changed, 73 insertions(+), 1 deletion(-)
>
> diff --git a/criu/clone-noasan.c b/criu/clone-noasan.c
> index 5ca280eb8386..627d924d5346 100644
> --- a/criu/clone-noasan.c
> +++ b/criu/clone-noasan.c
> @@ -1,8 +1,65 @@
> #include <sched.h>
> +#include <sys/mman.h>
> #include "common/compiler.h"
> +#include "clone-noasan.h"
> #include "log.h"
> #include "common/bug.h"
>
> +#undef LOG_PREFIX
> +#define LOG_PREFIX "clone_noasan: "
> +
> +static struct {
> + mutex_t op_mutex;
> + mutex_t *clone_mutex;
> +} *context;
> +
> +int clone_noasan_init(void)
> +{
> + context = mmap(NULL, sizeof(*context), PROT_READ | PROT_WRITE,
> + MAP_ANONYMOUS | MAP_SHARED, -1, 0);
> + if (context == MAP_FAILED) {
> + pr_perror("Can't allocate context");
> + return -1;
> + }
> +
> + mutex_init(&context->op_mutex);
> + return 0;
> +}
> +
> +void clone_noasan_fini(void)
> +{
> + munmap(context, sizeof(*context));
> + context = NULL;
> +}
> +
> +static inline void context_lock(void)
> +{
> + if (context && context->clone_mutex)
> + mutex_lock(context->clone_mutex);
> +}
> +
> +static inline void context_unlock(void)
> +{
context_lock has to return whether a mutex has been locked or not:
locked = context_lock();
....
context_unlock(locked)l
otherwise you can call mutex_unlock for the unlocked mutex or unlock a
mutex which has been locked by someone else.
Actually, all this looks like overdesign...
I think we don't need context and another shared vma.
static struct *last_pid_mutex;
void clone_noasan_init(struct *last_pid_mu) {
last_pid_mutex = last_pid_mu;
}
and then call clone_noasan_init before forking the root task. Hmm?
More information about the CRIU
mailing list