[CRIU] [PATCH 4/8] memory: don't use parent memdump if detected possible pid reuse

Dmitry Safonov 0x7f454c46 at gmail.com
Fri Feb 9 20:01:54 MSK 2018


2018-02-09 16:06 GMT+00:00 Pavel Tikhomirov <ptikhomirov at virtuozzo.com>:
> We have a problem: when a pid is reused between consecutive dumps, we
> can't tell whether the pagemap and pages from the parent dump's images
> are still valid for restoring this pid. That can even lead to the wrong
> memory being restored for this pid; see the test in the last patch.
>
> So this is an attempt to separate processes with a (likely) invalid
> previous memory dump from processes with a 100% valid previous dump.
>
> For that we use the value of /proc/<pid>/stat's start_time and also the
> timestamp of each (pre)dump. If the start time is strictly less than the
> timestamp, the pagemap for this pid from the previous dump is valid: it
> was taken for exactly the same process.
>
> Creation time is in centiseconds by default so if predump is really fast
> (<1csec) we can have false negative decisions for some processes, but in
> case of long running processes we are fine.
>
> https://jira.sw.ru/browse/PSBM-67502
>
> Signed-off-by: Pavel Tikhomirov <ptikhomirov at virtuozzo.com>
> ---
>  criu/mem.c | 37 ++++++++++++++++++++++++++++++++++++-
>  1 file changed, 36 insertions(+), 1 deletion(-)
>
> diff --git a/criu/mem.c b/criu/mem.c
> index 4c6942a11..355c992c7 100644
> --- a/criu/mem.c
> +++ b/criu/mem.c
> @@ -30,9 +30,11 @@
>  #include "fault-injection.h"
>  #include "prctl.h"
>  #include <compel/compel.h>
> +#include "proc_parse.h"
>
>  #include "protobuf.h"
>  #include "images/pagemap.pb-c.h"
> +#include "images/stats.pb-c.h"
>
>  static int task_reset_dirty_track(int pid)
>  {
> @@ -303,6 +305,7 @@ static int __parasite_dump_pages_seized(struct pstree_item *item,
>         int ret = -1;
>         unsigned cpp_flags = 0;
>         unsigned long pmc_size;
> +       bool possible_pid_reuse = false;
>
>         if (opts.check_only)
>                 return 0;
> @@ -360,6 +363,38 @@ static int __parasite_dump_pages_seized(struct pstree_item *item,
>                         xfer.parent = NULL + 1;
>         }
>
> +       if (xfer.parent) {
> +               struct proc_pid_stat pps_buf;
> +               StatsEntry *stats = NULL;
> +               unsigned long dump_ticks;
> +               unsigned long clock_ticks;
> +
> +               clock_ticks = sysconf(_SC_CLK_TCK);
> +               if (clock_ticks == -1) {
> +                       pr_perror("Failed to get clock ticks via sysconf");
> +                       goto out_xfer;
> +               }
> +
> +               ret = parse_pid_stat(item->pid->real, &pps_buf);
> +               if (ret < 0)
> +                       goto out_xfer;
> +
> +               ret = get_parent_stats((void**)&stats);
> +               if (ret < 0)
> +                       goto out_xfer;
> +               dump_ticks = stats->dump->dump_uptime/(USEC_PER_SEC / clock_ticks);
> +               stats_entry__free_unpacked(stats, NULL);
> +
> +               if (pps_buf.start_time >= dump_ticks) {
> +                       pr_warn("Detected possible pid reuse pid=%d, " \
> +                               "start_time=%llu, parent's dump_uptime=%lu\n",
> +                               item->pid->real, pps_buf.start_time,
> +                               dump_ticks);
> +                       possible_pid_reuse = true;

What is the meaning of this warning in the logs?
Can we separate the two cases:
1. pps_buf.start_time > dump_ticks
    Real pid reuse: silently re-dump the pid.
2. pps_buf.start_time == dump_ticks
    Warn that reuse is possible and re-dump.

For (2) we may really be interested in how often it happens:
if it happens too often, we might want to improve the
detection, e.g. by inserting a 1csec delay before saving
the uptime.
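
Something like the sketch below is what I have in mind for the split.
It is only an illustration: the enum, classify_pid_reuse() and
usec_to_ticks() are hypothetical names that do not exist in the patch
or in CRIU, and the conversion assumes clock_ticks is the positive
value returned by sysconf(_SC_CLK_TCK).

```c
#include <assert.h>
#include <stdint.h>

#define USEC_PER_SEC 1000000ULL

/* Hypothetical result type, not part of the patch. */
enum pid_reuse {
	PID_REUSE_NONE,     /* start_time <  dump_ticks: parent dump is valid */
	PID_REUSE_POSSIBLE, /* start_time == dump_ticks: same tick, cannot tell */
	PID_REUSE_DETECTED, /* start_time >  dump_ticks: task started after the dump */
};

/*
 * Convert an uptime in microseconds to clock ticks, the same way the
 * patch computes dump_ticks. clock_ticks is assumed to come from
 * sysconf(_SC_CLK_TCK), already checked for errors by the caller.
 */
static uint64_t usec_to_ticks(uint64_t usec, uint64_t clock_ticks)
{
	assert(clock_ticks > 0 && clock_ticks <= USEC_PER_SEC);
	return usec / (USEC_PER_SEC / clock_ticks);
}

/* Split the single ">=" check into the two cases discussed above. */
static enum pid_reuse classify_pid_reuse(uint64_t start_time,
					 uint64_t dump_ticks)
{
	if (start_time > dump_ticks)
		return PID_REUSE_DETECTED;
	if (start_time == dump_ticks)
		return PID_REUSE_POSSIBLE;
	return PID_REUSE_NONE;
}
```

With that, only PID_REUSE_POSSIBLE would need a pr_warn(), while both
PID_REUSE_POSSIBLE and PID_REUSE_DETECTED would set possible_pid_reuse.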

-- 
             Dmitry

