[CRIU] Restore may fail due to PID number overflow

Evgenii Shatokhin eshatokhin at virtuozzo.com
Sat Feb 20 06:22:06 PST 2016


Hi,

When CRIU is used to checkpoint and then restore a number of processes 
(live zdtm tests, actually) running in their own pid namespace, restore 
fails with the following error in rare cases:

    Error (cr-restore.c:1573): Pid 300 do not match expected 32768

I am new to CRIU and cannot say right now how to fix this properly, so 
your suggestions are appreciated.

As far as I can see in the code, the problem is in
pstree.c, prepare_pstree_ids():
-------------
	/* Try to find helpers, who should be connected to the leader */
	list_for_each_entry(child, &helpers, sibling) {
		if (child->state != TASK_HELPER)
			continue;

		if (child->sid != item->sid)
			continue;

		child->pgid = item->pgid;
		child->pid.virt = ++max_pid;
		child->parent = item;
		list_move(&child->sibling, &item->children);

		pr_info("Attach %d to the task %d\n",
				child->pid.virt, item->pid.virt);

		break;
	}
-------------

max_pid may become 32768 after the increment, and this value is saved in 
child->pid.virt.

However, when that process is spawned, the OS cannot give it a PID number 
greater than or equal to the maximum (/proc/sys/kernel/pid_max contains 
32768 in that case). Instead, PID allocation wraps around and the OS gives 
it the smallest unused PID number not less than 300, as it should.

When that process executes restore_task_with_children() (cr-restore.c), 
it compares its stored and real PID numbers, sees the mismatch, and 
reports a failure:
-------------
pid = getpid();
if (current->pid.virt != pid) {
	pr_err("Pid %d do not match expected %d\n", pid,
		current->pid.virt);
	set_task_cr_err(EEXIST);
	goto err;
}
-------------

If I hack prepare_pstree_ids() as follows, the problem is gone, but, 
obviously, this is not a proper solution:
-------------
                         child->pgid = item->pgid;
-                       child->pid.virt = ++max_pid;
+
+                       max_pid++;
+                       if (max_pid == 32768)
+                               max_pid = 300;
+
+                       child->pid.virt = max_pid;
                         child->parent = item;
-------------

As for the maximum value of PIDs, one can get it from 
/proc/sys/kernel/pid_max, I suppose.
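
Reading that limit could look roughly like the sketch below. This is only 
an illustration: read_pid_max() is a hypothetical helper, not an existing 
CRIU function, and the sketch uses plain stdio rather than CRIU's own 
helpers.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper (not an existing CRIU function): store the value of
 * /proc/sys/kernel/pid_max in *out. Returns 0 on success, -1 on failure.
 */
static int read_pid_max(long *out)
{
	FILE *f;
	long val;
	int ok;

	assert(out != NULL);

	f = fopen("/proc/sys/kernel/pid_max", "r");
	if (!f)
		return -1;

	/* The file holds a single decimal number, e.g. 32768. */
	ok = (fscanf(f, "%ld", &val) == 1);
	fclose(f);
	if (!ok || val <= 0)
		return -1;

	*out = val;
	return 0;
}
```

Valid PIDs are then strictly less than the value returned, so a virtual PID 
equal to pid_max (32768 here) can never be obtained on restore.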

The tricky part is how to find the smallest unused PID number >= 300 at 
that point. Any ideas?
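
One possible approach, sketched below under assumptions: collect every ID 
already taken in the restored tree into an array, sort it, and scan for the 
first gap at or above 300. find_free_pid() and the used[] array are 
hypothetical names for illustration; how the list of used IDs would be 
gathered from the process tree is not shown and is left open here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Assumed lower bound: the value PID allocation wraps around to. */
#define RESERVED_PIDS 300

static int cmp_int(const void *a, const void *b)
{
	int x = *(const int *)a, y = *(const int *)b;

	return (x > y) - (x < y);
}

/*
 * Hypothetical helper: return the smallest PID in [min_pid, pid_max) that
 * does not occur in used[], or -1 if every PID in that range is taken.
 * Note that used[] is sorted in place.
 */
static int find_free_pid(int *used, size_t n, int min_pid, int pid_max)
{
	int candidate = min_pid;
	size_t i;

	assert(used != NULL || n == 0);

	if (n)
		qsort(used, n, sizeof(*used), cmp_int);

	for (i = 0; i < n; i++) {
		if (used[i] < candidate)
			continue;	/* below the range or a duplicate */
		if (used[i] > candidate)
			break;		/* gap found: candidate is free */
		candidate++;		/* candidate is taken, try the next */
	}

	return candidate < pid_max ? candidate : -1;
}
```

With used[] = {1, 300, 301, 303, 32767}, calling 
find_free_pid(used, 5, RESERVED_PIDS, 32768) returns 302. The sort makes 
each call O(n log n); if many helpers need PIDs, keeping the array sorted 
between calls would avoid repeating that work.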

Regards,
Evgenii

