[CRIU] Fix cpuset restore-in-root bug

Tycho Andersen tycho.andersen at canonical.com
Tue Oct 7 15:43:15 PDT 2014


Hi Pavel,

On Tue, Oct 07, 2014 at 03:58:39PM -0500, Tycho Andersen wrote:
> On Tue, Oct 07, 2014 at 11:15:34PM +0400, Pavel Emelyanov wrote:
> > On 10/07/2014 06:16 PM, Tycho Andersen wrote:
> > > Hi Pavel,
> > > 
> > > On Tue, Oct 07, 2014 at 12:58:34PM +0400, Pavel Emelyanov wrote:
> > >> On 10/06/2014 06:03 PM, Tycho Andersen wrote:
> > >>> Hi all,
> > >>>
> > >>> Based on some discussion, we shouldn't try to do any intelligent copying of any
> > >>> properties anywhere else in the tree. These cpuset properties are special only
> > >>> in the sense that they need to be restored before the move_in_cgroup call, and
> > >>> if the parent cgroups have conflicting options (or empty options), it is best
> > >>> to just let the restore fail.
> > >>
> > >> Patches applied, thanks!
> > >>
> > >>> Also, I noticed when making this patch that I am getting a lot of:
> > >>>
> > >>> (00.092575)    290: Error (util.c:101): Unable to close fd 1016: Bad file descriptor
> > >>
> > >> Ouch. Can you show the full log please?
> > > 
> > > Yes, these are with the current master:
> > > 
> > > http://paste.ubuntu.com/8514798/
> > 
> > Hm... I've run cgroup* tests on recent HEAD and no errors in logs.
> > 
> > > also, 1016 is the cg service fd, so it seems like maybe a recent
> > > reordering broke something? I can do a bisect if you take a look at
> > > the log and nothing obvious pops out.
> > 
> > Yes, please. If you would be able to shed more light on this, that
> > would be awesome.
> 
> Hmm. I tried to bisect with a patch that I know I wasn't seeing this
> problem with, but even that one has this issue now. Something about my
> environment must have changed, I guess? I will investigate more and
> keep you posted.

The patch below fixes the issue for me. Does it make sense? I have no
idea why we/I didn't see this earlier.

Tycho



>From 0f3aad8a78aee56ff7794db33cf76c5d5aae5039 Mon Sep 17 00:00:00 2001
From: Tycho Andersen <tycho.andersen at canonical.com>
Date: Tue, 7 Oct 2014 17:25:26 -0500
Subject: [PATCH] restore: don't race when closing cg yard

If CLONE_FILES is set, we only have one copy of the cg yard fd, so we shouldn't
all close it. Further, we should wait to close it until after the FORKING stage
is done, since it is needed during that stage.

If CLONE_FILES is not set, each process can safely close their copy of the fd.

Signed-off-by: Tycho Andersen <tycho.andersen at canonical.com>
---
 cr-restore.c | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/cr-restore.c b/cr-restore.c
index fd35bef..c12d28e 100644
--- a/cr-restore.c
+++ b/cr-restore.c
@@ -1461,14 +1461,6 @@ static int restore_task_with_children(void *_arg)
 	if (create_children_and_session())
 		goto err_fini_mnt;
 
-	/*
-	 * This must be done after forking to allow child
-	 * to get the cgroup fd so it can move into the
-	 * correct /tasks file if it is in a different cgroup
-	 * set than its parent
-	 */
-	close_service_fd(CGROUP_YARD);
-
 	if (restore_task_mnt_ns(current))
 		goto err_fini_mnt;
 
@@ -1480,6 +1472,15 @@ static int restore_task_with_children(void *_arg)
 	if (restore_finish_stage(CR_STATE_FORKING) < 0)
 		goto err_fini_mnt;
 
+	/*
+	 * This must be done after forking to allow child
+	 * to get the cgroup fd so it can move into the
+	 * correct /tasks file if it is in a different cgroup
+	 * set than its parent
+	 */
+	if (!(ca->clone_flags & CLONE_FILES) || current->parent == NULL)
+		close_service_fd(CGROUP_YARD);
+
 	if (current->parent == NULL && fini_mnt_ns())
 		goto err;
 
-- 
1.9.1



More information about the CRIU mailing list