[CRIU] Dumping tmux session with a client connected

Pavel Emelyanov xemul at parallels.com
Tue Apr 14 05:44:39 PDT 2015


OK, so here's the "external" socket found by CRIU:

> (00.020385) 	Ext stream not supported: ino 0xceff peer_ino 0xcf00 family    1 type    1 state  1 name (null)

In decimal these inodes are 52991 and 52992. In the ss -xup output they show up here:

> u_str  ESTAB      0      0                    * 52991                 * 52992   users:(("tmux",16447,6))
> u_str  ESTAB      0      0                    * 52992                 * 52991   users:(("tmux",16449,7))
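The hex-to-decimal conversion and the lookup in the ss output can be scripted. Below is a minimal Python sketch, assuming the exact CRIU log line and ss -xup column layout quoted above (other ss versions lay out columns differently, so the regular expressions are assumptions, not a stable format).

```python
import re
from typing import Dict, List, Tuple

# "ino 0x... peer_ino 0x..." in CRIU's "Ext stream not supported" log line
_CRIU_INO_RE = re.compile(r"\bino (0x[0-9a-fA-F]+) peer_ino (0x[0-9a-fA-F]+)")
# A u_str line of `ss -xup`: state, queues, local inode, peer inode, users:((...))
_SS_LINE_RE = re.compile(
    r"^u_str\s+\S+\s+\d+\s+\d+\s+\*\s+(\d+)\s+\*\s+(\d+)\s+users:\(\((.*)\)\)"
)
# One ("comm",pid,fd) entry; newer ss versions print pid=/fd= prefixes
_SS_USER_RE = re.compile(r'"[^"]*",(?:pid=)?(\d+),(?:fd=)?\d+')


def parse_criu_ext_sk(line: str) -> Tuple[int, int]:
    """Return (ino, peer_ino) as decimal ints from a CRIU log line."""
    m = _CRIU_INO_RE.search(line)
    if m is None:
        raise ValueError("no 'ino ... peer_ino ...' pair in line")
    return int(m.group(1), 16), int(m.group(2), 16)


def owners_by_inode(ss_lines: List[str]) -> Dict[int, List[int]]:
    """Map each local socket inode in `ss -xup` output to the PIDs holding it."""
    owners: Dict[int, List[int]] = {}
    for line in ss_lines:
        m = _SS_LINE_RE.match(line.strip())
        if m is None:
            continue  # header line or a line in some other format
        owners[int(m.group(1))] = [int(p) for p in _SS_USER_RE.findall(m.group(3))]
    return owners
```

Feeding it the log line and the two ss lines above maps inode 52991 to PID 16447 and 52992 to PID 16449, which is how the two tmux processes were identified.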

They connect the two tmux processes, 16447 and 16449. Here they are in the process tree:

> 16411 ?        Ss     0:00  \_ sshd: ac [priv]     
> 16430 ?        S      0:00  |   \_ sshd: ac@pts/3
> 16431 pts/3    Ss     0:00  |       \_ -bash
> 16447 pts/3    S+     0:00  |           \_ tmux                    <<<<< the first
> 16448 pts/3    Z+     0:00  |               \_ [tmux] <defunct>

> 16449 ?        Ss     0:00 tmux                                    <<<<< the second
> 16450 pts/4    Ss     0:00  \_ -bash
> 16464 pts/4    S+     0:00      \_ top
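The users:(...) column of ss comes from matching socket inodes against each process's open fds. The same mapping can be done by hand on Linux by scanning the /proc/<pid>/fd symlinks, which read "socket:[<inode>]" for sockets. A minimal Python sketch (Linux only; without root, other users' processes are silently skipped):

```python
import os
from typing import List


def find_socket_owners(inode: int, proc: str = "/proc") -> List[int]:
    """Return the PIDs that hold an fd pointing at socket:[inode]."""
    target = f"socket:[{inode}]"
    owners = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/net or /proc/self
        fd_dir = os.path.join(proc, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited or we lack permission to look
        for fd in fds:
            try:
                if os.readlink(os.path.join(fd_dir, fd)) == target:
                    owners.append(int(entry))
                    break  # one matching fd is enough for this PID
            except OSError:
                continue  # fd was closed between listdir() and readlink()
    return sorted(owners)
```

For example, find_socket_owners(52991) on the machine above would be expected to return the PID of the first tmux.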

You're trying to dump 16449, and CRIU correctly reports that it has an open connection
to the first one (16447). In theory this situation can be resolved, but making CRIU dump
this process takes several steps. I don't know how tmux works, though, so I will just
describe the general idea and ask you to suggest what could be done on the tmux side
to address this :)


First of all, we need to somehow tell 16447 that the connection in question will
probably (see below) go away and come back once we restore tmux. By "probably" I mean
that if we just run "criu dump", CRIU will kill 16449 and 16447 will see EOF on its
socket. If we run "criu dump --leave-running" instead, CRIU will not kill the tasks after
the dump and the connection will stay open, but then we MAY have problems restoring
them.
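What 16447 observes after a plain dump can be reproduced with a stream socketpair: once the last holder of one end dies, reads on the other end return EOF. A minimal Python sketch in which a forked child stands in for the dumped tmux server and SIGKILL stands in for CRIU killing it (an illustration of the socket semantics only, not of CRIU itself):

```python
import os
import signal
import socket
import time


def peer_killed_reads_eof() -> bytes:
    """Kill the process holding one end of a stream socketpair and return
    what the surviving end reads afterwards."""
    ours, theirs = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    pid = os.fork()
    if pid == 0:
        # Child: plays the dumped server (16449); it just holds its end.
        ours.close()
        while True:
            time.sleep(60)
    # Parent: plays the client (16447).
    theirs.close()  # now only the child keeps the peer end open
    os.kill(pid, signal.SIGKILL)  # stand-in for the dumped task being killed
    os.waitpid(pid, 0)
    data = ours.recv(1)  # b"" means EOF: the peer is gone
    ours.close()
    return data
```

With --leave-running the peer stays alive, so the same recv() would simply block instead of returning EOF.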

So this is the tmux part. As I said, I don't know how tmux works, so I cannot say for sure
what can be done about it.

On the CRIU side we will have to do two things. First, on dump, tell CRIU that the
external stream socket with the given number (or some other ID) is OK, so that CRIU just
dumps it and does not worry about the connection being lost. This can be done via a CLI
option, e.g. by extending the existing one: --ext-unix-sk $ID. I'm perfectly OK with such
a patch, with a "use at your own risk" comment :)
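A dump invocation using the proposed option might be assembled as below. This is a sketch only: -t, -D and --leave-running are existing criu dump options, while "--ext-unix-sk ID" with an argument is the extension proposed in this message, so that part is hypothetical, and the images directory is an illustrative path.

```python
from typing import List, Optional


def criu_dump_argv(pid: int, images_dir: str, ext_sk_id: Optional[int] = None,
                   leave_running: bool = False) -> List[str]:
    """Build an argv for `criu dump` of the task tree rooted at pid."""
    argv = ["criu", "dump", "-t", str(pid), "-D", images_dir]
    if leave_running:
        # Keep the tasks alive after the dump, so the peer sees no EOF.
        argv.append("--leave-running")
    if ext_sk_id is not None:
        # HYPOTHETICAL: the proposed "--ext-unix-sk $ID" form, marking this one
        # external stream socket as OK to dump despite its live peer.
        argv += ["--ext-unix-sk", str(ext_sk_id)]
    return argv
```

For the case above this would be something like criu_dump_argv(16449, "/tmp/tmux-dump", ext_sk_id=52991), using the ino from the CRIU log line as the ID.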

Second, on restore we will have to re-establish this connection. I see two options for this.

The first is to add a CLI option to CRIU, --ext-unix-sk $ID=$PATH_TO_CONNECT, so that
while restoring, CRIU calls connect() for the $ID socket with that path.
I'm also fine with such a patch :)
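The socket-level part of this first option is small: split the option value at the first "=" and connect() a fresh AF_UNIX stream socket to the path. A minimal Python sketch; the function names are hypothetical, and installing the resulting socket at the right fd inside the restored task (what CRIU itself would have to do) is not shown.

```python
import socket
from typing import Tuple


def parse_ext_sk_spec(spec: str) -> Tuple[int, str]:
    """Split an "$ID=$PATH_TO_CONNECT" value into (id, path)."""
    sk_id, sep, path = spec.partition("=")  # split at the first "=" only
    if not sep or not path:
        raise ValueError(f"expected ID=PATH, got {spec!r}")
    return int(sk_id), path


def reconnect_ext_sk(spec: str) -> Tuple[int, socket.socket]:
    """Recreate the external stream connection by connect()ing to PATH."""
    sk_id, path = parse_ext_sk_spec(spec)
    sk = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sk.connect(path)
    except OSError:
        sk.close()  # don't leak the fd if the listener is not there
        raise
    return sk_id, sk
```

Splitting only at the first "=" keeps paths that themselves contain "=" intact.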

The second option seems more generic to me, but it would require patching as well. It is
to use the --inherit-fd option. The restore steps would then look like this: the CRIU
caller opens the connection to tmux itself and does all the necessary initialization or
handshake. Then it fork()+exec()s CRIU and, via --inherit-fd $ID=$FD, tells CRIU that the
unix socket $ID is to be taken from the (inherited) fd table at number $FD. You can read
more about inheriting fds here [1]. Also note that unix sockets cannot be inherited yet,
which is why I said this would require patching too :)
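The caller side of this second option can be sketched in Python. The caller is assumed to have already connected to tmux and done the (tmux-specific, not shown) handshake. subprocess.Popen with pass_fds performs the fork()+exec() and keeps that one fd open at the same number in the child. The "--inherit-fd ID=FD" argument is the form proposed in this message and is hypothetical, as is the function name.

```python
import socket
import subprocess
from typing import Sequence


def spawn_with_inherited_socket(sock: socket.socket, sk_id: int,
                                base_argv: Sequence[str]) -> subprocess.Popen:
    """fork()+exec() base_argv with sock's fd inherited at the same number,
    telling the child which socket ID that fd stands for."""
    fd = sock.fileno()
    # HYPOTHETICAL: "--inherit-fd ID=FD" is the form proposed in this thread.
    argv = list(base_argv) + ["--inherit-fd", f"{sk_id}={fd}"]
    # pass_fds keeps exactly this fd open (same number) across exec; other
    # fds are closed because close_fds defaults to True on POSIX.
    return subprocess.Popen(argv, pass_fds=(fd,))
```

A call might look like spawn_with_inherited_socket(sk, 52991, ["criu", "restore", "-D", "/tmp/tmux-dump"]), where the images directory is illustrative. Keeping the connection setup in the caller is what makes this option generic: CRIU never needs to know how the handshake works.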

Sorry for so many words :) What do you think, can we do something like the above with tmux?

-- Pavel

[1] http://criu.org/Inheriting_FDs_on_restore

