[CRIU] Options when restoring a socket

Tue Apr 21 11:52:26 PDT 2015

On 04/21/2015 07:46 PM, Ross Boucher wrote:
> Thanks, this is all very interesting. How does the story change for tcp sockets?

For TCP we can "lock" the connection so that the peer doesn't see the
socket being closed. We either add an iptables rule that blocks all the
packets or (in the case of containers) unplug the container's virtual NIC
from the network. So while the connection is locked we can kill the socket
and then create it again. And the TCP-repair thing helps us "connect" the
new socket back w/o actually doing the connect.

For unix sockets we don't have the ability to "lock" the connection in the
first place. So once we have dumped the task, we cannot keep the peer from
noticing that the socket was closed. This was the main reason for not
implementing this.
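
To illustrate the difference, here is a minimal Python sketch (not CRIU code;
the function names and the socket path are made up for the example). It shows
that the peer of a connected unix stream socket sees EOF as soon as the other
end is closed, while the peer of a datagram socket (the --ext-unix-sk case
from the quoted message below) only ever sees the datagrams, even if the
sending socket is closed and re-created in between.

```python
import os
import socket
import tempfile


def stream_peer_after_close():
    """Close one end of a connected unix stream pair; return what the peer reads."""
    a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    a.close()                 # stands in for the dumped task's socket going away
    b.settimeout(2)
    data = b.recv(16)         # an empty result means the peer saw EOF
    b.close()
    return data


def datagram_peer_after_reopen():
    """Send from two successive datagram sockets; return what the peer reads."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "peer.sock")   # hypothetical path
        peer = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
        peer.bind(path)
        peer.settimeout(2)
        received = []
        for payload in (b"one", b"two"):
            # A fresh socket each time stands in for close + re-open on restore.
            task = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
            task.connect(path)
            task.send(payload)
            task.close()
            received.append(peer.recv(16))
        peer.close()
        return received


if __name__ == "__main__":
    print(stream_peer_after_close())
    print(datagram_peer_after_reopen())
```

In the stream case the empty read is exactly the notification we cannot
suppress; in the datagram case the peer has no per-sender connection state
that could break.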

-- Pavel

> On Tue, Apr 21, 2015 at 5:03 AM, Pavel Emelyanov <xemul at parallels.com> wrote:
> 
>     On 04/19/2015 04:48 AM, Ross Boucher wrote:
>     > Hey everyone,
> 
>     Hi, Ross.
> 
>     > I've been trying to figure out both what happens when you checkpoint an open socket
>     > and what my options are for restoring that socket (or maybe doing something else at
>     > that point in time). It might be best to just describe the program I have and what
>     > I want to accomplish.
>     >
>     > I have two programs communicating over a socket. Program A opens a socket and listens
>     > for connections, and then program B connects to it. They essentially exchange messages
>     > forever in a pattern something like:
>     >
>     > A -> B send next message
>     > B -> A ok, here's the next message
>     >
>     > Obviously, in between, A performs some actions. The goal is to checkpoint A after each
>     > message is processed and before the next is received (while leaving the process
>     > running), so that we can restore to any previous state and reprocess possibly changed
>     > messages.
> 
>     The first thing that comes to mind is that the --track-mem option definitely makes sense
>     for such frequent C/R-s. But that's a side note :)
> 
>     > It's completely fine for our use case to have to re-establish that socket connection,
>     > we don't actually need or want to try and magically use the same socket (since program
>     > B has probably moved on to other things in between).
> 
>     Hm.. OK.
> 
>     > Is this a use case for a criu plugin?
> 
>     Well, I would say it is, but there are two things about it. The first is that we don't have any
>     hooks in the CRIU code for sockets, so patching will be required. The second is that many
>     people are asking for handling of connected unix sockets, so I think we'd better patch
>     criu itself to provide some sane API rather than make everybody invent their own plugins :)
> 
>     So, I see two options for such an API.
> 
>     The first is to extend the --ext-unix-sk option to accept a socket ID as an argument, which
>     would force even stream sockets to be marked as "external" and not block the dump. On restore,
>     --ext-unix-sk with $ID would make CRIU connect the restored unix socket back to its
>     original path. Optionally, we can make the argument look like $ID:$PATH and connect to
>     $PATH instead.
> 
>     The other option would be to still teach dump to accept the --ext-unix-sk $ID parameter. On
>     restore you can create the connection yourself and then pass it into CRIU with the --inherit-fd
>     argument. We already have fd inheritance for pipes and files [1], but we can patch CRIU
>     to support it for sockets too.
> 
>     [1] http://criu.org/Inheriting_FDs_on_restore
> 
>     > I've tried playing around with the ext-unix-sk flag but I haven't quite figured anything
>     > out yet.
> 
>     The --ext-unix-sk option is for datagram sockets: they are stateless, so we can just close
>     one and re-open it w/o any risk that the peer notices it.
> 
>     > Any help would be appreciated. Thanks!
> 
>     -- Pavel
> 
>