[CRIU] Options when restoring a socket

Tue Apr 21 13:19:52 PDT 2015

On 04/21/2015 11:08 PM, Ross Boucher wrote:
> In our case, the peer will be gone, and the restored process is in the middle of a
> blocking read. As such, the ideal for us is to receive an error for that read (in
> other words, abort the connection immediately with an RST). We'd also be fine with
> having to specify this particular TCP connection/file descriptor so CRIU knows to do
> this on restore.

I see. In this case, the best that can be done on restore is a closed socket that
reports EOF (a read returning zero).
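To make the EOF case concrete, here is a minimal C sketch that simulates a connected Unix stream socket whose peer has gone away, using `socketpair()` and `close()` as stand-ins for the restored state (the helper name is hypothetical). The read on the surviving end returns 0 instead of failing.

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Simulates a connected Unix stream socket whose peer has gone away and
   returns what read() reports on the surviving end (-2 on setup failure). */
static ssize_t read_after_peer_close(void)
{
    int sv[2];
    char buf[16];
    ssize_t n;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -2;

    close(sv[1]);                /* the peer disappears */

    /* No data is queued and the peer is closed, so read() does not block:
       it returns 0 (EOF) rather than failing with an error. */
    n = read(sv[0], buf, sizeof buf);
    close(sv[0]);
    return n;
}
```

A process blocked in `read()` at that moment would be woken with the same 0 result.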

Anyway, the extended --ext-unix-sk option should handle this. The Unix socket
C/R code is in sk-unix.c; if you have questions about the code flow or logic there,
just feel free to ask :)

-- Pavel

> On Tue, Apr 21, 2015 at 12:03 PM, Pavel Emelyanov <xemul at parallels.com> wrote:
> 
>     On 04/21/2015 09:54 PM, Ross Boucher wrote:
>     > TCP restore relies on the other side of the TCP connection still being open,
>     > though, right?
> 
>     Well, no. The TCP repair we use doesn't care about it. It's just that once we've done
>     the restore, _if_ the peer is not there, the remote kernel will respond with RST to _any_
>     packet from the restored socket and the connection will abort.
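A minimal C sketch of what the restored side would then observe, assuming the RST can be reproduced on loopback by closing the accepted socket with `SO_LINGER` set to a zero timeout (the function name is hypothetical). The client's blocking `read()` fails with `ECONNRESET` rather than returning EOF.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Sets up a loopback TCP connection, aborts the server side with RST and
   returns the errno seen by the client's blocking read() (0 if read() did
   not fail, -1 if the setup itself failed). */
static int errno_after_rst(void)
{
    struct sockaddr_in addr = {0};
    socklen_t alen = sizeof addr;
    struct linger lg = { .l_onoff = 1, .l_linger = 0 };
    int lfd = -1, cfd = -1, afd = -1, result = -1;
    char buf[16];

    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                       /* let the kernel pick a port */

    lfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0 || bind(lfd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(lfd, 1) < 0 ||
        getsockname(lfd, (struct sockaddr *)&addr, &alen) < 0)
        goto out;

    cfd = socket(AF_INET, SOCK_STREAM, 0);
    if (cfd < 0 || connect(cfd, (struct sockaddr *)&addr, sizeof addr) < 0)
        goto out;
    afd = accept(lfd, NULL, NULL);
    if (afd < 0)
        goto out;

    /* SO_LINGER with a zero timeout makes close() send RST instead of FIN,
       standing in for a remote kernel that no longer knows the connection. */
    if (setsockopt(afd, SOL_SOCKET, SO_LINGER, &lg, sizeof lg) < 0)
        goto out;
    close(afd);
    afd = -1;

    /* The client's blocking read() now fails instead of returning EOF. */
    result = read(cfd, buf, sizeof buf) < 0 ? errno : 0;
out:
    if (afd >= 0)
        close(afd);
    if (cfd >= 0)
        close(cfd);
    if (lfd >= 0)
        close(lfd);
    return result;
}
```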
> 
>     So if we're talking about the restore procedure only, then CRIU doesn't care. If we're
>     talking about a "successful restore" as the result the user wants, then yes, the peer
>     should still be alive, and CRIU does everything it can to ensure that.
> 
>     > In my case that won't be very easy. It would be much easier for me
>     > to just re-establish the connection if I could notify myself somehow that the
>     > process is restored. (I'm playing around right now with running a timer in another
>     > thread to try and do that...)
> 
>     Unix sockets are not the same as TCP, I would say. But since you're OK with connecting
>     the socket back upon restore, we can teach CRIU to do all this for you.
> 
>     -- Pavel
> 
>     > On Tue, Apr 21, 2015 at 11:52 AM, Pavel Emelyanov <xemul at parallels.com> wrote:
>     >
>     >     On 04/21/2015 07:46 PM, Ross Boucher wrote:
>     >     > Thanks, this is all very interesting. How does the story change for tcp sockets?
>     >
>     >     For TCP we can "lock" the connection so that the peer doesn't see that the
>     >     socket gets closed. We either put in an iptables rule that blocks all the
>     >     packets or (in the case of containers) unplug the container's virtual NIC from
>     >     the network. While the connection is locked, we can kill the socket and then
>     >     create it again. The TCP-repair feature then lets us "connect" the socket
>     >     back without actually doing the connect.
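A heavily simplified C sketch of that "connect without connecting" step, assuming Linux and `CAP_NET_ADMIN` (without it, enabling `TCP_REPAIR` fails with `EPERM`). It only sets the send sequence number; a real restore also has to bring back the receive sequence, queued data, TCP options and window state, all omitted here. `repair_connect` is a hypothetical name, and the numeric fallbacks are the constant values from `<linux/tcp.h>`.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef TCP_REPAIR
#define TCP_REPAIR 19          /* values from <linux/tcp.h> */
#endif
#ifndef TCP_REPAIR_QUEUE
#define TCP_REPAIR_QUEUE 20
#endif
#ifndef TCP_QUEUE_SEQ
#define TCP_QUEUE_SEQ 21
#endif
#define SKETCH_SEND_QUEUE 2    /* TCP_SEND_QUEUE in <linux/tcp.h> */

/* Puts a fresh TCP socket into repair mode, sets its send sequence number
   and "connects" it to ip:port. In repair mode connect() sends no SYN; the
   socket just becomes ESTABLISHED. Returns the fd, or -errno on failure. */
static int repair_connect(const char *ip, unsigned short port, unsigned int snd_seq)
{
    struct sockaddr_in peer = {0};
    int on = 1, q = SKETCH_SEND_QUEUE, err;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -errno;
    if (setsockopt(fd, IPPROTO_TCP, TCP_REPAIR, &on, sizeof on) < 0 ||
        setsockopt(fd, IPPROTO_TCP, TCP_REPAIR_QUEUE, &q, sizeof q) < 0 ||
        setsockopt(fd, IPPROTO_TCP, TCP_QUEUE_SEQ, &snd_seq, sizeof snd_seq) < 0)
        goto fail;

    peer.sin_family = AF_INET;
    peer.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &peer.sin_addr) != 1) {
        errno = EINVAL;
        goto fail;
    }
    if (connect(fd, (struct sockaddr *)&peer, sizeof peer) < 0)
        goto fail;
    /* A full restore would now refill the queues and finally switch
       TCP_REPAIR off, after which the socket behaves normally again. */
    return fd;
fail:
    err = errno;
    close(fd);
    return -err;
}
```

Closing the descriptor while it is still in repair mode drops it silently, without a FIN or RST reaching the peer.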
>     >
>     >     For Unix sockets we don't have the ability to "lock" the connection in the
>     >     first place. So once we've dumped the task, we cannot keep the peer from
>     >     noticing. That was the main reason for not implementing this.
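A minimal C sketch of what the peer notices, assuming the dumped task's end of a connected Unix stream socket is simply closed (simulated with `socketpair()`; the function name is hypothetical). The peer's next write fails with `EPIPE`, just as a read on its side would return EOF.

```c
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns the errno a surviving peer gets when it writes to a connected
   Unix stream socket whose other end has been closed, 0 if send()
   succeeded, or -1 if the socket pair could not be created. */
static int errno_after_peer_dump(void)
{
    int sv[2];
    int result;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    close(sv[1]);   /* the dumped side goes away */

    /* MSG_NOSIGNAL reports the condition as EPIPE instead of raising SIGPIPE. */
    result = send(sv[0], "ping", 4, MSG_NOSIGNAL) < 0 ? errno : 0;
    close(sv[0]);
    return result;
}
```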
>     >
>     >     -- Pavel
>     >
>     >     > On Tue, Apr 21, 2015 at 5:03 AM, Pavel Emelyanov <xemul at parallels.com> wrote:
>     >     >
>     >     >     On 04/19/2015 04:48 AM, Ross Boucher wrote:
>     >     >     > Hey everyone,
>     >     >
>     >     >     Hi, Ross.
>     >     >
>     >     >     > I've been trying to figure out both what happens when you checkpoint an open socket
>     >     >     > and what my options are for restoring that socket (or maybe doing something else at
>     >     >     > that point in time). It might be best to just describe the program I have and what
>     >     >     > I want to accomplish.
>     >     >     >
>     >     >     > I have two programs communicating over a socket. Program A opens a socket and listens
>     >     >     > for connections, and then program B connects to it. They essentially exchange messages
>     >     >     > forever in a pattern something like:
>     >     >     >
>     >     >     > A -> B send next message
>     >     >     > B -> A ok, here's the next message
>     >     >     >
>     >     >     > Obviously, in between, A performs some actions. The goal is to checkpoint A after each
>     >     >     > message is processed and before the next is received (while leaving the process
>     >     >     > running), so that we can restore to any previous state and reprocess possibly changed
>     >     >     > messages.
>     >     >
>     >     >     The first thing that comes to mind is that --track-mem definitely makes sense
>     >     >     for such frequent C/Rs. But that's a side note :)
>     >     >
>     >     >     > It's completely fine for our use case to have to re-establish that socket connection;
>     >     >     > we don't actually need or want to try and magically use the same socket (since program
>     >     >     > B has probably moved on to other things in between).
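One way program A could re-establish the connection itself, sketched in C under the assumption that A owns the listening Unix socket and B simply reconnects. When the current connection reports EOF or `ECONNRESET` (as after a restore with the peer gone), A drops it and `accept()`s a fresh one. `listen_unix` and `next_message` are hypothetical helpers, not part of CRIU.

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/un.h>
#include <unistd.h>

/* Program A's side: a Unix stream socket listening on path. */
static int listen_unix(const char *path)
{
    struct sockaddr_un sa = {0};
    int fd;

    if (strlen(path) >= sizeof sa.sun_path)
        return -1;
    sa.sun_family = AF_UNIX;
    strcpy(sa.sun_path, path);
    unlink(path);                        /* drop a stale socket file */

    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    if (bind(fd, (struct sockaddr *)&sa, sizeof sa) < 0 || listen(fd, 8) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Reads the next message from *conn_fd. If the connection turns out to be
   dead (EOF or ECONNRESET), closes it, accepts a fresh connection on
   listen_fd and retries. Returns bytes read, or -1 on any other error. */
static ssize_t next_message(int listen_fd, int *conn_fd, char *buf, size_t len)
{
    if (len == 0)
        return 0;
    for (;;) {
        ssize_t n = read(*conn_fd, buf, len);
        if (n > 0)
            return n;
        if (n < 0 && errno == EINTR)
            continue;
        if (n < 0 && errno != ECONNRESET)
            return -1;

        close(*conn_fd);                 /* dead connection: drop it */
        *conn_fd = accept(listen_fd, NULL, NULL);
        if (*conn_fd < 0)
            return -1;
    }
}
```

This keeps the recovery entirely inside A, so it needs no restore notification at all; the trade-off is that B must retry its connect after noticing the old connection died.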
>     >     >
>     >     >     Hm.. OK.
>     >     >
>     >     >     > Is this a use case for a criu plugin?
>     >     >
>     >     >     Well, I would say it is, but there are two things about it. First, we don't have any
>     >     >     hooks in the CRIU code for sockets, so patching will be required. Second, many
>     >     >     people are asking for connected Unix sockets to be handled, so I think we'd better patch
>     >     >     CRIU itself to provide a sane API rather than make everybody invent their own plugins :)
>     >     >
>     >     >     So, I see two options for such an API.
>     >     >
>     >     >     The first is to extend --ext-unix-sk to accept a socket ID as an argument that would
>     >     >     force even stream sockets to be marked as "external" and not block the dump. On restore,
>     >     >     --ext-unix-sk with $ID would make CRIU connect the restored Unix socket back to its
>     >     >     original path. Optionally, we could make the argument look like $ID:$PATH and connect to
>     >     >     $PATH instead.
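What the first option would amount to on restore can be sketched in C, assuming the restored task expects the socket at a known descriptor number. The steps are to create a new Unix stream socket, connect it to the original path (or the `$PATH` override), and install it at that descriptor with `dup2()`. This is not CRIU code. `reconnect_at_fd` is a hypothetical name, and a real restorer would also have to reapply the descriptor's original flags.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Connects a new Unix stream socket to path and installs it as target_fd.
   Returns target_fd on success, -1 on failure. */
static int reconnect_at_fd(int target_fd, const char *path)
{
    struct sockaddr_un sa = {0};
    int fd;

    if (strlen(path) >= sizeof sa.sun_path)
        return -1;
    sa.sun_family = AF_UNIX;
    strcpy(sa.sun_path, path);

    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    if (connect(fd, (struct sockaddr *)&sa, sizeof sa) < 0) {
        close(fd);
        return -1;
    }
    if (fd != target_fd) {
        /* dup2() atomically replaces whatever target_fd referred to. */
        if (dup2(fd, target_fd) < 0) {
            close(fd);
            return -1;
        }
        close(fd);
    }
    return target_fd;
}
```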
>     >     >
>     >     >     The other option would be to still teach dump to accept the --ext-unix-sk $ID parameter. On
>     >     >     restore, you can create the connection yourself and then pass it into CRIU with the --inherit-fd
>     >     >     argument. We already have fd inheritance for pipes and files [1], and we can patch CRIU
>     >     >     to support it for sockets too.
>     >     >
>     >     >     [1] http://criu.org/Inheriting_FDs_on_restore
>     >     >
>     >     >     > I've tried playing around with the ext-unix-sk flag but I haven't quite figured anything
>     >     >     > out yet.
>     >     >
>     >     >     The --ext-unix-sk option is for datagram sockets, as they are stateless and we can just close
>     >     >     one and re-open it without any risk of the peer noticing.
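A minimal C sketch of why that is safe for datagram sockets, assuming a bound receiver at a filesystem path (the path and function name are hypothetical). A sender can close its socket and open a brand-new one, and the receiver keeps getting datagrams, because there is no connection state to lose.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Sends one datagram to path from a brand-new unbound socket, then closes
   it, as a dumped-and-restored sender would. Returns 0 on success. */
static int send_from_fresh_socket(const char *path, const char *msg)
{
    struct sockaddr_un sa = {0};
    int fd, rc;

    if (strlen(path) >= sizeof sa.sun_path)
        return -1;
    sa.sun_family = AF_UNIX;
    strcpy(sa.sun_path, path);

    fd = socket(AF_UNIX, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;
    rc = sendto(fd, msg, strlen(msg), 0, (struct sockaddr *)&sa, sizeof sa) < 0 ? -1 : 0;
    close(fd);
    return rc;
}
```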
>     >     >
>     >     >     > Any help would be appreciated. Thanks!
>     >     >
>     >     >     -- Pavel