[CRIU] Options when restoring a socket

Pavel Emelyanov xemul at parallels.com
Tue Apr 21 12:03:05 PDT 2015


On 04/21/2015 09:54 PM, Ross Boucher wrote:
> The tcp restore relies on the other side of the tcp connection still being open
> though, right?

Well, no. The TCP repair we use doesn't care about it. It's just that once we have done
the restore, _if_ the peer is no longer there, the remote kernel will respond with RST to
_any_ packet from the restored socket and the connection will abort.

So if we're talking about the restore procedure only -- then criu doesn't care. If we're
talking about a "successful restore" as the result the user wants, then yes, the peer should
still be alive, and criu does everything it can to keep it that way.
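
To make the repair-mode point a bit more concrete, here is a minimal sketch (not
CRIU's actual code) that only tries to switch a TCP socket into repair mode. The
helper name try_enter_repair_mode is made up for illustration; the option number
19 is the value of TCP_REPAIR in the Linux <linux/tcp.h> header, spelled out by hand
because Python's socket module may not export it. The kernel refuses the call
unless the caller has CAP_NET_ADMIN, so on most setups this prints False.

```python
import socket

# TCP_REPAIR option number from the Linux <linux/tcp.h> header.
# Assumption: we are on Linux; other systems will simply reject the option.
TCP_REPAIR = 19


def try_enter_repair_mode(sock):
    """Try to switch a TCP socket into repair mode.

    Returns True on success and False if the kernel refuses, which is
    what normally happens without CAP_NET_ADMIN or on a kernel that
    has no TCP repair support.
    """
    try:
        sock.setsockopt(socket.IPPROTO_TCP, TCP_REPAIR, 1)
    except OSError:
        return False
    return True


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    print("repair mode enabled:", try_enter_repair_mode(s))
    s.close()
```

Nothing in this sketch talks to the peer, which is the point made above: the
repair step itself is local, and the peer only matters once real packets flow.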

> In my case that won't be very easy. It would be much easier for me
> to just re-establish the connection if I could notify myself somehow that the
> process is restored. (I'm playing around right now with running a timer in another 
> thread to try and do that...)

Unix sockets are not the same as TCP, I would say. But since you're OK with connecting
the socket back upon restore, we can teach CRIU to do all this for you.
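
Until something like that exists in CRIU, the application-side workaround could
look roughly like the sketch below. It assumes the restored process ends up
holding a stream unix socket that no longer works and that the peer is listening
again on a known path. The function names, the retry count and the delay are all
made up for illustration; none of this is a CRIU API.

```python
import socket
import time


def connect_with_retry(path, attempts=5, delay=0.1):
    """Connect a stream unix socket to `path`, retrying a few times."""
    last_error = None
    for _ in range(attempts):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            sock.connect(path)
            return sock
        except OSError as err:
            # Peer not listening (yet): drop this socket and try again.
            last_error = err
            sock.close()
            time.sleep(delay)
    raise ConnectionError(f"could not connect to {path}") from last_error


def send_with_reconnect(sock, path, payload):
    """Send payload; if the old socket is dead, reconnect and resend.

    Returns the socket that should be used from now on.
    """
    try:
        sock.sendall(payload)
        return sock
    except OSError:
        sock.close()
        fresh = connect_with_retry(path)
        fresh.sendall(payload)
        return fresh
```

The caller keeps using whatever socket send_with_reconnect() returns, so a
stale connection gets replaced the first time a send on it fails.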

-- Pavel

> On Tue, Apr 21, 2015 at 11:52 AM, Pavel Emelyanov <xemul at parallels.com> wrote:
> 
>     On 04/21/2015 07:46 PM, Ross Boucher wrote:
>     > Thanks, this is all very interesting. How does the story change for tcp sockets?
> 
>     For TCP we can "lock" the connection so that the peer doesn't see the
>     socket getting closed. We either add an iptables rule that blocks all the
>     packets or (in the case of containers) unplug the container's virtual NIC
>     from the network. So while the connection is locked we can kill the socket,
>     then create it back. And the TCP-repair thing helps us "connect" the
>     socket back w/o actually doing the connect.
> 
>     For unix sockets we don't have the ability to "lock" the connection in the
>     first place. So once we have dumped the task we cannot keep the peer from
>     noticing this. This was the main reason for not implementing this.
> 
>     -- Pavel
> 
>     > On Tue, Apr 21, 2015 at 5:03 AM, Pavel Emelyanov <xemul at parallels.com> wrote:
>     >
>     >     On 04/19/2015 04:48 AM, Ross Boucher wrote:
>     >     > Hey everyone,
>     >
>     >     Hi, Ross.
>     >
>     >     > I've been trying to figure out both what happens when you checkpoint an open socket
>     >     > and what my options are for restoring that socket (or maybe doing something else at
>     >     > that point in time). It might be best to just describe the program I have and what
>     >     > I want to accomplish.
>     >     >
>     >     > I have two programs communicating over a socket. Program A opens a socket and listens
>     >     > for connections, and then program B connects to it. They essentially exchange messages
>     >     > forever in a pattern something like:
>     >     >
>     >     > A -> B send next message
>     >     > B -> A ok, here's the next message
>     >     >
>     >     > Obviously, in between, A performs some actions. The goal is to checkpoint A after each
>     >     > message is processed and before the next is received (while leaving the process
>     >     > running), so that we can restore to any previous state and reprocess possibly changed
>     >     > messages.
>     >
>     >     The first thing that comes to mind is that the --track-mem option definitely makes
>     >     sense for such frequent C/R-s. But that's a side note :)
>     >
>     >     > It's completely fine for our use case to have to re-establish that socket connection,
>     >     > we don't actually need or want to try and magically use the same socket (since program
>     >     > B has probably moved on to other things in between).
>     >
>     >     Hm.. OK.
>     >
>     >     > Is this a use case for a criu plugin?
>     >
>     >     Well, I would say it is, but there are two things about it. First is that we don't have any
>     >     hooks in CRIU code for sockets, so patching will be required. And the second is -- many
>     >     people are asking for handling of connected unix sockets, so I think we'd better patch
>     >     criu itself to provide some sane API rather than make everybody invent their own plugins :)
>     >
>     >     So, I see two options for such an API.
>     >
>     >     The first is to extend --ext-unix-sk to accept a socket ID as an argument that would
>     >     force even stream sockets to be marked as "external" and not block the dump. On restore
>     >     the --ext-unix-sk with $ID would make CRIU connect the restored unix socket back to its
>     >     original path. Optionally we can make the argument look like $ID:$PATH and connect to
>     >     $PATH instead.
>     >
>     >     The other option would be to still teach dump to accept the --ext-unix-sk $ID parameter. On
>     >     restore you can create the connection yourself and then pass it into CRIU with the --inherit-fd
>     >     argument. We already have fd inheritance for pipes and files [1], but we can patch CRIU
>     >     to support it for sockets too.
>     >
>     >     [1] http://criu.org/Inheriting_FDs_on_restore
>     >
>     >     > I've tried playing around with the ext-unix-sk flag but I haven't quite figured anything
>     >     > out yet.
>     >
>     >     The --ext-unix-sk is for datagram sockets as they are stateless and we can just close and
>     >     re-open one back w/o any risk that the peer notices it.
>     >
>     >     > Any help would be appreciated. Thanks!
>     >
>     >     -- Pavel
>     >
>     >
> 
> 
