[CRIU] Options when restoring a socket

Tue Apr 21 13:08:46 PDT 2015

In our case, the peer will be gone, and the restored process will be in the
middle of a blocking read.  So the ideal for us is to receive an error for
that read (in other words, abort the connection immediately with an RST).
We'd also be fine with having to specify the particular TCP connection/file
descriptor so that CRIU knows to do this on restore.
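
To illustrate the behavior we'd like to see, here is a minimal sketch that
does not involve CRIU at all. It assumes a loopback TCP connection, and it
assumes the peer forces an RST by closing with SO_LINGER set to a zero
timeout; the function name is made up for the example. A thread blocked in
recv() should then get a connection-reset error rather than hang.

```python
import socket
import struct
import threading
import time


def demo_rst_on_blocking_read():
    """Return "reset" if a blocked recv() fails because the peer sent an RST."""
    # Hypothetical stand-in for the peer: a loopback listener on any free port.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    port = listener.getsockname()[1]

    client = socket.create_connection(("127.0.0.1", port))
    peer, _ = listener.accept()
    listener.close()

    result = {}

    def reader():
        # Plays the role of the restored process sitting in a blocking read.
        try:
            data = client.recv(4096)
            result["outcome"] = "eof" if data == b"" else "data"
        except ConnectionResetError:
            result["outcome"] = "reset"

    thread = threading.Thread(target=reader, daemon=True)
    thread.start()
    time.sleep(0.2)  # give the reader time to block in recv()

    # l_onoff=1, l_linger=0: close() aborts the connection with an RST
    # instead of the normal FIN handshake.
    peer.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
    peer.close()

    thread.join(timeout=5)
    client.close()
    return result.get("outcome")


if __name__ == "__main__":
    print(demo_rst_on_blocking_read())
```

In the restore case the RST would come from the remote kernel, not from an
explicit close like this, but the error seen by the reader is the one we
want to handle.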

On Tue, Apr 21, 2015 at 12:03 PM, Pavel Emelyanov <xemul at parallels.com>
wrote:

> On 04/21/2015 09:54 PM, Ross Boucher wrote:
> > The tcp restore relies on the other side of the tcp connection still
> > being open though, right?
>
> Well, no. The TCP repair we use doesn't care about it. It's just that once
> we've done the restore, _if_ the peer is not there, the remote kernel will
> respond with RST to _any_ packet from the restored socket and the
> connection will abort.
>
> So if we're talking about the restore procedure only -- then criu doesn't
> care. If we're talking about a "successful restore" as the result the user
> wants, then yes, the peer should still be alive, and criu does everything
> it can to keep it that way.
>
> > In my case that won't be very easy. It would be much easier for me
> > to just re-establish the connection if I could notify myself somehow
> > that the process is restored. (I'm playing around right now with running
> > a timer in another thread to try and do that...)
>
> Unix sockets are not the same as TCP, I would say. But since you're OK
> with connecting the socket back upon restore, we can teach CRIU to do
> all this for you.
>
> -- Pavel
>
> > On Tue, Apr 21, 2015 at 11:52 AM, Pavel Emelyanov <xemul at parallels.com> wrote:
> >
> >     On 04/21/2015 07:46 PM, Ross Boucher wrote:
> >     > Thanks, this is all very interesting. How does the story change
> >     > for tcp sockets?
> >
> >     For TCP we can "lock" the connection so that the peer doesn't see that
> >     the socket gets closed. We either add an iptables rule that blocks all
> >     the packets or (in case of containers) unplug the container's virtual
> >     NIC from the network. So while the connection is locked we can kill the
> >     socket, then create it back. And the TCP-repair thing helps us "connect"
> >     the socket back w/o actually doing the connect.
> >
> >     For unix sockets we don't have the ability to "lock" the connection in
> >     the first place. So once we've dumped the task we cannot keep the peer
> >     from noticing this. This was the main reason for not implementing this.
> >
> >     -- Pavel
> >
> >     > On Tue, Apr 21, 2015 at 5:03 AM, Pavel Emelyanov <xemul at parallels.com> wrote:
> >     >
> >     >     On 04/19/2015 04:48 AM, Ross Boucher wrote:
> >     >     > Hey everyone,
> >     >
> >     >     Hi, Ross.
> >     >
> >     >     > I've been trying to figure out both what happens when you checkpoint
> >     >     > an open socket and what my options are for restoring that socket (or
> >     >     > maybe doing something else at that point in time). It might be best
> >     >     > to just describe the program I have and what I want to accomplish.
> >     >     >
> >     >     > I have two programs communicating over a socket. Program A opens a
> >     >     > socket and listens for connections, and then program B connects to
> >     >     > it. They essentially exchange messages forever in a pattern
> >     >     > something like:
> >     >     >
> >     >     > A -> B send next message
> >     >     > B -> A ok, here's the next message
> >     >     >
> >     >     > Obviously, in between, A performs some actions. The goal is to
> >     >     > checkpoint A after each message is processed and before the next is
> >     >     > received (while leaving the process running), so that we can restore
> >     >     > to any previous state and reprocess possibly changed messages.
> >     >
> >     >     The first thing that comes to mind is that the --track-mem option
> >     >     definitely makes sense for such frequent C/R-s. But that's a side note :)
> >     >
> >     >     > It's completely fine for our use case to have to re-establish that
> >     >     > socket connection, we don't actually need or want to try and
> >     >     > magically use the same socket (since program B has probably moved on
> >     >     > to other things in between).
> >     >
> >     >     Hm.. OK.
> >     >
> >     >     > Is this a use case for a criu plugin?
> >     >
> >     >     Well, I would say it is, but there are two things about it. The first
> >     >     is that we don't have any hooks in CRIU code for sockets, so patching
> >     >     will be required. And the second is -- many people are asking for
> >     >     handling of connected unix sockets, so I think we'd better patch criu
> >     >     itself to provide some sane API rather than make everybody invent
> >     >     their own plugins :)
> >     >
> >     >     So, I see two options for such an API.
> >     >
> >     >     The first is to extend --ext-unix-sk to accept a socket ID as an
> >     >     argument that would force even stream sockets to be marked as
> >     >     "external" and not block the dump. On restore, --ext-unix-sk with $ID
> >     >     would make CRIU connect the restored unix socket back to its original
> >     >     path. Optionally we can make the argument look like $ID:$PATH and
> >     >     connect to $PATH instead.
> >     >
> >     >     The other option would be to still teach dump to accept the
> >     >     --ext-unix-sk $ID parameter. On restore you can create the connection
> >     >     yourself and then pass it into CRIU with the --inherit-fd argument. We
> >     >     already have fd inheritance for pipes and files [1], but we can patch
> >     >     CRIU to support it for sockets too.
> >     >
> >     >     [1] http://criu.org/Inheriting_FDs_on_restore
> >     >
> >     >     > I've tried playing around with the ext-unix-sk flag but I haven't
> >     >     > quite figured anything out yet.
> >     >
> >     >     The --ext-unix-sk option is for datagram sockets, as they are
> >     >     stateless and we can just close one and re-open it w/o any risk that
> >     >     the peer notices.
> >     >
> >     >     > Any help would be appreciated. Thanks!
> >     >
> >     >     -- Pavel
> >     >
> >     >
> >
> >
>
>