[CRIU] Options when restoring a socket
Ross Boucher
rboucher at gmail.com
Tue Apr 21 13:08:46 PDT 2015
In our case, the peer will be gone, and the restored process is in the
middle of a blocking read. As such, the ideal for us would be to receive
an error for that read (in other words, abort the connection
immediately/RST). We'd also be fine with having to specify this specific
TCP connection/file descriptor so CRIU knows to do this on restore.
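A minimal sketch of the behavior described above, a blocking read aborted by an RST, using plain loopback sockets. This is not CRIU code: the SO_LINGER(0) abortive close merely stands in for whatever the remote kernel would do when it RSTs a packet from a restored socket.

```python
import socket
import struct
import time

# Not CRIU-specific: demonstrates what a process blocked in read() observes
# when its peer aborts the connection with an RST.  Closing a socket with
# SO_LINGER(l_onoff=1, l_linger=0) makes close() send an RST instead of a
# FIN, so the other side's blocking recv() fails with ECONNRESET.

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))
srv.listen(1)

cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
cli.connect(srv.getsockname())
conn, _ = srv.accept()

# Arrange for an abortive close: RST on close() instead of orderly shutdown.
conn.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                struct.pack("ii", 1, 0))
conn.close()
time.sleep(0.1)  # let the RST reach the client over loopback

try:
    cli.recv(4096)          # the "blocking read" from the mail
    result = "no error"
except ConnectionResetError:
    result = "ECONNRESET"   # the immediate abort described above

print(result)
```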
On Tue, Apr 21, 2015 at 12:03 PM, Pavel Emelyanov <xemul at parallels.com>
wrote:
> On 04/21/2015 09:54 PM, Ross Boucher wrote:
> > The TCP restore relies on the other side of the TCP connection still
> > being open though, right?
>
> Well, no. The TCP repair we use doesn't care about it. Once we've done
> the restore, _if_ the peer is not there, the remote kernel will respond
> with RST to _any_ packet from the restored socket and the connection
> will abort.
>
> So if we're talking about the restore procedure only, then criu doesn't
> care. If we're talking about a "successful restore" as the result the
> user wants, then yes, the peer should still be alive, and criu does
> everything it can to make it so.
>
> > In my case that won't be very easy. It would be much easier for me
> > to just re-establish the connection if I could notify myself somehow
> > that the process is restored. (I'm playing around right now with
> > running a timer in another thread to try and do that...)
>
> Unix sockets are not the same as TCP, I would say. But since you're OK
> with connecting the socket back upon restore, we can teach CRIU to do
> all this for you.
>
> -- Pavel
>
> > On Tue, Apr 21, 2015 at 11:52 AM, Pavel Emelyanov <xemul at parallels.com> wrote:
> >
> > On 04/21/2015 07:46 PM, Ross Boucher wrote:
> > > Thanks, this is all very interesting. How does the story change
> > > for TCP sockets?
> >
> > For TCP we can "lock" the connection so that the peer doesn't see the
> > socket get closed. We either put an iptables rule that blocks all the
> > packets or (in case of containers) unplug the container's virtual NIC
> > from the network. So while the connection is locked we can kill the
> > socket, then create it back. And the TCP-repair thing helps us
> > "connect" the socket back w/o actually doing the connect.
> >
> > For unix sockets we don't have the ability to "lock" the connection
> > in the first place. So once we dump the task we cannot keep the peer
> > from noticing this. That was the main reason for not implementing it.
> >
> > -- Pavel
> >
> > > On Tue, Apr 21, 2015 at 5:03 AM, Pavel Emelyanov <xemul at parallels.com> wrote:
> > >
> > > On 04/19/2015 04:48 AM, Ross Boucher wrote:
> > > > Hey everyone,
> > >
> > > Hi, Ross.
> > >
> > > > I've been trying to figure out both what happens when you
> > > > checkpoint an open socket and what my options are for restoring
> > > > that socket (or maybe doing something else at that point in
> > > > time). It might be best to just describe the program I have and
> > > > what I want to accomplish.
> > > >
> > > > I have two programs communicating over a socket. Program A
> > > > opens a socket and listens for connections, and then program B
> > > > connects to it. They essentially exchange messages forever in a
> > > > pattern something like:
> > > >
> > > > A -> B send next message
> > > > B -> A ok, here's the next message
> > > >
> > > > Obviously, in between, A performs some actions. The goal is to
> > > > checkpoint A after each message is processed and before the next
> > > > is received (while leaving the process running), so that we can
> > > > restore to any previous state and reprocess possibly changed
> > > > messages.
> > >
> > > First thing that comes to mind is that the --track-mem thing
> > > definitely makes sense for such frequent C/Rs. But that's a side
> > > note :)
> > >
> > > > It's completely fine for our use case to have to re-establish
> > > > that socket connection; we don't actually need or want to try
> > > > and magically use the same socket (since program B has probably
> > > > moved on to other things in between).
> > >
> > > Hm.. OK.
> > >
> > > > Is this a use case for a criu plugin?
> > >
> > > Well, I would say it is, but there are two things about it. First
> > > is that we don't have any hooks in CRIU code for sockets, so
> > > patching will be required. And the second is -- many people are
> > > asking for handling of connected unix sockets, so I think we'd
> > > better patch criu itself to provide some sane API rather than make
> > > everybody invent their own plugins :)
> > >
> > > So, I see two options for such an API.
> > >
> > > The first is to extend --ext-unix-sk to accept a socket ID as an
> > > argument that would force even stream sockets to be marked as
> > > "external" and not block the dump. On restore, --ext-unix-sk with
> > > $ID would make CRIU connect the restored unix socket back to its
> > > original path. Optionally we can make the argument look like
> > > $ID:$PATH and connect to $PATH instead.
> > >
> > > The other option would be to still teach dump to accept the
> > > --ext-unix-sk $ID parameter. On restore you can create the
> > > connection yourself and then pass it into CRIU with the
> > > --inherit-fd argument. We already have fd inheritance for pipes
> > > and files [1], but we can patch CRIU to support it for sockets
> > > too.
> > >
> > > [1] http://criu.org/Inheriting_FDs_on_restore
> > >
> > > > I've tried playing around with the ext-unix-sk flag but I
> > > > haven't quite figured anything out yet.
> > >
> > > The --ext-unix-sk option is for datagram sockets, as they are
> > > stateless and we can just close one and re-open it back w/o any
> > > risk that the peer notices.
> > >
> > > > Any help would be appreciated. Thanks!
> > >
> > > -- Pavel
> > >
> > >
> >
> >
>
>
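The second option Pavel describes (create the connection yourself, then hand the fd to the restored task) rests on ordinary fd inheritance. A minimal sketch of that mechanism with a unix socketpair, outside of CRIU: the "restorer" below is plain Python, and only the comment's `criu restore --inherit-fd` line refers to CRIU's actual interface, per the wiki page linked above.

```python
import socket
import subprocess
import sys

# Sketch of the mechanism behind --inherit-fd (this script is illustrative,
# not CRIU's implementation): the restoring side creates the connection
# itself, then hands the already-connected fd to the "restored" process,
# which simply keeps using it by fd number -- the same way
# `criu restore --inherit-fd fd[N]:<resource>` substitutes a caller-provided
# fd for a dumped one.

parent, child_end = socket.socketpair()  # stands in for the re-established unix socket

# Hand the connected fd to a child process by number.  pass_fds keeps the
# fd open in the child at the same numeric value.
child = subprocess.Popen(
    [sys.executable, "-c",
     "import socket, sys;"
     "s = socket.socket(fileno=int(sys.argv[1]));"
     "s.sendall(b'hello from restored task')",
     str(child_end.fileno())],
    pass_fds=[child_end.fileno()],
)
child.wait()
child_end.close()

msg = parent.recv(64).decode()
print(msg)
```

The point of the design is that CRIU never needs to know how the connection was re-established; it only needs a usable fd at restore time, which is why this route sidesteps the "peer notices the dump" problem for unix sockets.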