<div dir="ltr">
<p class=""><span class="">In our case, the peer will be gone and the restored process will be in the middle of a blocking read. Ideally, that read would return an error (in other words, the connection would be aborted immediately, as with an RST). We'd also be fine with having to name the specific TCP connection or file descriptor so that CRIU knows to do this on restore.</span></p></div><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Apr 21, 2015 at 12:03 PM, Pavel Emelyanov <span dir="ltr"><<a href="mailto:xemul@parallels.com" target="_blank">xemul@parallels.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">On 04/21/2015 09:54 PM, Ross Boucher wrote:<br>
> The tcp restore relies on the other side of the tcp connection still being open<br>
> though, right?<br>
<br>
</span>Well, no. The TCP repair we use doesn't care about it. It's just that once we've done the restore,<br>
_if_ the peer is not there, the remote kernel will respond with RST to _any_<br>
packet from the restored socket and the connection will abort.<br>
<br>
So if we're talking about the restore procedure only -- then criu doesn't care. If we're<br>
talking about a "successful restore" as the result the user wants, then yes, the peer should<br>
still be alive and criu does everything it can to keep it that way.<br>
<span class=""><br>
> In my case that won't be very easy. It would be much easier for me<br>
> to just re-establish the connection if I could notify myself somehow that the<br>
> process is restored. (I'm playing around right now with running a timer in another<br>
> thread to try and do that...)<br>
<br>
</span>Unix sockets are not the same as TCP, I would say. But since you're OK with connecting<br>
the socket back upon restore, we can teach CRIU to do all this for you.<br>
<span class="HOEnZb"><font color="#888888"><br>
-- Pavel<br>
</font></span><span class="im HOEnZb"><br>
> On Tue, Apr 21, 2015 at 11:52 AM, Pavel Emelyanov <<a href="mailto:xemul@parallels.com">xemul@parallels.com</a>> wrote:<br>
><br>
> On 04/21/2015 07:46 PM, Ross Boucher wrote:<br>
> > Thanks, this is all very interesting. How does the story change for tcp sockets?<br>
><br>
> For TCP we can "lock" the connection so that the peer doesn't see the<br>
> socket get closed. We either add an iptables rule that blocks all the<br>
> packets or (in the case of containers) unplug the container's virtual NIC from<br>
> the network. So while the connection is locked we can kill the socket,<br>
> then create it again. And the TCP-repair thing helps us "connect" the<br>
> socket back w/o actually doing the connect.<br>
><br>
> For unix sockets we don't have the ability to "lock" the connection in the<br>
> first place. So once we've dumped the task we cannot keep the peer from noticing<br>
> this. That was the main reason for not implementing it.<br>
><br>
> -- Pavel<br>
><br>
</span><div class="HOEnZb"><div class="h5">> > On Tue, Apr 21, 2015 at 5:03 AM, Pavel Emelyanov <<a href="mailto:xemul@parallels.com">xemul@parallels.com</a>> wrote:<br>
> ><br>
> > On 04/19/2015 04:48 AM, Ross Boucher wrote:<br>
> > > Hey everyone,<br>
> ><br>
> > Hi, Ross.<br>
> ><br>
> > > I've been trying to figure out both what happens when you checkpoint an open socket<br>
> > > and what my options are for restoring that socket (or maybe doing something else at<br>
> > > that point in time). It might be best to just describe the program I have and what<br>
> > > I want to accomplish.<br>
> > ><br>
> > > I have two programs communicating over a socket. Program A opens a socket and listens<br>
> > > for connections, and then program B connects to it. They essentially exchange messages<br>
> > > forever in a pattern something like:<br>
> > ><br>
> > > A -> B send next message<br>
> > > B -> A ok, here's the next message<br>
> > ><br>
> > > Obviously, in between, A performs some actions. The goal is to checkpoint A after each<br>
> > > message is processed and before the next is received (while leaving the process<br>
> > > running), so that we can restore to any previous state and reprocess possibly changed<br>
> > > messages.<br>
> ><br>
> > First thing that comes to mind is that --track-mem thing definitely makes sense for<br>
> > such frequent C/R-s. But that's a side note :)<br>
> ><br>
> > > It's completely fine for our use case to have to re-establish that socket connection,<br>
> > > we don't actually need or want to try and magically use the same socket (since program<br>
> > > B has probably moved on to other things in between).<br>
> ><br>
> > Hm.. OK.<br>
> ><br>
> > > Is this a use case for a criu plugin?<br>
> ><br>
> > Well, I would say it is, but there are two things about it. First is that we don't have any<br>
> > hooks in CRIU code for sockets, so patching will be required. And the second is -- many<br>
> > people are asking for handling of connected unix sockets, so I think we'd better patch<br>
> > criu itself to provide some sane API rather than make everybody invent their own plugins :)<br>
> ><br>
> > So, I see two options for such an API.<br>
> ><br>
> > The first is to extend --ext-unix-sk to accept a socket ID as an argument that would<br>
> > force even stream sockets to be marked as "external" and not block the dump. On restore,<br>
> > --ext-unix-sk with $ID would make CRIU connect the restored unix socket back to its<br>
> > original path. Optionally we can make the argument look like $ID:$PATH and connect to<br>
> > $PATH instead.<br>
> ><br>
> > The other option would be to still teach dump to accept the --ext-unix-sk $ID parameter. On<br>
> > restore you can create the connection yourself and then pass it into CRIU with the --inherit-fd<br>
> > argument. We already have fd inheritance for pipes and files [1], but we can patch CRIU<br>
> > to support it for sockets too.<br>
> ><br>
> > [1] <a href="http://criu.org/Inheriting_FDs_on_restore" target="_blank">http://criu.org/Inheriting_FDs_on_restore</a><br>
> ><br>
> > > I've tried playing around with the ext-unix-sk flag but I haven't quite figured anything<br>
> > > out yet.<br>
> ><br>
> > The --ext-unix-sk option is for datagram sockets, as they are stateless and we can just close and<br>
> > re-open one w/o any risk that the peer notices it.<br>
> ><br>
> > > Any help would be appreciated. Thanks!<br>
> ><br>
> > -- Pavel<br>
> ><br>
> ><br>
><br>
><br>
<br>
</div></div></blockquote></div><br></div>
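<p>For reference, the abort-on-read behaviour asked for at the top of this message can be reproduced outside CRIU. The sketch below is only an illustration under stated assumptions, not anything CRIU does: it assumes Linux and a loopback TCP connection, the function name <code>blocking_read_after_peer_abort</code> is made up for this example, and the peer forces an RST by closing with <code>SO_LINGER</code> set to a zero timeout. It shows a reader waiting in <code>recv()</code> getting an error rather than data or a clean end-of-file.</p>
<pre>
```python
import socket
import struct
import threading


def blocking_read_after_peer_abort():
    """Report what a reader waiting in recv() observes when its peer aborts."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    listener.listen(1)

    client = socket.create_connection(listener.getsockname())
    peer, _ = listener.accept()
    listener.close()

    # Assumption (Linux struct linger layout): l_onoff=1, l_linger=0 makes
    # close() abort the connection with RST instead of sending a normal FIN.
    peer.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))

    # Abort the peer shortly after the client has started waiting for data.
    timer = threading.Timer(0.2, peer.close)
    timer.start()
    try:
        # Safety net so the sketch cannot hang forever if the RST never arrives.
        client.settimeout(5)
        data = client.recv(4096)
        return "eof" if data == b"" else "data"
    except ConnectionResetError:
        return "ConnectionResetError"
    except socket.timeout:
        return "timeout"
    finally:
        timer.join()
        client.close()


if __name__ == "__main__":
    print(blocking_read_after_peer_abort())
```
</pre>
<p>A reader that sees this error can treat it as the signal to re-establish the connection, which is the outcome wanted here after restore.</p>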