[CRIU] Process Migration Using Sockets - PATCH

Tue Sep 22 05:13:50 PDT 2015

On 09/22/2015 02:04 AM, Rodrigo Bruno wrote:
> Hi, 
> 
> Sorry for taking so long to reply. Please see my answers inline.
> 
> On Mon, 14 Sep 2015 13:41:43 +0300
> Pavel Emelyanov <xemul at parallels.com> wrote:
> 
>> On 09/11/2015 06:38 PM, Rodrigo Bruno wrote:
>>>
>>>>
>>>>
>>>>>> Can you then describe your protocol in more detail? What I don't quite understand
>>>>>> is how you mix text info with binary data (images and pages) and how you define
>>>>>> borders between objects.
>>>>>
>>>>> Yes. Two communications can exist: writing and reading remote images. They both use the
>>>>> same protocol:
>>>>>
>>>>> 1. open socket on read or write port (both cache and proxy have one of each)
>>>>> 2. write image pathname (32 bytes)
>>>>> 3. write image namespace (32 bytes)
>>>>> 4. read image pathname (32 bytes)
>>>>> 5. read image namespace (32 bytes)
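
For illustration, the two fixed-width fields written in steps 2 and 3 could be packed as below. This is a minimal sketch, not the code from the patch: the NUL padding, the function names, and the example image name are assumptions made here; only the 32-byte field width comes from the description above.

```python
import struct

NAME_LEN = 32  # fixed field width, as stated in the protocol description

def pack_fixed(name: str, namespace: str) -> bytes:
    """Pack name and namespace into two NUL-padded 32-byte fields (padding is an assumption)."""
    raw_name = name.encode()
    raw_ns = namespace.encode()
    # struct would silently truncate longer values, so reject them explicitly.
    if len(raw_name) > NAME_LEN or len(raw_ns) > NAME_LEN:
        raise ValueError("name or namespace exceeds 32 bytes")
    return struct.pack(f"{NAME_LEN}s{NAME_LEN}s", raw_name, raw_ns)

def unpack_fixed(header: bytes):
    """Inverse of pack_fixed: strip the NUL padding and decode both fields."""
    raw_name, raw_ns = struct.unpack(f"{NAME_LEN}s{NAME_LEN}s", header)
    return raw_name.rstrip(b"\0").decode(), raw_ns.rstrip(b"\0").decode()
```

A header built this way is always 64 bytes long, which is what makes the 32-byte limit on names and namespaces a hard one.
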
>>>>
>>>> I see. I would still suggest switching this header to protobuf format so that we
>>>> could extend it later (and easily drop the 32-byte limitation).
>>>>
>>>>> The 32 bytes is a constant. It is a limitation on the size of the names and namespaces. It 
>>>>> could be solved by adding a first field (size).
>>>>>
>>>>> Steps 4 and 5 are used to check if the image exists (if it doesn't, an error is read from the 
>>>>> socket).
>>>>>
>>>>> Then, I simply return the socket file descriptor and let criu use it for writing (dump) or 
>>>>> reading (restore).
>>>>>
>>>>> Note that I do not unpack the objects being sent (pagemap entries for example). Nor do I
>>>>> check whether the file was sent correctly. When the socket FD is closed I assume that the
>>>>> operation (reading or writing) is complete and I let the other side (restore for example)
>>>>> check whether the image is correct.
>>>>
>>>> I see one problem with it. The other side cannot always verify the image correctness. E.g.
>>>> if a single object is lost from an image, this can only be found out at the actual restore
>>>> time, and not always even then. E.g. fdtable.img contains file descriptors. If one is lost from
>>>> there, a task will be restored w/o one descriptor and criu has no clues to check this.
>>>>
>>>> We've seen this with the page-server, so to finalize the transfer we use a special command
>>>> (PS_IOV_FLUSH) that requires a response from the server side indicating that everything is OK.
>>>
>>> Okay, so two things to do:
>>>
>>> a- initial "handshake" with protobuf object in which
>>> 	1- the side making the request sends a protobuf object with the name and namespace and 
>>> 	2- the side answering replies with another protobuf object with an ack/nack
>>>
>>> b- final "handshake" in which
>>> 	1- the sender reports that it is done writing and
>>> 	2- the receiver answers that everything is okay.
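
The initial handshake in (a) can be sketched as a request/reply exchange of length-prefixed messages. This sketch uses JSON with a 4-byte length prefix as a stand-in for protobuf, purely to stay within the standard library; the message fields (`name`, `namespace`, `ack`) and all function names are hypothetical and not taken from the patch.

```python
import json
import socket
import struct

def recv_exact(sock, n):
    """Read exactly n bytes or raise if the peer closes early."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed mid-message")
        buf += chunk
    return buf

def send_msg(sock, obj):
    """Send one message: 4-byte big-endian length, then the encoded body."""
    payload = json.dumps(obj).encode()
    sock.sendall(struct.pack("!I", len(payload)) + payload)

def recv_msg(sock):
    """Receive one length-prefixed message and decode it."""
    (size,) = struct.unpack("!I", recv_exact(sock, 4))
    return json.loads(recv_exact(sock, size))

def request_image(sock, name, namespace):
    """Requesting side: send name and namespace, return True on ack."""
    send_msg(sock, {"name": name, "namespace": namespace})
    return recv_msg(sock)["ack"]

def answer_request(sock, known_images):
    """Answering side: ack if (namespace, name) is known, nack otherwise."""
    req = recv_msg(sock)
    found = (req["namespace"], req["name"]) in known_images
    send_msg(sock, {"ack": found})
    return req
```

Because every message carries its own length, names and namespaces are no longer limited to 32 bytes, and new fields can be added to the request without changing the framing.
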
>>
>> And, probably, c) -- anything in between that describes objects being transferred.
> 
> I already implemented protobuf objects for handling image name and namespaces.
> 
> I have a question regarding the "final handshake". It is easy to do it for the pre-dump/dump
> side where I only need to handle the situation when CRIU calls close_image.
> 
> The problem seems more difficult when we consider the restore side. To do it well (and considering
> all types of image files), we would have to consider all locations where an image is being read 
> and check for the handshake bytes. Right?
> 
> Since only the image-proxy and image-cache communicate over less reliable sockets (through a LAN or WAN),
> wouldn't it suffice to do this "final handshake" between the image-proxy and the image-cache? The 
> communication between dump/pre-dump -> image-proxy and image-cache -> restore is supposed to be local and
> therefore very reliable.

Yes, I agree. The local channel can be considered safe, so no after-the-fact checks
are required. Only proxy-to-cache communication, which typically occurs over a TCP
socket, needs to be checked after the transfer.
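
One possible shape for such a proxy-to-cache check is sketched below. It is similar in spirit to the flush-and-confirm idea mentioned earlier in the thread, but it is not the page-server's actual wire format: the chunk framing, the trailer layout (byte count plus CRC32), the one-byte verdict, and all function names are assumptions made for this example.

```python
import socket
import struct
import zlib

def recv_exact(sock, n):
    """Read exactly n bytes or raise if the peer closes early."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed mid-transfer")
        buf += chunk
    return buf

def proxy_send_image(sock, chunks):
    """Proxy side: send length-prefixed chunks, then a zero length and a trailer."""
    total, crc = 0, 0
    for chunk in chunks:
        if not chunk:
            continue  # a zero length is reserved as the end marker
        sock.sendall(struct.pack("!I", len(chunk)) + chunk)
        total += len(chunk)
        crc = zlib.crc32(chunk, crc)
    sock.sendall(struct.pack("!I", 0) + struct.pack("!QI", total, crc))

def proxy_wait_ack(sock):
    """Proxy side: block until the cache reports its verdict."""
    return recv_exact(sock, 1) == b"\x01"

def cache_receive_image(sock):
    """Cache side: collect chunks, verify the trailer, reply with one status byte."""
    received = bytearray()
    while True:
        (size,) = struct.unpack("!I", recv_exact(sock, 4))
        if size == 0:
            break
        received += recv_exact(sock, size)
    total, crc = struct.unpack("!QI", recv_exact(sock, 12))
    ok = total == len(received) and crc == zlib.crc32(received)
    sock.sendall(b"\x01" if ok else b"\x00")
    return bytes(received), ok
```

The point of the explicit reply is that the proxy learns whether the cache holds the complete image, instead of inferring success from the socket being closed.
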

-- Pavel