[CRIU] Process Migration using Sockets v2 - Patch 1/2
Rodrigo Bruno
rbruno at gsd.inesc-id.pt
Sun Oct 25 11:19:12 PDT 2015
Hi, sorry for answering so late...
On Mon, 19 Oct 2015 13:52:58 +0300
Pavel Emelyanov <xemul at parallels.com> wrote:
> On 10/19/2015 01:47 PM, Rodrigo Bruno wrote:
> > Hi,
> >
> > On Thu, 15 Oct 2015 14:27:29 +0300
> > Pavel Emelyanov <xemul at parallels.com> wrote:
> >
> >> On 10/13/2015 05:06 PM, Rodrigo Bruno wrote:
> >>
> >>
> >>>>>> And why should receiver check for any match? What can happen in the remote side,
> >>>>>> that the requestor gets different object than it asked for?
> >>>>>>
> >>>>>>> 4. Write connections only write the image header before sending the actual content.
> >>>>>>>
> >>>>>>> 5. the image header is a protobuf object that contains two strings: the image name
> >>>>>>> and the image namespace (the namespace identifies the process that created the image).
> >>>>>>>
> >>>>>>> 6. only connections between the image-proxy and the image-cache (which are TCP) check
> >>>>>>> for file boundaries. Imagine the image-proxy starts to forward an image to image-cache:
> >>>>>>> a) write image-header
> >>>>>>> b) write image size (uint64_t)
> >>>>>>
> >>>>>> Why isn't the size in the header?
> >>>>>
> >>>>> Good point. The image-header object is used in all connections (dump, proxy, cache, restore).
> >>>>> This image size is only used between the proxy and cache (because I know the size of the image).
> >>>>>
> >>>>> Maybe I can have a different image header with the size?
> >>>>
> >>>> I haven't yet fully understood the difference between criu-proxy/cache and proxy-cache
> >>>> protocols. But from what I have :) I think that it's worth having two different headers,
> >>>> one for "local" communications (criu-proxy/cache) and the other one for remote (cache-proxy).
> >>>
> >>> Yes, one header with size for proxy->cache communication, and a header without size for
> >>> all others (dump<->proxy, and restore<-cache).
> >>>
> >>> Both protocols are very similar. The only difference is that in cache<-proxy communication
> >>> I send the size of the file before the actual file.
> >>>
> >>> Example:
> >>> open connection to write
> >>> write header (name + namespace) (protobuf object)
> >>> write size uint64_t
> >>> write image
> >>> close
> >>>
> >>> The new version (discussed in these emails) will result in:
> >>> get the already open connection (inherited from the user)
> >>> write header (name + namespace + size) (protobuf object)
> >>> write image
> >>> write closing header (to replace the close)
> >>
> >> Yup. This would be simple, extendable and easy to read :)
> >>
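For illustration, the single-connection framing described above could be sketched in C roughly as follows. This is only a sketch under stated assumptions: the actual patch uses a protobuf object for the header, whereas the fixed-size struct img_header and the helpers send_image(), send_finish() and recv_header() are hypothetical names made up for this example. An empty image name is assumed to act as the closing header, and both ends are assumed to share byte order.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical stand-in for the protobuf header: name + namespace + size. */
struct img_header {
    char name[64];
    char ns[64];
    uint64_t size;   /* sent in host byte order in this sketch */
};

/* Write exactly len bytes, retrying on short writes and EINTR. */
static int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Read exactly len bytes; an early EOF is treated as an error. */
static int read_all(int fd, void *buf, size_t len)
{
    char *p = buf;

    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Send one image on an already open connection: header first, then data. */
int send_image(int fd, const char *name, const char *ns,
               const void *data, uint64_t size)
{
    struct img_header h;

    assert(name != NULL && ns != NULL);
    memset(&h, 0, sizeof(h));
    strncpy(h.name, name, sizeof(h.name) - 1);
    strncpy(h.ns, ns, sizeof(h.ns) - 1);
    h.size = size;
    if (write_all(fd, &h, sizeof(h)) < 0)
        return -1;
    return write_all(fd, data, (size_t)size);
}

/* Closing header (replaces close()): empty name, zero size. */
int send_finish(int fd)
{
    struct img_header h;

    memset(&h, 0, sizeof(h));
    return write_all(fd, &h, sizeof(h));
}

/* Receive the next header; the caller then reads h->size bytes of image. */
int recv_header(int fd, struct img_header *h)
{
    return read_all(fd, h, sizeof(*h));
}
```

Because the size travels in the header, the receiver knows where each image ends without the connection being closed, so several images can share one inherited connection.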
> >>>>>>> + static char buf[4096];
> >>>>>>> + int n = 0;
> >>>>>>> + unsigned long curr = 0;
> >>>>>>> +
> >>>>>>> + for(; curr < len; ) {
> >>>>>>> + n = read(fd, buf, MIN(len - curr, 4096));
> >>>>>>
> >>>>>> This bytes skipping looks incorrect. The skip_img_bytes() is called for pages.img
> >>>>>> on low-level snapshots to correctly forward the file position. Thus, if we get here
> >>>>>> with criu restore --remote, this means that the low-level images should be already
> >>>>>> here, on the node, and there's no need in reading the data in vain.
> >>>>>
> >>>>> Sorry, I couldn't get your idea. This function replaces an lseek inside
> >>>>> skip_pagemap_pages. Since lseek does not work in sockets (as far as I know),
> >>>>> I use this function.
> >>>>
> >>>> My point is -- you shouldn't get to this place in case of remote connection at all.
> >>>> The skip_img_bytes is called in the situation when we have a stack of images and
> >>>> we read data from the top-most one and want to skip the duplicate data from the
> >>>> lower one(s).
> >>>>
> >>>> Next, the restore happens when the whole stack is already obtained and if we get to
> >>>> the skip_img_bytes routine, this means that we want to skip bytes from some image
> >>>> namespace, excluding the top-most. But the non-top images cannot sit behind the socket,
> >>>> they have already been transferred.
> >>>
> >>> Well, I added this method because it was not working and I found this could be a potential
> >>> bug of my solution (since lseek does not work for sockets).
> >>>
> >>> I don't see much difference here between files and sockets. In remote mode, we have two
> >>> sockets (for example, one from a predump pages-1.img and other from a dump pages-1.img).
> >>>
> >>> These two connections are retrieving data from the local image-cache. If you need to skip
> >>> bytes from one pages-1.img, you have to consume them from the socket.
> >>>
> >>> Right?
> >>
> >> Right, but in normal operations we do not skip bytes from the top-level image, only
> >> from the low level ones, and these (low) cannot (or can they?) sit on the proxy side
> >> at restore time.
> >
> > At restore time, all images are cached at the image-cache. They have already been
> > transferred from image-proxy to image-cache (if CRIU restore tries to open an image
> > that is not cached yet, the open call will block until the image is cached at
> > image-cache).
>
> Ah, I see. Yes, then read() solves the issue, but I consider this quite an inefficient
> way :) Can we (later) do it with a control message to the cache saying that a particular
> amount of data should be thrown away from the respective image?
Yes, of course. Later we can extend the protocol to include such a request.
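
Until then, the read-based skip has to cope with short reads and with the socket closing early. A minimal sketch of such a loop, assuming a blocking descriptor; skip_bytes() is a hypothetical name for this example and not necessarily the function in the patch:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Consume and discard len bytes from fd, as a replacement for lseek()
 * on descriptors that are not seekable (sockets, pipes).
 * Returns 0 on success, -1 on read error or premature EOF.
 */
int skip_bytes(int fd, size_t len)
{
    static char buf[4096];   /* scratch buffer, contents are thrown away */

    assert(fd >= 0);
    while (len > 0) {
        size_t want = len < sizeof(buf) ? len : sizeof(buf);
        ssize_t n = read(fd, buf, want);

        if (n < 0) {
            if (errno == EINTR)
                continue;    /* interrupted by a signal, retry */
            return -1;
        }
        if (n == 0)
            return -1;       /* peer closed before len bytes arrived */
        len -= (size_t)n;    /* a short read just means another iteration */
    }
    return 0;
}
```

The static buffer keeps the sketch short but makes it non-reentrant; a caller that skips from several threads would need a local buffer instead.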
>
> Also note, that for _this_ usage the --auto-dedup will be a must-have feature. Since if
> we live migrate a container and do even the 2nd iteration the amount of data sitting
> in the cache will be larger (sometimes significantly) than the actual container memory
> size due to duplication of the pages.img data.
Yes, I can look into that once this patch is okay.
>
> > Therefore, CRIU restore can have multiple unix socket connections to image-cache,
> > one for each pages image. If CRIU restore needs to skip bytes from a particular
> > image, it can, without interfering with other images (which come from separate
> > connections).
> >
> > It looks like this:
> >
> > CRIU Restore       image-cache       image-proxy
> >  |<---(img1.img)-  |         |               |
> >  |<---(img2.img)-  |         |<-----(TCP)----|
> >  |<---(img3.img)-  |         |               |
> >
> >> And since they can't there's nothing we should read from anywhere
> >> to skip them.
> >>
> >> -- Pavel
> >
> >
>
--
Rodrigo Bruno <rbruno at gsd.inesc-id.pt>