[CRIU] [PATCH v3] Add docker phaul driver

Wed Oct 21 09:59:53 PDT 2015

On Wed, Oct 21, 2015 at 11:32 AM, Pavel Emelyanov <xemul at parallels.com> wrote:
> On 10/21/2015 05:14 PM, Hui Kang wrote:
>> On Wed, Oct 21, 2015 at 8:23 AM, Pavel Emelyanov <xemul at parallels.com> wrote:
>>>> diff --git a/phaul/p_haul_iters.py b/phaul/p_haul_iters.py
>>>> index b2c76e3..f8b97db 100644
>>>> --- a/phaul/p_haul_iters.py
>>>> +++ b/phaul/p_haul_iters.py
>>>> @@ -62,6 +62,7 @@ class phaul_iter_worker:
>>>>               self.fs.set_options(opts)
>>>>               self.__force = opts["force"]
>>>>               self.pre_dump = opts["pre_dump"]
>>>> +             self.target_host_ip = opts["to"]
>>>>
>>>>       def validate_cpu(self):
>>>>               logging.info("Checking CPU compatibility")
>>>> @@ -110,6 +111,12 @@ class phaul_iter_worker:
>>>>               self.fs.set_work_dir(self.img.work_dir())
>>>>               self.fs.start_migration()
>>>>
>>>> +             # TODO: Do not do predump for docker right now. Add page-server
>>>> +             #       to docker C/R API, then we can enable the pre-dump
>>>> +             if self.htype.get_driver_name() == "docker" :
>>>> +                     logging.info("Disable pre-dump for docker")
>>>> +                     self.pre_dump = False
>>>> +
>>>
>>> No if self.htype.get_driver_name() == "docker" checks should remain in
>>> generic code, please.
>>
>> What does the "No" point to in your reply? Do you think the above code
>> is correct or not? Thanks.
>
> There should have been an "," after "no" :) IOW I think the code is correct,
> but explicit check for driver being docker should not be here.

If we remove the "if self.htype.get_driver_name() == "docker" here,
which place do you think is better to set pre-dump = faulse?

I am proposing if we could temporarily depend on the p.haul command
line before pre-dump being merged to the docker-py and docker
upstream.

That means we should enforce --no-pre-dump in the p.haul command line.

./p.haul-wrap 10.0.0.2 --no-pre-dump docker 123

The change will be made to p.haul: if type == docker, --no-pre-dump
must be set to no. Otherwise, return the help message in the command
line.

By doing so, we can remove explicit check in p_haul_iter.py. What do you think?

>
>>>
>>>>               logging.info("Checking for Dirty Tracking")
>>>>               if self.pre_dump == PRE_DUMP_AUTO_DETECT:
>>>>                       # pre-dump auto-detection
>>>> @@ -194,36 +201,41 @@ class phaul_iter_worker:
>>>>
>>>>               logging.info("Final dump and restore")
>>>>
>>>> -             self.target_host.start_iter()
>>>> -             self.img.new_image_dir()
>>>> +             if self.htype.get_driver_name() == "docker" :
>>>> +                     # call docker dump API
>>>> +                     self.htype.dump()
>>>> +                     logging.info("Dump complete")
>>>
>>> Why do you need such rough fix? Why just setting pre_dump to false doesn't help?
>>
>> Two reasons. First, this part is about the final dump. So setting
>> predump=False does not have effect here. Second, docker driver calls
>> the dump() which will call docker C/R API directly. This is different
>> than pid or openvz.
>
> OK. Then we should make the dump part be similar to how the restore one
> looks -- we should use the caller's API for everyone, not direct CRIU
> connection.

To make this happen, shall we also change the driver to type pid and
openvz. Then that seems beyhond my control : )

>
>>>
>>>> +             else:
>>>> +                     self.target_host.start_iter()
>>>> +                     self.img.new_image_dir()
>>>>
>>>> -             logging.info("\tIssuing dump command to service")
>>>> +                     logging.info("\tIssuing dump command to service")
>>>>
>>>> -             req = criu_req.make_dump_req(
>>>> -                     self.pid, self.htype, self.img, self.criu_connection, self.fs)
>>>> -             resp = self.criu_connection.send_req(req)
>>>> -             while True:
>>>> -                     if resp.type != pycriu.rpc.NOTIFY:
>>>> -                             raise Exception("Dump failed")
>>>> -
>>>> -                     if resp.notify.script == "post-dump":
>>>> -                             #
>>>> -                             # Dump is effectively over. Now CRIU
>>>> -                             # waits for us to do whatever we want
>>>> -                             # and keeps the tasks frozen.
>>>> -                             #
>>>> -                             break
>>>> +                     req = criu_req.make_dump_req(
>>>> +                             self.pid, self.htype, self.img, self.criu_connection, self.fs)
>>>> +                     resp = self.criu_connection.send_req(req)
>>>> +                     while True:
>>>> +                             if resp.type != pycriu.rpc.NOTIFY:
>>>> +                                     raise Exception("Dump failed")
>>>> +
>>>> +                             if resp.notify.script == "post-dump":
>>>> +                                     #
>>>> +                                     # Dump is effectively over. Now CRIU
>>>> +                                     # waits for us to do whatever we want
>>>> +                                     # and keeps the tasks frozen.
>>>> +                                     #
>>>> +                                     break
>>>>
>>>> -                     elif resp.notify.script == "network-lock":
>>>> -                             self.htype.net_lock()
>>>> -                     elif resp.notify.script == "network-unlock":
>>>> -                             self.htype.net_unlock()
>>>> +                             elif resp.notify.script == "network-lock":
>>>> +                                     self.htype.net_lock()
>>>> +                             elif resp.notify.script == "network-unlock":
>>>> +                                     self.htype.net_unlock()
>>>>
>>>> -                     logging.info("\t\tNotify (%s)", resp.notify.script)
>>>> -                     resp = self.criu_connection.ack_notify()
>>>> +                             logging.info("\t\tNotify (%s)", resp.notify.script)
>>>> +                             resp = self.criu_connection.ack_notify()
>>>>
>>>> -             logging.info("Dump complete")
>>>> -             self.target_host.end_iter()
>>>> +                     logging.info("Dump complete")
>>>> +                     self.target_host.end_iter()
>>>>
>>>>               #
>>>>               # Dump is complete -- go to target node,
>>>> @@ -233,8 +245,12 @@ class phaul_iter_worker:
>>>>
>>>>               logging.info("Final FS and images sync")
>>>>               self.fs.stop_migration()
>>>> -             self.img.sync_imgs_to_target(self.target_host, self.htype,
>>>> -                     self.connection.mem_sk)
>>>> +
>>>> +             if self.htype.get_driver_name() == "docker" :
>>>> +                     self.htype.send_criu_images(self.target_host_ip)
>>>
>>> The fs_haul_docker.send_criu_images() does almost the same as phaul_images.send_criu_images
>>> do, please, reuse the phaul_images class functionality for this.
>>
>> Do you mean "sync_imgs_to_target()" in imgaes.py? If so, yes I will
>> change the code to re-use sync_imgs_to_target.
>
> Yes, please.

Ok, I will make the change

- Hui

>
> -- Pavel