[CRIU] [P.haul + Docker] Missing Running States During Live Migration of Container

Pavel Emelyanov xemul at virtuozzo.com
Tue Feb 21 23:49:29 PST 2017


On 02/20/2017 10:47 PM, Lele Ma wrote:
> On Mon, Feb 20, 2017 at 2:37 PM, Pavel Emelyanov <xemul at virtuozzo.com> wrote:
>> On 02/20/2017 10:23 PM, Lele Ma wrote:
>>>
>>> On Mon, Feb 20, 2017 at 1:16 PM, Pavel Emelyanov <xemul at virtuozzo.com> wrote:
>>>
>>>     On 02/19/2017 10:50 PM, Lele Ma wrote:
>>>     > Hi All,
>>>     >
>>>     > I am testing container live migration with this GitHub repo <https://github.com/boucher/docker/tree/v1.10_2-16-16-experimental> for docker-1.10-dev. I found that the container is not restored exactly where it was checkpointed. For example:
>>>     >
>>>     > The container I ran:
>>>     >      docker run  -d busybox  /bin/sh -c 'echo > /foo; max=1000000; i=0; while [ $i -lt $max ] ; do date >> /foo; date +%s >> /foo; echo "i=$i" >> /foo; i=$(expr $i + 1 ); sleep 0.0001; done'
>>>     >
>>>     > After migrating it with p.haul, I got the following /foo on the target node:
>>>     > .....
>>>     > Sun Feb 19 03:23:13 UTC 2017
>>>     > 1487474593
>>>     > i=4247
>>>     > Sun Feb 19 03:23:13 UTC 2017
>>>     > 1487474593
>>>     > i=4248                       -----> before migration
>>>     > i=7545                       -----> after migration (it is supposed to be i=4249)
>>>     > Sun Feb 19 03:23:20 UTC 2017
>>>     > 1487474600
>>>     > i=7546
>>>     > Sun Feb 19 03:23:20 UTC 2017
>>>     > 1487474600
>>>     > i=7547
>>>     > ......
>>>     > The printed numbers jump from 'i=4248' to 'i=7545' instead of increasing by one. It seems that some of
>>>     > the container's computation state is lost during migration, but I am not sure where it goes wrong. However,
>>>     > when I checkpoint and restore the container locally, the numbers increase continuously with no such jump.
>>>
>>>     Where do you get these numbers from? Docker console or some file on disk?
>>>
>>>
>>> It's from the file '/foo' inside container. ( The container is running /bin/sh -c 'echo > /foo;
>>> max=1000000; i=0; while [ $i -lt $max ] ; do date >> /foo; date +%s >> /foo; echo "i=$i" >> /foo;
>>> i=$(expr $i + 1 ); sleep 0.0001; done' )
>>
>> Then this is likely a race between images sync and filesystem sync.
>> You can check your /foo file on the source node right after container
>> migration, it should contain the missing numbers :)
>>
>> Which p.haul are you using, btw?
> 
> Thank you. But how can we avoid the race?

Somewhere the final rsync is missing. But it's just a guess, I'd suggest
that we first check whether it's really the case. Can you check the /foo
files on both source and destination nodes?
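
One way to run that check is sketched below in Python. It is only an
illustration and not part of p.haul or CRIU: the helper names read_counters
and find_gaps are made up here, and it assumes each copy of /foo has been
saved to a local path and still contains the 'i=N' lines written by the
test loop.

```python
import re
import sys


def read_counters(path):
    """Return the integers from every 'i=N' line in the file, in file order."""
    pattern = re.compile(r"^i=(\d+)\s*$")
    counters = []
    with open(path) as f:
        for line in f:
            match = pattern.match(line)
            if match:
                counters.append(int(match.group(1)))
    return counters


def find_gaps(counters):
    """Return (previous, current) pairs where the counter did not grow by exactly one."""
    return [(a, b) for a, b in zip(counters, counters[1:]) if b != a + 1]


if __name__ == "__main__":
    # Hypothetical usage: pass the saved copies of /foo from both nodes as arguments.
    for path in sys.argv[1:]:
        counters = read_counters(path)
        last = counters[-1] if counters else None
        print(path, "last counter:", last, "gaps:", find_gaps(counters))
```

If the copy from the source node runs on past i=4248 with no gap while the
copy from the destination reports the (4248, 7545) jump, that would be
consistent with the last filesystem changes not having been synced over.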

> I am using this repo from
> Ross Boucher: https://github.com/boucher/p.haul/tree/docker-1.10

Ah :) That's Ross' fork. Let's ask Ross to join us in this discussion.

-- Pavel
