<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Feb 22, 2017 at 2:49 AM, Pavel Emelyanov <span dir="ltr"><<a href="mailto:xemul@virtuozzo.com" target="_blank">xemul@virtuozzo.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class="gmail-HOEnZb"><div class="gmail-h5">On 02/20/2017 10:47 PM, Lele Ma wrote:<br>
> On Mon, Feb 20, 2017 at 2:37 PM, Pavel Emelyanov <<a href="mailto:xemul@virtuozzo.com">xemul@virtuozzo.com</a>> wrote:<br>
>> On 02/20/2017 10:23 PM, Lele Ma wrote:<br>
>>><br>
>>> On Mon, Feb 20, 2017 at 1:16 PM, Pavel Emelyanov <<a href="mailto:xemul@virtuozzo.com">xemul@virtuozzo.com</a>> wrote:<br>
>>><br>
>>> On 02/19/2017 10:50 PM, Lele Ma wrote:<br>
>>> > Hi All,<br>
>>> ><br>
>>> > I am testing container live migration with this GitHub repo <<a href="https://github.com/boucher/docker/tree/v1.10_2-16-16-experimental" rel="noreferrer" target="_blank">https://github.com/boucher/<wbr>docker/tree/v1.10_2-16-16-<wbr>experimental</a>> for docker-1.10-dev. I found that the container is not restored exactly where it was checkpointed. For example:<br>
>>> ><br>
>>> > The container I run<br>
>>> > docker run -d busybox /bin/sh -c 'echo > /foo; max=1000000; i=0; while [ $i -lt $max ] ; do date >> /foo; date +%s >> /foo; echo "i=$i" >> /foo; i=$(expr $i + 1 ); sleep 0.0001; done'<br>
>>> ><br>
>>> > After migrating with p.haul, I got this /foo on the target node:<br>
>>> > .....<br>
>>> > Sun Feb 19 03:23:13 UTC 2017<br>
>>> > 1487474593<br>
>>> > i=4247<br>
>>> > Sun Feb 19 03:23:13 UTC 2017<br>
>>> > 1487474593<br>
>>> > i=4248 -----> before migration<br>
>>> > i=7545 -----> after migration ( it is supposed to be i=4249 )<br>
>>> > Sun Feb 19 03:23:20 UTC 2017<br>
>>> > 1487474600<br>
>>> > i=7546<br>
>>> > Sun Feb 19 03:23:20 UTC 2017<br>
>>> > 1487474600<br>
>>> > i=7547<br>
>>> > ......<br>
>>> > The printed numbers jump from 'i=4248' to 'i=7545' instead of increasing by one. It seems that some<br>
>>> > computation state of the Docker container is ignored, but I am not sure where it goes wrong. However, when I<br>
>>> > checkpoint and restore the container locally, the numbers increase continuously with no such jump.<br>
>>><br>
>>> Where do you get these numbers from? Docker console or some file on disk?<br>
>>><br>
>>><br>
>>> It's from the file '/foo' inside container. ( The container is running /bin/sh -c 'echo > /foo;<br>
>>> max=1000000; i=0; while [ $i -lt $max ] ; do date >> /foo; date +%s >> /foo; echo "i=$i" >> /foo;<br>
>>> i=$(expr $i + 1 ); sleep 0.0001; done' )<br>
>><br>
>> Then this is likely a race between images sync and filesystem sync.<br>
>> You can check your /foo file on the source node right after container<br>
>> migration, it should contain the missing numbers :)<br>
>><br>
>> What p.haul do you use, btw?<br>
><br>
> Thank you. But how can we avoid the race?<br>
<br>
</div></div>Somewhere the final rsync is missing. But it's just a guess, I'd suggest<br>
that we first check whether it's really the case. Can you check the /foo<br>
files on both source and destination nodes?<br>
<span class="gmail-"><br></span></blockquote><br></div><div class="gmail_quote">Thank you for your help! When the migrated container is restored on the source node, the numbers in /foo are correct: each one increases by 1 with no jumps. But when it is restored on the target node, the jump appears. Does this mean an rsync is missing somewhere? How can I find which one is missing? <br><br>I found that the last call to the __run_rsync() method happens right after the container is 'dumped' (using the docker checkpoint command). Could this be where the race is? Here is some logging output from the console (on the source node):<br><br><div style="margin-left:40px">17:17:05.374: 101000: Final dump and restore<br>17:17:05.409: 101000: Making directory /var/local/p.haul-fs/dmp-qY7DBC-17.03.05-17.17/img/1<br>17:17:05.410: 101000: Dump docker container 64541be4a5de0b655e764088d39cc227e4153aba197c9852de3fea5cd3e2ed0e<br>17:17:05.411: 101000: /usr/bin/docker checkpoint --image-dir=/var/local/p.haul-fs/dmp-qY7DBC-17.03.05-17.17/img/1 64541be4a5de; log file: /tmp/docker_checkpoint.log<br>17:17:05.985: 101000: /usr/bin/docker checkpoint --image-dir=/var/local/p.haul-fs/dmp-qY7DBC-17.03.05-17.17/img/1 64541be4a5de; log file: /tmp/docker_checkpoint.log<br>17:17:06.024: 101000: Final FS and images sync<br>17:17:06.024: 101000: Doing final FS sync<br>17:17:06.025: 101000: calling __run_rsync(), logging file: /var/local/p.haul-fs/dmp-qY7DBC-17.03.05-17.17/rsync.log <br>17:17:08.567: 101000: Sending images to target<br>17:17:08.603: 101000: Pack<br>17:17:08.619: 101000: Add htype images<br>17:17:09.109: 101000: Asking target host to restore<br>17:17:11.887: 101000: Restored on target host<br></div><br><br></div><div class="gmail_quote">Lele <br></div><div class="gmail_quote"><br><br></div><div class="gmail_quote"><br></div></div></div>