[CRIU] [Users] socket will take at least 0.5 seconds to recovery after docker restore done

Yanbao Cui yygcui at gmail.com
Fri Jul 17 17:50:32 PDT 2015


Yeah, but I didn't use Docker 1.5 before, so I don't know if it is OK.

Actually, I think the network is restored by CRIU, rather than a new one
being created by Docker.

In my logic, we only rebuild the container object in the Docker daemon; the
entire container is restored by CRIU, including processes, network, etc.

So I think the problem is NOT caused by Docker. As I described in my
previous mails, after restore the Docker container works well, except that
the _existing_ restored sockets hang for at least 0.5 seconds.
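
To pin down the hang, the peer outside the container (which is not
checkpointed) could timestamp every byte that arrives on the existing
connection and report gaps of 0.5 seconds or more. A minimal sketch,
assuming the containerized client keeps sending single bytes at a steady
rate; the names record_arrivals and find_stalls are hypothetical and belong
to neither CRIU nor Docker:

```python
import socket
import time


def record_arrivals(conn, count=1000):
    """Timestamp each byte received on an already-connected socket.

    Meant to run on the peer that is NOT checkpointed, so its clock is
    unaffected by the dump/restore of the other side.
    """
    stamps = []
    for _ in range(count):
        data = conn.recv(1)
        if not data:  # peer closed the connection
            break
        stamps.append(time.monotonic())
    return stamps


def find_stalls(timestamps, threshold=0.5):
    """Return (start, gap) pairs where consecutive arrivals are at least
    `threshold` seconds apart."""
    stalls = []
    for prev, cur in zip(timestamps, timestamps[1:]):
        gap = cur - prev
        if gap >= threshold:
            stalls.append((prev, gap))
    return stalls
```

Note that a gap measured this way includes the time the container is frozen
during dump and restore, so it has to be compared against the moment the
restore completes to isolate the extra delay on the existing socket.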

On Sat, Jul 18, 2015 at 12:30 AM, Saied Kazemi <saied at google.com> wrote:

> On Fri, Jul 17, 2015 at 8:50 AM, Yanbao Cui <yygcui at gmail.com> wrote:
>
>> I use the latter one. And we integrated C/R functionality into Docker based
>> on https://github.com/SaiedKazemi/docker/wiki
>>
>
>  So, you rebased Docker 1.5 code to 1.6?  Did you see the issue in Docker
> 1.5?
>
>
>> And I found there is another one based on Docker 1.7
>>
>
> Up until Docker 1.5, the network code was both in libcontainer and in the
> Docker engine.  Now all network logic is in libnetwork, so there's no point
> spending time on older versions.  Unfortunately I haven't had time yet to
> familiarize myself with the new code.  Going forward, I suggest that you
> use Docker 1.7.  It's a rebase of 1.5 to the head and is under active
> development by Ross Boucher (rboucher at gmail.com) and other community
> members as I am sure you know.
>
>
>
>> Did you guys test it and focus on the time consumed?
>>
>
> No, my concentration was on getting the network to restore successfully.
> Didn't make any time measurements.
>
> --Saied
>
>
>
>> On Fri, Jul 17, 2015 at 10:45 PM, Saied Kazemi <saied at google.com> wrote:
>>
>>> Are you doing external checkpoint restore, calling CRIU directly to dump
>>> and restore the container, or are you using native "docker checkpoint" and
>>> "docker restore" commands?  If the latter, did you integrate C/R functionality
>>> into Docker yourself?
>>>
>>> --Saied
>>>
>>>
>>> On Fri, Jul 17, 2015 at 5:56 AM, Yanbao Cui <yygcui at gmail.com> wrote:
>>>
>>>> I use Docker 1.6.0 and 1.6.2; both have this problem.
>>>>
>>>> The needed files are shared via NFS.
>>>>
>>>> On Fri, Jul 17, 2015 at 11:15 AM, Saied Kazemi <saied at google.com>
>>>> wrote:
>>>>
>>>>> Which Docker version are you using to checkpoint and restore your
>>>>> containers?  Also, for migration, are you manually copying the container to
>>>>> a target machine?
>>>>>
>>>>> --Saied
>>>>>
>>>>>
>>>>> On Tue, Jul 14, 2015 at 7:36 AM, Yanbao Cui <yygcui at gmail.com> wrote:
>>>>>
>>>>>> Correct my reply:
>>>>>>
>>>>>> _existing_ migrated connections hang.
>>>>>>
>>>>>> New connections (here I mean a new socket or a new process, not a
>>>>>> manual reconnection) are OK
>>>>>>
>>>>>>
>>>>>> On Tue, Jul 14, 2015 at 10:07 PM, Yanbao Cui <yygcui at gmail.com> wrote:
>>>>>>
>>>>>>> _existing_ migrated connections hang.
>>>>>>>
>>>>>>> New connection is OK
>>>>>>>
>>>>>>> On Tue, Jul 14, 2015 at 9:59 PM, Pavel Emelyanov <xemul at parallels.com> wrote:
>>>>>>>
>>>>>>>> On 07/14/2015 04:43 PM, Yanbao Cui wrote:
>>>>>>>> > The server is always running and waiting. It seems the client, which
>>>>>>>> > is in the container, cannot send data out after being restored.
>>>>>>>> >
>>>>>>>> > For TCP, yeah, the client tries to reconnect manually.
>>>>>>>>
>>>>>>>> You mean that after restore new connect()-s hang for a while? Why
>>>>>>>> do these connect()-s happen?
>>>>>>>> Or _existing_ migrated connections hang?
>>>>>>>>
>>>>>>>> > The delay happens after the restore succeeds, although the network
>>>>>>>> > has recovered
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > On Tue, Jul 14, 2015 at 9:31 PM, Pavel Emelyanov <xemul at parallels.com> wrote:
>>>>>>>> >
>>>>>>>> >     On 07/14/2015 04:15 PM, Yanbao Cui wrote:
>>>>>>>> >     > Sorry for the mistake.
>>>>>>>> >     > For UDP, I mean the server can receive packets from the
>>>>>>>> >     > client again.
>>>>>>>> >
>>>>>>>> >     So where's the 0.5 seconds delay? Server sleeps and doesn't
>>>>>>>> wake up, packets
>>>>>>>> >     do not reach the server or something else?
>>>>>>>> >
>>>>>>>> >     > Actually, I have analyzed the tcpdump output. In my case, the
>>>>>>>> >     > client tries to reconnect to the server but does not receive a
>>>>>>>> >     > SYN+ACK, so it retransmits after 1 second according to the
>>>>>>>> >     > client rule, and then tries again.
>>>>>>>> >
>>>>>>>> >     During migration we don't reconnect TCP (with regular SYN,
>>>>>>>> SYNACK, ACK sequence),
>>>>>>>> >     do you reconnect them manually?
>>>>>>>> >
>>>>>>>> >     -- Pavel
>>>>>>>> >
>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>>
>
>


-- 
Best Regards
Cui Yanbao | 崔言宝
--
龍生玖天,豈能安於凡塵!