[CRIU] Debugging the process to restore veth in a namespace

Saied Kazemi saied at google.com
Tue Aug 18 13:09:50 PDT 2015


Glad it works for you :)

--Saied


On Tue, Aug 18, 2015 at 12:59 PM, Hui Kang <hkang.sunysb at gmail.com> wrote:

> Hi, Saied,
> Thanks. It works perfectly.
>
> - Hui
>
>
> On Mon, Aug 17, 2015 at 11:52 AM, Saied Kazemi <saied at google.com> wrote:
>
>> Haven't looked at your problem in any detail, just sharing a thought...
>>
>> Is the veth end in the global namespace attached to a bridge?  Docker containers,
>> for example, have one end of the veth device in the container namespace and
>> the other end in the global namespace's bridge (docker0).  We extended the
>> --veth-pair option to accept the @bridge string appended to its argument so
>> that it would move the veth end to the specified bridge during restore.  It
>> also makes sure the interface is up.
>>
>> You can see the commit message of 296129295 for additional details and
>> try this option if it applies to your case.
>>
>> --Saied
>>
>> On Sun, Aug 16, 2015 at 12:12 PM, Hui Kang <hkang.sunysb at gmail.com>
>> wrote:
>>
>>> Hi, Pavel
>>> I used "--veth-pair veth101=veth100" when dumping and restoring a
>>> process. veth101 is the device name in the process's net namespace; veth100
>>> is the other end, which is on the CRIU host.
>>>
>>> After restore, I can see that the IP address of veth101 is restored. However,
>>> the veth end on the host (veth100) is not restored successfully. By "not
>>> successfully", I mean that the veth100 link is created, but its state is DOWN
>>> and no IP is assigned to the restored link. Only after I manually set the link
>>> state to UP and assign an IP can the two ends talk to each other.
>>> Moreover, the link index of veth100 is not the same as when I dumped the
>>> process. For example, the indexes of veth101 and veth100 were 15 and 16 when I
>>> dumped the process. After restore, veth100's index becomes 17. Is this a bug
>>> in CRIU? Thanks.
>>>
>>>
>>> - Hui
>>>
>>> Part of the restore log is below. It looks like veth100 failed to
>>> restore on the host because of "RTNETLINK answers: File exists". But after
>>> dumping the process, I do not see veth100 on the host.
>>>
>>> (00.004652)      1: Restoring link lo type 1
>>> (00.005715)      1: Restoring link veth101 type 2
>>> (00.005743)      1: Restoring netdev veth101 idx 33
>>> (00.005754)      1: Restore ll addr (62:../6) for device
>>> (00.006534)      1: DEBUG Skip veth101/accept_local, val =0
>>> (00.006562)      1: DEBUG Skip veth101/accept_redirects, val =1
>>> (00.006574)      1: DEBUG Skip veth101/accept_source_route, val =1
>>> (00.006584)      1: DEBUG Skip veth101/arp_accept, val =0
>>> (00.006594)      1: DEBUG Skip veth101/arp_announce, val =0
>>> (00.006604)      1: DEBUG Skip veth101/arp_filter, val =0
>>> (00.006614)      1: DEBUG Skip veth101/arp_ignore, val =0
>>> (00.006624)      1: DEBUG Skip veth101/arp_notify, val =0
>>> (00.006633)      1: DEBUG Skip veth101/bootp_relay, val =0
>>> (00.006644)      1: DEBUG Skip veth101/disable_policy, val =0
>>> (00.006653)      1: DEBUG Skip veth101/disable_xfrm, val =0
>>> (00.006664)      1: DEBUG Skip veth101/force_igmp_version, val =0
>>> (00.006674)      1: DEBUG Skip veth101/forwarding, val =1
>>> (00.006683)      1: DEBUG Skip veth101/igmpv2_unsolicited_report_interval, val =10000
>>> (00.006693)      1: DEBUG Skip veth101/igmpv3_unsolicited_report_interval, val =1000
>>> (00.006703)      1: DEBUG Skip veth101/log_martians, val =0
>>> (00.006712)      1: DEBUG Skip veth101/medium_id, val =0
>>> (00.006722)      1: DEBUG Skip veth101/promote_secondaries, val =0
>>> (00.006733)      1: DEBUG Skip veth101/proxy_arp, val =0
>>> (00.006742)      1: DEBUG Skip veth101/proxy_arp_pvlan, val =0
>>> (00.006752)      1: DEBUG Skip veth101/route_localnet, val =0
>>> (00.006762)      1: DEBUG Skip veth101/rp_filter, val =1
>>> (00.006772)      1: DEBUG Skip veth101/secure_redirects, val =1
>>> (00.006782)      1: DEBUG Skip veth101/send_redirects, val =1
>>> (00.006793)      1: DEBUG Skip veth101/shared_media, val =1
>>> (00.006803)      1: DEBUG Skip veth101/src_valid_mark, val =0
>>> (00.006814)      1: DEBUG Skip veth101/tag, val =0
>>> (00.006864)      1:     Running ip addr restore
>>> RTNETLINK answers: File exists
>>> RTNETLINK answers: File exists
>>> :
>>>
>>>
>>> On Mon, Aug 3, 2015 at 10:20 AM, Pavel Emelyanov <xemul at parallels.com>
>>> wrote:
>>>
>>>> On 07/31/2015 05:25 PM, Hui Kang wrote:
>>>> > Thanks for pointing out this option. I tested it to checkpoint and
>>>> > restore my program. It seems that dumping is successful, but the restore
>>>> > fails. The detailed log message is as follows
>>>> >
>>>> > veth100: the link in the host's namespace
>>>> > veth101: the link in the child process's namespace
>>>> >
>>>> > # criu  dump -t 3737 -vvvv --veth-pair veth101=veth100  -j
>>>> >
>>>> > ...
>>>> > (00.043143) Dumping pstree (pid: 3737)
>>>> > (00.092334) Writing stats
>>>> > (00.092545) Dumping finished successfully
>>>> >
>>>> >
>>>> > # criu restore -vvvv --veth-pair veth101=veth100
>>>> > (00.043539)      1: Error (tty.c:333): tty: Found slave peer index 2 without correspond master peer
>>>>
>>>> The -j option should be used on restore too.
>>>>
>>>>
>>>
>>> _______________________________________________
>>> CRIU mailing list
>>> CRIU at openvz.org
>>> https://lists.openvz.org/mailman/listinfo/criu
>>>
>>>
>>
>
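
The two suggestions in the thread (pass -j on restore as well as on dump, and append @bridge to the --veth-pair argument when the host end belongs in a bridge) can be combined in a small dry-run sketch. It only prints the command line and does not run criu. The helper names veth_pair_arg and restore_cmd and the bridge name br0 are hypothetical, and the exact form IN=OUT@BRIDGE is an assumption based on the description above; check it against the commit message of 296129295 before relying on it.

```shell
#!/bin/sh
# Hypothetical helper: build the value passed to --veth-pair.
#   $1  device name inside the process's net namespace (e.g. veth101)
#   $2  device name of the other end on the host (e.g. veth100)
#   $3  optional bridge the host end should be moved to (assumed syntax: @bridge)
veth_pair_arg() {
    inside=$1
    outside=$2
    bridge=${3:-}
    if [ -n "$bridge" ]; then
        printf '%s=%s@%s\n' "$inside" "$outside" "$bridge"
    else
        printf '%s=%s\n' "$inside" "$outside"
    fi
}

# Hypothetical helper: print (not execute) a restore command line.
# -j is included because it was used on dump and must be repeated on restore.
restore_cmd() {
    printf 'criu restore -vvvv -j --veth-pair %s\n' "$(veth_pair_arg "$@")"
}

# br0 is a placeholder bridge name, not taken from the thread.
restore_cmd veth101 veth100 br0
```

Printing the command first makes it easy to compare the dump and restore options side by side before running anything as root.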