[CRIU] Debugging the process to restore veth in a namespace

Hui Kang hkang.sunysb at gmail.com
Tue Aug 18 12:59:45 PDT 2015


Hi, Saied,
Thanks. It works perfectly.

- Hui
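
For anyone reading the archive, here is a minimal sketch of the restore invocation that Saied's suggestion (quoted below) implies. It only prints the command line and does not run criu. The bridge name "br0" and the helper names veth_pair_arg and restore_cmd are assumptions for illustration; the thread does not say which bridge was used. The -j flag is included because Pavel notes further down that it is needed on restore as well as on dump.

```shell
#!/bin/sh
# Sketch only: prints the command instead of running it.
# "br0" is a hypothetical bridge name; it does not appear in the thread.

# Build the --veth-pair argument:
#   <name inside the namespace>=<name on the host>[@<bridge>]
veth_pair_arg() {
    inside=$1; outside=$2; bridge=${3:-}
    if [ -n "$bridge" ]; then
        printf '%s=%s@%s\n' "$inside" "$outside" "$bridge"
    else
        printf '%s=%s\n' "$inside" "$outside"
    fi
}

# Print the restore command line, keeping -j as on dump.
restore_cmd() {
    printf 'criu restore -vvvv -j --veth-pair %s\n' "$(veth_pair_arg "$@")"
}

restore_cmd veth101 veth100 br0
```

With the @bridge suffix, the host end is attached to the named bridge and brought up during restore, which is the step that otherwise had to be done by hand.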

On Mon, Aug 17, 2015 at 11:52 AM, Saied Kazemi <saied at google.com> wrote:

> Haven't looked at your problem in any detail, just sharing a thought...
>
> Is the veth end in the global namespace in a bridge?  Docker containers,
> for example, have one end of the veth device in the container namespace and
> the other end in the global namespace's bridge (docker0).  We extended the
> --veth-pair option to accept the @bridge string appended to its argument so
> that it would move the veth end to the specified bridge during restore.  It
> also makes sure the interface is up.
>
> You can see the commit message of 296129295 for additional details and try
> this option if it applies to your case.
>
> --Saied
>
> On Sun, Aug 16, 2015 at 12:12 PM, Hui Kang <hkang.sunysb at gmail.com> wrote:
>
>> Hi, Pavel
>> I used "--veth-pair veth101=veth100" when dumping and restoring a
>> process. veth101 is the device name in the process net namespace, veth100
>> is the other end which is in the criu host.
>>
>> After restore, I can see that the IP address of veth101 is restored. However,
>> the veth end in the host (veth100) is not restored successfully. By "not
>> successfully", I mean that the veth100 link is created, but its state is DOWN
>> and no IP is assigned to the restored link. Only after I manually set the
>> link state to UP and assign an IP can the two ends talk to each other.
>> Moreover, the link index of veth100 is not the same as when I dumped the
>> process. For example, the indexes of veth101 and veth100 are 15 and 16 when I
>> dump the process. After restore, veth100's index becomes 17. Is this a bug
>> in CRIU? Thanks.
>>
>>
>> - Hui
>>
>> Part of the restore log is below. It looks like veth100 failed to restore
>> on the host because of "RTNETLINK answers: File exists". But after dumping
>> the process, I do not see veth100 in the host.
>>
>> (00.004652)      1: Restoring link lo type 1
>> (00.005715)      1: Restoring link veth101 type 2
>> (00.005743)      1: Restoring netdev veth101 idx 33
>> (00.005754)      1: Restore ll addr (62:../6) for device
>> (00.006534)      1: DEBUG Skip veth101/accept_local, val =0
>> (00.006562)      1: DEBUG Skip veth101/accept_redirects, val =1
>> (00.006574)      1: DEBUG Skip veth101/accept_source_route, val =1
>> (00.006584)      1: DEBUG Skip veth101/arp_accept, val =0
>> (00.006594)      1: DEBUG Skip veth101/arp_announce, val =0
>> (00.006604)      1: DEBUG Skip veth101/arp_filter, val =0
>> (00.006614)      1: DEBUG Skip veth101/arp_ignore, val =0
>> (00.006624)      1: DEBUG Skip veth101/arp_notify, val =0
>> (00.006633)      1: DEBUG Skip veth101/bootp_relay, val =0
>> (00.006644)      1: DEBUG Skip veth101/disable_policy, val =0
>> (00.006653)      1: DEBUG Skip veth101/disable_xfrm, val =0
>> (00.006664)      1: DEBUG Skip veth101/force_igmp_version, val =0
>> (00.006674)      1: DEBUG Skip veth101/forwarding, val =1
>> (00.006683)      1: DEBUG Skip veth101/igmpv2_unsolicited_report_interval, val =10000
>> (00.006693)      1: DEBUG Skip veth101/igmpv3_unsolicited_report_interval, val =1000
>> (00.006703)      1: DEBUG Skip veth101/log_martians, val =0
>> (00.006712)      1: DEBUG Skip veth101/medium_id, val =0
>> (00.006722)      1: DEBUG Skip veth101/promote_secondaries, val =0
>> (00.006733)      1: DEBUG Skip veth101/proxy_arp, val =0
>> (00.006742)      1: DEBUG Skip veth101/proxy_arp_pvlan, val =0
>> (00.006752)      1: DEBUG Skip veth101/route_localnet, val =0
>> (00.006762)      1: DEBUG Skip veth101/rp_filter, val =1
>> (00.006772)      1: DEBUG Skip veth101/secure_redirects, val =1
>> (00.006782)      1: DEBUG Skip veth101/send_redirects, val =1
>> (00.006793)      1: DEBUG Skip veth101/shared_media, val =1
>> (00.006803)      1: DEBUG Skip veth101/src_valid_mark, val =0
>> (00.006814)      1: DEBUG Skip veth101/tag, val =0
>> (00.006864)      1:     Running ip addr restore
>> RTNETLINK answers: File exists
>> RTNETLINK answers: File exists
>> :
>>
>>
>> On Mon, Aug 3, 2015 at 10:20 AM, Pavel Emelyanov <xemul at parallels.com>
>> wrote:
>>
>>> On 07/31/2015 05:25 PM, Hui Kang wrote:
>>> > Thanks for pointing out this option. I tested it to checkpoint and
>>> > restore my program. It seems that dumping is successful, but the restore
>>> > fails. The detailed log message is as follows:
>>> >
>>> > veth100: the link in the host's namespace
>>> > veth101: the link in the child process's namespace
>>> >
>>> > # criu  dump -t 3737 -vvvv --veth-pair veth101=veth100  -j
>>> >
>>> > ...
>>> > (00.043143) Dumping pstree (pid: 3737)
>>> > (00.092334) Writing stats
>>> > (00.092545) Dumping finished successfully
>>> >
>>> >
>>> > # criu restore -vvvv --veth-pair veth101=veth100
>>> > (00.043539)      1: Error (tty.c:333): tty: Found slave peer index 2 without correspond master peer
>>>
>>> The -j option should be used on restore too.