[CRIU] Debugging the process to restore veth in a namespace

Saied Kazemi saied at google.com
Mon Aug 17 08:52:49 PDT 2015


Haven't looked at your problem in any detail, just sharing a thought...

Is the host end of the veth pair (the one in the global namespace) attached
to a bridge?  Docker containers, for example, have one end of the veth device
in the container namespace and the other end in the global namespace's bridge
(docker0).  We extended the --veth-pair option to accept an @bridge string
appended to its argument so that the veth end is moved to the specified
bridge during restore.  It also makes sure the interface is up.

You can see the commit message of 296129295 for additional details and try
this option if it applies to your case.
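
For illustration, here is a minimal sketch of how the option string might be
assembled for the device names used in this thread.  The bridge name br0 is an
assumption (the thread does not say which bridge, if any, veth100 is attached
to), and veth_pair_arg is a hypothetical helper, not part of CRIU.  The script
only prints the command line so it can be reviewed before being run:

```shell
#!/bin/sh
# Hypothetical helper: build the argument for criu's --veth-pair option.
#   $1 = veth name inside the restored namespace (e.g. veth101)
#   $2 = peer name on the host side (e.g. veth100)
#   $3 = optional bridge for the host side (assumed name, e.g. br0)
veth_pair_arg() {
    if [ -n "${3:-}" ]; then
        # Bridge given: append it with '@' as described above.
        printf '%s=%s@%s\n' "$1" "$2" "$3"
    else
        # No bridge: plain form already used in this thread.
        printf '%s=%s\n' "$1" "$2"
    fi
}

# Print the restore command instead of executing it.
echo "criu restore -vvvv -j --veth-pair $(veth_pair_arg veth101 veth100 br0)"
```

If veth100 is not in a bridge, dropping the third argument gives the plain
veth101=veth100 form.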

--Saied

On Sun, Aug 16, 2015 at 12:12 PM, Hui Kang <hkang.sunysb at gmail.com> wrote:

> Hi, Pavel
> I used "--veth-pair veth101=veth100" when dumping and restoring a process.
> veth101 is the device name in the process's net namespace; veth100 is the
> other end, which is on the CRIU host.
>
> After restore, I can see that the IP address of veth101 is restored.
> However, the veth end on the host (veth100) is not restored successfully:
> the veth100 link is created, but its state is DOWN and no IP address is
> assigned to it. Only after I manually set the link state to UP and assign
> an IP address can the two ends talk to each other.
> Moreover, the link index of veth100 is not the same as when I dumped the
> process. For example, the indexes of veth101 and veth100 were 15 and 16
> when I dumped the process. After restore, veth100's index becomes 17. Is
> this a bug in CRIU? Thanks.
>
>
> - Hui
>
> Part of the restore log is below. It looks like veth100 failed to restore
> on the host because of the "RTNETLINK answers: File exists" error. But
> after dumping the process, I do not see veth100 on the host.
>
> (00.004652)      1: Restoring link lo type 1
> (00.005715)      1: Restoring link veth101 type 2
> (00.005743)      1: Restoring netdev veth101 idx 33
> (00.005754)      1: Restore ll addr (62:../6) for device
> (00.006534)      1: DEBUG Skip veth101/accept_local, val =0
> (00.006562)      1: DEBUG Skip veth101/accept_redirects, val =1
> (00.006574)      1: DEBUG Skip veth101/accept_source_route, val =1
> (00.006584)      1: DEBUG Skip veth101/arp_accept, val =0
> (00.006594)      1: DEBUG Skip veth101/arp_announce, val =0
> (00.006604)      1: DEBUG Skip veth101/arp_filter, val =0
> (00.006614)      1: DEBUG Skip veth101/arp_ignore, val =0
> (00.006624)      1: DEBUG Skip veth101/arp_notify, val =0
> (00.006633)      1: DEBUG Skip veth101/bootp_relay, val =0
> (00.006644)      1: DEBUG Skip veth101/disable_policy, val =0
> (00.006653)      1: DEBUG Skip veth101/disable_xfrm, val =0
> (00.006664)      1: DEBUG Skip veth101/force_igmp_version, val =0
> (00.006674)      1: DEBUG Skip veth101/forwarding, val =1
> (00.006683)      1: DEBUG Skip veth101/igmpv2_unsolicited_report_interval, val =10000
> (00.006693)      1: DEBUG Skip veth101/igmpv3_unsolicited_report_interval, val =1000
> (00.006703)      1: DEBUG Skip veth101/log_martians, val =0
> (00.006712)      1: DEBUG Skip veth101/medium_id, val =0
> (00.006722)      1: DEBUG Skip veth101/promote_secondaries, val =0
> (00.006733)      1: DEBUG Skip veth101/proxy_arp, val =0
> (00.006742)      1: DEBUG Skip veth101/proxy_arp_pvlan, val =0
> (00.006752)      1: DEBUG Skip veth101/route_localnet, val =0
> (00.006762)      1: DEBUG Skip veth101/rp_filter, val =1
> (00.006772)      1: DEBUG Skip veth101/secure_redirects, val =1
> (00.006782)      1: DEBUG Skip veth101/send_redirects, val =1
> (00.006793)      1: DEBUG Skip veth101/shared_media, val =1
> (00.006803)      1: DEBUG Skip veth101/src_valid_mark, val =0
> (00.006814)      1: DEBUG Skip veth101/tag, val =0
> (00.006864)      1:     Running ip addr restore
> RTNETLINK answers: File exists
> RTNETLINK answers: File exists
> :
>
>
> On Mon, Aug 3, 2015 at 10:20 AM, Pavel Emelyanov <xemul at parallels.com>
> wrote:
>
>> On 07/31/2015 05:25 PM, Hui Kang wrote:
>> > Thanks for pointing out this option. I tested it to checkpoint and
>> > restore my program. It seems that dumping is successful, but the restore
>> > fails. The detailed log message is as follows:
>> >
>> > veth100: the link in the host's namespace
>> > veth101: the link in the child process's namespace
>> >
>> > # criu  dump -t 3737 -vvvv --veth-pair veth101=veth100  -j
>> >
>> > ...
>> > (00.043143) Dumping pstree (pid: 3737)
>> > (00.092334) Writing stats
>> > (00.092545) Dumping finished successfully
>> >
>> >
>> > # criu restore -vvvv --veth-pair veth101=veth100
>> > (00.043539)      1: Error (tty.c:333): tty: Found slave peer index 2 without correspond master peer
>>
>> The -j option should be used on restore too.