[CRIU] lxc-checkpoint 1.1.5 works with criu 1.6.1 but not master

Thu Jan 14 07:20:04 PST 2016

On 01/14/2016 05:46 PM, Adrian Reber wrote:
> On Thu, Jan 14, 2016 at 05:41:28PM +0300, Pavel Emelyanov wrote:
>> On 01/14/2016 03:07 PM, Adrian Reber wrote:
>>> On Wed, Jan 13, 2016 at 04:18:31PM +0300, Pavel Emelyanov wrote:
>>>> On 01/07/2016 01:04 PM, Adrian Reber wrote:
>>>>> Hello Tycho,
>>>>>
>>>>> thanks for your answers.
>>>>>
>>>>> On Wed, Jan 06, 2016 at 07:15:25AM -0700, Tycho Andersen wrote:
>>>>>> Hi Adrian,
>>>>>>
>>>>>> On Tue, Jan 05, 2016 at 06:47:55PM +0100, Adrian Reber wrote:
>>>>>>> Running lxc-checkpoint works with CRIU 1.6.1 but not with today's
>>>>>>> master.
>>>>>>>
>>>>>>> I get the following dump.log with today's master:
>>>>>>>
>>>>>>> (00.000250) Probing sock diag modules
>>>>>>> (00.000280) Done probing
>>>>>>> (00.000283) ========================================
>>>>>>> (00.000286) Dumping processes (pid: 10794)
>>>>>>> (00.000287) ========================================
>>>>>>> (00.000289) Running pre-dump scripts
>>>>>>> (00.000315) Found anon-shmem device at 4
>>>>>>> (00.000321) Reset 16059's dirty tracking
>>>>>>> (00.000329) Warn  (mem.c:56): Can't reset 16059's dirty memory tracker (22)
>>>>>>> (00.000341) Unlock network
>>>>>>> (00.000347) Unfreezing tasks into 1
>>>>>>> (00.000352) Error (cr-dump.c:1578): Dumping FAILED.
>>>>>>
>>>>>> I've not seen this before. Based on a quick glance through the source,
>>>>>> looks like the write() to /proc/pid/clear_refs is failing with EINVAL,
>>>>>> which probably means your kernel is too old. Seems like this shouldn't
>>>>>> be a fatal failure, though, as lxc-checkpoint doesn't try to use any
>>>>>> memory tracking features.
>>>>>
>>>>> I have always seen the dirty memory tracking warning, but as it is a
>>>>> warning it has never been a problem before. It seems, however, that with
>>>>> the following commit:
>>>>>
>>>>> commit d10835c4ee0d0b1881b926708dee9877f5fb294d
>>>>> Author: Pavel Emelyanov <xemul at parallels.com>
>>>>> Date:   Tue Dec 15 22:25:09 2015 +0300
>>>>>
>>>>>     dump: Dont read prohibited kernel files
>>>>>
>>>>> criu now just aborts. Reverting this commit 'fixes' the broken master
>>>>> behaviour.
>>>>
>>>> :(
>>>>
>>>> Would you check whether the patch titled 
>>>> "[PATCH] kdat: Handle pagemaps with zeroed pfns"
>>>> from the mailing list fixes it?
>>>
>>> Unfortunately not. I had to change the last line of context of that
>>> patch to get it applied, but a simple dump still fails:
>>>
>>> # ./criu dump -D /tmp/3 -j -t `pidof minimal`  -v -v -v -v
>>> (00.000035) Probing sock diag modules
>>> (00.000072) Done probing
>>> (00.000074) ========================================
>>> (00.000076) Dumping processes (pid: 18737)
>>> (00.000078) ========================================
>>> (00.000081) Running pre-dump scripts
>>> (00.000106) Pagemap is fully functional
>>> (00.000136) Found anon-shmem device at 4
>>> (00.000141) Reset 20624's dirty tracking
>>> (00.000147) Warn  (mem.c:56): Can't reset 20624's dirty memory tracker (22)
>>
>> Hm... Does your kernel lack support for soft-dirty tracking at all?
> 
> Yes. No soft-dirty tracking for me. But that was no problem until now.

OK :) My fault, yes.

Would you then apply the patch I mentioned earlier, and then this one:

diff --git a/mem.c b/mem.c
index 92e37f3..f23e6e9 100644
--- a/mem.c
+++ b/mem.c
@@ -50,15 +50,20 @@ int do_task_reset_dirty_track(int pid)
                return errno == EACCES ? 1 : -1;
 
        ret = write(fd, cmd, sizeof(cmd));
-       close(fd);
-
        if (ret < 0) {
-               pr_warn("Can't reset %d's dirty memory tracker (%d)\n", pid, errno);
-               return -1;
+               if (errno == EINVAL) /* No clear-soft-dirty in kernel */
+                       ret = 1;
+               else {
+                       pr_perror("Can't reset %d's dirty memory tracker (%d)\n", pid, errno);
+                       ret = -1;
+               }
+       } else {
+               pr_info(" ... done\n");
+               ret = 0;
        }
 
-       pr_info(" ... done\n");
-       return 0;
+       close(fd);
+       return ret;
 }
 
 unsigned int dump_pages_args_size(struct vm_area_list *vmas)

and check again?

-- Pavel
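
For anyone who wants to try the return-code logic from the patch above
outside of criu, here is a minimal standalone sketch. It is not the criu
source: the function names classify_write_error(), reset_dirty_track_at()
and reset_dirty_track() are hypothetical, and fprintf() stands in for
criu's logging helpers. It assumes, as discussed in the thread, that a
kernel without soft-dirty support rejects the write to
/proc/<pid>/clear_refs with EINVAL. The value "4" is the one the kernel
documentation gives for clearing soft-dirty bits.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: map the errno of a failed write() to a return code.
 * 1 means "feature not available, carry on", -1 means "real error".
 */
int classify_write_error(int err)
{
	return err == EINVAL ? 1 : -1;
}

/*
 * Hypothetical helper: write the clear-soft-dirty command to 'path'.
 * Returns 0 on success, 1 if the request could not be made but the
 * caller may continue (EACCES on open, EINVAL on write), -1 otherwise.
 */
int reset_dirty_track_at(const char *path)
{
	static const char cmd[] = "4";
	int fd, ret;

	assert(path != NULL);

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return errno == EACCES ? 1 : -1;

	if (write(fd, cmd, sizeof(cmd) - 1) < 0) {
		int saved = errno; /* keep errno before any other call */

		ret = classify_write_error(saved);
		if (ret < 0)
			fprintf(stderr, "Can't reset dirty tracker via %s: %s\n",
				path, strerror(saved));
	} else {
		ret = 0;
	}

	/* Close only after errno has been inspected, as in the patch. */
	close(fd);
	return ret;
}

/* Convenience wrapper that builds the /proc path for a given pid. */
int reset_dirty_track(int pid)
{
	char path[64];

	snprintf(path, sizeof(path), "/proc/%d/clear_refs", pid);
	return reset_dirty_track_at(path);
}
```

Calling reset_dirty_track(getpid()) on a kernel without soft-dirty
tracking should return 1 under the assumption above, and the caller can
then treat that as "no memory tracking available" instead of aborting
the dump.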