[CRIU] [PATCH 14/15] restorer: rework unmaping old VMA-s (v3)

Pavel Emelyanov xemul at parallels.com
Tue Oct 7 02:05:43 PDT 2014


On 10/06/2014 11:33 PM, Christopher Covington wrote:
> Hi Andrey, Pavel,
> 
> On 09/23/2013 06:33 AM, Andrey Vagin wrote:
>> All process VMA-s are in the "premapped area". All restorer stuff is in
>> the "bootstrap area", so we have two areas.
>>
>> So we don't need to unmap extra VMA-s one by one. We can call munmap
>> three times for the region before the first area, for the hole between
>> areas and for the region after the second area.
>>
>> The old scheme didn't work, because the list of VMA-s can change
>> after it is collected, for example when libc allocates memory or the
>> stack grows.
> 
>> diff --git a/pie/restorer.c b/pie/restorer.c
>> index 8e43609..59b801f 100644
>> --- a/pie/restorer.c
>> +++ b/pie/restorer.c
>> @@ -524,6 +524,51 @@ void __export_unmap(void)
>>  }
>>  
>>  /*
>> + * This function unmaps all VMAs, which don't belong to
>> + * the restored process or the restorer
>> + */
>> +static int unmap_old_vmas(void *premmapped_addr, unsigned long premmapped_len,
>> +		      void *bootstrap_start, unsigned long bootstrap_len)
>> +{
>> +	unsigned long s1, s2;
>> +	void *p1, *p2;
>> +	int ret;
>> +
>> +	if ((void *) premmapped_addr < bootstrap_start) {
>> +		p1 = premmapped_addr;
>> +		s1 = premmapped_len;
>> +		p2 = bootstrap_start;
>> +		s2 = bootstrap_len;
>> +	} else {
>> +		p2 = premmapped_addr;
>> +		s2 = premmapped_len;
>> +		p1 = bootstrap_start;
>> +		s1 = bootstrap_len;
>> +	}
>> +
>> +	ret = sys_munmap(NULL, p1 - NULL);
>> +	if (ret) {
>> +		pr_err("Unable to unmap (%p-%p): %d\n", NULL, p1, ret);
>> +		return -1;
>> +	}
>> +
>> +	ret = sys_munmap(p1 + s1, p2 - (p1 + s1));
>> +	if (ret) {
>> +		pr_err("Unable to unmap (%p-%p): %d\n", p1 + s1, p2, ret);
>> +		return -1;
>> +	}
>> +
>> +	ret = sys_munmap(p2 + s2, (void *) TASK_SIZE - (p2 + s2));
> 
> Experimenting with various kernel configurations on AArch64 such as 64K pages
> (which change the default VA_BITS and therefore TASK_SIZE), it has become
> apparent to me that TASK_SIZE as used here cannot be a compile-time constant
> if we are to have one AArch64 CRIU binary that works regardless of the kernel
> configuration it is paired with.
> 
> Currently, the shift for TASK_SIZE could be 39, 42, or 48. What do you all
> think is the best way to handle this? Return -1 if unmapping up to bit 39
> fails, but just give a debug print if unmapping between bits 39 and 42 or bits
> 42 and 48 fails? Is there an existing /proc entry or sysconf() or
> prctl(PR_GET_MM, ...) to determine task size dynamically that I've overlooked?

This is presumably a hack, but when reading /proc/$pid/pagemap, EOF would
(should) occur on hitting TASK_SIZE. If that is true, we can estimate the value
in kerndat.c at criu start and use it as a variable.
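
A minimal sketch of that probe, assuming a read from /proc/self/pagemap really
does return 0 bytes once the offset corresponds to an address at or above
TASK_SIZE (not verified here for every kernel). The names find_task_size(),
pagemap_eof(), probe_task_size_pagemap() and beyond_fixed_limit() are
hypothetical and not existing CRIU functions. The search is a plain binary
search over page-aligned addresses, so the 39/42/48-bit cases need no special
handling.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical predicate: 1 if addr is at/after the end of the task's
 * address space, 0 if it is inside, -1 on error. */
typedef int (*beyond_fn)(uint64_t addr, void *arg);

/*
 * Return the lowest page-aligned address in (lo, hi] for which beyond()
 * is true, or 0 on error. Assumes beyond(lo) is false, beyond(hi) is true,
 * and that the predicate is monotonic in between.
 */
static uint64_t find_task_size(uint64_t lo, uint64_t hi, uint64_t page_size,
			       beyond_fn beyond, void *arg)
{
	assert(page_size && lo < hi);
	assert(lo % page_size == 0 && hi % page_size == 0);

	while (hi - lo > page_size) {
		/* Page-aligned midpoint, strictly between lo and hi. */
		uint64_t mid = lo + ((hi - lo) / page_size / 2) * page_size;
		int ret = beyond(mid, arg);

		if (ret < 0)
			return 0;
		if (ret)
			hi = mid;
		else
			lo = mid;
	}
	return hi;
}

/* Assumption: pagemap holds one 64-bit entry per virtual page, and a
 * read past the end of the address space returns 0 bytes (EOF). */
static int pagemap_eof(uint64_t addr, void *arg)
{
	int fd = *(int *)arg;
	uint64_t entry;
	uint64_t page_size = (uint64_t)sysconf(_SC_PAGESIZE);
	off_t off = (off_t)(addr / page_size * sizeof(entry));
	ssize_t n = pread(fd, &entry, sizeof(entry), off);

	if (n < 0)
		return -1;
	return n == 0;
}

/* Probe the running kernel; returns 0 if pagemap cannot be used. */
static uint64_t probe_task_size_pagemap(void)
{
	uint64_t page_size = (uint64_t)sysconf(_SC_PAGESIZE);
	uint64_t size;
	int fd = open("/proc/self/pagemap", O_RDONLY);

	if (fd < 0)
		return 0;
	size = find_task_size(0, 1ULL << 63, page_size, pagemap_eof, &fd);
	close(fd);
	return size;
}

/* Stand-in predicate for exercising the search without /proc. */
static int beyond_fixed_limit(uint64_t addr, void *arg)
{
	return addr >= *(uint64_t *)arg;
}
```

The result of probe_task_size_pagemap() could be cached once at startup and
passed down to the restorer in place of the compile-time constant; a return
value of 0 would mean falling back to some other method.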

BTW, on x86_64 this value is constant, so can we keep TASK_SIZE a constant
on x86 and turn it into a variable on arm?

Thanks,
Pavel

> If not, should I propose one? Should I try to probe the value with mmap calls
> or similar?
> 
>> +	if (ret) {
>> +		pr_err("Unable to unmap (%p-%p): %d\n",
>> +				p2 + s2, (void *)TASK_SIZE, ret);
>> +		return -1;
>> +	}
>> +
>> +	return 0;
>> +}
> 
> Thanks,
> Christopher
> 


