[CRIU] GSoC: Boot up week
Pavel Emelianov
xemul at virtuozzo.com
Tue May 28 11:33:25 MSK 2019
On 5/28/19 5:51 AM, abhishek dubey wrote:
> Hi Pavel,
>
> This week I will start with the following tasks:
>
> - Pre-allocate a 4Mb user buffer (or a chain of buffers - questioned
> inline below)
>
> - Collect VMAs after freezing
>
> - VMAs from the complete pstree will be collected into a list at
> once, unlike the current handling of one process at a time in the pstree
The list of VMAs must be kept per task in the pstree, otherwise you wouldn't
be able to find out the pid on which to call the VM-reading syscall.
> - I will use the existing function for collection -
> a modification is needed to skip non-readable VMAs (even the root user
> can't read such VMAs using process_vm_readv)
OK
> - No more injection of parasite code
Yup
> - Unfreeze the processes just after collecting the VMAs
>
> - pstree_switch_state() will unfreeze the pstree
>
>
> Is above approach fine?
Looks OK.
> Please look for the inline question below:
>
> On 23/05/19 2:13 PM, Pavel Emelianov wrote:
>> On 5/22/19 11:12 AM, Radostin Stoyanov wrote:
>>> Hi Abhishek,
>>>
>>> I have some suggestions/ideas that may be useful.
>>>
>>> On 22/05/2019 01:11, Abhishek Dubey wrote:
>>>> Hi Pavel,
>>>>
>>>> I have gone through the cr_pre_dump_tasks() function tree and am quite comfortable with parts of it. The compel stuff seems a bit difficult to digest in one go.
>>>> I will ask if I get stuck somewhere in the code. I think we can start with the design discussion.
>>>>
>>>> Some queries related to new approach:
>>>> 1) We need to replace the page pipe with a user-space-supplied buffer. There is a list of pipes in struct page_pipe. If I got it correctly, each pipe buffer in the list has to be replaced with a user-supplied buffer, and these buffers exhibit the same properties as the pipes in the current implementation?
>>>>
>>> There is a prototype implementation which you can use as a starting point:
>>>
>>> https://github.com/avagin/criu/tree/process_vm_readv
>> Yup, that's the good starting point, thank you, Radostin.
> I went through the commit you pointed to, which limits the pipe size. If I
> am not mistaken, a page_pipe can have a maximum of 8 page_pipe_bufs, each
> with a pipe size of up to PIPE_MAX_SIZE.
Yes, something like that.
>>
>>>> 2) We finalized that the user-space buffer for process_vm_readv will be of fixed size. How do we decide the best size (= max size of a pipe)?
>>> Currently, CRIU creates a pipe and continuously increases its buffer size (see __ppb_resize_pipe() in criu/page-pipe.c). In the case of pre-dump (or when --leave-running is used) it would be more efficient to compute the necessary memory space and allocate it prior to freezing the process tree, thus reducing the downtime during pre-copy migration.
>>>
>>> Dump is currently using chunks (see commit bb98a82) and perhaps the same idea could be applied with memory buffer(s). This reduces the required amount of memory during checkpoint (e.g. when we want to dump a process tree that occupies 90% of the available memory).
>> Agree. Let's start with fixed-size buffers for pre-dumps and use the same size as in chunked dump mode.
>> One thing is that criu doesn't have an explicit constant for that; instead it uses several of them (max
>> number of pipes, page-alloc-costly-order, etc.). I propose not to over-engineer things here (at least for now)
>> and just agree on some pre-defined constant. Say, 4Mb.
> Since we have to utilize the existing xfer functions, we need to adhere
> to the "page_pipe -- page_pipe_buf" model for the user buffer. In that case,
> will these 4Mb chunks be similar to the page_pipe_buf of the current
> implementation?
That's a tricky thing. A page-pipe is a description of the process pagemap with the
data sitting in file descriptors. In your case you will have the same, but with the
data sitting right in memory. So as a quick hack we can vmsplice() the local
buffer into a pipe and then feed this pipe into the page_pipe. But as a longer-term
solution we'd need to generalize the page_pipe_buf structure to allow keeping
raw memory pointers instead of file descriptors.
>>>> 3) iov generation for shared mappings is skipped, and shared mappings are handled separately. Will the new approach handle shared memory similarly?
>> We're talking about __parasite_dump_pages_seized; this routine just ignores shared mappings.
> Yes.
>>
>>>> 4) Freeze - collect vmas - Unfreeze : How we go about handling following events -
>>>> a) process does something such that vma gets modified
>>>> - we can't ignore such mappings
>> When saving memory contents you will generate a set of pagemaps. The pagemaps do _not_ coincide with
>> the collected mappings, but are those that have been successfully read. Those that were collected as
>> mappings but failed to be read should just be ignored.
>>
>> Note that some mappings may be partially read. For those, the pagemap size should be "tuned" accordingly.
> Sure!
>>
>>>> - we can't freeze a single process again; it becomes inconsistent with the other processes in the tree
>> Why again? Freezing happens once.
>>
>>>> b) one of the process in pstree dies
>> That's OK, this can happen even in the current scheme.
>>
>> -- Pavel