[CRIU] Reg unix stream checkpointing and other issues.
Pavel Emelyanov
xemul at parallels.com
Fri Oct 24 09:14:07 PDT 2014
On 10/24/2014 05:51 PM, Sanidhya Kashyap wrote:
> Hi Pavel,
>
> On Fri, Sep 12, 2014 at 12:33 PM, Pavel Emelyanov <xemul at parallels.com> wrote:
>
>>>> We can do incremental restore, but it's quite tricky. The process
>>>> of migration would look like this then.
>>>>
>>>> 1. Get the process tree and their memory
>>>> 2. Go on restore node, fork tasks and put the memory in places
>>>> 3. Go back on source node, get the tree and changed memory
>>>> 4. Go on restore node, fix up the tree by killing dead tasks
>>>> and forking the newly appeared ones, then update their memory
>>>> 5. Repeat steps 3 and 4 some more times
>>>>
>>>> The trickiest part is step #4. I have no nice algorithm for the "fix up
>>>> the tree" part of it. How to update the changed memory is more or less clear.
>>>>
>>>
>>> So, in order to do this, do we need to get some support from the
>>> kernel or criu will be able to manage it?
>>
>> CRIU can manage it, but the algo would be quite tricky :)
>>
>
> I have been thinking of working on the incremental restore, and would
> like to contribute patches that I develop.
That's awesome!
> But I have some questions before I decide on the approach, including
> some about the one that you have mentioned (above). I wanted to
> discuss it in detail, as I have already browsed the code.
>
> - About step 2, you have mentioned forking tasks and putting the
> memory in places.
> Should all the processes be forked, or only a subset of them?
It depends on the algorithm we develop. Maybe it would be enough to
pre-fork only the tasks that hold most of the memory. Nonetheless,
the implementation should work on any tree -- partial or full.
> - Suppose that I fork a subset of processes and they try to access
> memory which is shared
> with some other task that has not been forked yet. What will
> happen in that case?
Tasks in the pre-restore stage shouldn't access any memory: they are
frozen and controlled by CRIU, waiting for the final restore to happen.
You're probably talking about migrating the tree not as a whole, but
task by task. That is a separate problem, different from pre-restore.
> - Is there a possibility of having a memory that is not present for
> the task? If yes, then how will that be handled?
Right now no, but there's work being done by Andrea Arcangeli on the
userfaultfd and memcopy system calls. I plan to write him an e-mail
about extending this API to fit our needs.
> - IMO, the tree fixup can be done by maintaining the whole process
> structure, so that we can see
> what the difference is between the old tree and the current
> one. Btw, how can a task die if it has not started yet?
I don't understand the issue. If a task is present in the pre-restored
tree but has died on the source node, we should just kill its
counterpart on the destination.
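
To make the "fix up the tree" step more concrete, here is a minimal
sketch of the tree diff in Python. It is not CRIU code and only plans
the work. It assumes each tree is given as a hypothetical pid -> parent
pid mapping (0 for the root's parent) and that pids identify the same
task across iterations. The names plan_fixup and TreeFixup are made up
for illustration. Tasks whose parent changed (for example, after their
original parent died) are only reported, since handling them is the part
with no nice algorithm yet.

```python
from typing import Dict, List, NamedTuple


class TreeFixup(NamedTuple):
    to_kill: List[int]     # in the pre-restored tree, gone on the source
    to_fork: List[int]     # appeared on the source, parents listed first
    reparented: List[int]  # still alive, but the parent pid changed


def plan_fixup(old: Dict[int, int], new: Dict[int, int]) -> TreeFixup:
    """Diff two process trees given as pid -> parent pid maps."""
    to_kill = sorted(pid for pid in old if pid not in new)

    pending = {pid for pid in new if pid not in old}
    placed = set(new) - pending  # tasks that already exist on the destination
    to_fork: List[int] = []
    while pending:
        # A task can be forked once its parent exists (or it is a root).
        ready = sorted(p for p in pending if new[p] == 0 or new[p] in placed)
        if not ready:
            raise ValueError("appeared task has no reachable parent")
        to_fork.extend(ready)
        placed.update(ready)
        pending.difference_update(ready)

    reparented = sorted(p for p in old if p in new and old[p] != new[p])
    return TreeFixup(to_kill, to_fork, reparented)


if __name__ == "__main__":
    old_tree = {1: 0, 2: 1, 3: 1, 4: 3}
    new_tree = {1: 0, 2: 1, 4: 1, 5: 2, 6: 5}
    print(plan_fixup(old_tree, new_tree))
```

In this example task 3 died, tasks 5 and 6 appeared (5 must be forked
before 6), and task 4 shows up as reparented because its parent 3 is
gone.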
> - There is a possibility that a forked task might call a task that has
> not started yet. What will happen in this case?
The pre-restored tree is not running, it's frozen.
> Besides this, I was thinking of another approach using userfaultfd.
> That is, fork all the tasks but
> don't dump the memory, and start the processes. Later, when a page is
> accessed, it will result in
> a page-fault handler invocation, which should be handled by criu
> providing that page. What do you think of this approach?
This is what we call "lazy migration" and yes, this is in our plans
too :) But the existing userfaultfd + memcopy API is not enough. The
latter syscall should operate on an arbitrary task's VM, not only on
the current one.
Thanks,
Pavel