[CRIU] Reg unix stream checkpointing and other issues.
Sanidhya Kashyap
sanidhya.gatech at gmail.com
Fri Oct 24 06:51:19 PDT 2014
Hi Pavel,
On Fri, Sep 12, 2014 at 12:33 PM, Pavel Emelyanov <xemul at parallels.com> wrote:
>>> We can do incremental restore, but it's quite tricky. The process
>>> of migration would look like this then.
>>>
>>> 1. Get the process tree and their memory
>>> 2. Go on restore node, fork tasks and put the memory in places
>>> 3. Go back on source node, get the tree and changed memory
>>> 4. Go on restore node, fixup the tree by killing died tasks
>>> and forking the appeared ones, then update their memory
>>> 5. Repeat steps 3 and 4 some more times
>>>
>>> The trickiest part is step #4. I have no nice algorithm for "fixup the tree"
>>> step of it. Tuning up changed memory is more or less clear how to do.
>>>
>>
>> So, in order to do this, do we need to get some support from the
>> kernel or criu will be able to manage it?
>
> CRIU can manage it, but the algo would be quite tricky :)
>
I have been thinking of working on incremental restore and would like
to contribute the patches I develop. I have already browsed the code,
but before I decide on an approach I have some questions about the one
you described above, and I would like to discuss it in detail.

- About step 2, you mention forking tasks and putting the memory in
  place. Should all the processes be forked, or only a subset of them?
- Suppose I fork only a subset of the processes and one of them
  accesses memory that is shared with another task that has not been
  forked yet. What will happen in that case?
- Is it possible for a task to have memory that is not present? If
  yes, how will that be handled?
- IMO, the tree fixup can be done by keeping the whole process tree
  from the previous iteration and looking at the difference between
  the old tree and the current one. Btw, how can a task die if it has
  not been started yet?
- A forked task might call a task that has not been started yet. What
  will happen in this case?

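To make the tree-diff idea from the fixup point above concrete, here is
a minimal sketch in Python. It is not CRIU code. It assumes each
snapshot is a plain pid -> parent-pid mapping and that pids are not
reused between two snapshots; the names diff_trees and TreeDiff are
made up for illustration.

```python
from typing import Dict, List, NamedTuple, Set


class TreeDiff(NamedTuple):
    died: Set[int]        # in the old snapshot only: kill on the restore node
    appeared: List[int]   # in the new snapshot only: fork, parents first
    reparented: Set[int]  # in both snapshots, but with a different parent


def diff_trees(old: Dict[int, int], new: Dict[int, int]) -> TreeDiff:
    """Compare two process-tree snapshots given as pid -> parent pid.

    Assumption: a pid present in both snapshots is the same task
    (pid reuse between snapshots is not handled here).
    """
    died = set(old) - set(new)
    born = set(new) - set(old)
    reparented = {pid for pid in set(old) & set(new) if old[pid] != new[pid]}

    def depth(pid: int) -> int:
        # Number of newly appeared tasks on the path up to the first
        # ancestor that already exists on the restore node.
        d, seen = 0, set()
        while pid in born and pid not in seen:
            seen.add(pid)
            pid = new[pid]
            d += 1
        return d

    # A new task can only be forked after its parent exists, so sort
    # shallower tasks first.
    appeared = sorted(born, key=lambda p: (depth(p), p))
    return TreeDiff(died, appeared, reparented)
```

For example, with old = {1: 0, 2: 1, 3: 1} and new = {1: 0, 3: 1, 4: 3,
5: 4}, task 2 is reported as died and tasks 4 and 5 as appeared, in
that order, because 5 is a child of 4.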
Besides this, I was thinking of another approach using userfaultfd:
fork all the tasks and start them without putting their memory in
place. Later, when a task accesses a page, the resulting page fault
would be passed to criu, which would supply that page. What do you
think of this approach?

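The control flow I have in mind can be modelled in a few lines of
Python. This is only a toy simulation of fetch-on-first-access, not the
userfaultfd interface and not CRIU code; LazyMemory, fetch_page and
PAGE_SIZE are hypothetical names, and fetch_page stands in for whatever
would pull a page from the source node.

```python
PAGE_SIZE = 4096  # assumed page size for the model


class LazyMemory:
    """Toy model: pages are fetched only when first touched."""

    def __init__(self, fetch_page):
        # fetch_page: callable taking a page number and returning
        # exactly PAGE_SIZE bytes (hypothetical stand-in for the
        # transfer from the source node).
        self._fetch_page = fetch_page
        self._resident = {}  # page number -> bytearray
        self.faults = 0      # number of simulated page faults

    def _page(self, addr):
        pn = addr // PAGE_SIZE
        if pn not in self._resident:
            # Simulated fault: the page is missing, so fetch it now.
            self.faults += 1
            data = self._fetch_page(pn)
            if len(data) != PAGE_SIZE:
                raise ValueError("fetch_page must return a full page")
            self._resident[pn] = bytearray(data)
        return self._resident[pn]

    def read(self, addr):
        return self._page(addr)[addr % PAGE_SIZE]

    def write(self, addr, value):
        self._page(addr)[addr % PAGE_SIZE] = value
```

In this model the first access to any address in a page costs one
fetch, and later accesses to the same page cost nothing, so only the
pages a task actually touches are ever transferred.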
Thanks,
Sanidhya