[CRIU] Reg unix stream checkpointing and other issues.

Fri Oct 24 11:55:06 PDT 2014

On 10/24/2014 02:34 PM, Pavel Emelyanov wrote:
>>>> - There is a possibility that a forked task might call not yet started
>>>> task. What will happen in this case?
>>>
>>> The pre-restored tree is not running, it's frozen.
>>>
>>
>> I don't think the incremental restore will have any performance benefits, whereas
>> the lazy one will definitely have. I have done this for VMs. 
> 
> Lazy will, but it has another drawback -- once the source node is not
> accessible while the destination still needs pages from it, all the
> soon-to-be-restored tasks will die.
> 

I didn't get the point completely. Right now, I am thinking of a local
machine that is both source and destination. The checkpointed
data has been dumped to disk and the process is about to be
restored from it. Later, I'll extend the work across multiple nodes,
i.e. a separate source and destination.

> Pre-restore should have performance benefits as we will avoid big portion
> of two stages -- fork() and memory restore -- which currently take quite
> a lot of time.
> 

I think I am still not getting this point. Can you give me an example
of how it is going to benefit? Let's take the case of a memcached server
that is running, gets checkpointed, and is then restarted.
It has multiple threads that read/write data in memory
as queried by the clients. In this case, how is the incremental approach
going to work?

Besides, I am thinking that the lazy migration approach can also be applied
on the local machine, where we restore the already
checkpointed process. In that case, will userfaultfd be sufficient?
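
To make the local case concrete, here is a toy Python model of what I mean
by lazy restore from a dump on disk. It is only a user-space simulation under
my own assumptions, not CRIU code and not the userfaultfd API: the names
LazyImage and make_dump, the 4096-byte page size and the flat one-page-after-
another file layout are all made up for illustration. The "fault handler" is
just the branch that reads a page from the file the first time it is touched.

```python
import os
import tempfile

PAGE_SIZE = 4096  # assumed page size for this toy model


class LazyImage:
    """Toy model of lazy restore: pages stay in the dump file until touched."""

    def __init__(self, path, num_pages):
        self.fd = os.open(path, os.O_RDONLY)
        self.num_pages = num_pages
        self.resident = {}  # page index -> bytes already "faulted in"
        self.faults = 0     # number of pages fetched from the dump so far

    def read(self, addr, length):
        """Read bytes at an offset, fetching missing pages on demand."""
        out = bytearray()
        end = addr + length
        while addr < end:
            page, off = divmod(addr, PAGE_SIZE)
            if page >= self.num_pages:
                raise IndexError("address outside the dumped region")
            if page not in self.resident:
                # Stand-in for the fault handler: pull one page from the file.
                self.resident[page] = os.pread(
                    self.fd, PAGE_SIZE, page * PAGE_SIZE)
                self.faults += 1
            chunk = min(end - addr, PAGE_SIZE - off)
            out += self.resident[page][off:off + chunk]
            addr += chunk
        return bytes(out)

    def close(self):
        os.close(self.fd)


def make_dump(num_pages):
    """Write a fake dump file in which page i is filled with byte value i."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        for i in range(num_pages):
            f.write(bytes([i % 256]) * PAGE_SIZE)
    return path
```

With an 8-page dump, a 4-byte read that straddles the boundary between pages
0 and 1 fetches exactly those two pages, and the other six are never read
from disk unless something touches them later.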

>>> This is what we call "lazy migration" and yes, this is in our plans
>>> too :) But the existing userfaultfd + memcopy API is not enough. The
>>> latter syscall should operate on arbitrary task VM, not only on the
>>> current one.
>>>
>>
>> It would be great if you can give me some details about the memcopy API 
>> as I would like to work and develop a prototype for the whole lazy migration
>> process.
>>
>> Another question is that can I do it currently without extending the memcopy API?
> 
> I heavily doubt it. We cannot "make" restored process pull memory for itself.
> We should have a userfault-daemon that will suck pages from dst node and
> inject them into the remote processes.
>

Can't criu act as the daemon that fetches the pages from the file?

>> I would also like to discuss the model / approach for the lazy migration. If you 
>> have something in mind, it would definitely help me a lot. Are you available on
>> irc, where I can discuss the issues?
> 
> Sure. What is your timezone? Mine is MSK. I think we can meet some day next week.
>

Mine is EST. I can meet anytime you want to. It would be great if we could
meet as early as possible, since I also plan to extend this
work to seamless kernel update, which I would like to discuss as well.

Thanks,
Sanidhya