[CRIU] Reg unix stream checkpointing and other issues.

Fri Oct 24 14:03:10 PDT 2014

On 10/24/2014 03:26 PM, Pavel Emelyanov wrote:
>> I didn't get the point completely. Right now, I am thinking of a local
>> machine which is both source and destination. The checkpointed
>> data has been dumped to disk and the process is about to be
>> restored again. Later, I'll extend the work across multiple nodes, i.e.
>> separate source and destination machines.
> 
> OK, if the source and destination machines are the same box, then no problems.
> An issue will arise when the source machine gives pages to the destination
> over the network.
> 

Good, then I can easily use that :) for one of my test cases.

>>> Pre-restore should have performance benefits as we will avoid big portion
>>> of two stages -- fork() and memory restore -- which currently take quite
>>> a lot of time.
>>>
>>
>> I think I am still not getting this point. Can you give me an example
>> of how it is going to help? Let's take the case of a memcached server
>> that is running, is checkpointed, and is then restarted.
>> It has multiple threads that read/write data in memory
>> as queried by the client. In this case, how is the incremental approach
>> going to work?
> 
> I'm talking about not lazy, but pre-copy migration. It will come in stages.
> 
> 1. We get all the memory from memcached and send it to destination node,
>    while the daemon itself continues running.
> 
> 2. On dst we pre-create the daemon and pre-populate it with memory. On
>    source node daemon is still running.
> 
> 3. On next step we pick the memory modified by daemon and send it to dst.
>    On dst we put the newly arrived memory in place. Daemon is, again, not
>    stopped.
> 
> 4. We repeat step 3 several times.
> 
> 5. We freeze the daemon on src, get full dump and send it to dst (modulo
>    the pages not changed since step 4). Then we just restore what's missing
>    and resume the daemon.
> 
> W/o pre-restore step 5 would look like
> 
> 5. We create daemon and put _all_ its memory from images into respective
>    places.
> 
> It will be longer.
> 
> 

Ohh! I get it. 
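
The five stages above can be sketched as a toy simulation. This is only an
illustration in Python, not CRIU code: the Source class, PAGE_COUNT, and the
dirty-page set are hypothetical stand-ins for the running daemon and for
whatever mechanism reports modified pages. It assumes each round dirties far
fewer pages than the daemon owns.

```python
import random

PAGE_COUNT = 64  # hypothetical size of the daemon's address space, in pages


class Source:
    """Toy stand-in for the daemon running on the source node."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.pages = {i: 0 for i in range(PAGE_COUNT)}
        self.dirty = set(self.pages)  # nothing has been sent yet

    def run(self, writes):
        """The daemon keeps running and modifies some pages."""
        for _ in range(writes):
            page = self.rng.randrange(PAGE_COUNT)
            self.pages[page] += 1
            self.dirty.add(page)

    def collect_dirty(self):
        """Return pages modified since the last call and reset tracking."""
        batch = {p: self.pages[p] for p in self.dirty}
        self.dirty = set()
        return batch


def precopy_migrate(src, rounds=3, writes_per_round=8):
    dst = {}    # memory of the pre-created daemon on the destination
    sent = []   # number of pages transferred at each stage

    # Steps 1-2: send all memory and pre-populate dst; daemon keeps running.
    batch = src.collect_dirty()
    dst.update(batch)
    sent.append(len(batch))

    # Steps 3-4: repeatedly send only the pages modified in the meantime.
    for _ in range(rounds):
        src.run(writes_per_round)
        batch = src.collect_dirty()
        dst.update(batch)
        sent.append(len(batch))

    # Step 5: the daemon is frozen; only the last delta is still missing.
    src.run(writes_per_round)  # writes that happened just before the freeze
    batch = src.collect_dirty()
    dst.update(batch)
    sent.append(len(batch))
    return dst, sent
```

Calling precopy_migrate(Source()) returns the destination memory and the
per-stage page counts. The first count equals PAGE_COUNT, while the final
count, the only one paid while the daemon is frozen, is at most
writes_per_round. Without pre-restore, all PAGE_COUNT pages would have to be
put in place during the frozen stage.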

>> Besides, I am thinking that lazy migration approach can be applied 
>> on the local machine as well where we are going to restore the already
> 
> I don't understand the use-case for "live migration on the local machine".
> Live migration is only valuable when we move task(s) from one node to
> another. Do you have some use-case we're unaware of?
> 

I was talking about the approach, i.e. just start the process without
memory and fetch the pages from the disk image that was saved at the time
of checkpointing.
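
A minimal sketch of that idea, assuming the checkpoint image can be modelled
as a mapping from page number to page contents. LazyMemory and its fields are
hypothetical names used only for illustration. A real implementation would
have to intercept page faults in the restored process, which this toy does
not attempt.

```python
class LazyMemory:
    """Toy address space whose pages are loaded from a checkpoint on first use."""

    def __init__(self, image):
        self.image = image   # hypothetical on-disk checkpoint: page -> contents
        self.resident = {}   # pages already brought into memory
        self.faults = 0      # number of simulated page faults

    def read(self, page):
        if page not in self.resident:
            # Simulated page fault: fetch the page from the saved image.
            self.faults += 1
            self.resident[page] = self.image[page]
        return self.resident[page]


# The process "starts" with no memory and pulls in only what it touches.
image = {i: bytes([i % 256]) * 4096 for i in range(16)}
mem = LazyMemory(image)
mem.read(3)
mem.read(3)   # already resident, no second fault
mem.read(7)
```

After these three reads only pages 3 and 7 are resident and two faults have
been taken, so the restore cost is spread over the accesses instead of being
paid up front.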

>>> Sure. What is your timezone? Mine is MSK. I think we can meet some day next week.
>>>
>>
>> Mine is EST. I can meet anytime you want to. It would be great if we can
> 
> Let's then aim at Monday 18:00 MSK (it should be 10AM). Would this be OK?
>

Yup. That's awesome! I'll be there.

>> as early as possible as I have some more plans as I would like to extend the
>> work to seamless kernel update, which I would also like to discuss. 
> 
> Seamless kernel update?! This thing is totally different from lazy migration,
> and it requires different APIs from the kernel :)
> 

Yeah, I do have some ideas in mind. I have already looked at the
pram over kexec patch. I will start working on that once I am
finished with the restore. But I would also like to get your views on
that before proceeding.

Thanks,
Sanidhya