[CRIU] Reg unix stream checkpointing and other issues.
Sanidhya Kashyap
sanidhya.gatech at gmail.com
Fri Oct 24 14:03:10 PDT 2014
On 10/24/2014 03:26 PM, Pavel Emelyanov wrote:
>> I didn't get the point completely. Right now, I am thinking of local
>> machine which is both source and destination. The checkpointed
>> data has been dumped on the disk and the process is about to be
>> restored again. Later, I'll extend the work across multiple nodes i.e.
>> source and destination.
>
> OK, if the source and destination machine is the same box, then no problems.
> Issue will arise when source machine gives pages to destination over the
> network.
>
Good, then I can easily use that :) for one of my test cases.
>>> Pre-restore should have performance benefits as we will avoid big portion
>>> of two stages -- fork() and memory restore -- which currently take quite
>>> a lot of time.
>>>
>>
>> I think I am still not getting this point. Can you give me an example
>> of how it is going to benefit? Let's take the case of a memcached server
>> which is running, is checkpointed, and is then restarted.
>> It has multiple threads that read/write data in memory
>> as queried by the client. In this case, how is the incremental approach
>> going to work?
>
> I'm talking about not lazy, but pre-copy migration. It will come in stages.
>
> 1. We get all the memory from memcached and send it to destination node,
> while the daemon itself continues running.
>
> 2. On dst we pre-create the daemon and pre-populate it with memory. On
> source node daemon is still running.
>
> 3. On next step we pick the memory modified by daemon and send it to dst.
> On dst we put the newly arrived memory in place. Daemon is, again, not
> stopped.
>
> 4. We repeat step 3 several times.
>
> 5. We freeze the daemon on src, get full dump and send it to dst (modulo
> the pages not changed since step 4). Then we just restore what's missing
> and resume the daemon.
>
> W/o pre-restore step 5 would look like
>
> 5. We create daemon and put _all_ its memory from images into respective
> places.
>
> It will be longer.
>
>
Ohh! I get it.
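
To make the staged flow quoted above concrete for myself, here is a
minimal toy sketch of iterative pre-copy in Python. It is not CRIU code
and does not use any CRIU interface: the names Source, run_a_bit,
collect_dirty and precopy_migrate are all made up for illustration, and
"memory" is just a list of integers. It only shows why the final frozen
step has little left to transfer.

```python
# Toy model of iterative pre-copy migration; NOT CRIU code.
# Every name here (Source, precopy_migrate, ...) is hypothetical.
import random


class Source:
    """A running 'daemon' whose memory is a list of page contents."""

    def __init__(self, page_count, seed=0):
        self.pages = [0] * page_count
        self.dirty = set(range(page_count))  # nothing has been sent yet
        self.rng = random.Random(seed)

    def run_a_bit(self, writes):
        # The daemon keeps running and modifies some of its pages.
        for _ in range(writes):
            idx = self.rng.randrange(len(self.pages))
            self.pages[idx] += 1
            self.dirty.add(idx)

    def collect_dirty(self):
        # Return the pages changed since the last call and reset tracking.
        batch = {i: self.pages[i] for i in self.dirty}
        self.dirty = set()
        return batch


def precopy_migrate(src, rounds=3, writes_per_round=5):
    dst_pages = {}        # step 2: pre-created task on dst, filled as pages arrive
    sent_per_round = []
    # Steps 1-4: copy memory while the daemon keeps running on src.
    for _ in range(rounds):
        batch = src.collect_dirty()
        dst_pages.update(batch)
        sent_per_round.append(len(batch))
        src.run_a_bit(writes_per_round)
    # Step 5: the daemon is frozen (no more writes); send only what is still dirty.
    final = src.collect_dirty()
    dst_pages.update(final)
    sent_per_round.append(len(final))
    return dst_pages, sent_per_round


if __name__ == "__main__":
    src = Source(page_count=64)
    dst, sent = precopy_migrate(src)
    print("pages sent per round:", sent)
    print("dst matches src:", [dst[i] for i in range(64)] == src.pages)
```

In this model the first round sends every page, while the frozen round
sends at most writes_per_round pages. Without the pre-populated
destination, the frozen round would have to place all pages at once.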
>> Besides, I am thinking that lazy migration approach can be applied
>> on the local machine as well where we are going to restore the already
>
> I don't understand the use-case for "live migration on the local machine".
> Live migration is only valuable when we move task(s) from one node to
> another. Do you have some use-case we're unaware of?
>
I was talking about the approach, i.e. just start the process without
memory and fetch its pages from the disk, where they were saved at
checkpoint time.
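
A rough sketch of what I mean, as a user-space toy in Python. This is
only an illustration under my own assumptions: LazyMemory, write_image,
demo, PAGE_SIZE and the flat image layout (pages stored back to back in
one file) are hypothetical and are not how CRIU stores or restores
pages. The "process" starts with no resident pages and pulls each page
from the saved image the first time it is touched.

```python
# Toy model of lazy (on-demand) restore from a local image; NOT CRIU code.
# LazyMemory, write_image, PAGE_SIZE and the image layout are hypothetical.
import os
import tempfile

PAGE_SIZE = 4096


class LazyMemory:
    """Starts with no pages; loads a page from the image on first access."""

    def __init__(self, image_path):
        self.image_path = image_path
        self.resident = {}  # page index -> bytes already loaded
        self.faults = 0     # number of pages fetched from disk so far

    def read_page(self, index):
        if index not in self.resident:
            # Simulated page fault: fetch just this page from the image.
            with open(self.image_path, "rb") as f:
                f.seek(index * PAGE_SIZE)
                data = f.read(PAGE_SIZE)
            if len(data) != PAGE_SIZE:
                raise IndexError("page %d is not in the image" % index)
            self.resident[index] = data
            self.faults += 1
        return self.resident[index]


def write_image(pages):
    # Stand-in for the checkpoint step: dump all pages back to back.
    fd, path = tempfile.mkstemp(suffix=".img")
    with os.fdopen(fd, "wb") as f:
        for page in pages:
            f.write(page)
    return path


def demo():
    pages = [bytes([i]) * PAGE_SIZE for i in range(8)]
    path = write_image(pages)
    try:
        mem = LazyMemory(path)      # "restored" with no memory loaded
        first = mem.read_page(3)    # first touch goes to disk
        again = mem.read_page(3)    # second touch is served from memory
        return first == pages[3], again is first, mem.faults, len(mem.resident)
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

Only the touched page is read from disk here; the other seven stay in
the image until something asks for them.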
>>> Sure. What is your timezone? Mine is MSK. I think we can meet some day next week.
>>>
>>
>> Mine is EST. I can meet anytime you want to. It would be great if we can
>
> Let's then aim at Monday 18:00 MSK (it should be 10AM). Would this be OK?
>
Yup. That's awesome! I'll be there.
>> as early as possible as I have some more plans as I would like to extend the
>> work to seamless kernel update, which I would also like to discuss.
>
> Seamless kernel update?! This thing is totally different from lazy migration,
> and it requires different APIs from the kernel :)
>
Yeah, I do have some ideas in mind. I have already looked at the
pram over kexec patch. I will start working on that once I am
finished with the restore. But I would also like to get your views on
that before proceeding.
Thanks,
Sanidhya