[CRIU] Reg unix stream checkpointing and other issues.

Fri Oct 24 12:26:40 PDT 2014

On 10/24/2014 10:55 PM, Sanidhya Kashyap wrote:
> On 10/24/2014 02:34 PM, Pavel Emelyanov wrote:
>>>>> - There is a possibility that a forked task might call a task that has
>>>>> not been started yet. What will happen in this case?
>>>>
>>>> The pre-restored tree is not running, it's frozen.
>>>>
>>>
>>> I don't think the incremental restore will have any performance benefits, whereas
>>> the lazy one definitely will. I have done this for VMs.
>>
>> Lazy will, but it has another drawback -- once the source node is not
>> accessible while the destination still needs pages from it, all the
>> soon-to-be-restored tasks will die.
>>
> 
> I didn't fully get the point. Right now, I am thinking of a local
> machine which is both the source and the destination. The checkpointed
> data has been dumped to disk and the process is about to be
> restored again. Later, I'll extend the work across multiple nodes, i.e.
> a separate source and destination.

OK, if the source and destination machines are the same box, then there's no problem.
The issue arises when the source machine gives pages to the destination over the
network.

>> Pre-restore should have performance benefits, as we will avoid a big portion
>> of two stages -- fork() and memory restore -- which currently take quite
>> a lot of time.
>>
> 
> I think I am still not getting this point. Can you give me an example
> of how it is going to help? Let's take the case of a memcached server
> which is running, is checkpointed, and is then restarted.
> It has multiple threads that read/write the data in memory
> as queried by clients. How is the incremental approach
> going to work in this case?

I'm talking not about lazy, but about pre-copy migration. It comes in stages.

1. We get all the memory from memcached and send it to the destination node,
   while the daemon itself continues running.

2. On dst we pre-create the daemon and pre-populate it with memory. On the
   source node the daemon is still running.

3. On the next step we pick up the memory modified by the daemon and send it
   to dst. On dst we put the newly arrived memory in place. The daemon is,
   again, not stopped.

4. We repeat step 3 several times.

5. We freeze the daemon on src, get a full dump and send it to dst (modulo
   the pages not changed since step 4). Then we just restore what's missing
   and resume the daemon.

Without pre-restore, step 5 would look like this:

5. We create the daemon and put _all_ its memory from the images into the
   respective places.

That takes longer.
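To make the stages above concrete, here is a toy Python simulation of iterative pre-copy with pre-restore. It is only a sketch under simplifying assumptions, not how criu implements it: memory is modelled as a dict of page number to contents, the hypothetical `run_workload` helper stands in for the still-running daemon and records which pages it writes (real tools rely on kernel dirty-page tracking, such as the soft-dirty bit), and each round's writes are assumed to happen after that round's pages were copied.

```python
from typing import Dict, List, Set, Tuple

Memory = Dict[int, bytes]  # page number -> page contents (toy model)


def run_workload(memory: Memory, dirty: Set[int], writes: List[int], tag: bytes) -> None:
    """Hypothetical stand-in for the running daemon: it writes some pages,
    and every write is recorded in the dirty set."""
    for page in writes:
        memory[page] = tag
        dirty.add(page)


def precopy_migrate(src: Memory, rounds: List[List[int]]) -> Tuple[Memory, List[int]]:
    """Iterative pre-copy with pre-restore. rounds[i] lists the pages the
    daemon writes while round i is being transferred. Returns the dst
    memory and the number of pages sent in each round."""
    dst: Memory = {}
    sent: List[int] = []
    to_send: Set[int] = set(src)  # steps 1-2: the first round sends everything

    for i, writes in enumerate(rounds):
        dirty: Set[int] = set()  # start tracking before copying this round
        for page in to_send:
            dst[page] = src[page]  # pre-restore: put the page in place on dst now
        sent.append(len(to_send))
        # The daemon is never stopped, so it may overwrite pages just sent.
        run_workload(src, dirty, writes, f"v{i}".encode())
        to_send = dirty  # steps 3-4: the next round resends only dirty pages

    # Step 5: the daemon is frozen; only the last dirty set is left to send.
    for page in to_send:
        dst[page] = src[page]
    sent.append(len(to_send))
    return dst, sent


memory = {page: b"init" for page in range(8)}
dst, sent = precopy_migrate(memory, [[0, 1, 2], [1], [1]])
print(sent)  # [8, 3, 1, 1]
```

With pre-restore, the frozen step 5 in this example only moves 1 page; without it, the final restore would have to put all 8 pages in place while the daemon is stopped.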

> Besides, I think the lazy migration approach can be applied
> on the local machine as well, where we are going to restore the already

I don't understand the use-case for "live migration on the local machine".
Live migration is only valuable when we move task(s) from one node to
another. Do you have some use-case we're unaware of?

> checkpointed process. In that case, will userfaultfd be more than sufficient?

It looks like we're talking about 3 different things.

1. Pre-copy migration. This is what criu can already do.

2. Pre-restore. This is an optimization of 1 that I have in mind. It can
   be implemented w/o userfaultfd, but it's tricky.

3. Lazy or post-copy migration. This is what you're working on. This
   feature would require starting a process on the dst node w/o memory and
   fetching it later using the userfaultfd and memcopy syscalls.
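Approach 3, and the drawback of losing the source node mentioned earlier in the thread, can be illustrated with a similar toy sketch. It does not use the real userfaultfd interface: a missing dict key plays the role of a page fault, and the hypothetical `RemoteSource` and `LazyTask` classes stand in for the source node and the restored task, with `LazyTask.read` doing the job a userfault daemon would do (fetching the page and injecting it).

```python
from typing import Dict


class SourceUnreachable(Exception):
    """Raised when the source node can no longer serve pages."""


class RemoteSource:
    """Hypothetical stand-in for the source node holding the task's memory."""

    def __init__(self, pages: Dict[int, bytes]) -> None:
        self.pages = pages
        self.reachable = True

    def fetch(self, page: int) -> bytes:
        if not self.reachable:
            raise SourceUnreachable(f"page {page} is lost")
        return self.pages[page]


class LazyTask:
    """Toy post-copy restored task: it starts on dst with no memory, and
    each missing page is fetched from the source on first access."""

    def __init__(self, source: RemoteSource) -> None:
        self.source = source
        self.local: Dict[int, bytes] = {}
        self.faults = 0

    def read(self, page: int) -> bytes:
        if page not in self.local:  # models a page fault on missing memory
            self.faults += 1
            self.local[page] = self.source.fetch(page)
        return self.local[page]


source = RemoteSource({0: b"a", 1: b"b"})
task = LazyTask(source)
print(task.read(0), task.read(0), task.faults)  # b'a' b'a' 1
source.reachable = False  # e.g. the source node goes down
print(task.read(0))       # b'a' (already local, still works)
try:
    task.read(1)          # never fetched, so the task cannot continue
except SourceUnreachable as err:
    print("task dies:", err)
```

Pages already pulled keep working, but any page not yet fetched is gone once the source disappears, which is exactly why lazy migration is fragile over the network while pre-copy is not.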

>>>> This is what we call "lazy migration" and yes, this is in our plans
>>>> too :) But the existing userfaultfd + memcopy API is not enough. The
>>>> latter syscall should operate on an arbitrary task's VM, not only on the
>>>> current one.
>>>>
>>>
>>> It would be great if you could give me some details about the memcopy API,
>>> as I would like to work on and develop a prototype for the whole lazy migration
>>> process.
>>>
>>> Another question: can I do it currently without extending the memcopy API?
>>
>> I heavily doubt it. We cannot "make" a restored process pull memory for itself.
>> We should have a userfault daemon that will suck pages from the dst node and
>> inject them into the remote processes.
>>
> 
> Can't criu act as the daemon that fetches the pages from the file?

Right now -- no, but it should.

>  
>>> I would also like to discuss the model / approach for lazy migration. If you
>>> have something in mind, it would definitely help me a lot. Are you available on
>>> IRC, where we can discuss the issues?
>>
>> Sure. What is your timezone? Mine is MSK. I think we can meet some day next week.
>>
> 
> Mine is EST. I can meet any time you want. It would be great if we could meet

Let's aim for Monday 18:00 MSK then (it should be 10 AM for you). Would this be OK?

> as early as possible, as I have more plans: I would like to extend the
> work to seamless kernel updates, which I would also like to discuss.

Seamless kernel update?! This thing is totally different from lazy migration,
and it requires different APIs from the kernel :)

Thanks,
Pavel