[CRIU] criu restore performance

Pavel Emelyanov xemul at parallels.com
Mon Jul 7 01:41:46 PDT 2014


On 07/07/2014 12:10 AM, J F wrote:
> 
> I had a few follow-up questions. I'm not kernel savvy, so some of my questions may seem naive. 
>  
> 
>     > 1) Is 'criu restore' operation time complexity mostly CPU bound,
>     >    IO bound, or memory bound?
> 
>     It's mostly CPU-bound, but for images with a large amount of process memory
>     it can become memory-bound.
> 
> 
> Does CRIU restore operation take advantage of all CPU cores on the system when restoring tasks? 
> I didn't see any calls to pthread in src.

Each task restores itself in parallel with the others. If the image contains
many tasks, this may make use of all the cores :) Beyond that, we haven't
explored other forms of parallelism.

>     Between each step there are global synchronization points.
>     Other than this in stage 2 tasks may sometimes wait for each other
>     to restore shared resources, e.g. opened files.
> 
> 
> Any opportunity to reduce or aggregate the number of synchronization points so more stuff can be done in parallel?

There's always room for improvement. First of all, all the synchronization points
come from shared resources. I'd say that right now we use a pretty straightforward
but reliable synchronization model -- if a resource is shared between tasks T1, T2,
... TN, then the Ti with the smallest pid creates it, and all the rest wait for it to
appear (when they want to restore it). This is deadlock-free, but probably not
optimal.

>     > 3) Is performance of restore a function of the size of the images folder?
> 
>     Well, yes. The more data we have to restore, the more time it takes. The
>     exact dependency hasn't been researched, but it's non-linear for sure.
> 
>     > 4) Any tricks/advice/hacks to speed up restore?
> 
>     It's a WIP at the moment. We do know some things that slow restore (and
>     dump), but the list is not complete and is not fully fixed yet. E.g.
> 
>     1. The more image files we have, the slower it works. Currently criu generates
>        8 files per task; we're trying to reduce that number.
> 
>     2. Criu writes data into images in small portions. This behaves badly
>        because the kernel performs many actions on every write() call, especially
>        for disk filesystems (even for page-cache writes).
> 
> 
> Any reason not to increase the write block size? 

No reasons other than the lack of time and free hands.

>     3. The /proc interface we use heavily on dump is too damn slow
> 
>     4. Shared file descriptors could be inherited by tasks on restore. Instead,
>        we share them via unix sockets, which is slower.
> 
>     5. Potentially COW-ed pages in a memory mapping are memcmp-ed on restore to
>        decide whether or not to COW the page. No good ideas yet on how to deal with this
> 
> 
> Of the 5 items you listed, which do you think is the biggest performance bottleneck for
> a restore operation on a large memory application (e.g. my dump is about .5 - 2 gigs)?

If all ~2 gigs belong to the pages.img of a single task, then the bottleneck would be
reading all this data from disk (to put it into memory). If these 2 gigs are spread over
~1k tasks, then we would (should) get stuck resolving COW regions in open_vmas().
But the latter is mostly a guess.

Thanks,
Pavel
