[CRIU] open files issue

Pavel Emelyanov xemul at parallels.com
Wed Dec 17 05:14:32 PST 2014


On 12/15/2014 08:26 PM, Sanidhya Kashyap wrote:
> Hello everyone,
> 
> I have been using criu to checkpoint memcached, which works quite well, but there
> is an issue with the dumping of the open files - func: dump_task_files_seized in
> cr-dump.c, especially dump_one_file in files.c.
> 
> The function dump_one_file actually takes almost 99% of the time when doing
> either a single dump or multiple pre-dumps followed by a single dump. I have
> attached the dump with this email.

How did you measure these 99%? Did you use the perf tool? If so, can you show
the full call tree?

> I have some questions related to the dumping of the data:
> 
> 1) Why does dumping the open files take so long when there are open connections?
> Is this the case only with socket connections, or with others as well?

The task you dump has 304 file descriptors. Lots of them are connected sockets,
and for each of them an iptables command is spawned. This takes most of the time
in your case.
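To make the cost concrete, here is a rough sketch (not CRIU's actual code) of the pattern: one iptables process forked and exec'd per established connection. The addresses and ports below are invented, and "echo" stands in for the real iptables invocation so the sketch can run unprivileged.

```shell
#!/bin/sh
# Illustration only: the dump path effectively spawns one iptables
# process per established socket. With ~300 descriptors, process
# spawn overhead dominates the dump time.
lock_connections() {
    for conn in \
        "10.0.0.2 45120" \
        "10.0.0.3 45121" \
        "10.0.0.4 45122"
    do
        set -- $conn
        # Real code would fork+exec iptables here; we just print it.
        echo "iptables -A INPUT -p tcp -s $1 --sport $2 -j DROP"
    done
}
lock_connections
```

Multiply that fork+exec by several hundred connections and you get seconds of wall-clock time.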

> 2) I keep on getting the error message like these:
> 
> (00.097203) Error (parasite-syscall.c:388): si_code=1 si_pid=2643 si_status=0
> (00.111058) Error (parasite-syscall.c:388): si_code=1 si_pid=2644 si_status=0
> 
> (00.125807) Error (parasite-syscall.c:388): si_code=1 si_pid=2645 si_status=0
> (00.138801) Error (parasite-syscall.c:388): si_code=1 si_pid=2646 si_status=0
> 
> 
> 
> What is the meaning of above messages?

These are spurious messages from the iptables spawning. Needs fixing :\

> 3) As you can see, the total dump time is around 8.69 seconds, of which the
> open-files dumping lasted from 0.09 seconds to 8.29 seconds, which is VERY
> VERY high. Thus it will be very difficult for anyone to checkpoint network-based
> applications, as lots of socket connections can be open and this will degrade
> performance; dump_one_file is saving the socket info and appending iptables
> rules, which matters especially in the live migration and seamless kernel
> upgrade cases.
> What are the optimal approaches to solving this issue, either in userspace or
> at the kernel level?

Of course. Can you check the fdinfo.img file for how many INET files you have?
Most of the time is spent adding new iptables rules. This can be fixed in several ways.

Either you can move the memcached into a net namespace, in which case the network
lock would happen extremely fast. Or you can batch the locking with iptables and
do it directly via the kernel API or, at least, with libnetfilter, rather than
with fork + exec.
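The batching idea can be sketched in shell using iptables-restore: build one rule payload covering every connection and apply it with a single process, instead of one iptables fork+exec per socket. This is only an illustration of the approach, not CRIU code; the addresses and ports are invented, and the apply step is left commented out since it needs root.

```shell
#!/bin/sh
# Sketch of batched network locking: one iptables-restore payload for
# all connections, so a single fork+exec replaces hundreds.
emit_rules() {
    echo "*filter"
    for conn in \
        "10.0.0.2 45120" \
        "10.0.0.3 45121" \
        "10.0.0.4 45122"
    do
        set -- $conn
        echo "-A INPUT -p tcp -s $1 --sport $2 -j DROP"
    done
    echo "COMMIT"
}
emit_rules
# To actually apply, as root:
#   emit_rules | iptables-restore --noflush
```

The --noflush flag keeps the existing ruleset intact and only appends the batched rules; doing the same through libnetfilter or netlink directly would avoid spawning even that one process.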

> I am running this benchmark on a 32-core machine (Intel(R) Xeon(R) CPU E5-2630 v3
> @ 2.40GHz) with Fedora 20 running kernel version 3.17.4-200.fc20.x86_64. The
> memcached is running with 64GB RAM and 8 threads. The requests are generated
> using memaslap with 4 threads and 256 concurrent connections on a gigabit
> network. I have changed the IP addresses in the log for the sake of security.
> 
> I run the following command:
> 
> sudo criu dump -t `pgrep memcached` --tcp-established -j -D criu-dump -o
> dump.log -v4
> 
> I would be grateful if anyone could provide some insight into this issue and
> how to resolve it.
> 
> Thanks,
> Sanidhya
> 
