[CRIU] Restarting Process in an LXC Container

Andrew Vagin avagin at parallels.com
Wed Jan 28 06:11:06 PST 2015


On Wed, Jan 21, 2015 at 10:42:39PM -0700, Tycho Andersen wrote:
> On Tue, Jan 20, 2015 at 02:43:09PM +0300, Andrew Vagin wrote:
> > On Mon, Jan 19, 2015 at 06:09:47PM +0100, Thouraya TH wrote:
> > > Hello :) thanks a lot for help.
> > > 
> > > How do you enter into CT? Do you use screen or ssh?
> > > 
> > > Before the Dumping Process:
> > > 
> > > root at g-2:~# lxc-ls -f
> > > NAME     STATE    IPV4  IPV6  GROUPS  AUTOSTART 
> > > -----------------------------------------------
> > > worker   STOPPED  -     -     -       NO          
> > > root at g-2:~# lxc-start -n worker
> > > root at g-2:~# lxc-attach -n worker
> > 
> > I've understood the problem and its cause. If you execute a process in
> > the CT, it lives in the CT's PID namespace, but it belongs to another
> > process tree, because its parent is outside of the CT.
> > 
> > Currently CRIU is able to dump only one process tree. I am not sure that
> > we will fix this problem in the near future. I suggest using screen or
> > tmux; they should work around your problem.
> 
> If I've understood the problem correctly, it sounds like one thing we
> could do to prevent this error is to have lxc-checkpoint make sure
> nobody is attached to the container via lxc-attach? (Or ideally via
> nsenter or whatever, although that may be harder.)

Yes, you are right. But I think we need to check this in CRIU. After
freezing the processes, CRIU should read the process list from /proc and
check that there are no new processes.
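
A rough sketch of such a check, in Python rather than CRIU's C, only to
illustrate the idea. The function names are hypothetical and not part of
CRIU. It assumes /proc is read from inside the container's PID namespace,
where a process whose parent is outside the namespace (for example a shell
started by lxc-attach) reports a parent PID of 0:

```python
import os


def parse_ppid(stat_text):
    """Return the parent PID from the contents of /proc/<pid>/stat."""
    # The comm field may contain spaces or parentheses, so split only
    # after the last ')'. The fields that follow are: state, ppid, ...
    rest = stat_text[stat_text.rfind(")") + 2:].split()
    return int(rest[1])


def read_parents(proc_root="/proc"):
    """Map every visible PID to its parent PID."""
    parents = {}
    for name in os.listdir(proc_root):
        if not name.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, name, "stat")) as f:
                parents[int(name)] = parse_ppid(f.read())
        except OSError:
            continue  # the process exited while we were scanning
    return parents


def collect_tree(parents, root_pid):
    """Return the set of PIDs reachable from root_pid via parent links."""
    children = {}
    for pid, ppid in parents.items():
        children.setdefault(ppid, []).append(pid)
    tree, stack = set(), [root_pid]
    while stack:
        pid = stack.pop()
        if pid in tree:
            continue
        tree.add(pid)
        stack.extend(children.get(pid, []))
    return tree


def find_strays(parents, root_pid):
    """Return visible PIDs that are not in the tree rooted at root_pid."""
    return sorted(set(parents) - collect_tree(parents, root_pid))
```

Calling find_strays(read_parents(), 1) after the freeze would list the
processes that live in the namespace but do not descend from the
container's init. A non-empty result means the dump cannot cover them.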

> 
> Tycho
> 
> > Thanks,
> > Andrew Vagin
> > 
> > > root at worker:/home# cat > test.sh <<-EOF
> > > > #!/bin/sh
> > > > while :; do
> > > >     sleep 1
> > > >     date
> > > > done
> > > > EOF
> > > root at worker:/home# chmod +x test.sh
> > > root at worker:/home# ./test.sh
> > > Mon Jan 19 17:58:57 CET 2015
> > > Mon Jan 19 17:58:58 CET 2015
> > > ..........................
> > > After the restart, I used ssh.
> > > 
> > > 
> > > dump.log:
> > > Warn  (fsnotify.c:183): fsnotify:       Handle 800003:9db17 cannot be opened
> > > Warn  (fsnotify.c:183): fsnotify:       Handle 800003:9db1d cannot be opened
> > > tar: ./udev/control: socket ignored
> > > 
> > > restore.log:
> > > Warn  (cr-restore.c:996): Set CLONE_PARENT | CLONE_NEWPID but it might cause
> > > restore problem,because not all kernels support such clone flags combinations!
> > > RTNETLINK answers: File exists
> > > RTNETLINK answers: File exists
> > > RTNETLINK answers: File exists
> > > 
> > > Thank you so much for help.
> > > Best Regards.
> > > 
> > > 2015-01-16 12:29 GMT+01:00 Andrew Vagin <avagin at parallels.com>:
> > > 
> > >     On Thu, Jan 15, 2015 at 02:02:56PM +0100, Thouraya TH wrote:
> > >     > Hello,
> > > 
> > >     Hello,
> > > 
> > >     Add Tycho into CC.
> > >    
> > >     >
> > >     > Please, I have a question about restarting a process in an LXC container.
> > >     > I ran this script http://criu.org/Simple_loop in a container:
> > >     >
> > >     > 1)
> > >     > ubuntu at worker:/home$ ./test.sh
> > >     > Wed Jan 14 22:30:23 CET 2015
> > >     > Wed Jan 14 22:30:24 CET 2015
> > >     > Wed Jan 14 22:30:25 CET 2015
> > >     > Wed Jan 14 22:30:26 CET 2015
> > >     > Wed Jan 14 22:30:35 CET 2015
> > >     > .......................
> > > 
> > >     How do you enter into CT? Do you use screen or ssh?
> > >    
> > > 
> > >     > 2) I have done the dumping process:
> > >     >       root at g-3:/home# lxc-checkpoint -s -D /home/ImGLXC1Worker -n worker
> > >     > 3) I have restarted the container:
> > >     >     root at g-3:/home/ImGLXC1Worker# lxc-checkpoint -r -D /home/ImGLXC1Worker -n worker
> > >     >      # lxc-ls -f
> > >     > NAME     STATE    IPV4        IPV6  GROUPS  AUTOSTART
> > >     > -----------------------------------------------------
> > >     > worker   RUNNING  10.0.3.109  -     -       NO
> > > 
> > >     Thouraya, could you increase the verbosity level for criu and show us the
> > >     dump and restore logs? Tycho, could you explain how to do this with
> > >     lxc-checkpoint?
> > > 
> > >     Thanks,
> > >     Andrew
> > >    
> > >     >
> > >     > 4) ssh ubuntu@$(sudo lxc-info -n worker -H -i)
> > >     > 5) ubuntu at worker:/home$ ps
> > >     >   PID TTY          TIME CMD
> > >     >   304 pts/0    00:00:00 bash
> > >     >   318 pts/0    00:00:00 ps
> > >     >
> > >     > The process "test" didn't restart!
> > >     >
> > >     > I have done another test: ubuntu at worker:/home$ ./test.sh > Results.txt /
> > >     > Dumping the container / Restarting the container / The process "test"
> > >     > didn't restart, and the file Results.txt didn't change!
> > >     >
> > >     > Do you have any idea, please?
> > >     >
> > >     > Thanks a lot.
> > >     > Best Regards.
> > > 
> > >     > _______________________________________________
> > >     > CRIU mailing list
> > >     > CRIU at openvz.org
> > >     > https://lists.openvz.org/mailman/listinfo/criu
> > > 
> > > 
> > > 

