[CRIU] lxc - cgroup related restore error
Adrian Reber
adrian at lisas.de
Wed Jul 13 08:56:29 PDT 2016
On Wed, Jul 13, 2016 at 09:30:02AM -0600, Tycho Andersen wrote:
> On Wed, Jul 13, 2016 at 05:17:24PM +0200, Adrian Reber wrote:
> > On Wed, Jul 13, 2016 at 08:27:42AM -0600, Tycho Andersen wrote:
> > > On Wed, Jul 13, 2016 at 12:49:07PM +0200, Adrian Reber wrote:
> > > > On Wed, Jul 13, 2016 at 01:41:34PM +0300, Cyrill Gorcunov wrote:
> > > > > On Wed, Jul 13, 2016 at 12:29:01PM +0200, Adrian Reber wrote:
> > > > > >
> > > > > > > If I try to migrate a process while an LXC container is running on
> > > > > > > the source system, the migration fails during restore on the
> > > > > > > destination system with:
> > > > > >
> > > > > > Error (cgroup.c:1193): cg: Failed writing 0-3 to cpuset//lxc/c7/cpuset.cpus: Numerical result out of range
> > > > > > Error (cgroup.c:1470): cg: Restoring special cpuset props failed!
> > > > > >
> > > > > > This happens with CRIU 2.3 and latest GIT.
> > > > > >
> > > > > > > If I run an LXC container on the destination system, I still get
> > > > > > > this error. If I stop the LXC container on the source system, the
> > > > > > > error disappears. This is again on a RHEL7 system with a 3.10.something
> > > > > > > kernel.
> > > > >
> > > > > > Looks like you're migrating to a machine with fewer CPUs?
> > > >
> > > > Yes, that's true. Haven't checked that before. I am using two virtual
> > > > machines and it seems like I have forgotten that I changed the specs.
> > > >
> > > > But as the migration works when LXC is stopped it would be nice to have
> > > > it working with LXC running. Migrating the container from one system to
> > > > another also works without errors. Only migrating a process unrelated to
> > > > the LXC container does not work.
> > >
> > > Sorry, I'm not sure I understand this paragraph. What does it mean to
> > > migrate when LXC is stopped?
> >
> > I meant that I cannot migrate a process while an LXC container is
> > running, because I get the cgroup error from above. When no LXC
> > container is running, the cgroup error does not happen. Is that clearer now?
>
> Hmm. So is the LXC container contained in the process's subtree? What
> cpuset cgroup is it in (cat /proc/pid/cgroup for the task you're
> trying to migrate)?
My test process is called 'minimal'. It malloc()s a page and reads from
that page in a loop, sleeping between reads. This is its cgroup
information:
# cat /proc/15950/cgroup
11:memory:/user.slice
10:hugetlb:/
9:devices:/user.slice
8:freezer:/
7:cpuacct,cpu:/user.slice
6:pids:/
5:cpuset:/
4:blkio:/user.slice
3:net_prio,net_cls:/
2:perf_event:/
1:name=systemd:/user.slice/user-0.slice/session-2.scope
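For reference, a listing like the one above can be reduced to a
controller-to-path mapping with a few lines of Python. This is only a
sketch: the helper name parse_proc_cgroup is made up for illustration and
is not part of CRIU or LXC, and the sample text is an excerpt of the
output above.

```python
def parse_proc_cgroup(text):
    """Map each controller name in /proc/<pid>/cgroup output to its path."""
    paths = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Each line is "hierarchy-ID:controller-list:path". Split at most
        # twice so that a colon inside the path does not break parsing.
        _hier_id, controllers, path = line.split(":", 2)
        # Co-mounted controllers (e.g. "cpuacct,cpu") share one path.
        for controller in controllers.split(","):
            paths[controller] = path
    return paths


# Excerpt of the listing for the 'minimal' process.
sample = """\
7:cpuacct,cpu:/user.slice
5:cpuset:/
1:name=systemd:/user.slice/user-0.slice/session-2.scope
"""

cgroups = parse_proc_cgroup(sample)
print(cgroups["cpuset"])  # prints "/"
```

For 'minimal' this gives "/" for cpuset, i.e. the process sits in the root
cpuset cgroup and not under lxc/c7.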
This is the process tree of my container, which is unrelated to the
process above:
19440 pts/0    S      0:00 [lxc monitor] /var/lib/lxc c7
19445 ?        Ss     0:00  \_ /sbin/init
19476 ?        Ss     0:00      \_ /sbin/dhclient -H c7 -1 -q -lf /var/lib/dhclient/dhclient--eth0.lease -pf /var/run/dhclient-eth0.pid eth0
19477 ?        S      0:10      \_ /usr/bin/postgres -D /var/lib/pgsql/data -p 5432
19502 ?        Ss     0:01      |   \_ postgres: stats collector process
19503 ?        Ss     0:00      |   \_ postgres: autovacuum launcher process
19504 ?        Ss     0:00      |   \_ postgres: wal writer process
19505 ?        Ss     0:00      |   \_ postgres: writer process
19506 ?        Ss     0:00      |   \_ postgres: checkpointer process
19507 ?        Ss     0:00      |   \_ postgres: logger process
19478 ?        Ss     0:00      \_ /usr/sbin/rsyslogd -n
19479 ?        Ss     0:00      \_ /usr/sbin/sshd -D
19480 ?        Ss     0:00      \_ /bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation
19481 ?        Ss     0:00      \_ /usr/lib/systemd/systemd-logind
19482 ?        Ssl    0:24      \_ /usr/lib/jvm/jre/bin/java -Djava.security.egd=file:/dev/./urandom -classpath /usr/share/tomcat/bin/bootstrap.jar:/usr/share/tomcat/bin/tomcat-juli.jar:/usr/share/java/commons-daemon.jar
19483 ?        Ss     0:00      \_ /usr/lib/systemd/systemd-journald
The process 'minimal' and the container 'c7' should be completely
unrelated.
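
The "Numerical result out of range" error quoted at the top would fit
Cyrill's guess that the list 0-3 names CPUs the destination machine does
not have. A rough Python sketch of such a check follows. It assumes the
destination's CPU count is known (os.cpu_count() could supply it), the
function names are hypothetical, and it does not claim to be what CRIU or
the kernel actually does.

```python
def parse_cpu_list(spec):
    """Expand a cpuset-style list such as "0-3" or "0,2-3" into a set of ints."""
    cpus = set()
    for part in spec.strip().split(","):
        if not part:
            continue
        if "-" in part:
            # An inclusive range "first-last".
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def fits_on_host(spec, cpu_count):
    """Return True if every CPU index in spec is below cpu_count."""
    return all(cpu < cpu_count for cpu in parse_cpu_list(spec))


print(fits_on_host("0-3", 4))  # prints True
print(fits_on_host("0-3", 2))  # prints False
```

With two CPUs on the destination, the value 0-3 saved for lxc/c7 on the
source would not fit, which would match the failed write to cpuset.cpus.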
Adrian