[CRIU] criu fails to dump JVM processes due to a half-closed socketpair()

Florian Gross Florian.S.Gross at web.de
Tue Jul 2 06:19:59 EDT 2013


Hey there,

I'm trying to use criu to roll back Java applications to the state they are in right after their main() method has run.

Since I'm working with graphical applications, I went with the wiki recommendations and set up a process structure like this (I later plan to switch to Xvfb):

	vnc_server.sh(5197)
	    Xvnc(5198)
	    eclipse(5200)
	        java(5211)


Using criu dump to checkpoint a Java process fails when dumping files:

	root@freya:~# ./criu-0.6/criu dump --file-locks --tcp-established -D eclipse-dump -v4 -t 5197

	[…]

	(00.255592) ========================================
	(00.255630) Dumping task (pid: 5211)
	(00.255667) ========================================

	[…]

	(00.590492) Error (sk-unix.c:211): Dangling in-flight connection 19151
	(00.590529) ----------------------------------------
	(00.590564) Error (cr-dump.c:1468): Dump files (pid: 5211) failed with -1


We can use lsof to map that socket to an fd. strace and gdb then allow us to find the syscall and the code location responsible for setting up the socket. In this case, it is set up in Java_sun_nio_ch_FileDispatcher_init(), which looks like this [1]:

	static int preCloseFD = -1;     /* File descriptor to which we dup other fd's
	                                   before closing them for real */

	JNIEXPORT void JNICALL
	Java_sun_nio_ch_FileDispatcher_init(JNIEnv *env, jclass cl)
	{
	    int sp[2];
	    if (socketpair(PF_UNIX, SOCK_STREAM, 0, sp) < 0) {
	        JNU_ThrowIOExceptionWithLastError(env, "socketpair failed");
	        return;
	    }
	    preCloseFD = sp[0];
	    close(sp[1]);
	}


What happens here is that a new pair of unix sockets is created. One of the endpoints is then immediately closed. The other one is kept around in a static variable. See [2] for the rationale behind this.
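Going by the comment on preCloseFD, the half-closed socket is later dup'ed over other descriptors before they are closed for real. Here is a minimal sketch of that idea, assuming dup2() is used for the redirection. The helper names make_half_closed() and pre_close() are made up for illustration and are not the OpenJDK functions:

```c
#include <assert.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper (not OpenJDK code): returns one end of a
   socketpair whose peer has already been closed, or -1 on error. */
int make_half_closed(void)
{
    int sp[2];

    if (socketpair(PF_UNIX, SOCK_STREAM, 0, sp) < 0)
        return -1;
    close(sp[1]);           /* peer is gone; sp[0] now only ever reports EOF */
    return sp[0];
}

/* Hypothetical helper: point `fd` at the half-closed socket. The fd
   number stays allocated, but reads on it now see EOF right away. */
int pre_close(int pre_close_fd, int fd)
{
    assert(pre_close_fd >= 0 && fd >= 0);
    return dup2(pre_close_fd, fd) < 0 ? -1 : 0;
}
```

After pre_close(), a read() on the redirected fd returns 0 instead of blocking, and the fd number cannot be handed out to some unrelated file until the real close() happens.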

Unfortunately, criu fails to dump half-closed socket pairs. To confirm this, I wrote this minimal C program, which criu fails to dump with the same symptoms [3]:

	#include <sys/socket.h>
	#include <stdio.h>
	#include <unistd.h>

	int main(int argc, char** argv) {
	  int sp[2];

	  if (socketpair(PF_UNIX, SOCK_STREAM, 0, sp) < 0) {
	    perror("Failed to get socketpair");
	    return 1;
	  }

	  close(sp[1]);

	  for (;;) { sleep(1000); }
	}


So here's my question: Would it be possible to detect this special case and just create a new half-closed socketpair on restore? Or is there a better solution? This would help in supporting checkpointing of the Java VM, which would be very awesome. (I'm seeing the same behavior for both OpenJDK 6 and 7.)
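To make the idea concrete, here is a rough sketch of what such a restore step could look like. This is not criu code: restore_half_closed() is a made-up name, and the sketch assumes that the fd number is all that needs to be preserved (no queued data, no socket options):

```c
#include <assert.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical restore step (not criu code): recreate a half-closed
   PF_UNIX stream socket at the fd number it had at dump time.
   Returns target_fd on success, -1 on error. */
int restore_half_closed(int target_fd)
{
    int sp[2];

    assert(target_fd >= 0);
    if (socketpair(PF_UNIX, SOCK_STREAM, 0, sp) < 0)
        return -1;
    close(sp[1]);                       /* drop the peer, as the JVM does */

    if (sp[0] != target_fd) {
        if (dup2(sp[0], target_fd) < 0) {
            close(sp[0]);
            return -1;
        }
        close(sp[0]);                   /* keep only the copy at target_fd */
    }
    return target_fd;
}
```

For example, restore_half_closed(100) should leave fd 100 referring to a socket on which read() returns 0 immediately, which is all the JVM appears to need from preCloseFD.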

Thanks a lot,
Florian Gross


(For completeness' sake: This is using the criu-0.6 tools release from the web page together with a self-built kernel, based on 768f616, with all the appropriate config flags.)

[1] Code is available as part of the OpenJDK, or at http://svn.netlabs.org/repos/java/tags/rc/openjdk/jdk/src/solaris/native/sun/nio/ch/FileDispatcher.c
[2] http://mail.openjdk.java.net/pipermail/core-libs-dev/2008-January/000219.html
[3] Also available from https://gist.github.com/flgr/5909055/raw/



