[CRIU] [PATCH] page-server: Allow blocking on socket

Pavel Emelyanov xemul at virtuozzo.com
Mon Jan 9 02:59:31 PST 2017


On 01/09/2017 11:41 AM, Pavel Emelyanov wrote:
> On 01/02/2017 10:30 PM, Andrei Vagin wrote:
>> On Mon, Dec 19, 2016 at 01:13:51PM +0300, Pavel Emelyanov wrote:
>>> This splice tries to get pages from socket into local pipe to
>>> splice them into images later. The data on the socket may not
>>> be there by the time we get to this splice, so there's no reason
>>> to force non-blocking IO here.
>>
>> This SPLICE_F_NONBLOCK isn't about data on the socket. We don't set
>> SOCK_NONBLOCK, so this splice waits for data on the socket even with
>> SPLICE_F_NONBLOCK.
>>
>> ssize_t tcp_splice_read(struct socket *sock, loff_t *ppos,
>> ...
>>         timeo = sock_rcvtimeo(sk, sock->file->f_flags & O_NONBLOCK);
> 
> True, but when the socket is an AF_UNIX one, the issue is the other way around:
> 
>         if (sock->file->f_flags & O_NONBLOCK ||
>             flags & SPLICE_F_NONBLOCK)
>                 state.flags = MSG_DONTWAIT;
> 
> so getting data from an empty unix socket (it can be empty simply because no
> data other than the header has arrived yet) results in EAGAIN.

OK, with the patch below we can revert the original one :)


When splicing page server data from a UNIX socket we may get
an error (EAGAIN) from splice if no data is available on the
socket yet. This is because af_unix.c in the kernel checks the
SPLICE_F_NONBLOCK flag to decide whether or not to do a
blocking read.
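A minimal sketch of the AF_UNIX behaviour described above, assuming a Linux kernel whose AF_UNIX stream sockets support splice(). The socket is left blocking (no O_NONBLOCK, no SOCK_NONBLOCK) and only SPLICE_F_NONBLOCK is passed, yet splice() fails at once with EAGAIN because the socket is empty. The function name unix_splice_nonblock_errno is made up for this illustration and is not part of CRIU.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical demo: splice from an empty, *blocking* AF_UNIX stream
 * socket into a pipe with SPLICE_F_NONBLOCK set. Returns the errno seen
 * (expected: EAGAIN), 0 if splice() succeeded, or -1 if setup failed.
 */
int unix_splice_nonblock_errno(void)
{
	int sk[2], p[2];
	int err = -1;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sk) < 0)
		return -1;
	if (pipe(p) < 0)
		goto close_sk;

	/* Neither fd has O_NONBLOCK; only the splice flag asks for it. */
	if (splice(sk[0], NULL, p[1], NULL, 4096,
		   SPLICE_F_MOVE | SPLICE_F_NONBLOCK) < 0)
		err = errno;
	else
		err = 0;

	close(p[0]);
	close(p[1]);
close_sk:
	close(sk[0]);
	close(sk[1]);
	return err;
}
```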

This is not symmetrical with TCP sockets, which check only
the socket's O_NONBLOCK flag for the same decision.
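For contrast, a sketch of the TCP case, assuming loopback TCP is available. The receiving socket stays blocking and SPLICE_F_NONBLOCK is passed; a child writes the payload after roughly 100 ms, and splice() returns the payload length instead of EAGAIN, because tcp_splice_read() derives its wait timeout from O_NONBLOCK only. The helper name tcp_splice_nonblock_waits and the delay are illustrative choices, not taken from the patch.

```c
#define _GNU_SOURCE
#include <arpa/inet.h>
#include <assert.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical demo: splice from a blocking TCP socket with
 * SPLICE_F_NONBLOCK while the peer writes `msg` only after a delay.
 * A positive return means splice() waited for data rather than
 * failing with EAGAIN.
 */
ssize_t tcp_splice_nonblock_waits(const char *msg)
{
	struct sockaddr_in addr = { .sin_family = AF_INET };
	socklen_t alen = sizeof(addr);
	int lsk, csk, ssk, p[2];
	ssize_t ret = -1;
	pid_t pid;

	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	addr.sin_port = 0; /* let the kernel pick a free port */

	lsk = socket(AF_INET, SOCK_STREAM, 0);
	if (lsk < 0)
		return -1;
	if (bind(lsk, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
	    listen(lsk, 1) < 0 ||
	    getsockname(lsk, (struct sockaddr *)&addr, &alen) < 0)
		goto close_l;

	csk = socket(AF_INET, SOCK_STREAM, 0);
	if (csk < 0)
		goto close_l;
	if (connect(csk, (struct sockaddr *)&addr, sizeof(addr)) < 0)
		goto close_c;
	ssk = accept(lsk, NULL, NULL);
	if (ssk < 0)
		goto close_c;
	if (pipe(p) < 0)
		goto close_s;

	pid = fork();
	if (pid == 0) {
		/* Child: the data shows up only after splice() has started. */
		usleep(100 * 1000);
		ssize_t w = write(csk, msg, strlen(msg));
		_exit(w == (ssize_t)strlen(msg) ? 0 : 1);
	}
	if (pid > 0) {
		ret = splice(ssk, NULL, p[1], NULL, 4096,
			     SPLICE_F_MOVE | SPLICE_F_NONBLOCK);
		waitpid(pid, NULL, 0);
	}

	close(p[0]);
	close(p[1]);
close_s:
	close(ssk);
close_c:
	close(csk);
close_l:
	close(lsk);
	return ret;
}
```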

Dropping the SPLICE_F_NONBLOCK flag is not possible either,
as otherwise we'd block on the pipe when trying to put data
into it. Even if part of the data fits, the kernel would
block anyway until the full buffer is in. And nobody will
read() from the pipe to unblock us, since that happens one
step later in the same task.
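The pipe side can be checked the same way. In this sketch (the helper name full_pipe_splice_errno is made up), the pipe is filled to capacity first, some bytes are queued on a UNIX socket, and the socket-to-pipe splice with SPLICE_F_NONBLOCK returns EAGAIN instead of sleeping. Without the flag the same call would sleep until someone reads from the pipe, which, as described above, nobody would do.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical demo: fill a pipe, queue data on an AF_UNIX socket, then
 * splice socket -> pipe with SPLICE_F_NONBLOCK. Returns the errno seen
 * (expected: EAGAIN, no room in the pipe), 0 on success, -1 on setup error.
 */
int full_pipe_splice_errno(void)
{
	char buf[4096];
	int sk[2], p[2], err = -1;

	memset(buf, 'x', sizeof(buf));
	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sk) < 0)
		return -1;
	if (pipe(p) < 0)
		goto close_sk;

	/* Temporarily make the write end non-blocking to fill it up. */
	fcntl(p[1], F_SETFL, O_NONBLOCK);
	while (write(p[1], buf, sizeof(buf)) > 0)
		;
	if (errno != EAGAIN)
		goto close_p;
	fcntl(p[1], F_SETFL, 0); /* back to a blocking pipe */

	/* There IS data on the socket; only the pipe lacks space. */
	if (write(sk[1], buf, 16) != 16)
		goto close_p;

	if (splice(sk[0], NULL, p[1], NULL, 16,
		   SPLICE_F_MOVE | SPLICE_F_NONBLOCK) < 0)
		err = errno;
	else
		err = 0;

close_p:
	close(p[0]);
	close(p[1]);
close_sk:
	close(sk[0]);
	close(sk[1]);
	return err;
}
```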

So to untie this, we need to wait for the data explicitly
with poll().
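A sketch of the resulting pattern, reusing the sk_wait_data() helper exactly as the patch adds it to util.h. A child writes to a UNIX socket after a short delay; the parent first waits with poll(), then does the non-blocking splice into the pipe, and only afterwards drains the pipe, mirroring the "one step later" read. The function poll_then_splice and the 100 ms delay are illustrative assumptions, not CRIU code.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <unistd.h>

/* Same helper as the patch adds to criu/include/util.h. */
static inline void sk_wait_data(int sk)
{
	struct pollfd pfd = {sk, POLLIN, 0};
	poll(&pfd, 1, -1);
}

/*
 * Hypothetical demo: wait for data explicitly, then splice socket -> pipe
 * with SPLICE_F_NONBLOCK, then read the pipe into `out`.
 * Returns the number of bytes read back from the pipe, or -1.
 */
ssize_t poll_then_splice(const char *msg, char *out, size_t outlen)
{
	int sk[2], p[2];
	ssize_t chunk = -1;
	pid_t pid;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sk) < 0)
		return -1;
	if (pipe(p) < 0)
		goto close_sk;

	pid = fork();
	if (pid == 0) {
		usleep(100 * 1000); /* the socket stays empty for a while */
		_exit(write(sk[1], msg, strlen(msg)) == (ssize_t)strlen(msg) ? 0 : 1);
	}
	if (pid < 0)
		goto close_p;

	sk_wait_data(sk[0]); /* without this, splice() below may get EAGAIN */
	chunk = splice(sk[0], NULL, p[1], NULL, outlen,
		       SPLICE_F_MOVE | SPLICE_F_NONBLOCK);
	waitpid(pid, NULL, 0);
	if (chunk > 0)
		chunk = read(p[0], out, chunk); /* the later drain step */

close_p:
	close(p[0]);
	close(p[1]);
close_sk:
	close(sk[0]);
	close(sk[1]);
	return chunk;
}
```

For example, `poll_then_splice("page data", out, sizeof(out) - 1)` with `char out[64]` is expected to return 9 and leave "page data" in `out`.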

Signed-off-by: Pavel Emelyanov <xemul at virtuozzo.com>
---
 criu/include/util.h |  7 +++++++
 criu/page-xfer.c    | 13 ++++++++++++-
 2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/criu/include/util.h b/criu/include/util.h
index 1fa0742..22c9c4d 100644
--- a/criu/include/util.h
+++ b/criu/include/util.h
@@ -12,6 +12,7 @@
 #include <sys/statfs.h>
 #include <sys/sysmacros.h>
 #include <dirent.h>
+#include <poll.h>
 
 #include "int.h"
 #include "common/compiler.h"
@@ -263,6 +264,12 @@ int fd_has_data(int lfd);
 
 int make_yard(char *path);
 
+static inline void sk_wait_data(int sk)
+{
+	struct pollfd pfd = {sk, POLLIN, 0};
+	poll(&pfd, 1, -1);
+}
+
 void tcp_nodelay(int sk, bool on);
 void tcp_cork(int sk, bool on);
 
diff --git a/criu/page-xfer.c b/criu/page-xfer.c
index 39c6977..73173bd 100644
--- a/criu/page-xfer.c
+++ b/criu/page-xfer.c
@@ -596,7 +596,18 @@ static int page_server_add(int sk, struct page_server_iov *pi, u32 flags)
 		if (chunk > cxfer.pipe_size)
 			chunk = cxfer.pipe_size;
 
-		chunk = splice(sk, NULL, cxfer.p[1], NULL, chunk, SPLICE_F_MOVE);
+		/*
+		 * Splicing into a pipe may end up blocking if the pipe is "full",
+		 * so we need the SPLICE_F_NONBLOCK flag here. At the same time,
+		 * splicing from a UNIX socket with this flag fails with EAGAIN
+		 * if there's no data in it (TCP looks at the socket
+		 * O_NONBLOCK flag _only_ and waits for data), so before doing
+		 * the non-blocking splice we need to explicitly wait.
+		 */
+
+		sk_wait_data(sk);
+
+		chunk = splice(sk, NULL, cxfer.p[1], NULL, chunk, SPLICE_F_MOVE | SPLICE_F_NONBLOCK);
 		if (chunk < 0) {
 			pr_perror("Can't read from socket");
 			return -1;
-- 
2.5.0
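The sk_wait_data() helper above ignores poll()'s return value. If that ever matters, a variant could retry when a signal interrupts poll() and report real failures to the caller. This is only a sketch; the name sk_wait_data_checked is hypothetical and not part of CRIU.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical variant of sk_wait_data(): retries poll() on EINTR.
 * Returns 0 once the socket is readable (data, EOF or a pending error,
 * which the following splice() will report), -1 if waiting failed.
 */
static int sk_wait_data_checked(int sk)
{
	struct pollfd pfd = { .fd = sk, .events = POLLIN };
	int ret;

	do {
		ret = poll(&pfd, 1, -1);
	} while (ret < 0 && errno == EINTR);

	if (ret < 0)
		return -1;
	if (pfd.revents & POLLNVAL) {
		/* The fd is not open: nothing to wait on. */
		errno = EBADF;
		return -1;
	}
	return 0;
}
```

A caller in page_server_add() would then check the result, e.g. `if (sk_wait_data_checked(sk) < 0) { pr_perror("Can't wait for data"); return -1; }`, before the non-blocking splice.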
