[CRIU] crtools from git tree - Error (sk-inet.c:443): Can't bind inet socket: Address already in use

Pavel Emelyanov xemul at parallels.com
Wed Aug 1 22:39:57 EDT 2012


> => I applied the following changes to crtools-HEAD-368d7ac/sk-inet.c
>    and it seems to work.  Does this make sense to you??
> 
> 
> ----------------------------------------------------------------------
> # diff -Naurp sk-inet.c_orig sk-inet.c
> --- sk-inet.c_orig      2012-08-01 13:39:22.000000000 -0600
> +++ sk-inet.c   2012-08-01 19:54:08.000000000 -0600
> @@ -407,7 +407,8 @@ int inet_bind(int sk, struct inet_sk_inf
>                 struct sockaddr_in6     v6;
>         } addr;
>         int addr_size = 0;
> -
> +       int result;
> +       int optlen;
> 
>         memzero(&addr, sizeof(addr));
>         if (ii->ie->family == AF_INET) {
> @@ -427,7 +428,16 @@ int inet_bind(int sk, struct inet_sk_inf
>         } else
>                 BUG_ON(1);
> 
> +       optlen = 1;
> +       result = setsockopt(sk, SOL_SOCKET, SO_REUSEADDR, &optlen, sizeof(optlen));
> +       if (result < 0) {
> +               perror("sk-inet");
> +               return 0;
> +       }
> +       pr_info("SO_REUSEADDR issued on sockfd: %d\n", sk);
> +
>         if (bind(sk, (struct sockaddr *)&addr, addr_size) == -1) {
> +               pr_info("bind on sockfd: %d\n", sk);
>                 pr_perror("Can't bind inet socket");
>                 return -1;
>         }

No, this is not correct. The original socket was created without this option,
so the restored one should be too.

I think we're facing a race here: there are two sockets on the same port, the
listener and the established connection. As seen from the logs, the connected
socket gets restored earlier than the listening one:

  26829: Restoring TCP connection
  26825:   Restore: family 2 type 1 proto 6 port 12345 state 10 src_addr
  26825: Error (sk-inet.c:443): Can't bind inet socket: Address already in use

Thus the listening one fails on bind(). I think the proper fix would be to set
SO_REUSEADDR before bind() and then drop it afterwards.
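The conflict can be reproduced outside CRIU with plain sockets. Below is a minimal sketch (Linux, loopback only, not CRIU code; the function name `rebind_listener` is hypothetical). An accepted connection keeps using the listener's port after the listener is closed, so binding a new listener to that port fails with EADDRINUSE, just like the restore above. Assuming Linux's usual bind-conflict rules, SO_REUSEADDR only helps when it is set on both sockets, the new listener and the existing connected one, and the existing socket must still have it when the new listener calls bind() and listen(). That is why the sketch turns the option off only after both sockets are in place.

```c
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical demo (not CRIU code): mimic "listener restored after its
 * established connection" with ordinary sockets on loopback.
 * Returns 0 if the second listener could bind() and listen() on the port
 * still held by the accepted connection, otherwise the errno value.
 */
static int rebind_listener(int use_reuseaddr)
{
	int on = 1, off = 0, ret = 0;
	int lsk = -1, csk = -1, est = -1, nsk = -1;
	struct sockaddr_in addr = {
		.sin_family = AF_INET,
		.sin_port = htons(0),	/* let the kernel pick a free port */
		.sin_addr.s_addr = htonl(INADDR_LOOPBACK),
	};
	socklen_t alen = sizeof(addr);

	/* 1. Original listener; accepted sockets inherit its SO_REUSEADDR. */
	lsk = socket(AF_INET, SOCK_STREAM, 0);
	if (lsk < 0 ||
	    (use_reuseaddr &&
	     setsockopt(lsk, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) < 0) ||
	    bind(lsk, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
	    getsockname(lsk, (struct sockaddr *)&addr, &alen) < 0 ||
	    listen(lsk, 1) < 0)
		goto err;

	/* 2. Establish a connection; est now shares the listener's port. */
	csk = socket(AF_INET, SOCK_STREAM, 0);
	if (csk < 0 || connect(csk, (struct sockaddr *)&addr, sizeof(addr)) < 0)
		goto err;
	est = accept(lsk, NULL, NULL);
	if (est < 0)
		goto err;

	/* 3. Drop the listener: only the connection holds the port now. */
	close(lsk);
	lsk = -1;

	/* 4. "Restore" a listener on the same port. */
	nsk = socket(AF_INET, SOCK_STREAM, 0);
	if (nsk < 0 ||
	    (use_reuseaddr &&
	     setsockopt(nsk, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) < 0) ||
	    bind(nsk, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
	    listen(nsk, 1) < 0)
		goto err;

	/* 5. Both sockets are in place, so the option can be turned off again. */
	if (use_reuseaddr &&
	    (setsockopt(est, SOL_SOCKET, SO_REUSEADDR, &off, sizeof(off)) < 0 ||
	     setsockopt(nsk, SOL_SOCKET, SO_REUSEADDR, &off, sizeof(off)) < 0))
		goto err;
	goto out;	/* success: ret is still 0 */
err:
	ret = errno ? errno : -1;
out:
	if (nsk >= 0)
		close(nsk);
	if (est >= 0)
		close(est);
	if (csk >= 0)
		close(csk);
	if (lsk >= 0)
		close(lsk);
	return ret;
}
```

On Linux, `rebind_listener(0)` should return EADDRINUSE, while `rebind_listener(1)` should return 0.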

> After applying the above patch...and waiting for the persist-timer on
> the previous socket (127.0.0.1:12345) to expire, I then re-ran the test
> to get:

This is strange: why do you have to wait for the timer to expire? A repaired
socket, when closed, just destroys itself without going through any
post-connection states.

> # cat srv.log
> Binding to port 12345
> Waiting for connections
> New connection
> Done
> 
> # cat cln.log
> Connecting to 127.0.0.1:12345
> New connection
> Read 79 bytes, sending to sock
> Checking for 79 bytes
> Read 79 bytes, sending to sock
> Checking for 79 bytes
> Done
> 
> 
> => tcp/dump/restore.log
> ...
> ...
> Unlocked 127.0.0.1:12345 - 127.0.0.1:44711 connection
> Go on!!!
> 
> 
> => Does this look as if its (test/tcp/run.sh) working?

Yes, this means that the test passed OK.

> 
> Thanking you in advance.
> - Dilip Daya.
> 



More information about the CRIU mailing list