[CRIU] [PATCH] net: Do not toggle TCP_REPAIR while restoring TCP send queues

Pavel Emelyanov xemul at parallels.com
Mon Feb 9 01:15:49 PST 2015


On 02/06/2015 10:10 PM, Amey Deshpande wrote:
> For an established TCP connection, the send queue is restored in two
> steps: in step (1), we retransmit the data that was sent before but not
> yet acknowledged, and in step (2), we transmit the data that was never
> sent outside before.  The TCP_REPAIR option is disabled before step (2)
> and re-enabled after step (2) (without this patch).

Yes, as Andrey pointed out, this was done deliberately. Without the toggle
we would introduce delays in send queue processing.
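
For anyone following along, here is a minimal sketch of the toggling under
discussion. It is not the actual CRIU code: set_tcp_repair(), send_all(),
restore_send_queue() and REPAIR_SEND_QUEUE are names made up for the
illustration, and sk is assumed to already be in repair mode (which needs
CAP_NET_ADMIN).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Fallbacks for older libc headers; values taken from the Linux UAPI. */
#ifndef TCP_REPAIR
#define TCP_REPAIR 19
#endif
#ifndef TCP_REPAIR_QUEUE
#define TCP_REPAIR_QUEUE 20
#endif
/* Hypothetical local name for the kernel's TCP_SEND_QUEUE value. */
#define REPAIR_SEND_QUEUE 2

/* Switch repair mode on (1) or off (0). Returns 0 or -1. */
int set_tcp_repair(int sk, int on)
{
	return setsockopt(sk, IPPROTO_TCP, TCP_REPAIR, &on, sizeof(on));
}

/* Send len bytes, looping over short writes. Returns 0 or -1. */
int send_all(int sk, const char *buf, size_t len)
{
	while (len > 0) {
		ssize_t n = send(sk, buf, len, 0);

		if (n < 0)
			return -1;
		buf += n;
		len -= (size_t)n;
	}
	return 0;
}

/* buf holds sent_len already-sent bytes followed by unsent_len new bytes. */
int restore_send_queue(int sk, const char *buf,
		       size_t sent_len, size_t unsent_len)
{
	int q = REPAIR_SEND_QUEUE;

	if (setsockopt(sk, IPPROTO_TCP, TCP_REPAIR_QUEUE, &q, sizeof(q)) < 0)
		return -1;

	/* Step (1): still in repair mode, refill the sent-but-unacked part. */
	if (send_all(sk, buf, sent_len) < 0)
		return -1;

	/* Step (2): leave repair mode so the never-sent part goes out normally. */
	if (set_tcp_repair(sk, 0) < 0)
		return -1;
	if (send_all(sk, buf + sent_len, unsent_len) < 0)
		return -1;

	/* Back to repair mode; the window discussed here closes at this point. */
	return set_tcp_repair(sk, 1);
}
```

The patch would amount to dropping the two set_tcp_repair() calls around
step (2).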

> If the amount of data to be sent in step (2) is large, the TCP_REPAIR
> flag on the socket can remain off for some time (on the order of
> milliseconds).  If listen() is called on another socket bound to the
> same port during this window, it fails, because turning TCP_REPAIR off
> clears the SO_REUSEADDR flag on the socket.
> 
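
A small sketch of how the flag could be observed from userspace, e.g. before
and after turning TCP_REPAIR off. The helper names reuseaddr_is_set() and
set_reuseaddr() are made up for the illustration; only the standard
getsockopt()/setsockopt() calls are used.

```c
#include <assert.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns 1 if SO_REUSEADDR is set on sk, 0 if it is clear, -1 on error. */
int reuseaddr_is_set(int sk)
{
	int val = 0;
	socklen_t len = sizeof(val);

	if (getsockopt(sk, SOL_SOCKET, SO_REUSEADDR, &val, &len) < 0)
		return -1;
	return val != 0;
}

/* Sets (on = 1) or clears (on = 0) SO_REUSEADDR. Returns 0 or -1. */
int set_reuseaddr(int sk, int on)
{
	return setsockopt(sk, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on));
}
```

Calling reuseaddr_is_set() right after the repair-off step should show
whether the socket is in the state described above.
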
> There are several possible ways to prevent this problem from happening:
> - The simplest option is to *not* toggle TCP_REPAIR option while
>   restoring the TCP queues.
> - Another way would be to explicitly enable SO_REUSEADDR on the
>   socket after turning TCP_REPAIR off.  This still leaves a small time
>   window in which the race can occur.
> - A more involved solution would use a mutex per port number, so
>   that a listen() on a port number does not happen while SO_REUSEADDR for
>   another socket on the same port is off.

I vote for this last way, the per-port mutex. We already have the per-socket
inet_port object that is shared between processes, and a new lock can safely
be put there.
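
Roughly what I have in mind, as a sketch only. struct port_entry is a
hypothetical stand-in for inet_port, and the lock is a plain C11 atomic_flag
spinlock in anonymous shared memory rather than whatever lock primitive CRIU
would really use; all function names are made up for the illustration.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE	/* for MAP_ANONYMOUS on stricter compiler settings */
#endif
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical per-port object living in memory shared across processes. */
struct port_entry {
	unsigned int port;
	atomic_flag busy;	/* set while some process holds the port lock */
};

/* Allocate the entry in MAP_SHARED memory so forked children see it. */
struct port_entry *port_entry_alloc(unsigned int port)
{
	struct port_entry *pe;

	pe = mmap(NULL, sizeof(*pe), PROT_READ | PROT_WRITE,
		  MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	if (pe == MAP_FAILED)
		return NULL;
	pe->port = port;
	atomic_flag_clear(&pe->busy);
	return pe;
}

/* Returns 1 if the lock was taken, 0 if somebody else holds it. */
int port_trylock(struct port_entry *pe)
{
	return !atomic_flag_test_and_set(&pe->busy);
}

/* Spin, yielding the CPU, until the lock is taken. */
void port_lock(struct port_entry *pe)
{
	while (!port_trylock(pe))
		sched_yield();
}

void port_unlock(struct port_entry *pe)
{
	atomic_flag_clear(&pe->busy);
}
```

The restore path would hold port_lock() from turning TCP_REPAIR off until it
is back on, and the listen() path would take the same lock around listen(),
so the two cannot overlap on one port.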

> This patch removes the toggling of TCP_REPAIR option during restoring
> TCP send queues.
> 
> Signed-off-by: Amey Deshpande <ameyd at google.com>

Thanks,
Pavel



