[CRIU] Error (sk-inet.c:153): In-flight connection (l) for 4bad5

Pavel Emelyanov xemul at virtuozzo.com
Mon Jun 6 04:56:22 PDT 2016


On 06/01/2016 07:19 PM, Adrian Reber wrote:
> On Wed, Jun 01, 2016 at 11:56:02AM +0300, Pavel Emelyanov wrote:
>> On 05/31/2016 06:27 PM, Adrian Reber wrote:
>>> I have a lxc container with a tomcat server and postgresql database in
>>> it. The tomcat server is running a test application which connects to
>>> the database, reads and modifies a field.
>>>
>>> I can checkpoint and restart that container most of the time. If I am
>>> reloading the tomcat application in a while true; do loop with curl I
>>> sometimes get a dump failure like this:
>>>
>>> 27112 fdinfo 50: pos: 0x               0 flags:             4002/0
>>> 	Searching for socket 4bad5 (family 10.6)
>>> Error (sk-inet.c:153): In-flight connection (l) for 4bad5
>>> ----------------------------------------
>>> Error (cr-dump.c:1312): Dump files (pid: 27112) failed with -1
>>>
>>> I guess this is a TCP connection which is in a state that criu cannot
>>> handle. Is that correct?
>>
>> Yes. This happens when a client sends a SYN to the server, the server
>> responds with a SYN-ACK, but the almost-new connection hasn't yet been
>> accept()-ed by the server.
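
For illustration, the state described above can be approximated from userspace
with a listening socket whose peer has connected but which has not yet called
accept(). This is only a minimal sketch: the function name is made up for the
example, and whether criu would report exactly this socket as in-flight is an
assumption, not something verified here.

```python
import socket


def make_unaccepted_connection():
    """Return (server, client): client is connected, server has not accept()-ed.

    Hypothetical helper for illustration only; not part of criu.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    server.listen(1)

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.settimeout(5)
    # connect() returns once the kernel has answered the SYN; the server
    # application has not been involved yet.
    client.connect(server.getsockname())
    return server, client
```

Until the server calls accept(), the new connection exists only in the kernel's
queue for the listening socket and has no file descriptor in the server process.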
> 
> Thanks for the explanation. Is there any way to checkpoint a container
> with such connection?

Only unseizing/unfreezing it and waiting for the connection to get accepted.
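
Until something better exists, a caller could wrap the dump in a retry loop
along these lines. This is a rough sketch, not how criu or lxc-checkpoint work
today: run_dump is a hypothetical callable supplied by the caller that performs
one dump attempt and returns (exit_code, log_text), and the loop assumes that a
failed dump leaves the container running so the pending connection can be
accepted in the meantime.

```python
import time

# Substring taken from the error message quoted in this thread.
IN_FLIGHT_MARKER = "In-flight connection"


def dump_with_retry(run_dump, attempts=5, delay=0.5, sleep=time.sleep):
    """Retry a dump while it fails only because of in-flight connections.

    run_dump: hypothetical callable returning (exit_code, log_text).
    Returns the number of the attempt that succeeded.
    """
    for attempt in range(1, attempts + 1):
        code, log = run_dump()
        if code == 0:
            return attempt
        if IN_FLIGHT_MARKER not in log:
            # Some other failure: retrying is unlikely to help.
            raise RuntimeError("dump failed for another reason:\n" + log)
        if attempt < attempts:
            sleep(delay)  # give the server time to accept() the connection
    raise TimeoutError(
        "in-flight connections still present after %d attempts" % attempts
    )
```

Matching on the log text is fragile, but it lets the caller tell this transient
failure apart from a real problem.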

> Would it be possible to drop a TCP connection in
> that state? I would rather see my container migrated than an error
> during checkpointing. It would be 'unfortunate' for the client as the
> connection establishment fails and is silently dropped, but I guess
> the TCP stack will resend the packets after a timeout.

Yes, we can patch criu to just ignore this thing. The connection request will
end up with a reset flag sent to the peer, and the container being migrated
won't even notice it.

> The reason I am asking this is that if the checkpoint fails because of
> 'In-flight connection' the error I get as a user is the same as if it
> fails for another reason, which might be a real problem. If I retry the
> checkpointing a few moments later it might work.
> 
> In the case of lxc-checkpoint it would be nice if lxc-checkpoint knew
> that it just needs to retry a bit later. I guess it could also be
> acceptable to drop connections in this state and hope that the clients
> recover.
> 
> 		Adrian
> 
