[CRIU] Error (sk-inet.c:202): Name resolved on unconnected socket

Andrew Vagin avagin at virtuozzo.com
Thu Jul 21 14:16:39 PDT 2016


On Thu, Jul 21, 2016 at 04:19:49PM +0200, Adrian Reber wrote:
> On Fri, Jul 15, 2016 at 09:43:33PM -0700, Andrew Vagin wrote:
> > On Thu, Jul 14, 2016 at 02:08:23PM +0300, Pavel Emelyanov wrote:
> > > On 07/13/2016 07:10 PM, Adrian Reber wrote:
> > > > On Wed, Jul 13, 2016 at 05:48:16PM +0300, Pavel Emelyanov wrote:
> > > >> On 07/13/2016 04:59 PM, Adrian Reber wrote:
> > > >>> On Wed, Jul 13, 2016 at 03:54:58PM +0300, Pavel Emelyanov wrote:
> > > >>>> On 07/05/2016 04:33 PM, Adrian Reber wrote:
> > > >>>>> Now that CRIU can drop in-flight connections I get much better results
> > > >>>>> checkpointing and restarting my test container while running ab against
> > > >>>>> the tomcat server in the container: ab -n 1000000 -c 20
> > > >>>>>
> > > >>>>> I am still testing with LXC using lxc-checkpoint and once every 100 test
> > > >>>>> runs I get the following error:
> > > >>>>>
> > > >>>>> (00.232915) 6693 fdinfo 10: pos:                0 flags:          2000002/0x1
> > > >>>>> (00.232917) fdinfo: type: 0x5 flags: 02000002/01 pos:        0 fd: 10
> > > >>>>> (00.232923) 6693 fdinfo 11: pos:                0 flags:                2/0
> > > >>>>> (00.232926)     Searching for socket 52f7d (family 2.6)
> > > >>>>> (00.232928) Error (sk-inet.c:202): Name resolved on unconnected socket
> > > >>>>> (00.232929) ----------------------------------------
> > > >>>>> (00.232939) Error (cr-dump.c:1323): Dump files (pid: 6693) failed with -1
> > > >>>>> (00.232954) Waiting for 6693 to trap
> > > >>>>> (00.232961) Daemon 6693 exited trapping
> > > >>>>>
> > > >>>>> Can this also be solved in a way like the dropping of in-flight connections?
> > > >>>>
> > > >>>> This is a socket that turned into the connected state while we were locking
> > > >>>> the network, isn't it?
> > > >>>
> > > >>> I have no idea, that's why I am asking ;-) But it could be exactly this.
> > > >>>
> > > >>> This sounds like something that cannot just be dropped, because an
> > > >>> established connection would then be missing during restore. If I
> > > >>> understand it correctly the only option is to retry later, right?
> > > >>
> > > >> No, I'd say the way to go is to check whether the socket's state is
> > > >> established and dump it. It looks like we should first lock the network and
> > > >> only _after_ that collect the sockets via diag, not vice versa.
> > > > 
> > > > It sounds great if this is solvable, but I think I do not fully
> > > > understand it. If I look at the log the network seems to get locked
> > > > pretty early:
> > > > 
> > > > http://lisas.de/~adrian/dump.log-name-resolved-on-unconnected-socket
> > > 
> > > Yes, you're right. As Andrey points out, network_lock() is called before
> > > collect_namespaces(). Then we need to get more info about this socket.
> > > In particular, gen_uncon_sk() reads more details about the socket, and
> > > info.tcpi_state is the most interesting for us. What is it?
> > 
> > https://ci.openvz.org/job/CRIU/job/CRIU-x86_64-dedup/branch/criu-dev/570/console
> > Start test
> > ./socket-closed-tcp --pidfile=socket-closed-tcp.pid
> > --outfile=socket-closed-tcp.out
> > Run criu pre-dump
> > Run criu pre-dump
> > Run criu dump
> > =[log]=> dump/zdtm/static/socket-closed-tcp/24/3/dump.log
> > ------------------------ grep Error ------------------------
> > (00.053676) Error (sk-inet.c:202): Name resolved on unconnected socket
> > (00.053786) Error (cr-dump.c:1322): Dump files (pid: 24) failed with -1
> > (00.056076) Error (cr-dump.c:1633): Dumping FAILED.
> > ------------------------ ERROR OVER ------------------------
> > 
> > Is it the same error?
> 
> This looks like the same error. Do I still need to provide additional
> information, or can this now be reproduced some other way?

No, you don't. The cause of this error is a socket that was connected
somewhere and then closed.
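
For anyone who wants to look at such a socket from userspace, here is a
rough sketch. It is Python, not CRIU code, and it is Linux only. The helper
names (tcp_state, has_name, connect_then_close) are made up for this
example. It assumes that tcpi_state is the first byte of struct tcp_info
and that TCP_CLOSE is 7 and TCP_ESTABLISHED is 1, as in the Linux headers.
The idea is to connect a socket over loopback, let both sides send FIN, and
then check whether the socket is in the closed state while getsockname()
still reports a local port. Whether the name is still there may depend on
the kernel, so the sketch only prints what it finds.

```python
import socket
import time

# Linux TCP state numbers (assumed from <netinet/tcp.h>); only two are used.
TCP_ESTABLISHED = 1
TCP_CLOSE = 7


def tcp_state(sock):
    """Return tcpi_state, read as the first byte of struct tcp_info."""
    info = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_INFO, 1)
    return info[0]


def has_name(sock):
    """True if getsockname() reports a non-zero local port."""
    return sock.getsockname()[1] != 0


def connect_then_close():
    """Connect over loopback, then tear the connection down on both sides.

    The returned socket is still an open file descriptor, but its TCP
    connection is gone (hypothetical reproduction of "connected, then closed").
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    peer, _ = listener.accept()
    peer.close()                     # peer sends FIN
    listener.close()
    client.recv(1)                   # returns b"" once the FIN has arrived
    client.shutdown(socket.SHUT_WR)  # send our own FIN
    # Wait briefly for the final ACK; give up after one second.
    deadline = time.monotonic() + 1.0
    while tcp_state(client) != TCP_CLOSE and time.monotonic() < deadline:
        time.sleep(0.01)
    return client


if __name__ == "__main__":
    sk = connect_then_close()
    print("tcpi_state:", tcp_state(sk), "has name:", has_name(sk))
    sk.close()
```

A freshly created socket is also in the closed state but has no name, so
the pair (state, name) is what tells the two cases apart.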

> 
> 		Adrian

