[CRIU] stray unix socket not dumped by the diag module?

Tycho Andersen tycho.andersen at canonical.com
Thu Jul 14 14:00:30 PDT 2016


On Thu, Jul 14, 2016 at 08:31:50AM -0600, Tycho Andersen wrote:
> On Thu, Jul 14, 2016 at 02:00:32PM +0300, Pavel Emelyanov wrote:
> > On 07/13/2016 05:53 PM, Tycho Andersen wrote:
> > > On Wed, Jul 13, 2016 at 04:53:53PM +0300, Pavel Emelyanov wrote:
> > >> On 07/13/2016 01:42 AM, Tycho Andersen wrote:
> > >>> Hi guys,
> > >>>
> > >>> I'm getting a,
> > >>>
> > >>> (00.155898) Error (sk-unix.c:294): sk unix: Unix socket 0x5674 not found
> > >>>
> > >>> very rarely when dumping containers. This looks to be some kind of
> > >>> unix socket that systemd has open, full log here:
> > >>>
> > >>> http://paste.ubuntu.com/19219914/
> > >>>
> > >>> but somehow, this socket didn't get dumped in collect_sockets(). I
> > >>> looked through the kernel code, but it's not obvious to me why a
> > >>> socket would be missing. Any ideas where to start digging?
> > >>
> > >> I guess it's the same issue as for TCP sockets -- only bind()-ed or auto-bound
> > >> sockets get into the list reported by the kernel.
> > > 
> > > I see, because they're not in the unix_socket_table I suppose. It
> > > looks like maybe we can get info on these sockets from the diag module
> > > by doing a specific request for them based on their inode number; is
> > > that the right approach?
> > 
> > Nope, the per-socket diag works on the same hash table :( We've met this
> > with TCP sockets already, see the gen_uncon_sk() function in criu/sk-inet.c,
> > I guess we need smth similar for unix sockets.
> 
> Thanks, I missed that. I'll have a look at gen_uncon_sk and see what
> we can do for unix sockets.

So I just started playing around with this (viz. the attached test),
and it passes. I think that's because all unix sockets are actually
added to unix_socket_table in unix_create1():

        unix_insert_socket(unix_sockets_unbound(sk), sk);

That makes me think they should be in the table for their whole
lifetime, until something removes them. As near as I can tell,
unix_release_sock() is the only thing that removes them, right at the
beginning of the function.
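
As a rough illustration of the point above, here is a minimal sketch
(not the attached test) that creates a unix socket which is never bound
or connected and then looks for its inode in /proc/net/unix. It assumes
that /proc/net/unix walks the same unix_socket_table as the diag module,
and the helper names unix_inode_listed() and unbound_socket_listed() are
made up for this example.

```c
#include <stdio.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Return 1 if a unix socket with the given inode appears in
 * /proc/net/unix, 0 if it does not, -1 if the file cannot be read.
 */
static int unix_inode_listed(unsigned long ino)
{
	char line[512];
	int found = 0;
	FILE *f = fopen("/proc/net/unix", "r");

	if (!f)
		return -1;

	/* Skip the header line. */
	if (!fgets(line, sizeof(line), f)) {
		fclose(f);
		return -1;
	}

	while (fgets(line, sizeof(line), f)) {
		unsigned long cur;

		/* Columns: Num RefCount Protocol Flags Type St Inode [Path] */
		if (sscanf(line, "%*s %*s %*s %*s %*s %*s %lu", &cur) == 1 &&
		    cur == ino) {
			found = 1;
			break;
		}
	}

	fclose(f);
	return found;
}

/*
 * Create a socket that is never bound or connected and report whether
 * the kernel lists it. Returns 1 or 0 as above, -1 on error.
 */
static int unbound_socket_listed(void)
{
	struct stat st;
	int ret;
	int fd = socket(AF_UNIX, SOCK_STREAM, 0);

	if (fd < 0)
		return -1;

	/* The inode number is what CRIU uses to identify the socket. */
	if (fstat(fd, &st) < 0) {
		close(fd);
		return -1;
	}

	ret = unix_inode_listed((unsigned long)st.st_ino);
	close(fd);
	return ret;
}
```

If unbound sockets really are hashed at creation, I would expect
unbound_socket_listed() to return 1; a return of 0 would mean a live
socket is missing from the table.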

Is it possible that this is a kernel bug? Perhaps the socket is in the
middle of being released, and the task hits the freezer inside
unix_release_sock() after the socket has already been removed from the
table at the beginning of the function. I don't see any obvious place
where it would freeze, but this is the only way I can explain it.

Tycho
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-tests-add-a-test-for-non-auto-bound-unix-sockets.patch
Type: text/x-diff
Size: 2301 bytes
Desc: not available
URL: <http://lists.openvz.org/pipermail/criu/attachments/20160714/908a9d3d/attachment.bin>

