[CRIU] Error dumping Android emulator: Name resolved on unconnected socket
Pavel Emelyanov
xemul at parallels.com
Thu Sep 24 11:49:55 PDT 2015
On 09/24/2015 08:48 PM, Julian Andres Klode wrote:
> On Thu, Sep 24, 2015 at 07:32:13PM +0300, Pavel Emelyanov wrote:
>> On 09/24/2015 07:11 PM, Julian Andres Klode wrote:
>>> On Thu, Sep 24, 2015 at 06:10:09PM +0200, Julian Andres Klode wrote:
>>>> On Thu, Sep 24, 2015 at 07:06:42PM +0300, Pavel Emelyanov wrote:
>>>>> On 09/24/2015 05:08 PM, Julian Andres Klode wrote:
>>>>>> Hi,
>>>>>>
>>>>>> while dumping an Android emulator, I always receive the following
>>>>>> errors:
>>>>>>
>>>>>> (00.014220) Error (sk-inet.c:188): Name resolved on unconnected socket
>>>>>
>>>>> Oh! Do you receive it every single time you try to dump? Or just quite often?
>>>>> And which version of criu do you use?
>>>>
>>>> Always. Which I suppose is great because every other instance I've read
>>>> of before was supposedly a race.
>>>>
>>>>>
>>>>>> I am attaching
>>>>>> * the log file (dump.log)
>>>>>> * the lsof output of the to be dumped process (lsof.log)
>>>>>> * the ss -f output (ss-f.log)
>>>>>
>>>>> No attachments in the e-mail :\
>>>>
>>>> Sorry, attached now.
>>>>
>>>
>>> Sorry again, now!
>>>
>>
>> OK, so here's the suspicious thing we have in lsof.log:
>>
>> emulator6 14064 jak 9u sock 0,8 0t0 237290 can't identify protocol
>>
>> It might not be a protocol we support. Can you decode the inetsk.img file with the
>> crit tool and find this socket there? Or send it here, and I'll decode it myself.
>>
>> (And yes, we should check for protocol earlier, I'll send a patch).
>
> I applied the patches that check for a protocol earlier, but this did not change
> anything. I started a new process; the socket is now fd 8.
>
> I attached an lsof, the crit show of inetsk.img, and the dump log of
> a criu that is patched with my workaround and your two patches, so you'll
> see what happens if you force it to dump the socket.
>
> It seems to be a pure TCP socket.

OK, so here's the socket dump log:

Dumping: ino 0x 7c601 family 2 type 1 port 0 state 7 src_addr 0.0.0.0

It is indeed a TCP socket, and it's in state 7, which is TCP_CLOSE. That's why we don't
see this socket when we collect them (it's unhashed in the kernel). CRIU also checks
for the state being TCP_CLOSE, and this check passes, so it's indeed TCP_CLOSE.
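
For illustration, the numeric fields in that log line can be decoded with a small
sketch like the one below. The helper decode_dump_line() is hypothetical and not part
of CRIU or crit; the numeric values are the ones from the Linux kernel headers, and
the regular expression assumes the "key value" layout seen in the line above.

```python
import re

# Numeric values as defined in the Linux kernel headers.
TCP_STATES = {
    1: "TCP_ESTABLISHED", 2: "TCP_SYN_SENT", 3: "TCP_SYN_RECV",
    4: "TCP_FIN_WAIT1", 5: "TCP_FIN_WAIT2", 6: "TCP_TIME_WAIT",
    7: "TCP_CLOSE", 8: "TCP_CLOSE_WAIT", 9: "TCP_LAST_ACK",
    10: "TCP_LISTEN", 11: "TCP_CLOSING",
}
FAMILIES = {2: "AF_INET", 10: "AF_INET6"}
SOCK_TYPES = {1: "SOCK_STREAM", 2: "SOCK_DGRAM"}
TABLES = {"family": FAMILIES, "type": SOCK_TYPES, "state": TCP_STATES}


def decode_dump_line(line):
    """Hypothetical helper: map the numeric fields of a 'Dumping:' line to names."""
    decoded = {}
    for key, table in TABLES.items():
        # Each field appears as "<key> <decimal number>" in the log line.
        match = re.search(r"\b" + key + r"\s+(\d+)", line)
        if match:
            value = int(match.group(1))
            decoded[key] = table.get(value, "unknown(%d)" % value)
    return decoded


if __name__ == "__main__":
    line = "Dumping: ino 0x 7c601 family 2 type 1 port 0 state 7 src_addr 0.0.0.0"
    print(decode_dump_line(line))
```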

Now the question is -- why does it have the peer name set on it? Hmm...

The kernel code says that SO_PEERNAME will return one even if the socket is in the
closed state, and this is what we see. Typically, if you close() a socket, it also
disappears from the fdtable, but that's not what happens in our case. Double hmm...

The only way to put a socket into the TCP_CLOSE state w/o removing it from the fdtable
is by calling shutdown() on it with the SEND_SHUTDOWN mask (SHUT_WR from user space).
And in this case it will only move into the closed state from the LISTEN, TIME_WAIT or
SYN_SENT state.

Presumably, this is what we see -- an application opens a socket, then either

a) puts it into the listen state, or
b) issues a connect, or
c) establishes a connection that gets closed by the peer, so the socket enters time-wait,

and then the app calls shutdown(). The socket gets marked as TCP_CLOSE and is removed
from the hashes, but remains in the fdtable.
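
A rough way to poke at case a) from user space is sketched below. It is Linux-only
and rests on several assumptions: it reads the state from the first byte of struct
tcp_info (tcpi_state) via the TCP_INFO socket option, the helper names are made up
for this example, and it uses SHUT_RDWR without checking which shutdown modes trigger
the transition for each starting state. It only reports the state seen after
shutdown(); it does not claim what that state will be on a given kernel.

```python
import socket

# State numbers from the Linux kernel's tcp_states.h.
TCP_CLOSE = 7
TCP_LISTEN = 10


def tcp_state(sock):
    """Return tcpi_state, the first byte of struct tcp_info (Linux only)."""
    info = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_INFO, 104)
    return info[0]


def fresh_socket_state():
    """State of a TCP socket that was never bound, connected or listening."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        return tcp_state(s)
    finally:
        s.close()


def listen_then_shutdown():
    """Listen, shut the socket down, and report (before, after, fd_still_open)."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.bind(("127.0.0.1", 0))  # any free port on loopback
        s.listen(1)
        before = tcp_state(s)
        s.shutdown(socket.SHUT_RDWR)
        after = tcp_state(s)
        # shutdown() does not release the descriptor; only close() does.
        fd_still_open = s.fileno() >= 0
        return before, after, fd_still_open
    finally:
        s.close()


if __name__ == "__main__":
    print("fresh socket state:", fresh_socket_state())
    print("listen/shutdown (before, after, fd open):", listen_then_shutdown())
```

Running it while watching the process with ss or lsof shows whether the descriptor
stays in the fdtable while the socket itself is no longer listed.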

So the fix you did for that is almost correct. We should support unhashed but present
sockets with peer names, mark them as shut down on send, and restore them as such.

But you've also mentioned that you saw some issue regarding "can't handle shutdown
socket". Can you elaborate on this? When did you see this, and can I see the logs for
this too? I'm almost sure these two problems are tightly connected :)

-- Pavel