[CRIU] [RFC] run each test case also in --check-only mode
Pavel Emelyanov
xemul at virtuozzo.com
Mon Mar 20 06:21:57 PDT 2017
On 03/20/2017 04:19 PM, Adrian Reber wrote:
> On Mon, Mar 20, 2017 at 03:42:49PM +0300, Pavel Emelyanov wrote:
>> On 03/20/2017 03:33 PM, Adrian Reber wrote:
>>> On Mon, Mar 20, 2017 at 03:22:27PM +0300, Pavel Emelyanov wrote:
>>>> On 03/20/2017 02:39 PM, Adrian Reber wrote:
>>>>> The following patch tries to add the --check-only option to each test case.
>>>>>
>>>>> For each test case the following steps are now run:
>>>>>
>>>>> * dump in --check-only mode
>>>>> * real dump
>>>>> * restore in --check-only mode
>>>>> * real restore
>>>>>
>>>>> For most test cases this works successfully. Unfortunately not for network
>>>>> test cases. There are multiple problems:
>>>>>
>>>>> ./zdtm.py run -f h -t zdtm/static/socket-tcp --check-only
>>>>>
>>>>> === Run 1/1 ================ zdtm/static/socket-tcp
>>>>>
>>>>> ======================= Run zdtm/static/socket-tcp in h ========================
>>>>> Start test
>>>>> ./socket-tcp --pidfile=socket-tcp.pid --outfile=socket-tcp.out
>>>>> Run criu dump in check-only mode
>>>>> Only checking if requested operation will succeed
>>>>> Run criu dump
>>>>> Run criu restore in check-only mode
>>>>> Only checking if requested operation will succeed
>>>>> Checking mode enabled
>>>>> Run criu restore
>>>>> =[log]=> dump/zdtm/static/socket-tcp/31/1/restore.log
>>>>> ------------------------ grep Error ------------------------
>>>>> (00.008036) Error (criu/util.c:707): exited, status=1
>>>>> (00.008050) Error (criu/netfilter.c:91): Iptables configuration failed
>>>>> (00.009993) Error (criu/util.c:707): exited, status=1
>>>>> (00.010007) Error (criu/netfilter.c:91): Iptables configuration failed
>>>>> ------------------------ ERROR OVER ------------------------
>>>>> Send the 15 signal to 31
>>>>> Wait for zdtm/static/socket-tcp(31) to die for 0.100000
>>>>> ############### Test zdtm/static/socket-tcp FAIL at result check ###############
>>>>> Test output: ================================
>>>>> 11:30:41.072: 31: ERR: socket-tcp.c:190: can't write (errno = 104 (Connection reset by peer))
>>>>>
>>>>> <<< ================================
>>>>> ##################################### FAIL #####################################
>>>>>
>>>>> The first problem is that the network unlocking fails for the real restore.
>>>>> The '--check-only' restore already unlocked the network, which is wrong, but
>>>>> I am not sure what the right solution is. Should I just ignore network
>>>>> unlocking in check-only mode?
>>>>
>>>> I would say yes. Since you haven't done a real dump, there's no reason why
>>>> you'd expect the network to be locked.
>>>>
>>>>> The second problem seems to be that when CRIU restores the process in
>>>>> real restore mode the sockets cannot be restored again and I am not sure
>>>>> why.
>>>>
>>>> Would you show the restore.log file for this case?
>>>
>>> https://lisas.de/~adrian/restore.log
>>
>> But that's the "first problem" :) Inability to turn off the netfilter rule
>> used to lock the connection.
>
> That is the log of the real restore which has both problems. The
> unlocking does not work anymore as it already has been unlocked by the
> check-only restore. The check-only restore works without any errors.
> Only the real restore after the check-only fails.
>
> I am guessing that the message from the test case:
>
> 11:30:41.072: 31: ERR: socket-tcp.c:190: can't write (errno = 104 (Connection reset by peer))
Ah, I see. That's because in --check-only restore you've restored the
socket, then unlocked the connection. The peer noticed this and shifted its
sequence numbers. Then you do the 2nd restore, which cannot succeed because the
peer's state has changed.
What you should do on --check-only restore is either skip the restoration
of TCP sockets (which is not nice) or restore the socket, but kill it
right before unlocking the connection, so that the peer doesn't see a single
packet coming from the restored socket.
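
A rough model of that ordering, as a sketch only: the class and function
names below are hypothetical and are not CRIU code. It also assumes that the
check-only pass leaves the lock in place (the answer to the "first problem"
above), so that the real restore can still remove it.

```python
# Illustrative model only; every name here is hypothetical, not CRIU code.

class FakeConnection:
    """Stands in for one TCP connection guarded by a netfilter lock rule."""

    def __init__(self):
        self.locked = True             # lock rule installed at dump time
        self.socket_open = False
        self.packets_seen_by_peer = 0

    def restore_socket(self):
        self.socket_open = True

    def kill_socket(self):
        # Dropped while the connection is still locked, so the peer sees nothing.
        self.socket_open = False

    def unlock(self):
        if not self.locked:
            raise RuntimeError("lock rule already removed")
        self.locked = False
        if self.socket_open:
            self.packets_seen_by_peer += 1   # the peer now hears from the socket


def restore(conn, check_only):
    conn.restore_socket()
    if check_only:
        conn.kill_socket()   # discard the socket before the peer can notice it
        return               # assumption: keep the lock for the real restore
    conn.unlock()


conn = FakeConnection()
restore(conn, check_only=True)    # dry run: socket gone, lock untouched
restore(conn, check_only=False)   # real restore still finds the lock in place
```

In this model, unlocking during the check-only pass would make the second
call raise, which mirrors the failed unlock in the restore.log above.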
-- Pavel
> is related to the sockets being restored twice somehow. Or something.
> I do not fully understand why this message happens.
>
>>>>> So I am asking for some help on how to correctly handle the network in
>>>>> --check-only mode:
>>>>> * What should be done with the unlocking? Just skip it? Can it be simulated?
>>>>> * What should be done with the sockets? How much can be simulated during
>>>>> socket restore? Should it just be skipped completely?
>
> Adrian