[CRIU] [PATCH 18/18] SCM: Dump and restore SCM_RIGHTs

Pavel Emelyanov xemul at virtuozzo.com
Thu Jul 13 15:26:41 MSK 2017


>>> When you don't send the master fle to other tasks, these tasks may reach the stage,
>>
>> Come on, this is a pure race. Whether or not the peer task goes to sleep because
>> we've skipped the unix open this time is a question of hitting or missing the time
>> frame it takes to call unix_open by pointer and check that the return code
>> is 1. How many CPU ticks (not even sched ticks) is that?!
> 
> This can't be accounted for, because it depends on the configuration and the relationships
> between tasks. I'm just saying that when you make a resource available as soon as it's ready,
> you reduce the length of the dependency chains existing between tasks at any moment.

Yes.

> When it's not done this way, the dependency chains existing at any moment become longer.
> And this is not a question of ticks or of the relationship between two separate tasks.
> A single delay may lead to many sleeps of many tasks, which can be avoided by the
> rule of making a resource available right after it becomes ready.
> 
> My point is just this.

Got it. Now get my point -- shuffling the code further to make this happen will
cost us time and (!) readability. The current files.c and its satellites are complex
enough, and splitting unix_open into more stages won't make them simpler. The outcome
from this may be noticeable only in extremely rare cases (we've had no bugs about
inability to C/R due to SCMs in a queue). IOW -- the result isn't worth the pain.

>>> when the fle is the only thing they need at the current moment of restore. And
>>> they will just be waiting for you.
>>>
>>> Also, you will need to serve the master fle out as soon as possible when you
>>> implement scm with unix fds.
>>
>> When we implement scm with unix fds, just serving the master out ASAP won't help; we'll
>> need to reshuffle much more code to resolve circular and mutual scms.
>>
>>>>>>> Now all dependencies are as strong as they need to be, and no more. It seems
>>>>>>> like a bad idea to move away from this path.
>>>>>>>  
>>>>>>>>>>> We may open the socket and serve it out, and we do
>>>>>>>>>>> not need to wait for queuers to do these actions.
>>>>>>>>>>
>>>>>>>>>> Not always. There are several cases when we do socketpair(), then restore
>>>>>>>>>> a queue, then close one of the ends immediately.
>>>>>>>>>
>>>>>>>>> Even if the second end is closed, other tasks may want this
>>>>>>>>> slave fle, and their eventpolls may depend on it.
>>>>>>>>> I think we should wait only at the moments when we must wait, and not
>>>>>>>>> do that in other cases.
>>>>>>>>
>>>>>>>> This is valid, but "must" is to be replaced with "should" :)
>>>>>>>> Is anything broken with the current approach? Or is it just suboptimal?
>>>>>>>
>>>>>>> Yes, this is suboptimal, and it's really easy to avoid that, I think.
>>>>>>
>>>>>> Actually it's not. As I said earlier, there are cases when you cannot restore
>>>>>> a socket without a queue and restore the queue later, so for these cases you
>>>>>> need to skip the whole restoration. For the cases when you can restore a socket
>>>>>> and postpone the queue restore, you may do the first step and then return 1,
>>>>>> requesting yet another ->open call. With this alone the unix ->open logic
>>>>>> becomes noticeably more complex.
>>>>>
>>>>> Could you give an example of a case when a socket needs to wait till other sockets
>>>>> are restored before it can call the socket() syscall?
>>>>
>>>> With SCMs -- easily. See open_unix_sk_standalone(), the branch (ui->ue->state == TCP_ESTABLISHED) && !ui->ue->peer
>>>> or ui->ue->type == SOCK_DGRAM && !ui->queuer. Both call socketpair(), restore the queue at once, then close
>>>> one of the ends. If I have an SCM_RIGHTS message in a queue, I need to wait for e.g. an inet socket to be opened
>>>> before I can call socketpair().
>>>
>>> Then only this type of socket should wait till scm dependencies are restored
>>> before it creates a socketpair.
>>
>> Yes, but what's the point? This won't help us resolve the generic unix-sent-via-unix case anyway.
>> The change you're asking for doubles the checks for whether to call socket() or
>> socketpair(), for the sake of spectral parallelism. I can agree with this change, but not as
>> part of this set; it's large enough already.
> 
> The solution for the generic unix-sockets-sent-via-unix case will continue this way, I assume.
> There will be a sequence:
> 1) open
> 2) serve out

This doesn't work for the cases I've mentioned above. You need to do step 4 before this point
in the current files.c scheme, or fix it to allow keeping socketpair peers around temporarily.
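
To make the constraint concrete, here's a minimal sketch of the pattern
(my illustration only, not the actual open_unix_sk_standalone() code;
the helper name is made up). The queue must be complete before the
spare end is closed, so any fd carried by an SCM_RIGHTS message must
already be open at this moment:

#include <sys/socket.h>
#include <unistd.h>

/* Refill the queue of a socket whose peer is gone. */
static int restore_dead_peer_queue(const char *data, size_t len, int *sk)
{
	int sp[2];

	if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sp) < 0)
		return -1;

	/* The queue is restored through the spare end... */
	if (send(sp[1], data, len, 0) != (ssize_t)len) {
		close(sp[0]);
		close(sp[1]);
		return -1;
	}

	/* ...which is closed immediately; after this nothing
	 * more can be queued into sp[0]. */
	close(sp[1]);

	*sk = sp[0];
	return 0;
}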

> 3) wait till all scms are ready (if a scm is a master of our task, wait till it's FLE_OPEN; otherwise till it's received, i.e. FLE_FINISHED).
>    Also wait till the owners of msg_names are bound (unix_sk_info::bound); these sockets should also be received as fake fles if we need them.
> 4) restore your own queue (using the ready scms or sending via the msg_names owners)

Not your own queue, but your peer's queue. If we could sendmsg() data into our own queue,
things would become much simpler.
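
For the record, "restoring your peer's queue" boils down to a plain
sendmsg() with an SCM_RIGHTS cmsg, roughly like the sketch below (the
helper name is made up for illustration). The fd being attached must
already be open in the sending task, which is exactly the dependency
in question:

#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Put one data chunk plus one fd into the *peer's* receive queue. */
int queue_scm_msg(int sk, const void *data, size_t len, int fd)
{
	char cbuf[CMSG_SPACE(sizeof(int))];
	struct iovec iov = { .iov_base = (void *)data, .iov_len = len };
	struct msghdr mh = {
		.msg_iov	= &iov,
		.msg_iovlen	= 1,
		.msg_control	= cbuf,
		.msg_controllen	= sizeof(cbuf),
	};
	struct cmsghdr *ch;

	memset(cbuf, 0, sizeof(cbuf));
	ch = CMSG_FIRSTHDR(&mh);
	ch->cmsg_level = SOL_SOCKET;
	ch->cmsg_type = SCM_RIGHTS;
	ch->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(ch), &fd, sizeof(int));

	return sendmsg(sk, &mh, 0) < 0 ? -1 : 0;
}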

> There is no circular dependency.

You can send socket A via socket B, then socket B via socket A. That's the circular
dependency I'm talking about. It can be untied, but not with the existing code; it
needs to account for temporary socketpairs.
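
In miniature (again just a sketch, reusing the made-up helper from
above):

#include <stddef.h>
#include <sys/socket.h>

int queue_scm_msg(int sk, const void *data, size_t len, int fd);

/* Each pair ends up travelling inside the other pair's queue. */
int make_circular_scm(int a[2], int b[2])
{
	if (socketpair(AF_UNIX, SOCK_DGRAM, 0, a) < 0)
		return -1;
	if (socketpair(AF_UNIX, SOCK_DGRAM, 0, b) < 0)
		return -1;

	/* Socket A goes into B's queue... */
	if (queue_scm_msg(b[1], "b", 1, a[0]) < 0)
		return -1;
	/* ...and socket B goes into A's queue. */
	return queue_scm_msg(a[1], "a", 1, b[0]);
}

Recreating this state means each socket's queue contents depend on the
other socket being restored first, which is why untying it needs
temporary socketpairs rather than simple ordering.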

> Unix socket queues without such particulars will be restored by their peers.
> But if a socket has a scm or msg_names, its peer field will be forcibly zeroed and
> it will restore its queue by itself.
>  
>>> And the above holds only until unix fds in scm are implemented. When
>>> they are, this type of socket will also fit the generic model, but we will need
>>> a more complicated solution for the second, closed end. I think we will need to
>>> open a socketpair, serve the first end out, and then pin the second end open as a fake
>>> file.
>>>
>>>>>>> It would be difficult to rework that later if we find the limitations are
>>>>>>> so strong that they don't allow restoring some types of files.
>>>>>>
>>>>>> While I do agree with this in general, I see no point in splitting the unix sk
>>>>>> restore into open, postpone, restore the queue (optionally), postpone again, do
>>>>>> post_open. This doesn't seem to solve any real problem, makes something (what?)
>>>>>> better in a really rare case, and complicates the ->open logic for unix sockets.
>>>>>
>>>>> There is no need to introduce multiple cases. There is a generic rule:
>>>>> queues with scms or msg_names are restored by the socket itself, and the socket
>>>>> uses the task's (maybe ghost) fles to do that.
>>>>
>>>> No, queues are not restored by the socket itself; they are restored by the socket's
>>>> queuer, that is, some other socket.

-- Pavel


