[CRIU] The progress of Time namespace
Eric W. Biederman
ebiederm at xmission.com
Sat Jun 2 06:12:03 MSK 2018
Andrei Vagin <avagin at virtuozzo.com> writes:
> On Fri, Jun 01, 2018 at 01:20:33PM -0500, Eric W. Biederman wrote:
>> Adrian Reber <adrian at lisas.de> writes:
>>
>> > On Fri, Jun 01, 2018 at 11:04:26AM +0800, yukon wrote:
>> >> I found that the CRIU community intends to resolve the timer issue[1]. I
>> >> wonder if there is an issue to
>> >> track the progress?
>> >
>> > I have heard of other people experimenting with it and I also had a few
>> > patches to try it out. The point where I stopped is when I found out
>> > that most time calls are actually coming from the VDSO and not from the
>> > kernel and it is still unclear to me how to handle namespaces and VDSO
>> > correctly.
>> >
>> > I have also talked with Christian (on CC) about it and I also contacted
>> > Eric at some point (also on CC). Maybe they have more information about
>> > the current status.
>>
>> Andriae. My apologies for not getting back to you earlier (I was
>> swamped) but that is not a good excuse. I was very impressed by what
>> you did.
>>
>> For me personally I have been looking for a real world case where the
>> timers matter. Having that would increase the priority of this work
>> from where I stand.
>>
>> To date all I have done is recognize that a time namespace is almost
>> certainly something that we need, and read the code enough to have a
>> general sense of how the time infrastructure in the kernel works.
>>
>> I think the VDSO has per-CPU if not per-process constants, so we should
>> be able to affect this in a namespace. If the VDSO does not, we
>> certainly can make that happen.
>>
>> I would be very happy to merge a time namespace. I would probably even
>> start looking at implementation details if I had a compelling test case
>> in my hand.
>>
>> Yukon. I don't have the beginning of this thread. So if you know of a
>> practical case that does not work because of timers I would love to hear
>> about it.
>
> Hi Eric,
>
> We have a practical case. A few CRIU users have reported situations to us
> where applications stop working after being migrated to another host.
>
> Usually this means that they use clock_gettime or timer_settime. The
> problem here is that we can't adjust clocks on a destination host to
> their values on a source host. For example, the application uses
> CLOCK_MONOTONIC to measure time slices, but after migrating to another
> host, clock_gettime(CLOCK_MONOTONIC) may return a value which is smaller
> than the one obtained on the source host. The application doesn't expect
> such behaviour for CLOCK_MONOTONIC, and it will probably work
> incorrectly (hang, crash, etc.).
>
> Here is one quote from the CRIU mailing list:
>
> Is there a timeline on when the time namespace might be implemented? Or
> else is there anyone, even outside CRIU, working on it that you guys
> know of? It seems like this might be one of the last major obstacles
> keeping migration from being used in production systems, given that not
> all containers and connections can be migrated as long as a time
> dependency is capable of messing it up.
> https://github.com/checkpoint-restore/criu/issues/451#issuecomment-386073812

Is there an open source application that is known to fail that way?

I completely believe the issue is real. But it really helps to have
motivating applications so that some corner case is not skipped.

I will have to look at tcp timestamps, and see how those interact
with the kernel's timers. To see if that is a time namespace issue.

Eric
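
A minimal sketch, in C, of the CLOCK_MONOTONIC failure mode described in the
quoted message. Only clock_gettime() and CLOCK_MONOTONIC come from the thread;
the helper names (ts_to_ns, monotonic_now_ns, elapsed_ns) are hypothetical and
exist only for illustration. It assumes an application that remembers a start
timestamp and later subtracts it from a fresh reading.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Convert a struct timespec to a single nanosecond count. */
static int64_t ts_to_ns(const struct timespec *ts)
{
    return (int64_t)ts->tv_sec * 1000000000LL + ts->tv_nsec;
}

/* Read CLOCK_MONOTONIC in nanoseconds; returns -1 if the call fails. */
static int64_t monotonic_now_ns(void)
{
    struct timespec ts;

    if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0)
        return -1;
    return ts_to_ns(&ts);
}

/*
 * Hypothetical helper: time elapsed between a saved start value and a
 * current reading. On a single host the result is never negative. If the
 * start value was taken before a migration and the current reading after
 * it, the result can be negative; *went_backwards reports that case.
 */
static int64_t elapsed_ns(int64_t start_ns, int64_t now_ns,
                          bool *went_backwards)
{
    *went_backwards = now_ns < start_ns;
    return now_ns - start_ns;
}
```

Passing elapsed_ns() a start value of 5 s saved on the source host and a
current value of 1 s read on a recently booted destination host gives -4 s and
sets the flag. An application that does not check for this may compute a
negative or, with unsigned arithmetic, a very large timeout.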