[CRIU] The progress of Time namespace

Eric W. Biederman ebiederm at xmission.com
Sat Jun 2 06:12:03 MSK 2018


Andrei Vagin <avagin at virtuozzo.com> writes:

> On Fri, Jun 01, 2018 at 01:20:33PM -0500, Eric W. Biederman wrote:
>> Adrian Reber <adrian at lisas.de> writes:
>> 
>> > On Fri, Jun 01, 2018 at 11:04:26AM +0800, yukon wrote:
>> >> I found that the CRIU community intends to resolve the timer issue[1]. I
>> >> wonder if there is an issue to track the progress?
>> >
>> > I have heard of other people experimenting with it and I also had a few
>> > patches to try it out. The point where I stopped is when I found out
>> > that most time calls are actually coming from the VDSO and not from the
>> > kernel and it is still unclear to me how to handle namespaces and VDSO
>> > correctly.
>> >
>> > I have also talked with Christian (on CC) about it and I also contacted
>> > Eric at some point (also on CC). Maybe they have more information about
>> > the current status.
>> 
>> Adrian.  My apologies for not getting back to you earlier (I was
>> swamped) but that is not a good excuse.  I was very impressed by what
>> you did.
>> 
>> For me personally I have been looking for a real world case where the
>> timers matter.  Having that would increase the priority of this work
>> from where I stand.
>> 
>> To date all I have done is recognize that a time namespace is almost
>> certainly something that we need, and read the code enough to have a
>> general sense of how the time infrastructure in the kernel works.
>> 
>> I think the VDSO has per-CPU if not per-process constants, so we should
>> be able to affect this in a namespace.  If the VDSO does not, we
>> certainly can make that happen.
>> 
>> I would be very happy to merge a time namespace.   I would probably even
>> start looking at implementation details if I had a compelling test case
>> in my hand.
>> 
>> Yukon.  I don't have the beginning of this thread.  So if you know of a
>> practical case that does not work because of timers I would love to hear
>> about it.
>
> Hi Eric,
>
> We have a practical case. A few CRIU users have reported situations to us
> where applications stop working after being migrated to another host.
>
> Usually this means that they use clock_gettime or timer_settime. The
> problem here is that we can't adjust clocks on a destination host to
> their values on a source host. For example, the application uses
> CLOCK_MONOTONIC to measure time slices, but after migrating to another
> host, clock_gettime(CLOCK_MONOTONIC) may return a value which is smaller
> than the one it returned on the source host. The application doesn't
> expect such behaviour from CLOCK_MONOTONIC, and it will probably work
> incorrectly (hang, crash, etc).
>
> Here is one quote from the CRIU mailing list:
>
>   Is there a timeline on when the time namespace might be implemented? Or
>   else is there anyone, even outside CRIU, working on it that you guys
>   know of? It seems like this might be one of the last major obstacles
>   keeping migration from being used in production systems, given that not
>   all containers and connections can be migrated as long as a time
>   dependency is capable of messing it up.
>   https://github.com/checkpoint-restore/criu/issues/451#issuecomment-386073812

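A minimal sketch of the CLOCK_MONOTONIC failure described above, not taken
from any reported application. It does not perform a real migration: the
backwards jump is simulated by subtracting a made-up offset
(SIMULATED_SKEW), and the helper names (elapsed, remaining_wait) are
hypothetical. It only shows why code that assumes the clock never goes
backwards can see a negative time slice or wait far longer than intended.

```python
import time

SIMULATED_SKEW = 3600.0  # seconds; made-up value standing in for the host difference
TIMEOUT = 5.0            # seconds the application intends to wait


def elapsed(start, now):
    """Naive time-slice measurement that assumes 'now' is never below 'start'."""
    return now - start


def remaining_wait(deadline, now):
    """Time left until an absolute deadline computed from the monotonic clock."""
    return deadline - now


# Reading taken on the "source host" (a real CLOCK_MONOTONIC sample).
start = time.clock_gettime(time.CLOCK_MONOTONIC)
deadline = start + TIMEOUT

# Assumption: after restore on the destination host the same clock reads
# less than before. Simulated here; no checkpoint/restore takes place.
now_after_restore = start - SIMULATED_SKEW

slice_len = elapsed(start, now_after_restore)
wait_left = remaining_wait(deadline, now_after_restore)

print("measured time slice:", slice_len)     # negative
print("remaining wait:     ", wait_left)     # far larger than TIMEOUT
```

With these simulated values the measured slice is negative and the
five-second timeout turns into a wait of roughly an hour, which matches the
"stuck" symptom mentioned above.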

Is there an open source application that is known to fail that way?

I completely believe the issue is real.  But it really helps to have
motivating applications so that some corner case is not skipped.

I will have to look at TCP timestamps and see how those interact
with the kernel's timers, to see whether that is a time namespace issue.

Eric

