[CRIU] [RFC] Fuse "stub-proxy" file system and how it can help to migrate network file system
Pavel Emelyanov
xemul at parallels.com
Thu Dec 17 05:01:37 PST 2015
On 12/17/2015 03:53 PM, Stanislav Kinsburskiy wrote:
>
>
> On 17.12.2015 13:44, Pavel Emelyanov wrote:
>> On 12/12/2015 06:04 PM, Stanislav Kinsburskiy wrote:
>>> Hello,
>>>
>>> There is a class of objects that CRIU can't migrate: network file
>>> systems and all related objects, such as opened and mapped files.
>>> The major problem with migrating a network file system is that it can
>>> be unreachable while restore is happening, yet all opened and mapped
>>> files have to be restored somehow.
>>>
>>> Another file system can solve this problem.
>>> Its main characteristics are:
>>> 1) It is FUSE-based, to make it generic for any kernel.
>>> 2) Some methods, open() and mmap() (probably getattr() as well), return
>>> a dummy handle. This is required to be able to reopen and remap
>>> files on restore.
>>> 3) Other methods are stubs: any process that calls such a method is
>>> put to sleep until the original network file system is remounted.
>>> 4) Once it is remounted, this file system starts working as a proxy,
>>> passing requests through to the real file system.
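
The four characteristics above can be modelled in a few lines. The following is only a toy sketch, not a real FUSE file system: the names StubProxy, DictBackend and attach() are hypothetical, and a threading.Event stands in for "the network file system has been remounted". It shows open() returning a dummy handle immediately, while read() sleeps in the stub phase and passes the request through in the proxy phase.

```python
import threading


class StubProxy:
    """Toy model of the stub-proxy behaviour (not a real FUSE filesystem)."""

    def __init__(self):
        self._remounted = threading.Event()  # set once the real fs is back
        self._backend = None                 # hypothetical real-fs object
        self._handles = {}                   # dummy handle -> path
        self._next_handle = 1
        self._lock = threading.Lock()

    def open(self, path):
        # Never blocks: restore only needs *some* handle to hand out.
        with self._lock:
            handle = self._next_handle
            self._next_handle += 1
            self._handles[handle] = path
        return handle

    def read(self, handle, size):
        # Stub phase: sleep until the real file system is remounted.
        self._remounted.wait()
        # Proxy phase: pass the request through to the real file system.
        return self._backend.read(self._handles[handle], size)

    def attach(self, backend):
        # Called once the network fs is reachable again; wakes all sleepers.
        self._backend = backend
        self._remounted.set()


class DictBackend:
    """Stand-in for the remounted network file system (assumption)."""

    def __init__(self, files):
        self._files = files

    def read(self, path, size):
        return self._files[path][:size]
```

A caller that issues read() before attach() simply blocks; once attach() runs, every sleeping caller continues and gets data from the backend.
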
>>>
>>> This approach allows restoring a container with any network file system
>>> inside by mounting the fuse fs in place of the unreachable network file
>>> system. But this proxy mode is slow.
>>>
>>> Fortunately, it can be optimized.
>>> CRIU can wait until the real file system is remounted, then seize
>>> all the processes using fuse, reopen (or map) the real files and replace
>>> the fuse files with them.
>>> This restores the native state of the processes, and the proxy file
>>> system can be shut down.
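
The descriptor half of that replacement can be illustrated with dup2(). This is a minimal in-process sketch: CRIU would have to run the equivalent inside each seized task, and mapped files are not covered here. The helper name replace_fd is hypothetical, and only the file offset and access mode are carried over.

```python
import fcntl
import os


def replace_fd(old_fd, real_path):
    """Point old_fd at real_path, keeping the fd number, offset and access mode."""
    offset = os.lseek(old_fd, 0, os.SEEK_CUR)        # remember the file position
    flags = fcntl.fcntl(old_fd, fcntl.F_GETFL)       # remember how it was opened
    new_fd = os.open(real_path, flags & os.O_ACCMODE)
    try:
        os.lseek(new_fd, offset, os.SEEK_SET)        # restore the position
        os.dup2(new_fd, old_fd)                      # swap in one step
    finally:
        os.close(new_fd)                             # old_fd keeps the new file open
```

After the call, code holding old_fd keeps using the same number but reads from the real file at the same position.
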
>>>
>>> But one more problem is left. The real file system has to be mounted
>>> on the same dentry where the fuse fs is mounted. There is no atomic way of
>>> replacing one file system with another. Unmounting fuse first and
>>> mounting the real fs afterwards leaves a race window, which is unacceptable.
>>> Thus only mounting the real file system on top of fuse is suitable.
>>> But in this case fuse can't be unmounted. Thus, an alien mount point
>>> remains after restore.
>>>
>>> This problem can be solved by freezing all the processes to eliminate the
>>> race window mentioned above, using the following algorithm:
>>> 1) Mount the real file system on some other dentry first.
>>> 2) Freeze all processes.
>>> 3) Lazily unmount fuse.
>>> 4) Move the network file system to the desired dentry.
>>> 5) Thaw all processes.
>>>
>>> With this solution the fuse mount point will disappear once all file
>>> references to it are closed.
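
The five steps can be written down as a dry-run plan. This is a sketch under assumptions that are not in the proposal: the real file system is NFS, the processes are frozen through a cgroup-v1 freezer (by writing FROZEN or THAWED to freezer.state), and all paths are placeholders. The functions only build and print the steps; nothing is executed.

```python
def build_switch_plan(source, fuse_mnt, staging_mnt, freezer_state):
    """Return the five steps as data; nothing is executed here."""
    return [
        # 1) Mount the real file system on some other dentry first.
        ("run", ["mount", "-t", "nfs", source, staging_mnt]),
        # 2) Freeze all processes (cgroup-v1 freezer is an assumption).
        ("write", freezer_state, "FROZEN"),
        # 3) Lazily unmount fuse.
        ("run", ["umount", "-l", fuse_mnt]),
        # 4) Move the network file system to the desired dentry.
        ("run", ["mount", "--move", staging_mnt, fuse_mnt]),
        # 5) Thaw all processes.
        ("write", freezer_state, "THAWED"),
    ]


def describe(plan):
    """Render the plan as shell-like lines for review."""
    lines = []
    for step in plan:
        if step[0] == "run":
            lines.append(" ".join(step[1]))
        else:
            lines.append("echo {} > {}".format(step[2], step[1]))
    return lines


if __name__ == "__main__":
    plan = build_switch_plan(
        "server:/export",                              # placeholder NFS export
        "/mnt/nfs",                                    # where fuse is mounted
        "/mnt/.nfs-staging",                           # temporary mount point
        "/sys/fs/cgroup/freezer/ct1/freezer.state",    # placeholder cgroup
    )
    print("\n".join(describe(plan)))
```

Whatever carries out steps 3 and 4 has to run outside the frozen group, otherwise it would freeze itself at step 2.
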
>> OK, this might work, let's try. The freezing fs might find use
>> outside of criu, so I'd do all this work within the recently proposed
>> criu-2.0 effort and put it not inside the criu tool code, but next to it,
>> like the compel lib and tool.
>
> Agreed that this shouldn't be a part of criu itself.
> I was thinking more about this "freezing fs" concept and would like to
> share my thoughts.
> This "stub-proxy" approach, with its ability to change processes'
> descriptors and mappings, can become a valuable thing by itself, as you
> mentioned.
> Say, for switching a task or a group of tasks from one fs to another,
> thus making a kind of "mirror fs" (similar to a mirror drive, etc). Might
> be useful for backups or something like this.
> Or for silently "freezing" a group of processes until the fs is checked,
> fixed, updated, or remounted from another source.
> Maybe you also have something in mind about how such an approach can be used?
I recall only one more thing -- in the kernel, people wanted to shrink the
dentry and inode tree to relocate the objects, and for doing this they
needed some way to get tasks' hands off the corresponding files.
> Would be nice to collect some of them, because it will help to shape the
> project in a proper way from the beginning.
We'll start with live migration and disk maintenance mode use cases and
see how it goes.
-- Pavel