[CRIU] [PATCH] drm/ttm: Don't inherit GEM object VMAs in child process

Bhardwaj, Rajneesh rajneesh.bhardwaj at amd.com
Thu Dec 23 04:51:57 MSK 2021


Sorry for the typo in my previous email. Please read Adrian Reber*

On 12/22/2021 8:49 PM, Bhardwaj, Rajneesh wrote:
>
> Adding Adrian Rebel who is the CRIU maintainer and CRIU list
>
> On 12/22/2021 3:53 PM, Daniel Vetter wrote:
>> On Mon, Dec 20, 2021 at 01:12:51PM -0500, Bhardwaj, Rajneesh wrote:
>>> On 12/20/2021 4:29 AM, Daniel Vetter wrote:
>>>> On Fri, Dec 10, 2021 at 07:58:50AM +0100, Christian König wrote:
>>>>> Am 09.12.21 um 19:28 schrieb Felix Kuehling:
>>>>>> Am 2021-12-09 um 10:30 a.m. schrieb Christian König:
>>>>>>> That still won't work.
>>>>>>>
>>>>>>> But I think we could do this change for the amdgpu mmap callback only.
>>>>>> If graphics user mode has problems with it, we could even make this
>>>>>> specific to KFD BOs in the amdgpu_gem_object_mmap callback.
>>>>> I think it's fine for the whole amdgpu stack, my concern is more about
>>>>> radeon, nouveau and the ARM stacks which are using this as well.
>>>>>
>>>>> That blew up so nicely the last time we tried to change it and I know of at
>>>>> least one case where radeon was/is used with BOs in a child process.
>>>> I'm way late and burried again, but I think it'd be good to be consistent
>
>
> I had committed this change into our amd-staging-drm-next branch last 
> week after I got the ACK and RB from Felix and Christian.
>
>
>>>> here across drivers. Or at least across drm drivers. And we've had the vma
>>>> open/close refcounting to make fork work since forever.
>>>>
>>>> I think if we do this we should really only do this for mmap() where this
>>>> applies, but reading through the thread here I'm honestly confused why
>>>> this is a problem. If CRIU can't handle forked mmaps it needs to be
>>>> thought that, not hacked around. Or at least I'm not understanding why
>>>> this shouldn't work ...
>>>> -Daniel
>>>>
>>> Hi Daniel
>>>
>>> In the v2
>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Fall%2Fa1a865f5-ad2c-29c8-cbe4-2635d53eceb6%40amd.com%2FT%2F&data=04%7C01%7Crajneesh.bhardwaj%40amd.com%7Ce4634a16c37149da173408d9c58d1338%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637758031981907821%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=h0z4sO19bsJecMqeHGdz%2BHZElKuyzK%2BW%2FMbLWA79I10%3D&reserved=0
>>> I pretty much limited the scope of the change to KFD BOs on mmap. Regarding
>>> CRIU, I think its not a CRIU problem as CRIU on restore, only tries to
>>> recreate all the child processes and then mmaps all the VMAs it sees (as per
>>> checkpoint snapshot) in the new process address space after the VMA
>>> placements are finalized in the position independent code phase. Since the
>>> inherited VMAs don't have access rights the criu mmap fails.
>> Still sounds funky. I think minimally we should have an ack from CRIU
>> developers that this is officially the right way to solve this problem. I
>> really don't want to have random one-off hacks that don't work across the
>> board, for a problem where we (drm subsystem) really shouldn't be the only
>> one with this problem. Where "this problem" means that the mmap space is
>> per file description, and not per underlying inode or real device or
>> whatever. That part sounds like a CRIU problem, and I expect CRIU folks
>> want a consistent solution across the board for this. Hence please grab an
>> ack from them.
>>
>> Cheers, Daniel
>
>
> Maybe Adrian can share his views on this.
>
> Hi Adrian - For the context, on CRIU restore we see mmap failures ( in 
> PIE restore phase) due to permission issues on the (render node) VMAs 
> that were inherited since the application that check pointed had 
> forked.  The VMAs ideally should not be in the child process but the 
> smaps file shows these VMAs in the child address space. We didn't want 
> to use madvise to avoid this copy and rather change in the kernel mode 
> to limit the impact to our user space library thunk. Based on my 
> understanding, during PIE restore phase, after the VMA placements are 
> finalized, CRIU does a sys_mmap on all the VMA it sees in the VmaEntry 
> list and I think its not an issue as per CRIU design but do you think 
> we could handle this corner case better inside CRIU?
>
>
>>> Regards,
>>>
>>> Rajneesh
>>>
>>>>> Regards,
>>>>> Christian.
>>>>>
>>>>>> Regards,
>>>>>>      Felix
>>>>>>
>>>>>>
>>>>>>> Regards,
>>>>>>> Christian.
>>>>>>>
>>>>>>> Am 09.12.21 um 16:29 schrieb Bhardwaj, Rajneesh:
>>>>>>>> Sounds good. I will send a v2 with only ttm_bo_mmap_obj change. Thank
>>>>>>>> you!
>>>>>>>>
>>>>>>>> On 12/9/2021 10:27 AM, Christian König wrote:
>>>>>>>>> Hi Rajneesh,
>>>>>>>>>
>>>>>>>>> yes, separating this from the drm_gem_mmap_obj() change is certainly
>>>>>>>>> a good idea.
>>>>>>>>>
>>>>>>>>>> The child cannot access the BOs mapped by the parent anyway with
>>>>>>>>>> access restrictions applied
>>>>>>>>> exactly that is not correct. That behavior is actively used by some
>>>>>>>>> userspace stacks as far as I know.
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Christian.
>>>>>>>>>
>>>>>>>>> Am 09.12.21 um 16:23 schrieb Bhardwaj, Rajneesh:
>>>>>>>>>> Thanks Christian. Would it make it less intrusive if I just use the
>>>>>>>>>> flag for ttm bo mmap and remove the drm_gem_mmap_obj change from
>>>>>>>>>> this patch? For our use case, just the ttm_bo_mmap_obj change
>>>>>>>>>> should suffice and we don't want to put any more work arounds in
>>>>>>>>>> the user space (thunk, in our case).
>>>>>>>>>>
>>>>>>>>>> The child cannot access the BOs mapped by the parent anyway with
>>>>>>>>>> access restrictions applied so I wonder why even inherit the vma?
>>>>>>>>>>
>>>>>>>>>> On 12/9/2021 2:54 AM, Christian König wrote:
>>>>>>>>>>> Am 08.12.21 um 21:53 schrieb Rajneesh Bhardwaj:
>>>>>>>>>>>> When an application having open file access to a node forks, its
>>>>>>>>>>>> shared
>>>>>>>>>>>> mappings also get reflected in the address space of child process
>>>>>>>>>>>> even
>>>>>>>>>>>> though it cannot access them with the object permissions applied.
>>>>>>>>>>>> With the
>>>>>>>>>>>> existing permission checks on the gem objects, it might be
>>>>>>>>>>>> reasonable to
>>>>>>>>>>>> also create the VMAs with VM_DONTCOPY flag so a user space
>>>>>>>>>>>> application
>>>>>>>>>>>> doesn't need to explicitly call the madvise(addr, len,
>>>>>>>>>>>> MADV_DONTFORK)
>>>>>>>>>>>> system call to prevent the pages in the mapped range to appear in
>>>>>>>>>>>> the
>>>>>>>>>>>> address space of the child process. It also prevents the memory
>>>>>>>>>>>> leaks
>>>>>>>>>>>> due to additional reference counts on the mapped BOs in the child
>>>>>>>>>>>> process that prevented freeing the memory in the parent for which
>>>>>>>>>>>> we had
>>>>>>>>>>>> worked around earlier in the user space inside the thunk library.
>>>>>>>>>>>>
>>>>>>>>>>>> Additionally, we faced this issue when using CRIU to checkpoint
>>>>>>>>>>>> restore
>>>>>>>>>>>> an application that had such inherited mappings in the child which
>>>>>>>>>>>> confuse CRIU when it mmaps on restore. Having this flag set for the
>>>>>>>>>>>> render node VMAs helps. VMAs mapped via KFD already take care of
>>>>>>>>>>>> this so
>>>>>>>>>>>> this is needed only for the render nodes.
>>>>>>>>>>> Unfortunately that is most likely a NAK. We already tried
>>>>>>>>>>> something similar.
>>>>>>>>>>>
>>>>>>>>>>> While it is illegal by the OpenGL specification and doesn't work
>>>>>>>>>>> for most userspace stacks, we do have some implementations which
>>>>>>>>>>> call fork() with a GL context open and expect it to work.
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>> Christian.
>>>>>>>>>>>
>>>>>>>>>>>> Cc: Felix Kuehling<Felix.Kuehling at amd.com>
>>>>>>>>>>>>
>>>>>>>>>>>> Signed-off-by: David Yat Sin<david.yatsin at amd.com>
>>>>>>>>>>>> Signed-off-by: Rajneesh Bhardwaj<rajneesh.bhardwaj at amd.com>
>>>>>>>>>>>> ---
>>>>>>>>>>>>      drivers/gpu/drm/drm_gem.c       | 3 ++-
>>>>>>>>>>>>      drivers/gpu/drm/ttm/ttm_bo_vm.c | 2 +-
>>>>>>>>>>>>      2 files changed, 3 insertions(+), 2 deletions(-)
>>>>>>>>>>>>
>>>>>>>>>>>> diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
>>>>>>>>>>>> index 09c820045859..d9c4149f36dd 100644
>>>>>>>>>>>> --- a/drivers/gpu/drm/drm_gem.c
>>>>>>>>>>>> +++ b/drivers/gpu/drm/drm_gem.c
>>>>>>>>>>>> @@ -1058,7 +1058,8 @@ int drm_gem_mmap_obj(struct drm_gem_object
>>>>>>>>>>>> *obj, unsigned long obj_size,
>>>>>>>>>>>>                  goto err_drm_gem_object_put;
>>>>>>>>>>>>              }
>>>>>>>>>>>>      -        vma->vm_flags |= VM_IO | VM_PFNMAP | VM_DONTEXPAND |
>>>>>>>>>>>> VM_DONTDUMP;
>>>>>>>>>>>> +        vma->vm_flags |= VM_IO | VM_PFNMAP | VM_DONTEXPAND
>>>>>>>>>>>> +                | VM_DONTDUMP | VM_DONTCOPY;
>>>>>>>>>>>>              vma->vm_page_prot =
>>>>>>>>>>>> pgprot_writecombine(vm_get_page_prot(vma->vm_flags));
>>>>>>>>>>>>              vma->vm_page_prot = pgprot_decrypted(vma->vm_page_prot);
>>>>>>>>>>>>          }
>>>>>>>>>>>> diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c
>>>>>>>>>>>> b/drivers/gpu/drm/ttm/ttm_bo_vm.c
>>>>>>>>>>>> index 33680c94127c..420a4898fdd2 100644
>>>>>>>>>>>> --- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
>>>>>>>>>>>> +++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
>>>>>>>>>>>> @@ -566,7 +566,7 @@ int ttm_bo_mmap_obj(struct vm_area_struct
>>>>>>>>>>>> *vma, struct ttm_buffer_object *bo)
>>>>>>>>>>>>            vma->vm_private_data = bo;
>>>>>>>>>>>>      -    vma->vm_flags |= VM_PFNMAP;
>>>>>>>>>>>> +    vma->vm_flags |= VM_PFNMAP | VM_DONTCOPY;
>>>>>>>>>>>>          vma->vm_flags |= VM_IO | VM_DONTEXPAND | VM_DONTDUMP;
>>>>>>>>>>>>          return 0;
>>>>>>>>>>>>      }
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openvz.org/pipermail/criu/attachments/20211222/0b1d2196/attachment-0001.html>


More information about the CRIU mailing list