[CRIU] Another question / roadblock

Eliot Moss moss at cs.umass.edu
Sat Oct 19 18:51:16 MSK 2019


On 10/15/2019 2:02 PM, Eliot Moss wrote:
> On 10/15/2019 2:10 AM, Andrei Vagin wrote:
>> On Mon, Oct 14, 2019 at 09:49:31PM -0400, Eliot Moss wrote:
>>>
>>> Now that I have figured out how to adjust file lengths before invoking
>>> restore, I have another "interesting" issue.
>>>
>>> My jobs have one part that is some layers of shell script that bottoms out
>>> with an invocation of valgrind, which produces output to a named pipe (fifo).
>>> Then they have another part that reads from the named pipe, sends the output to
>>> about 8 analysis programs, compresses their output, etc.
>>>
>>> This second part is created, and then disowned with the shell disown command.
>>>
>>> Applying dump to the first part does not capture the second part.  So my
>>> question is, how do I capture both parts?
>>>
>>> (Explanation: I did things this way so that the analysis jobs don't die
>>> when the valgrind job finishes, but finish reading from the fifo and
>>> processing the buffered data.)
>>
>> I think you need to run your processes in a new pid namespace.
>> http://man7.org/linux/man-pages/man7/pid_namespaces.7.html
>>
>> The easiest way to run a process in a new pid namespace is to use
>> the unshare tool:
>>
>> sudo unshare -pf sh -c 'echo "My pid is $$"'
> 
> I now reach this point:
> 
>    sudo criu dump --tree 85697 --images-dir dump999/1/ --leave-running --track-mem --shell-job
>    Warn  (criu/image.c:134): Failed to open parent directory
>    pie: 1: Error (criu/pie/parasite.c:429): can't dump unpriviliged task whose /proc doesn't belong to it
>    pie: 1: Error (criu/pie/parasite.c:445): Can't get /proc fd
>    pie: 1: Close the control socket for writing
>    Error (criu/parasite-syscall.c:428): Can't retrieve FD from socket
>    Error (compel/src/lib/infect-rpc.c:46): Message reply from daemon is trimmed (12/0)
>    Error (criu/cr-dump.c:1291): Can't get proc fd (pid: 85697)
>    Error (criu/cr-dump.c:1742): Dumping FAILED.
> 
> This suggests to me that I need to use --mount-proc with unshare.
> What are your thoughts?
> 
> Also, it is not wonderful that I seem to have to do all this as root.  I can
> do so on my own cluster, but not on shared ones owned by others.  Any way to
> deal with that?

Just trying again, since there has been that flurry of activity around patches
and this query may have been overlooked :-) ...    Regards - Eliot
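
For concreteness, here is a minimal sketch of the invocation I have in mind
for the --mount-proc idea quoted above. It only prints the command line
rather than running it. The helper name build_unshare_cmd and the script
path ./run-job.sh are placeholders made up for illustration, and I have not
confirmed that mounting a fresh /proc actually clears the "Can't get /proc
fd" error.

```shell
#!/bin/sh
# Sketch only: prints the command line instead of running it, since
# unshare --pid normally needs root.

# build_unshare_cmd is a hypothetical helper; its argument is the job's
# top-level script (./run-job.sh below is a placeholder, not a real file).
build_unshare_cmd() {
    # --pid         new pid namespace
    # --fork        fork the job as a child of unshare, so it is pid 1 there
    # --mount-proc  new mount namespace with a fresh /proc mounted for it
    printf 'unshare --pid --fork --mount-proc %s\n' "$1"
}

echo "sudo $(build_unshare_cmd ./run-job.sh)"
```

The long options are the spelled-out form of the "-pf" suggested earlier,
plus --mount-proc; criu dump would then be pointed at the resulting process
tree as before.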

