[CRIU] Another question / roadblock

Eliot Moss moss at cs.umass.edu
Tue Oct 15 21:02:09 MSK 2019


On 10/15/2019 2:10 AM, Andrei Vagin wrote:
> On Mon, Oct 14, 2019 at 09:49:31PM -0400, Eliot Moss wrote:
>>
>> Now that I have figured out how to adjust file lengths before invoking
>> restore, I have another "interesting" issue.
>>
>> My jobs have one part that is some layers of shell script that bottom out
>> with an invocation of valgrind, which produces output to a named pipe (fifo).
>> Then they have another part that reads from the named pipe, sends the output to
>> about 8 analysis programs, compresses their output, etc.
>>
>> This second part is created, and then disowned with the shell disown command.
>>
>> Applying dump to the first part does not capture the second part.  So my
>> question is, how do I capture both parts?
>>
>> (Explanation: I did things this way so that the analysis jobs don't die
>> when the valgrind job finishes, but finish reading from the fifo and
>> processing the buffered data.)
> 
> I think you need to run your processes in a new pid namespace.
> http://man7.org/linux/man-pages/man7/pid_namespaces.7.html
> 
> The easiest way to run a process in a new pid namespace is to use
> the unshare tool:
> 
> sudo unshare -pf sh -c 'echo "My pid is $$"'

I now reach this point:

   sudo criu dump --tree 85697 --images-dir dump999/1/ --leave-running --track-mem --shell-job
   Warn  (criu/image.c:134): Failed to open parent directory
   pie: 1: Error (criu/pie/parasite.c:429): can't dump unpriviliged task whose /proc doesn't belong to it
   pie: 1: Error (criu/pie/parasite.c:445): Can't get /proc fd
   pie: 1: Close the control socket for writing
   Error (criu/parasite-syscall.c:428): Can't retrieve FD from socket
   Error (compel/src/lib/infect-rpc.c:46): Message reply from daemon is trimmed (12/0)
   Error (criu/cr-dump.c:1291): Can't get proc fd (pid: 85697)
   Error (criu/cr-dump.c:1742): Dumping FAILED.

This suggests to me that I need to use --mount-proc with unshare.
What are your thoughts?
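
A minimal sketch of that idea, assuming the util-linux unshare tool, whose
--mount-proc option also unshares the mount namespace and mounts a fresh
procfs on /proc for the new pid namespace. The name run_job.sh and the
JOB and RUN variables are hypothetical placeholders, not part of the real
job scripts, and this has not been checked against the dump above:

```shell
#!/bin/sh
# Sketch only: launch a job in new pid and mount namespaces with a private /proc.
# JOB is a hypothetical placeholder for the top-level job script.
JOB=${JOB:-./run_job.sh}

# Build the command line.
#   --pid --fork : create a new pid namespace; the forked child is pid 1 in it
#   --mount-proc : also unshare the mount namespace and mount a new procfs
#                  on /proc, so /proc matches the new pid namespace
build_cmd() {
    printf 'unshare --pid --fork --mount-proc %s\n' "$1"
}

cmd=$(build_cmd "$JOB")

# Print the command by default; set RUN=1 to actually execute it as root.
if [ "${RUN:-0}" = "1" ]; then
    # Unquoted on purpose so the string splits into separate arguments.
    sudo $cmd
else
    echo "$cmd"
fi
```

If this works as hoped, the criu dump command above would presumably stay
the same, with --tree given the host-side pid of the process that is pid 1
inside the namespace.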

Also, it is not wonderful that I seem to have to do all this as root.  I can
do so on my own cluster, but not on shared ones owned by others.  Any way to
deal with that?

Regards - Eliot