[CRIU] Another question / roadblock
Eliot Moss
moss at cs.umass.edu
Tue Oct 15 21:02:09 MSK 2019
On 10/15/2019 2:10 AM, Andrei Vagin wrote:
> On Mon, Oct 14, 2019 at 09:49:31PM -0400, Eliot Moss wrote:
>>
>> Now that I have figured out how to adjust file lengths before invoking
>> restore, I have another "interesting" issue.
>>
>> My jobs have one part that is some layers of shell script that bottoms out
>> with an invocation of valgrind, which produces output to a named pipe (fifo).
>> Then they have another part that reads from the named pipe, sends the output to
>> about 8 analysis programs, compresses their output, etc.
>>
>> This second part is created, and then disowned with the shell disown command.
>>
>> Applying dump to the first part does not capture the second part. So my
>> question is, how do I capture both parts?
>>
>> (Explanation: I did things this way so that the analysis jobs don't die
>> when the valgrind job finishes, but finish reading from the fifo and
>> processing the buffered data.)
>
> I think you need to run your processes in a new pid namespace.
> http://man7.org/linux/man-pages/man7/pid_namespaces.7.html
>
> The easiest way to run a process in a new pid namespace is to use
> the unshare tool:
>
> sudo unshare -pf sh -c 'echo "My pid is $$"'

I now reach this point:

sudo criu dump --tree 85697 --images-dir dump999/1/ --leave-running --track-mem --shell-job
Warn (criu/image.c:134): Failed to open parent directory
pie: 1: Error (criu/pie/parasite.c:429): can't dump unpriviliged task whose /proc doesn't belong to it
pie: 1: Error (criu/pie/parasite.c:445): Can't get /proc fd
pie: 1: Close the control socket for writing
Error (criu/parasite-syscall.c:428): Can't retrieve FD from socket
Error (compel/src/lib/infect-rpc.c:46): Message reply from daemon is trimmed (12/0)
Error (criu/cr-dump.c:1291): Can't get proc fd (pid: 85697)
Error (criu/cr-dump.c:1742): Dumping FAILED.

This suggests to me that I need to use --mount-proc with unshare.
What are your thoughts?
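
For concreteness, here is a minimal sketch of the invocation I have in mind, assuming the unshare from util-linux and its --pid, --fork and --mount-proc options. The helper name run_in_pidns and the SUDO variable are my own placeholders, not anything from CRIU, and I have not yet confirmed that this makes the dump error go away.

```shell
#!/bin/sh
# Sketch only: start a command as PID 1 of a new PID namespace that has
# its own freshly mounted /proc.
# run_in_pidns and SUDO are hypothetical names used for illustration.
run_in_pidns() {
    # --pid        : create a new PID namespace
    # --fork       : fork, so the command runs as a child (PID 1 inside)
    # --mount-proc : also unshare the mount namespace and mount a new /proc
    # SUDO defaults to "sudo"; set SUDO=echo to print the command instead
    # of running it.
    ${SUDO-sudo} unshare --pid --fork --mount-proc "$@"
}

# Example (needs root):
#   run_in_pidns sh -c 'echo "My pid is $$"; ls /proc'
```

The idea is that the job tree then sees a /proc belonging to its own PID namespace, which is what the parasite.c:429 message seems to be complaining about.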

Also, it is not wonderful that I seem to have to do all this as root. I can
do so on my own cluster, but not on shared ones owned by others. Any way to
deal with that?

Regards - Eliot