[CRIU] [PATCH]v2 service: SIGCHLD handler

Fri Nov 1 02:31:58 PDT 2013

On 11/01/2013 03:52 AM, Ruslan Kuprieiev wrote:
> On 31.10.2013 20:06, Pavel Emelyanov wrote:
>> On 10/31/2013 04:44 AM, Ruslan Kuprieiev wrote:
>>> Hi!
>>>
>>> Now we're ignoring SIGCHLD from forked children.
>>> This patch provides SIGCHLD handler, that will wait for child and write some
>>> information about it's exit status to logfile.
>>>
>>> P.S. This is my first signal handler, so please take a good look at this patch.
>>>
>>> v2: fixed few mistakes. Thanks Andrew Vagin <avagin at parallels.com> for help.
>>>
>>> Signed-off-by: Ruslan Kuprieiev <kupruser at gmail.com>
>>> ---
>>>   cr-service.c | 77 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
>>>   1 file changed, 75 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/cr-service.c b/cr-service.c
>>> index a79ebc7..a920824 100644
>>> --- a/cr-service.c
>>> +++ b/cr-service.c
>>> @@ -253,6 +253,53 @@ err:
>>>   	return -1;
>>>   }
>>>   
>>> +static void reap_worker(int signo)
>>> +{
>>> +	int saved_errno;
>>> +	int status;
>>> +	pid_t pid;
>>> +
>>> +	saved_errno = errno;
>>> +
>>> +	/*
>>> +	 * Accordingly to signal(7):
>>> +	 *   Multiple instances of real-time signals can be queued.  By
>>> +	 *   contrast, if multiple instances of a standard signal are
>>> +	 *   delivered while that signal is currently blocked, then only one
>>> +	 *   instance is queued.
>> No need in citing man page in code.
>>
>>> +	 * So, as we block SIGCHLD, lets wait for every child that has
>>> +	 * changed state.
>>> +	 */
>>> +	while (1) {
>>> +		pid = waitpid(-1, &status, WNOHANG|WUNTRACED|WCONTINUED);
>> Why just WNOHANG is not enough?
> 
> To provide more information.
> 
>>> +
>>> +		if (pid <= 0) {
>>> +			errno = saved_errno;
>>> +			return;
>>> +		}
>>> +
>>> +		if (WIFEXITED(status))
>>> +			pr_info("Worker(pid %d) exited with %d\n",
>>> +				pid, WEXITSTATUS(status));
>>> +
>>> +		else if (WIFSIGNALED(status)) {
>>> +			pr_info("Worker(pid %d) ", pid);
>>> +#ifdef WCOREDUMP
>>> +			if (WCOREDUMP(status))
>>> +				pr_info("prodused a core dump and ");
>>> +#endif
>>> +			pr_info("was killed by %d\n", WTERMSIG(status));
>>> +
>>> +		} else if (WIFSTOPPED(status))
>>> +			pr_info("Worker(pid %d) was stopped by %d\n",
>>> +				pid, WSTOPSIG(status));
>>> +
>>> +		else if (WIFCONTINUED(status))
>>> +			pr_info("Worker(pid %d) was continued\n", pid);
>> I'm not sure we need all the events. Exited and killed is more than enough.
> 
> I thought more information is better. Especially, when we got all 
> information about children from logfile. I also thought, criu has 
> parasite and stuff and works with kernel, so it wouldn't do any harm, if 
> we'll provide more information. Or would it?

Typically people only need to know about termination.

>>> +
>>> +	}
>>> +}
>>> +
>>>   int cr_service(bool daemon_mode)
>>>   {
>>>   	int server_fd = -1;
>>> @@ -264,6 +311,9 @@ int cr_service(bool daemon_mode)
>>>   	socklen_t server_addr_len;
>>>   	socklen_t client_addr_len;
>>>   
>>> +	struct sigaction new_action, old_action;
>>> +	sigset_t block_mask;
>> No need in dedicated block_mask, you can fill the mask
>> right on sigaction.
>>
>>> +
>>>   	server_fd = socket(AF_LOCAL, SOCK_SEQPACKET, 0);
>>>   	if (server_fd == -1) {
>>>   		pr_perror("Can't initialize service socket.");
>>> @@ -323,8 +373,25 @@ int cr_service(bool daemon_mode)
>>>   		}
>>>   	}
>>>   
>>> -	/* FIXME Do not ignore children's return values */
>>> -	signal(SIGCHLD, SIG_IGN);
>>> +	/* Set SIGCHLD handler */
>>> +	sigemptyset(&block_mask);
>>> +
>>> +	/* FIXME May need to block some other signals too */
>>> +	sigaddset(&block_mask, SIGCHLD);
>>> +	sigaddset(&block_mask, SIGINT);
>>> +	sigaddset(&block_mask, SIGQUIT);
>>> +	sigaddset(&block_mask, SIGHUP);
>>> +	sigaddset(&block_mask, SIGTERM);
>>> +	sigaddset(&block_mask, SIGTSTP);
>> Any explanation to this choice?
> 
> I've opened man and looked for signals, that may wait for handler to 
> finish writing to logfile.

Much more unhandled signals will terminate the app while it's writing
the log file. Let's leave only those necessary for correct child handling.

>>
>>> +
>>> +	new_action.sa_handler = reap_worker;
>>> +	new_action.sa_mask	= block_mask;
>>> +	new_action.sa_flags	= SA_RESTART;
>>> +
>>> +	if (sigaction(SIGCHLD, &new_action, &old_action)) {
>>> +		pr_perror("Can't setup SIGCHLD handler");
>>> +		return -1;
>>> +	}
>>>   
>>>   	while (1) {
>>>   		int sk;
>>> @@ -342,6 +409,12 @@ int cr_service(bool daemon_mode)
>>>   		if (child_pid == 0) {
>>>   			int ret;
>>>   
>>> +			/* Set default SIGCHLD handler back */
>>> +			if (sigaction(SIGCHLD, &old_action, NULL)) {
>>> +				pr_perror("Can't setup default SIGCHLD handler back");
>>> +				exit(1);
>>> +			}
>>> +
>>>   			close(server_fd);
>>>   			ret = cr_service_work(sk);
>>>   			close(sk);
>>>
>>
> 
> .
>