[Devel] [PATCH RH7 2/2] fence-watchdog: panic after 30 sec of reboot/halt
Pavel Tikhomirov
ptikhomirov at virtuozzo.com
Wed Jul 19 17:08:27 MSK 2017
We schedule to system_wq, which has max_active=256, from
Documentation/workqueue.txt:
@max_active determines the maximum number of execution contexts per
CPU which can be assigned to the work items of a wq. For example,
with @max_active of 16, at most 16 work items of the wq can be
executing at the same time per CPU.
So for each workqueue we have a pool of processes handling scheduled
works on it, and sleeping in one process waiting for fsync will still
allow other works to run.
==
Second thing - I will try use kernel_write instead of just
file->f_op->write and resend.
On 07/19/2017 12:03 PM, Pavel Tikhomirov wrote:
> As we do reboot and halt actions in scope of scheduled worker
> it can never happen if scheduling does not work properly, so
> panic in case that previous action was not successful.
>
> https://jira.sw.ru/browse/PSBM-54747
> Signed-off-by: Pavel Tikhomirov <ptikhomirov at virtuozzo.com>
> ---
> kernel/fence-watchdog.c | 20 +++++++++++++++++---
> 1 file changed, 17 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/fence-watchdog.c b/kernel/fence-watchdog.c
> index 607045a..3ee5b89 100644
> --- a/kernel/fence-watchdog.c
> +++ b/kernel/fence-watchdog.c
> @@ -42,7 +42,14 @@ const char *action_names[] = {"crash", "reboot", "halt", "netfilter", NULL};
>
> DEFINE_VVAR(volatile unsigned long, fence_wdog_jiffies64) = MAX_U64;
> static int fence_wdog_action = FENCE_WDOG_CRASH;
> -static atomic_t not_fenced = ATOMIC_INIT(-1);
> +
> +enum {
> + NOT_FENCED = 0,
> + FENCED = 1,
> + FENCED_TIMEOUT = 2,
> +};
> +
> +static atomic_t fence_stage = ATOMIC_INIT(NOT_FENCED);
> static char fence_wdog_log_path[PATH_MAX] = "/fence_wdog.log";
>
> #define MSG_LEN 32
> @@ -114,19 +121,26 @@ static DECLARE_WORK(halt_or_reboot_work, do_halt_or_reboot);
>
> void fence_wdog_do_fence(void)
> {
> - if (fence_wdog_action == FENCE_WDOG_CRASH)
> + if (fence_wdog_action == FENCE_WDOG_CRASH ||
> + atomic_read(&fence_stage) == FENCED_TIMEOUT)
> panic("fence-watchdog: %s\n",
> action_names[fence_wdog_action]);
> else
> schedule_work(&halt_or_reboot_work);
> }
>
> +#define FENCE_WDOG_TIMEOUT 30
> +
> inline int fence_wdog_check_timer(void)
> {
> if (unlikely(get_jiffies_64() > fence_wdog_jiffies64 &&
> fence_wdog_action != FENCE_WDOG_NETFILTER)) {
> - if (atomic_inc_not_zero(¬_fenced))
> + if (atomic_cmpxchg(&fence_stage, NOT_FENCED, FENCED) == NOT_FENCED
> + || (get_jiffies_64() > fence_wdog_jiffies64
> + + FENCE_WDOG_TIMEOUT * HZ
> + && atomic_cmpxchg(&fence_stage, FENCED, FENCED_TIMEOUT) == FENCED))
> fence_wdog_do_fence();
> +
> return 1;
> }
>
>
--
Best regards, Tikhomirov Pavel
Software Developer, Virtuozzo.
More information about the Devel
mailing list