[Users] Lots of interrupts since latest el6 kernel

Karl Johnson karljohnson.it at gmail.com
Fri Aug 2 21:54:59 MSK 2019


Hello,

I tried kernel 139.1: same issue. Two ksoftirqd processes are each taking
100% CPU (200% in total) indefinitely, and the load stays steady at 2.00.
I guess this server is stuck with kernel 133.2.

[root@x ~]# uname -r
2.6.32-042stab139.1
[root@x ~]# cat /proc/loadavg
2.08 2.21 2.33 3/981 29612

21 root      20   0     0    0    0 R 100.0  0.0  25:50.13 ksoftirqd/4

25 root      20   0     0    0    0 R 100.0  0.0  25:50.42 ksoftirqd/5

Karl

On Sun, Jun 2, 2019 at 4:17 PM Karl Johnson <karljohnson.it at gmail.com>
wrote:

> Quick follow-up: I tried kernel 042stab137.1 with and without nohz=off.
> Same issue: 3 cores taking 100% each and load at 3.00+:
>
> [root@core1 ~]# ps auxf|grep ksoftirqd|grep 99
> root           9 99.1  0.0      0     0 ?        R    14:54  13:10  \_ [ksoftirqd/1]
> root          17 99.1  0.0      0     0 ?        R    14:54  13:10  \_ [ksoftirqd/3]
> root          33 99.1  0.0      0     0 ?        R    14:54  13:10  \_ [ksoftirqd/7]
>
> [root@core1 ~]# cat /proc/loadavg
> 3.22 3.61 2.83 4/975 20478
>
> I've downgraded to 2.6.32-042stab133.2 and everything is fine: load at
> 0.00, no CPU usage. Something went wrong between kernels 133.2 and 137.1.
> I haven't tested all the releases in between.
>
> Karl
>
> On Fri, May 31, 2019 at 1:55 AM Vasily Averin <vvs at virtuozzo.com> wrote:
>
>> On 5/30/19 10:39 PM, Karl Johnson wrote:
>> > Hello,
>> >
>> > It's always related to swapper and ksoftirqd:
>> "swapper" is the idle thread; it runs when a CPU does not have any active
>> tasks.
>> It would be interesting to look at the state of the "ksoftirqd" processes
>> several times, to see whether anything changes.
>>
>> In the example provided, I see that this process was captured inside the
>> top-level functions that handle soft interrupts:
>> do_softirq() -> call_softirq(). Usually these functions handle network
>> packets, and I expected your example to contain deeper call traces.
>> Perhaps that will happen next time.
>>
>> Anyway, these call traces show that the CPUs are NOT 100% busy processing
>> timer interrupts,
>> so in general the situation looks as expected: the current theory is that
>> the ksoftirqd processes are handling network traffic.
>>
>> Thank you,
>>         Vasily Averin
>>
>> > Some examples here: https://pastebin.com/wn0nCwce
>> >
>> > Karl
>> >
>> > On Thu, May 30, 2019 at 3:11 PM Vasily Averin <vvs at virtuozzo.com> wrote:
>> >
>> >     Dear Karl,
>> >     thank you for reporting the problem.
>> >
>> >     No, it is not a known issue.
>> >     Moreover, I doubt it is related to real hardware interrupts:
>> >     soft interrupts handle deferred work such as processing of network packets.
>> >
>> >     For troubleshooting, look at the stack of the affected running processes via /proc/<pid>/stack.
>> >     Alternatively, you can use the magic SysRq key:
>> >     # echo l > /proc/sysrq-trigger
>> >     It should dump the current state of all active CPUs.
>> >     You can do this a few times to monitor the state of the affected processes.
>> >
>> >     Thank you,
>> >             Vasily Averin
>> >
>> >
>> >     On 5/30/19 7:54 PM, Karl Johnson wrote:
>> >     > Hello,
>> >     >
>> >     > I've upgraded from 2.6.32-042stab133.2 to 2.6.32-042stab138.1 and, since boot, 2 cores have been at 100% CPU in ksoftirqd:
>> >     >
>> >     > root          21 99.9  0.0      0     0 ?        R    May29 1178:07  \_ [ksoftirqd/4]
>> >     > root          25 99.9  0.0      0     0 ?        R    May29 1177:51  \_ [ksoftirqd/5]
>> >     >
>> >     > From /proc/interrupts I can see that it's caused by IR-IO-APIC-edge timer:
>> >     >
>> >     >            CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7
>> >     >   0:     136922     103603      26928      27528  112318229   71888343      73755     285735  IR-IO-APIC-edge      timer
>> >     >
>> >     > kernel /vmlinuz-2.6.32-042stab138.1 ro root=UUID=7367aa0f-8216-44ca-9cc4-affed22bbd9c rd_NO_LUKS rd_NO_LVM LANG=en_US.UTF-8 rd_NO_MD SYSFONT=latarcyrheb-sun16 crashkernel=auto KEYBOARDTYPE=pc KEYTABLE=us rd_NO_DM nohz=off nopti
>> >     >
>> >     > Any way to troubleshoot this? Is it a known issue?
>> >     >
>> >     > Karl
>> >     >
>> >     >
>> >     >
>> >
>>
>
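
For reference, the per-CPU skew in the /proc/interrupts excerpt quoted above can be checked with a short script. This is only a minimal sketch and was not posted in the thread: the names parse_interrupts, busiest_cpus and delta are hypothetical, and SAMPLE is just the IRQ 0 row from the original report. The counters are cumulative since boot, so comparing two snapshots taken a few seconds apart (the delta helper) is probably a better indicator of what is firing right now than a single reading.

```python
def parse_interrupts(text):
    """Parse /proc/interrupts-style text into {irq_name: {cpu_name: count}}."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    cpus = lines[0].split()  # header row: CPU0 CPU1 ...
    table = {}
    for ln in lines[1:]:
        name, _, rest = ln.partition(":")
        counts = rest.split()[:len(cpus)]
        # Some rows (e.g. ERR, MIS) carry a single total instead of one
        # column per CPU; skip anything that is not a full numeric row.
        if len(counts) < len(cpus) or not all(c.isdigit() for c in counts):
            continue
        table[name.strip()] = dict(zip(cpus, map(int, counts)))
    return table


def busiest_cpus(counts, top=2):
    """Return the names of the `top` CPUs with the highest counts."""
    return sorted(counts, key=counts.get, reverse=True)[:top]


def delta(before, after):
    """Per-CPU difference between two snapshots of the same IRQ row."""
    return {cpu: after[cpu] - before[cpu] for cpu in after}


# IRQ 0 row copied from the original report (hypothetical variable name).
SAMPLE = """\
           CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7
  0:     136922     103603      26928      27528  112318229   71888343      73755     285735  IR-IO-APIC-edge      timer
"""

table = parse_interrupts(SAMPLE)
print(busiest_cpus(table["0"]))  # ['CPU4', 'CPU5']
```

On a live system the same functions could be fed the contents of /proc/interrupts read twice, a few seconds apart, with delta() applied to the rows of interest.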