[Devel] [PATCH RH7 1/2] ixgbe: Refactor busy poll socket code to address multiple issues
Andrew Vagin
avagin at virtuozzo.com
Thu Dec 10 22:45:31 PST 2015
Acked-by: Andrew Vagin <avagin at virtuozzo.com>
On Fri, Dec 04, 2015 at 05:58:20PM +0300, Pavel Tikhomirov wrote:
> In netpoll_send_skb_on_dev() the irqs-disabled check does not trigger
> its warning, so hard irqs are disabled on this path.  Then in
> ixgbe_poll()->ixgbe_qv_unlock_napi(), still with irqs disabled, we use
> spin_unlock_bh().  That is a bug: the local_bh_enable() inside it may
> re-enable hardware interrupts and fires the warning below.
>
> So port the patch which changes the locking here to atomic_cmpxchg().
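
A minimal sketch of the failure mode, for reference (the demo_* names are
made up for illustration, this is not the driver code): netpoll calls the
device poll routine with hard irqs disabled, so releasing a BH spinlock
from that context trips the WARN in local_bh_enable_ip():

    #include <linux/spinlock.h>
    #include <linux/irqflags.h>

    static DEFINE_SPINLOCK(demo_lock);

    /* roughly what happens on the netpoll -> ixgbe_poll path */
    static void demo_poll_under_netpoll(void)
    {
            unsigned long flags;

            local_irq_save(flags);        /* netpoll disables hard irqs */

            spin_lock_bh(&demo_lock);     /* fine: only disables BHs on top */
            /* ... clean the Tx/Rx rings ... */
            spin_unlock_bh(&demo_lock);   /* BUG: local_bh_enable() with irqs
                                           * disabled -> WARNING at
                                           * kernel/softirq.c, and softirq
                                           * processing may re-enable irqs */

            local_irq_restore(flags);
    }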
>
> https://jira.sw.ru/browse/PSBM-40330
>
> WARNING:
> Dec 1 18:03:10 p9 kernel: netpoll: netconsole: local port 6666
> Dec 1 18:03:10 p9 kernel: netpoll: netconsole: local IPv4 address 10.94.2.172
> Dec 1 18:03:10 p9 kernel: netpoll: netconsole: interface 'br0'
> Dec 1 18:03:10 p9 kernel: netpoll: netconsole: remote port 6666
> Dec 1 18:03:10 p9 kernel: netpoll: netconsole: remote IPv4 address 10.29.0.70
> Dec 1 18:03:10 p9 kernel: netpoll: netconsole: remote ethernet address 7c:95:f3:2e:05:64
> Dec 1 18:03:10 p9 kernel: ------------[ cut here ]------------
> Dec 1 18:03:10 p9 kernel: WARNING: at kernel/softirq.c:162 local_bh_enable_ip+0x7a/0xa0()
> Dec 1 18:03:10 p9 kernel: Modules linked in: netconsole(+) xt_CHECKSUM iptable_mangle iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT tun ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables 8021q garp mrp intel_powerclamp coretemp kvm_intel kvm snd_pcm snd_timer snd iTCO_wdt iTCO_vendor_support crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw gf128mul ioatdma glue_helper soundcore ablk_helper cryptd mei_me pcspkr ses sb_edac enclosure edac_core mei i2c_i801 ipmi_si shpchp lpc_ich mfd_core wmi ipmi_msghandler dm_mod ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_common mgag200 isci syscopyarea sysfillrect sysimgblt drm_kms_helper ttm libsas ixgbe drm igb scsi_transport_sas ahci libahci megaraid_sas libata mdio
> Dec 1 18:03:10 p9 kernel: ptp pps_core dca i2c_algo_bit i2c_core ip6_vzprivnet ip6_vznetstat pio_kaio pio_nfs pio_direct pfmt_raw pfmt_ploop1 ploop ip_vznetstat ip_vzprivnet vziolimit vzevent vzlist vzstat vznetstat vznetdev vzmon vzdev bridge stp llc
> Dec 1 18:03:10 p9 kernel: CPU: 31 PID: 3649 Comm: modprobe ve: 0 Not tainted 3.10.0-229.7.2.ovz.9.14-00004-gcc670bd-dirty #2 9.14
> Dec 1 18:03:10 p9 kernel: Hardware name: DEPO Computers X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.2 03/04/2015
> Dec 1 18:03:10 p9 kernel: 00000000000000a2 000000001e836364 ffff883fd0aeb908 ffffffff816222a3
> Dec 1 18:03:10 p9 kernel: ffff883fd0aeb940 ffffffff8107025b ffff883fc90d2040 ffff883fc90d2368
> Dec 1 18:03:10 p9 kernel: 0000000000000005 0000000000000001 00000000fffffe05 ffff883fd0aeb950
> Dec 1 18:03:10 p9 kernel: Call Trace:
> Dec 1 18:03:10 p9 kernel: [<ffffffff816222a3>] dump_stack+0x19/0x1b
> Dec 1 18:03:10 p9 kernel: [<ffffffff8107025b>] warn_slowpath_common+0x6b/0xa0
> Dec 1 18:03:10 p9 kernel: [<ffffffff8107039a>] warn_slowpath_null+0x1a/0x20
> Dec 1 18:03:10 p9 kernel: [<ffffffff81079bea>] local_bh_enable_ip+0x7a/0xa0
> Dec 1 18:03:10 p9 kernel: [<ffffffff816285cb>] _raw_spin_unlock_bh+0x1b/0x70
> Dec 1 18:03:10 p9 kernel: [<ffffffffa02e123d>] ixgbe_poll+0x54d/0x980 [ixgbe]
> Dec 1 18:03:10 p9 kernel: [<ffffffff8153d1b6>] netpoll_poll_dev+0x1b6/0xa80
> Dec 1 18:03:10 p9 kernel: [<ffffffffa02dcf4b>] ? ixgbe_select_queue+0xbb/0x120 [ixgbe]
> Dec 1 18:03:10 p9 kernel: [<ffffffff8153ddc0>] netpoll_send_skb_on_dev+0x340/0x480
> Dec 1 18:03:10 p9 kernel: [<ffffffff8119cf78>] ? __list_lru_walk_one.isra.3+0x148/0x170
> Dec 1 18:03:10 p9 kernel: [<ffffffffa00129d6>] __br_deliver+0x106/0x130 [bridge]
> Dec 1 18:03:10 p9 kernel: [<ffffffffa0012e83>] br_deliver+0x63/0x70 [bridge]
> Dec 1 18:03:10 p9 kernel: [<ffffffffa0010250>] br_dev_xmit+0x240/0x2a0 [bridge]
> Dec 1 18:03:10 p9 kernel: [<ffffffff8153dcf6>] netpoll_send_skb_on_dev+0x276/0x480
> Dec 1 18:03:10 p9 kernel: [<ffffffff8153e22b>] netpoll_send_udp+0x27b/0x390
> Dec 1 18:03:10 p9 kernel: [<ffffffffa05d383f>] write_msg+0xcf/0x140 [netconsole]
> Dec 1 18:03:10 p9 kernel: [<ffffffff81070f6f>] call_console_drivers.constprop.21+0x9f/0xf0
> Dec 1 18:03:10 p9 kernel: [<ffffffff810718b3>] console_unlock+0x293/0x410
> Dec 1 18:03:10 p9 kernel: [<ffffffff810726de>] register_console+0x14e/0x3e0
> Dec 1 18:03:10 p9 kernel: [<ffffffffa05d81a0>] init_netconsole+0x1a0/0x1000 [netconsole]
> Dec 1 18:03:10 p9 kernel: [<ffffffffa05d8000>] ? 0xffffffffa05d7fff
> Dec 1 18:03:10 p9 kernel: [<ffffffff810020e8>] do_one_initcall+0xb8/0x230
> Dec 1 18:03:10 p9 kernel: [<ffffffff810f3996>] load_module+0x1656/0x1cc0
> Dec 1 18:03:10 p9 kernel: [<ffffffff8130c7a0>] ? ddebug_proc_write+0xf0/0xf0
> Dec 1 18:03:10 p9 kernel: [<ffffffff810ef6aa>] ? copy_module_from_fd.isra.42+0x5a/0x150
> Dec 1 18:03:10 p9 kernel: [<ffffffff810f41de>] SyS_finit_module+0xbe/0xf0
> Dec 1 18:03:10 p9 kernel: [<ffffffff816318c9>] system_call_fastpath+0x16/0x1b
> Dec 1 18:03:10 p9 kernel: ---[ end trace 44dd3910a6b1ff80 ]---
> Dec 1 18:03:10 p9 kernel: Tainting kernel with flag 0x9
>
> Port the mainstream commit:
> commit adc810900a703ee78fe88fd65e086d359fec04b2
> Author: Alexander Duyck <alexander.h.duyck at intel.com>
> Date: Sat Jul 26 02:42:44 2014 +0000
>
> ixgbe: Refactor busy poll socket code to address multiple issues
>
> This change addresses several issues in the current ixgbe implementation of
> busy poll sockets.
>
> First was the fact that it was possible for frames to be delivered out of
> order if they were held in GRO. This is addressed by flushing the GRO buffers
> before releasing the q_vector back to the idle state.
>
> The other issue was the fact that we were having to take a spinlock on
> changing the state to and from idle. To resolve this I have replaced the
> state value with an atomic and use atomic_cmpxchg to change the value from
> idle, and a simple atomic set to restore it back to idle after we have
> acquired it. This allows us to only use a locked operation on acquiring the
> vector without a need for a locked operation to release it.
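
For reference, a simplified sketch of the scheme described above (the
demo_* names are illustrative; the real helpers are in ixgbe.h in the diff
below): ownership is taken with a single locked cmpxchg from the idle
state, and released with a plain atomic_set() after flushing GRO, so no
locked operation is needed on release:

    #include <linux/atomic.h>
    #include <linux/netdevice.h>

    enum { DEMO_IDLE, DEMO_NAPI, DEMO_POLL, DEMO_DISABLE };

    static inline bool demo_lock_napi(atomic_t *state)
    {
            /* succeeds only if nobody (poll/disable) owns the vector */
            return atomic_cmpxchg(state, DEMO_IDLE, DEMO_NAPI) == DEMO_IDLE;
    }

    static inline void demo_unlock_napi(struct napi_struct *napi,
                                        atomic_t *state)
    {
            /* flush GRO so held frames cannot be delivered out of order
             * after the vector goes back to idle */
            if (napi->gro_list)
                    napi_gro_flush(napi, false);

            /* we are the owner, so a plain store is enough to release */
            atomic_set(state, DEMO_IDLE);
    }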
>
> Signed-off-by: Alexander Duyck <alexander.h.duyck at intel.com>
> Tested-by: Phil Schmitt <phillip.j.schmitt at intel.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher at intel.com>
>
> Signed-off-by: Pavel Tikhomirov <ptikhomirov at virtuozzo.com>
> ---
> drivers/net/ethernet/intel/ixgbe/ixgbe.h | 112 ++++++++++-----------------
> drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c | 5 ++
> 2 files changed, 45 insertions(+), 72 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
> index 211c30e..00da363 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
> @@ -371,119 +371,87 @@ struct ixgbe_q_vector {
> char name[IFNAMSIZ + 9];
>
> #ifdef CONFIG_NET_RX_BUSY_POLL
> - unsigned int state;
> -#define IXGBE_QV_STATE_IDLE 0
> -#define IXGBE_QV_STATE_NAPI 1 /* NAPI owns this QV */
> -#define IXGBE_QV_STATE_POLL 2 /* poll owns this QV */
> -#define IXGBE_QV_STATE_DISABLED 4 /* QV is disabled */
> -#define IXGBE_QV_OWNED (IXGBE_QV_STATE_NAPI | IXGBE_QV_STATE_POLL)
> -#define IXGBE_QV_LOCKED (IXGBE_QV_OWNED | IXGBE_QV_STATE_DISABLED)
> -#define IXGBE_QV_STATE_NAPI_YIELD 8 /* NAPI yielded this QV */
> -#define IXGBE_QV_STATE_POLL_YIELD 16 /* poll yielded this QV */
> -#define IXGBE_QV_YIELD (IXGBE_QV_STATE_NAPI_YIELD | IXGBE_QV_STATE_POLL_YIELD)
> -#define IXGBE_QV_USER_PEND (IXGBE_QV_STATE_POLL | IXGBE_QV_STATE_POLL_YIELD)
> - spinlock_t lock;
> #endif /* CONFIG_NET_RX_BUSY_POLL */
> + atomic_t state;
>
> /* for dynamic allocation of rings associated with this q_vector */
> struct ixgbe_ring ring[0] ____cacheline_internodealigned_in_smp;
> };
> +
> #ifdef CONFIG_NET_RX_BUSY_POLL
> +enum ixgbe_qv_state_t {
> + IXGBE_QV_STATE_IDLE = 0,
> + IXGBE_QV_STATE_NAPI,
> + IXGBE_QV_STATE_POLL,
> + IXGBE_QV_STATE_DISABLE
> +};
> +
> static inline void ixgbe_qv_init_lock(struct ixgbe_q_vector *q_vector)
> {
> -
> - spin_lock_init(&q_vector->lock);
> - q_vector->state = IXGBE_QV_STATE_IDLE;
> + /* reset state to idle */
> + atomic_set(&q_vector->state, IXGBE_QV_STATE_IDLE);
> }
>
> /* called from the device poll routine to get ownership of a q_vector */
> static inline bool ixgbe_qv_lock_napi(struct ixgbe_q_vector *q_vector)
> {
> - int rc = true;
> - spin_lock_bh(&q_vector->lock);
> - if (q_vector->state & IXGBE_QV_LOCKED) {
> - WARN_ON(q_vector->state & IXGBE_QV_STATE_NAPI);
> - q_vector->state |= IXGBE_QV_STATE_NAPI_YIELD;
> - rc = false;
> + int rc = atomic_cmpxchg(&q_vector->state, IXGBE_QV_STATE_IDLE,
> + IXGBE_QV_STATE_NAPI);
> #ifdef LL_EXTENDED_STATS
> + if (rc != IXGBE_QV_STATE_IDLE)
> q_vector->tx.ring->stats.yields++;
> #endif
> - } else {
> - /* we don't care if someone yielded */
> - q_vector->state = IXGBE_QV_STATE_NAPI;
> - }
> - spin_unlock_bh(&q_vector->lock);
> - return rc;
> +
> + return rc == IXGBE_QV_STATE_IDLE;
> }
>
> /* returns true is someone tried to get the qv while napi had it */
> -static inline bool ixgbe_qv_unlock_napi(struct ixgbe_q_vector *q_vector)
> +static inline void ixgbe_qv_unlock_napi(struct ixgbe_q_vector *q_vector)
> {
> - int rc = false;
> - spin_lock_bh(&q_vector->lock);
> - WARN_ON(q_vector->state & (IXGBE_QV_STATE_POLL |
> - IXGBE_QV_STATE_NAPI_YIELD));
> -
> - if (q_vector->state & IXGBE_QV_STATE_POLL_YIELD)
> - rc = true;
> - /* will reset state to idle, unless QV is disabled */
> - q_vector->state &= IXGBE_QV_STATE_DISABLED;
> - spin_unlock_bh(&q_vector->lock);
> - return rc;
> + WARN_ON(atomic_read(&q_vector->state) != IXGBE_QV_STATE_NAPI);
> +
> + /* flush any outstanding Rx frames */
> + if (q_vector->napi.gro_list)
> + napi_gro_flush(&q_vector->napi, false);
> +
> + /* reset state to idle */
> + atomic_set(&q_vector->state, IXGBE_QV_STATE_IDLE);
> }
>
> /* called from ixgbe_low_latency_poll() */
> static inline bool ixgbe_qv_lock_poll(struct ixgbe_q_vector *q_vector)
> {
> - int rc = true;
> - spin_lock_bh(&q_vector->lock);
> - if ((q_vector->state & IXGBE_QV_LOCKED)) {
> - q_vector->state |= IXGBE_QV_STATE_POLL_YIELD;
> - rc = false;
> + int rc = atomic_cmpxchg(&q_vector->state, IXGBE_QV_STATE_IDLE,
> + IXGBE_QV_STATE_POLL);
> #ifdef LL_EXTENDED_STATS
> - q_vector->rx.ring->stats.yields++;
> + if (rc != IXGBE_QV_STATE_IDLE)
> + q_vector->tx.ring->stats.yields++;
> #endif
> - } else {
> - /* preserve yield marks */
> - q_vector->state |= IXGBE_QV_STATE_POLL;
> - }
> - spin_unlock_bh(&q_vector->lock);
> - return rc;
> + return rc == IXGBE_QV_STATE_IDLE;
> }
>
> /* returns true if someone tried to get the qv while it was locked */
> -static inline bool ixgbe_qv_unlock_poll(struct ixgbe_q_vector *q_vector)
> +static inline void ixgbe_qv_unlock_poll(struct ixgbe_q_vector *q_vector)
> {
> - int rc = false;
> - spin_lock_bh(&q_vector->lock);
> - WARN_ON(q_vector->state & (IXGBE_QV_STATE_NAPI));
> -
> - if (q_vector->state & IXGBE_QV_STATE_POLL_YIELD)
> - rc = true;
> - /* will reset state to idle, unless QV is disabled */
> - q_vector->state &= IXGBE_QV_STATE_DISABLED;
> - spin_unlock_bh(&q_vector->lock);
> - return rc;
> + WARN_ON(atomic_read(&q_vector->state) != IXGBE_QV_STATE_POLL);
> +
> + /* reset state to idle */
> + atomic_set(&q_vector->state, IXGBE_QV_STATE_IDLE);
> }
>
> /* true if a socket is polling, even if it did not get the lock */
> static inline bool ixgbe_qv_ll_polling(struct ixgbe_q_vector *q_vector)
> {
> - WARN_ON(!(q_vector->state & IXGBE_QV_OWNED));
> - return q_vector->state & IXGBE_QV_USER_PEND;
> + return atomic_read(&q_vector->state) == IXGBE_QV_STATE_POLL;
> }
>
> /* false if QV is currently owned */
> static inline bool ixgbe_qv_disable(struct ixgbe_q_vector *q_vector)
> {
> - int rc = true;
> - spin_lock_bh(&q_vector->lock);
> - if (q_vector->state & IXGBE_QV_OWNED)
> - rc = false;
> - q_vector->state |= IXGBE_QV_STATE_DISABLED;
> - spin_unlock_bh(&q_vector->lock);
> -
> - return rc;
> + int rc = atomic_cmpxchg(&q_vector->state, IXGBE_QV_STATE_IDLE,
> + IXGBE_QV_STATE_DISABLE);
> +
> + return rc == IXGBE_QV_STATE_IDLE;
> }
>
> #else /* CONFIG_NET_RX_BUSY_POLL */
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
> index dcd2d1ec..b42e44d 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
> @@ -808,6 +808,11 @@ static int ixgbe_alloc_q_vector(struct ixgbe_adapter *adapter,
> ixgbe_poll, 64);
> napi_hash_add(&q_vector->napi);
>
> +#ifdef CONFIG_NET_RX_BUSY_POLL
> + /* initialize busy poll */
> + atomic_set(&q_vector->state, IXGBE_QV_STATE_DISABLE);
> +
> +#endif
> /* tie q_vector and adapter together */
> adapter->q_vector[v_idx] = q_vector;
> q_vector->adapter = adapter;
> --
> 1.9.3
>