[Devel] [PATCH RH7 1/2] ixgbe: Refactor busy poll socket code to address multiple issues

Andrew Vagin avagin at virtuozzo.com
Thu Dec 10 22:45:31 PST 2015


Acked-by: Andrew Vagin <avagin at virtuozzo.com>

On Fri, Dec 04, 2015 at 05:58:20PM +0300, Pavel Tikhomirov wrote:
> In netpoll_send_skb_on_dev we run with hardware irqs disabled (the
> warning there does not trigger). And in ixgbe_poll->ixgbe_qv_unlock_napi,
> called with irqs disabled, we use spin_unlock_bh; that is a bug, because
> it contains a local_irq_enable, which will re-enable hardware interrupts.
> 
> So port the upstream patch which changes the locking here to atomic_cmpxchg.
> 
> https://jira.sw.ru/browse/PSBM-40330
> 
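The call chain that fires the warning below, paraphrased from the
3.10-era sources (a rough sketch for orientation, not verbatim kernel
code):

	spin_unlock_bh()
	  -> _raw_spin_unlock_bh()
	    -> local_bh_enable_ip()       /* WARN_ON_ONCE(irqs_disabled()) */
	      -> do_softirq()             /* if softirqs are pending */
	        -> __do_softirq()
	          -> local_irq_enable()   /* hardware irqs back on */

So beyond the WARN itself, any pending softirq makes spin_unlock_bh
re-enable hardware interrupts under a caller that needs them kept off.
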
> WARNING:
> Dec  1 18:03:10 p9 kernel: netpoll: netconsole: local port 6666
> Dec  1 18:03:10 p9 kernel: netpoll: netconsole: local IPv4 address 10.94.2.172
> Dec  1 18:03:10 p9 kernel: netpoll: netconsole: interface 'br0'
> Dec  1 18:03:10 p9 kernel: netpoll: netconsole: remote port 6666
> Dec  1 18:03:10 p9 kernel: netpoll: netconsole: remote IPv4 address 10.29.0.70
> Dec  1 18:03:10 p9 kernel: netpoll: netconsole: remote ethernet address 7c:95:f3:2e:05:64
> Dec  1 18:03:10 p9 kernel: ------------[ cut here ]------------
> Dec  1 18:03:10 p9 kernel: WARNING: at kernel/softirq.c:162 local_bh_enable_ip+0x7a/0xa0()
> Dec  1 18:03:10 p9 kernel: Modules linked in: netconsole(+) xt_CHECKSUM iptable_mangle iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT tun ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables 8021q garp mrp intel_powerclamp coretemp kvm_intel kvm snd_pcm snd_timer snd iTCO_wdt iTCO_vendor_support crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw gf128mul ioatdma glue_helper soundcore ablk_helper cryptd mei_me pcspkr ses sb_edac enclosure edac_core mei i2c_i801 ipmi_si shpchp lpc_ich mfd_core wmi ipmi_msghandler dm_mod ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_common mgag200 isci syscopyarea sysfillrect sysimgblt drm_kms_helper ttm libsas ixgbe drm igb scsi_transport_sas ahci libahci megaraid_sas libata mdio
> Dec  1 18:03:10 p9 kernel: ptp pps_core dca i2c_algo_bit i2c_core ip6_vzprivnet ip6_vznetstat pio_kaio pio_nfs pio_direct pfmt_raw pfmt_ploop1 ploop ip_vznetstat ip_vzprivnet vziolimit vzevent vzlist vzstat vznetstat vznetdev vzmon vzdev bridge stp llc
> Dec  1 18:03:10 p9 kernel: CPU: 31 PID: 3649 Comm: modprobe ve: 0 Not tainted 3.10.0-229.7.2.ovz.9.14-00004-gcc670bd-dirty #2 9.14
> Dec  1 18:03:10 p9 kernel: Hardware name: DEPO Computers X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.2 03/04/2015
> Dec  1 18:03:10 p9 kernel: 00000000000000a2 000000001e836364 ffff883fd0aeb908 ffffffff816222a3
> Dec  1 18:03:10 p9 kernel: ffff883fd0aeb940 ffffffff8107025b ffff883fc90d2040 ffff883fc90d2368
> Dec  1 18:03:10 p9 kernel: 0000000000000005 0000000000000001 00000000fffffe05 ffff883fd0aeb950
> Dec  1 18:03:10 p9 kernel: Call Trace:
> Dec  1 18:03:10 p9 kernel: [<ffffffff816222a3>] dump_stack+0x19/0x1b
> Dec  1 18:03:10 p9 kernel: [<ffffffff8107025b>] warn_slowpath_common+0x6b/0xa0
> Dec  1 18:03:10 p9 kernel: [<ffffffff8107039a>] warn_slowpath_null+0x1a/0x20
> Dec  1 18:03:10 p9 kernel: [<ffffffff81079bea>] local_bh_enable_ip+0x7a/0xa0
> Dec  1 18:03:10 p9 kernel: [<ffffffff816285cb>] _raw_spin_unlock_bh+0x1b/0x70
> Dec  1 18:03:10 p9 kernel: [<ffffffffa02e123d>] ixgbe_poll+0x54d/0x980 [ixgbe]
> Dec  1 18:03:10 p9 kernel: [<ffffffff8153d1b6>] netpoll_poll_dev+0x1b6/0xa80
> Dec  1 18:03:10 p9 kernel: [<ffffffffa02dcf4b>] ? ixgbe_select_queue+0xbb/0x120 [ixgbe]
> Dec  1 18:03:10 p9 kernel: [<ffffffff8153ddc0>] netpoll_send_skb_on_dev+0x340/0x480
> Dec  1 18:03:10 p9 kernel: [<ffffffff8119cf78>] ? __list_lru_walk_one.isra.3+0x148/0x170
> Dec  1 18:03:10 p9 kernel: [<ffffffffa00129d6>] __br_deliver+0x106/0x130 [bridge]
> Dec  1 18:03:10 p9 kernel: [<ffffffffa0012e83>] br_deliver+0x63/0x70 [bridge]
> Dec  1 18:03:10 p9 kernel: [<ffffffffa0010250>] br_dev_xmit+0x240/0x2a0 [bridge]
> Dec  1 18:03:10 p9 kernel: [<ffffffff8153dcf6>] netpoll_send_skb_on_dev+0x276/0x480
> Dec  1 18:03:10 p9 kernel: [<ffffffff8153e22b>] netpoll_send_udp+0x27b/0x390
> Dec  1 18:03:10 p9 kernel: [<ffffffffa05d383f>] write_msg+0xcf/0x140 [netconsole]
> Dec  1 18:03:10 p9 kernel: [<ffffffff81070f6f>] call_console_drivers.constprop.21+0x9f/0xf0
> Dec  1 18:03:10 p9 kernel: [<ffffffff810718b3>] console_unlock+0x293/0x410
> Dec  1 18:03:10 p9 kernel: [<ffffffff810726de>] register_console+0x14e/0x3e0
> Dec  1 18:03:10 p9 kernel: [<ffffffffa05d81a0>] init_netconsole+0x1a0/0x1000 [netconsole]
> Dec  1 18:03:10 p9 kernel: [<ffffffffa05d8000>] ? 0xffffffffa05d7fff
> Dec  1 18:03:10 p9 kernel: [<ffffffff810020e8>] do_one_initcall+0xb8/0x230
> Dec  1 18:03:10 p9 kernel: [<ffffffff810f3996>] load_module+0x1656/0x1cc0
> Dec  1 18:03:10 p9 kernel: [<ffffffff8130c7a0>] ? ddebug_proc_write+0xf0/0xf0
> Dec  1 18:03:10 p9 kernel: [<ffffffff810ef6aa>] ? copy_module_from_fd.isra.42+0x5a/0x150
> Dec  1 18:03:10 p9 kernel: [<ffffffff810f41de>] SyS_finit_module+0xbe/0xf0
> Dec  1 18:03:10 p9 kernel: [<ffffffff816318c9>] system_call_fastpath+0x16/0x1b
> Dec  1 18:03:10 p9 kernel: ---[ end trace 44dd3910a6b1ff80 ]---
> Dec  1 18:03:10 p9 kernel: Tainting kernel with flag 0x9
> 
> Port the mainstream ("ms") commit:
>   commit adc810900a703ee78fe88fd65e086d359fec04b2
>   Author: Alexander Duyck <alexander.h.duyck at intel.com>
>   Date:   Sat Jul 26 02:42:44 2014 +0000
> 
>     This change addresses several issues in the current ixgbe implementation of
>     busy poll sockets.
> 
>     First was the fact that it was possible for frames to be delivered out of
>     order if they were held in GRO.  This is addressed by flushing the GRO buffers
>     before releasing the q_vector back to the idle state.
> 
>     The other issue was the fact that we were having to take a spinlock on
>     changing the state to and from idle.  To resolve this I have replaced the
>     state value with an atomic and use atomic_cmpxchg to change the value from
>     idle, and a simple atomic set to restore it back to idle after we have
>     acquired it.  This allows us to only use a locked operation on acquiring the
>     vector without a need for a locked operation to release it.
> 
>     Signed-off-by: Alexander Duyck <alexander.h.duyck at intel.com>
>     Tested-by: Phil Schmitt <phillip.j.schmitt at intel.com>
>     Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher at intel.com>
> 
> Signed-off-by: Pavel Tikhomirov <ptikhomirov at virtuozzo.com>
> ---
>  drivers/net/ethernet/intel/ixgbe/ixgbe.h     | 112 ++++++++++-----------------
>  drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c |   5 ++
>  2 files changed, 45 insertions(+), 72 deletions(-)
> 
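Two things worth spelling out about the ported scheme before the diff:
ixgbe_qv_unlock_napi now flushes GRO before dropping ownership (so
frames held in GRO cannot be delivered out of order by a subsequent
busy poller), and ownership needs a locked operation only on acquire.
A minimal user-space analogue of the acquire/release pattern, using
C11 atomics (the qv_* names are mine, purely illustrative; build with
e.g. "gcc -std=c11 qv_demo.c"):

	#include <stdatomic.h>
	#include <stdbool.h>
	#include <stdio.h>

	enum { QV_IDLE, QV_NAPI, QV_POLL, QV_DISABLE };

	static atomic_int qv_state = QV_IDLE;

	/* acquire: one locked cmpxchg; fails if anyone else owns the qv */
	static bool qv_lock(int owner)
	{
		int expected = QV_IDLE;
		return atomic_compare_exchange_strong(&qv_state, &expected,
						      owner);
	}

	/* release: a plain atomic store, no locked operation required */
	static void qv_unlock(void)
	{
		atomic_store(&qv_state, QV_IDLE);
	}

	int main(void)
	{
		if (qv_lock(QV_NAPI)) {	/* napi path wins the vector */
			/* busy-poll path now simply fails, no spinning */
			printf("poll got it: %d\n", qv_lock(QV_POLL)); /* 0 */
			qv_unlock();	/* the driver flushes GRO first */
		}
		printf("final state: %d\n", atomic_load(&qv_state)); /* 0 */
		return 0;
	}

The plain store on release is safe because only the current owner can
be inside the critical section; there is nothing to hand off, unlike
the old yield-flag bookkeeping.
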
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
> index 211c30e..00da363 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
> @@ -371,119 +371,87 @@ struct ixgbe_q_vector {
>  	char name[IFNAMSIZ + 9];
>  
>  #ifdef CONFIG_NET_RX_BUSY_POLL
> -	unsigned int state;
> -#define IXGBE_QV_STATE_IDLE        0
> -#define IXGBE_QV_STATE_NAPI	   1     /* NAPI owns this QV */
> -#define IXGBE_QV_STATE_POLL	   2     /* poll owns this QV */
> -#define IXGBE_QV_STATE_DISABLED	   4     /* QV is disabled */
> -#define IXGBE_QV_OWNED (IXGBE_QV_STATE_NAPI | IXGBE_QV_STATE_POLL)
> -#define IXGBE_QV_LOCKED (IXGBE_QV_OWNED | IXGBE_QV_STATE_DISABLED)
> -#define IXGBE_QV_STATE_NAPI_YIELD  8     /* NAPI yielded this QV */
> -#define IXGBE_QV_STATE_POLL_YIELD  16    /* poll yielded this QV */
> -#define IXGBE_QV_YIELD (IXGBE_QV_STATE_NAPI_YIELD | IXGBE_QV_STATE_POLL_YIELD)
> -#define IXGBE_QV_USER_PEND (IXGBE_QV_STATE_POLL | IXGBE_QV_STATE_POLL_YIELD)
> -	spinlock_t lock;
>  #endif  /* CONFIG_NET_RX_BUSY_POLL */
> +	atomic_t state;
>  
>  	/* for dynamic allocation of rings associated with this q_vector */
>  	struct ixgbe_ring ring[0] ____cacheline_internodealigned_in_smp;
>  };
> +
>  #ifdef CONFIG_NET_RX_BUSY_POLL
> +enum ixgbe_qv_state_t {
> +	IXGBE_QV_STATE_IDLE = 0,
> +	IXGBE_QV_STATE_NAPI,
> +	IXGBE_QV_STATE_POLL,
> +	IXGBE_QV_STATE_DISABLE
> +};
> +
>  static inline void ixgbe_qv_init_lock(struct ixgbe_q_vector *q_vector)
>  {
> -
> -	spin_lock_init(&q_vector->lock);
> -	q_vector->state = IXGBE_QV_STATE_IDLE;
> +	/* reset state to idle */
> +	atomic_set(&q_vector->state, IXGBE_QV_STATE_IDLE);
>  }
>  
>  /* called from the device poll routine to get ownership of a q_vector */
>  static inline bool ixgbe_qv_lock_napi(struct ixgbe_q_vector *q_vector)
>  {
> -	int rc = true;
> -	spin_lock_bh(&q_vector->lock);
> -	if (q_vector->state & IXGBE_QV_LOCKED) {
> -		WARN_ON(q_vector->state & IXGBE_QV_STATE_NAPI);
> -		q_vector->state |= IXGBE_QV_STATE_NAPI_YIELD;
> -		rc = false;
> +	int rc = atomic_cmpxchg(&q_vector->state, IXGBE_QV_STATE_IDLE,
> +				IXGBE_QV_STATE_NAPI);
>  #ifdef LL_EXTENDED_STATS
> +	if (rc != IXGBE_QV_STATE_IDLE)
>  		q_vector->tx.ring->stats.yields++;
>  #endif
> -	} else {
> -		/* we don't care if someone yielded */
> -		q_vector->state = IXGBE_QV_STATE_NAPI;
> -	}
> -	spin_unlock_bh(&q_vector->lock);
> -	return rc;
> +
> +	return rc == IXGBE_QV_STATE_IDLE;
>  }
>  
>  /* returns true if someone tried to get the qv while napi had it */
> -static inline bool ixgbe_qv_unlock_napi(struct ixgbe_q_vector *q_vector)
> +static inline void ixgbe_qv_unlock_napi(struct ixgbe_q_vector *q_vector)
>  {
> -	int rc = false;
> -	spin_lock_bh(&q_vector->lock);
> -	WARN_ON(q_vector->state & (IXGBE_QV_STATE_POLL |
> -			       IXGBE_QV_STATE_NAPI_YIELD));
> -
> -	if (q_vector->state & IXGBE_QV_STATE_POLL_YIELD)
> -		rc = true;
> -	/* will reset state to idle, unless QV is disabled */
> -	q_vector->state &= IXGBE_QV_STATE_DISABLED;
> -	spin_unlock_bh(&q_vector->lock);
> -	return rc;
> +	WARN_ON(atomic_read(&q_vector->state) != IXGBE_QV_STATE_NAPI);
> +
> +	/* flush any outstanding Rx frames */
> +	if (q_vector->napi.gro_list)
> +		napi_gro_flush(&q_vector->napi, false);
> +
> +	/* reset state to idle */
> +	atomic_set(&q_vector->state, IXGBE_QV_STATE_IDLE);
>  }
>  
>  /* called from ixgbe_low_latency_poll() */
>  static inline bool ixgbe_qv_lock_poll(struct ixgbe_q_vector *q_vector)
>  {
> -	int rc = true;
> -	spin_lock_bh(&q_vector->lock);
> -	if ((q_vector->state & IXGBE_QV_LOCKED)) {
> -		q_vector->state |= IXGBE_QV_STATE_POLL_YIELD;
> -		rc = false;
> +	int rc = atomic_cmpxchg(&q_vector->state, IXGBE_QV_STATE_IDLE,
> +				IXGBE_QV_STATE_POLL);
>  #ifdef LL_EXTENDED_STATS
> -		q_vector->rx.ring->stats.yields++;
> +	if (rc != IXGBE_QV_STATE_IDLE)
> +		q_vector->rx.ring->stats.yields++;
>  #endif
> -	} else {
> -		/* preserve yield marks */
> -		q_vector->state |= IXGBE_QV_STATE_POLL;
> -	}
> -	spin_unlock_bh(&q_vector->lock);
> -	return rc;
> +	return rc == IXGBE_QV_STATE_IDLE;
>  }
>  
>  /* returns true if someone tried to get the qv while it was locked */
> -static inline bool ixgbe_qv_unlock_poll(struct ixgbe_q_vector *q_vector)
> +static inline void ixgbe_qv_unlock_poll(struct ixgbe_q_vector *q_vector)
>  {
> -	int rc = false;
> -	spin_lock_bh(&q_vector->lock);
> -	WARN_ON(q_vector->state & (IXGBE_QV_STATE_NAPI));
> -
> -	if (q_vector->state & IXGBE_QV_STATE_POLL_YIELD)
> -		rc = true;
> -	/* will reset state to idle, unless QV is disabled */
> -	q_vector->state &= IXGBE_QV_STATE_DISABLED;
> -	spin_unlock_bh(&q_vector->lock);
> -	return rc;
> +	WARN_ON(atomic_read(&q_vector->state) != IXGBE_QV_STATE_POLL);
> +
> +	/* reset state to idle */
> +	atomic_set(&q_vector->state, IXGBE_QV_STATE_IDLE);
>  }
>  
>  /* true if a socket is polling, even if it did not get the lock */
>  static inline bool ixgbe_qv_ll_polling(struct ixgbe_q_vector *q_vector)
>  {
> -	WARN_ON(!(q_vector->state & IXGBE_QV_OWNED));
> -	return q_vector->state & IXGBE_QV_USER_PEND;
> +	return atomic_read(&q_vector->state) == IXGBE_QV_STATE_POLL;
>  }
>  
>  /* false if QV is currently owned */
>  static inline bool ixgbe_qv_disable(struct ixgbe_q_vector *q_vector)
>  {
> -	int rc = true;
> -	spin_lock_bh(&q_vector->lock);
> -	if (q_vector->state & IXGBE_QV_OWNED)
> -		rc = false;
> -	q_vector->state |= IXGBE_QV_STATE_DISABLED;
> -	spin_unlock_bh(&q_vector->lock);
> -
> -	return rc;
> +	int rc = atomic_cmpxchg(&q_vector->state, IXGBE_QV_STATE_IDLE,
> +				IXGBE_QV_STATE_DISABLE);
> +
> +	return rc == IXGBE_QV_STATE_IDLE;
>  }
>  
>  #else /* CONFIG_NET_RX_BUSY_POLL */
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
> index dcd2d1ec..b42e44d 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
> @@ -808,6 +808,11 @@ static int ixgbe_alloc_q_vector(struct ixgbe_adapter *adapter,
>  		       ixgbe_poll, 64);
>  	napi_hash_add(&q_vector->napi);
>  
> +#ifdef CONFIG_NET_RX_BUSY_POLL
> +	/* initialize busy poll */
> +	atomic_set(&q_vector->state, IXGBE_QV_STATE_DISABLE);
> +
> +#endif
>  	/* tie q_vector and adapter together */
>  	adapter->q_vector[v_idx] = q_vector;
>  	q_vector->adapter = adapter;
> -- 
> 1.9.3
> 
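A note on the second hunk: after this patch a q_vector is created in
the DISABLE state and only becomes usable once ixgbe_qv_init_lock()
resets it to IDLE (upstream calls that from ixgbe_napi_enable_all() on
ifup; only the alloc-time atomic_set is visible in this diff). The
intended lifecycle, sketched with the names from the patch:

	alloc:   atomic_set(&qv->state, IXGBE_QV_STATE_DISABLE);
	ifup:    ixgbe_qv_init_lock(qv);    /* -> IDLE */
	run:     ixgbe_qv_lock_napi(qv);    /* IDLE -> NAPI, one cmpxchg */
	         ixgbe_qv_unlock_napi(qv);  /* flush GRO, store IDLE */
	    or:  ixgbe_qv_lock_poll(qv);    /* IDLE -> POLL, one cmpxchg */
	         ixgbe_qv_unlock_poll(qv);  /* store IDLE */
	ifdown:  ixgbe_qv_disable(qv);      /* IDLE -> DISABLE, fails if owned */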

