[Devel] [PATCH RFC rh7] vznetstat: Move the code to drop redundant skb marks to *_xmit() functions #PSBM-122082

Konstantin Khorenko khorenko at virtuozzo.com
Tue Nov 17 14:54:14 MSK 2020


On 11/17/2020 01:07 PM, Konstantin Khorenko wrote:
> On 11/17/2020 08:55 AM, Vasily Averin wrote:
>> See comments below
>> On 11/16/20 10:40 PM, Konstantin Khorenko wrote:
>>> We have added code to clear vz specific skb marks, but it's done in the
>>> hook NF_INET_LOCAL_IN (which is not the very first one) and if someone
>>> sets a mark with xt_mark in PREROUTING chain, the mark will be set in
>>> NF_INET_PRE_ROUTING hook (which is prior to NF_INET_LOCAL_IN) and
>>> dropped by our hook venet_acct_in_hook() in later NF_INET_LOCAL_IN hook.
>>>
>>
>>> diff --git a/drivers/net/venetdev.c b/drivers/net/venetdev.c
>>> index 28d06bca1f86..5385e3e7f30f 100644
>>> --- a/drivers/net/venetdev.c
>>> +++ b/drivers/net/venetdev.c
>>> @@ -464,6 +464,12 @@ static int venet_xmit(struct sk_buff *skb, struct net_device *dev)
>>>
>>>  	stats = venet_stats(dev, smp_processor_id());
>>>  	ve = dev_net(dev)->owner_ve;
>>> +	/*
>>> +	 * If the packet goes in VE0 -> VE direction, drop the skb mark used
>>> +	 * for vz traffic shaping.
>>
>> Please make comment more clear.
>> Why we should drop vz traffic shaping mark here?
>> You know that it is used for outgoing traffic only, but reader does not.
>
> Ok, will do in the next version.
>
>>
>>> diff --git a/kernel/ve/vznetstat/ip_vznetstat.c b/kernel/ve/vznetstat/ip_vznetstat.c
>>> index 5ea978d6dd88..a33c41c05dba 100644
>>> --- a/kernel/ve/vznetstat/ip_vznetstat.c
>>> +++ b/kernel/ve/vznetstat/ip_vznetstat.c
>>> @@ -92,6 +103,15 @@ static unsigned int venet_acct_out_hook(const struct nf_hook_ops *hook,
>>>  	 */
>>>  	if (dst && (dst->dev->flags & IFF_LOOPBACK))
>>>  		goto out;
>>> +	/*
>>> +	 * Don't account and mark VE traffic if it does not cross VE/VE0
>>> +	 * boundary.
>>> +	 * For VE0 there can be extra accounting in case of traffic
>>> +	 * between Docker Containers in VE0.
>>> +	 */
>>> +	if (!(ve_is_super(out->nd_net->owner_ve)) &&
>>> +	    dst && !(dst->dev->features & NETIF_F_VENET))
>>> +		goto out;
>>
>> Are you sure that we can use dst for correct "outgoing traffic"-check here?
>> Can be traffic re-routed after this chain?
>>
>> I mean the following scenario:
>> dst can  be "internal" first but the skb is rerouted to outside
>> or vice versa:
>> when "external" traffic is re-routed to local netdevices?
>
> 1) i do not change the place for accounting and setting mark by this patch,
>     so i definitely don't make it worse.
>
> 2) i do see packets in venet_acct_out_hook() with "venet0" "out" net device and
>     "lo" net device in skb->dst, so at least _some_ rerouting to local netdevice
>     is already done up to the moment.
>
> So the only question left - can dst change after NF_INET_LOCAL_OUT netfilter hook?
> Cannot find the answer up to the moment, will dig further.

Well, the code is ... not trivial.
There are places (most of cases) where dst is changed in NF_INET_LOCAL_OUT hook,
but our venet hook goes with NF_IP_PRI_LAST priority and no other hook uses that priority.
So this is fine.

Nevertheless there are cases when dst is changed after this case, for example in IPSec:

iptable_nat_ipv4_out()
  nf_nat_ipv4_out()
   nf_xfrm_me_harder()
{
...
         dst = xfrm_lookup(net, dst, &fl, sk, 0);
         if (IS_ERR(dst))
                 return PTR_ERR(dst);

         skb_dst_drop(skb);
         skb_dst_set(skb, dst);
...
}
         {
                 .hook           = iptable_nat_ipv4_out,
                 .owner          = THIS_MODULE,
                 .pf             = NFPROTO_IPV4,
                 .hooknum        = NF_INET_POST_ROUTING,
                 .priority       = NF_IP_PRI_NAT_SRC,
         },

And dst is changed in some exotic cases even in some *_xmit() functions.
Well, may be for the sake of fairness we better move the vznetstat accounting
(and setting mark) code to venet_xmit()/veth_xmit() functions as well,
but honestly i do not make such big changes in vz7.
We live with this kind of accounting for a long time and it works well for most usecases.

So i'd leave setting mark on the same place along with accounting.


> + i've just found that i forgot to handle ipv6 venet hooks in the similar manner.


More information about the Devel mailing list