[Devel] Re: [PATCHSET] 2.6.20-lxc8

Kirill Korotaev dev at openvz.org
Fri Mar 23 02:35:01 PDT 2007


Eric W. Biederman wrote:
> Benjamin Thery <benjamin.thery at bull.net> writes:
> 
> 
>>My investigations on the increase of cpu load when running netperf inside a
>>container (ie. through etun2<->etun1) is progressing slowly.
>>
>>I'm not sure the cause is fragmentation as we supposed initially.
>>In fact, it seems related to forwarding the packets between the devices.
>>
>>Here is what I've tracked so far:
>>* when we run netperf from the container, oprofile reports that the top
>>"consuming" symbol is: "pskb_expand_head". Next comes
>>"csum_partial_copy_generic". these symbols represents respectively 13.5% and
>>9.7% of the samples.
>>* Without container, these symbols don't show up in the first 20 entries.
>>
>>Who is calling "pskb_expand_head" in this case?
>>
>>Using systemtap, I determined that the call to "pskb_expand_head" comes from the
>>skb_cow() in ip_forward() (l.90 in 2.6.20-rc5-netns).
>>
>>The number of calls to "pskb_expand_head" matches the number of invocations of
>>ip_forward() (268000 calls for a 20 seconds netperf session in my case).
> 
> 
> Ok.  This seems to make sense, and is related to how we have configured the
> network in this case.
> 
> It looks like pskb_expand_head is called by skb_cow.
> 
> skb_cow has two cases when it calls pskb_expand_head.
> - When there are multiple people who have a copy of the packet
>   (tcpdump and friends)
> - When there isn't enough room for the hard header.
> 
> Any chance one of you guys looking into this can instrument up
> ip_foward just before the call to skb_cow and find out which
> reason it is?
> 
> A cheap trick to make the overhead go away is probably to setup
> ethernet bridging in this case...
> 
> But if we can ensure the ip_foward case does not need to do anything
> more than modify the ttl and update the destination that would
> be good to.
> 
> Anyway this does look very solvable.

we have the hack below in ip_forward() to avoid skb_cow(),
Banjamin, can you check whether it helps in your case please?
(NOTE: you will need to replace check for NETIF_F_VENET with something else
 or introduce the same flag on etun device).

diff -upr linux-2.6.18-rhel5.orig/net/ipv4/ip_forward.c linux-2.6.18-rhel5-028stab023/net/ipv4/ip_forward.c
--- linux-2.6.18-rhel5.orig/net/ipv4/ip_forward.c       2006-09-20 07:42:06.000000000 +0400
+++ linux-2.6.18-rhel5-028stab023/net/ipv4/ip_forward.c 2007-03-20 17:22:45.000000000 +0300
@@ -86,6 +86,24 @@ int ip_forward(struct sk_buff *skb)
        if (opt->is_strictroute && rt->rt_dst != rt->rt_gateway)
                goto sr_failed;

+       /*
+        * We try to optimize forwarding of VE packets:
+        * do not decrement TTL (and so save skb_cow)
+        * during forwarding of outgoing pkts from VE.
+        * For incoming pkts we still do ttl decr,
+        * since such skb is not cloned and does not require
+        * actual cow. So, there is at least one place
+        * in pkts path with mandatory ttl decr, that is
+        * sufficient to prevent routing loops.
+        */
+       iph = skb->nh.iph;
+       if (
+#ifdef CONFIG_IP_ROUTE_NAT
+           (rt->rt_flags & RTCF_NAT) == 0 &&     /* no NAT mangling expected */
+#endif                                           /* and */
+           (skb->dev->features & NETIF_F_VENET)) /* src is VENET device */
+               goto no_ttl_decr;
+
        /* We are about to mangle packet. Copy it! */
        if (skb_cow(skb, LL_RESERVED_SPACE(rt->u.dst.dev)+rt->u.dst.header_len))
                goto drop;
@@ -94,6 +112,8 @@ int ip_forward(struct sk_buff *skb)
        /* Decrease ttl after skb cow done */
        ip_decrease_ttl(iph);

+no_ttl_decr:
+
        /*
         *      We now generate an ICMP HOST REDIRECT giving the route
         *      we calculated.
@@ -121,3 +141,5 @@ drop:
_______________________________________________
Containers mailing list
Containers at lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers




More information about the Devel mailing list