[Devel] [PATCH 2/7] tcp: Charge socket buffers into cg memory (v2)

Thu Jun 4 03:46:26 PDT 2015

On Thu, Jun 04, 2015 at 01:07:08PM +0300, Pavel Emelyanov wrote:
> >>>> +void memcg_charge_kmem_nofail(struct mem_cgroup *memcg, u64 size)
> >>>>  {
> >>>> +	struct res_counter *fail_res;
> >>>> +
> >>>> +	/*
> >>>> +	 * FIXME -- strictly speaking, this value should _also_
> >>>> +	 * be charged into kmem counter. But since res_counter_charge
> >>>> +	 * is sub-optimal (takes locks) AND we do not care much
> >>>> +	 * about kmem limits (at least for now) we can just directly
> >>>> +	 * charge into mem counter.
> >>>> +	 */
> >>>
> >>> Please charge kmem too. As I've already told you it should not make any
> >>> difference in terms of performance, because we already have a bottleneck
> >>> of the same bandwidth.
> >>>
> >>> Anyway, if we see any performance degradation, I will convert
> >>> mem_cgroup->kmem to a percpu counter.
> >>
> >> No, let's do it vice-versa -- first you fix the locking, then I update this code.
> > 
> > I don't understand why, because you provide no arguments and keep
> > ignoring my reasoning why I think charging kmem along with res is OK,
> > which is one paragraph above.
> 
> The bandwidth of the bottleneck doesn't look to be the same -- the res counters
> in question are not in one cache-line and adding one more

> (btw, do we have swap account turned on by default?)

Yes it is. We'll have to switch to percpu stocks here.

> will not come unnoticed.

Fair enough. But there are already 3 calls to res_counter_charge in a
row, which is terrible and must be reworked. My point is that this vague
performance implications are not worth complicating the code that badly
w/o any hope to recover performance back to the level w/o using cgroups.
Performance must be up to a separate task.

> 
> Yet again -- I don't mind changing this and charge TCP into kmem too, I'll do
> it, but after this charging becomes fast enough.

Once again, performance improvements are not for Beta1. I'll file a
separate task for Beta2 for switching from atomics/spinlocks to percpu
counters wherever possible.