[Devel] [PATCH 7/7] packet: Pre-account maximum socket buffer into cg memory

Pavel Emelyanov xemul at parallels.com
Mon Jun 1 09:14:24 PDT 2015


On 06/01/2015 07:01 PM, Vladimir Davydov wrote:
> On Fri, May 29, 2015 at 05:19:15PM +0300, Pavel Emelyanov wrote:
>> Packet sockets have incoming queue of packets that is only limited
>> with per-socket wmem buffer. Strictly speaking we should sum up
>> all the queues and charge them into kmem once new packet arrives,
>> but this will result in huge patch. Since there's typically quite
>> a few of packet sockets in container (tcpdump) we can just forward
>> charge the maximum socket rmem size into cg memory upon socket
>> creation.
> 
> rmem_max ~ 200K
> 
> Small? If you say so, OK.

Yes, 200K is quite small amount. In this context.

>> @@ -554,6 +554,20 @@ void sock_update_memcg(struct sock *sk)
>>  }
>>  EXPORT_SYMBOL(sock_update_memcg);
>>  
>> +struct mem_cgroup *sock_get_current_memcg(void)
> 
> I don't like this funciton, because it has nothing to do with sockets.
> May be, you'd better use try_get_mem_cgroup_from_mm?

>From what mm? From current->mm? I tried to find for try_get_mm_cgroup_from_task
but failed, this is the one I need :D

>> +{
>> +	struct mem_cgroup *cg;
>> +
>> +	rcu_read_lock();
>> +	cg = mem_cgroup_from_task(current);
>> +	if (mem_cgroup_is_root(cg) || !css_tryget(&cg->css))
>> +		cg = NULL;
>> +	rcu_read_unlock();
>> +
>> +	return cg;
>> +}
>> +EXPORT_SYMBOL(sock_get_current_memcg);
>> +
>>  void sock_release_memcg(struct sock *sk)
>>  {
>>  	if (mem_cgroup_sockets_enabled && sk->sk_cgrp) {
>> @@ -564,6 +578,12 @@ void sock_release_memcg(struct sock *sk)
>>  	}
>>  }
>>  
>> +void sock_put_memcg(struct mem_cgroup *cg)
>> +{
>> +	css_put(&cg->css);
>> +}
>> +EXPORT_SYMBOL(sock_put_memcg);
>> +
> 
> I would use css_put(mem_cgroup_css(memcg)) instead. Awkward? Perhaps,
> but memcg users do that, and you wouldn't have to introduce yet another
> function.

OK, but I will introduce it anyway with the body you suggest :)

>>  struct cg_proto *tcp_proto_cgroup(struct mem_cgroup *memcg)
>>  {
>>  	if (!memcg || mem_cgroup_is_root(memcg))
>> diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
>> index e8b5a0d..5a13dfa 100644
>> --- a/net/packet/af_packet.c
>> +++ b/net/packet/af_packet.c
>> @@ -2420,6 +2420,67 @@ static int packet_sendmsg(struct kiocb *iocb, struct socket *sock,
>>  		return packet_snd(sock, msg, len);
>>  }
>>  
>> +#ifdef CONFIG_MEMCG_KMEM
>> +struct packet_sk_charge {
>> +	struct mem_cgroup	*memcg;
>> +	unsigned long		amt;
>> +};
>> +
>> +static struct cg_proto *packet_sk_charge(void)
>> +{
>> +	struct packet_sk_charge *psc;
>> +
>> +	if (!mem_cgroup_sockets_enabled)
>> +		return NULL;
>> +
>> +	psc = kmalloc(sizeof(*psc), GFP_KERNEL);
>> +	if (!psc)
>> +		return ERR_PTR(-ENOMEM);
>> +
>> +	psc->memcg = sock_get_current_memcg();
>> +	if (psc->memcg) {
>> +		/*
>> +		 * Forcedly charge the maximum amount of data this socket
>> +		 * may have. It's typically not huge and packet sockets are
>> +		 * rare guests in containers, so we don't disturb the memory
>> +		 * consumption much.
>> +		 */
>> +		psc->amt = sysctl_rmem_max;
> 
> If psc->amt is constant, may be it'd better not to introduce struct
> packet_sk_charge, but return memcg instead?

No, this constant can be different at uncharge time.

>> +		memcg_charge_kmem_bytes(psc->memcg, psc->amt);
>> +	} else {
>> +		kfree(psc);
>> +		psc = NULL;
>> +	}
>> +	rcu_read_unlock();
> 
> Stale?

Yes :(

>> +
>> +	/*
>> +	 * The sk->sk_cgrp is not used for packet sockets,
>> +	 * so we'll just put the smaller structure into it.
>> +	 */
>> +	return (struct cg_proto *)psc;
>> +}
>> +
>> +static void packet_sk_uncharge(struct cg_proto *cg)
>> +{
>> +	struct packet_sk_charge *psc = (struct packet_sk_charge *)cg;
>> +
>> +	if (psc) {
>> +		memcg_uncharge_kmem_bytes(psc->memcg, psc->amt);
>> +		sock_put_memcg(psc->memcg);
>> +		kfree(psc);
>> +	}
>> +}
>> +#else
>> +static struct cg_proto *packet_sk_charge(void)
>> +{
>> +	return NULL;
>> +}
>> +
>> +static void packet_sk_uncharge(struct sock *sk)
> 
> It does not match the definition above.

:D

-- Pavel




More information about the Devel mailing list