[Devel] Re: [RFC][PATCH 2/7] RSS controller core
Herbert Poetzl
herbert at 13thfloor.at
Mon Mar 19 08:48:08 PDT 2007
On Sun, Mar 18, 2007 at 11:42:15AM -0600, Eric W. Biederman wrote:
> Dave Hansen <hansendc at us.ibm.com> writes:
>
> > On Fri, 2007-03-16 at 12:54 -0600, Eric W. Biederman wrote:
> >> Dave Hansen <hansendc at us.ibm.com> writes:
> >
> >> - Why do limits have to apply to the unmapped page cache?
> >
> > To me, it is just because it consumes memory. Unmapped cache is, of
> > couse, much more easily reclaimed than mapped files, but it still
> > fundamentally causes pressure on the VM.
> >
> > To me, a process sitting there doing constant reads of 10 pages has the
> > same overhead to the VM as a process sitting there with a 10 page file
> > mmaped, and reading that.
>
> I can see temporarily accounting for pages in use for such a
> read/write and possibly during things such as read ahead.
>
> However I doubt it is enough memory to be significant, and as
> such is probably a waste of time accounting for it.
>
> A memory limit is not about accounting for memory pressure, so I think
> the reasoning for wanting to account for unmapped pages as a hard
> requirement is still suspect.
> A memory limit is to prevent one container from hogging all of the
> memory in the system, and denying it to other containers.
exactly!
nevertheless, you might want to extend that to swapping
and to the very expensive page in/out operations too
> The page cache by definition is a global resource that facilitates
> global kernel optimizations. If we kill those optimizations we
> are on the wrong track. By requiring limits there I think we are
> very likely to kill our very important global optimizations, and bring
> the performance of the entire system down.
that is my major concern for most of the 'straight forward'
virtualizations proposed (see Xen comment)
> >> - Could you mention proper multi process RSS limits.
> >> (I.e. we count the number of pages each group of processes have mapped
> >> and limit that).
> >> It is the same basic idea as partial page ownership, but instead of
> >> page ownership you just count how many pages each group is using and
> >> strictly limit that. There is no page owner ship or partial charges.
> >> The overhead is just walking the rmap list at map and unmap time to
> >> see if this is the first users in the container. No additional kernel
> >> data structures are needed.
> >
> > I've tried to capture this. Let me know what else you think it
> > needs.
>
> Requirements:
> - The current kernel global optimizations are preserved and useful.
>
> This does mean one container can affect another when the
> optimizations go awry but on average it means much better
> performance. For many the global optimizations are what make
> the in-kernel approach attractive over paravirtualization.
total agreement here
> Very nice to have:
> - Limits should be on things user space have control of.
>
> Saying you can only have X bytes of kernel memory for file
> descriptors and the like is very hard to work with. Saying you
> can have only N file descriptors open is much easier to deal with.
yep, and IMHO more natural ...
> - SMP Scalability.
>
> The final implementation should have per cpu counters or per task
> reservations so in most instances we don't need to bounce a global
> cache line around to perform the accounting.
agreed, we want to optimize for small systems
as well as for large ones, and SMP/NUMA is quite
common in the server area (even for small servers)
> Nice to have:
>
> - Perfect precision.
>
> Having every last byte always accounted for is nice but a
> little bit of bounded fuzziness in the accounting is acceptable
> if it that make the accounting problem more tractable.
as long as the accounting is consistant, i.e.
you do not lose resources by repetitive operations
inside the guest (or through guest-guest interaction)
as this could be used for DoS and intentional unfairness
> We need several more limits in this discussion to get a full picture,
> otherwise we may to try and build the all singing all dancing limit.
> - A limit on the number of anonymous pages.
> (Pages that are or may be in the swap cache).
> - Filesystem per container quotas.
> (Only applicable in some contexts but you get the idea).
with shared files, otherwise an lvm partition does
a good job for that already ...
> - Inode, file descriptor, and similar limits.
> - I/O limits.
I/O and CPU limits are special, as they have the temporal
component, i.e. you are not interested in 10s CPU time,
instead you want 0.5s/s CPU (same for I/O)
note: this is probably also true for page in/out
- sockets
- locks
- dentries
HTH,
Herbert
> Eric
>
> _______________________________________________
> Containers mailing list
> Containers at lists.linux-foundation.org
> https://lists.linux-foundation.org/mailman/listinfo/containers
_______________________________________________
Containers mailing list
Containers at lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers
More information about the Devel
mailing list