[Devel] Re: [RFC][PATCH] another swap controller for cgroup
YAMAMOTO Takashi
yamamoto at valinux.co.jp
Tue May 6 22:50:44 PDT 2008
hi,
> Hi, Thanks for the patches and your patience. I've just applied your
> patches on top of 2.6.25-mm1 (it had a minor reject, that I've fixed).
> I am building and testing the patches along with KAMEZAWA-San's low
> overhead patches.
thanks.
> > +#include <linux/err.h>
> > +#include <linux/cgroup.h>
> > +#include <linux/hugetlb.h>
>
> My powerpc build fails, we need to move hugetlb.h down to the bottom
what's the error message?
> > +struct swap_cgroup {
> > + struct cgroup_subsys_state scg_css;
>
> Can't we call this just css. Since the structure is swap_cgroup it
> already has the required namespace required to distinguish it from
> other css's. Please see page 4 of "The practice of programming", be
> consistent. The same comment applies to all members below.
i don't have the book.
i like this kind of prefixes as it's grep-friendly.
> > +#define task_to_css(task) task_subsys_state((task), swap_cgroup_subsys_id)
> > +#define css_to_scg(css) container_of((css), struct swap_cgroup, scg_css)
> > +#define cg_to_css(cg) cgroup_subsys_state((cg), swap_cgroup_subsys_id)
> > +#define cg_to_scg(cg) css_to_scg(cg_to_css(cg))
>
> Aren't static inline better than macros? I would suggest moving to
> them.
sounds like a matter of preference.
either ok for me.
> > +static struct swap_cgroup *
> > +swap_cgroup_prepare_ptp(struct page *ptp, struct mm_struct *mm)
> > +{
> > + struct swap_cgroup *scg = ptp->ptp_swap_cgroup;
> > +
>
> Is this routine safe w.r.t. concurrent operations, modifications to
> ptp_swap_cgroup?
it's always accessed with the page table locked.
> > + BUG_ON(mm == NULL);
> > + BUG_ON(mm->swap_cgroup == NULL);
> > + if (scg == NULL) {
> > + /*
> > + * see swap_cgroup_attach.
> > + */
> > + smp_rmb();
> > + scg = mm->swap_cgroup;
>
> With the mm->owner patches, we need not maintain a separate
> mm->swap_cgroup.
i don't think the mm->owner patch, at least with the current form,
can replace it.
> > + /*
> > + * swap_cgroup_attach is in progress.
> > + */
> > +
> > + res_counter_charge_force(&newscg->scg_counter, PAGE_CACHE_SIZE);
>
> So, we force the counter to go over limit?
yes.
as newscg != oldscg here means the task is being moved between cgroups,
this instance of res_counter_charge_force should not matter much.
> > +static int
> > +swap_cgroup_write_u64(struct cgroup *cg, struct cftype *cft, u64 val)
> > +{
> > + struct res_counter *counter = &cg_to_scg(cg)->scg_counter;
> > + unsigned long flags;
> > +
> > + /* XXX res_counter_write_u64 */
> > + BUG_ON(cft->private != RES_LIMIT);
> > + spin_lock_irqsave(&counter->lock, flags);
> > + counter->limit = val;
> > + spin_unlock_irqrestore(&counter->lock, flags);
> > + return 0;
> > +}
> > +
>
> We need to write actual numbers here? Can't we keep the write
> interface consistent with the memory controller?
i'm not sure what you mean here. can you explain a bit more?
do you mean K, M, etc?
> > +static void
> > +swap_cgroup_destroy(struct cgroup_subsys *ss, struct cgroup *cg)
> > +{
> > + struct swap_cgroup *oldscg = cg_to_scg(cg);
> > + struct swap_cgroup *newscg;
> > + struct list_head *pos;
> > + struct list_head *next;
> > +
> > + /*
> > + * move our anonymous objects to init_mm's group.
> > + */
>
> Is this good design, should be allow a swap_cgroup to be destroyed,
> even though pages are allocated to it?
first of all, i'm not quite happy with this design. :)
having said that, what else can we do?
i tend to think that trying to swap-in these pages is too much effort
for little benefit.
> Is moving to init_mm (root
> cgroup) a good idea? Ideally with support for hierarchies, if we ever
> do move things, it should be to the parent cgroup.
i chose init_mm because there seemed to be no consensus about
cgroup hierarchy semantics.
> > + info->swap_cgroup = newscg;
> > + res_counter_uncharge(&oldscg->scg_counter, bytes);
> > + res_counter_charge_force(&newscg->scg_counter, bytes);
>
> I don't like the excessive use of res_counter_charge_force(), it seems
> like we might end up bypassing the controller all together. I would
> rather fail the destroy operation if the charge fails.
> > + bytes = swslots * PAGE_CACHE_SIZE;
> > + res_counter_uncharge(&oldscg->scg_counter, bytes);
> > + /*
> > + * XXX ignore newscg's limit because cgroup ->attach method can't fail.
> > + */
> > + res_counter_charge_force(&newscg->scg_counter, bytes);
>
> But in the future, we could plan on making attach fail (I have a
> requirement for it). Again, I don't like the _force operation
allowing these operations fail implies to have code to back out
partial operations. i'm afraid that it will be too complex.
> > +static void
> > +swap_cgroup_attach_mm(struct mm_struct *mm, struct swap_cgroup *oldscg,
> > + struct swap_cgroup *newscg)
>
> We need comments about the function, why do we attach an mm?
instead of a task, you mean?
because we count the number of ptes which points to swap
and ptes belong to an mm, not a task.
YAMAMOTO Takashi
_______________________________________________
Containers mailing list
Containers at lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers
More information about the Devel
mailing list