[Devel] Re: [PATCH] io-controller: Add io group reference handling for request

Andrea Righi righi.andrea at gmail.com
Mon May 18 07:39:23 PDT 2009


On Mon, May 18, 2009 at 10:01:14AM -0400, Vivek Goyal wrote:
> On Sun, May 17, 2009 at 12:26:06PM +0200, Andrea Righi wrote:
> > On Fri, May 15, 2009 at 10:06:43AM -0400, Vivek Goyal wrote:
> > > On Fri, May 15, 2009 at 09:48:40AM +0200, Andrea Righi wrote:
> > > > On Fri, May 15, 2009 at 01:15:24PM +0800, Gui Jianfeng wrote:
> > > > > Vivek Goyal wrote:
> > > > > ...
> > > > > >  }
> > > > > > @@ -1462,20 +1462,27 @@ struct io_cgroup *get_iocg_from_bio(stru
> > > > > >  /*
> > > > > >   * Find the io group bio belongs to.
> > > > > >   * If "create" is set, io group is created if it is not already present.
> > > > > > > + * If "curr" is set, io group information is searched for current
> > > > > > + * task and not with the help of bio.
> > > > > > + *
> > > > > > + * FIXME: Can we assume that if bio is NULL then lookup group for current
> > > > > > + * task and not create extra function parameter ?
> > > > > >   *
> > > > > > - * Note: There is a narrow window of race where a group is being freed
> > > > > > - * by cgroup deletion path and some rq has slipped through in this group.
> > > > > > - * Fix it.
> > > > > >   */
> > > > > > -struct io_group *io_get_io_group_bio(struct request_queue *q, struct bio *bio,
> > > > > > -					int create)
> > > > > > +struct io_group *io_get_io_group(struct request_queue *q, struct bio *bio,
> > > > > > +					int create, int curr)
> > > > > 
> > > > >   Hi Vivek,
> > > > > 
> > > > >   IIUC we can get rid of curr, and just determine iog from bio. If bio is not NULL,
> > > > >   get iog from bio, otherwise get it from current task.
> > > > 
> > > > Consider also that get_cgroup_from_bio() is much slower than
> > > > task_cgroup() and needs to lock/unlock_page_cgroup() in
> > > > get_blkio_cgroup_id(), while task_cgroup() is RCU protected.
> > > > 
> > > 
> > > True.
> > > 
> > > > BTW another optimization could be to use the blkio-cgroup functionality
> > > > only for dirty pages and cut out some blkio_set_owner(). For all the
> > > > other cases IO always occurs in the same context of the current task,
> > > > and you can use task_cgroup().
> > > > 
> > > 
> > > Yes, maybe in some cases we can avoid setting the page owner. I will get
> > > to it once I have got the functionality going well. In the meantime, if
> > > you have a patch for it, that would be great.
> > > 
> > > > However, this is true only for page cache pages, for IO generated by
> > > > anonymous pages (swap) you still need the page tracking functionality
> > > > both for reads and writes.
> > > > 
> > > 
> > > Right now I am assuming that all the sync IO will belong to task
> > > submitting the bio hence use task_cgroup() for that. Only for async
> > > IO, I am trying to use page tracking functionality to determine the owner.
> > > Look at elv_bio_sync(bio).
> > > 
> > > You seem to be saying that there are cases where even for sync IO, we
> > > can't use the submitting task's context and need to rely on page tracking
> > > functionality? In the case of reading a page from swap, will it not happen
> > > in the context of the process that takes the page fault and initiates the
> > > swap read?
> > 
> > No, for example in read_swap_cache_async():
> > 
> > @@ -308,6 +309,7 @@ struct page *read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
> >  		 */
> >  		__set_page_locked(new_page);
> >  		SetPageSwapBacked(new_page);
> > +		blkio_cgroup_set_owner(new_page, current->mm);
> >  		err = add_to_swap_cache(new_page, entry, gfp_mask & GFP_KERNEL);
> >  		if (likely(!err)) {
> >  			/*
> > 
> > This is a read, but the current task is not always the owner of this
> > swap cache page, because it's a readahead operation.
> > 
> 
> But will this readahead not be initiated in the context of the task taking
> the page fault?
> 
> handle_pte_fault()
> 	do_swap_page()
> 		swapin_readahead()
> 			read_swap_cache_async()
> 
> If yes, then the swap reads issued will still be in the context of the
> process and we should be fine?

Right. I was trying to say that the current task may also swap in pages
belonging to a different task, so from a certain point of view it's not
entirely fair to charge the current task for the whole activity. But OK, I
think it's a minor issue.
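
To make the lookup rule discussed so far concrete, here is a rough
user-space sketch. It is not kernel code: every toy_* name is
hypothetical, the "sync" flag only stands in for elv_bio_sync(bio), and
"page_owner" stands in for whatever the page tracking lookup would
return. It assumes that sync IO (including swap-in readahead) is charged
to the submitting task and only async IO goes through page tracking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; none of these are kernel types. */
struct toy_cgroup { int id; };

struct toy_task { struct toy_cgroup *cgrp; };

struct toy_bio {
	bool sync;                     /* stands in for elv_bio_sync(bio) */
	struct toy_cgroup *page_owner; /* stands in for the page tracking lookup */
};

/*
 * Pick the cgroup to charge:
 *  - no bio, or a sync bio: the submitting task (the cheap path)
 *  - async bio: the tracked page owner, falling back to the
 *    submitter if no owner was recorded
 */
struct toy_cgroup *toy_charge_target(const struct toy_task *curr,
				     const struct toy_bio *bio)
{
	assert(curr != NULL && curr->cgrp != NULL);

	if (bio == NULL || bio->sync)
		return curr->cgrp;

	return bio->page_owner ? bio->page_owner : curr->cgrp;
}
```

With this rule a readahead swap-in is always charged to the faulting
task, even for pages that belong to someone else, which is the minor
unfairness mentioned above.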

> 
> > Anyway, this is a minor corner case I think. And probably it is safe to
> > consider this like any other read IO and get rid of the
> > blkio_cgroup_set_owner().
> 
> Agreed.
> 
> > 
> > I wonder if it would be better to attach the blkio_cgroup to the
> > anonymous page only when swap-out occurs.
> 
> Swap seems to be an interesting case in general. Somebody raised this
> question on the LWN io controller article also. A user process never asked
> for swap activity. It is something enforced by the kernel. So while doing
> swap outs, it does not seem too fair to charge the write out to
> the process the page belongs to, when in fact there may be
> some other memory hungry application which is forcing these swap outs.
> 
> Keeping this in mind, should swap activity be considered system
> activity and be charged to the root group instead of to user tasks in other
> cgroups?

In this case I assume the swap-in activity should be charged to the root
cgroup as well.

Anyway, in the logic of the memory and swap control it would seem
reasonable to provide IO separation also for the swap IO activity.

In the memory hog example, it would be unfair if the memory pressure were
caused by a task in another cgroup, but with memory and swap isolation a
memory pressure condition can only be caused by a memory hog that runs
in the same cgroup. From this point of view it seems fairer to
account the swap activity as IO of that particular cgroup, instead
of always charging the root cgroup.

Otherwise, I suspect, memory pressure would be a simple way to blow away
any kind of QoS guarantees provided by the IO controller.
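
A toy illustration of the two accounting policies, as a user-space
sketch only. All toy_* names are made up for this example, index 0
merely plays the role of the root cgroup, and nothing here reflects the
real io-controller data structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical policies for accounting swap IO. */
enum toy_swap_policy { TOY_CHARGE_ROOT, TOY_CHARGE_OWNER };

#define TOY_NR_GROUPS 3 /* index 0 plays the role of the root cgroup */

struct toy_io_stats { unsigned long charged[TOY_NR_GROUPS]; };

/*
 * Account "bytes" of swap IO caused by pages owned by cgroup "owner".
 * Under TOY_CHARGE_ROOT the owner is never billed, so its IO limits
 * would not see this traffic at all.
 */
void toy_charge_swap(struct toy_io_stats *st, enum toy_swap_policy policy,
		     size_t owner, unsigned long bytes)
{
	assert(st != NULL && owner < TOY_NR_GROUPS);

	size_t target = (policy == TOY_CHARGE_ROOT) ? 0 : owner;

	st->charged[target] += bytes;
}
```

If a memory hog in group 2 forces 4096 bytes of swap out, the root
policy leaves charged[2] at zero, while the owner policy bills group 2
and leaves the root group untouched.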

>   
> > I mean, just put the
> > blkio_cgroup_set_owner() hook in try_to_unmap() in order to keep track of
> > the IO generated by direct reclaim of anon memory. For all the other
> > cases we can simply use the submitting task's context.
> > 
> > BTW, O_DIRECT is another case that is possible to optimize, because all
> > the bios generated by direct IO occur in the same context of the current
> > task.
> 
> Agreed about the direct IO optimization.
> 
> Ryo, what do you think? Would you like to include these optimizations
> suggested by Andrea in the next version of the IO tracking patches?
>  
> Thanks
> Vivek

Thanks,
-Andrea