[Devel] Re: [PATCH] io-controller: Add io group reference handling for request

Vivek Goyal vgoyal at redhat.com
Mon May 18 07:01:14 PDT 2009


On Sun, May 17, 2009 at 12:26:06PM +0200, Andrea Righi wrote:
> On Fri, May 15, 2009 at 10:06:43AM -0400, Vivek Goyal wrote:
> > On Fri, May 15, 2009 at 09:48:40AM +0200, Andrea Righi wrote:
> > > On Fri, May 15, 2009 at 01:15:24PM +0800, Gui Jianfeng wrote:
> > > > Vivek Goyal wrote:
> > > > ...
> > > > >  }
> > > > > @@ -1462,20 +1462,27 @@ struct io_cgroup *get_iocg_from_bio(stru
> > > > >  /*
> > > > >   * Find the io group bio belongs to.
> > > > >   * If "create" is set, io group is created if it is not already present.
> > > > > + * If "curr" is set, io group is information is searched for current
> > > > > + * task and not with the help of bio.
> > > > > + *
> > > > > + * FIXME: Can we assume that if bio is NULL then lookup group for current
> > > > > + * task and not create extra function parameter ?
> > > > >   *
> > > > > - * Note: There is a narrow window of race where a group is being freed
> > > > > - * by cgroup deletion path and some rq has slipped through in this group.
> > > > > - * Fix it.
> > > > >   */
> > > > > -struct io_group *io_get_io_group_bio(struct request_queue *q, struct bio *bio,
> > > > > -					int create)
> > > > > +struct io_group *io_get_io_group(struct request_queue *q, struct bio *bio,
> > > > > +					int create, int curr)
> > > > 
> > > >   Hi Vivek,
> > > > 
> > > >   IIUC we can get rid of curr, and just determine iog from bio. If bio is not NULL,
> > > >   get iog from bio, otherwise get it from current task.
> > > 
> > > Consider also that get_cgroup_from_bio() is much slower than
> > > task_cgroup() and needs to lock/unlock_page_cgroup() in
> > > get_blkio_cgroup_id(), while task_cgroup() is rcu protected.
> > > 
> > 
> > True.
> > 
> > > BTW another optimization could be to use the blkio-cgroup functionality
> > > only for dirty pages and cut out some blkio_set_owner(). For all the
> > > other cases IO always occurs in the same context of the current task,
> > > and you can use task_cgroup().
> > > 
> > 
> > Yes, maybe in some cases we can avoid setting the page owner. I will get
> > to it once I have got the functionality going well. In the meantime, if
> > you have a patch for it, that would be great.
> > 
> > > However, this is true only for page cache pages, for IO generated by
> > > anonymous pages (swap) you still need the page tracking functionality
> > > both for reads and writes.
> > > 
> > 
> > Right now I am assuming that all sync IO belongs to the task
> > submitting the bio, and hence I use task_cgroup() for that. Only for
> > async IO am I trying to use the page tracking functionality to
> > determine the owner. Look at elv_bio_sync(bio).
> > 
> > You seem to be saying that there are cases where, even for sync IO, we
> > can't use the submitting task's context and need to rely on the page
> > tracking functionality? In the case of reading a page from swap, will
> > that not happen in the context of the process that takes the page fault
> > and initiates the swap read?
> 
> No, for example in read_swap_cache_async():
> 
> @@ -308,6 +309,7 @@ struct page *read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
>  		 */
>  		__set_page_locked(new_page);
>  		SetPageSwapBacked(new_page);
> +		blkio_cgroup_set_owner(new_page, current->mm);
>  		err = add_to_swap_cache(new_page, entry, gfp_mask & GFP_KERNEL);
>  		if (likely(!err)) {
>  			/*
> 
> This is a read, but the current task is not always the owner of this
> swap cache page, because it's a readahead operation.
> 

But won't this readahead be initiated in the context of the task taking
the page fault?

handle_pte_fault()
	do_swap_page()
		swapin_readahead()
			read_swap_cache_async()

If yes, then the swap reads issued will still be in the context of the
process, and we should be fine?
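
To make the lookup policy discussed in this thread concrete, here is a
rough userspace sketch. It is not the patch code: the struct names and
pick_group() are hypothetical stand-ins, and falling back to the
submitting task when a page has no recorded owner is an assumption made
only for this example. It covers the three cases mentioned so far: no bio
(use the current task), sync bio (use the current task), and async bio
(use the owner recorded by page tracking).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; these are not the kernel's types. */
struct group { int id; };
struct task  { struct group *grp; };      /* group of the submitting task */
struct page  { struct group *owner; };    /* owner recorded by page tracking */
struct bio   { bool sync; struct page *page; };

/*
 * Pick the group to charge for an IO:
 *  - no bio, or a sync bio: the submitting task's group
 *  - async bio: the group recorded on the page; if none was recorded,
 *    fall back to the submitting task's group (assumption of this sketch)
 */
struct group *pick_group(const struct task *curr, const struct bio *bio)
{
	assert(curr != NULL);

	if (bio == NULL || bio->sync)
		return curr->grp;

	if (bio->page != NULL && bio->page->owner != NULL)
		return bio->page->owner;

	return curr->grp;
}
```

With this shape the "curr" parameter is not needed, since a NULL bio
already selects the current task, which is what Gui suggested above.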

> Anyway, this is a minor corner case I think. And probably it is safe to
> consider this like any other read IO and get rid of the
> blkio_cgroup_set_owner().

Agreed.

> 
> I wonder if it would be better to attach the blkio_cgroup to the
> anonymous page only when swap-out occurs.

Swap seems to be an interesting case in general. Somebody raised this
question on the LWN IO controller article as well. A user process never
asks for swap activity; it is something enforced by the kernel. So when
doing swap outs, it does not seem fair to charge the write out to the
process the page belongs to, when in fact some other memory-hungry
application may be forcing these swap outs.

Keeping this in mind, should swap activity be considered system
activity and be charged to the root group instead of to user tasks in
other cgroups?
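
As a rough illustration of that alternative (again a userspace sketch with
hypothetical names, not kernel code), the charging decision would look
something like this: swap-backed pages go to the root group, everything
else goes to the submitting task's group.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins used only for this sketch. */
struct io_grp  { int id; };
struct pg_info { bool swap_backed; };

/*
 * Treat swap IO as system activity: charge it to the root group,
 * regardless of which task's memory is being swapped. All other IO
 * is charged to the submitter's group.
 */
struct io_grp *charge_target(const struct pg_info *pg,
			     struct io_grp *root,
			     struct io_grp *submitter)
{
	if (pg != NULL && pg->swap_backed)
		return root;

	return submitter;
}
```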
  
> I mean, just put the
> blkio_cgroup_set_owner() hook in try_to_unmap() in order to keep track of
> the IO generated by direct reclaim of anon memory. For all the other
> cases we can simply use the submitting task's context.
> 
> BTW, O_DIRECT is another case that is possible to optimize, because all
> the bios generated by direct IO occur in the same context of the current
> task.

Agreed about the direct IO optimization.

Ryo, what do you think? Would you like to include these optimizations
suggested by Andrea in the next version of the IO tracking patches?
 
Thanks
Vivek
_______________________________________________
Containers mailing list
Containers at lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers



