[Devel] [PATCH RFC] fsio: filesystem io accounting cgroup

Tue Jul 9 08:08:15 PDT 2013

Hello, Vivek.

On Tue, Jul 09, 2013 at 10:54:30AM -0400, Vivek Goyal wrote:
> It is not clear whether counting bios or counting requests is the
> right thing to do here. It depends on where you are trying to
> throttle. Bio based drivers have no requests, and they need a
> throttling mechanism too. So keeping it common for both kind of makes
> sense.

It gets weird because we may end up with wildly disagreeing statistics
from queue and the resource management.  It should have been part of
request_queue not something sitting on top.  Note that with
multi-queue support, we're unlikely to need bio based drivers except
for the stacking ones.

> Ok, so first of all you agree that time slice management is not a
> requirement for fast devices.

Not fast, but consistent.

> So time slice management is a problem even on slow devices which implement
> NCQ. IIRC, in the beginning even CFQ was doing some kind of request
> management (and not time slice management). And later it switched to
> time slice management in an effort to provide better fairness (If somebody
> is doing random IO and seek takes more time the process should be
> accounted for it).
> 
> But ideal time slice accounting requires driving a queue depth of 1
> and for any non-sequential IO, it kills performance.

Yeap, complete control only works with qd == 1, and even then write
buffering will throw you off.  But even w/ qd > 1 and write buffering,
time slice rather than iops is fundamentally the right thing to manage
for disks - e.g. you want to group IOs from the same issuer in the
same time slice even if the time accounting for that is not accurate,
so that you can size the slice according to the operating
characteristics of the device and do things like idling in between.

> Seriously, time slice accounting is one way of managing resource. Same
> disk resource can be divided proportionally by counting either iops
> or by counting amount of IO done (bandwidth).

In practice, bio iops based proportional control becomes almost
completely worthless if you have any mix of random and sequential
accesses.  cfq wouldn't be accurate but it'd be *far* closer than
anything based on iops.
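
To put rough numbers on that, here is a back-of-the-envelope sketch.
The per-IO costs are assumptions picked for illustration (8 ms for a
random IO, 0.1 ms for a sequential one), not measurements, and
disk_time_shares() is a hypothetical helper, not anything in the
kernel.  It shows how much disk time each group ends up with when two
groups are given equal iops weights.

```python
# Illustrative numbers only: the per-IO costs below are assumptions,
# not measurements from any device.
RANDOM_IO_MS = 8.0      # assumed seek + rotational latency per random IO
SEQUENTIAL_IO_MS = 0.1  # assumed service time per sequential IO


def disk_time_shares(iops_weights, cost_ms):
    """Return each group's share of disk time when IOs are split by weight.

    iops_weights: group name -> proportional iops weight
    cost_ms:      group name -> assumed service time per IO in ms
    """
    total_weight = sum(iops_weights.values())
    # Disk time each group consumes per unit of dispatched IO.
    time_used = {
        name: (weight / total_weight) * cost_ms[name]
        for name, weight in iops_weights.items()
    }
    total_time = sum(time_used.values())
    return {name: t / total_time for name, t in time_used.items()}


shares = disk_time_shares(
    {"random": 1, "sequential": 1},
    {"random": RANDOM_IO_MS, "sequential": SEQUENTIAL_IO_MS},
)
for name, share in shares.items():
    print(f"{name}: {share:.1%} of disk time")
```

Under those assumed costs, the random group takes roughly 99% of the
disk time even though both groups complete the same number of IOs.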

> If we count iops or bandwidth, it might not be most fair way of doing
> things on rotational media but it also should provide more accurate
> results in case of NCQ. When multiple requests have been dispatched
> to disk we have no idea which request consumed how much of disk time.
> So there is no way to account it properly. Iops or bandwidth based
> accounting will work just fine even with NCQ.

Sure, if iops or bw is what you explicitly want to control with hard
limits, it's fine, but doing proportional control with that on
rotating disk is just silly.

> So you want this generic block layer proportional implementation to
> do time slice management?
> 
> I thought we talked about this implementation to use some kind of
> token based mechanism so that it scales better on faster
> devices. And on slower devices one will continue to use CFQ.

I want to leave rotating disk proportional control to cfq-iosched for
as long as it matters and do iops / bw based things in the generic
layer.
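
For reference, a minimal sketch of what weight-proportional iops
tokens could look like.  This is only one possible reading of "token
based", not a proposed design: TokenGroup, refill() and dispatch() are
hypothetical names, the weights and the per-period budget are made up,
and unused tokens are simply dropped at each refill, so the scheme is
not work conserving.

```python
class TokenGroup:
    """A hypothetical cgroup holding per-period IO tokens."""

    def __init__(self, name, weight):
        self.name = name
        self.weight = weight
        self.tokens = 0
        self.dispatched = 0


def refill(groups, tokens_per_period):
    """Hand out this period's tokens in proportion to group weight."""
    total_weight = sum(g.weight for g in groups)
    for g in groups:
        # Leftover tokens from the previous period are discarded.
        g.tokens = tokens_per_period * g.weight // total_weight


def dispatch(group):
    """Charge one token for one IO; return False if the group is throttled."""
    if group.tokens <= 0:
        return False
    group.tokens -= 1
    group.dispatched += 1
    return True


# Two always-backlogged groups with a 1:3 weight ratio (assumed values).
groups = [TokenGroup("a", 100), TokenGroup("b", 300)]
for _ in range(10):
    refill(groups, 400)
    for g in groups:
        while dispatch(g):
            pass

for g in groups:
    print(g.name, g.dispatched)
```

With both groups always backlogged, the dispatched IO counts follow
the 1:3 weight ratio regardless of how expensive each IO is, which is
why this kind of accounting is cheap and scales on fast devices.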

Thanks.

-- 
tejun