path: root/Documentation
diff options
authorLinus Torvalds <torvalds@linux-foundation.org>2013-02-28 12:52:24 -0800
committerLinus Torvalds <torvalds@linux-foundation.org>2013-02-28 12:52:24 -0800
commitee89f81252179dcbf6cd65bd48299f5e52292d88 (patch)
tree805846cd12821f84cfe619d44c9e3e36e0b0f9e6 /Documentation
parent21f3b24da9328415792efc780f50b9f434c12465 (diff)
parentde33127d8d3f1d570aad8c2223cd81b206636bc1 (diff)
Merge branch 'for-3.9/core' of git://git.kernel.dk/linux-block
Pull block IO core bits from Jens Axboe: "Below are the core block IO bits for 3.9. It was delayed a few days since my workstation kept crashing every 2-8h after pulling it into current -git, but turns out it is a bug in the new pstate code (divide by zero, will report separately). In any case, it contains: - The big cfq/blkcg update from Tejun and and Vivek. - Additional block and writeback tracepoints from Tejun. - Improvement of the should sort (based on queues) logic in the plug flushing. - _io() variants of the wait_for_completion() interface, using io_schedule() instead of schedule() to contribute to io wait properly. - Various little fixes. You'll get two trivial merge conflicts, which should be easy enough to fix up" Fix up the trivial conflicts due to hlist traversal cleanups (commit b67bfe0d42ca: "hlist: drop the node parameter from iterators"). * 'for-3.9/core' of git://git.kernel.dk/linux-block: (39 commits) block: remove redundant check to bd_openers() block: use i_size_write() in bd_set_size() cfq: fix lock imbalance with failed allocations drivers/block/swim3.c: fix null pointer dereference block: don't select PERCPU_RWSEM block: account iowait time when waiting for completion of IO request sched: add wait_for_completion_io[_timeout] writeback: add more tracepoints block: add block_{touch|dirty}_buffer tracepoint buffer: make touch_buffer() an exported function block: add @req to bio_{front|back}_merge tracepoints block: add missing block_bio_complete() tracepoint block: Remove should_sort judgement when flush blk_plug block,elevator: use new hashtable implementation cfq-iosched: add hierarchical cfq_group statistics cfq-iosched: collect stats from dead cfqgs cfq-iosched: separate out cfqg_stats_reset() from cfq_pd_reset_stats() blkcg: make blkcg_print_blkgs() grab q locks instead of blkcg lock block: RCU free request_queue blkcg: implement blkg_[rw]stat_recursive_sum() and blkg_[rw]stat_merge() ...
Diffstat (limited to 'Documentation')
2 files changed, 82 insertions, 11 deletions
diff --git a/Documentation/block/cfq-iosched.txt b/Documentation/block/cfq-iosched.txt
index d89b4fe724d..a5eb7d19a65 100644
--- a/Documentation/block/cfq-iosched.txt
+++ b/Documentation/block/cfq-iosched.txt
@@ -102,6 +102,64 @@ processing of request. Therefore, increasing the value can imporve the
performace although this can cause the latency of some I/O to increase due
to more number of requests.
+CFQ Group scheduling
+CFQ supports blkio cgroup and has "blkio." prefixed files in each
+blkio cgroup directory. It is weight-based and there are four knobs
+for configuration - weight[_device] and leaf_weight[_device].
+Internal cgroup nodes (the ones with children) can also have tasks in
+them, so the former two configure how much proportion the cgroup as a
+whole is entitled to at its parent's level while the latter two
+configure how much proportion the tasks in the cgroup have compared to
+its direct children.
+Another way to think about it is assuming that each internal node has
+an implicit leaf child node which hosts all the tasks whose weight is
+configured by leaf_weight[_device]. Let's assume a blkio hierarchy
+composed of five cgroups - root, A, B, AA and AB - with the following
+weights where the names represent the hierarchy.
+ weight leaf_weight
+ root : 125 125
+ A : 500 750
+ B : 250 500
+ AA : 500 500
+ AB : 1000 500
+root never has a parent making its weight is meaningless. For backward
+compatibility, weight is always kept in sync with leaf_weight. B, AA
+and AB have no child and thus its tasks have no children cgroup to
+compete with. They always get 100% of what the cgroup won at the
+parent level. Considering only the weights which matter, the hierarchy
+looks like the following.
+ root
+ / | \
+ A B leaf
+ 500 250 125
+ / | \
+ AA AB leaf
+ 500 1000 750
+If all cgroups have active IOs and competing with each other, disk
+time will be distributed like the following.
+Distribution below root. The total active weight at this level is
+A:500 + B:250 + C:125 = 875.
+ root-leaf : 125 / 875 =~ 14%
+ A : 500 / 875 =~ 57%
+ B(-leaf) : 250 / 875 =~ 28%
+A has children and further distributes its 57% among the children and
+the implicit leaf node. The total active weight at this level is
+AA:500 + AB:1000 + A-leaf:750 = 2250.
+ A-leaf : ( 750 / 2250) * A =~ 19%
+ AA(-leaf) : ( 500 / 2250) * A =~ 12%
+ AB(-leaf) : (1000 / 2250) * A =~ 25%
CFQ IOPS Mode for group scheduling
Basic CFQ design is to provide priority based time slices. Higher priority
diff --git a/Documentation/cgroups/blkio-controller.txt b/Documentation/cgroups/blkio-controller.txt
index a794ce91a2d..da272c8f44e 100644
--- a/Documentation/cgroups/blkio-controller.txt
+++ b/Documentation/cgroups/blkio-controller.txt
@@ -94,13 +94,11 @@ Throttling/Upper Limit policy
Hierarchical Cgroups
-- Currently none of the IO control policy supports hierarchical groups. But
- cgroup interface does allow creation of hierarchical cgroups and internally
- IO policies treat them as flat hierarchy.
+- Currently only CFQ supports hierarchical groups. For throttling,
+ cgroup interface does allow creation of hierarchical cgroups and
+ internally it treats them as flat hierarchy.
- So this patch will allow creation of cgroup hierarchcy but at the backend
- everything will be treated as flat. So if somebody created a hierarchy like
- as follows.
+ If somebody created a hierarchy like as follows.
/ \
@@ -108,16 +106,20 @@ Hierarchical Cgroups
- CFQ and throttling will practically treat all groups at same level.
+ CFQ will handle the hierarchy correctly but and throttling will
+ practically treat all groups at same level. For details on CFQ
+ hierarchy support, refer to Documentation/block/cfq-iosched.txt.
+ Throttling will treat the hierarchy as if it looks like the
+ following.
/ / \ \
root test1 test2 test3
- Down the line we can implement hierarchical accounting/control support
- and also introduce a new cgroup file "use_hierarchy" which will control
- whether cgroup hierarchy is viewed as flat or hierarchical by the policy..
- This is how memory controller also has implemented the things.
+ Nesting cgroups, while allowed, isn't officially supported and blkio
+ genereates warning when cgroups nest. Once throttling implements
+ hierarchy support, hierarchy will be supported and the warning will
+ be removed.
Various user visible config options
@@ -172,6 +174,12 @@ Proportional weight policy files
dev weight
8:16 300
+- blkio.leaf_weight[_device]
+ - Equivalents of blkio.weight[_device] for the purpose of
+ deciding how much weight tasks in the given cgroup has while
+ competing with the cgroup's child cgroups. For details,
+ please refer to Documentation/block/cfq-iosched.txt.
- blkio.time
- disk time allocated to cgroup per device in milliseconds. First
two fields specify the major and minor number of the device and
@@ -279,6 +287,11 @@ Proportional weight policy files
and minor number of the device and third field specifies the number
of times a group was dequeued from a particular device.
+- blkio.*_recursive
+ - Recursive version of various stats. These files show the
+ same information as their non-recursive counterparts but
+ include stats from all the descendant cgroups.
Throttling/Upper limit policy files
- blkio.throttle.read_bps_device