genpd

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
author: Viresh Kumar <viresh.kumar@linaro.org> 2017-12-15 11:05:31 +0530
committer: Viresh Kumar <viresh.kumar@linaro.org> 2017-12-15 11:05:31 +0530
commit: 8b11c875e52a0fbe54c744876dc8f6f022909f4c (patch)
tree: f2f758e27140ec67ee175d2705b7257506fb0e59
parent: b6bff70778cf7b8686d7872898684562f9fe7097 (diff)
2 files changed, 303 insertions, 0 deletions
diff --git a/genpd/genpd_performance_states.html b/genpd/genpd_performance_states.html
new file mode 100644
index 0000000..1069ed9
--- /dev/null
+++ b/genpd/genpd_performance_states.html
@@ -0,0 +1,146 @@
+<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
+<html>
+<head>
+<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
+<meta name="generator" content="AsciiDoc 8.6.9">
+<title>Active state management of power domains</title>
+</head>
+<body>
+<h1>Active state management of power domains</h1>
+<a name="preamble"></a>
+<p>Power domains in the Linux kernel group devices that share clocks or other
+power resources and are enabled or disabled together, though the individual
+devices may still have fine-grained control over their own resources. Power
+domains can be nested; a nested domain is called a sub-domain of its master
+domain.</p>
+<p>Power domains support a limited number of operations today, most of which
+eventually resolve to enabling or disabling the domain, although the generic
+power domains (aka genpd) also support idle states. The 4.15 kernel release will
+enhance the genpd core to support active state management of power domains as
+well.</p>
+<p>Some platforms can control the active states of their power domains. Within
+the Linux kernel, the active states of power domains are called
+<code>performance states</code>. The performance states (within the genpd core) are
+identified by non-negative integer values; a lower value represents a lower
+performance state. Each device controlled by a power domain can vote for a
+target performance state based on its own requirements, and the power domain
+is configured to the highest performance state requested by its devices.
+Performance state zero is special: a device requests it to drop its vote, i.e.
+when it no longer wants to be considered in finding the target performance
+state of the power domain.</p>
+<p>The following helper is introduced for a device to request a performance state
+for its power domain.</p>
+<pre><code>    int dev_pm_genpd_set_performance_state(struct device *dev, unsigned int state);</code></pre>
+<p>Here, <code>dev</code> is a pointer to the device structure and <code>state</code> is the
+performance state that the device requires from the power domain controlling
+it. The helper first records <code>state</code> as the device&#8217;s performance state
+constraint on its PM domain. The genpd core then computes the new performance
+state of the domain from the requests of all the devices it controls and, if
+that differs from the current state, updates the performance state of the power
+domain in a platform-dependent way. All of this happens synchronously: the
+performance state of the power domain has been updated, if required, by the
+time the helper returns. <code>dev_pm_genpd_set_performance_state()</code> returns zero
+on success and a negative error number otherwise; the return value
+<code>-ENODEV</code> is special and indicates that the device&#8217;s power domain
+doesn&#8217;t support configuring performance states.</p>
+<p>On a call to <code>dev_pm_genpd_set_performance_state()</code>, the genpd core calls
+the domain-specific callback shown below if the performance state of the power
+domain needs to change. Power domain drivers that support configuring
+performance states must supply this callback.</p>
+<pre><code>    struct generic_pm_domain {
+        ...
+
+        int (*set_performance_state)(struct generic_pm_domain *genpd, unsigned int state);
+
+        ...
+    };</code></pre>
+<p>Here, <code>genpd</code> is the generic power domain and <code>state</code> is the target performance
+state computed from the requests of all the devices managed by <code>genpd</code>. As
+pointed out earlier, if the genpd doesn&#8217;t set this callback,
+<code>dev_pm_genpd_set_performance_state()</code> returns <code>-ENODEV</code>.</p>
+<p>The mechanism by which the performance state of a power domain is changed is
+left to the implementation and is platform dependent. On some platforms, the
+<code>set_performance_state()</code> callback may configure regulators and/or clocks
+that are also managed by Linux. On others, the callback may instead pass the
+target performance state to firmware running on an external processor (not
+managed by Linux), which may then program the power resources itself.</p>
+<p>Also note that, in the current implementation, performance state updates
+aren&#8217;t propagated from sub-domains to their master domains; only the devices
+directly controlled by a power domain (not its sub-domains) are considered while
+finding its effective performance state. None of the current hardware designs
+need this feature, and it needs more thought for several reasons. For example,
+there may not be a one-to-one mapping between the performance states of
+sub-domains and those of their master domains. A sub-domain can also have
+multiple master domains, and those master domains may need to be configured to
+different performance states for a single performance state of the sub-domain.
+This work is therefore deferred until hardware that needs it shows up.</p>
+<hr>
+<h2><a name="_interaction_with_opp_layer"></a>Interaction with OPP layer</h2>
+<p>While many devices never need to change their performance state
+requirements on the fly, a few do, based on their current operating
+performance point (OPP). Examples include a MultiMediaCard (MMC) controller
+and a CPU.</p>
+<p>Devices with fixed performance state requirements can call
+<code>dev_pm_genpd_set_performance_state()</code> just once, when their drivers enable
+them, and need not worry about the power domain&#8217;s performance state after
+that. Other devices may need to call
+<code>dev_pm_genpd_set_performance_state()</code> whenever they change their OPP, if the
+new OPP needs a different performance state. The OPP core has been enhanced to
+store the performance state corresponding to each OPP node of the device, so it
+can now convert an OPP into the performance state of the device&#8217;s power domain.
+The OPP core helper <code>dev_pm_opp_set_rate()</code> (described
+<a href="https://lwn.net/Articles/718632/">previously</a>) has also been updated to
+handle performance state updates automatically, along with clock and regulator
+updates.</p>
+<p>Ideally, the OPP core would get this information from the device tree (DT),
+but after several rounds of
+<a href="https://marc.info/?l=linux-kernel&amp;m=149410710629056&amp;w=2">discussion</a> on
+LKML we decided to merge a non-DT solution first and then attempt to add new DT
+bindings for power domain performance states. As a result, the OPP core gained a
+pair of new helpers to link a device&#8217;s OPPs to its power domain&#8217;s performance
+states.</p>
+<pre><code>    struct opp_table *dev_pm_opp_register_get_pstate_helper(struct device *dev,
+                    int (*get_pstate)(struct device *dev, unsigned long rate));</code></pre>
+<p>Here, <code>dev</code> is a pointer to the device structure and <code>get_pstate()</code> is a
+platform-specific callback that takes the device pointer <code>dev</code> and a clock
+<code>rate</code> as arguments, and returns the performance state corresponding to
+that <code>rate</code> on success or an error number on failure.
+<code>dev_pm_opp_register_get_pstate_helper()</code> returns a pointer to the OPP table on
+success and an error number (cast as a pointer) on failure. It must be called
+before any OPPs are added for the device, as the OPP core calls this callback
+while OPPs are being added to get the performance states corresponding to the
+OPPs (and hence to their target frequencies).
+<code>dev_pm_opp_register_get_pstate_helper()</code> takes a reference to the OPP table,
+which must be dropped (so that the table can be freed once it is no longer
+needed) with the following helper:</p>
+<pre><code>    void dev_pm_opp_unregister_get_pstate_helper(struct opp_table *opp_table);</code></pre>
+<p>Here, <code>opp_table</code> is the pointer to the OPP table, earlier returned by
+<code>dev_pm_opp_register_get_pstate_helper()</code>.</p>
+<p>Note that this pair of helpers is a temporary addition to the OPP core, to
+support the initial platforms that need to configure performance states of power
+domains. These helpers will be removed once proper DT bindings (and the
+corresponding kernel code) are in place.</p>
+<p>The basic infrastructure to implement platform-specific power domain drivers
+that allow configuring performance states is now in place, and it&#8217;s time to
+take this work to the next level. The
+<a href="https://marc.info/?l=linux-kernel&amp;m=150945404818511&amp;w=2">proposal</a> for DT
+bindings to provide the performance state information has already been posted
+on LKML, and code updates will be sent once the DT bindings are merged. In the
+future, we may also want to run the devices controlled by a power domain at the
+highest OPP permitted by the current performance state of the power domain. For
+example, a device may have requested performance state 5 because it currently
+needs to run at 900 MHz, but because of the votes from other devices controlled
+by the same power domain, the effective performance state selected is 8. At
+this point it may be better, for both power and performance, to run the device
+at 1.3 GHz (the highest device OPP supported at performance state 8), as that
+may not add much power consumption given that the power domain is already
+configured for state 8. This needs more thought, though, and work on it is in
+progress.</p>
+<hr><p><small>
+Last updated
+ 2017-11-14 16:01:10 IST
+</small></p>
+</body>
+</html>
diff --git a/genpd/genpd_performance_states.txt b/genpd/genpd_performance_states.txt
new file mode 100644
index 0000000..89aade9
--- /dev/null
+++ b/genpd/genpd_performance_states.txt
@@ -0,0 +1,157 @@
+Active state management of power domains
+========================================
+
+Power domains in the Linux kernel group devices that share clocks or other
+power resources and are enabled or disabled together, though the individual
+devices may still have fine-grained control over their own resources. Power
+domains can be nested; a nested domain is called a sub-domain of its master
+domain.
+
+Power domains support a limited number of operations today, most of which
+eventually resolve to enabling or disabling the domain, although the generic
+power domains (aka genpd) also support idle states. The 4.15 kernel release will
+enhance the genpd core to support active state management of power domains as
+well.
+
+Some platforms can control the active states of their power domains. Within
+the Linux kernel, the active states of power domains are called
+`performance states`. The performance states (within the genpd core) are
+identified by non-negative integer values; a lower value represents a lower
+performance state. Each device controlled by a power domain can vote for a
+target performance state based on its own requirements, and the power domain
+is configured to the highest performance state requested by its devices.
+Performance state zero is special: a device requests it to drop its vote, i.e.
+when it no longer wants to be considered in finding the target performance
+state of the power domain.
+
+The following helper is introduced for a device to request a performance state
+for its power domain.
+
+....
+    int dev_pm_genpd_set_performance_state(struct device *dev, unsigned int state);
+....
+
+Here, `dev` is a pointer to the device structure and `state` is the
+performance state that the device requires from the power domain controlling
+it. The helper first records `state` as the device's performance state
+constraint on its PM domain. The genpd core then computes the new performance
+state of the domain from the requests of all the devices it controls and, if
+that differs from the current state, updates the performance state of the power
+domain in a platform-dependent way. All of this happens synchronously: the
+performance state of the power domain has been updated, if required, by the
+time the helper returns. `dev_pm_genpd_set_performance_state()` returns zero
+on success and a negative error number otherwise; the return value `-ENODEV`
+is special and indicates that the device's power domain doesn't support
+configuring performance states.
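+
+To make the voting rules above concrete, here is a small userspace C model of
+what happens on a call to `dev_pm_genpd_set_performance_state()`. It is a
+sketch under stated assumptions, not the genpd core's code:
+`struct model_genpd`, `model_set_performance_state()` and
+`model_accept_state()` are hypothetical names, each device is identified by an
+array index, and the domain's on/off status is ignored. Restoring the previous
+vote when the callback fails is a choice made by this model.
+
+....
+    #include <errno.h>
+    #include <stddef.h>
+
+    #define MODEL_MAX_DEVS 8
+
+    /* Hypothetical userspace model of a power domain, not kernel code. */
+    struct model_genpd {
+        unsigned int votes[MODEL_MAX_DEVS]; /* one vote per device, 0 = none */
+        size_t ndevs;                       /* devices in use, <= MODEL_MAX_DEVS */
+        unsigned int cur_state;             /* state currently programmed */
+        int (*set_performance_state)(struct model_genpd *pd, unsigned int state);
+    };
+
+    /* Stand-in for a platform callback that accepts every state. */
+    int model_accept_state(struct model_genpd *pd, unsigned int state)
+    {
+        (void)pd;
+        (void)state;
+        return 0;
+    }
+
+    /* Record device @dev's vote and reprogram the domain if needed. */
+    int model_set_performance_state(struct model_genpd *pd, size_t dev,
+                                    unsigned int state)
+    {
+        unsigned int prev, target = 0;
+        size_t i;
+        int ret;
+
+        if (!pd->set_performance_state)
+            return -ENODEV;             /* no performance state support */
+        if (dev >= pd->ndevs || pd->ndevs > MODEL_MAX_DEVS)
+            return -EINVAL;
+
+        prev = pd->votes[dev];
+        pd->votes[dev] = state;         /* update this device's constraint */
+
+        for (i = 0; i < pd->ndevs; i++) /* highest vote wins; 0 votes are ignored */
+            if (pd->votes[i] > target)
+                target = pd->votes[i];
+
+        if (target == pd->cur_state)
+            return 0;                   /* nothing to reprogram */
+
+        ret = pd->set_performance_state(pd, target); /* synchronous */
+        if (ret) {
+            pd->votes[dev] = prev;      /* model choice: undo the vote */
+            return ret;
+        }
+
+        pd->cur_state = target;
+        return 0;
+    }
+....
+
+In this model, if device 0 votes for 4 and device 1 then votes for 6, the
+domain moves from state 4 to 6; when device 1 votes 0 to drop its vote, the
+domain goes back to 4, and the callback runs each time the aggregate changes.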
+
+On a call to `dev_pm_genpd_set_performance_state()`, the genpd core calls
+the domain-specific callback shown below if the performance state of the power
+domain needs to change. Power domain drivers that support configuring
+performance states must supply this callback.
+
+....
+    struct generic_pm_domain {
+        ...
+
+        int (*set_performance_state)(struct generic_pm_domain *genpd, unsigned int state);
+
+        ...
+    };
+....
+
+Here, `genpd` is the generic power domain and `state` is the target performance
+state computed from the requests of all the devices managed by `genpd`. As
+pointed out earlier, if the genpd doesn't set this callback,
+`dev_pm_genpd_set_performance_state()` returns `-ENODEV`.
+
+The mechanism by which the performance state of a power domain is changed is
+left to the implementation and is platform dependent. On some platforms, the
+`set_performance_state()` callback may configure regulators and/or clocks
+that are also managed by Linux. On others, the callback may instead pass the
+target performance state to firmware running on an external processor (not
+managed by Linux), which may then program the power resources itself.
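+
+As an illustration of the first case, the following userspace sketch shows
+what the body of a regulator-based callback might boil down to. The voltage
+table, `demo_set_performance_state()` and `demo_rail_uv` are hypothetical: the
+function takes only the state (not a `struct generic_pm_domain` pointer) to
+keep the example self-contained, and a plain variable stands in for the
+regulator.
+
+....
+    #include <errno.h>
+
+    /*
+     * Hypothetical platform data: the rail voltage (in microvolts) used for
+     * each performance state. The values are invented for illustration.
+     */
+    static const int demo_state_uv[] = { 800000, 850000, 900000, 950000 };
+
+    #define DEMO_NUM_STATES (sizeof(demo_state_uv) / sizeof(demo_state_uv[0]))
+
+    /* Stand-in for a regulator that Linux controls directly. */
+    int demo_rail_uv;
+
+    /*
+     * Body of a hypothetical set_performance_state() implementation: look up
+     * the voltage for @state and "program" the rail.
+     */
+    int demo_set_performance_state(unsigned int state)
+    {
+        if (state >= DEMO_NUM_STATES)
+            return -EINVAL;     /* the hardware has no such state */
+
+        /* A real driver might call regulator_set_voltage() here instead. */
+        demo_rail_uv = demo_state_uv[state];
+        return 0;
+    }
+....
+
+In the firmware-based case, the table lookup and the variable write would be
+replaced by a message carrying `state` to the external processor.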
+
+Also note that, in the current implementation, performance state updates
+aren't propagated from sub-domains to their master domains; only the devices
+directly controlled by a power domain (not its sub-domains) are considered while
+finding its effective performance state. None of the current hardware designs
+need this feature, and it needs more thought for several reasons. For example,
+there may not be a one-to-one mapping between the performance states of
+sub-domains and those of their master domains. A sub-domain can also have
+multiple master domains, and those master domains may need to be configured to
+different performance states for a single performance state of the sub-domain.
+This work is therefore deferred until hardware that needs it shows up.
+
+Interaction with OPP layer
+--------------------------
+
+While many devices never need to change their performance state
+requirements on the fly, a few do, based on their current operating
+performance point (OPP). Examples include a MultiMediaCard (MMC) controller
+and a CPU.
+
+Devices with fixed performance state requirements can call
+`dev_pm_genpd_set_performance_state()` just once, when their drivers enable
+them, and need not worry about the power domain's performance state after
+that. Other devices may need to call
+`dev_pm_genpd_set_performance_state()` whenever they change their OPP, if the
+new OPP needs a different performance state. The OPP core has been enhanced to
+store the performance state corresponding to each OPP node of the device, so it
+can now convert an OPP into the performance state of the device's power domain.
+The OPP core helper `dev_pm_opp_set_rate()` (described
+link:https://lwn.net/Articles/718632/[previously]) has also been updated to
+handle performance state updates automatically, along with clock and regulator
+updates.
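+
+The per-OPP bookkeeping described above can be sketched as a userspace model.
+It is not the OPP core's code: `struct demo_opp`, `struct demo_dev` and
+`demo_set_rate()` are hypothetical, rates must match an OPP exactly, and the
+genpd call and the clock update are replaced by plain assignments. It shows
+that moving between two OPPs that need the same performance state sends no new
+request to the power domain.
+
+....
+    #include <errno.h>
+    #include <stddef.h>
+
+    /* Hypothetical OPP entry: a clock rate and the domain state it needs. */
+    struct demo_opp {
+        unsigned long rate_hz;
+        unsigned int pstate;
+    };
+
+    /* Hypothetical per-device state kept by this model. */
+    struct demo_dev {
+        const struct demo_opp *opps;
+        size_t nopps;
+        unsigned long rate_hz;  /* current clock rate */
+        unsigned int voted;     /* performance state currently requested */
+        unsigned int nvotes;    /* number of requests sent to the domain */
+    };
+
+    /*
+     * Switch @d to the OPP whose rate is exactly @rate_hz. The domain is
+     * asked for a new performance state only if that OPP needs a different
+     * one than the device already requested.
+     */
+    int demo_set_rate(struct demo_dev *d, unsigned long rate_hz)
+    {
+        size_t i;
+
+        for (i = 0; i < d->nopps; i++) {
+            if (d->opps[i].rate_hz != rate_hz)
+                continue;
+            if (d->opps[i].pstate != d->voted) {
+                /* stand-in for dev_pm_genpd_set_performance_state() */
+                d->voted = d->opps[i].pstate;
+                d->nvotes++;
+            }
+            d->rate_hz = rate_hz;   /* stand-in for the clock update */
+            return 0;
+        }
+        return -EINVAL;             /* no OPP with this rate */
+    }
+....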
+
+Ideally, the OPP core would get this information from the device tree (DT),
+but after several rounds of
+link:https://marc.info/?l=linux-kernel&m=149410710629056&w=2[discussion] on
+LKML we decided to merge a non-DT solution first and then attempt to add new DT
+bindings for power domain performance states. As a result, the OPP core gained a
+pair of new helpers to link a device's OPPs to its power domain's performance
+states.
+
+....
+    struct opp_table *dev_pm_opp_register_get_pstate_helper(struct device *dev,
+                    int (*get_pstate)(struct device *dev, unsigned long rate));
+....
+
+Here, `dev` is a pointer to the device structure and `get_pstate()` is a
+platform-specific callback that takes the device pointer `dev` and a clock
+`rate` as arguments, and returns the performance state corresponding to that
+`rate` on success or an error number on failure.
+`dev_pm_opp_register_get_pstate_helper()` returns a pointer to the OPP table on
+success and an error number (cast as a pointer) on failure. It must be called
+before any OPPs are added for the device, as the OPP core calls this callback
+while OPPs are being added to get the performance states corresponding to the
+OPPs (and hence to their target frequencies).
+`dev_pm_opp_register_get_pstate_helper()` takes a reference to the OPP table,
+which must be dropped (so that the table can be freed once it is no longer
+needed) with the following helper:
+
+....
+    void dev_pm_opp_unregister_get_pstate_helper(struct opp_table *opp_table);
+....
+
+Here, `opp_table` is the pointer to the OPP table, earlier returned by
+`dev_pm_opp_register_get_pstate_helper()`.
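+
+The ordering rule and the reference counting can be modelled in a few lines of
+userspace C. Everything here is hypothetical: `struct demo_opp_table` and the
+`demo_*()` functions only mimic the helpers described above,
+`demo_get_pstate()` takes just the rate (the real callback also receives
+`dev`), registration returns an error code instead of a table pointer, this
+model refuses late registration with `-EBUSY`, and the rate thresholds are
+invented.
+
+....
+    #include <errno.h>
+    #include <stddef.h>
+
+    #define DEMO_MAX_OPPS 8
+
+    /* Hypothetical model of an OPP table; not the kernel's struct opp_table. */
+    struct demo_opp_table {
+        int (*get_pstate)(unsigned long rate);
+        unsigned long rate[DEMO_MAX_OPPS];
+        int pstate[DEMO_MAX_OPPS];
+        size_t count;   /* OPPs added so far */
+        int refs;       /* references held on the table */
+    };
+
+    /* Register the helper; refused once OPPs exist, as they would lack pstates. */
+    int demo_register_get_pstate(struct demo_opp_table *t,
+                                 int (*get_pstate)(unsigned long rate))
+    {
+        if (t->count)
+            return -EBUSY;
+        t->get_pstate = get_pstate;
+        t->refs++;      /* registration holds a table reference */
+        return 0;
+    }
+
+    /* Drop the helper and the reference taken at registration. */
+    void demo_unregister_get_pstate(struct demo_opp_table *t)
+    {
+        t->get_pstate = NULL;
+        t->refs--;
+    }
+
+    /* Add an OPP, asking the registered helper (if any) for its pstate. */
+    int demo_add_opp(struct demo_opp_table *t, unsigned long rate)
+    {
+        int pstate = 0;
+
+        if (t->count == DEMO_MAX_OPPS)
+            return -ENOSPC;
+        if (t->get_pstate) {
+            pstate = t->get_pstate(rate);
+            if (pstate < 0)
+                return pstate;  /* pass the helper's error on */
+        }
+        t->rate[t->count] = rate;
+        t->pstate[t->count] = pstate;
+        t->count++;
+        return 0;
+    }
+
+    /* Example platform helper with invented thresholds. */
+    int demo_get_pstate(unsigned long rate)
+    {
+        if (rate <= 400000000UL)
+            return 2;
+        if (rate <= 900000000UL)
+            return 5;
+        if (rate <= 1300000000UL)
+            return 8;
+        return -ERANGE;     /* faster than any state supports */
+    }
+....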
+
+Note that this pair of helpers is a temporary addition to the OPP core, to
+support the initial platforms that need to configure performance states of power
+domains. These helpers will be removed once proper DT bindings (and the
+corresponding kernel code) are in place.
+
+The basic infrastructure to implement platform-specific power domain drivers
+that allow configuring performance states is now in place, and it's time to
+take this work to the next level. The
+link:https://marc.info/?l=linux-kernel&m=150945404818511&w=2[proposal] for DT
+bindings to provide the performance state information has already been posted
+on LKML, and code updates will be sent once the DT bindings are merged. In the
+future, we may also want to run the devices controlled by a power domain at the
+highest OPP permitted by the current performance state of the power domain. For
+example, a device may have requested performance state 5 because it currently
+needs to run at 900 MHz, but because of the votes from other devices controlled
+by the same power domain, the effective performance state selected is 8. At
+this point it may be better, for both power and performance, to run the device
+at 1.3 GHz (the highest device OPP supported at performance state 8), as that
+may not add much power consumption given that the power domain is already
+configured for state 8. This needs more thought, though, and work on it is in
+progress.
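+
+The selection step in this example is simple to state in code. The sketch below
+only models the idea and is not planned kernel code: `struct demo_opp` and
+`demo_best_rate()` are hypothetical, and the table is assumed to list each OPP
+with the minimum performance state it requires. Given the OPPs from the
+example (900 MHz at state 5, 1.3 GHz at state 8) and an effective state of 8,
+it picks 1.3 GHz.
+
+....
+    #include <stddef.h>
+
+    /* Hypothetical OPP entry: a clock rate and the domain state it needs. */
+    struct demo_opp {
+        unsigned long rate_hz;
+        unsigned int pstate;
+    };
+
+    /*
+     * Return the highest rate in @opps whose required performance state is
+     * satisfied by @effective, or 0 if no OPP fits.
+     */
+    unsigned long demo_best_rate(const struct demo_opp *opps, size_t n,
+                                 unsigned int effective)
+    {
+        unsigned long best = 0;
+        size_t i;
+
+        for (i = 0; i < n; i++)
+            if (opps[i].pstate <= effective && opps[i].rate_hz > best)
+                best = opps[i].rate_hz;
+        return best;
+    }
+....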