Diffstat (limited to 'genpd/genpd_performance_states.txt')
-rw-r--r-- | genpd/genpd_performance_states.txt | 157 |
1 files changed, 157 insertions, 0 deletions
diff --git a/genpd/genpd_performance_states.txt b/genpd/genpd_performance_states.txt
new file mode 100644
index 0000000..89aade9
--- /dev/null
+++ b/genpd/genpd_performance_states.txt
@@ -0,0 +1,157 @@
+Active state management of power domains
+========================================
+
+The Linux kernel power domains are used to group devices that share clock or
+other power resources and are all enabled or disabled together, though these
+devices may further have fine-grained control over individual resources. Power
+domains can be nested; the nested domain is called a sub-domain of the master
+domain.
+
+The power domains support a limited number of operations today, most of which
+eventually resolve to enabling or disabling the power domain, though the generic
+power domains (aka genpd) support idle states of the power domains as well. The
+4.15 kernel release, though, will enhance the generic power domain core to
+support active state management of the generic power domains.
+
+Some platforms have the capability to control the active states of their power
+domains. The active states of power domains are called `performance states`
+within the Linux kernel. The performance states (within the genpd core) are
+identified by positive integer values; a lower value represents a lower
+performance state. All the devices controlled by a power domain can vote for a
+target performance state, based on their own requirements, and the power domain
+will get configured to the highest target performance state requested by its
+devices. The performance state zero is special; devices can request
+performance state zero if they want to drop their vote, i.e. they do not want to
+get considered in finding the target performance state of the power domain.
+
+The following helper is introduced for a device to request a performance state
+for its power domain.
+
+....
+    int dev_pm_genpd_set_performance_state(struct device *dev, unsigned int state);
+....
+
+Here, `dev` is the pointer to the device structure and `state` is the target
+performance state of the power domain that controls the device. Once called,
+this updates the performance state constraint of the device on its PM domain.
+Following that, the genpd core finds the next performance state of the genpd
+based on the requests from the devices the genpd controls, and then updates the
+performance state of the power domain in a platform-dependent way. This
+happens synchronously and the performance state of the power domain is updated,
+if required, before this helper returns. `dev_pm_genpd_set_performance_state()`
+returns zero on success and an error number otherwise; the return value `-ENODEV`
+is special and is returned if the power domain of the device doesn't support
+configuring performance states.
+
+On a call to `dev_pm_genpd_set_performance_state()`, the genpd core calls the
+power domain specific callback (described below) if the performance state of the
+power domain needs to be updated. This callback must be supplied by the power
+domain drivers that support configuring performance states.
+
+....
+    struct generic_pm_domain {
+        ...
+
+        int (*set_performance_state)(struct generic_pm_domain *genpd, unsigned int state);
+
+        ...
+    };
+....
+
+Here, `genpd` is the generic power domain and `state` is the target performance
+state based on the requests from all the devices managed by the `genpd`. As
+pointed out earlier, if the genpd doesn't have this callback set, the helper
+`dev_pm_genpd_set_performance_state()` returns `-ENODEV`.
+
+The mechanism by which the performance state of a power domain is changed is
+left to the implementation and is platform dependent. For some platforms the
+`set_performance_state()` callback may configure some regulator(s)
+and/or clock(s), which are also managed by Linux.
+In other cases, the `set_performance_state()` callback may end up informing the
+firmware running on an external processor (not managed by Linux) about the target
+performance state, which may eventually program the power resources locally.
+
+Also note that in the current implementation, performance state updates aren't
+propagated to master domains from sub-domains, and only devices (i.e. no
+sub-domains) directly controlled by the power domain are considered while
+finding its effective performance state. The reason is that none of the current
+hardware designs has a configuration that needs this feature, and more
+thought needs to be put into that for various reasons. For example, there may not
+be a one-to-one mapping between performance states of sub-domains and their master
+domains. We can also have multiple master domains for a sub-domain, and the
+master domains may need to be configured to different performance states for a
+single performance state of the sub-domain. And so this work is deferred until
+the time we have hardware that needs it.
+
+Interaction with OPP layer
+--------------------------
+
+While a lot of devices do not need to change their performance state
+requirements on the fly, there are a few that do, based on their own operating
+performance point (OPP). Examples of such devices are a Multi Media Card (MMC)
+controller or a CPU.
+
+Devices with fixed performance state requirements can call
+`dev_pm_genpd_set_performance_state()` just once, while they are enabled by
+their drivers, and they don't need to worry about the power domain's performance
+state after that. But other devices may need to call
+`dev_pm_genpd_set_performance_state()` whenever they change their OPP, if the
+performance state is different for the new OPP. The OPP core is enhanced to
+store a performance state corresponding to each OPP node of the device and can
+now convert an OPP to the performance state of the device's power domain.
+The OPP core helper `dev_pm_opp_set_rate()` (described
+link:https://lwn.net/Articles/718632/[previously]) is also updated to handle
+performance state updates automatically along with clock and regulator updates.
+
+Ideally, the OPP core should get this information from the device tree (DT)
+somehow, but after several rounds of
+link:https://marc.info/?l=linux-kernel&m=149410710629056&w=2[discussion] over
+LKML we decided to merge a non-DT solution first and then attempt to add new DT
+bindings for power domain performance states. As a result, the OPP core gained a
+pair of new helpers to link a device's OPP to its power domain's performance
+state.
+
+....
+    struct opp_table *dev_pm_opp_register_get_pstate_helper(struct device *dev,
+            int (*get_pstate)(struct device *dev, unsigned long rate));
+....
+
+Here, `dev` is the pointer to the device structure and `get_pstate()` is the
+platform specific callback that takes the device pointer `dev` and its clock
+`rate` as arguments and returns the performance state corresponding to the device's
+`rate` on success or an error number on failure.
+`dev_pm_opp_register_get_pstate_helper()` returns a pointer to the OPP table on
+success and an error number (cast as a pointer) on failure. It must be called
+before any OPPs are added for the device, as the OPP core calls this callback
+while OPPs are added to get the performance state corresponding to OPPs (and hence
+target frequencies). `dev_pm_opp_register_get_pstate_helper()` takes a
+reference of the OPP table and that must be put (so that the table can get freed
+once we don't need it anymore) with the help of the following helper:
+
+....
+    void dev_pm_opp_unregister_get_pstate_helper(struct opp_table *opp_table);
+....
+
+Here, `opp_table` is the pointer to the OPP table, earlier returned by
+`dev_pm_opp_register_get_pstate_helper()`.
+
+Note that the above pair of helpers is added temporarily to the OPP core to
+support initial platforms that need to configure performance states of power
+domains. These helpers will get removed once we have proper DT bindings (and
+corresponding kernel code) in place.
+
+The basic infrastructure is in place now to implement platform specific power
+domain drivers that allow configuring performance state, and it's time to take
+this work to the next level. The
+link:https://marc.info/?l=linux-kernel&m=150945404818511&w=2[proposal] for DT
+bindings to get the performance state information is already posted on LKML and
+code updates will be sent once DT bindings are merged. In future, we may also
+want to drive the devices controlled by a power domain at the highest OPP
+permitted by the current performance state of the power domain. For example, a
+device may have requested performance state 5 as it needs to run at 900 MHz
+currently, but because of the votes from other devices (controlled by the same
+power domain) the effective performance state selected is 8. At this point it
+may be better, power and performance wise, to run the device at 1.3 GHz (highest
+device OPP supported at performance state 8) as that may not add much
+power consumption, since the power domain is already configured for state 8. But
+it needs more thought, and work is in progress for that.
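
Note appended after the patch: the voting rule and the closing 900 MHz / 1.3 GHz example can be made concrete with a small user-space model. This is only a sketch under stated assumptions, not kernel code. `struct opp_entry`, `effective_pstate()` and `highest_rate_for_pstate()` are hypothetical names that do not exist in genpd or the OPP core. The model assumes the effective state is the maximum of all votes, with zero meaning "no vote".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names exist in the kernel. */
struct opp_entry {
    unsigned long rate_hz;  /* clock rate of this OPP */
    unsigned int pstate;    /* domain performance state this OPP requires */
};

/* The highest vote wins; a vote of zero means the device dropped its vote. */
static unsigned int effective_pstate(const unsigned int *votes, size_t n)
{
    unsigned int max = 0;
    size_t i;

    assert(votes != NULL || n == 0);
    for (i = 0; i < n; i++)
        if (votes[i] > max)
            max = votes[i];
    return max;
}

/*
 * Highest rate whose required state fits within the given domain state.
 * Returns 0 if no OPP fits.
 */
static unsigned long highest_rate_for_pstate(const struct opp_entry *opps,
                                             size_t n, unsigned int pstate)
{
    unsigned long best = 0;
    size_t i;

    assert(opps != NULL || n == 0);
    for (i = 0; i < n; i++)
        if (opps[i].pstate <= pstate && opps[i].rate_hz > best)
            best = opps[i].rate_hz;
    return best;
}
```

With votes of 5 and 8 on the same domain, `effective_pstate()` yields 8. Given a table that maps 900 MHz to state 5 and 1.3 GHz to state 8, `highest_rate_for_pstate()` then picks 1.3 GHz for the device that only asked for state 5, which is the opportunistic behaviour the last paragraph of the article describes as future work.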