Diffstat (limited to 'Documentation/cpu-freq')
-rw-r--r--  Documentation/cpu-freq/boost.txt          |  93
-rw-r--r--  Documentation/cpu-freq/core.txt           |  28
-rw-r--r--  Documentation/cpu-freq/cpu-drivers.txt    | 181
-rw-r--r--  Documentation/cpu-freq/cpufreq-stats.txt  |  32
-rw-r--r--  Documentation/cpu-freq/governors.txt      | 269
-rw-r--r--  Documentation/cpu-freq/index.txt          |  22
-rw-r--r--  Documentation/cpu-freq/intel-pstate.txt   | 222
-rw-r--r--  Documentation/cpu-freq/pcc-cpufreq.txt    |   4
-rw-r--r--  Documentation/cpu-freq/user-guide.txt     | 222
9 files changed, 143 insertions, 930 deletions
diff --git a/Documentation/cpu-freq/boost.txt b/Documentation/cpu-freq/boost.txt
deleted file mode 100644
index dd62e1334f0a..000000000000
--- a/Documentation/cpu-freq/boost.txt
+++ /dev/null
@@ -1,93 +0,0 @@
-Processor boosting control
-
-	- information for users -
-
-Quick guide for the impatient:
---------------------
-/sys/devices/system/cpu/cpufreq/boost
-controls the boost setting for the whole system. You can read and write
-that file with either "0" (boosting disabled) or "1" (boosting allowed).
-Reading or writing 1 does not mean that the system is boosting at this
-very moment, but only that the CPU _may_ raise the frequency at it's
-discretion.
---------------------
-
-Introduction
--------------
-Some CPUs support a functionality to raise the operating frequency of
-some cores in a multi-core package if certain conditions apply, mostly
-if the whole chip is not fully utilized and below it's intended thermal
-budget. The decision about boost disable/enable is made either at hardware
-(e.g. x86) or software (e.g ARM).
-On Intel CPUs this is called "Turbo Boost", AMD calls it "Turbo-Core",
-in technical documentation "Core performance boost". In Linux we use
-the term "boost" for convenience.
-
-Rationale for disable switch
-----------------------------
-
-Though the idea is to just give better performance without any user
-intervention, sometimes the need arises to disable this functionality.
-Most systems offer a switch in the (BIOS) firmware to disable the
-functionality at all, but a more fine-grained and dynamic control would
-be desirable:
-1. While running benchmarks, reproducible results are important. Since
-   the boosting functionality depends on the load of the whole package,
-   single thread performance can vary. By explicitly disabling the boost
-   functionality at least for the benchmark's run-time the system will run
-   at a fixed frequency and results are reproducible again.
-2. To examine the impact of the boosting functionality it is helpful
-   to do tests with and without boosting.
-3. Boosting means overclocking the processor, though under controlled
-   conditions. By raising the frequency and the voltage the processor
-   will consume more power than without the boosting, which may be
-   undesirable for instance for mobile users. Disabling boosting may
-   save power here, though this depends on the workload.
-
-
-User controlled switch
-----------------------
-
-To allow the user to toggle the boosting functionality, the cpufreq core
-driver exports a sysfs knob to enable or disable it. There is a file:
-/sys/devices/system/cpu/cpufreq/boost
-which can either read "0" (boosting disabled) or "1" (boosting enabled).
-The file is exported only when cpufreq driver supports boosting.
-Explicitly changing the permissions and writing to that file anyway will
-return EINVAL.
-
-On supported CPUs one can write either a "0" or a "1" into this file.
-This will either disable the boost functionality on all cores in the
-whole system (0) or will allow the software or hardware to boost at will
-(1).
-
-Writing a "1" does not explicitly boost the system, but just allows the
-CPU to boost at their discretion. Some implementations take external
-factors like the chip's temperature into account, so boosting once does
-not necessarily mean that it will occur every time even using the exact
-same software setup.
-
-
-AMD legacy cpb switch
----------------------
-The AMD powernow-k8 driver used to support a very similar switch to
-disable or enable the "Core Performance Boost" feature of some AMD CPUs.
-This switch was instantiated in each CPU's cpufreq directory
-(/sys/devices/system/cpu[0-9]*/cpufreq) and was called "cpb".
-Though the per CPU existence hints at a more fine grained control, the
-actual implementation only supported a system-global switch semantics,
-which was simply reflected into each CPU's file. Writing a 0 or 1 into it
-would pull the other CPUs to the same state.
-For compatibility reasons this file and its behavior is still supported
-on AMD CPUs, though it is now protected by a config switch
-(X86_ACPI_CPUFREQ_CPB). On Intel CPUs this file will never be created,
-even with the config option set.
-This functionality is considered legacy and will be removed in some future
-kernel version.
-
-More fine grained boosting control
-----------------------------------
-
-Technically it is possible to switch the boosting functionality at least
-on a per package basis, for some CPUs even per core. Currently the driver
-does not support it, but this may be implemented in the future.
diff --git a/Documentation/cpu-freq/core.txt b/Documentation/cpu-freq/core.txt
index ba78e7c2a069..978463a7c81e 100644
--- a/Documentation/cpu-freq/core.txt
+++ b/Documentation/cpu-freq/core.txt
@@ -8,6 +8,8 @@
 
 		    Dominik Brodowski  <linux@brodo.de>
 		     David Kimdon <dwhedon@debian.org>
+		Rafael J. Wysocki <rafael.j.wysocki@intel.com>
+		   Viresh Kumar <viresh.kumar@linaro.org>
 
 
 
@@ -36,10 +38,11 @@ speed limits (like LCD drivers on ARM architecture). Additionally, the
 kernel "constant" loops_per_jiffy is updated on frequency changes
 here.
 
-Reference counting is done by cpufreq_get_cpu and cpufreq_put_cpu,
-which make sure that the cpufreq processor driver is correctly
-registered with the core, and will not be unloaded until
-cpufreq_put_cpu is called.
+Reference counting of the cpufreq policies is done by cpufreq_cpu_get
+and cpufreq_cpu_put, which make sure that the cpufreq driver is
+correctly registered with the core, and will not be unloaded until
+cpufreq_put_cpu is called. That also ensures that the respective cpufreq
+policy doesn't get freed while being used.
 
 2. CPUFreq notifiers
 ====================
@@ -69,18 +72,16 @@ CPUFreq policy notifier is called twice for a policy transition:
 
 The phase is specified in the second argument to the notifier.
 
 The third argument, a void *pointer, points to a struct cpufreq_policy
-consisting of five values: cpu, min, max, policy and max_cpu_freq. min
-and max are the lower and upper frequencies (in kHz) of the new
-policy, policy the new policy, cpu the number of the affected CPU; and
-max_cpu_freq the maximum supported CPU frequency. This value is given
-for informational purposes only.
+consisting of several values, including min, max (the lower and upper
+frequencies (in kHz) of the new policy).
 
 
 2.2 CPUFreq transition notifiers
 --------------------------------
 
-These are notified twice when the CPUfreq driver switches the CPU core
-frequency and this change has any external implications.
+These are notified twice for each online CPU in the policy, when the
+CPUfreq driver switches the CPU core frequency and this change has no
+any external implications.
 
 The second argument specifies the phase - CPUFREQ_PRECHANGE or
 CPUFREQ_POSTCHANGE.
@@ -90,13 +91,14 @@ values:
 cpu	- number of the affected CPU
 old	- old frequency
 new	- new frequency
+flags	- flags of the cpufreq driver
 
 3. CPUFreq Table Generation with Operating Performance Point (OPP)
 ==================================================================
 For details about OPP, see Documentation/power/opp.txt
 
 dev_pm_opp_init_cpufreq_table - cpufreq framework typically is initialized with
-	cpufreq_frequency_table_cpuinfo which is provided with the list of
+	cpufreq_table_validate_and_show() which is provided with the list of
 	frequencies that are available for operation. This function provides
 	a ready to use conversion routine to translate the OPP layer's internal
 	information about the available frequencies into a format readily
@@ -110,7 +112,7 @@ dev_pm_opp_init_cpufreq_table - cpufreq framework typically is initialized with
 	/* Do things */
 	r = dev_pm_opp_init_cpufreq_table(dev, &freq_table);
 	if (!r)
-		cpufreq_frequency_table_cpuinfo(policy, freq_table);
+		cpufreq_table_validate_and_show(policy, freq_table);
 	/* Do other things */
  }
 
diff --git a/Documentation/cpu-freq/cpu-drivers.txt b/Documentation/cpu-freq/cpu-drivers.txt
index 14f4e6336d88..434c49cc7330 100644
--- a/Documentation/cpu-freq/cpu-drivers.txt
+++ b/Documentation/cpu-freq/cpu-drivers.txt
@@ -9,6 +9,8 @@
 
 
 		    Dominik Brodowski  <linux@brodo.de>
+		Rafael J. Wysocki <rafael.j.wysocki@intel.com>
+		   Viresh Kumar <viresh.kumar@linaro.org>
 
 
 
@@ -49,49 +51,65 @@ using cpufreq_register_driver()
 
 What shall this struct cpufreq_driver contain?
 
-cpufreq_driver.name -		The name of this driver.
+ .name - The name of this driver.
 
-cpufreq_driver.init -		A pointer to the per-CPU initialization
-				function.
+ .init - A pointer to the per-policy initialization function.
 
-cpufreq_driver.verify -		A pointer to a "verification" function.
+ .verify - A pointer to a "verification" function.
 
-cpufreq_driver.setpolicy _or_
-cpufreq_driver.target/
-target_index		-	See below on the differences.
+ .setpolicy _or_ .fast_switch _or_ .target _or_ .target_index - See
+ below on the differences.
 
 And optionally
 
-cpufreq_driver.exit -		A pointer to a per-CPU cleanup
-				function called during CPU_POST_DEAD
-				phase of cpu hotplug process.
+ .flags - Hints for the cpufreq core.
 
-cpufreq_driver.stop_cpu -	A pointer to a per-CPU stop function
-				called during CPU_DOWN_PREPARE phase of
-				cpu hotplug process.
+ .driver_data - cpufreq driver specific data.
 
-cpufreq_driver.resume -		A pointer to a per-CPU resume function
-				which is called with interrupts disabled
-				and _before_ the pre-suspend frequency
-				and/or policy is restored by a call to
-				->target/target_index or ->setpolicy.
+ .resolve_freq - Returns the most appropriate frequency for a target
+ frequency. Doesn't change the frequency though.
 
-cpufreq_driver.attr -		A pointer to a NULL-terminated list of
-				"struct freq_attr" which allow to
-				export values to sysfs.
+ .get_intermediate and target_intermediate - Used to switch to stable
+ frequency while changing CPU frequency.
 
-cpufreq_driver.get_intermediate
-and target_intermediate		Used to switch to stable frequency while
-				changing CPU frequency.
+ .get - Returns current frequency of the CPU.
+
+ .bios_limit - Returns HW/BIOS max frequency limitations for the CPU.
+
+ .exit - A pointer to a per-policy cleanup function called during
+ CPU_POST_DEAD phase of cpu hotplug process.
+
+ .stop_cpu - A pointer to a per-policy stop function called during
+ CPU_DOWN_PREPARE phase of cpu hotplug process.
+
+ .suspend - A pointer to a per-policy suspend function which is called
+ with interrupts disabled and _after_ the governor is stopped for the
+ policy.
+
+ .resume - A pointer to a per-policy resume function which is called
+ with interrupts disabled and _before_ the governor is started again.
+
+ .ready - A pointer to a per-policy ready function which is called after
+ the policy is fully initialized.
+
+ .attr - A pointer to a NULL-terminated list of "struct freq_attr" which
+ allow to export values to sysfs.
+
+ .boost_enabled - If set, boost frequencies are enabled.
+
+ .set_boost - A pointer to a per-policy function to enable/disable boost
+ frequencies.
 
 
 1.2 Per-CPU Initialization
 --------------------------
 
 Whenever a new CPU is registered with the device model, or after the
-cpufreq driver registers itself, the per-CPU initialization function
-cpufreq_driver.init is called. It takes a struct cpufreq_policy
-*policy as argument. What to do now?
+cpufreq driver registers itself, the per-policy initialization function
+cpufreq_driver.init is called if no cpufreq policy existed for the CPU.
+Note that the .init() and .exit() routines are called only once for the
+policy and not for each CPU managed by the policy. It takes a struct
+cpufreq_policy *policy as argument. What to do now?
 
 If necessary, activate the CPUfreq support on your CPU.
 
@@ -117,47 +135,45 @@ policy->governor		must contain the "default policy" for
 				cpufreq_driver.setpolicy or
 				cpufreq_driver.target/target_index is called
 				with these values.
+policy->cpus			Update this with the masks of the
+				(online + offline) CPUs that do DVFS
+				along with this CPU (i.e. that share
+				clock/voltage rails with it).
 
 For setting some of these values (cpuinfo.min[max]_freq, policy->min[max]), the
 frequency table helpers might be helpful. See the section 2 for more information
 on them.
 
-SMP systems normally have same clock source for a group of cpus. For these the
-.init() would be called only once for the first online cpu. Here the .init()
-routine must initialize policy->cpus with mask of all possible cpus (Online +
-Offline) that share the clock. Then the core would copy this mask onto
-policy->related_cpus and will reset policy->cpus to carry only online cpus.
-
 
 1.3 verify
-------------
+----------
 
 When the user decides a new policy (consisting of
 "policy,governor,min,max") shall be set, this policy must be validated
 so that incompatible values can be corrected. For verifying these
-values, a frequency table helper and/or the
-cpufreq_verify_within_limits(struct cpufreq_policy *policy, unsigned
-int min_freq, unsigned int max_freq) function might be helpful. See
-section 2 for details on frequency table helpers.
+values cpufreq_verify_within_limits(struct cpufreq_policy *policy,
+unsigned int min_freq, unsigned int max_freq) function might be helpful.
+See section 2 for details on frequency table helpers.
 
 You need to make sure that at least one valid frequency (or operating
 range) is within policy->min and policy->max. If necessary, increase
 policy->max first, and only if this is no solution, decrease policy->min.
 
 
-1.4 target/target_index or setpolicy?
-----------------------------
+1.4 target or target_index or setpolicy or fast_switch?
+-------------------------------------------------------
 
 Most cpufreq drivers or even most cpu frequency scaling algorithms
-only allow the CPU to be set to one frequency. For these, you use the
-->target/target_index call.
+only allow the CPU frequency to be set to predefined fixed values. For
+these, you use the ->target(), ->target_index() or ->fast_switch()
+callbacks.
 
-Some cpufreq-capable processors switch the frequency between certain
-limits on their own. These shall use the ->setpolicy call
+Some cpufreq capable processors switch the frequency between certain
+limits on their own. These shall use the ->setpolicy() callback.
 
 
 1.5. target/target_index
--------------
+------------------------
 
 The target_index call has two arguments: struct cpufreq_policy *policy,
 and unsigned int index (into the exposed frequency table).
@@ -186,9 +202,20 @@ actual frequency must be determined using the following rules:
 Here again the frequency table helper might assist you - see section 2
 for details.
 
+1.6. fast_switch
+----------------
 
-1.6 setpolicy
----------------
+This function is used for frequency switching from scheduler's context.
+Not all drivers are expected to implement it, as sleeping from within
+this callback isn't allowed. This callback must be highly optimized to
+do switching as fast as possible.
+
+This function has two arguments: struct cpufreq_policy *policy and
+unsigned int target_frequency.
+
+
+1.7 setpolicy
+-------------
 
 The setpolicy call only takes a struct cpufreq_policy *policy as
 argument. You need to set the lower limit of the in-processor or
@@ -198,13 +225,13 @@ setting when policy->policy is CPUFREQ_POLICY_PERFORMANCE, and a
 powersaving-oriented setting when CPUFREQ_POLICY_POWERSAVE. Also check
 the reference implementation in drivers/cpufreq/longrun.c
 
-1.7 get_intermediate and target_intermediate
+1.8 get_intermediate and target_intermediate
 --------------------------------------------
 
 Only for drivers with target_index() and CPUFREQ_ASYNC_NOTIFICATION unset.
 
 get_intermediate should return a stable intermediate frequency platform wants to
-switch to, and target_intermediate() should set CPU to to that frequency, before
+switch to, and target_intermediate() should set CPU to that frequency, before
 jumping to the frequency corresponding to 'index'. Core will take care of
 sending notifications and driver doesn't have to handle them in
 target_intermediate() or target_index().
@@ -222,44 +249,36 @@ failures as core would send notifications for that.
 
 As most cpufreq processors only allow for being set to a few specific
 frequencies, a "frequency table" with some functions might assist in
-some work of the processor driver. Such a "frequency table" consists
-of an array of struct cpufreq_frequency_table entries, with any value in
-"driver_data" you want to use, and the corresponding frequency in
-"frequency". At the end of the table, you need to add a
-cpufreq_frequency_table entry with frequency set to CPUFREQ_TABLE_END. And
-if you want to skip one entry in the table, set the frequency to
-CPUFREQ_ENTRY_INVALID. The entries don't need to be in ascending
-order.
-
-By calling cpufreq_frequency_table_cpuinfo(struct cpufreq_policy *policy,
-					struct cpufreq_frequency_table *table);
-the cpuinfo.min_freq and cpuinfo.max_freq values are detected, and
-policy->min and policy->max are set to the same values. This is
-helpful for the per-CPU initialization stage.
-
-int cpufreq_frequency_table_verify(struct cpufreq_policy *policy,
-				   struct cpufreq_frequency_table *table);
-assures that at least one valid frequency is within policy->min and
-policy->max, and all other criteria are met. This is helpful for the
-->verify call.
-
-int cpufreq_frequency_table_target(struct cpufreq_policy *policy,
-                                   struct cpufreq_frequency_table *table,
-                                   unsigned int target_freq,
-                                   unsigned int relation,
-                                   unsigned int *index);
-
-is the corresponding frequency table helper for the ->target
-stage. Just pass the values to this function, and the unsigned int
-index returns the number of the frequency table entry which contains
-the frequency the CPU shall be set to.
+some work of the processor driver. Such a "frequency table" consists of
+an array of struct cpufreq_frequency_table entries, with driver specific
+values in "driver_data", the corresponding frequency in "frequency" and
+flags set. At the end of the table, you need to add a
+cpufreq_frequency_table entry with frequency set to CPUFREQ_TABLE_END.
+And if you want to skip one entry in the table, set the frequency to
+CPUFREQ_ENTRY_INVALID. The entries don't need to be in sorted in any
+particular order, but if they are cpufreq core will do DVFS a bit
+quickly for them as search for best match is faster.
+
+By calling cpufreq_table_validate_and_show(), the cpuinfo.min_freq and
+cpuinfo.max_freq values are detected, and policy->min and policy->max
+are set to the same values. This is helpful for the per-CPU
+initialization stage.
+
+cpufreq_frequency_table_verify() assures that at least one valid
+frequency is within policy->min and policy->max, and all other criteria
+are met. This is helpful for the ->verify call.
+
+cpufreq_frequency_table_target() is the corresponding frequency table
+helper for the ->target stage. Just pass the values to this function,
+and this function returns the of the frequency table entry which
+contains the frequency the CPU shall be set to.
 
 The following macros can be used as iterators over cpufreq_frequency_table:
 
 cpufreq_for_each_entry(pos, table) - iterates over all entries of frequency
 table.
 
-cpufreq-for_each_valid_entry(pos, table) - iterates over all entries,
+cpufreq_for_each_valid_entry(pos, table) - iterates over all entries,
 excluding CPUFREQ_ENTRY_INVALID frequencies.
 Use arguments "pos" - a cpufreq_frequency_table * as a loop cursor and
 "table" - the cpufreq_frequency_table * you want to iterate over.
diff --git a/Documentation/cpu-freq/cpufreq-stats.txt b/Documentation/cpu-freq/cpufreq-stats.txt
index fc647492e940..2bbe207354ed 100644
--- a/Documentation/cpu-freq/cpufreq-stats.txt
+++ b/Documentation/cpu-freq/cpufreq-stats.txt
@@ -34,21 +34,27 @@ cpufreq stats provides following statistics (explained in detail below).
 -  total_trans
 -  trans_table
 
-All the statistics will be from the time the stats driver has been inserted
-to the time when a read of a particular statistic is done. Obviously, stats
-driver will not have any information about the frequency transitions before
-the stats driver insertion.
+All the statistics will be from the time the stats driver has been inserted
+(or the time the stats were reset) to the time when a read of a particular
+statistic is done. Obviously, stats driver will not have any information
+about the frequency transitions before the stats driver insertion.
 
 --------------------------------------------------------------------------------
 <mysystem>:/sys/devices/system/cpu/cpu0/cpufreq/stats # ls -l
 total 0
 drwxr-xr-x  2 root root    0 May 14 16:06 .
 drwxr-xr-x  3 root root    0 May 14 15:58 ..
+--w-------  1 root root 4096 May 14 16:06 reset
 -r--r--r--  1 root root 4096 May 14 16:06 time_in_state
 -r--r--r--  1 root root 4096 May 14 16:06 total_trans
 -r--r--r--  1 root root 4096 May 14 16:06 trans_table
 --------------------------------------------------------------------------------
 
+- reset
+Write-only attribute that can be used to reset the stat counters. This can be
+useful for evaluating system behaviour under different governors without the
+need for a reboot.
+
 - time_in_state
 This gives the amount of time spent in each of the frequencies supported by
 this CPU. The cat output will have "<frequency> <time>" pair in each line, which
@@ -103,26 +109,14 @@ Config Main Menu
 	Power management options (ACPI, APM)  --->
 		CPU Frequency scaling  --->
 			[*] CPU Frequency scaling
-			<*>   CPU frequency translation statistics
-			[*]     CPU frequency translation statistics details
+			[*]   CPU frequency translation statistics
 
 
 "CPU Frequency scaling" (CONFIG_CPU_FREQ) should be enabled to configure
 cpufreq-stats.
 
 "CPU frequency translation statistics" (CONFIG_CPU_FREQ_STAT) provides the
-basic statistics which includes time_in_state and total_trans.
-
-"CPU frequency translation statistics details" (CONFIG_CPU_FREQ_STAT_DETAILS)
-provides fine grained cpufreq stats by trans_table. The reason for having a
-separate config option for trans_table is:
-- trans_table goes against the traditional /sysfs rule of one value per
-  interface. It provides a whole bunch of value in a 2 dimensional matrix
-  form.
+statistics which includes time_in_state, total_trans and trans_table.
 
-Once these two options are enabled and your CPU supports cpufrequency, you
+Once this option is enabled and your CPU supports cpufrequency, you
 will be able to see the CPU frequency statistics in /sysfs.
-
-
-
-
diff --git a/Documentation/cpu-freq/governors.txt b/Documentation/cpu-freq/governors.txt
deleted file mode 100644
index c15aa75f5227..000000000000
--- a/Documentation/cpu-freq/governors.txt
+++ /dev/null
@@ -1,269 +0,0 @@
-     CPU frequency and voltage scaling code in the Linux(TM) kernel
-
-
-                         L i n u x    C P U F r e q
-
-                      C P U F r e q   G o v e r n o r s
-
-                   - information for users and developers -
-
-
-                    Dominik Brodowski  <linux@brodo.de>
-            some additions and corrections by Nico Golde <nico@ngolde.de>
-
-
-
-   Clock scaling allows you to change the clock speed of the CPUs on the
-    fly. This is a nice method to save battery power, because the lower
-            the clock speed, the less power the CPU consumes.
-
-
-Contents:
----------
-1.   What is a CPUFreq Governor?
-
-2.   Governors In the Linux Kernel
-2.1  Performance
-2.2  Powersave
-2.3  Userspace
-2.4  Ondemand
-2.5  Conservative
-
-3.   The Governor Interface in the CPUfreq Core
-
-
-
-1. What Is A CPUFreq Governor?
-==============================
-
-Most cpufreq drivers (except the intel_pstate and longrun) or even most
-cpu frequency scaling algorithms only offer the CPU to be set to one
-frequency. In order to offer dynamic frequency scaling, the cpufreq
-core must be able to tell these drivers of a "target frequency". So
-these specific drivers will be transformed to offer a "->target/target_index"
-call instead of the existing "->setpolicy" call. For "longrun", all
-stays the same, though.
-
-How to decide what frequency within the CPUfreq policy should be used?
-That's done using "cpufreq governors". Two are already in this patch
--- they're the already existing "powersave" and "performance" which
-set the frequency statically to the lowest or highest frequency,
-respectively. At least two more such governors will be ready for
-addition in the near future, but likely many more as there are various
-different theories and models about dynamic frequency scaling
-around. Using such a generic interface as cpufreq offers to scaling
-governors, these can be tested extensively, and the best one can be
-selected for each specific use.
-
-Basically, it's the following flow graph:
-
-CPU can be set to switch independently   |         CPU can only be set
-      within specific "limits"           |       to specific frequencies
-
-                                 "CPUfreq policy"
-                consists of frequency limits (policy->{min,max})
-                     and CPUfreq governor to be used
-                         /                    \
-                        /                      \
-                       /                       the cpufreq governor decides
-                      /                        (dynamically or statically)
-                     /                         what target_freq to set within
-                    /                          the limits of policy->{min,max}
-                   /                                \
-                  /                                  \
-        Using the ->setpolicy call,              Using the ->target/target_index call,
-            the limits and the                    the frequency closest
-             "policy" is set.                     to target_freq is set.
-                                                  It is assured that it
-                                                  is within policy->{min,max}
-
-
-2. Governors In the Linux Kernel
-================================
-
-2.1 Performance
----------------
-
-The CPUfreq governor "performance" sets the CPU statically to the
-highest frequency within the borders of scaling_min_freq and
-scaling_max_freq.
-
-
-2.2 Powersave
--------------
-
-The CPUfreq governor "powersave" sets the CPU statically to the
-lowest frequency within the borders of scaling_min_freq and
-scaling_max_freq.
-
-
-2.3 Userspace
--------------
-
-The CPUfreq governor "userspace" allows the user, or any userspace
-program running with UID "root", to set the CPU to a specific frequency
-by making a sysfs file "scaling_setspeed" available in the CPU-device
-directory.
-
-
-2.4 Ondemand
-------------
-
-The CPUfreq governor "ondemand" sets the CPU depending on the
-current usage. To do this the CPU must have the capability to
-switch the frequency very quickly. There are a number of sysfs file
-accessible parameters:
-
-sampling_rate: measured in uS (10^-6 seconds), this is how often you
-want the kernel to look at the CPU usage and to make decisions on
-what to do about the frequency. Typically this is set to values of
-around '10000' or more. It's default value is (cmp. with users-guide.txt):
-transition_latency * 1000
-Be aware that transition latency is in ns and sampling_rate is in us, so you
-get the same sysfs value by default.
-Sampling rate should always get adjusted considering the transition latency
-To set the sampling rate 750 times as high as the transition latency
-in the bash (as said, 1000 is default), do:
-echo `$(($(cat cpuinfo_transition_latency) * 750 / 1000)) \
-    >ondemand/sampling_rate
-
-sampling_rate_min:
-The sampling rate is limited by the HW transition latency:
-transition_latency * 100
-Or by kernel restrictions:
-If CONFIG_NO_HZ_COMMON is set, the limit is 10ms fixed.
-If CONFIG_NO_HZ_COMMON is not set or nohz=off boot parameter is used, the
-limits depend on the CONFIG_HZ option:
-HZ=1000: min=20000us  (20ms)
-HZ=250:  min=80000us  (80ms)
-HZ=100:  min=200000us (200ms)
-The highest value of kernel and HW latency restrictions is shown and
-used as the minimum sampling rate.
-
-up_threshold: defines what the average CPU usage between the samplings
-of 'sampling_rate' needs to be for the kernel to make a decision on
-whether it should increase the frequency. For example when it is set
-to its default value of '95' it means that between the checking
-intervals the CPU needs to be on average more than 95% in use to then
-decide that the CPU frequency needs to be increased.
-
-ignore_nice_load: this parameter takes a value of '0' or '1'. When
-set to '0' (its default), all processes are counted towards the
-'cpu utilisation' value. When set to '1', the processes that are
-run with a 'nice' value will not count (and thus be ignored) in the
-overall usage calculation. This is useful if you are running a CPU
-intensive calculation on your laptop that you do not care how long it
-takes to complete as you can 'nice' it and prevent it from taking part
-in the deciding process of whether to increase your CPU frequency.
-
-sampling_down_factor: this parameter controls the rate at which the
-kernel makes a decision on when to decrease the frequency while running
-at top speed. When set to 1 (the default) decisions to reevaluate load
-are made at the same interval regardless of current clock speed. But
-when set to greater than 1 (e.g. 100) it acts as a multiplier for the
-scheduling interval for reevaluating load when the CPU is at its top
-speed due to high load. This improves performance by reducing the overhead
-of load evaluation and helping the CPU stay at its top speed when truly
-busy, rather than shifting back and forth in speed. This tunable has no
-effect on behavior at lower speeds/lower CPU loads.
-
-powersave_bias: this parameter takes a value between 0 to 1000. It
-defines the percentage (times 10) value of the target frequency that
-will be shaved off of the target. For example, when set to 100 -- 10%,
-when ondemand governor would have targeted 1000 MHz, it will target
-1000 MHz - (10% of 1000 MHz) = 900 MHz instead. This is set to 0
-(disabled) by default.
-When AMD frequency sensitivity powersave bias driver --
-drivers/cpufreq/amd_freq_sensitivity.c is loaded, this parameter
-defines the workload frequency sensitivity threshold in which a lower
-frequency is chosen instead of ondemand governor's original target.
-The frequency sensitivity is a hardware reported (on AMD Family 16h
-Processors and above) value between 0 to 100% that tells software how
-the performance of the workload running on a CPU will change when
-frequency changes. A workload with sensitivity of 0% (memory/IO-bound)
-will not perform any better on higher core frequency, whereas a
-workload with sensitivity of 100% (CPU-bound) will perform better
-higher the frequency. When the driver is loaded, this is set to 400
-by default -- for CPUs running workloads with sensitivity value below
-40%, a lower frequency is chosen. Unloading the driver or writing 0
-will disable this feature.
-
-
-2.5 Conservative
-----------------
-
-The CPUfreq governor "conservative", much like the "ondemand"
-governor, sets the CPU depending on the current usage. It differs in
-behaviour in that it gracefully increases and decreases the CPU speed
-rather than jumping to max speed the moment there is any load on the
-CPU. This behaviour more suitable in a battery powered environment.
-The governor is tweaked in the same manner as the "ondemand" governor
-through sysfs with the addition of:
-
-freq_step: this describes what percentage steps the cpu freq should be
-increased and decreased smoothly by. By default the cpu frequency will
-increase in 5% chunks of your maximum cpu frequency. You can change this
-value to anywhere between 0 and 100 where '0' will effectively lock your
-CPU at a speed regardless of its load whilst '100' will, in theory, make
-it behave identically to the "ondemand" governor.
-
-down_threshold: same as the 'up_threshold' found for the "ondemand"
-governor but for the opposite direction. For example when set to its
-default value of '20' it means that if the CPU usage needs to be below
-20% between samples to have the frequency decreased.
-
-sampling_down_factor: similar functionality as in "ondemand" governor.
-But in "conservative", it controls the rate at which the kernel makes
-a decision on when to decrease the frequency while running in any
-speed. Load for frequency increase is still evaluated every
-sampling rate.
-
-3. The Governor Interface in the CPUfreq Core
-=============================================
-
-A new governor must register itself with the CPUfreq core using
-"cpufreq_register_governor". The struct cpufreq_governor, which has to
-be passed to that function, must contain the following values:
-
-governor->name -	    A unique name for this governor
-governor->governor -	    The governor callback function
-governor->owner	-	    .THIS_MODULE for the governor module (if
-			    appropriate)
-
-The governor->governor callback is called with the current (or to-be-set)
-cpufreq_policy struct for that CPU, and an unsigned int event. The
-following events are currently defined:
-
-CPUFREQ_GOV_START:   This governor shall start its duty for the CPU
-		     policy->cpu
-CPUFREQ_GOV_STOP:    This governor shall end its duty for the CPU
-		     policy->cpu
-CPUFREQ_GOV_LIMITS:  The limits for CPU policy->cpu have changed to
-		     policy->min and policy->max.
-
-If you need other "events" externally of your driver, _only_ use the
-cpufreq_governor_l(unsigned int cpu, unsigned int event) call to the
-CPUfreq core to ensure proper locking.
-
-
-The CPUfreq governor may call the CPU processor driver using one of
-these two functions:
-
-int cpufreq_driver_target(struct cpufreq_policy *policy,
-                                 unsigned int target_freq,
-                                 unsigned int relation);
-
-int __cpufreq_driver_target(struct cpufreq_policy *policy,
-                                   unsigned int target_freq,
-                                   unsigned int relation);
-
-target_freq must be within policy->min and policy->max, of course.
-What's the difference between these two functions? When your governor
-still is in a direct code path of a call to governor->governor, the
-per-CPU cpufreq lock is still held in the cpufreq core, and there's
-no need to lock it again (in fact, this would cause a deadlock). So
-use __cpufreq_driver_target only in these cases. In all other cases
-(for example, when there's a "daemonized" function that wakes up
-every second), use cpufreq_driver_target to lock the cpufreq per-CPU
-lock before the command is passed to the cpufreq processor driver.
- diff --git a/Documentation/cpu-freq/index.txt b/Documentation/cpu-freq/index.txt index dc024ab4054f..03a7cee6ac73 100644 --- a/Documentation/cpu-freq/index.txt +++ b/Documentation/cpu-freq/index.txt @@ -18,26 +18,30 @@ Documents in this directory: ---------------------------- + +amd-powernow.txt - AMD powernow driver specific file. + core.txt - General description of the CPUFreq core and - of CPUFreq notifiers + of CPUFreq notifiers. -cpu-drivers.txt - How to implement a new cpufreq processor driver +cpu-drivers.txt - How to implement a new cpufreq processor driver. -governors.txt - What are cpufreq governors and how to - implement them? +cpufreq-nforce2.txt - nVidia nForce2 platform specific file. + +cpufreq-stats.txt - General description of sysfs cpufreq stats. index.txt - File index, Mailing list and Links (this document) -user-guide.txt - User Guide to CPUFreq +intel-pstate.txt - Intel pstate cpufreq driver specific file. + +pcc-cpufreq.txt - PCC cpufreq driver specific file. Mailing List ------------ There is a CPU frequency changing CVS commit and general list where you can report bugs, problems or submit patches. To post a message, -send an email to linux-pm@vger.kernel.org, to subscribe go to -http://vger.kernel.org/vger-lists.html#linux-pm and follow the -instructions there. +send an email to linux-pm@vger.kernel.org. 
Links ----- @@ -48,7 +52,7 @@ how to access the CVS repository: * http://cvs.arm.linux.org.uk/ the CPUFreq Mailing list: -* http://vger.kernel.org/vger-lists.html#cpufreq +* http://vger.kernel.org/vger-lists.html#linux-pm Clock and voltage scaling for the SA-1100: * http://www.lartmaker.nl/projects/scaling diff --git a/Documentation/cpu-freq/intel-pstate.txt b/Documentation/cpu-freq/intel-pstate.txt deleted file mode 100644 index f7b12c071d53..000000000000 --- a/Documentation/cpu-freq/intel-pstate.txt +++ /dev/null @@ -1,222 +0,0 @@ -Intel P-State driver --------------------- - -This driver provides an interface to control the P-State selection for the -SandyBridge+ Intel processors. - -The following document explains P-States: -http://events.linuxfoundation.org/sites/events/files/slides/LinuxConEurope_2015.pdf -As stated in the document, P-State doesn’t exactly mean a frequency. However, for -the sake of the relationship with cpufreq, P-State and frequency are used -interchangeably. - -Understanding the cpufreq core governors and policies are important before -discussing more details about the Intel P-State driver. Based on what callbacks -a cpufreq driver provides to the cpufreq core, it can support two types of -drivers: -- with target_index() callback: In this mode, the drivers using cpufreq core -simply provide the minimum and maximum frequency limits and an additional -interface target_index() to set the current frequency. The cpufreq subsystem -has a number of scaling governors ("performance", "powersave", "ondemand", -etc.). Depending on which governor is in use, cpufreq core will call for -transitions to a specific frequency using target_index() callback. -- setpolicy() callback: In this mode, drivers do not provide target_index() -callback, so cpufreq core can't request a transition to a specific frequency. -The driver provides minimum and maximum frequency limits and callbacks to set a -policy. 
The policy in cpufreq sysfs is referred to as the "scaling governor". -The cpufreq core can request the driver to operate in either of two policies: -"performance" and "powersave". The driver decides which frequency to use based -on the above policy selection considering minimum and maximum frequency limits. - -The Intel P-State driver falls under the latter category, which implements the -setpolicy() callback. This driver decides what P-State to use based on the -requested policy from the cpufreq core. If the processor is capable of -selecting its next P-State internally, then the driver will offload this -responsibility to the processor (aka HWP: Hardware P-States). If not, the -driver implements algorithms to select the next P-State. - -Since these policies are implemented in the driver, they are not the same as the -cpufreq scaling governors implementation, even if they have the same name in -the cpufreq sysfs (scaling_governors). For example the "performance" policy is -similar to cpufreq’s "performance" governor, but "powersave" is completely -different than the cpufreq "powersave" governor. The strategy here is similar -to cpufreq "ondemand", where the requested P-State is related to the system load. - -Sysfs Interface - -In addition to the frequency-controlling interfaces provided by the cpufreq -core, the driver provides its own sysfs files to control the P-State selection. -These files have been added to /sys/devices/system/cpu/intel_pstate/. -Any changes made to these files are applicable to all CPUs (even in a -multi-package system). - - max_perf_pct: Limits the maximum P-State that will be requested by - the driver. The limit is stated as a percentage of the available performance. The - available (P-State) performance may be reduced by the no_turbo - setting described below. - - min_perf_pct: Limits the minimum P-State that will be requested by - the driver. The limit is stated as a percentage of the max (non-turbo) - performance level. 
- - no_turbo: Limits the driver to selecting P-State below the turbo - frequency range. - - turbo_pct: Displays the percentage of the total performance that - is supported by hardware that is in the turbo range. This number - is independent of whether turbo has been disabled or not. - - num_pstates: Displays the number of P-States that are supported - by hardware. This number is independent of whether turbo has - been disabled or not. - -For example, if a system has these parameters: - Max 1 core turbo ratio: 0x21 (Max 1 core ratio is the maximum P-State) - Max non turbo ratio: 0x17 - Minimum ratio : 0x08 (Here the ratio is called max efficiency ratio) - -Sysfs will show : - max_perf_pct:100, which corresponds to 1 core ratio - min_perf_pct:24, max_efficiency_ratio / max 1 Core ratio - no_turbo:0, turbo is not disabled - num_pstates:26 = (max 1 Core ratio - Max Efficiency Ratio + 1) - turbo_pct:39 = (max 1 core ratio - max non turbo ratio) / num_pstates - -Refer to "Intel® 64 and IA-32 Architectures Software Developer’s Manual -Volume 3: System Programming Guide" to understand ratios. - -cpufreq sysfs for Intel P-State - -Since this driver registers with cpufreq, cpufreq sysfs is also presented. -There are some important differences, which need to be considered. - -scaling_cur_freq: This displays the real frequency which was used during -the last sample period instead of what is requested. Some other cpufreq driver, -like acpi-cpufreq, displays what is requested (Some changes are on the -way to fix this for acpi-cpufreq driver). The same is true for frequencies -displayed at /proc/cpuinfo. - -scaling_governor: This displays current active policy. Since each CPU has a -cpufreq sysfs, it is possible to set a scaling governor to each CPU. But this -is not possible with Intel P-States, as there is one common policy for all -CPUs. Here, the last requested policy will be applicable to all CPUs. 
It is -suggested that one use the cpupower utility to change the policy for all CPUs at the -same time. - -scaling_setspeed: This attribute can never be used with Intel P-State. - -scaling_max_freq/scaling_min_freq: This interface can be used similarly to -the max_perf_pct/min_perf_pct of Intel P-State sysfs. However since frequencies -are converted to the nearest possible P-State, this is prone to rounding errors. -This method is not preferred to limit performance. - -affected_cpus: Not used -related_cpus: Not used - -For contemporary Intel processors, the frequency is controlled by the -processor itself and the P-State exposed to software is related to -performance levels. The idea that frequency can be set to a single -frequency is fictional for Intel Core processors. Even if the scaling -driver selects a single P-State, the actual frequency the processor -will run at is selected by the processor itself. - -Tuning Intel P-State driver - -When HWP mode is not used, debugfs files have also been added to allow the -tuning of the internal governor algorithm. These files are located at -/sys/kernel/debug/pstate_snb/. The algorithm uses a PID (Proportional -Integral Derivative) controller. The PID tunable parameters are: - - deadband - d_gain_pct - i_gain_pct - p_gain_pct - sample_rate_ms - setpoint - -To adjust these parameters, some understanding of driver implementation is -necessary. There are some tweaks described here, but be very careful. Adjusting -them requires expert level understanding of power and performance relationship. -These limits are only useful when the "powersave" policy is active. - --To make the system more responsive to load changes, sample_rate_ms can -be adjusted (current default is 10ms). --To make the system use higher performance, even if the load is lower, setpoint -can be adjusted to a lower number. This will also lead to faster ramp up time -to reach the maximum P-State. 
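The effect of setpoint can be worked through numerically. The sketch below assumes only the proportional term is active (d_gain_pct = 0, i_gain_pct = 0, deadband = 0), so that the next P-State is the current P-State minus (setpoint - load) * p_gain_pct / 100. The function name, the round-to-nearest step and the clamping bounds (0x08 and 0x21, borrowed from the sysfs example earlier in this file) are assumptions made for illustration; the driver's own arithmetic may differ in detail.

```python
import math


def next_pstate(current: int, load: int, setpoint: int,
                p_gain_pct: int = 20,
                min_pstate: int = 0x08, max_pstate: int = 0x21) -> int:
    """Proportional-only step of the P-State selection (illustrative model)."""
    error = setpoint - load                    # negative when load exceeds setpoint
    raw = current - error * p_gain_pct / 100   # e.g. 8 - (-3 * 0.2) = 8.6
    rounded = math.floor(raw + 0.5)            # round to nearest (assumed)
    # Keep the result inside the assumed range of available P-States.
    return max(min_pstate, min(rounded, max_pstate))


# Same load and starting P-State, two different setpoints.
print(next_pstate(0x08, load=100, setpoint=97))
print(next_pstate(0x08, load=100, setpoint=60))
```

Under these assumptions a setpoint of 97 moves the P-State from 8 to 9 for a load of 100, while a setpoint of 60 moves it from 8 to 16, which is why a lower setpoint reaches the maximum P-State in fewer sample intervals.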
-If there are no derivative and integral coefficients, The next P-State will be -equal to: - current P-State - ((setpoint - current cpu load) * p_gain_pct) - -For example, if the current PID parameters are (Which are defaults for the core -processors like SandyBridge): - deadband = 0 - d_gain_pct = 0 - i_gain_pct = 0 - p_gain_pct = 20 - sample_rate_ms = 10 - setpoint = 97 - -If the current P-State = 0x08 and current load = 100, this will result in the -next P-State = 0x08 - ((97 - 100) * 0.2) = 8.6 (rounded to 9). Here the P-State -goes up by only 1. If during next sample interval the current load doesn't -change and still 100, then P-State goes up by one again. This process will -continue as long as the load is more than the setpoint until the maximum P-State -is reached. - -For the same load at setpoint = 60, this will result in the next P-State -= 0x08 - ((60 - 100) * 0.2) = 16 -So by changing the setpoint from 97 to 60, there is an increase of the -next P-State from 9 to 16. So this will make processor execute at higher -P-State for the same CPU load. If the load continues to be more than the -setpoint during next sample intervals, then P-State will go up again till the -maximum P-State is reached. But the ramp up time to reach the maximum P-State -will be much faster when the setpoint is 60 compared to 97. - -Debugging Intel P-State driver - -Event tracing -To debug P-State transition, the Linux event tracing interface can be used. -There are two specific events, which can be enabled (Provided the kernel -configs related to event tracing are enabled). - -# cd /sys/kernel/debug/tracing/ -# echo 1 > events/power/pstate_sample/enable -# echo 1 > events/power/cpu_frequency/enable -# cat trace -gnome-terminal--4510 [001] ..s. 1177.680733: pstate_sample: core_busy=107 - scaled=94 from=26 to=26 mperf=1143818 aperf=1230607 tsc=29838618 - freq=2474476 -cat-5235 [002] ..s. 
1177.681723: cpu_frequency: state=2900000 cpu_id=2 - - -Using ftrace - -If function level tracing is required, the Linux ftrace interface can be used. -For example if we want to check how often a function to set a P-State is -called, we can set ftrace filter to intel_pstate_set_pstate. - -# cd /sys/kernel/debug/tracing/ -# cat available_filter_functions | grep -i pstate -intel_pstate_set_pstate -intel_pstate_cpu_init -... - -# echo intel_pstate_set_pstate > set_ftrace_filter -# echo function > current_tracer -# cat trace | head -15 -# tracer: function -# -# entries-in-buffer/entries-written: 80/80 #P:4 -# -# _-----=> irqs-off -# / _----=> need-resched -# | / _---=> hardirq/softirq -# || / _--=> preempt-depth -# ||| / delay -# TASK-PID CPU# |||| TIMESTAMP FUNCTION -# | | | |||| | | - Xorg-3129 [000] ..s. 2537.644844: intel_pstate_set_pstate <-intel_pstate_timer_func - gnome-terminal--4510 [002] ..s. 2537.649844: intel_pstate_set_pstate <-intel_pstate_timer_func - gnome-shell-3409 [001] ..s. 2537.650850: intel_pstate_set_pstate <-intel_pstate_timer_func - <idle>-0 [000] ..s. 2537.654843: intel_pstate_set_pstate <-intel_pstate_timer_func diff --git a/Documentation/cpu-freq/pcc-cpufreq.txt b/Documentation/cpu-freq/pcc-cpufreq.txt index 0a94224ad296..9e3c3b33514c 100644 --- a/Documentation/cpu-freq/pcc-cpufreq.txt +++ b/Documentation/cpu-freq/pcc-cpufreq.txt @@ -159,8 +159,8 @@ to be strictly associated with a P-state. 2.2 cpuinfo_transition_latency: ------------------------------- -The cpuinfo_transition_latency field is CPUFREQ_ETERNAL. The PCC specification -does not include a field to expose this value currently. +The cpuinfo_transition_latency field is 0. The PCC specification does +not include a field to expose this value currently. 
2.3 cpuinfo_cur_freq: --------------------- diff --git a/Documentation/cpu-freq/user-guide.txt b/Documentation/cpu-freq/user-guide.txt deleted file mode 100644 index 109e97bbab77..000000000000 --- a/Documentation/cpu-freq/user-guide.txt +++ /dev/null @@ -1,222 +0,0 @@ - CPU frequency and voltage scaling code in the Linux(TM) kernel - - - L i n u x C P U F r e q - - U S E R G U I D E - - - Dominik Brodowski <linux@brodo.de> - - - - Clock scaling allows you to change the clock speed of the CPUs on the - fly. This is a nice method to save battery power, because the lower - the clock speed, the less power the CPU consumes. - - -Contents: ---------- -1. Supported Architectures and Processors -1.1 ARM -1.2 x86 -1.3 sparc64 -1.4 ppc -1.5 SuperH -1.6 Blackfin - -2. "Policy" / "Governor"? -2.1 Policy -2.2 Governor - -3. How to change the CPU cpufreq policy and/or speed -3.1 Preferred interface: sysfs - - - -1. Supported Architectures and Processors -========================================= - -1.1 ARM -------- - -The following ARM processors are supported by cpufreq: - -ARM Integrator -ARM-SA1100 -ARM-SA1110 -Intel PXA - - -1.2 x86 -------- - -The following processors for the x86 architecture are supported by cpufreq: - -AMD Elan - SC400, SC410 -AMD mobile K6-2+ -AMD mobile K6-3+ -AMD mobile Duron -AMD mobile Athlon -AMD Opteron -AMD Athlon 64 -Cyrix Media GXm -Intel mobile PIII and Intel mobile PIII-M on certain chipsets -Intel Pentium 4, Intel Xeon -Intel Pentium M (Centrino) -National Semiconductors Geode GX -Transmeta Crusoe -Transmeta Efficeon -VIA Cyrix 3 / C3 -various processors on some ACPI 2.0-compatible systems [*] - -[*] Only if "ACPI Processor Performance States" are available -to the ACPI<->BIOS interface. - - -1.3 sparc64 ------------ - -The following processors for the sparc64 architecture are supported by -cpufreq: - -UltraSPARC-III - - -1.4 ppc -------- - -Several "PowerBook" and "iBook2" notebooks are supported. 
- - -1.5 SuperH ----------- - -All SuperH processors supporting rate rounding through the clock -framework are supported by cpufreq. - -1.6 Blackfin ------------- - -The following Blackfin processors are supported by cpufreq: - -BF522, BF523, BF524, BF525, BF526, BF527, Rev 0.1 or higher -BF531, BF532, BF533, Rev 0.3 or higher -BF534, BF536, BF537, Rev 0.2 or higher -BF561, Rev 0.3 or higher -BF542, BF544, BF547, BF548, BF549, Rev 0.1 or higher - - -2. "Policy" / "Governor" ? -========================== - -Some CPU frequency scaling-capable processor switch between various -frequencies and operating voltages "on the fly" without any kernel or -user involvement. This guarantees very fast switching to a frequency -which is high enough to serve the user's needs, but low enough to save -power. - - -2.1 Policy ----------- - -On these systems, all you can do is select the lower and upper -frequency limit as well as whether you want more aggressive -power-saving or more instantly available processing power. - - -2.2 Governor ------------- - -On all other cpufreq implementations, these boundaries still need to -be set. Then, a "governor" must be selected. Such a "governor" decides -what speed the processor shall run within the boundaries. One such -"governor" is the "userspace" governor. This one allows the user - or -a yet-to-implement userspace program - to decide what specific speed -the processor shall run at. - - -3. How to change the CPU cpufreq policy and/or speed -==================================================== - -3.1 Preferred Interface: sysfs ------------------------------- - -The preferred interface is located in the sysfs filesystem. If you -mounted it at /sys, the cpufreq interface is located in a subdirectory -"cpufreq" within the cpu-device directory -(e.g. /sys/devices/system/cpu/cpu0/cpufreq/ for the first CPU). 
- -cpuinfo_min_freq : this file shows the minimum operating - frequency the processor can run at(in kHz) -cpuinfo_max_freq : this file shows the maximum operating - frequency the processor can run at(in kHz) -cpuinfo_transition_latency The time it takes on this CPU to - switch between two frequencies in nano - seconds. If unknown or known to be - that high that the driver does not - work with the ondemand governor, -1 - (CPUFREQ_ETERNAL) will be returned. - Using this information can be useful - to choose an appropriate polling - frequency for a kernel governor or - userspace daemon. Make sure to not - switch the frequency too often - resulting in performance loss. -scaling_driver : this file shows what cpufreq driver is - used to set the frequency on this CPU - -scaling_available_governors : this file shows the CPUfreq governors - available in this kernel. You can see the - currently activated governor in - -scaling_governor, and by "echoing" the name of another - governor you can change it. Please note - that some governors won't load - they only - work on some specific architectures or - processors. - -cpuinfo_cur_freq : Current frequency of the CPU as obtained from - the hardware, in KHz. This is the frequency - the CPU actually runs at. - -scaling_available_frequencies : List of available frequencies, in KHz. - -scaling_min_freq and -scaling_max_freq show the current "policy limits" (in - kHz). By echoing new values into these - files, you can change these limits. - NOTE: when setting a policy you need to - first set scaling_max_freq, then - scaling_min_freq. - -affected_cpus : List of Online CPUs that require software - coordination of frequency. - -related_cpus : List of Online + Offline CPUs that need software - coordination of frequency. - -scaling_cur_freq : Current frequency of the CPU as determined by - the governor and cpufreq core, in KHz. This is - the frequency the kernel thinks the CPU runs - at. 
- -bios_limit : If the BIOS tells the OS to limit a CPU to - lower frequencies, the user can read out the - maximum available frequency from this file. - This typically can happen through (often not - intended) BIOS settings, restrictions - triggered through a service processor or other - BIOS/HW based implementations. - This does not cover thermal ACPI limitations - which can be detected through the generic - thermal driver. - -If you have selected the "userspace" governor which allows you to -set the CPU operating frequency to a specific value, you can read out -the current frequency in - -scaling_setspeed. By "echoing" a new frequency into this - you can change the speed of the CPU, - but only within the limits of - scaling_min_freq and scaling_max_freq.
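The sysfs files listed in this user guide can also be read and written from a script. The following Python sketch assumes sysfs is mounted at /sys and uses the cpu0 directory named in the text. The helper names read_attr and set_policy_limits are hypothetical, and writing the limits normally requires root privileges.

```python
from pathlib import Path

# Directory for the first CPU, as given in the text (assumes sysfs at /sys).
CPUFREQ_DIR = Path("/sys/devices/system/cpu/cpu0/cpufreq")


def read_attr(name: str, base: Path = CPUFREQ_DIR) -> str:
    """Return the contents of one cpufreq sysfs file, without the newline."""
    return (base / name).read_text().strip()


def set_policy_limits(min_khz: int, max_khz: int,
                      base: Path = CPUFREQ_DIR) -> None:
    """Write new policy limits in kHz: scaling_max_freq first, then
    scaling_min_freq, following the NOTE in the text."""
    if min_khz > max_khz:
        raise ValueError("min_khz must not exceed max_khz")
    (base / "scaling_max_freq").write_text(f"{max_khz}\n")
    (base / "scaling_min_freq").write_text(f"{min_khz}\n")


if __name__ == "__main__":
    # Print a few read-only attributes if this system exposes them.
    for name in ("scaling_driver", "scaling_governor",
                 "cpuinfo_min_freq", "cpuinfo_max_freq"):
        if (CPUFREQ_DIR / name).is_file():
            print(name, "=", read_attr(name))
```

Passing a different base directory makes the helpers usable for other CPUs (cpu1, cpu2, ...) or for testing against a scratch directory.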