aboutsummaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2015-06-01Revert "mm: make vmstat_update periodic run conditional"Jon Medhurst (Tixy)
This reverts commit 7d252cd22a3f6cb459e8b012912dfd258157f7df because it has been implicated in kernel crashes when its workqueue timer is migrated during CPU hotplug. When HMP was initially being developed against a 3.4 derived kernel, it was observed that wakeups were occurring every 30s across every core to give the vmstat accounting a kick. This was causing a noticeable increase in energy consumption on the really quiet use cases such as audio and video playback. So commit 7d252cd22a was used which turned off the updates on idle CPUs to reduce that. On the 3.10 derived LSK this revert does not result in a significant increase in power consumption so it is assumed that changes since 3.4 have mitigated the initial problem. Signed-off-by: Jon Medhurst <tixy@linaro.org> Signed-off-by: Kevin Hilman <khilman@linaro.org>
2014-08-14Merge branch 'for-lsk' of git://git.linaro.org/arm/big.LITTLE/mp into ↵Mark Brown
lsk-v3.10-big.LITTLE
2014-08-12HMP: Do not fork-boost tasks coming from PIDs <= 2Chris Redpath
System services are generally started by init, whilst kernel threads are started by kthreadd. We do not want to give those tasks a head start, as this costs power for very little benefit. We do however wish to do that for tasks which the user launches. Further, some tasks allocate per-cpu timers directly after launch which can lead to those tasks being always scheduled on a big CPU when there is no computational need to do so. Not promoting services to big CPUs on launch will prevent that unless a service allocates their per-cpu resources after a period of intense computation, which is not a common pattern. Signed-off-by: Chris Redpath <chris.redpath@arm.com> Signed-off-by: Jon Medhurst <tixy@linaro.org>
2013-07-18Merge branches 'master-arm-multi_pmu_v2', 'master-config-fragments', ↵Jon Medhurst
'master-hw-bkpt-fix', 'master-misc-patches' and 'master-task-placement-v2-updates' into big-LITTLE-MP-master-v19 Updates: ------- - Rebased over 3.10 final - Differences from big-LITTLE-MP-master-v18 - New Patches: - master-config-fragments: 1 new patch - "config: Disable priority filtering for HMP Scheduler" - master-misc-patches: 1 new patch - "mm: make vmstat_update periodic run conditional" - New Branches: - master-task-placement-v2-updates: 7 patches New patches from ARM added in a new topic branch stacked on top of master-task-placement-v2-sysfs... - Revert "sched: Enable HMP priority filter by default" - "HMP: Use unweighted load for hmp migration decisions" - "HMP: Select least-loaded CPU when performing HMP Migrations" - "HMP: Avoid multiple calls to hmp_domain_min_load in fast path" - "HMP: Force new non-kernel tasks onto big CPUs until load stabilises" - "sched: Restrict nohz balance kicks to stay in the HMP domain" - "HMP: experimental: Force all rt tasks to start on little domain." Commands used for merge: ----------------------- $ git checkout -b big-LITTLE-MP-master-v19 v3.10 $ git merge master-arm-multi_pmu_v2 master-config-fragments \ master-hw-bkpt-fix master-misc-patches master-task-placement-v2 \ master-task-placement-v2-sysfs master-task-placement-v2-updates
2013-07-17mm: make vmstat_update periodic run conditionalGilad Ben-Yossef
vmstat_update runs every second from the work queue to update statistics and drain per cpu pages back into the global page allocator. This is useful in most circumstances but is wasteful if the CPU doesn't actually make any VM activity. This can happen in the situtation that the CPU is idle or running a CPU bound long term task (e.g. CPU isolation), in which case the periodic vmstate_update timer needlessly itnerrupts the CPU. This patch tries to make vmstat_update schedule itself for the next round only if there was any work for it to do in the previous run. The assumption is that if for a whole second we didn't see any VM activity it is reasnoable to assume that the CPU is not using the VM because it is idle or runs a long term single CPU bound task. A new single unbound system work queue item is scheduled periodically to monitor CPUs that have their vmstat_update work stopped and re-schedule them if VM activity is detected. Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com> CC: Thomas Gleixner <tglx@linutronix.de> CC: Tejun Heo <tj@kernel.org> CC: John Stultz <johnstul@us.ibm.com> CC: Andrew Morton <akpm@linux-foundation.org> CC: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> CC: Mel Gorman <mel@csn.ul.ie> CC: Mike Frysinger <vapier@gentoo.org> CC: David Rientjes <rientjes@google.com> CC: Hugh Dickins <hughd@google.com> CC: Minchan Kim <minchan.kim@gmail.com> CC: Konstantin Khlebnikov <khlebnikov@openvz.org> CC: Christoph Lameter <cl@linux.com> CC: Chris Metcalf <cmetcalf@tilera.com> CC: Hakan Akkan <hakanakkan@gmail.com> CC: Max Krasnyansky <maxk@qualcomm.com> CC: Frederic Weisbecker <fweisbec@gmail.com> CC: linux-kernel@vger.kernel.org CC: linux-mm@kvack.org
2013-07-17sched: Ignore offline CPUs in HMP migration & load statsChris Redpath
Previously, an offline CPU would always appear to have a zero load and this would distort the offload functionality used for balancing big and little domains. Maintain a mask of online CPUs in each domain and use this instead. Change-Id: I639b564b2f40cb659af8ceb8bd37f84b8a1fe323 Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2013-07-17sched: SCHED_HMP multi-domain task migration controlMorten Rasmussen
We need a way to prevent tasks that are migrating up and down the hmp_domains from migrating straight on through before the load has adapted to the new compute capacity of the CPU on the new hmp_domain. This patch adds a next up/down migration delay that prevents the task from doing another migration in the same direction until the delay has expired. Signed-off-by: Morten Rasmussen <Morten.Rasmussen@arm.com>
2013-07-17sched: Task placement for heterogeneous systems based on task load-trackingMorten Rasmussen
This patch introduces the basic SCHED_HMP infrastructure. Each class of cpus is represented by a hmp_domain and tasks will only be moved between these domains when their load profiles suggest it is beneficial. SCHED_HMP relies heavily on the task load-tracking introduced in Paul Turners fair group scheduling patch set: <https://lkml.org/lkml/2012/8/23/267> SCHED_HMP requires that the platform implements arch_get_hmp_domains() which should set up the platform specific list of hmp_domains. It is also assumed that the platform disables SD_LOAD_BALANCE for the appropriate sched_domains. Tasks placement takes place every time a task is to be inserted into a runqueue based on its load history. The task placement decision is based on load thresholds. There are no restrictions on the number of hmp_domains, however, multiple (>2) has not been tested and the up/down migration policy is rather simple. Signed-off-by: Morten Rasmussen <Morten.Rasmussen@arm.com>
2013-07-17sched: entity load-tracking load_avg_ratioMorten Rasmussen
This patch adds load_avg_ratio to each task. The load_avg_ratio is a variant of load_avg_contrib which is not scaled by the task priority. It is calculated like this: runnable_avg_sum * NICE_0_LOAD / (runnable_avg_period + 1). Signed-off-by: Morten Rasmussen <Morten.Rasmussen@arm.com>
2013-07-17sched: implement usage trackingPaul Turner
With the frame-work for runnable tracking now fully in place. Per-entity usage tracking is a simple and low-overhead addition. Signed-off-by: Paul Turner <pjt@google.com> Reviewed-by: Ben Segall <bsegall@google.com>
2013-06-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Found via trinity: If you connect up an ipv6 socket to an ipv4 mapped address then an ipv6 one, sendmsg() can croak because ip6_sk_dst_check() assumes the route cached in the socket is an ipv6 one. In this case there is an ipv4 route attached, so it gets stomped on. Reported by Dave Jones and Hannes Frederic Sowa, fixed by Eric Dumazet. 2) AF_KEY notifications leak some kernel memory to userspace, fix from Mathias Krause. 3) DLCI calls __dev_get_by_name() without proper locking, and dlci_del doesn't validate that the device being deleted is actually a DLCI one. Fixes from Li Zefan. 4) Length check on bluetooth l2cap information responses is wrong, each response type has a different lenth, so we should make sure it's in a given range rather than enforce one single valid length. From Jaganath Kanakkassery. 5) Receive FIFO overflow is really easy to trigger in stress scenerios in the sh_eth driver, but the event isn't being handled properly at all. Specifically, the mask of error interrupts doesn't include the event so we never clear it, resulting in the driver becomming wedged processing an interrupt that never gets cleared. Fix from Sergei Shtylyov. 6) qlcnic sleeps while holding a spinlock, use mdelay() instead of msleep(). From Shahed Shaikh. 7) Missing curly braces causes SIP netfilter NAT module to always drop packets. Fix from Balazs Peter Odor. 8) ipt_ULOG in netfilter passes the wrong value to timer setup, causing the timer to dereference crap when it fires. Fix from Gao Feng. 9) Missing RCU protection around txq->axq_acq traversal in ath_txq_schedule(). Fix from Felix Fietkau. 10) Idle state transition test in ath9k_htc_config() is reversed, fix from Sujith Manoharan. 11) IPV6 forwarding handles unicast Router Alert packets incorrectly. It tests the wrong option state. Previously opt->ra being non-zero indicated a router alert marking in the SKB, but now it's indicated by a bit in opt->flags. Fix from YOSHIFUJI Hideaki. 12) SKB leak in GRE tunnel GSO handling, from Eric Dumazet. 13) get_user_pages_fast() error handling in TUN and MACVTAP use the same local variable for the base index and the loop iterator for page traversal, oops! Fix from Michael S Tsirkin. 14) ipv6_get_lladdr() can fail, and we must therefore check it's return value in inet6_set_iftoken(). For from Hannes Frederic Sowa. 15) If you change an interface name and meanwhile can sneak in something that looks up the name (like SO_BINDTODEVICE or SIOCGIFNAME) we can deadlock with CONFIG_PREEMPT=n. Fix this by providing a helper function that properly uses raw_seqcount_begin(). From Nicolas Schichan. 16) Chain noise calibration test is inverted in iwlwifi, fix from Nikolay Martynov. 17) Properly set TX iwlwifi descriptor flags for back requests. Fix from Emmanuel Grumbach. 18) We can't assume skb_transport_header() is set in xt_TCPOPTSTRAP module, fix from Pablo Neira Ayuso. 19) Some crummy APs don't provide the proper High Throughput info in association response frames. Add a workaround by assume we'll use whatever is in the beacon/probe. Fix from Johannes Berg. 20) mac80211 call to rate_idx_match_mask() swaps two arguments (mask and channel width). Fix from Simon Wunderlich. 21) xt_TCPMSS (like xt_TCPOPTSTRAP) must not try to handle fragmented frames. Fix from Phil Oester. 22) Fix rate control regression causing iwlwifi/iwlegacy chips to use 1Mbit/s on pre-11n networks. From Moshe Benji and Stanslaw Gruszka. 23) Disable brcmsmac power-save functions, they cause regressions. From Arend van Spriel. 24) Enforce a sane minimum MTU in l2cap_build_cmd() otherwise we can easily crash. Fix from Anderson Lizardo. 25) If a learning packet arrives during vxlan_stop() we crash, easily fixed by checking netif_running(). From Stephen Hemminger. 26) Static vxlan FDB entries should not be migrated, also from Stephen. 27) skb_clone() failures not handled in vxlan_xmit(), oops. Also from Stephen. 28) Add minimal driver for AR816x/AR817x ethernet chips, from Johannes Berg. 29) Fix regression in userspace VLAN acceleration control, added by the 802.1ad support changes. Fix from Fernando Luis Vazquez Cao. 30) Interval selection for MLD queries in the bridging code was reversed. Fix from Linus Lüssing. 31) ipv6's ndisc_send_redirect() erroneously writes to the packet we received not the packet we are building to send out. Fix from Matthias Schiffer. 32) Don't free netdev before unregistering it, in usb_8dev can driver. From Marc Kleine-Budde. 33) Fix nl80211 attribute buffer races, from Johannes Berg. 34) Although netlink_diag.h is under uapi/ it isn't present in Kbuild. From Stephen Hemminger. 35) Wrong address and family passed to MD5 key lookups in TCP, from Aydin Arik. 36) phy_type attribute created by SFC driver should not be writable. From Ben Hutchings. 37) Receive/Transmit queue allocations in pxa168_eth and mv643xx_eth should use kzalloc(). Otherwise if setup fails half-way, we'll dereference garbage when trying to teardown the rings. From Lubomir Rintel. 38) Fix double-allocation of dst (resulting in unfreeable net device) in ipv6's init_loopback(). From Gao Feng. 39) Fix fragmentation handling SKB leak in netfilter conntrack, we were freeing the wrong skb pointer. From Phil Oester. 40) Don't report "-1" (SPEED_UNKNOWN) in bond_miimon_commit(), from Nikolay Aleksandrov. 41) davinci_cpdma doesn't check for DMA mapping errors, letting the device scribble to random addresses. From Sebastian Siewior. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (69 commits) dlci: validate the net device in dlci_del() dlci: acquire rtnl_lock before calling __dev_get_by_name() af_key: fix info leaks in notify messages ipv6: ip6_sk_dst_check() must not assume ipv6 dst net: fix kernel deadlock with interface rename and netdev name retrieval. net/tg3: Avoid delay during MMIO access ipv6: check return value of ipv6_get_lladdr macvtap: fix recovery from gup errors tun: fix recovery from gup errors gre: fix a possible skb leak ipv6: Process unicast packet with Router Alert by checking flag in skb. ath9k_htc: Handle IDLE state transition properly ath9k: fix an RCU issue in calling ieee80211_get_tx_rates netfilter: ipt_ULOG: fix incorrect setting of ulog timer netfilter: ctnetlink: send event when conntrack label was modified netfilter: nf_nat_sip: fix mangling qlcnic: Do not sleep while holding spinlock drivers: net: cpsw: fix compilation error with cpsw driver tcp: doc : fix the syncookies default value sh_eth: fix misreporting of transmit abort ...
2013-06-26net: fix kernel deadlock with interface rename and netdev name retrieval.Nicolas Schichan
When the kernel (compiled with CONFIG_PREEMPT=n) is performing the rename of a network interface, it can end up waiting for a workqueue to complete. If userland is able to invoke a SIOCGIFNAME ioctl or a SO_BINDTODEVICE getsockopt in between, the kernel will deadlock due to the fact that read_secklock_begin() will spin forever waiting for the writer process (the one doing the interface rename) to update the devnet_rename_seq sequence. This patch fixes the problem by adding a helper (netdev_get_name()) and using it in the code handling the SIOCGIFNAME ioctl and SO_BINDTODEVICE setsockopt. The netdev_get_name() helper uses raw_seqcount_begin() to avoid spinning forever, waiting for devnet_rename_seq->sequence to become even. cond_resched() is used in the contended case, before retrying the access to give the writer process a chance to finish. The use of raw_seqcount_begin() will incur some unneeded work in the reader process in the contended case, but this is better than deadlocking the system. Signed-off-by: Nicolas Schichan <nschichan@freebox.fr> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-25gre: fix a possible skb leakEric Dumazet
commit 68c331631143 ("v4 GRE: Add TCP segmentation offload for GRE") added a possible skb leak, because it frees only the head of segment list, in case a skb_linearize() call fails. This patch adds a kfree_skb_list() helper to fix the bug. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Pravin B Shelar <pshelar@nicira.com> Cc: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-22Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs fixes from Al Viro: "Several fixes for bugs caught while looking through f_pos (ab)users" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: aout32 coredump compat fix splice: don't pass the address of ->f_pos to methods mconsole: we'd better initialize pos before passing it to vfs_read()...
2013-06-20Merge branch 'sched-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar: "Two smaller fixes - plus a context tracking tracing fix that is a bit bigger" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: tracing/context-tracking: Add preempt_schedule_context() for tracing sched: Fix clear NOHZ_BALANCE_KICK sched/x86: Construct all sibling maps if smt
2013-06-20Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Four fixes. The mmap ones are unfortunately larger than desired - fuzzing uncovered bugs that needed perf context life time management changes to fix properly" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86: Fix broken PEBS-LL support on SNB-EP/IVB-EP perf: Fix mmap() accounting hole perf: Fix perf mmap bugs kprobes: Fix to free gone and unused optprobes
2013-06-20Merge branch 'timers-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fixes from Thomas Gleixner: - Fix inconstinant clock usage in virtual time accounting - Fix a build error in KVM caused by the NOHZ work - Remove a pointless timekeeping duty assignment which breaks NOHZ - Use a proper notifier return value to avoid random behaviour * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: tick: Remove useless timekeeping duty attribution to broadcast source nohz: Fix notifier return val that enforce timekeeping kvm: Move guest entry/exit APIs to context_tracking vtime: Use consistent clocks among nohz accounting
2013-06-20splice: don't pass the address of ->f_pos to methodsAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-19net: vlan: fix comment for vlan_ethhdr->h_vlan_protoOlaf Hering
After addition of 8021AD h_vlan_proto can be either ETH_P_8021Q or ETH_P_8021AD. Signed-off-by: Olaf Hering <olaf@aepfle.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-19tracing/context-tracking: Add preempt_schedule_context() for tracingSteven Rostedt
Dave Jones hit the following bug report: =============================== [ INFO: suspicious RCU usage. ] 3.10.0-rc2+ #1 Not tainted ------------------------------- include/linux/rcupdate.h:771 rcu_read_lock() used illegally while idle! other info that might help us debug this: RCU used illegally from idle CPU! rcu_scheduler_active = 1, debug_locks = 0 RCU used illegally from extended quiescent state! 2 locks held by cc1/63645: #0: (&rq->lock){-.-.-.}, at: [<ffffffff816b39fd>] __schedule+0xed/0x9b0 #1: (rcu_read_lock){.+.+..}, at: [<ffffffff8109d645>] cpuacct_charge+0x5/0x1f0 CPU: 1 PID: 63645 Comm: cc1 Not tainted 3.10.0-rc2+ #1 [loadavg: 40.57 27.55 13.39 25/277 64369] Hardware name: Gigabyte Technology Co., Ltd. GA-MA78GM-S2H/GA-MA78GM-S2H, BIOS F12a 04/23/2010 0000000000000000 ffff88010f78fcf8 ffffffff816ae383 ffff88010f78fd28 ffffffff810b698d ffff88011c092548 000000000023d073 ffff88011c092500 0000000000000001 ffff88010f78fd60 ffffffff8109d7c5 ffffffff8109d645 Call Trace: [<ffffffff816ae383>] dump_stack+0x19/0x1b [<ffffffff810b698d>] lockdep_rcu_suspicious+0xfd/0x130 [<ffffffff8109d7c5>] cpuacct_charge+0x185/0x1f0 [<ffffffff8109d645>] ? cpuacct_charge+0x5/0x1f0 [<ffffffff8108dffc>] update_curr+0xec/0x240 [<ffffffff8108f528>] put_prev_task_fair+0x228/0x480 [<ffffffff816b3a71>] __schedule+0x161/0x9b0 [<ffffffff816b4721>] preempt_schedule+0x51/0x80 [<ffffffff816b4800>] ? __cond_resched_softirq+0x60/0x60 [<ffffffff816b6824>] ? retint_careful+0x12/0x2e [<ffffffff810ff3cc>] ftrace_ops_control_func+0x1dc/0x210 [<ffffffff816be280>] ftrace_call+0x5/0x2f [<ffffffff816b681d>] ? retint_careful+0xb/0x2e [<ffffffff816b4805>] ? schedule_user+0x5/0x70 [<ffffffff816b4805>] ? schedule_user+0x5/0x70 [<ffffffff816b6824>] ? retint_careful+0x12/0x2e ------------[ cut here ]------------ What happened was that the function tracer traced the schedule_user() code that tells RCU that the system is coming back from userspace, and to add the CPU back to the RCU monitoring. Because the function tracer does a preempt_disable/enable_notrace() calls the preempt_enable_notrace() checks the NEED_RESCHED flag. If it is set, then preempt_schedule() is called. But this is called before the user_exit() function can inform the kernel that the CPU is no longer in user mode and needs to be accounted for by RCU. The fix is to create a new preempt_schedule_context() that checks if the kernel is still in user mode and if so to switch it to kernel mode before calling schedule. It also switches back to user mode coming back from schedule in need be. The only user of this currently is the preempt_enable_notrace(), which is only used by the tracing subsystem. Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1369423420.6828.226.camel@gandalf.local.home Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-14smp.h: Use local_irq_{save,restore}() in !SMP version of on_each_cpu().David Daney
Thanks to commit f91eb62f71b3 ("init: scream bloody murder if interrupts are enabled too early"), "bloody murder" is now being screamed. With a MIPS OCTEON config, we use on_each_cpu() in our irq_chip.irq_bus_sync_unlock() function. This gets called in early as a result of the time_init() call. Because the !SMP version of on_each_cpu() unconditionally enables irqs, we get: WARNING: at init/main.c:560 start_kernel+0x250/0x410() Interrupts were enabled early CPU: 0 PID: 0 Comm: swapper Not tainted 3.10.0-rc5-Cavium-Octeon+ #801 Call Trace: show_stack+0x68/0x80 warn_slowpath_common+0x78/0xb0 warn_slowpath_fmt+0x38/0x48 start_kernel+0x250/0x410 Suggested fix: Do what we already do in the SMP version of on_each_cpu(), and use local_irq_save/local_irq_restore. Because we need a flags variable, make it a static inline to avoid name space issues. [ Change from v1: Convert on_each_cpu to a static inline function, add #include <linux/irqflags.h> to avoid build breakage on some files. on_each_cpu_mask() and on_each_cpu_cond() suffer the same problem as on_each_cpu(), but they are not causing !SMP bugs for me, so I will defer changing them to a less urgent patch. ] Signed-off-by: David Daney <david.daney@cavium.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-06-13Merge branch 'rcu/urgent' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu Pull RCU fixes from Paul McKenney: "I must confess that this past merge window was not RCU's best showing. This series contains three more fixes for RCU regressions: 1. A fix to __DECLARE_TRACE_RCU() that causes it to act as an interrupt from idle rather than as a task switch from idle. This change is needed due to the recent use of _rcuidle() tracepoints that can be invoked from interrupt handlers as well as from idle. Without this fix, invoking _rcuidle() tracepoints from interrupt handlers results in splats and (more seriously) confusion on RCU's part as to whether a given CPU is idle or not. This confusion can in turn result in too-short grace periods and therefore random memory corruption. 2. A fix to a subtle deadlock that could result due to RCU doing a wakeup while holding one of its rcu_node structure's locks. Although the probability of occurrence is low, it really does happen. The fix, courtesy of Steven Rostedt, uses irq_work_queue() to avoid the deadlock. 3. A fix to a silent deadlock (invisible to lockdep) due to the interaction of timeouts posted by RCU debug code enabled by CONFIG_PROVE_RCU_DELAY=y, grace-period initialization, and CPU hotplug operations. This will not occur in production kernels, but really does occur in randconfig testing. Diagnosis courtesy of Steven Rostedt" * 'rcu/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: rcu: Fix deadlock with CPU hotplug, RCU GP init, and timer migration rcu: Don't call wakeup() with rcu_node structure ->lock held trace: Allow idle-safe tracepoints to be called from irq
2013-06-12Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking update from David Miller: 1) Fix dump iterator in nfnl_acct_dump() and ctnl_timeout_dump() to dump all objects properly, from Pablo Neira Ayuso. 2) xt_TCPMSS must use the default MSS of 536 when no MSS TCP option is present. Fix from Phil Oester. 3) qdisc_get_rtab() looks for an existing matching rate table and uses that instead of creating a new one. However, it's key matching is incomplete, it fails to check to make sure the ->data[] array is identical too. Fix from Eric Dumazet. 4) ip_vs_dest_entry isn't fully initialized before copying back to userspace, fix from Dan Carpenter. 5) Fix ubuf reference counting regression in vhost_net, from Jason Wang. 6) When sock_diag dumps a socket filter back to userspace, we have to translate it out of the kernel's internal representation first. From Nicolas Dichtel. 7) davinci_mdio holds a spinlock while calling pm_runtime, which sleeps. Fix from Sebastian Siewior. 8) Timeout check in sh_eth_check_reset is off by one, from Sergei Shtylyov. 9) If sctp socket init fails, we can NULL deref during cleanup. Fix from Daniel Borkmann. 10) netlink_mmap() does not propagate errors properly, from Patrick McHardy. 11) Disable powersave and use minstrel by default in ath9k. From Sujith Manoharan. 12) Fix a regression in that SOCK_ZEROCOPY is not set on tuntap sockets which prevents vhost from being able to use zerocopy. From Jason Wang. 13) Fix race between port lookup and TX path in team driver, from Jiri Pirko. 14) Missing length checks in bluetooth L2CAP packet parsing, from Johan Hedberg. 15) rtlwifi fails to connect to networking using any encryption method other than WPA2. Fix from Larry Finger. 16) Fix iwlegacy build due to incorrect CONFIG_* ifdeffing for power management stuff. From Yijing Wang. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (35 commits) b43: stop format string leaking into error msgs ath9k: Use minstrel rate control by default Revert "ath9k_hw: Update rx gain initval to improve rx sensitivity" ath9k: Disable PowerSave by default net: wireless: iwlegacy: fix build error for il_pm_ops rtlwifi: Fix a false leak indication for PCI devices wl12xx/wl18xx: scan all 5ghz channels wl12xx: increase minimum singlerole firmware version required wl12xx: fix minimum required firmware version for wl127x multirole rtlwifi: rtl8192cu: Fix problem in connecting to WEP or WPA(1) networks mwifiex: debugfs: Fix out of bounds array access Bluetooth: Fix mgmt handling of power on failures Bluetooth: Fix missing length checks for L2CAP signalling PDUs Bluetooth: btmrvl: support Marvell Bluetooth device SD8897 Bluetooth: Fix checks for LE support on LE-only controllers team: fix checks in team_get_first_port_txable_rcu() team: move add to port list before port enablement team: check return value of team_get_port_by_index_rcu() for NULL tuntap: set SOCK_ZEROCOPY flag during open netlink: fix error propagation in netlink_mmap() ...
2013-06-12Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block layer fixes from Jens Axboe: "Outside of bcache (which really isn't super big), these are all few-liners. There are a few important fixes in here: - Fix blk pm sleeping when holding the queue lock - A small collection of bcache fixes that have been done and tested since bcache was included in this merge window. - A fix for a raid5 regression introduced with the bio changes. - Two important fixes for mtip32xx, fixing an oops and potential data corruption (or hang) due to wrong bio iteration on stacked devices." * 'for-linus' of git://git.kernel.dk/linux-block: scatterlist: sg_set_buf() argument must be in linear mapping raid5: Initialize bi_vcnt pktcdvd: silence static checker warning block: remove refs to XD disks from documentation blkpm: avoid sleep when holding queue lock mtip32xx: Correctly handle bio->bi_idx != 0 conditions mtip32xx: Fix NULL pointer dereference during module unload bcache: Fix error handling in init code bcache: clarify free/available/unused space bcache: drop "select CLOSURES" bcache: Fix incompatible pointer type warning
2013-06-12include/linux/math64.h: add div64_ul()Alex Shi
There is div64_long() to handle the s64/long division, but no mocro do u64/ul division. It is necessary in some scenarios, so add this function. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Alex Shi <alex.shi@intel.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-06-12mm: migration: add migrate_entry_wait_huge()Naoya Horiguchi
When we have a page fault for the address which is backed by a hugepage under migration, the kernel can't wait correctly and do busy looping on hugepage fault until the migration finishes. As a result, users who try to kick hugepage migration (via soft offlining, for example) occasionally experience long delay or soft lockup. This is because pte_offset_map_lock() can't get a correct migration entry or a correct page table lock for hugepage. This patch introduces migration_entry_wait_huge() to solve this. Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Reviewed-by: Rik van Riel <riel@redhat.com> Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com> Reviewed-by: Michal Hocko <mhocko@suse.cz> Cc: Mel Gorman <mgorman@suse.de> Cc: Andi Kleen <andi@firstfloor.org> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: <stable@vger.kernel.org> [2.6.35+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-06-12kmsg: honor dmesg_restrict sysctl on /dev/kmsgKees Cook
The dmesg_restrict sysctl currently covers the syslog method for access dmesg, however /dev/kmsg isn't covered by the same protections. Most people haven't noticed because util-linux dmesg(1) defaults to using the syslog method for access in older versions. With util-linux dmesg(1) defaults to reading directly from /dev/kmsg. To fix /dev/kmsg, let's compare the existing interfaces and what they allow: - /proc/kmsg allows: - open (SYSLOG_ACTION_OPEN) if CAP_SYSLOG since it uses a destructive single-reader interface (SYSLOG_ACTION_READ). - everything, after an open. - syslog syscall allows: - anything, if CAP_SYSLOG. - SYSLOG_ACTION_READ_ALL and SYSLOG_ACTION_SIZE_BUFFER, if dmesg_restrict==0. - nothing else (EPERM). The use-cases were: - dmesg(1) needs to do non-destructive SYSLOG_ACTION_READ_ALLs. - sysklog(1) needs to open /proc/kmsg, drop privs, and still issue the destructive SYSLOG_ACTION_READs. AIUI, dmesg(1) is moving to /dev/kmsg, and systemd-journald doesn't clear the ring buffer. Based on the comments in devkmsg_llseek, it sounds like actions besides reading aren't going to be supported by /dev/kmsg (i.e. SYSLOG_ACTION_CLEAR), so we have a strict subset of the non-destructive syslog syscall actions. To this end, move the check as Josh had done, but also rename the constants to reflect their new uses (SYSLOG_FROM_CALL becomes SYSLOG_FROM_READER, and SYSLOG_FROM_FILE becomes SYSLOG_FROM_PROC). SYSLOG_FROM_READER allows non-destructive actions, and SYSLOG_FROM_PROC allows destructive actions after a capabilities-constrained SYSLOG_ACTION_OPEN check. - /dev/kmsg allows: - open if CAP_SYSLOG or dmesg_restrict==0 - reading/polling, after open Addresses https://bugzilla.redhat.com/show_bug.cgi?id=903192 [akpm@linux-foundation.org: use pr_warn_once()] Signed-off-by: Kees Cook <keescook@chromium.org> Reported-by: Christian Kujau <lists@nerdbynature.de> Tested-by: Josh Boyer <jwboyer@redhat.com> Cc: Kay Sievers <kay@vrfy.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-06-12CPU hotplug: provide a generic helper to disable/enable CPU hotplugSrivatsa S. Bhat
There are instances in the kernel where we would like to disable CPU hotplug (from sysfs) during some important operation. Today the freezer code depends on this and the code to do it was kinda tailor-made for that. Restructure the code and make it generic enough to be useful for other usecases too. Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Robin Holt <holt@sgi.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Russ Anderson <rja@sgi.com> Cc: Robin Holt <holt@sgi.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Cc: Shawn Guo <shawn.guo@linaro.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-06-12team: fix checks in team_get_first_port_txable_rcu()Jiri Pirko
should be checked if "cur" is txable, not "port". Introduced by commit 6e88e1357c "team: use function team_port_txable() for determing enabled and up port" Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-10sock_diag: fix filter code sent to userspaceNicolas Dichtel
Filters need to be translated to real BPF code for userland, like SO_GETFILTER. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-10trace: Allow idle-safe tracepoints to be called from irqPaul E. McKenney
__DECLARE_TRACE_RCU() currently creates an _rcuidle() tracepoint which may safely be invoked from what RCU considers to be an idle CPU. However, these _rcuidle() tracepoints may -not- be invoked from the handler of an irq taken from idle, because rcu_idle_enter() zeroes RCU's nesting-level counter, so that the rcu_irq_exit() returning to idle will trigger a WARN_ON_ONCE(). This commit therefore substitutes rcu_irq_enter() for rcu_idle_exit() and rcu_irq_exit() for rcu_idle_enter() in order to make the _rcuidle() tracepoints usable from irq handlers as well as from process context. Reported-by: Dave Jones <davej@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Steven Rostedt <rostedt@goodmis.org>
2013-06-07Merge tag 'trace-fixes-v3.10-rc3-v3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: "This contains 4 fixes. The first two fix the case where full RCU debugging is enabled, enabling function tracing causes a live lock of the system. This is due to the added debug checks in rcu_dereference_raw() that is used by the function tracer. These checks are also traced by the function tracer as well as cause enough overhead to the function tracer to slow down the system enough that the time to finish an interrupt can take longer than when the next interrupt is triggered, causing a live lock from the timer interrupt. Talking this over with Paul McKenney, we came up with a fix that adds a new rcu_dereference_raw_notrace() that does not perform these added checks, and let the function tracer use that. The third commit fixes a failed compile when branch tracing is enabled, due to the conversion of the trace_test_buffer() selftest that the branch trace wasn't converted for. The forth patch fixes a bug caught by the RCU lockdep code where a rcu_read_lock() is performed when rcu is disabled (either going to or from idle, or user space). This happened on the irqsoff tracer as it calls task_uid(). The fix here was to use current_uid() when possible that doesn't use rcu locking. Which luckily, is always used when irqsoff calls this code." * tag 'trace-fixes-v3.10-rc3-v3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Use current_uid() for critical time tracing tracing: Fix bad parameter passed in branch selftest ftrace: Use the rcu _notrace variants for rcu_dereference_raw() and friends rcu: Add _notrace variation of rcu_dereference_raw() and hlist_for_each_entry_rcu()
2013-06-06net: Unbreak compat_sys_{send,recv}msgAndy Lutomirski
I broke them in this commit: commit 1be374a0518a288147c6a7398792583200a67261 Author: Andy Lutomirski <luto@amacapital.net> Date: Wed May 22 14:07:44 2013 -0700 net: Block MSG_CMSG_COMPAT in send(m)msg and recv(m)msg This patch adds __sys_sendmsg and __sys_sendmsg as common helpers that accept MSG_CMSG_COMPAT and blocks MSG_CMSG_COMPAT at the syscall entrypoints. It also reverts some unnecessary checks in sys_socketcall. Apparently I was suffering from underscore blindness the first time around. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Tested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Fix timeouts with direct mode authentication in mac80211, from Stanislaw Gruszka. 2) Aggregation sessions can deadlock in ath9k, from Felix Fietkau. 3) Netfilter's xt_addrtype doesn't work with ipv6 due to route lookups creating undesirable cache entries, from Florian Westphal. 4) Fix netfilter's ipt_ULOG from generating non-NULL terminated strings. 5) Fix netdev transmit queue crashes in mac80211, from Johannes Berg. 6) Fix copy and paste error in 802.11 stack that broke reporting of 64-bit station tx statistics, from Felix Fietkau. 7) When qlge_probe fails, it leaks the netdev. Fix from Wei Yongjun. 8) SKB control block (where we store the IP options information, amongst other things) must be cleared properly otherwise ICMP sending can crash for IP tunnels. Fix from Eric Dumazet. 9) Verification of Energy Efficient Ether support was coded wrongly, the test was inversed. Fix from Giuseppe CAVALLARO. 10) TCP handles redirects improperly because the wrong flow key is used for the route lookup. From Michal Kubecek. 11) Don't interpret MSG_CMSG_COMPAT from userspace, fix from Andy Lutomirski. 12) The new AF_VSOCK was missing from the lockdep string table, fix from Federico Vaga. 13) be2net doesn't handle checksumming of IP fragments properly, from Somnath Kotur. 14) Fix several bugs in the device address list code that lead to crashes and other misbehaviors. From Jay Vosburgh. 15) Fix ipv6 segmentation handling of fragmented GRE tunnel traffic, from Pravin B Shalr. 16) Fix usage of stale policies in IPSEC layer, from Paul Moore. 17) Fix team driver dump of ports when there are a large number of them, from Jiri Pirko. 18) Fix softlockups in UDP ipv4 socket lookup causes by and error in the hlist_nulls_for_each_entry_rcu() macro. From Eric Dumazet. 19) Fix several regressions added by the high rate accuracy changes to the htb packet scheduler. From Eric Dumazet. 20) Fix DMA'ing onto the stack in esd_usb2 and peak_usb CAN drivers, from Olivier Sobrie and Marc Kleine-Budde. 21) Fix unremovable network devices due to missing route pointer installation in the per-device ipv6 address list entries. From Gao feng. 22) Apply the tg3 5719 DMA workaround on 5720 chips as well, otherwise we get stalls. From Nithin Sujir. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (68 commits) net_sched: htb: do not mix 1ns and 64ns time units net: fix sk_buff head without data area tg3: Add read dma workaround for 5720 net: ethernet: xilinx_emaclite: set protocol selector bits when writing ANAR bnx2x: Fix bridged GSO for 57710/57711 chips net: fec: add fallback to random MAC address bnx2x: fix TCP offload for tunneling ipv4 over ipv6 ipv6: assign rt6_info to inet6_ifaddr in init_loopback net/mlx4_core: Keep VF assigned MAC in the PF admin table net/mlx4_en: Handle unassigned VF MAC address correctly net/mlx4_core: Return -EPROBE_DEFER when a VF is probed before PF is sufficiently initialized net/mlx4_en: Fix adaptive moderation cq update net: can: peak_usb: Do not do dma on the stack net: can: esd_usb2: Do not do dma on the stack net: can: kvaser_usb: fix reception on "USBcan Pro" and "USBcan R" type hardware. net_sched: restore "overhead xxx" handling net: force a reload of first item in hlist_nulls_for_each_entry_rcu hyperv: Fix vlan_proto setting in netvsc_recv_callback() team: fix port list dump for big number of ports list: introduce list_first_entry_or_null ...
2013-06-03Merge branch 'for-3.10-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup fixes from Tejun Heo: - Fix for yet another xattr bug which may lead to NULL deref. - A subtle bug in for_each_descendant_pre(). This bug requires quite specific conditions to trigger and isn't too likely to actually happen in the wild, but maybe that just makes it that much more nastier. - A warning message added for silly cgroup re-mount (not -o remount, but unmount followed by mount) behavior. * 'for-3.10-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup: warn about mismatching options of a new mount of an existing hierarchy cgroup: fix a subtle bug in descendant pre-order walk cgroup: initialize xattr before calling d_instantiate()
2013-06-02net: force a reload of first item in hlist_nulls_for_each_entry_rcuEric Dumazet
Roman Gushchin discovered that udp4_lib_lookup2() was not reloading first item in the rcu protected list, in case the loop was restarted. This produced soft lockups as in https://lkml.org/lkml/2013/4/16/37 rcu_dereference(X)/ACCESS_ONCE(X) seem to not work as intended if X is ptr->field : In some cases, gcc caches the value or ptr->field in a register. Use a barrier() to disallow such caching, as documented in Documentation/atomic_ops.txt line 114 Thanks a lot to Roman for providing analysis and numerous patches. Diagnosed-by: Roman Gushchin <klamm@yandex-team.ru> Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Boris Zhmurov <zhmurov@yandex-team.ru> Signed-off-by: Roman Gushchin <klamm@yandex-team.ru> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-31list: introduce list_first_entry_or_nullJiri Pirko
non-rcu variant of list_first_or_null_rcu Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-31udp6: Fix udp fragmentation for tunnel traffic.Pravin B Shelar
udp6 over GRE tunnel does not work after to GRE tso changes. GRE tso handler passes inner packet but keeps track of outer header start in SKB_GSO_CB(skb)->mac_offset. udp6 fragment need to take care of outer header, which start at the mac_offset, while adding fragment header. This bug is introduced by commit 68c3316311 (GRE: Add TCP segmentation offload for GRE). Reported-by: Dmitry Kravkov <dkravkov@gmail.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Tested-by: Dmitry Kravkov <dmitry@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-31kvm: Move guest entry/exit APIs to context_trackingFrederic Weisbecker
The kvm_host.h header file doesn't handle well inclusion when archs don't support KVM. This results in build crashes for such archs when they want to implement context tracking because this subsystem includes kvm_host.h in order to implement the guest_enter/exit APIs but it doesn't handle KVM off case. To fix this, move the guest_enter()/guest_exit() declarations and generic implementation to the context tracking headers. These generic APIs actually belong to this subsystem, besides other domains boundary tracking like user_enter() et al. KVM now properly becomes a user of this library, not the other buggy way around. Reported-by: Kevin Hilman <khilman@linaro.org> Reviewed-by: Kevin Hilman <khilman@linaro.org> Tested-by: Kevin Hilman <khilman@linaro.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Kevin Hilman <khilman@linaro.org> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Gleb Natapov <gleb@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-05-31vtime: Use consistent clocks among nohz accountingFrederic Weisbecker
While computing the cputime delta of dynticks CPUs, we are mixing up clocks of differents natures: * local_clock() which takes care of unstable clock sources and fix these if needed. * sched_clock() which is the weaker version of local_clock(). It doesn't compute any fixup in case of unstable source. If the clock source is stable, those two clocks are the same and we can safely compute the difference against two random points. Otherwise it results in random deltas as sched_clock() can randomly drift away, back or forward, from local_clock(). As a consequence, some strange behaviour with unstable tsc has been observed such as non progressing constant zero cputime. (The 'top' command showing no load). Fix this by only using local_clock(), or its irq safe/remote equivalent, in vtime code. Reported-by: Mike Galbraith <efault@gmx.de> Suggested-by: Mike Galbraith <efault@gmx.de> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Mike Galbraith <efault@gmx.de> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-05-30Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller
Pablo Neira Ayuso says: ==================== The following patchset contains Netfilter/IPVS fixes for 3.10-rc3, they are: * fix xt_addrtype with IPv6, from Florian Westphal. This required a new hook for IPv6 functions in the netfilter core to avoid hard dependencies with the ipv6 subsystem when this match is only used for IPv4. * fix connection reuse case in IPVS. Currently, if an reused connection are directed to the same server. If that server is down, those connection would fail. Therefore, clear the connection and choose a new server among the available ones. * fix possible non-nul terminated string sent to user-space if ipt_ULOG is used as the default netfilter logging stub, from Chen Gang. * fix mark logging of IPv6 packets in xt_LOG, from Michal Kubecek. This bug has been there since 2.6.26. * Fix breakage ip_vs_sh due to incorrect structure layout for RCU, from Jan Beulich. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-30aerdrv: Move cper_print_aer() call out of interrupt contextLance Ortiz
The following warning was seen on 3.9 when a corrected PCIe error was being handled by the AER subsystem. WARNING: at .../drivers/pci/search.c:214 pci_get_dev_by_id+0x8a/0x90() This occurred because a call to pci_get_domain_bus_and_slot() was added to cper_print_pcie() to setup for the call to cper_print_aer(). The warning showed up because cper_print_pcie() is called in an interrupt context and pci_get* functions are not supposed to be called in that context. The solution is to move the cper_print_aer() call out of the interrupt context and into aer_recover_work_func() to avoid any warnings when calling pci_get* functions. Signed-off-by: Lance Ortiz <lance.ortiz@hp.com> Acked-by: Borislav Petkov <bp@suse.de> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
2013-05-30scatterlist: sg_set_buf() argument must be in linear mappingRusty Russell
Add a check behind CONFIG_DEBUG_SG to verify this. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-05-28rcu: Add _notrace variation of rcu_dereference_raw() and ↵Steven Rostedt
hlist_for_each_entry_rcu() As rcu_dereference_raw() under RCU debug config options can add quite a bit of checks, and that tracing uses rcu_dereference_raw(), these checks happen with the function tracer. The function tracer also happens to trace these debug checks too. This added overhead can livelock the system. Add a new interface to RCU for both rcu_dereference_raw_notrace() as well as hlist_for_each_entry_rcu_notrace() as the hlist iterator uses the rcu_dereference_raw() as well, and is used a bit with the function tracer. Link: http://lkml.kernel.org/r/20130528184209.304356745@goodmis.org Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-05-28perf: Fix perf mmap bugsPeter Zijlstra
Vince reported a problem found by his perf specific trinity fuzzer. Al noticed 2 problems with perf's mmap(): - it has issues against fork() since we use vma->vm_mm for accounting. - it has an rb refcount leak on double mmap(). We fix the issues against fork() by using VM_DONTCOPY; I don't think there's code out there that uses this; we didn't hear about weird accounting problems/crashes. If we do need this to work, the previously proposed VM_PINNED could make this work. Aside from the rb reference leak spotted by Al, Vince's example prog was indeed doing a double mmap() through the use of perf_event_set_output(). This exposes another problem, since we now have 2 events with one buffer, the accounting gets screwy because we account per event. Fix this by making the buffer responsible for its own accounting. Reported-by: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Link: http://lkml.kernel.org/r/20130528085548.GA12193@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-05-25Merge branch 'fixes' of git://git.infradead.org/users/vkoul/slave-dmaLinus Torvalds
Pull slave-dma fixes from Vinod Koul: "We have two patches from Andy & Rafael fixing the Lynxpoint dma" * 'fixes' of git://git.infradead.org/users/vkoul/slave-dma: ACPI / LPSS: register clock device for Lynxpoint DMA properly dma: acpi-dma: parse CSRT to extract additional resources
2013-05-24Merge branch 'akpm' (incoming from Andrew Morton)Linus Torvalds
Merge fixes from Andrew Morton: "A bunch of fixes and one simple fbdev driver which missed the merge window because people will still talking about it (to no great effect)." * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (30 commits) aio: fix kioctx not being freed after cancellation at exit time mm/pagewalk.c: walk_page_range should avoid VM_PFNMAP areas drivers/rtc/rtc-max8998.c: check for pdata presence before dereferencing ocfs2: goto out_unlock if ocfs2_get_clusters_nocache() failed in ocfs2_fiemap() random: fix accounting race condition with lockless irq entropy_count update drivers/char/random.c: fix priming of last_data mm/memory_hotplug.c: fix printk format warnings nilfs2: fix issue of nilfs_set_page_dirty() for page at EOF boundary drivers/block/brd.c: fix brd_lookup_page() race fbdev: FB_GOLDFISH should depend on HAS_DMA drivers/rtc/rtc-pl031.c: pass correct pointer to free_irq() auditfilter.c: fix kernel-doc warnings aio: fix io_getevents documentation revert "selftest: add simple test for soft-dirty bit" drivers/leds/leds-ot200.c: fix error caused by shifted mask mm/THP: use pmd_populate() to update the pmd with pgtable_t pointer linux/kernel.h: fix kernel-doc warning mm compaction: fix of improper cache flush in migration code rapidio/tsi721: fix bug in MSI interrupt handling hfs: avoid crash in hfs_bnode_create ...
2013-05-24Merge tag 'fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc Pull ARM SoC fixes from Olof Johansson: "We didn't have any fixes sent up for -rc2, so this is a slightly larger batch. A bit all over the place platform-wise; OMAP, at91, marvell, renesas, sunxi, ux500, etc. I tried to summarize highlights but there isn't a whole lot to point out. Lots of little things fixed all over. A couple of defconfig updates due to new/changing options." * tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (44 commits) ARM: at91/sama5: fix incorrect PMC pcr div definition ARM: at91/dt: fix macb pinctrl_macb_rmii_mii_alt definition ARM: at91: at91sam9n12: move external irq declatation to DT ARM: shmobile: marzen: Use error values in usb_power_* ARM: tegra: defconfig fixes ARM: nomadik: fix IRQ assignment for SMC ethernet ARM: vt8500: Add missing NULL terminator in dt_compat clk: tegra: add ac97 controller clock clk: tegra: remove USB from clk init table ARM: dts: mvebu: Fix wrong the address reg value for the L2-cache node ARM: plat-orion: Fix num_resources and id for ge10 and ge11 ARM: OMAP2+: hwmod: Remove sysc slave idle and auto idle apis SERIAL: OMAP: Remove the slave idle handling from the driver ARM: OMAP2+: serial: Remove the un-used slave idle hooks ARM: OMAP2+: hwmod-data: UART IP needs software control to manage sidle modes ARM: OMAP2+: hwmod: Add a new flag to handle SIDLE in SWSUP only in active ARM: OMAP2+: hwmod: Fix sidle programming in _enable_sysc()/_idle_sysc() arm: mvebu: fix the 'ranges' property to handle PCIe ARM: mvebu: select ARCH_REQUIRE_GPIOLIB for mvebu platform ARM: AM33XX: Add missing .clkdm_name to clkdiv32k_ick clock ...
2013-05-24linux/kernel.h: fix kernel-doc warningRandy Dunlap
Fix kernel-doc warning in <linux/kernel.h>: Warning(include/linux/kernel.h:590): No description found for parameter 'ip' scripts/kernel-doc cannot handle macros, functions, or function prototypes between the function or macro that is being documented and its definition, so move these prototypes above the function that is being documented. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-24wait: fix false timeouts when using wait_event_timeout()Imre Deak
Many callers of the wait_event_timeout() and wait_event_interruptible_timeout() expect that the return value will be positive if the specified condition becomes true before the timeout elapses. However, at the moment this isn't guaranteed. If the wake-up handler is delayed enough, the time remaining until timeout will be calculated as 0 - and passed back as a return value - even if the condition became true before the timeout has passed. Fix this by returning at least 1 if the condition becomes true. This semantic is in line with what wait_for_condition_timeout() does; see commit bb10ed09 ("sched: fix wait_for_completion_timeout() spurious failure under heavy load"). Daniel said "We have 3 instances of this bug in drm/i915. One case even where we switch between the interruptible and not interruptible wait_event_timeout variants, foolishly presuming they have the same semantics. I very much like this." One such bug is reported at https://bugs.freedesktop.org/show_bug.cgi?id=64133 Signed-off-by: Imre Deak <imre.deak@intel.com> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Acked-by: David Howells <dhowells@redhat.com> Acked-by: Jens Axboe <axboe@kernel.dk> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Dave Jones <davej@redhat.com> Cc: Lukas Czerner <lczerner@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>