aboutsummaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2014-10-15Merge tag 'v3.14.22' into linux-linaro-lsk-v3.14Mark Brown
This is the 3.14.22 stable release
2014-10-15USB: Add device quirk for ASUS T100 Base Station keyboardLu Baolu
commit ddbe1fca0bcb87ca8c199ea873a456ca8a948567 upstream. This full-speed USB device generates spurious remote wakeup event as soon as USB_DEVICE_REMOTE_WAKEUP feature is set. As the result, Linux can't enter system suspend and S0ix power saving modes once this keyboard is used. This patch tries to introduce USB_QUIRK_IGNORE_REMOTE_WAKEUP quirk. With this quirk set, wakeup capability will be ignored during device configure. This patch could be back-ported to kernels as old as 2.6.39. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Acked-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-15sctp: handle association restarts when the socket is closed.Vlad Yasevich
[ Upstream commit bdf6fa52f01b941d4a80372d56de465bdbbd1d23 ] Currently association restarts do not take into consideration the state of the socket. When a restart happens, the current assocation simply transitions into established state. This creates a condition where a remote system, through a the restart procedure, may create a local association that is no way reachable by user. The conditions to trigger this are as follows: 1) Remote does not acknoledge some data causing data to remain outstanding. 2) Local application calls close() on the socket. Since data is still outstanding, the association is placed in SHUTDOWN_PENDING state. However, the socket is closed. 3) The remote tries to create a new association, triggering a restart on the local system. The association moves from SHUTDOWN_PENDING to ESTABLISHED. At this point, it is no longer reachable by any socket on the local system. This patch addresses the above situation by moving the newly ESTABLISHED association into SHUTDOWN-SENT state and bundling a SHUTDOWN after the COOKIE-ACK chunk. This way, the restarted associate immidiately enters the shutdown procedure and forces the termination of the unreachable association. Reported-by: David Laight <David.Laight@aculab.com> Signed-off-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-15xfrm: Generate queueing routes only from route lookup functionsSteffen Klassert
[ Upstream commit b8c203b2d2fc961bafd53b41d5396bbcdec55998 ] Currently we genarate a queueing route if we have matching policies but can not resolve the states and the sysctl xfrm_larval_drop is disabled. Here we assume that dst_output() is called to kill the queued packets. Unfortunately this assumption is not true in all cases, so it is possible that these packets leave the system unwanted. We fix this by generating queueing routes only from the route lookup functions, here we can guarantee a call to dst_output() afterwards. Fixes: a0073fe18e71 ("xfrm: Add a state resolution packet queue") Reported-by: Konstantinos Kolelis <k.kolelis@sirrix.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-15xfrm: Generate blackhole routes only from route lookup functionsSteffen Klassert
[ Upstream commit f92ee61982d6da15a9e49664ecd6405a15a2ee56 ] Currently we genarate a blackhole route route whenever we have matching policies but can not resolve the states. Here we assume that dst_output() is called to kill the balckholed packets. Unfortunately this assumption is not true in all cases, so it is possible that these packets leave the system unwanted. We fix this by generating blackhole routes only from the route lookup functions, here we can guarantee a call to dst_output() afterwards. Fixes: 2774c131b1d ("xfrm: Handle blackhole route creation via afinfo.") Reported-by: Konstantinos Kolelis <k.kolelis@sirrix.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-15tcp: fix tcp_release_cb() to dispatch via address family for mtu_reduced()Neal Cardwell
[ Upstream commit 4fab9071950c2021d846e18351e0f46a1cffd67b ] Make sure we use the correct address-family-specific function for handling MTU reductions from within tcp_release_cb(). Previously AF_INET6 sockets were incorrectly always using the IPv6 code path when sometimes they were handling IPv4 traffic and thus had an IPv4 dst. Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Diagnosed-by: Willem de Bruijn <willemb@google.com> Fixes: 563d34d057862 ("tcp: dont drop MTU reduction indications") Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-15tcp: don't use timestamp from repaired skb-s to calculate RTT (v2)Andrey Vagin
[ Upstream commit 9d186cac7ffb1831e9f34cb4a3a8b22abb9dd9d4 ] We don't know right timestamp for repaired skb-s. Wrong RTT estimations isn't good, because some congestion modules heavily depends on it. This patch adds the TCPCB_REPAIRED flag, which is included in TCPCB_RETRANS. Thanks to Eric for the advice how to fix this issue. This patch fixes the warning: [ 879.562947] WARNING: CPU: 0 PID: 2825 at net/ipv4/tcp_input.c:3078 tcp_ack+0x11f5/0x1380() [ 879.567253] CPU: 0 PID: 2825 Comm: socket-tcpbuf-l Not tainted 3.16.0-next-20140811 #1 [ 879.567829] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 879.568177] 0000000000000000 00000000c532680c ffff880039643d00 ffffffff817aa2d2 [ 879.568776] 0000000000000000 ffff880039643d38 ffffffff8109afbd ffff880039d6ba80 [ 879.569386] ffff88003a449800 000000002983d6bd 0000000000000000 000000002983d6bc [ 879.569982] Call Trace: [ 879.570264] [<ffffffff817aa2d2>] dump_stack+0x4d/0x66 [ 879.570599] [<ffffffff8109afbd>] warn_slowpath_common+0x7d/0xa0 [ 879.570935] [<ffffffff8109b0ea>] warn_slowpath_null+0x1a/0x20 [ 879.571292] [<ffffffff816d0a05>] tcp_ack+0x11f5/0x1380 [ 879.571614] [<ffffffff816d10bd>] tcp_rcv_established+0x1ed/0x710 [ 879.571958] [<ffffffff816dc9da>] tcp_v4_do_rcv+0x10a/0x370 [ 879.572315] [<ffffffff81657459>] release_sock+0x89/0x1d0 [ 879.572642] [<ffffffff816c81a0>] do_tcp_setsockopt.isra.36+0x120/0x860 [ 879.573000] [<ffffffff8110a52e>] ? rcu_read_lock_held+0x6e/0x80 [ 879.573352] [<ffffffff816c8912>] tcp_setsockopt+0x32/0x40 [ 879.573678] [<ffffffff81654ac4>] sock_common_setsockopt+0x14/0x20 [ 879.574031] [<ffffffff816537b0>] SyS_setsockopt+0x80/0xf0 [ 879.574393] [<ffffffff817b40a9>] system_call_fastpath+0x16/0x1b [ 879.574730] ---[ end trace a17cbc38eb8c5c00 ]--- v2: moving setting of skb->when for repaired skb-s in tcp_write_xmit, where it's set for other skb-s. Fixes: 431a91242d8d ("tcp: timestamp SYN+DATA messages") Fixes: 740b0f1841f6 ("tcp: switch rtt estimations to usec resolution") Cc: Eric Dumazet <edumazet@google.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-15net: Always untag vlan-tagged traffic on input.Vlad Yasevich
[ Upstream commit 0d5501c1c828fb97d02af50aa9d2b1a5498b94e4 ] Currently the functionality to untag traffic on input resides as part of the vlan module and is build only when VLAN support is enabled in the kernel. When VLAN is disabled, the function vlan_untag() turns into a stub and doesn't really untag the packets. This seems to create an interesting interaction between VMs supporting checksum offloading and some network drivers. There are some drivers that do not allow the user to change tx-vlan-offload feature of the driver. These drivers also seem to assume that any VLAN-tagged traffic they transmit will have the vlan information in the vlan_tci and not in the vlan header already in the skb. When transmitting skbs that already have tagged data with partial checksum set, the checksum doesn't appear to be updated correctly by the card thus resulting in a failure to establish TCP connections. The following is a packet trace taken on the receiver where a sender is a VM with a VLAN configued. The host VM is running on doest not have VLAN support and the outging interface on the host is tg3: 10:12:43.503055 52:54:00:ae:42:3f > 28:d2:44:7d:c2:de, ethertype 802.1Q (0x8100), length 78: vlan 100, p 0, ethertype IPv4, (tos 0x0, ttl 64, id 27243, offset 0, flags [DF], proto TCP (6), length 60) 10.0.100.1.58545 > 10.0.100.10.ircu-2: Flags [S], cksum 0xdc39 (incorrect -> 0x48d9), seq 1069378582, win 29200, options [mss 1460,sackOK,TS val 4294837885 ecr 0,nop,wscale 7], length 0 10:12:44.505556 52:54:00:ae:42:3f > 28:d2:44:7d:c2:de, ethertype 802.1Q (0x8100), length 78: vlan 100, p 0, ethertype IPv4, (tos 0x0, ttl 64, id 27244, offset 0, flags [DF], proto TCP (6), length 60) 10.0.100.1.58545 > 10.0.100.10.ircu-2: Flags [S], cksum 0xdc39 (incorrect -> 0x44ee), seq 1069378582, win 29200, options [mss 1460,sackOK,TS val 4294838888 ecr 0,nop,wscale 7], length 0 This connection finally times out. I've only access to the TG3 hardware in this configuration thus have only tested this with TG3 driver. There are a lot of other drivers that do not permit user changes to vlan acceleration features, and I don't know if they all suffere from a similar issue. The patch attempt to fix this another way. It moves the vlan header stipping code out of the vlan module and always builds it into the kernel network core. This way, even if vlan is not supported on a virtualizatoin host, the virtual machines running on top of such host will still work with VLANs enabled. CC: Patrick McHardy <kaber@trash.net> CC: Nithin Nayak Sujir <nsujir@broadcom.com> CC: Michael Chan <mchan@broadcom.com> CC: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com> Acked-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-10Merge tag 'v3.14.21' into linux-linaro-lsk-v3.14Mark Brown
This is the 3.14.21 stable release
2014-10-09mm: per-thread vma cachingDavidlohr Bueso
commit 615d6e8756c87149f2d4c1b93d471bca002bd849 upstream. This patch is a continuation of efforts trying to optimize find_vma(), avoiding potentially expensive rbtree walks to locate a vma upon faults. The original approach (https://lkml.org/lkml/2013/11/1/410), where the largest vma was also cached, ended up being too specific and random, thus further comparison with other approaches were needed. There are two things to consider when dealing with this, the cache hit rate and the latency of find_vma(). Improving the hit-rate does not necessarily translate in finding the vma any faster, as the overhead of any fancy caching schemes can be too high to consider. We currently cache the last used vma for the whole address space, which provides a nice optimization, reducing the total cycles in find_vma() by up to 250%, for workloads with good locality. On the other hand, this simple scheme is pretty much useless for workloads with poor locality. Analyzing ebizzy runs shows that, no matter how many threads are running, the mmap_cache hit rate is less than 2%, and in many situations below 1%. The proposed approach is to replace this scheme with a small per-thread cache, maximizing hit rates at a very low maintenance cost. Invalidations are performed by simply bumping up a 32-bit sequence number. The only expensive operation is in the rare case of a seq number overflow, where all caches that share the same address space are flushed. Upon a miss, the proposed replacement policy is based on the page number that contains the virtual address in question. Concretely, the following results are seen on an 80 core, 8 socket x86-64 box: 1) System bootup: Most programs are single threaded, so the per-thread scheme does improve ~50% hit rate by just adding a few more slots to the cache. +----------------+----------+------------------+ | caching scheme | hit-rate | cycles (billion) | +----------------+----------+------------------+ | baseline | 50.61% | 19.90 | | patched | 73.45% | 13.58 | +----------------+----------+------------------+ 2) Kernel build: This one is already pretty good with the current approach as we're dealing with good locality. +----------------+----------+------------------+ | caching scheme | hit-rate | cycles (billion) | +----------------+----------+------------------+ | baseline | 75.28% | 11.03 | | patched | 88.09% | 9.31 | +----------------+----------+------------------+ 3) Oracle 11g Data Mining (4k pages): Similar to the kernel build workload. +----------------+----------+------------------+ | caching scheme | hit-rate | cycles (billion) | +----------------+----------+------------------+ | baseline | 70.66% | 17.14 | | patched | 91.15% | 12.57 | +----------------+----------+------------------+ 4) Ebizzy: There's a fair amount of variation from run to run, but this approach always shows nearly perfect hit rates, while baseline is just about non-existent. The amounts of cycles can fluctuate between anywhere from ~60 to ~116 for the baseline scheme, but this approach reduces it considerably. For instance, with 80 threads: +----------------+----------+------------------+ | caching scheme | hit-rate | cycles (billion) | +----------------+----------+------------------+ | baseline | 1.06% | 91.54 | | patched | 99.97% | 14.18 | +----------------+----------+------------------+ [akpm@linux-foundation.org: fix nommu build, per Davidlohr] [akpm@linux-foundation.org: document vmacache_valid() logic] [akpm@linux-foundation.org: attempt to untangle header files] [akpm@linux-foundation.org: add vmacache_find() BUG_ON] [hughd@google.com: add vmacache_valid_mm() (from Oleg)] [akpm@linux-foundation.org: coding-style fixes] [akpm@linux-foundation.org: adjust and enhance comments] Signed-off-by: Davidlohr Bueso <davidlohr@hp.com> Reviewed-by: Rik van Riel <riel@redhat.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Reviewed-by: Michel Lespinasse <walken@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Tested-by: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-09mm: optimize put_mems_allowed() usageMel Gorman
commit d26914d11751b23ca2e8747725f2cae10c2f2c1b upstream. Since put_mems_allowed() is strictly optional, its a seqcount retry, we don't need to evaluate the function if the allocation was in fact successful, saving a smp_rmb some loads and comparisons on some relative fast-paths. Since the naming, get/put_mems_allowed() does suggest a mandatory pairing, rename the interface, as suggested by Mel, to resemble the seqcount interface. This gives us: read_mems_allowed_begin() and read_mems_allowed_retry(), where it is important to note that the return value of the latter call is inverted from its previous incarnation. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-09swap: change swap_list_head to plist, add swap_avail_headDan Streetman
commit 18ab4d4ced0817421e6db6940374cc39d28d65da upstream. Originally get_swap_page() started iterating through the singly-linked list of swap_info_structs using swap_list.next or highest_priority_index, which both were intended to point to the highest priority active swap target that was not full. The first patch in this series changed the singly-linked list to a doubly-linked list, and removed the logic to start at the highest priority non-full entry; it starts scanning at the highest priority entry each time, even if the entry is full. Replace the manually ordered swap_list_head with a plist, swap_active_head. Add a new plist, swap_avail_head. The original swap_active_head plist contains all active swap_info_structs, as before, while the new swap_avail_head plist contains only swap_info_structs that are active and available, i.e. not full. Add a new spinlock, swap_avail_lock, to protect the swap_avail_head list. Mel Gorman suggested using plists since they internally handle ordering the list entries based on priority, which is exactly what swap was doing manually. All the ordering code is now removed, and swap_info_struct entries and simply added to their corresponding plist and automatically ordered correctly. Using a new plist for available swap_info_structs simplifies and optimizes get_swap_page(), which no longer has to iterate over full swap_info_structs. Using a new spinlock for swap_avail_head plist allows each swap_info_struct to add or remove themselves from the plist when they become full or not-full; previously they could not do so because the swap_info_struct->lock is held when they change from full<->not-full, and the swap_lock protecting the main swap_active_head must be ordered before any swap_info_struct->lock. Signed-off-by: Dan Streetman <ddstreet@ieee.org> Acked-by: Mel Gorman <mgorman@suse.de> Cc: Shaohua Li <shli@fusionio.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Hugh Dickins <hughd@google.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com> Cc: Weijie Yang <weijieut@gmail.com> Cc: Rik van Riel <riel@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Bob Liu <bob.liu@oracle.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-09lib/plist: add plist_requeueDan Streetman
commit a75f232ce0fe38bd01301899ecd97ffd0254316a upstream. Add plist_requeue(), which moves the specified plist_node after all other same-priority plist_nodes in the list. This is essentially an optimized plist_del() followed by plist_add(). This is needed by swap, which (with the next patch in this set) uses a plist of available swap devices. When a swap device (either a swap partition or swap file) are added to the system with swapon(), the device is added to a plist, ordered by the swap device's priority. When swap needs to allocate a page from one of the swap devices, it takes the page from the first swap device on the plist, which is the highest priority swap device. The swap device is left in the plist until all its pages are used, and then removed from the plist when it becomes full. However, as described in man 2 swapon, swap must allocate pages from swap devices with the same priority in round-robin order; to do this, on each swap page allocation, swap uses a page from the first swap device in the plist, and then calls plist_requeue() to move that swap device entry to after any other same-priority swap devices. The next swap page allocation will again use a page from the first swap device in the plist and requeue it, and so on, resulting in round-robin usage of equal-priority swap devices. Also add plist_test_requeue() test function, for use by plist_test() to test plist_requeue() function. Signed-off-by: Dan Streetman <ddstreet@ieee.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> Acked-by: Mel Gorman <mgorman@suse.de> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Shaohua Li <shli@fusionio.com> Cc: Hugh Dickins <hughd@google.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com> Cc: Weijie Yang <weijieut@gmail.com> Cc: Rik van Riel <riel@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Bob Liu <bob.liu@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-09lib/plist: add helper functionsDan Streetman
commit fd16618e12a05df79a3439d72d5ffdac5d34f3da upstream. Add PLIST_HEAD() to plist.h, equivalent to LIST_HEAD() from list.h, to define and initialize a struct plist_head. Add plist_for_each_continue() and plist_for_each_entry_continue(), equivalent to list_for_each_continue() and list_for_each_entry_continue(), to iterate over a plist continuing after the current position. Add plist_prev() and plist_next(), equivalent to (struct list_head*)->prev and ->next, implemented by list_prev_entry() and list_next_entry(), to access the prev/next struct plist_node entry. These are needed because unlike struct list_head, direct access of the prev/next struct plist_node isn't possible; the list must be navigated via the contained struct list_head. e.g. instead of accessing the prev by list_prev_entry(node, node_list) it can be accessed by plist_prev(node). Signed-off-by: Dan Streetman <ddstreet@ieee.org> Acked-by: Mel Gorman <mgorman@suse.de> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Shaohua Li <shli@fusionio.com> Cc: Hugh Dickins <hughd@google.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com> Cc: Weijie Yang <weijieut@gmail.com> Cc: Rik van Riel <riel@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Bob Liu <bob.liu@oracle.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-09swap: change swap_info singly-linked list to list_headDan Streetman
commit adfab836f4908deb049a5128082719e689eed964 upstream. The logic controlling the singly-linked list of swap_info_struct entries for all active, i.e. swapon'ed, swap targets is rather complex, because: - it stores the entries in priority order - there is a pointer to the highest priority entry - there is a pointer to the highest priority not-full entry - there is a highest_priority_index variable set outside the swap_lock - swap entries of equal priority should be used equally this complexity leads to bugs such as: https://lkml.org/lkml/2014/2/13/181 where different priority swap targets are incorrectly used equally. That bug probably could be solved with the existing singly-linked lists, but I think it would only add more complexity to the already difficult to understand get_swap_page() swap_list iteration logic. The first patch changes from a singly-linked list to a doubly-linked list using list_heads; the highest_priority_index and related code are removed and get_swap_page() starts each iteration at the highest priority swap_info entry, even if it's full. While this does introduce unnecessary list iteration (i.e. Schlemiel the painter's algorithm) in the case where one or more of the highest priority entries are full, the iteration and manipulation code is much simpler and behaves correctly re: the above bug; and the fourth patch removes the unnecessary iteration. The second patch adds some minor plist helper functions; nothing new really, just functions to match existing regular list functions. These are used by the next two patches. The third patch adds plist_requeue(), which is used by get_swap_page() in the next patch - it performs the requeueing of same-priority entries (which moves the entry to the end of its priority in the plist), so that all equal-priority swap_info_structs get used equally. The fourth patch converts the main list into a plist, and adds a new plist that contains only swap_info entries that are both active and not full. As Mel suggested using plists allows removing all the ordering code from swap - plists handle ordering automatically. The list naming is also clarified now that there are two lists, with the original list changed from swap_list_head to swap_active_head and the new list named swap_avail_head. A new spinlock is also added for the new list, so swap_info entries can be added or removed from the new list immediately as they become full or not full. This patch (of 4): Replace the singly-linked list tracking active, i.e. swapon'ed, swap_info_struct entries with a doubly-linked list using struct list_heads. Simplify the logic iterating and manipulating the list of entries, especially get_swap_page(), by using standard list_head functions, and removing the highest priority iteration logic. The change fixes the bug: https://lkml.org/lkml/2014/2/13/181 in which different priority swap entries after the highest priority entry are incorrectly used equally in pairs. The swap behavior is now as advertised, i.e. different priority swap entries are used in order, and equal priority swap targets are used concurrently. Signed-off-by: Dan Streetman <ddstreet@ieee.org> Acked-by: Mel Gorman <mgorman@suse.de> Cc: Shaohua Li <shli@fusionio.com> Cc: Hugh Dickins <hughd@google.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com> Cc: Weijie Yang <weijieut@gmail.com> Cc: Rik van Riel <riel@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Bob Liu <bob.liu@oracle.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-09jiffies: Fix timeval conversion to jiffiesAndrew Hunter
commit d78c9300c51d6ceed9f6d078d4e9366f259de28c upstream. timeval_to_jiffies tried to round a timeval up to an integral number of jiffies, but the logic for doing so was incorrect: intervals corresponding to exactly N jiffies would become N+1. This manifested itself particularly repeatedly stopping/starting an itimer: setitimer(ITIMER_PROF, &val, NULL); setitimer(ITIMER_PROF, NULL, &val); would add a full tick to val, _even if it was exactly representable in terms of jiffies_ (say, the result of a previous rounding.) Doing this repeatedly would cause unbounded growth in val. So fix the math. Here's what was wrong with the conversion: we essentially computed (eliding seconds) jiffies = usec * (NSEC_PER_USEC/TICK_NSEC) by using scaling arithmetic, which took the best approximation of NSEC_PER_USEC/TICK_NSEC with denominator of 2^USEC_JIFFIE_SC = x/(2^USEC_JIFFIE_SC), and computed: jiffies = (usec * x) >> USEC_JIFFIE_SC and rounded this calculation up in the intermediate form (since we can't necessarily exactly represent TICK_NSEC in usec.) But the scaling arithmetic is a (very slight) *over*approximation of the true value; that is, instead of dividing by (1 usec/ 1 jiffie), we effectively divided by (1 usec/1 jiffie)-epsilon (rounding down). This would normally be fine, but we want to round timeouts up, and we did so by adding 2^USEC_JIFFIE_SC - 1 before the shift; this would be fine if our division was exact, but dividing this by the slightly smaller factor was equivalent to adding just _over_ 1 to the final result (instead of just _under_ 1, as desired.) In particular, with HZ=1000, we consistently computed that 10000 usec was 11 jiffies; the same was true for any exact multiple of TICK_NSEC. We could possibly still round in the intermediate form, adding something less than 2^USEC_JIFFIE_SC - 1, but easier still is to convert usec->nsec, round in nanoseconds, and then convert using time*spec*_to_jiffies. This adds one constant multiplication, and is not observably slower in microbenchmarks on recent x86 hardware. Tested: the following program: int main() { struct itimerval zero = {{0, 0}, {0, 0}}; /* Initially set to 10 ms. */ struct itimerval initial = zero; initial.it_interval.tv_usec = 10000; setitimer(ITIMER_PROF, &initial, NULL); /* Save and restore several times. */ for (size_t i = 0; i < 10; ++i) { struct itimerval prev; setitimer(ITIMER_PROF, &zero, &prev); /* on old kernels, this goes up by TICK_USEC every iteration */ printf("previous value: %ld %ld %ld %ld\n", prev.it_interval.tv_sec, prev.it_interval.tv_usec, prev.it_value.tv_sec, prev.it_value.tv_usec); setitimer(ITIMER_PROF, &prev, NULL); } return 0; } Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Paul Turner <pjt@google.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Reviewed-by: Paul Turner <pjt@google.com> Reported-by: Aaron Jacobs <jacobsa@google.com> Signed-off-by: Andrew Hunter <ahh@google.com> [jstultz: Tweaked to apply to 3.17-rc] Signed-off-by: John Stultz <john.stultz@linaro.org> [bwh: Backported to 3.16: adjust filename] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-09media: vb2: fix VBI/poll regressionHans Verkuil
commit 58d75f4b1ce26324b4d809b18f94819843a98731 upstream. The recent conversion of saa7134 to vb2 unconvered a poll() bug that broke the teletext applications alevt and mtt. These applications expect that calling poll() without having called VIDIOC_STREAMON will cause poll() to return POLLERR. That did not happen in vb2. This patch fixes that behavior. It also fixes what should happen when poll() is called when STREAMON is called but no buffers have been queued. In that case poll() will also return POLLERR, but only for capture queues since output queues will always return POLLOUT anyway in that situation. This brings the vb2 behavior in line with the old videobuf behavior. Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com> Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-09hugetlb: ensure hugepage access is denied if hugepages are not supportedNishanth Aravamudan
commit 457c1b27ed56ec472d202731b12417bff023594a upstream. Currently, I am seeing the following when I `mount -t hugetlbfs /none /dev/hugetlbfs`, and then simply do a `ls /dev/hugetlbfs`. I think it's related to the fact that hugetlbfs is properly not correctly setting itself up in this state?: Unable to handle kernel paging request for data at address 0x00000031 Faulting instruction address: 0xc000000000245710 Oops: Kernel access of bad area, sig: 11 [#1] SMP NR_CPUS=2048 NUMA pSeries .... In KVM guests on Power, in a guest not backed by hugepages, we see the following: AnonHugePages: 0 kB HugePages_Total: 0 HugePages_Free: 0 HugePages_Rsvd: 0 HugePages_Surp: 0 Hugepagesize: 64 kB HPAGE_SHIFT == 0 in this configuration, which indicates that hugepages are not supported at boot-time, but this is only checked in hugetlb_init(). Extract the check to a helper function, and use it in a few relevant places. This does make hugetlbfs not supported (not registered at all) in this environment. I believe this is fine, as there are no valid hugepages and that won't change at runtime. [akpm@linux-foundation.org: use pr_info(), per Mel] [akpm@linux-foundation.org: fix build when HPAGE_SHIFT is undefined] Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Mel Gorman <mgorman@suse.de> Cc: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-09Merge remote-tracking branch 'lsk/v3.14/topic/kvm' into linux-linaro-lsk-v3.14Mark Brown
2014-10-08Merge remote-tracking branch 'lsk/v3.14/topic/gicv3' into linux-linaro-lsk-v3.14Mark Brown
2014-10-08Merge branch 'lsk/v3.14/topic/kvm' into lsk/lsk-with-kvm-v3.14Christoffer Dall
Conflicts: arch/arm64/include/asm/debug-monitors.h
2014-10-08Merge branch 'lsk/v3.14/topic/gic-v3' into lsk/lsk-with-kvm-v3.14Christoffer Dall
2014-10-06Merge tag 'v3.14.20' into linux-linaro-lsk-v3.14Mark Brown
This is the 3.14.20 stable release
2014-10-05vgaswitcheroo: add vga_switcheroo_fini_domain_pm_opsAlex Deucher
commit 766a53d059d1500c9755c8af017bd411bd8f1b20 upstream. Drivers should call this on unload to unregister pmops. Bug: https://bugzilla.kernel.org/show_bug.cgi?id=84431 Reviewed-by: Ben Skeggs <bskeggs@redhat.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Pali Rohár <pali.rohar@gmail.com> Cc: Ben Skeggs <bskeggs@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-05lockdep: Revert lockdep check in raw_seqcount_begin()Trond Myklebust
commit 22fdcf02f6e80d64a927f702dd9d631a927d87d4 upstream. This commit reverts the addition of lockdep checking to raw_seqcount_begin for the following reasons: 1) It violates the naming convention that raw_* functions should not do lockdep checks (a convention that is also followed by the other raw_*_seqcount_begin functions). 2) raw_seqcount_begin does not spin, so it can only be part of an ABBA deadlock in very special circumstances (for instance if a lock is held across the entire raw_seqcount_begin()+read_seqcount_retry() loop while also being taken inside the write_seqcount protected area). 3) It is causing false positives with some existing callers, and there is no non-lockdep alternative for those callers to use. None of the three existing callers (__d_lookup_rcu, netdev_get_name, and the NFS state code) appear to use the function in a manner that is ABBA deadlock prone. Fixes: 1ca7d67cf5d5: seqcount: Add lockdep functionality to seqcount/seqlock Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: John Stultz <john.stultz@linaro.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Waiman Long <Waiman.Long@hp.com> Cc: Stephen Boyd <sboyd@codeaurora.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/CAHQdGtRR6SvEhXiqWo24hoUh9AU9cL82Z8Z-d8-7u951F_d+5g@mail.gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-05regulatory: add NUL to alpha2Eliad Peller
commit a5fe8e7695dc3f547e955ad2b662e3e72969e506 upstream. alpha2 is defined as 2-chars array, but is used in multiple places as string (e.g. with nla_put_string calls), which might leak kernel data. Solve it by simply adding an extra char for the NULL terminator, making such operations safe. Signed-off-by: Eliad Peller <eliadx.peller@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-05workqueue: apply __WQ_ORDERED to create_singlethread_workqueue()Tejun Heo
commit e09c2c295468476a239d13324ce9042ec4de05eb upstream. create_singlethread_workqueue() is a compat interface for single threaded workqueue which maps to ordered workqueue w/ rescuer in the current implementation. create_singlethread_workqueue() currently implemented by invoking alloc_workqueue() w/ appropriate parameters. 8719dceae2f9 ("workqueue: reject adjusting max_active or applying attrs to ordered workqueues") introduced __WQ_ORDERED to protect ordered workqueues against dynamic attribute changes which can break ordering guarantees but forgot to apply it to create_singlethread_workqueue(). This in itself is okay as nobody currently uses dynamic attribute change on workqueues created with create_singlethread_workqueue(). However, 4c16bd327c ("workqueue: implement NUMA affinity for unbound workqueues") broke singlethreaded guarantee for ordered workqueues through allocating a separate pool_workqueue on each NUMA node by default. A later change 8a2b75384444 ("workqueue: fix ordered workqueues in NUMA setups") fixed it by allocating only one global pool_workqueue if __WQ_ORDERED is set. Combined, the __WQ_ORDERED omission in create_singlethread_workqueue() became critical breaking its single threadedness and ordering guarantee. Let's make create_singlethread_workqueue() wrap alloc_ordered_workqueue() instead so that it inherits __WQ_ORDERED and can implicitly track future ordered_workqueue changes. v2: I missed that __WQ_ORDERED now protects against pwq splitting across NUMA nodes and incorrectly described the patch as a nice-to-have fix to protect against future dynamic attribute usages. Oleg pointed out that this is actually a critical breakage due to 8a2b75384444 ("workqueue: fix ordered workqueues in NUMA setups"). Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Mike Anderson <mike.anderson@us.ibm.com> Cc: Oleg Nesterov <onestero@redhat.com> Cc: Gustavo Luiz Duarte <gduarte@redhat.com> Cc: Tomas Henzl <thenzl@redhat.com> Fixes: 4c16bd327c ("workqueue: implement NUMA affinity for unbound workqueues") Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-05iio:trigger: modify return value for iio_trigger_getSrinivas Pandruvada
commit f153566570fb9e32c2f59182883f4f66048788fb upstream. Instead of a void function, return the trigger pointer. Whilst not in of itself a fix, this makes the following set of 7 fixes cleaner than they would otherwise be. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Jonathan Cameron <jic23@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-05ACPI / hotplug: Generate online uevents for ACPI containersRafael J. Wysocki
commit 8ab17fc92e49bc2b8fff9d220c19bf50ec9c1158 upstream. Commit 46394fd01 (ACPI / hotplug: Move container-specific code out of the core) removed the generation of "online" uevents for containers, because "add" uevents are now generated for them automatically when container system devices are registered. However, there are user space tools that need to be notified when the container and all of its children have been enumerated, which doesn't happen any more. For this reason, add a mechanism allowing "online" uevents to be generated for ACPI containers after enumerating the container along with all of its children. Fixes: 46394fd01 (ACPI / hotplug: Move container-specific code out of the core) Reported-and-tested-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-05usb: gadget: f_fs: drop duplicate usb_functionfs_descs_head declarationKarol Lewandowski
Applicable for 3.14-stable only, as this revertes a previous commit that was incorrect. This commit drops duplicate declaration of struct usb_functionfs_descs_head erronousely added in commit 28c5980b54 ("usb: gadget: f_fs: resurect usb_functionfs_descs_head structure"). Fix from 28c5980b54 is applicable only for v3.15-rc1 and newer kernels. This fixes error in uapi: /src/linux$ make -C tools usb make: Entering directory '/src/linux/tools' DESCEND usb make[1]: Entering directory '/src/linux/tools/usb' gcc -Wall -Wextra -g -I../include -o testusb testusb.c -lpthread gcc -Wall -Wextra -g -I../include -o ffs-test ffs-test.c -lpthread In file included from ffs-test.c:41:0: ../../include/uapi/linux/usb/functionfs.h:42:8: error: redefinition of ‘struct usb_functionfs_descs_head’ struct usb_functionfs_descs_head { ^ ../../include/uapi/linux/usb/functionfs.h:31:8: note: originally defined here struct usb_functionfs_descs_head { ^ Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Felipe Balbi <balbi@ti.com> Signed-off-by: Karol Lewandowski <k.lewandowsk@samsung.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-05xattr: fix check for simultaneous glibc header inclusionFilipe Brandenburger
commit bfcfd44cce2774f19daeb59fb4e43fc9aa80e7b8 upstream. The guard was introduced in commit ea1a8217b06b ("xattr: guard against simultaneous glibc header inclusion") but it is using #ifdef to check for a define that is either set to 1 or 0. Fix it to use #if instead. * Without this patch: $ { echo "#include <sys/xattr.h>"; echo "#include <linux/xattr.h>"; } | gcc -E -Iinclude/uapi - >/dev/null include/uapi/linux/xattr.h:19:0: warning: "XATTR_CREATE" redefined [enabled by default] #define XATTR_CREATE 0x1 /* set value, fail if attr already exists */ ^ /usr/include/x86_64-linux-gnu/sys/xattr.h:32:0: note: this is the location of the previous definition #define XATTR_CREATE XATTR_CREATE ^ * With this patch: $ { echo "#include <sys/xattr.h>"; echo "#include <linux/xattr.h>"; } | gcc -E -Iinclude/uapi - >/dev/null (no warnings) Signed-off-by: Filipe Brandenburger <filbranden@google.com> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com> Cc: Allan McRae <allan@archlinux.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-03arm/arm64: KVM: vgic: delay vgic allocation until init timeMarc Zyngier
It is now quite easy to delay the allocation of the vgic tables until we actually require it to be up and running (when the first vcpu is kicking around, or someones tries to access the GIC registers). This allow us to allocate memory for the exact number of CPUs we have. As nobody configures the number of interrupts just yet, use a fallback to VGIC_NR_IRQS_LEGACY. Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> (cherry picked from commit 4956f2bc1fdee4bc336532f3f34635a8534cedfd) Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-10-03arm/arm64: KVM: vgic: kill VGIC_NR_IRQSMarc Zyngier
Nuke VGIC_NR_IRQS entierly, now that the distributor instance contains the number of IRQ allocated to this GIC. Also add VGIC_NR_IRQS_LEGACY to preserve the current API. Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> (cherry picked from commit 5fb66da64064d0cb8dcce4cc8bf4cb1b921b13a0) Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-10-03arm/arm64: KVM: vgic: handle out-of-range MMIO accessesMarc Zyngier
Now that we can (almost) dynamically size the number of interrupts, we're facing an interesting issue: We have to evaluate at runtime whether or not an access hits a valid register, based on the sizing of this particular instance of the distributor. Furthermore, the GIC spec says that accessing a reserved register is RAZ/WI. For this, add a new field to our range structure, indicating the number of bits a single interrupts uses. That allows us to find out whether or not the access is in range. Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> (cherry picked from commit c3c918361adcceb816c92b21dd95d2b46fb96a8f) Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-10-03arm/arm64: KVM: vgic: kill VGIC_MAX_CPUSMarc Zyngier
We now have the information about the number of CPU interfaces in the distributor itself. Let's get rid of VGIC_MAX_CPUS, and just rely on KVM_MAX_VCPUS where we don't have the choice. Yet. Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> (cherry picked from commit fc675e355e705a046df7b635d3f3330c0ad94569) Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-10-03arm/arm64: KVM: vgic: Parametrize VGIC_NR_SHARED_IRQSMarc Zyngier
Having a dynamic number of supported interrupts means that we cannot relly on VGIC_NR_SHARED_IRQS being fixed anymore. Instead, make it take the distributor structure as a parameter, so it can return the right value. Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> (cherry picked from commit fb65ab63b8cae510ea1e43e68b5da2f9980aa6d5) Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-10-03arm/arm64: KVM: vgic: switch to dynamic allocationMarc Zyngier
So far, all the VGIC data structures are statically defined by the *maximum* number of vcpus and interrupts it supports. It means that we always have to oversize it to cater for the worse case. Start by changing the data structures to be dynamically sizeable, and allocate them at runtime. The sizes are still very static though. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> (cherry picked from commit c1bfb577addd4867a82c4f235824a315d5afb94a) Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-10-03arm/arm64: KVM: vgic: Improve handling of GICD_I{CS}PENDRnChristoffer Dall
Writes to GICD_ISPENDRn and GICD_ICPENDRn are currently not handled correctly for level-triggered interrupts. The spec states that for level-triggered interrupts, writes to the GICD_ISPENDRn activate the output of a flip-flop which is in turn or'ed with the actual input interrupt signal. Correspondingly, writes to GICD_ICPENDRn simply deactivates the output of that flip-flop, but does not (of course) affect the external input signal. Reads from GICC_IAR will also deactivate the flip-flop output. This requires us to track the state of the level-input separately from the state in the flip-flop. We therefore introduce two new variables on the distributor struct to track these two states. Astute readers may notice that this is introducing more state than required (because an OR of the two states gives you the pending state), but the remaining vgic code uses the pending bitmap for optimized operations to figure out, at the end of the day, if an interrupt is pending or not on the distributor side. Refactoring the code to consider the two state variables all the places where we currently access the precomputed pending value, did not look pretty. Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> (cherry picked from commit faa1b46c3e9f4d40359aee04ff275eea5f4cae3a) Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-10-03arm/arm64: KVM: Rename irq_active to irq_queuedChristoffer Dall
We have a special bitmap on the distributor struct to keep track of when level-triggered interrupts are queued on the list registers. This was named irq_active, which is confusing, because the active state of an interrupt as per the GIC spec is a different thing, not specifically related to edge-triggered/level-triggered configurations but rather indicates an interrupt which has been ack'ed but not yet eoi'ed. Rename the bitmap and the corresponding accessor functions to irq_queued to clarify what this is actually used for. Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> (cherry picked from commit dbf20f9d8105cca531614c8bff9a74351e8e67e7) Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-10-03arm/arm64: KVM: Rename irq_state to irq_pendingChristoffer Dall
The irq_state field on the distributor struct is ambiguous in its meaning; the comment says it's the level of the input put, but that doesn't make much sense for edge-triggered interrupts. The code actually uses this state variable to check if the interrupt is in the pending state on the distributor so clarify the comment and rename the actual variable and accessor methods. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> (cherry picked from commit 227844f53864077ccaefe01d0960fcccc03445ce) Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-10-03KVM: ARM: vgic: register kvm_device_ops dynamicallyChristoffer Dall
Now that we have a dynamic means to register kvm_device_ops, use that for the ARM VGIC, instead of relying on the static table. [ Adapted for LSK to use register_cpu_notifier() like in the code block that was moved instead of __register_cpu_notifier(), because the series to fix double lock on cpu_add_remove_lock that defines __register_cpu_notifier() is not in LSK. - Christoffer ] Cc: Gleb Natapov <gleb@kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> (cherry picked from commit c06a841bf36340e9e917ce60d11a6425ac85d0bd) Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-10-02KVM: device: add simple registration mechanism for kvm_device_opsChristoffer Dall
kvm_ioctl_create_device currently has knowledge of all the device types and their associated ops. This is fairly inflexible when adding support for new in-kernel device emulations, so move what we currently have out into a table, which can support dynamic registration of ops by new drivers for virtual hardware. Cc: Alex Williamson <Alex.Williamson@redhat.com> Cc: Alex Graf <agraf@suse.de> Cc: Gleb Natapov <gleb@kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> (cherry picked from commit d60eacb07053142bfb9b41582074a89a790a9d46) Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-10-02KVM: remove garbage arg to *hardware_{en,dis}ableChristoffer Dall
In the beggining was on_each_cpu(), which required an unused argument to kvm_arch_ops.hardware_{en,dis}able, but this was soon forgotten. Remove unnecessary arguments that stem from this. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> (cherry picked from commit 13a34e067eab24fec882e1834fbf2cc31911d474) Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-10-02KVM: forward declare structs in kvm_types.hChristoffer Dall
Opaque KVM structs are useful for prototypes in asm/kvm_host.h, to avoid "'struct foo' declared inside parameter list" warnings (and consequent breakage due to conflicting types). Move them from individual files to a generic place in linux/kvm_types.h. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> (cherry picked from commit 656473003bc7e056c3bbd4a4d9832dad01e86f76) Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-10-02KVM: Unconditionally export KVM_CAP_USER_NMIChristoffer Dall
The idea between capabilities and the KVM_CHECK_EXTENSION ioctl is that userspace can, at run-time, determine if a feature is supported or not. This allows KVM to being supporting a new feature with a new kernel version without any need to update user space. Unfortunately, since the definition of KVM_CAP_USER_NMI was guarded by #ifdef __KVM_HAVE_USER_NMI, such discovery still required a user space update. Therefore, unconditionally export KVM_CAP_USER_NMI and change the the typo in the comment for the IOCTL number definition as well. Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> (cherry picked from commit 44b5ce73c99c389817be71b9161bceb197d40ecb) Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-10-02KVM: Unconditionally export KVM_CAP_READONLY_MEMChristoffer Dall
The idea between capabilities and the KVM_CHECK_EXTENSION ioctl is that userspace can, at run-time, determine if a feature is supported or not. This allows KVM to being supporting a new feature with a new kernel version without any need to update user space. Unfortunately, since the definition of KVM_CAP_READONLY_MEM was guarded by #ifdef __KVM_HAVE_READONLY_MEM, such discovery still required a user space update. Therefore, unconditionally export KVM_CAP_READONLY_MEM and change the in-kernel conditional to rely on __KVM_HAVE_READONLY_MEM. Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> (cherry picked from commit 0f8a4de3e088797576ac76200b634b802e5c7781) Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-10-02KVM: Introduce gfn_to_hva_memslot_protChristoffer Dall
To support read-only memory regions on arm and arm64, we have a need to resolve a gfn to an hva given a pointer to a memslot to avoid looping through the memslots twice and to reuse the hva error checking of gfn_to_hva_prot(), add a new gfn_to_hva_memslot_prot() function and refactor gfn_to_hva_prot() to use this function. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> (cherry picked from commit 64d831269ccbca1fc6d739a0f3c8aa24afb43a5e) Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-10-02KVM: add kvm_arch_sched_inChristoffer Dall
Introduce preempt notifiers for architecture specific code. Advantage over creating a new notifier in every arch is slightly simpler code and guaranteed call order with respect to kvm_sched_in. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> (cherry picked from commit e790d9ef6405633b007339d746b709aed43a928d) Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-10-02KVM: Move more code under CONFIG_HAVE_KVM_IRQFDPaolo Bonzini
Commits e4d57e1ee1ab (KVM: Move irq notifier implementation into eventfd.c, 2014-06-30) included the irq notifier code unconditionally in eventfd.c, while it was under CONFIG_HAVE_KVM_IRQCHIP before. Similarly, commit 297e21053a52 (KVM: Give IRQFD its own separate enabling Kconfig option, 2014-06-30) moved code from CONFIG_HAVE_IRQ_ROUTING to CONFIG_HAVE_KVM_IRQFD but forgot to move the pieces that used to be under CONFIG_HAVE_KVM_IRQCHIP. Together, this broke compilation without CONFIG_KVM_XICS. Fix by adding or changing the #ifdefs so that they point at CONFIG_HAVE_KVM_IRQFD. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> (cherry picked from commit c77dcacb397519b6ade8f08201a4a90a7f4f751e) Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-10-02KVM: Give IRQFD its own separate enabling Kconfig optionChristoffer Dall
Currently, the IRQFD code is conditional on CONFIG_HAVE_KVM_IRQ_ROUTING. So that we can have the IRQFD code compiled in without having the IRQ routing code, this creates a new CONFIG_HAVE_KVM_IRQFD, makes the IRQFD code conditional on it instead of CONFIG_HAVE_KVM_IRQ_ROUTING, and makes all the platforms that currently select HAVE_KVM_IRQ_ROUTING also select HAVE_KVM_IRQFD. Signed-off-by: Paul Mackerras <paulus@samba.org> Tested-by: Eric Auger <eric.auger@linaro.org> Tested-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> (cherry picked from commit 297e21053a52f060944e9f0de4c64fad9bcd72fc) Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>