aboutsummaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2014-02-21sched: Add 'flags' argument to sched_{set,get}attr() syscallsPeter Zijlstra
Because of a recent syscall design debate; its deemed appropriate for each syscall to have a flags argument for future extension; without immediately requiring new syscalls. Cc: juri.lelli@gmail.com Cc: Ingo Molnar <mingo@redhat.com> Suggested-by: Michael Kerrisk <mtk.manpages@gmail.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20140214161929.GL27965@twins.programming.kicks-ass.net Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-02-20Merge tag 'pci-v3.14-fixes-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI updates from Bjorn Helgaas: "The most interesting thing here is the change to enable INTx (by clearing PCI_COMMAND_INTX_DISABLE) if the BIOS left INTx disabled. Apparently the Baytrail BIOS does this, which means EHCI doesn't work. Also, fix an AHCI MSI regression and other issues with the recent MSI changes. This also adds pci_enable_msi_exact() and pci_enable_msix_exact(), which aren't regression fixes, but will keep us from touching drivers twice (once to stop using the deprecated pci_enable_msi(), etc., and again to use the *_exact() variants). There's also a minor MVEBU fix. Summary: MSI: - Fix AHCI single-MSI fallback (Alexander Gordeev) - Fix populate_msi_sysfs() error paths (Greg Kroah-Hartman) - Fix htmldocs problem (Masanari Iida) - Add pci_enable_msi_exact() and pci_enable_msix_exact() (Alexander Gordeev) - Update documentation (Alexander Gordeev) Miscellaneous: - mvebu: expose device ID & revision via lspci (Andrew Lunn) - Enable INTx if the BIOS left them disabled (Bjorn Helgaas)" * tag 'pci-v3.14-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: ahci: Fix broken fallback to single MSI mode PCI: Enable INTx if BIOS left them disabled PCI/MSI: Add pci_enable_msi_exact() and pci_enable_msix_exact() PCI/MSI: Fix cut-and-paste errors in documentation PCI/MSI: Add pci_enable_msi() documentation back PCI/MSI: Fix pci_msix_vec_count() htmldocs failure PCI/MSI: Fix leak of msi_attrs PCI/MSI: Check kmalloc() return value, fix leak of name PCI: mvebu: Use Device ID and revision from underlying endpoint
2014-02-20Merge branch 'for-3.14-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup fixes from Tejun Heo: "Quite a few fixes this time. Three locking fixes, all marked for -stable. A couple error path fixes and some misc fixes. Hugh found a bug in memcg offlining sequence and we thought we could fix that from cgroup core side but that turned out to be insufficient and got reverted. A different fix has been applied to -mm" * 'for-3.14-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup: update cgroup_enable_task_cg_lists() to grab siglock Revert "cgroup: use an ordered workqueue for cgroup destruction" cgroup: protect modifications to cgroup_idr with cgroup_mutex cgroup: fix locking in cgroup_cfts_commit() cgroup: fix error return from cgroup_create() cgroup: fix error return value in cgroup_mount() cgroup: use an ordered workqueue for cgroup destruction nfs: include xattr.h from fs/nfs/nfs3proc.c cpuset: update MAINTAINERS entry arm, pm, vmpressure: add missing slab.h includes
2014-02-20Merge branch 'for-3.14-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull workqueue fixes from Tejun Heo: "Two workqueue fixes. One for an unlikely but possible critical bug during kworker shutdown and the other to make lockdep names a bit more descriptive" * 'for-3.14-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: ensure @task is valid across kthread_stop() workqueue: add args to workqueue lockdep name
2014-02-19Merge tag 'mfd-fixes-3.14-1' of git://git.linaro.org/people/lee.jones/mfdLinus Torvalds
Pull MFD fixes from Lee Jones: "Couple of small issues solved: - Suspend/Resume call-backs require CONFIG_PM_SLEEP - Some drivers written for 32bit architectures fail when compiled with a 64bit compiler. The fixes will future proof the drivers" * tag 'mfd-fixes-3.14-1' of git://git.linaro.org/people/lee.jones/mfd: mfd: sec-core: sec_pmic_{suspend,resume}() should depend on CONFIG_PM_SLEEP mfd: max14577: max14577_{suspend,resume}() should depend on CONFIG_PM_SLEEP mfd: tps65217: Naturalise cross-architecture discrepancies mfd: wm8994-core: Naturalise cross-architecture discrepancies mfd: max8998: Naturalise cross-architecture discrepancies mfd: max8997: Naturalise cross-architecture discrepancies
2014-02-19mfd: tps65217: Naturalise cross-architecture discrepanciesLee Jones
If we compile the TPS65217 for a 64bit architecture we receive the following warnings: drivers/mfd/tps65217.c: In function ‘tps65217_probe’: drivers/mfd/tps65217.c:173:13: warning: cast from pointer to integer of different size chip_id = (unsigned int)match->data; ^ Signed-off-by: Lee Jones <lee.jones@linaro.org>
2014-02-19mfd: max8998: Naturalise cross-architecture discrepanciesLee Jones
If we compile the MAX8998 for a 64bit architecture we receive the following warnings: drivers/mfd/max8998.c: In function ‘max8998_i2c_get_driver_data’: drivers/mfd/max8998.c:178:10: warning: cast from pointer to integer of different size return (int)match->data; ^ Signed-off-by: Lee Jones <lee.jones@linaro.org>
2014-02-19mfd: max8997: Naturalise cross-architecture discrepanciesLee Jones
If we compile the MAX8997 for a 64bit architecture we receive the following warnings: drivers/mfd/max8997.c: In function ‘max8997_i2c_get_driver_data’: drivers/mfd/max8997.c:173:10: warning: cast from pointer to integer of different size return (int)match->data; ^ Signed-off-by: Lee Jones <lee.jones@linaro.org>
2014-02-18Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linuxLinus Torvalds
Pull drm fixes from Dave Airlie: "Lots of little small things, nothing too major: nouveau regression fixes, vmware fixes for the new hw support, memory leaks in error path fixes" * 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (31 commits) drm/radeon/ni: fix typo in dpm sq ramping setup drm/radeon/si: fix typo in dpm sq ramping setup drm/radeon: fix CP semaphores on CIK drm/radeon: delete a stray tab drm/radeon: fix display tiling setup on SI drm/radeon/dpm: reduce r7xx vblank mclk threshold to 200 drm/radeon: fill in DRM_CAPs for cursor size drm: add DRM_CAPs for cursor size drm/radeon: unify bpc handling drm/ttm: Fix memory leak in ttm_agp_backend.c drm/ttm: declare 'struct device' in ttm_page_alloc.h drm/nouveau: fix TTM_PL_TT memtype on pre-nv50 drm/nv50/disp: use correct register to determine DP display bpp drm/nouveau/fb: use correct ram oclass for nv1a hardware drm/nv50/gr: add missing nv_error parameter priv drm/nouveau: fix ENG_RUNLIST register address drm/nv4c/bios: disallow retrieving from prom on nv4x igp's drm/nv4c/vga: decode register is in a different place on nv4x igp's drm/nv4c/mc: nv4x igp's have a different msi rearm register drm/nouveau: set irq_enabled manually ...
2014-02-18Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) kvaser CAN driver has fixed limits of some of it's table, validate that we won't exceed those limits at probe time. Fix from Olivier Sobrie. 2) Fix rtl8192ce disabling interrupts for too long, from Olivier Langlois. 3) Fix botched shift in ath5k driver, from Dan Carpenter. 4) Fix corruption of deferred packets in TIPC, from Erik Hugne. 5) Fix newlink error path in macvlan driver, from Cong Wang. 6) Fix netpoll deadlock in bonding, from Ding Tianhong. 7) Handle GSO packets properly in forwarding path when fragmentation is necessary on egress, from Florian Westphal. 8) Fix axienet build errors, from Michal Simek. 9) Fix refcounting of ubufs on tx in vhost net driver, from Michael S Tsirkin. 10) Carrier status isn't set properly in hyperv driver, from Haiyang Zhang. 11) Missing pci_disable_device() in tulip_remove_one), from Ingo Molnar. 12) AF_PACKET qdisc bypass mode doesn't adhere to driver provided TX queue selection method. Add a fallback method mechanism to fix this bug, from Daniel Borkmann. 13) Fix regression in link local route handling on GRE tunnels, from Nicolas Dichtel. 14) Bonding can assign dup aggregator IDs in some sequences of configuration, fix by making the allocation counter per-bond instead of global. From Jiri Bohac. 15) sctp_connectx() needs compat translations, from Daniel Borkmann. 16) Fix of_mdio PHY interrupt parsing, from Ben Dooks * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (62 commits) MAINTAINERS: add entry for the PHY library of_mdio: fix phy interrupt passing net: ethernet: update dependency and help text of mvneta NET: fec: only enable napi if we are successful af_packet: remove a stray tab in packet_set_ring() net: sctp: fix sctp_connectx abi for ia32 emulation/compat mode ipv4: fix counter in_slow_tot irtty-sir.c: Do not set_termios() on irtty_close() bonding: 802.3ad: make aggregator_identifier bond-private usbnet: remove generic hard_header_len check gre: add link local route when local addr is any batman-adv: fix potential kernel paging error for unicast transmissions batman-adv: avoid double free when orig_node initialization fails batman-adv: free skb on TVLV parsing success batman-adv: fix TT CRC computation by ensuring byte order batman-adv: fix potential orig_node reference leak batman-adv: avoid potential race condition when adding a new neighbour batman-adv: properly check pskb_may_pull return value batman-adv: release vlan object after checking the CRC batman-adv: fix TT-TVLV parsing on OGM reception ...
2014-02-19Merge tag 'ttm-fixes-3.14-2014-02-18' of ↵Dave Airlie
git://people.freedesktop.org/~thomash/linux into drm-fixes Pull request of 2014-02-18 One compile fix and one memory leak. * tag 'ttm-fixes-3.14-2014-02-18' of git://people.freedesktop.org/~thomash/linux: drm/ttm: Fix memory leak in ttm_agp_backend.c drm/ttm: declare 'struct device' in ttm_page_alloc.h
2014-02-19Merge tag 'vmwgfx-fixes-3.14-2014-02-18' of ↵Dave Airlie
git://people.freedesktop.org/~thomash/linux into drm-fixes Pull request of 2014-02-18. Nothing special. The biggest change is adding a couple of command defines and packing the command data correctly. * tag 'vmwgfx-fixes-3.14-2014-02-18' of git://people.freedesktop.org/~thomash/linux: drm/vmwgfx: Fix command defines and checks drm/vmwgfx: Fix possible integer overflow drm/vmwgfx: Remove stray const drm/vmwgfx: unlock on error path in vmw_execbuf_process() drm/vmwgfx: Get maximum mob size from register SVGA_REG_MOB_MAX_SIZE drm/vmwgfx: Fix a couple of sparse warnings and errors
2014-02-18drm: add DRM_CAPs for cursor sizeAlex Deucher
Some hardware may not support standard 64x64 cursors. Add a drm cap to query the cursor size from the kernel. Some examples include radeon CIK parts (128x128 cursors) and armada (32x64 or 64x32). This allows things like device specific ddxes to remove asics specific logic and also allows xf86-video-modesetting to work properly with hw cursors on this hardware. Default to 64 if the driver doesn't specify a size. Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Rob Clark <robdclark@gmail.com>
2014-02-18drm/ttm: declare 'struct device' in ttm_page_alloc.hAlexandre Courbot
Declare 'struct device' explicitly in ttm_page_alloc.h as this file does not include any file declaring it. This removes the following warning: warning: 'struct device' declared inside parameter list Signed-off-by: Alexandre Courbot <acourbot@nvidia.com> Reviewed-by: Thierry Reding <treding@nvidia.com>
2014-02-17Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull Ceph fixes from Sage Weil: "We have some patches fixing up ACL support issues from Zheng and Guangliang and a mount option to enable/disable this support. (These fixes were somewhat delayed by the Chinese holiday.) There is also a small fix for cached readdir handling when directories are fragmented" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: ceph: fix __dcache_readdir() ceph: add acl, noacl options for cephfs mount ceph: make ceph_forget_all_cached_acls() static inline ceph: add missing init_acl() for mkdir() and atomic_open() ceph: fix ceph_set_acl() ceph: fix ceph_removexattr() ceph: remove xattr when null value is given to setxattr() ceph: properly handle XATTR_CREATE and XATTR_REPLACE
2014-02-17Merge tag 'dma-buf-for-3.14' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/sumits/dma-buf Pull dma-buf fix from Sumit Semwal: "Just some debugfs output updates. There's another patch related to dma-buf, but it'll get upstreamed via Greg KH's pull request" * tag 'dma-buf-for-3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/sumits/dma-buf: dma-buf: update debugfs output
2014-02-17ceph: remove xattr when null value is given to setxattr()Yan, Zheng
For the setxattr request, introduce a new flag CEPH_XATTR_REMOVE to distinguish null value case from the zero-length value case. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-02-17Merge branch 'merge' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc Pull powerpc fixes from Ben Herrenschmidt: "Here are some more powerpc fixes for 3.14 The main one is a nasty issue with the NUMA balancing support which requires a small generic change and the addition of a new accessor to set _PAGE_NUMA. Both have been reviewed and acked by Mel and Rik. The changelog should have plenty of details but basically, without this fix, we get random user segfaults and/or corruptions due to missing TLB/hash flushes. Aneesh series of 3 patches fixes it. We have some vDSO vs. perf fixes from Anton, some small EEH fixes from Gavin, a ppc32 regression vs the stack overflow detector, and a fix for the way we handle PCIe host bridge speed settings on pseries (which is needed for proper operations of AMD graphics cards on Power8)" * 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: powerpc/eeh: Disable EEH on reboot powerpc/eeh: Cleanup on eeh_subsystem_enabled powerpc/powernv: Rework EEH reset powerpc: Use unstripped VDSO image for more accurate profiling data powerpc: Link VDSOs at 0x0 mm: Use ptep/pmdp_set_numa() for updating _PAGE_NUMA bit mm: Dirty accountable change only apply to non prot numa case powerpc/mm: Add new "set" flag argument to pte/pmd update function powerpc/pseries: Add Gen3 definitions for PCIE link speed powerpc/pseries: Fix regression on PCI link speed powerpc: Set the correct ksp_limit on ppc32 when switching to irq stack
2014-02-17netdevice: move netdev_cap_txqueue for shared usage to headerDaniel Borkmann
In order to allow users to invoke netdev_cap_txqueue, it needs to be moved into netdevice.h header file. While at it, also add kernel doc header to document the API. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-17netdevice: add queue selection fallback handler for ndo_select_queueDaniel Borkmann
Add a new argument for ndo_select_queue() callback that passes a fallback handler. This gets invoked through netdev_pick_tx(); fallback handler is currently __netdev_pick_tx() as most drivers invoke this function within their customized implementation in case for skbs that don't need any special handling. This fallback handler can then be replaced on other call-sites with different queue selection methods (e.g. in packet sockets, pktgen etc). This also has the nice side-effect that __netdev_pick_tx() is then only invoked from netdev_pick_tx() and export of that function to modules can be undone. Suggested-by: David S. Miller <davem@davemloft.net> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-17net: sctp: Fix a_rwnd/rwnd management to reflect real state of the ↵Matija Glavinic Pecotic
receiver's buffer Implementation of (a)rwnd calculation might lead to severe performance issues and associations completely stalling. These problems are described and solution is proposed which improves lksctp's robustness in congestion state. 1) Sudden drop of a_rwnd and incomplete window recovery afterwards Data accounted in sctp_assoc_rwnd_decrease takes only payload size (sctp data), but size of sk_buff, which is blamed against receiver buffer, is not accounted in rwnd. Theoretically, this should not be the problem as actual size of buffer is double the amount requested on the socket (SO_RECVBUF). Problem here is that this will have bad scaling for data which is less then sizeof sk_buff. E.g. in 4G (LTE) networks, link interfacing radio side will have a large portion of traffic of this size (less then 100B). An example of sudden drop and incomplete window recovery is given below. Node B exhibits problematic behavior. Node A initiates association and B is configured to advertise rwnd of 10000. A sends messages of size 43B (size of typical sctp message in 4G (LTE) network). On B data is left in buffer by not reading socket in userspace. Lets examine when we will hit pressure state and declare rwnd to be 0 for scenario with above stated parameters (rwnd == 10000, chunk size == 43, each chunk is sent in separate sctp packet) Logic is implemented in sctp_assoc_rwnd_decrease: socket_buffer (see below) is maximum size which can be held in socket buffer (sk_rcvbuf). current_alloced is amount of data currently allocated (rx_count) A simple expression is given for which it will be examined after how many packets for above stated parameters we enter pressure state: We start by condition which has to be met in order to enter pressure state: socket_buffer < currently_alloced; currently_alloced is represented as size of sctp packets received so far and not yet delivered to userspace. x is the number of chunks/packets (since there is no bundling, and each chunk is delivered in separate packet, we can observe each chunk also as sctp packet, and what is important here, having its own sk_buff): socket_buffer < x*each_sctp_packet; each_sctp_packet is sctp chunk size + sizeof(struct sk_buff). socket_buffer is twice the amount of initially requested size of socket buffer, which is in case of sctp, twice the a_rwnd requested: 2*rwnd < x*(payload+sizeof(struc sk_buff)); sizeof(struct sk_buff) is 190 (3.13.0-rc4+). Above is stated that rwnd is 10000 and each payload size is 43 20000 < x(43+190); x > 20000/233; x ~> 84; After ~84 messages, pressure state is entered and 0 rwnd is advertised while received 84*43B ~= 3612B sctp data. This is why external observer notices sudden drop from 6474 to 0, as it will be now shown in example: IP A.34340 > B.12345: sctp (1) [INIT] [init tag: 1875509148] [rwnd: 81920] [OS: 10] [MIS: 65535] [init TSN: 1096057017] IP B.12345 > A.34340: sctp (1) [INIT ACK] [init tag: 3198966556] [rwnd: 10000] [OS: 10] [MIS: 10] [init TSN: 902132839] IP A.34340 > B.12345: sctp (1) [COOKIE ECHO] IP B.12345 > A.34340: sctp (1) [COOKIE ACK] IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN: 1096057017] [SID: 0] [SSEQ 0] [PPID 0x18] IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057017] [a_rwnd 9957] [#gap acks 0] [#dup tsns 0] IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN: 1096057018] [SID: 0] [SSEQ 1] [PPID 0x18] IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057018] [a_rwnd 9957] [#gap acks 0] [#dup tsns 0] IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN: 1096057019] [SID: 0] [SSEQ 2] [PPID 0x18] IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057019] [a_rwnd 9914] [#gap acks 0] [#dup tsns 0] <...> IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN: 1096057098] [SID: 0] [SSEQ 81] [PPID 0x18] IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057098] [a_rwnd 6517] [#gap acks 0] [#dup tsns 0] IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN: 1096057099] [SID: 0] [SSEQ 82] [PPID 0x18] IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057099] [a_rwnd 6474] [#gap acks 0] [#dup tsns 0] IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN: 1096057100] [SID: 0] [SSEQ 83] [PPID 0x18] --> Sudden drop IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057100] [a_rwnd 0] [#gap acks 0] [#dup tsns 0] At this point, rwnd_press stores current rwnd value so it can be later restored in sctp_assoc_rwnd_increase. This however doesn't happen as condition to start slowly increasing rwnd until rwnd_press is returned to rwnd is never met. This condition is not met since rwnd, after it hit 0, must first reach rwnd_press by adding amount which is read from userspace. Let us observe values in above example. Initial a_rwnd is 10000, pressure was hit when rwnd was ~6500 and the amount of actual sctp data currently waiting to be delivered to userspace is ~3500. When userspace starts to read, sctp_assoc_rwnd_increase will be blamed only for sctp data, which is ~3500. Condition is never met, and when userspace reads all data, rwnd stays on 3569. IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057100] [a_rwnd 1505] [#gap acks 0] [#dup tsns 0] IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057100] [a_rwnd 3010] [#gap acks 0] [#dup tsns 0] IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN: 1096057101] [SID: 0] [SSEQ 84] [PPID 0x18] IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057101] [a_rwnd 3569] [#gap acks 0] [#dup tsns 0] --> At this point userspace read everything, rwnd recovered only to 3569 IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN: 1096057102] [SID: 0] [SSEQ 85] [PPID 0x18] IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057102] [a_rwnd 3569] [#gap acks 0] [#dup tsns 0] Reproduction is straight forward, it is enough for sender to send packets of size less then sizeof(struct sk_buff) and receiver keeping them in its buffers. 2) Minute size window for associations sharing the same socket buffer In case multiple associations share the same socket, and same socket buffer (sctp.rcvbuf_policy == 0), different scenarios exist in which congestion on one of the associations can permanently drop rwnd of other association(s). Situation will be typically observed as one association suddenly having rwnd dropped to size of last packet received and never recovering beyond that point. Different scenarios will lead to it, but all have in common that one of the associations (let it be association from 1)) nearly depleted socket buffer, and the other association blames socket buffer just for the amount enough to start the pressure. This association will enter pressure state, set rwnd_press and announce 0 rwnd. When data is read by userspace, similar situation as in 1) will occur, rwnd will increase just for the size read by userspace but rwnd_press will be high enough so that association doesn't have enough credit to reach rwnd_press and restore to previous state. This case is special case of 1), being worse as there is, in the worst case, only one packet in buffer for which size rwnd will be increased. Consequence is association which has very low maximum rwnd ('minute size', in our case down to 43B - size of packet which caused pressure) and as such unusable. Scenario happened in the field and labs frequently after congestion state (link breaks, different probabilities of packet drop, packet reordering) and with scenario 1) preceding. Here is given a deterministic scenario for reproduction: >From node A establish two associations on the same socket, with rcvbuf_policy being set to share one common buffer (sctp.rcvbuf_policy == 0). On association 1 repeat scenario from 1), that is, bring it down to 0 and restore up. Observe scenario 1). Use small payload size (here we use 43). Once rwnd is 'recovered', bring it down close to 0, as in just one more packet would close it. This has as a consequence that association number 2 is able to receive (at least) one more packet which will bring it in pressure state. E.g. if association 2 had rwnd of 10000, packet received was 43, and we enter at this point into pressure, rwnd_press will have 9957. Once payload is delivered to userspace, rwnd will increase for 43, but conditions to restore rwnd to original state, just as in 1), will never be satisfied. --> Association 1, between A.y and B.12345 IP A.55915 > B.12345: sctp (1) [INIT] [init tag: 836880897] [rwnd: 10000] [OS: 10] [MIS: 65535] [init TSN: 4032536569] IP B.12345 > A.55915: sctp (1) [INIT ACK] [init tag: 2873310749] [rwnd: 81920] [OS: 10] [MIS: 10] [init TSN: 3799315613] IP A.55915 > B.12345: sctp (1) [COOKIE ECHO] IP B.12345 > A.55915: sctp (1) [COOKIE ACK] --> Association 2, between A.z and B.12346 IP A.55915 > B.12346: sctp (1) [INIT] [init tag: 534798321] [rwnd: 10000] [OS: 10] [MIS: 65535] [init TSN: 2099285173] IP B.12346 > A.55915: sctp (1) [INIT ACK] [init tag: 516668823] [rwnd: 81920] [OS: 10] [MIS: 10] [init TSN: 3676403240] IP A.55915 > B.12346: sctp (1) [COOKIE ECHO] IP B.12346 > A.55915: sctp (1) [COOKIE ACK] --> Deplete socket buffer by sending messages of size 43B over association 1 IP B.12345 > A.55915: sctp (1) [DATA] (B)(E) [TSN: 3799315613] [SID: 0] [SSEQ 0] [PPID 0x18] IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315613] [a_rwnd 9957] [#gap acks 0] [#dup tsns 0] <...> IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315696] [a_rwnd 6388] [#gap acks 0] [#dup tsns 0] IP B.12345 > A.55915: sctp (1) [DATA] (B)(E) [TSN: 3799315697] [SID: 0] [SSEQ 84] [PPID 0x18] IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315697] [a_rwnd 6345] [#gap acks 0] [#dup tsns 0] --> Sudden drop on 1 IP B.12345 > A.55915: sctp (1) [DATA] (B)(E) [TSN: 3799315698] [SID: 0] [SSEQ 85] [PPID 0x18] IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315698] [a_rwnd 0] [#gap acks 0] [#dup tsns 0] --> Here userspace read, rwnd 'recovered' to 3698, now deplete again using association 1 so there is place in buffer for only one more packet IP B.12345 > A.55915: sctp (1) [DATA] (B)(E) [TSN: 3799315799] [SID: 0] [SSEQ 186] [PPID 0x18] IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315799] [a_rwnd 86] [#gap acks 0] [#dup tsns 0] IP B.12345 > A.55915: sctp (1) [DATA] (B)(E) [TSN: 3799315800] [SID: 0] [SSEQ 187] [PPID 0x18] IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315800] [a_rwnd 43] [#gap acks 0] [#dup tsns 0] --> Socket buffer is almost depleted, but there is space for one more packet, send them over association 2, size 43B IP B.12346 > A.55915: sctp (1) [DATA] (B)(E) [TSN: 3676403240] [SID: 0] [SSEQ 0] [PPID 0x18] IP A.55915 > B.12346: sctp (1) [SACK] [cum ack 3676403240] [a_rwnd 0] [#gap acks 0] [#dup tsns 0] --> Immediate drop IP A.60995 > B.12346: sctp (1) [SACK] [cum ack 387491510] [a_rwnd 0] [#gap acks 0] [#dup tsns 0] --> Read everything from the socket, both association recover up to maximum rwnd they are capable of reaching, note that association 1 recovered up to 3698, and association 2 recovered only to 43 IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315800] [a_rwnd 1548] [#gap acks 0] [#dup tsns 0] IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315800] [a_rwnd 3053] [#gap acks 0] [#dup tsns 0] IP B.12345 > A.55915: sctp (1) [DATA] (B)(E) [TSN: 3799315801] [SID: 0] [SSEQ 188] [PPID 0x18] IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315801] [a_rwnd 3698] [#gap acks 0] [#dup tsns 0] IP B.12346 > A.55915: sctp (1) [DATA] (B)(E) [TSN: 3676403241] [SID: 0] [SSEQ 1] [PPID 0x18] IP A.55915 > B.12346: sctp (1) [SACK] [cum ack 3676403241] [a_rwnd 43] [#gap acks 0] [#dup tsns 0] A careful reader might wonder why it is necessary to reproduce 1) prior reproduction of 2). It is simply easier to observe when to send packet over association 2 which will push association into the pressure state. Proposed solution: Both problems share the same root cause, and that is improper scaling of socket buffer with rwnd. Solution in which sizeof(sk_buff) is taken into concern while calculating rwnd is not possible due to fact that there is no linear relationship between amount of data blamed in increase/decrease with IP packet in which payload arrived. Even in case such solution would be followed, complexity of the code would increase. Due to nature of current rwnd handling, slow increase (in sctp_assoc_rwnd_increase) of rwnd after pressure state is entered is rationale, but it gives false representation to the sender of current buffer space. Furthermore, it implements additional congestion control mechanism which is defined on implementation, and not on standard basis. Proposed solution simplifies whole algorithm having on mind definition from rfc: o Receiver Window (rwnd): This gives the sender an indication of the space available in the receiver's inbound buffer. Core of the proposed solution is given with these lines: sctp_assoc_rwnd_update: if ((asoc->base.sk->sk_rcvbuf - rx_count) > 0) asoc->rwnd = (asoc->base.sk->sk_rcvbuf - rx_count) >> 1; else asoc->rwnd = 0; We advertise to sender (half of) actual space we have. Half is in the braces depending whether you would like to observe size of socket buffer as SO_RECVBUF or twice the amount, i.e. size is the one visible from userspace, that is, from kernelspace. In this way sender is given with good approximation of our buffer space, regardless of the buffer policy - we always advertise what we have. Proposed solution fixes described problems and removes necessity for rwnd restoration algorithm. Finally, as proposed solution is simplification, some lines of code, along with some bytes in struct sctp_association are saved. Version 2 of the patch addressed comments from Vlad. Name of the function is set to be more descriptive, and two parts of code are changed, in one removing the superfluous call to sctp_assoc_rwnd_update since call would not result in update of rwnd, and the other being reordering of the code in a way that call to sctp_assoc_rwnd_update updates rwnd. Version 3 corrected change introduced in v2 in a way that existing function is not reordered/copied in line, but it is correctly called. Thanks Vlad for suggesting. Signed-off-by: Matija Glavinic Pecotic <matija.glavinic-pecotic.ext@nsn.com> Reviewed-by: Alexander Sverdlin <alexander.sverdlin@nsn.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-17mm: Use ptep/pmdp_set_numa() for updating _PAGE_NUMA bitAneesh Kumar K.V
Archs like ppc64 doesn't do tlb flush in set_pte/pmd functions when using a hash table MMU for various reasons (the flush is handled as part of the PTE modification when necessary). ppc64 thus doesn't implement flush_tlb_range for hash based MMUs. Additionally ppc64 require the tlb flushing to be batched within ptl locks. The reason to do that is to ensure that the hash page table is in sync with linux page table. We track the hpte index in linux pte and if we clear them without flushing hash and drop the ptl lock, we can have another cpu update the pte and can end up with duplicate entry in the hash table, which is fatal. We also want to keep set_pte_at simpler by not requiring them to do hash flush for performance reason. We do that by assuming that set_pte_at() is never *ever* called on a PTE that is already valid. This was the case until the NUMA code went in which broke that assumption. Fix that by introducing a new pair of helpers to set _PAGE_NUMA in a way similar to ptep/pmdp_set_wrprotect(), with a generic implementation using set_pte_at() and a powerpc specific one using the appropriate mechanism needed to keep the hash table in sync. Acked-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-02-16Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs fixes from Chris Mason: "We have a small collection of fixes in my for-linus branch. The big thing that stands out is a revert of a new ioctl. Users haven't shipped yet in btrfs-progs, and Dave Sterba found a better way to export the information" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: Btrfs: use right clone root offset for compressed extents btrfs: fix null pointer deference at btrfs_sysfs_add_one+0x105 Btrfs: unset DCACHE_DISCONNECTED when mounting default subvol Btrfs: fix max_inline mount option Btrfs: fix a lockdep warning when cleaning up aborted transaction Revert "btrfs: add ioctl to export size of global metadata reservation"
2014-02-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pendingLinus Torvalds
Pull SCSI target fixes from Nicholas Bellinger: "Mostly minor fixes this time to v3.14-rc1 related changes. Also included is one fix for a free after use regression in persistent reservations UNREGISTER logic that is CC'ed to >= v3.11.y stable" * git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: Target/sbc: Fix protection copy routine IB/srpt: replace strict_strtoul() with kstrtoul() target: Simplify command completion by removing CMD_T_FAILED flag iser-target: Fix leak on failure in isert_conn_create_fastreg_pool iscsi-target: Fix SNACK Type 1 + BegRun=0 handling target: Fix missing length check in spc_emulate_evpd_83() qla2xxx: Remove last vestiges of qla_tgt_cmd.cmd_list target: Fix 32-bit + CONFIG_LBDAF=n link error w/ sector_div target: Fix free-after-use regression in PR unregister
2014-02-15Merge branches 'irq-urgent-for-linus' and 'irq-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq update from Thomas Gleixner: "Fix from the urgent branch: a trivial oneliner adding the missing Kconfig dependency curing build failures which have been discovered by several build robots. The update in the irq-core branch provides a new function in the irq/devres code, which is a prerequisite for driver developers to get rid of boilerplate code all over the place. Not a bugfix, but it has zero impact on the current kernel due to the lack of users. It's simpler to provide the infrastructure to interested parties via your tree than fulfilling the wishlist of driver maintainers on which particular commit or tag this should be based on" * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: genirq: Add missing irq_to_desc export for CONFIG_SPARSE_IRQ=n * 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: genirq: Add devm_request_any_context_irq()
2014-02-15Merge tag 'fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc Pull ARM SoC fixes from Kevin Hilman: "A collection of ARM SoC fixes for v3.14-rc1. Mostly a collection of Kconfig, device tree data and compilation fixes along with fix to drivers/phy that fixes a boot regression on some Marvell mvebu platforms" * tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: dma: mv_xor: Silence a bunch of LPAE-related warnings ARM: ux500: disable msp2 device tree node ARM: zynq: Reserve not DMAable space in front of the kernel ARM: multi_v7_defconfig: Select CONFIG_SOC_DRA7XX ARM: imx6: Initialize low-power mode early again ARM: pxa: fix various compilation problems ARM: pxa: fix compilation problem on AM300EPD board ARM: at91: add Atmel's SAMA5D3 Xplained board spi/atmel: document clock properties mmc: atmel-mci: document clock properties ARM: at91: enable USB host on at91sam9n12ek board ARM: at91/dt: fix sama5d3 ohci hclk clock reference ARM: at91/dt: sam9263: fix compatibility string for the I2C ata: sata_mv: Fix probe failures with optional phys drivers: phy: Add support for optional phys drivers: phy: Make NULL a valid phy reference ARM: fix HAVE_ARM_TWD selection for OMAP and shmobile ARM: moxart: move DMA_OF selection to driver ARM: hisi: fix kconfig warning on HAVE_ARM_TWD
2014-02-14Merge tag 'usb-3.14-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB fixes from Greg KH: "Here is a bunch of USB fixes for 3.14-rc3. Most of these are xhci reverts, fixing a bunch of reported issues with USB 3 host controller issues that loads of people have been hitting (with the exception of kernel developers, all of our machines seem to be working fine, which is why these took so long to get resolved...) There are some other minor fixes and new device ids, as ususal. All have been in linux-next successfully" * tag 'usb-3.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (22 commits) usb: option: blacklist ZTE MF667 net interface Revert "usb: xhci: Link TRB must not occur within a USB payload burst" Revert "xhci: Avoid infinite loop when sg urb requires too many trbs" Revert "xhci: Set scatter-gather limit to avoid failed block writes." xhci 1.0: Limit arbitrarily-aligned scatter gather. Modpost: fixed USB alias generation for ranges including 0x9 and 0xA usb: core: Fix potential memory leak adding dyn USBdevice IDs USB: ftdi_sio: add Tagsys RFID Reader IDs usb: qcserial: add Netgear Aircard 340U usb-storage: enable multi-LUN scanning when needed USB: simple: add Dynastream ANT USB-m Stick device support usb-storage: add unusual-devs entry for BlackBerry 9000 usb-storage: restrict bcdDevice range for Super Top in Cypress ATACB usb: phy: move some error messages to debug usb: ftdi_sio: add Mindstorms EV3 console adapter usb: dwc2: fix memory corruption in dwc2 driver usb: dwc2: fix role switch breakage usb: dwc2: bail out early when booting with "nousb" Revert "xhci: replace xhci_read_64() with readq()" Revert "xhci: replace xhci_write_64() with writeq()" ...
2014-02-14Merge tag 'char-misc-3.14-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc fixes from Greg KH: "Here are some small char/misc driver fixes, along with some documentation updates, for 3.14-rc3. Nothing major, just a number of fixes for reported issues" * tag 'char-misc-3.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: Revert "misc: eeprom: sunxi: Add new compatibles" Revert "ARM: sunxi: dt: Convert to the new SID compatibles" misc: mic: fix possible signed underflow (undefined behavior) in userspace API ARM: sunxi: dt: Convert to the new SID compatibles misc: eeprom: sunxi: Add new compatibles misc: genwqe: Fix potential memory leak when pinning memory Documentation:Update Documentation/zh_CN/arm64/memory.txt Documentation:Update Documentation/zh_CN/arm64/booting.txt Documentation:Chinese translation of Documentation/arm64/tagged-pointers.txt raw: set range for MAX_RAW_DEVS raw: test against runtime value of max_raw_minors Drivers: hv: vmbus: Don't timeout during the initial connection with host Drivers: hv: vmbus: Specify the target CPU that should receive notification VME: Correct read/write alignment algorithm mei: don't unset read cb ptr on reset mei: clear write cb from waiting list on reset
2014-02-14Revert "btrfs: add ioctl to export size of global metadata reservation"Chris Mason
This reverts commit 01e219e8069516cdb98594d417b8bb8d906ed30d. David Sterba found a different way to provide these features without adding a new ioctl. We haven't released any progs with this ioctl yet, so I'm taking this out for now until we finalize things. Signed-off-by: Chris Mason <clm@fb.com> Signed-off-by: David Sterba <dsterba@suse.cz> CC: Jeff Mahoney <jeffm@suse.com>
2014-02-14Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Peter Anvin: "A collection of small fixes: - There still seem to be problems with asm goto which requires the empty asm hack. - If SMAP is disabled at compile time, don't enable it nor try to interpret a page fault as an SMAP violation. - Fix a case of unbounded recursion while tracing" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, smap: smap_violation() is bogus if CONFIG_X86_SMAP is off x86, smap: Don't enable SMAP if CONFIG_X86_SMAP is disabled compiler/gcc4: Make quirk for asm_volatile_goto() unconditional x86: Use preempt_disable_notrace() in cycles_2_ns()
2014-02-14Merge tag 'pm+acpi-3.14-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI and power management fixes from Rafael Wysocki: "These include a fix for a recent intel_pstate regression, a fix for a regression in the ACPI-based PCI hotplug (ACPIPHP) code introduced during the 3.12 cycle, fixes for two bugs in the ACPI core introduced recently and a MAINTAINERS update related to cpufreq. Specifics: - Fix for a recent regression in the intel_pstate driver that introduced a race condition causing systems to crash during initialization in some situations. This removes the affected code altogether. From Dirk Brandewie. - ACPIPHP fix for a regression introduced during the 3.12 cycle causing devices to be dropped as a result of bus check notifications after system resume on some systems due to the way ACPIPHP interprets _STA return values (arguably incorrectly). From Mika Westerberg. - ACPI dock driver fix for a problem causing docking to fail due to a check that always fails after recent ACPI core changes (found by code inspection). - ACPI container driver fix to prevent memory from being leaked in an error code path after device_register() failures. - Update of the arm_big_little cpufreq driver maintainer's e-mail address" * tag 'pm+acpi-3.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: MAINTAINERS / cpufreq: update Sudeep's email address intel_pstate: Remove energy reporting from pstate_sample tracepoint ACPI / container: Fix error code path in container_device_attach() ACPI / hotplug / PCI: Relax the checking of _STA return values ACPI / dock: Use acpi_device_enumerated() to check if dock is present
2014-02-14Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block IO fixes from Jens Axboe: "Second round of updates and fixes for 3.14-rc2. Most of this stuff has been queued up for a while. The notable exception is the blk-mq changes, which are naturally a bit more in flux still. The pull request contains: - Two bug fixes for the new immutable vecs, causing crashes with raid or swap. From Kent. - Various blk-mq tweaks and fixes from Christoph. A fix for integrity bio's from Nic. - A few bcache fixes from Kent and Darrick Wong. - xen-blk{front,back} fixes from David Vrabel, Matt Rushton, Nicolas Swenson, and Roger Pau Monne. - Fix for a vec miscount with integrity vectors from Martin. - Minor annotations or fixes from Masanari Iida and Rashika Kheria. - Tweak to null_blk to do more normal FIFO processing of requests from Shlomo Pongratz. - Elevator switching bypass fix from Tejun. - Softlockup in blkdev_issue_discard() fix when !CONFIG_PREEMPT from me" * 'for-linus' of git://git.kernel.dk/linux-block: (31 commits) block: add cond_resched() to potentially long running ioctl discard loop xen-blkback: init persistent_purge_work work_struct blk-mq: pair blk_mq_start_request / blk_mq_requeue_request blk-mq: dont assume rq->errors is set when returning an error from ->queue_rq block: Fix cloning of discard/write same bios block: Fix type mismatch in ssize_t_blk_mq_tag_sysfs_show blk-mq: rework flush sequencing logic null_blk: use blk_complete_request and blk_mq_complete_request virtio_blk: use blk_mq_complete_request blk-mq: rework I/O completions fs: Add prototype declaration to appropriate header file include/linux/bio.h fs: Mark function as static in fs/bio-integrity.c block/null_blk: Fix completion processing from LIFO to FIFO block: Explicitly handle discard/write same segments block: Fix nr_vecs for inline integrity vectors blk-mq: Add bio_integrity setup to blk_mq_make_request blk-mq: initialize sg_reserved_size blk-mq: handle dma_drain_size blk-mq: divert __blk_put_request for MQ ops blk-mq: support at_head inserations for blk_execute_rq ...
2014-02-14Merge tag 'rdma-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband Pull RDMA/InfiniBand fixes from Roland Dreier: - Fix some rough edges from the "IP addressing for IBoE" merge - Other misc fixes, mostly to hardware drivers * tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (21 commits) RDMA/ocrdma: Fix load time panic during GID table init RDMA/ocrdma: Fix traffic class shift IB/iser: Fix use after free in iser_snd_completion() IB/iser: Avoid dereferencing iscsi_iser conn object when not bound to iser connection IB/usnic: Fix smatch endianness error IB/mlx5: Remove dependency on X86 mlx5: Add include of <linux/slab.h> because of kzalloc()/kfree() use IB/qib: Add missing serdes init sequence RDMA/cxgb4: Add missing neigh_release in LE-Workaround path IB: Report using RoCE IP based gids in port caps IB/mlx4: Build the port IBoE GID table properly under bonding IB/mlx4: Do IBoE GID table resets per-port IB/mlx4: Do IBoE locking earlier when initializing the GID table IB/mlx4: Move rtnl locking to the right place IB/mlx4: Make sure GID index 0 is always occupied IB/mlx4: Don't allocate range of steerable UD QPs for Ethernet-only device RDMA/amso1100: Fix error return code RDMA/nes: Fix error return code IB/mlx5: Don't set "block multicast loopback" capability IB/mlx5: Fix binary compatibility with libmlx5 ...
2014-02-14Merge branches 'cma', 'cxgb4', 'iser', 'misc', 'mlx4', 'mlx5', 'nes', ↵Roland Dreier
'ocrdma', 'qib' and 'usnic' into for-next
2014-02-14workqueue: add args to workqueue lockdep nameLi Zhong
Tommi noticed a 'funny' lock class name: "%s#5" from a lock acquired in process_one_work(). Maybe #fmt plus #args could be used as the lock_name to give some more information for some fmt string like the above. __builtin_constant_p() check is removed (as there seems no good way to check all the variables in args list). However, by removing the check, it only adds two additional "s for those constants. Some lockdep name examples printed out after the change: lockdep name wq->name "events_long" events_long "%s"("khelper") khelper "xfs-data/%s"mp->m_fsname xfs-data/dm-3 Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2014-02-13mlx5: Add include of <linux/slab.h> because of kzalloc()/kfree() useRoland Dreier
On some architectures (for example, arm), we don't end up indirectly pulling in the declaration of kzalloc() and kfree(), and so building anything that includes <linux/mlx5/driver.h> breaks. Fix this by adding an explicit include to get the declaration. Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-02-13IB: Report using RoCE IP based gids in port capsMoni Shoua
For userspace RoCE UD QPs we need to know the GID format that the kernel uses, e.g when working over older kernels. For that end, add a new port capability IB_PORT_IP_BASED_GIDS and report it when query port is issued. Signed-off-by: Moni Shoua <monis@mellanox.co.il> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-02-13net: ip, ipv6: handle gso skbs in forwarding pathFlorian Westphal
Marcelo Ricardo Leitner reported problems when the forwarding link path has a lower mtu than the incoming one if the inbound interface supports GRO. Given: Host <mtu1500> R1 <mtu1200> R2 Host sends tcp stream which is routed via R1 and R2. R1 performs GRO. In this case, the kernel will fail to send ICMP fragmentation needed messages (or pkt too big for ipv6), as GSO packets currently bypass dstmtu checks in forward path. Instead, Linux tries to send out packets exceeding the mtu. When locking route MTU on Host (i.e., no ipv4 DF bit set), R1 does not fragment the packets when forwarding, and again tries to send out packets exceeding R1-R2 link mtu. This alters the forwarding dstmtu checks to take the individual gso segment lengths into account. For ipv6, we send out pkt too big error for gso if the individual segments are too big. For ipv4, we either send icmp fragmentation needed, or, if the DF bit is not set, perform software segmentation and let the output path create fragments when the packet is leaving the machine. It is not 100% correct as the error message will contain the headers of the GRO skb instead of the original/segmented one, but it seems to work fine in my (limited) tests. Eric Dumazet suggested to simply shrink mss via ->gso_size to avoid sofware segmentation. However it turns out that skb_segment() assumes skb nr_frags is related to mss size so we would BUG there. I don't want to mess with it considering Herbert and Eric disagree on what the correct behavior should be. Hannes Frederic Sowa notes that when we would shrink gso_size skb_segment would then also need to deal with the case where SKB_MAX_FRAGS would be exceeded. This uses sofware segmentation in the forward path when we hit ipv4 non-DF packets and the outgoing link mtu is too small. Its not perfect, but given the lack of bug reports wrt. GRO fwd being broken this is a rare case anyway. Also its not like this could not be improved later once the dust settles. Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Reported-by: Marcelo Ricardo Leitner <mleitner@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-13net: core: introduce netif_skb_dev_featuresFlorian Westphal
Will be used by upcoming ipv4 forward path change that needs to determine feature mask using skb->dst->dev instead of skb->dev. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-13PCI/MSI: Add pci_enable_msi_exact() and pci_enable_msix_exact()Alexander Gordeev
The new functions are special cases for pci_enable_msi_range() and pci_enable_msix_range() when a particular number of MSI or MSI-X is needed. By contrast with pci_enable_msi_range() and pci_enable_msix_range() functions, pci_enable_msi_exact() and pci_enable_msix_exact() return zero in case of success, which indicates MSI or MSI-X interrupts have been successfully allocated. Signed-off-by: Alexander Gordeev <agordeev@redhat.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2014-02-13compiler/gcc4: Make quirk for asm_volatile_goto() unconditionalSteven Noonan
I started noticing problems with KVM guest destruction on Linux 3.12+, where guest memory wasn't being cleaned up. I bisected it down to the commit introducing the new 'asm goto'-based atomics, and found this quirk was later applied to those. Unfortunately, even with GCC 4.8.2 (which ostensibly fixed the known 'asm goto' bug) I am still getting some kind of miscompilation. If I enable the asm_volatile_goto quirk for my compiler, KVM guests are destroyed correctly and the memory is cleaned up. So make the quirk unconditional for now, until bug is found and fixed. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Steven Noonan <steven@uplinklabs.net> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Jakub Jelinek <jakub@redhat.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/1392274867-15236-1-git-send-email-steven@uplinklabs.net Link: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58670 Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-13dma-buf: update debugfs outputSumit Semwal
Russell King observed 'wierd' looking output from debugfs, and also suggested better ways of getting device names (use KBUILD_MODNAME, dev_name()) This patch addresses these issues to make the debugfs output correct and better looking. While at it, replace seq_printf with seq_puts to remove the checkpatch.pl warnings. Reported-by: Russell King - ARM Linux <linux@arm.linux.org.uk> Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org>
2014-02-13Merge branch 'pm-cpufreq'Rafael J. Wysocki
* pm-cpufreq: MAINTAINERS / cpufreq: update Sudeep's email address intel_pstate: Remove energy reporting from pstate_sample tracepoint
2014-02-13intel_pstate: Remove energy reporting from pstate_sample tracepointDirk Brandewie
Remove the reporting of energy since it does not provide any useful information about the state of the driver and will be a maintainance headache going forward since the RAPL energy units register is not architectural and subject to change between micro-architectures References: https://bugzilla.kernel.org/show_bug.cgi?id=69831 Fixes: b69880f9ccf7 (intel_pstate: Add trace point to report internal state.) Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-02-12target: Simplify command completion by removing CMD_T_FAILED flagRoland Dreier
The CMD_T_FAILED flag is set used in one place to record the result of a trivial test, and it is only tested once, few lines later. We might as well make the code simpler and easier to read by directly doing the test of "success" where we want to use it. Signed-off-by: Roland Dreier <roland@purestorage.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-02-12Merge tag 'stable/for-linus-3.14-rc2-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull Xen bugfixes from Konrad Rzeszutek Wilk: "This has an healthy amount of code being removed - which we do not use anymore (the only user of it was ia64 Xen which had been removed already). The other bug-fixes are to make Xen ARM be able to use the new event channel mechanism and proper export of header files to user-space. Summary: - Fix ARM and Xen FIFO not working. - Remove more Xen ia64 vestigates. - Fix UAPI missing Xen files" * tag 'stable/for-linus-3.14-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: ia64/xen: Remove Xen support for ia64 even more xen: install xen/gntdev.h and xen/gntalloc.h xen/events: bind all new interdomain events to VCPU0
2014-02-12Merge tag 'gpio-v3.14-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio Pull GPIO fixes from Linus Walleij: "Here are some accumulated patches with small fixes for this and that in a few GPIO drivers, and a more important fix to an #ifdef in the GPIO consumer header. Summary: - Get #ifdef's right in the <linux/gpio/consumer.h> header. - Minor fixes to tb10x, clps711x, bcm281xx, intel-mid and xtensa GPIO drivers" * tag 'gpio-v3.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: gpio: consumer.h: Move forward declarations outside #ifdef gpio: tb10x: GPIO_TB10X needs to select GENERIC_IRQ_CHIP gpio: clps711x: Add module alias to support module auto loading gpio: bcm281xx: Update MODULE_AUTHOR gpio: intel-mid: fix the incorrect return of idle callback gpio: xtensa: fix build when XCHAL_HAVE_CP is 0
2014-02-12Merge tag 'spi-v3.14-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi Pull spi fixes from Mark Brown: "A few driver and documentation fixes, plus a fix for double error handling which had crept in due to the confusing documentation - it wasn't clear if the core or the driver was responsible for cleanup in error cases so both tried to do it with unfortunate results" * tag 'spi-v3.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: spi: nuc900: Set SPI_LSB_FIRST for master->mode_bits if hw->pdata->lsb is true spi: rspi: Document support for Renesas QSPI in Kconfig spi: Fix crash with double message finalisation on error handling spi: correct the transfer_one_message documentation wording spi: document the transfer_one spi_master callback spi: spi.h: clarify the documentation of transfer_one
2014-02-12Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linuxLinus Torvalds
Pull drm fixes from Dave Airlie: "Nothing too crazy. Radeon irq fixes, i915 regression fixes, exynos fixes, tda998x chip fixes, and a bunch of msm fixes" * 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (31 commits) drm/i915: Pair va_copy with va_end in i915_error_vprintf drm/i915: Fix intel_pipe_to_cpu_transcoder for UMS drm/i915: Disable dp aux irq on g4x drm/msm: bigger synchronization hammer drm/exynos: Convert to use the standard hdmi.h header drm/exynos: Fix trivial typo drm/exynos: Remove unnecessary semicolon drm/exynos: Fix multiplatform breakage for ipp/gsc drm/exynos: Fix freeing issues in exynos_drm_drv.c drm/radeon: add missing include in btc_dpm.c drm/radeon/dpm: fix uninitialized read from stack in kv_dpm_late_enable drm/radeon: remove useless return drm/radeon/dpm: use stored max_vddc rather than looking it up drm/radeon/dpm: use the driver state for dpm debugfs drm/radeon: fix UVD IRQ support on 7xx drm/radeon: fix UVD IRQ support on SI drm/msm: fix deadlock in bo create fail path drm/msm/mdp4: cursor fixes drm/msm/mdp4: pageflip fixes drm/msm/mdp5: fix ref leaks in error paths ...
2014-02-12drm/vmwgfx: Get maximum mob size from register SVGA_REG_MOB_MAX_SIZECharmaine Lee
This patch queries the register SVGA_REG_MOB_MAX_SIZE for the maximum size of a single mob. Signed-off-by: Charmaine Lee <charmainel@vmware.com> Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>