path: root/drivers
Age        Commit message        Author
2015-05-21  Merge branch 'linux-linaro-lsk-v3.10' into linux-linaro-lsk-v3.10-rt  (Alex Shi)
Conflicts: include/linux/interrupt.h
2015-05-21  Merge tag 'v3.10.79' into linux-linaro-lsk-v3.10  (Alex Shi)
This is the 3.10.79 stable release
2015-05-17  ACPICA: Utilities: Cleanup to enforce ACPI_PHYSADDR_TO_PTR()/ACPI_PTR_TO_PHYSADDR()  (Lv Zheng)
commit 6d3fd3cc33d50e4c0d0c0bd172de02caaec3127c upstream. ACPICA commit 154f6d074dd38d6ebc0467ad454454e6c5c9ecdf. There are code pieces that convert pointers using the "(acpi_physical_address) x" or "ACPI_CAST_PTR (t, x)" forms; this patch cleans them up. Known issue: 1. Cleanup of "(ACPI_PHYSICAL_ADDRESS) x" for a table field. For the conversions around the table fields, it is better to fix them together with the alignment, so this patch doesn't modify such code. There should be no functional problem in leaving them unchanged. Link: https://github.com/acpica/acpica/commit/154f6d07 Signed-off-by: Lv Zheng <lv.zheng@intel.com> Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Dirk Behme <dirk.behme@gmail.com> Signed-off-by: George G. Davis <george_davis@mentor.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-05-17  ACPICA: Tables: Change acpi_find_root_pointer() to use acpi_physical_address.  (Lv Zheng)
commit f254e3c57b9d952e987502aefa0804c177dd2503 upstream. ACPICA commit 7d9fd64397d7c38899d3dc497525f6e6b044e0e3 OSPMs like Linux expect an acpi_physical_address returning value from acpi_find_root_pointer(). This triggers warnings if sizeof (acpi_size) doesn't equal to sizeof (acpi_physical_address): drivers/acpi/osl.c:275:3: warning: passing argument 1 of 'acpi_find_root_pointer' from incompatible pointer type [enabled by default] In file included from include/acpi/acpi.h:64:0, from include/linux/acpi.h:36, from drivers/acpi/osl.c:41: include/acpi/acpixf.h:433:1: note: expected 'acpi_size *' but argument is of type 'acpi_physical_address *' This patch corrects acpi_find_root_pointer(). Link: https://github.com/acpica/acpica/commit/7d9fd643 Signed-off-by: Lv Zheng <lv.zheng@intel.com> Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Dirk Behme <dirk.behme@gmail.com> Signed-off-by: George G. Davis <george_davis@mentor.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-05-17  mmc: card: Don't access RPMB partitions for normal read/write  (Chuanxiao Dong)
commit 4e93b9a6abc0d028daf3c8a00cb77b679d8a4df4 upstream. During kernel boot, the kernel tries to read some logical sectors of each block device node looking for a possible partition table. Since the RPMB partition is special and cannot be accessed by normal eMMC read/write commands, this produces error messages like the following during boot:
    mmc0: Got data interrupt 0x00000002 even though no data operation was in progress.
    mmcblk0rpmb: error -110 transferring data, sector 0, nr 32, cmd response 0x900, card status 0xb00
    mmcblk0rpmb: retrying using single block read
    mmcblk0rpmb: timed out sending r/w cmd command, card status 0x400900   (repeated six times)
    end_request: I/O error, dev mmcblk0rpmb, sector 0
    Buffer I/O error on device mmcblk0rpmb, logical block 0
    end_request: I/O error, dev mmcblk0rpmb, sector 8
    Buffer I/O error on device mmcblk0rpmb, logical block 1
    end_request: I/O error, dev mmcblk0rpmb, sector 16
    Buffer I/O error on device mmcblk0rpmb, logical block 2
    end_request: I/O error, dev mmcblk0rpmb, sector 24
    Buffer I/O error on device mmcblk0rpmb, logical block 3
This patch discards a request in the eMMC queue if it targets the RPMB partition, which avoids triggering the error messages above (see the sketch below). Fixes: 090d25fe224c ("mmc: core: Expose access to RPMB partition") Signed-off-by: Yunpeng Gao <yunpeng.gao@intel.com> Signed-off-by: Chuanxiao Dong <chuanxiao.dong@intel.com> Tested-by: Michael Shigorin <mike@altlinux.org> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
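A minimal sketch of the idea at the top of mmc_blk_issue_rq(), assuming the 3.10-era MMC_BLK_DATA_AREA_RPMB flag and helpers (illustrative, not the literal upstream hunk):

    static int mmc_blk_issue_rq(struct mmc_queue *mq, struct request *req)
    {
        struct mmc_blk_data *md = mq->data;

        /*
         * Normal read/write commands cannot reach the RPMB partition;
         * complete such requests with -EIO instead of issuing them to
         * the card and waiting for them to time out.
         */
        if (req && (md->area_type & MMC_BLK_DATA_AREA_RPMB)) {
            blk_end_request_all(req, -EIO);
            return 0;
        }

        /* ... normal issue path continues here ... */
    }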
2015-05-17  pinctrl: Don't just pretend to protect pinctrl_maps, do it for real  (Doug Anderson)
commit c5272a28566b00cce79127ad382406e0a8650690 upstream. Way back, when the world was a simpler place and there was no war, no evil, and no kernel bugs, there was just a single pinctrl lock. That was how the world was when (57291ce pinctrl: core device tree mapping table parsing support) was written. In that case, there were instances where the pinctrl mutex was already held when pinctrl_register_map() was called, hence a "locked" parameter was passed to the function to indicate that the mutex was already locked (so we shouldn't lock it again). A few years ago in (42fed7b pinctrl: move subsystem mutex to pinctrl_dev struct), we switched to a separate pinctrl_maps_mutex. ...but (oops) we forgot to re-think about the whole "locked" parameter for pinctrl_register_map(). Basically the "locked" parameter appears to still refer to whether the bigger pinctrl_dev mutex is locked, but we're using it to skip locks of our (now separate) pinctrl_maps_mutex. That's kind of a bad thing(TM). Probably nobody noticed because most of the calls to pinctrl_register_map happen at boot time and we've got synchronous device probing. ...and even cases where we're asynchronous don't end up actually hitting the race too often. ...but after banging my head against the wall for a bug that reproduced 1 out of 1000 reboots and lots of looking through kgdb, I finally noticed this. Anyway, we can now safely remove the "locked" parameter and go back to a war-free, evil-free, and kernel-bug-free world. Fixes: 42fed7ba44e4 ("pinctrl: move subsystem mutex to pinctrl_dev struct") Signed-off-by: Doug Anderson <dianders@chromium.org> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
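A rough sketch of the resulting shape of pinctrl_register_map() after the "locked" parameter is dropped; validation and map duplication are elided, and the struct layout is assumed from the surrounding pinctrl core:

    int pinctrl_register_map(struct pinctrl_map const *maps,
                             unsigned num_maps, bool dup)
    {
        struct pinctrl_maps *maps_node;

        /* ... validation of maps[] elided ... */

        maps_node = kzalloc(sizeof(*maps_node), GFP_KERNEL);
        if (!maps_node)
            return -ENOMEM;
        maps_node->num_maps = num_maps;
        maps_node->maps = maps;  /* or a duplicate when dup == true */

        /*
         * Always take the maps mutex; the removed 'locked' flag used to
         * skip this, leaving pinctrl_maps unprotected in some callers.
         */
        mutex_lock(&pinctrl_maps_mutex);
        list_add_tail(&maps_node->node, &pinctrl_maps);
        mutex_unlock(&pinctrl_maps_mutex);

        return 0;
    }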
2015-05-17  drm/i915: Add missing MacBook Pro models with dual channel LVDS  (Lukas Wunner)
commit 3916e3fd81021fb795bfbdb17f375b6b3685bced upstream. Single channel LVDS maxes out at 112 MHz. The 15" pre-retina models shipped with 1440x900 (106 MHz) by default or 1680x1050 (119 MHz) as a BTO option, both versions used dual channel LVDS even though the smaller one would have fit into a single channel. Notes: Bug report showing that the MacBookPro8,2 with 1440x900 uses dual channel LVDS (this lead to it being hardcoded in intel_lvds.c by Daniel Vetter with commit 618563e3945b9d0864154bab3c607865b557cecc): https://bugzilla.kernel.org/show_bug.cgi?id=42842 If i915.lvds_channel_mode=2 is missing even though the machine needs it, every other vertical line is white and consequently, only the left half of the screen is visible (verified by myself on a MacBookPro9,1). Forum posting concerning a MacBookPro6,2 with 1440x900, author is using i915.lvds_channel_mode=2 on the kernel command line, proving that the machine uses dual channels: https://bbs.archlinux.org/viewtopic.php?id=185770 Chi Mei N154C6-L04 with 1440x900 is a replacement panel for all MacBook Pro "A1286" models, and that model number encompasses the MacBookPro6,2 / 8,2 / 9,1. Page 17 of the panel's datasheet shows it's driven with dual channel LVDS: http://www.ebay.com/itm/-/400690878560 http://www.everymac.com/ultimate-mac-lookup/?search_keywords=A1286 http://www.taopanel.com/chimei/datasheet/N154C6-L04.pdf Those three 15" models, MacBookPro6,2 / 8,2 / 9,1, are the only ones with i915 graphics and dual channel LVDS, so that list should be complete. And the 8,2 is already in intel_lvds.c. Possible motivation to use dual channel LVDS even on the 1440x900 models: Reduce the number of different parts, i.e. use identical logic boards and display cabling on both versions and the only differing component is the panel. Signed-off-by: Lukas Wunner <lukas@wunner.de> Acked-by: Jani Nikula <jani.nikula@intel.com> [Jani: included notes in the commit message for posterity] Signed-off-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-05-17  gpio: sysfs: fix memory leaks and device hotplug  (Johan Hovold)
commit 483d821108791092798f5d230686868112927044 upstream. Unregister GPIOs requested through sysfs at chip remove to avoid leaking the associated memory and sysfs entries. The stale sysfs entries prevented the gpio numbers from being exported when the gpio range was later reused (e.g. at device reconnect). This also fixes the related module-reference leak. Note that kernfs makes sure that any on-going sysfs operations finish before the class devices are unregistered and that further accesses fail. The chip exported flag is used to prevent gpiod exports during removal. This also makes it harder to trigger, but does not fix, the related race between gpiochip_remove and export_store, which is really a race with gpiod_request that needs to be addressed separately. Also note that this would prevent the crashes (e.g. NULL-dereferences) at reconnect that affects pre-3.18 kernels, as well as use-after-free on operations on open attribute files on pre-3.14 kernels (prior to kernfs). Fixes: d8f388d8dc8d ("gpio: sysfs interface") Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-05-17  gpio: unregister gpiochip device before removing it  (Johan Hovold)
commit 01cca93a9491ed95992523ff7e79dd9bfcdea8e0 upstream. Unregister gpiochip device (used to export information through sysfs) before removing it internally. This way removal will reverse addition. Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-05-17  xen/console: Update console event channel on resume  (Boris Ostrovsky)
commit b9d934f27c91b878c4b2e64299d6e419a4022f8d upstream. After a resume the hypervisor/tools may change console event channel number. We should re-query it. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-05-13  UBI: fix soft lockup in ubi_check_volume()  (hujianyang)
commit 9aa272b492e7551a9ee0e2c83c720ea013698485 upstream. Running mtd-utils/tests/ubi-tests/io_basic.c could cause soft lockup or watchdog reset. It is because *updatevol* will perform ubi_check_volume() after updating finish and this function will full scan the updated lebs if the volume is initialized as STATIC_VOLUME. This patch adds *cond_resched()* in the loop of lebs scan to avoid soft lockup. Helped by Richard Weinberger <richard@nod.at> [ 2158.067096] INFO: rcu_sched self-detected stall on CPU { 1} (t=2101 jiffies g=1606 c=1605 q=56) [ 2158.172867] CPU: 1 PID: 2073 Comm: io_basic Tainted: G O 3.10.53 #21 [ 2158.172898] [<c000f624>] (unwind_backtrace+0x0/0x120) from [<c000c294>] (show_stack+0x10/0x14) [ 2158.172918] [<c000c294>] (show_stack+0x10/0x14) from [<c008ac3c>] (rcu_check_callbacks+0x1c0/0x660) [ 2158.172936] [<c008ac3c>] (rcu_check_callbacks+0x1c0/0x660) from [<c002b480>] (update_process_times+0x38/0x64) [ 2158.172953] [<c002b480>] (update_process_times+0x38/0x64) from [<c005ff38>] (tick_sched_handle+0x54/0x60) [ 2158.172966] [<c005ff38>] (tick_sched_handle+0x54/0x60) from [<c00601ac>] (tick_sched_timer+0x44/0x74) [ 2158.172978] [<c00601ac>] (tick_sched_timer+0x44/0x74) from [<c003f348>] (__run_hrtimer+0xc8/0x1b8) [ 2158.172992] [<c003f348>] (__run_hrtimer+0xc8/0x1b8) from [<c003fd9c>] (hrtimer_interrupt+0x128/0x2a4) [ 2158.173007] [<c003fd9c>] (hrtimer_interrupt+0x128/0x2a4) from [<c0246f1c>] (arch_timer_handler_virt+0x28/0x30) [ 2158.173022] [<c0246f1c>] (arch_timer_handler_virt+0x28/0x30) from [<c0086214>] (handle_percpu_devid_irq+0x9c/0x124) [ 2158.173036] [<c0086214>] (handle_percpu_devid_irq+0x9c/0x124) from [<c0082bd8>] (generic_handle_irq+0x20/0x30) [ 2158.173049] [<c0082bd8>] (generic_handle_irq+0x20/0x30) from [<c000969c>] (handle_IRQ+0x64/0x8c) [ 2158.173060] [<c000969c>] (handle_IRQ+0x64/0x8c) from [<c0008544>] (gic_handle_irq+0x3c/0x60) [ 2158.173074] [<c0008544>] (gic_handle_irq+0x3c/0x60) from [<c02f0f80>] (__irq_svc+0x40/0x50) [ 2158.173083] Exception stack(0xc4043c98 to 0xc4043ce0) [ 2158.173092] 3c80: c4043ce4 00000019 [ 2158.173102] 3ca0: 1f8a865f c050ad10 1f8a864c 00000031 c04b5970 0003ebce 00000000 f3550000 [ 2158.173113] 3cc0: bf00bc68 00000800 0003ebce c4043ce0 c0186d14 c0186cb8 80000013 ffffffff [ 2158.173130] [<c02f0f80>] (__irq_svc+0x40/0x50) from [<c0186cb8>] (read_current_timer+0x4/0x38) [ 2158.173145] [<c0186cb8>] (read_current_timer+0x4/0x38) from [<1f8a865f>] (0x1f8a865f) [ 2183.927097] BUG: soft lockup - CPU#1 stuck for 22s! [io_basic:2073] [ 2184.002229] Modules linked in: nandflash(O) [last unloaded: nandflash] Signed-off-by: Wang Kai <morgan.wang@huawei.com> Signed-off-by: hujianyang <hujianyang@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
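A minimal sketch of the LEB scan loop in ubi_check_volume() with the added cond_resched(); the surrounding error handling is abbreviated and the exact hunk may differ:

    /* buf is a vol->usable_leb_size scratch buffer */
    for (i = 0; i < vol->used_ebs; i++) {
        int size;

        cond_resched();  /* the fix: yield so long scans can't soft-lock a CPU */

        if (i == vol->used_ebs - 1)
            size = vol->last_eb_bytes;
        else
            size = vol->usable_leb_size;

        err = ubi_eba_read_leb(ubi, vol, i, buf, 0, size, 1);
        if (err) {
            if (mtd_is_eccerr(err))
                err = 1;
            break;
        }
    }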
2015-05-13  Drivers: hv: vmbus: Don't wait after requesting offers  (K. Y. Srinivasan)
commit 73cffdb65e679b98893f484063462c045adcf212 upstream. Don't wait after sending request for offers to the host. This wait is unnecessary and simply adds 5 seconds to the boot time. Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Cc: <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-05-13  staging: panel: fix lcd type  (Sudip Mukherjee)
commit 2c20d92dad5db6440cfa88d811b69fd605240ce4 upstream. The LCD type as defined in Kconfig does not match the values used in the code; as a result the rs, rw and en pins were getting interchanged. Kconfig defines the value of PANEL_LCD to be 1 when the custom configuration is selected, but in the code LCD_TYPE_CUSTOM is defined as 5. My hardware is LCD_TYPE_CUSTOM, but it was assigned the pins of LCD_TYPE_OLD and did not work. The values are now corrected with reference to those defined in Kconfig and the panel works. Checked on a JHD204A LCD with the LCD_TYPE_CUSTOM configuration. Signed-off-by: Sudip Mukherjee <sudip@vectorindia.org> Acked-by: Willy Tarreau <w@1wt.eu> [wt: backport to 3.10 and 3.14] Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-05-13  usb: gadget: printer: enqueue printer's response for setup request  (Andrzej Pietrasiewicz)
commit eb132ccbdec5df46e29c9814adf76075ce83576b upstream. Function-specific setup requests should be handled in such a way that, apart from filling in the data buffer, the requests are also actually enqueued: if function-specific setup is called from composite_setup(), the "usb_ep_queue()" block of code in composite_setup() is skipped. The printer function lacks this part, and it results in e.g. get-device-id requests failing: the host expects some response, the device prepares it but does not enqueue it for sending, so the host eventually hits a timeout. This patch adds enqueueing of the prepared responses (see the sketch below). Fixes: 2e87edf49227 ("usb: gadget: make g_printer use composite") Signed-off-by: Andrzej Pietrasiewicz <andrzej.p@samsung.com> Signed-off-by: Felipe Balbi <balbi@ti.com> [ported to stable 3.10 and 3.14] Signed-off-by: Andrzej Pietrasiewicz <andrzej.p@samsung.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
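A minimal sketch of the usual gadget pattern for actually sending a prepared setup response on ep0, assuming a function setup() callback with access to the composite dev cdev (illustrative only):

    /* value holds the number of response bytes prepared in req->buf */
    if (value >= 0) {
        req->length = value;
        req->zero = value < le16_to_cpu(ctrl->wLength);
        value = usb_ep_queue(cdev->gadget->ep0, req, GFP_ATOMIC);
        if (value < 0) {
            pr_err("ep0 enqueue of setup response failed\n");
            req->status = 0;
        }
    }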
2015-05-13  usb: host: oxu210hp: use new USB_RESUME_TIMEOUT  (Felipe Balbi)
commit 84c0d178eb9f3a3ae4d63dc97a440266cf17f7f5 upstream. Make sure we're using the new macro, so our resume signaling will always pass certification. Signed-off-by: Felipe Balbi <balbi@ti.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-05-13  3w-sas: fix command completion race  (Christoph Hellwig)
commit 579d69bc1fd56d5af5761969aa529d1d1c188300 upstream. The 3w-sas driver needs to tear down the dma mappings before returning the command to the midlayer, as there is no guarantee the sglist and count are valid after that point. Also remove the dma mapping helpers which have another inherent race due to the request_id index. Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Torsten Luettgert <ml-lkml@enda.eu> Tested-by: Bernd Kardatzki <Bernd.Kardatzki@med.uni-tuebingen.de> Acked-by: Adam Radford <aradford@gmail.com> Signed-off-by: James Bottomley <JBottomley@Odin.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
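The same ordering fix applies to all three 3ware drivers below; a minimal sketch of the completion path using the generic SCSI DMA helper (the drivers' own unmap wrappers differ slightly):

    /* In the interrupt/completion path, for a normal command: */
    scsi_dma_unmap(cmd);           /* tear down DMA mappings first      */
    cmd->result = (DID_OK << 16);  /* or the translated error status    */
    cmd->scsi_done(cmd);           /* only now hand the command back up */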
2015-05-13  3w-9xxx: fix command completion race  (Christoph Hellwig)
commit 118c855b5623f3e2e6204f02623d88c09e0c34de upstream. The 3w-9xxx driver needs to tear down the dma mappings before returning the command to the midlayer, as there is no guarantee the sglist and count are valid after that point. Also remove the dma mapping helpers which have another inherent race due to the request_id index. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Adam Radford <aradford@gmail.com> Signed-off-by: James Bottomley <JBottomley@Odin.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-05-13  3w-xxxx: fix command completion race  (Christoph Hellwig)
commit 9cd9554615cba14f0877cc9972a6537ad2bdde61 upstream. The 3w-xxxx driver needs to tear down the dma mappings before returning the command to the midlayer, as there is no guarantee the sglist and count are valid after that point. Also remove the dma mapping helpers which have another inherent race due to the request_id index. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Adam Radford <aradford@gmail.com> Signed-off-by: James Bottomley <JBottomley@Odin.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-05-13  rbd: end I/O the entire obj_request on error  (Ilya Dryomov)
commit 082a75dad84d79d1c15ea9e50f31cb4bb4fa7fd6 upstream. When we end I/O struct request with error, we need to pass obj_request->length as @nr_bytes so that the entire obj_request worth of bytes is completed. Otherwise block layer ends up confused and we trip on rbd_assert(more ^ (which == img_request->obj_request_count)); in rbd_img_obj_callback() due to more being true no matter what. We already do it in most cases but we are missing some, in particular those where we don't even get a chance to submit any obj_requests, due to an early -ENOMEM for example. A number of obj_request->xferred assignments seem to be redundant but I haven't touched any of obj_request->xferred stuff to keep this small and isolated. Cc: Alex Elder <elder@linaro.org> Reported-by: Shawn Edwards <lesser.evil@gmail.com> Reviewed-by: Sage Weil <sage@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-05-13  serial: of-serial: Remove device_type = "serial" registration  (Michal Simek)
commit 6befa9d883385c580369a2cc9e53fbf329771f6d upstream. Do not let of_serial.c probe every serial device that carries the device_type = "serial"; property. Only devices with valid compatible strings listed in the driver should be probed; when PORT_UNKNOWN is set up, the probe will fail anyway. Arnd's note on the driver's history: "when I wrote that driver initially, the idea was that it would get used as a stub to hook up all other serial drivers but after that, the common code learned to create platform devices from DT". This patch fixes the problem on systems with both xilinx_uartps and a 16550a, where of_serial failed to register for xilinx_uartps and, via irq_dispose_mapping(), removed the irq_desc; when xilinx_uartps later asked for the irq with request_irq(), EINVAL was returned. Signed-off-by: Michal Simek <michal.simek@xilinx.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-05-12  Merge branch 'v3.10/topic/gator' into linux-linaro-lsk-v3.10  (Kevin Hilman)
* v3.10/topic/gator:
  gator: Enable multiple source copies to exist in Android build environments
  gator: Add config for building the module in-tree
  gator: Version 5.21.1
2015-05-12  Merge branch 'linux-linaro-lsk-v3.10' into linux-linaro-lsk-v3.10-rt  (Alex Shi)
Conflicts: kernel/softirq.c
2015-05-12  Merge remote-tracking branch 'origin/v3.10/topic/zram' into linux-linaro-lsk  (Alex Shi)
Conflicts:
    mm/Kconfig
    mm/Makefile
2015-05-12  Merge tag 'v3.10.77' into linux-linaro-lsk  (Alex Shi)
This is the 3.10.77 stable release
Conflicts:
    drivers/video/console/Kconfig
    scripts/kconfig/menu.c
2015-05-11  zram: fix umount-reset_store-mount race condition  (Sergey Senozhatsky)
Ganesh Mahendran was the first to propose using bdev->bd_mutex to avoid the ->bd_holders race condition:
    CPU0                              CPU1
    umount
    /* zram->init_done is true */
    reset_store()
    bdev->bd_holders == 0             mount
    ...                               zram_make_request()
    zram_reset_device()
However, his solution required a considerable amount of code movement, which we can avoid. Apart from using bdev->bd_mutex in reset_store(), this patch also simplifies zram_reset_device(). zram_reset_device() has a bool parameter reset_capacity which tells it whether the disk capacity and the disk itself should be reset. There are two zram_reset_device() callers:
    -- zram_exit() passes reset_capacity=false
    -- reset_store() passes reset_capacity=true
So we can move the reset_capacity-sensitive work out of zram_reset_device() and perform it unconditionally in reset_store(). This also lets us drop the reset_capacity parameter from zram_reset_device() and pass only the zram pointer. Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Reported-by: Ganesh Mahendran <opensource.ganesh@gmail.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit ba6b17d68c8e3aa8d55d0474299cb931965c5ea5) Signed-off-by: Alex Shi <alex.shi@linaro.org>
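A minimal sketch of the serialized reset path, assuming the 3.10 zram helpers named in the commit text (dev_to_zram(), zram_reset_device()); parsing of the written value is omitted and this is not the exact upstream diff:

    static ssize_t reset_store(struct device *dev, struct device_attribute *attr,
                               const char *buf, size_t len)
    {
        struct zram *zram = dev_to_zram(dev);
        struct block_device *bdev = bdget_disk(zram->disk, 0);
        ssize_t ret;

        if (!bdev)
            return -ENOMEM;

        /* Serialize against open/mount so ->bd_holders cannot race with us. */
        mutex_lock(&bdev->bd_mutex);
        if (bdev->bd_holders) {
            ret = -EBUSY;
            goto out;
        }

        fsync_bdev(bdev);
        zram_reset_device(zram);
        set_capacity(zram->disk, 0);  /* capacity work now done here, unconditionally */
        ret = len;
    out:
        mutex_unlock(&bdev->bd_mutex);
        bdput(bdev);
        return ret;
    }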
2015-05-11  zram: free meta table in zram_meta_free  (Ganesh Mahendran)
zram_meta_alloc() and zram_meta_free() are a pair. The meta table is allocated in zram_meta_alloc(), so it is better to free it in zram_meta_free(). Signed-off-by: Ganesh Mahendran <opensource.ganesh@gmail.com> Acked-by: Minchan Kim <minchan@kernel.org> Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Nitin Gupta <ngupta@vflare.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit 1fec117281d9f5349c35279c9521f4096fa33357) Signed-off-by: Alex Shi <alex.shi@linaro.org>
2015-05-11  mm/zram: correct ZRAM_ZERO flag bit position  (Mahendran Ganesh)
In struct zram_table_entry, the element *value* contains obj size and obj zram flags. Bit 0 to bit (ZRAM_FLAG_SHIFT - 1) represent obj size, and bit ZRAM_FLAG_SHIFT to the highest bit of unsigned long represent obj zram_flags. So the first zram flag(ZRAM_ZERO) should be from ZRAM_FLAG_SHIFT instead of (ZRAM_FLAG_SHIFT + 1). This patch fixes this cosmetic issue. Also fix a typo, "page in now accessed" -> "page is now accessed" Signed-off-by: Mahendran Ganesh <opensource.ganesh@gmail.com> Acked-by: Minchan Kim <minchan@kernel.org> Acked-by: Weijie Yang <weijie.yang@samsung.com> Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Nitin Gupta <ngupta@vflare.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit d49b1c254c997195872a9e8913660a788298921e) Signed-off-by: Alex Shi <alex.shi@linaro.org>
2015-05-11  zram: avoid kunmap_atomic() of a NULL pointer  (Weijie Yang)
zram could kunmap_atomic() a NULL pointer in a rare situation: a zram page becomes a full-zeroed page after a partial write io. The current code doesn't handle this case and performs kunmap_atomic() on a NULL pointer, which panics the kernel. This patch fixes this issue. Signed-off-by: Weijie Yang <weijie.yang@samsung.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Weijie Yang <weijie.yang.kh@gmail.com> Acked-by: Jerome Marchand <jmarchan@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit c406515239376fc93a30d5d03192182160cbd3fb) Signed-off-by: Alex Shi <alex.shi@linaro.org>
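A minimal sketch of the guard in the partial-write path, assuming the 3.10 zram_bvec_write() locals user_mem/uncmem (illustrative, not the literal upstream hunk):

    user_mem = kmap_atomic(page);
    if (is_partial_io(bvec)) {
        memcpy(uncmem + offset, user_mem + bvec->bv_offset, bvec->bv_len);
        kunmap_atomic(user_mem);
        user_mem = NULL;           /* already copied into the bounce buffer */
    } else {
        uncmem = user_mem;
    }

    if (page_zero_filled(uncmem)) {
        if (user_mem)              /* added check: may be NULL after a partial write */
            kunmap_atomic(user_mem);
        /* ... mark the page ZRAM_ZERO and return ... */
    }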
2015-05-11  zram: avoid NULL pointer access in concurrent situation  (Weijie Yang)
There is a rare NULL pointer bug in mem_used_total_show() and mem_used_max_store() in a concurrent situation like this: zram is not initialized, process A is a mem_used_total reader which runs periodically, while process B tries to init zram.
    process A                                  process B
    access meta, get a NULL value
                                               init zram, done
    init_done() is true
    access meta->mem_pool, get a NULL pointer
    BUG
This patch fixes this issue. Signed-off-by: Weijie Yang <weijie.yang@samsung.com> Acked-by: Minchan Kim <minchan@kernel.org> Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit 5a99e95b8d1cd47f6feddcdca6c71d22060df8a2) Signed-off-by: Alex Shi <alex.shi@linaro.org>
2015-05-11  zram: use notify_free to account all free notifications  (Sergey Senozhatsky)
`notify_free' device attribute accounts the number of slot free notifications and internally represents the number of zram_free_page() calls. Slot free notifications are sent only when device is used as a swap device, hence `notify_free' is used only for swap devices. Since f4659d8e620d08 (zram: support REQ_DISCARD) ZRAM handles yet another one free notification (also via zram_free_page() call) -- REQ_DISCARD requests, which are sent by a filesystem, whenever some data blocks are discarded. However, there is no way to know the number of notifications in the latter case. Use `notify_free' to account the number of pages freed by zram_bio_discard() and zram_slot_free_notify(). Depending on usage scenario `notify_free' represents: a) the number of pages freed because of slot free notifications, which is equal to the number of swap_slot_free_notify() calls, so there is no behaviour change b) the number of pages freed because of REQ_DISCARD notifications Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Acked-by: Minchan Kim <minchan@kernel.org> Acked-by: Jerome Marchand <jmarchan@redhat.com> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit 015254daf1753003c19c46b90ee85a963260d270) Signed-off-by: Alex Shi <alex.shi@linaro.org> Conflicts: Documentation/ABI/testing/sysfs-block-zram
2015-05-11  zram: report maximum used memory  (Minchan Kim)
Normally, a zram user could get the maximum memory usage zram consumed by polling mem_used_total via sysfs from userspace. This has a critical problem, though: the user can miss the peak memory usage between polls. To avoid that, the user would have to poll with a very short interval (ie, 0.0000000001s), with mlocking to avoid page-fault delay when memory pressure is heavy, which would be troublesome. This patch adds a new knob "mem_used_max" so the user can see the maximum memory usage easily by reading the knob, and reset it via "echo 0 > /sys/block/zram0/mem_used_max". Signed-off-by: Minchan Kim <minchan@kernel.org> Reviewed-by: Dan Streetman <ddstreet@ieee.org> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Jerome Marchand <jmarchan@redhat.com> Cc: <juno.choi@lge.com> Cc: <seungho1.park@lge.com> Cc: Luigi Semenzato <semenzato@google.com> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Seth Jennings <sjennings@variantweb.net> Reviewed-by: David Horner <ds2horner@gmail.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit 461a8eee6af3b55745be64bea403ed0b743563cf) Signed-off-by: Alex Shi <alex.shi@linaro.org>
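Internally the peak can be tracked lock-free with a compare-and-swap loop each time the allocator footprint grows; a sketch of that idea (the stats field name is assumed):

    static void update_used_max(struct zram *zram, const unsigned long pages)
    {
        unsigned long old_max, cur_max;

        old_max = atomic_long_read(&zram->stats.max_used_pages);
        do {
            cur_max = old_max;
            if (pages <= cur_max)
                return;  /* no new peak */
            old_max = atomic_long_cmpxchg(&zram->stats.max_used_pages,
                                          cur_max, pages);
        } while (old_max != cur_max);  /* lost a race; retry with the new value */
    }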
2015-05-11  zram: zram memory size limitation  (Minchan Kim)
Since zram has no control feature to limit memory usage, it is hard to manage system memory. This patch adds a new knob "mem_limit" via sysfs to set up a limit so that zram fails further allocations once it reaches the limit. In addition, the user can change the limit at runtime to manage the memory more dynamically. The initial state is no limit, so it doesn't break old behavior. [akpm@linux-foundation.org: fix typo, per Sergey] Signed-off-by: Minchan Kim <minchan@kernel.org> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Jerome Marchand <jmarchan@redhat.com> Cc: <juno.choi@lge.com> Cc: <seungho1.park@lge.com> Cc: Luigi Semenzato <semenzato@google.com> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Seth Jennings <sjennings@variantweb.net> Cc: David Horner <ds2horner@gmail.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit 9ada9da9573f3460b156b7755c093e30b258eacb) Signed-off-by: Alex Shi <alex.shi@linaro.org>
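A minimal sketch of the enforcement point in the write path, assuming the limit is kept in pages in a zram->limit_pages field (illustrative only):

    /* After allocating `handle` from the zsmalloc pool for this page: */
    alloced_pages = zs_get_total_pages(meta->mem_pool);
    if (zram->limit_pages && alloced_pages > zram->limit_pages) {
        zs_free(meta->mem_pool, handle);
        ret = -ENOMEM;  /* over the configured mem_limit */
        goto out;
    }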
2015-05-11  zsmalloc: change return value unit of zs_get_total_size_bytes  (Minchan Kim)
zs_get_total_size_bytes returns the amount of memory zsmalloc has consumed in *bytes*, but zsmalloc operates in *pages* rather than bytes, so change the API; the benefit is that we avoid the unnecessary overhead of converting pages to bytes inside zsmalloc. Since the return value is now in pages, "zs_get_total_pages" is a better name than "zs_get_total_size_bytes". Signed-off-by: Minchan Kim <minchan@kernel.org> Reviewed-by: Dan Streetman <ddstreet@ieee.org> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Jerome Marchand <jmarchan@redhat.com> Cc: <juno.choi@lge.com> Cc: <seungho1.park@lge.com> Cc: Luigi Semenzato <semenzato@google.com> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Seth Jennings <sjennings@variantweb.net> Cc: David Horner <ds2horner@gmail.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit 722cdc17232f0f684011407f7cf3c40d39457971) Signed-off-by: Alex Shi <alex.shi@linaro.org>
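Callers that still want a byte count convert at the call site; a minimal before/after sketch for a zram user of the API:

    /* old API: bytes straight from zsmalloc */
    u64 mem_used = zs_get_total_size_bytes(meta->mem_pool);

    /* new API: pages from zsmalloc, caller converts to bytes if needed */
    u64 mem_used = zs_get_total_pages(meta->mem_pool) << PAGE_SHIFT;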
2015-05-11  zram: fix incorrect stat with failed_reads  (Chao Yu)
Since we allocate a temporary buffer in zram_bvec_read to handle partial page operations in commit 924bd88d703e ("Staging: zram: allow partial page operations"), our ->failed_reads value may be incorrect as we do not increase its value when failing to allocate the temporary buffer. Let's fix this issue and correct the annotation of failed_reads. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Acked-by: Jerome Marchand <jmarchan@redhat.com> Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit 0cf1e9d6c34d4c82ac3af8015594849814843d36) Signed-off-by: Alex Shi <alex.shi@linaro.org>
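A minimal sketch of the missing accounting in zram_bvec_read()'s partial-read path (the stats field name and type are assumed):

    if (is_partial_io(bvec)) {
        /* bounce buffer for partial-page reads */
        uncmem = kmalloc(PAGE_SIZE, GFP_NOIO);
        if (!uncmem) {
            pr_info("Unable to allocate temp memory\n");
            atomic64_inc(&zram->stats.failed_reads);  /* added: count this failure too */
            return -ENOMEM;
        }
    }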
2015-05-11  zram: replace global tb_lock with fine grain lock  (Weijie Yang)
Currently, we use a rwlock tb_lock to protect concurrent access to the whole zram meta table. However, according to the actual access model, there is only a small chance for an upper user to access the same table[index], so the current lock granularity is too big. The idea of the optimization is to change the lock granularity from the whole meta table to a per-table-entry lock (table -> table[index]), so that we protect concurrent access to the same table[index] while allowing maximum concurrency. With this in mind, several kinds of locks that could be used as a per-entry lock were tested and compared.
Test environment: x86-64 Intel Core2 Q8400, system memory 4GB, Ubuntu 12.04, kernel v3.15.0-rc3 as base, zram with 4 max_comp_streams LZO.
iozone test: iozone -t 4 -R -r 16K -s 200M -I +Z (1GB zram with ext4 filesystem, average of 10 tests, KB/s)
    Test            base       CAS        spinlock   rwlock     bit_spinlock
    -------------------------------------------------------------------
    Initial write   1381094    1425435    1422860    1423075    1421521
    Rewrite         1529479    1641199    1668762    1672855    1654910
    Read            8468009    11324979   11305569   11117273   10997202
    Re-read         8467476    11260914   11248059   11145336   10906486
    Reverse Read    6821393    8106334    8282174    8279195    8109186
    Stride read     7191093    8994306    9153982    8961224    9004434
    Random read     7156353    8957932    9167098    8980465    8940476
    Mixed workload  4172747    5680814    5927825    5489578    5972253
    Random write    1483044    1605588    1594329    1600453    1596010
    Pwrite          1276644    1303108    1311612    1314228    1300960
    Pread           4324337    4632869    4618386    4457870    4500166
To increase the chance of accessing the same table[index] concurrently, zram was given a small disksize (10MB) and the threads run with a large loop count.
fio test: fio --bs=32k --randrepeat=1 --randseed=100 --refill_buffers --scramble_buffers=1 --direct=1 --loops=3000 --numjobs=4 --filename=/dev/zram0 --name=seq-write --rw=write --stonewall --name=seq-read --rw=read --stonewall --name=seq-readwrite --rw=rw --stonewall --name=rand-readwrite --rw=randrw --stonewall (10MB zram raw block device, average of 10 tests, KB/s)
    Test        base       CAS        spinlock   rwlock     bit_spinlock
    -------------------------------------------------------------
    seq-write   933789     999357     1003298    995961     1001958
    seq-read    5634130    6577930    6380861    6243912    6230006
    seq-rw      1405687    1638117    1640256    1633903    1634459
    rand-rw     1386119    1614664    1617211    1609267    1612471
All the optimization methods show higher performance than the base; however, it is hard to say which method is the most appropriate. On the other hand, zram is mostly used on small embedded systems, so we don't want to increase any memory footprint. This patch picks the bit_spinlock method and packs the object size and page flags into a single unsigned long table.value, so as not to add any memory overhead on either 32-bit or 64-bit systems. Even though the different kinds of locks have different performance, the difference can be ignored: if zram is used as a swap device, the swap subsystem prevents concurrent access to the same swap slot; if zram is used as zram-blk with a filesystem on top, the upper filesystem and the page cache mostly prevent concurrent access to the same block. So we can ignore the performance differences among the locks.
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Reviewed-by: Davidlohr Bueso <davidlohr@hp.com> Signed-off-by: Weijie Yang <weijie.yang@samsung.com> Signed-off-by: Minchan Kim <minchan@kernel.org> Cc: Jerome Marchand <jmarchan@redhat.com> Cc: Nitin Gupta <ngupta@vflare.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit d2d5e762c8990c4031890e03565983a05febd64a) Signed-off-by: Alex Shi <alex.shi@linaro.org> Conflicts: drivers/block/zram/zram_drv.c Conflicts solution: using old bio struct
2015-05-11  zram: use size_t instead of u16  (Minchan Kim)
Some architectures (eg, hexagon and PowerPC) could use PAGE_SHIFT of 16 or more. In these cases u16 is not sufficiently large to represent a compressed page's size so use size_t. Signed-off-by: Minchan Kim <minchan@kernel.org> Reported-by: Weijie Yang <weijie.yang@samsung.com> Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Jerome Marchand <jmarchan@redhat.com> Cc: Nitin Gupta <ngupta@vflare.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit 023b409f9dac4cdea3322009f2e592068558690c) Signed-off-by: Alex Shi <alex.shi@linaro.org>
2015-05-11  zram: remove unused SECTOR_SIZE define  (Sergey Senozhatsky)
Drop SECTOR_SIZE define, because it's not used. Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Weijie Yang <weijie.yang@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit a830eff749eb2bf906783f6bf74a74dad3de3aea) Signed-off-by: Alex Shi <alex.shi@linaro.org>
2015-05-11  zram: rename struct `table' to `zram_table_entry'  (Sergey Senozhatsky)
Andrew Morton has recently noted that `struct table' actually represents table entry and, thus, should be renamed. Rename to `zram_table_entry'. Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Weijie Yang <weijie.yang@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit cb8f2eec3c5c87e31219c5e58625b8e890004e48) Signed-off-by: Alex Shi <alex.shi@linaro.org>
2015-05-11  zram: avoid lockdep splat by revalidate_disk  (Minchan Kim)
Sasha reported lockdep warning [1] introduced by [2]. It could be fixed by doing disk revalidation out of the init_lock. It's okay because disk capacity change is protected by init_lock so that revalidate_disk always sees up-to-date value so there is no race. [1] https://lkml.org/lkml/2014/7/3/735 [2] zram: revalidate disk after capacity change Fixes 2e32baea46ce ("zram: revalidate disk after capacity change"). Signed-off-by: Minchan Kim <minchan@kernel.org> Reported-by: Sasha Levin <sasha.levin@oracle.com> Cc: "Alexander E. Patrakov" <patrakov@gmail.com> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Jerome Marchand <jmarchan@redhat.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> CC: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit b4c5c60920e3b0c4598f43e7317559f6aec51531) Signed-off-by: Alex Shi <alex.shi@linaro.org>
2015-05-11  zram: revalidate disk after capacity change  (Minchan Kim)
Alexander reported that mkswap on /dev/zram0 fails if another process has the block device file open. The steps are as follows:
    0. Reset the unused zram device.
    1. Use a program that opens /dev/zram0 with O_RDWR and sleeps until killed.
    2. While that program sleeps, echo the correct value to /sys/block/zram0/disksize.
    3. Verify (e.g. in /proc/partitions) that the disk size is applied correctly. It is.
    4. While that program still sleeps, attempt to mkswap /dev/zram0. This fails:
       mkswap: error: swap area needs to be at least 40 KiB
When I investigated, the size obtained by mkswap via ioctl(fd, BLKGETSIZE64, xxx) was zero, even though zram0 had the right size after step 2. The reason is that zram didn't revalidate the disk after changing its capacity, so the size recorded in the block device's inode is not up to date until every opener closes the file. This patch should fix the BUG. Signed-off-by: Minchan Kim <minchan@kernel.org> Reported-by: Alexander E. Patrakov <patrakov@gmail.com> Tested-by: Alexander E. Patrakov <patrakov@gmail.com> Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Nitin Gupta <ngupta@vflare.org> Acked-by: Jerome Marchand <jmarchan@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit 2e32baea46ce542c561a519414c840295b229c8f) Signed-off-by: Alex Shi <alex.shi@linaro.org>
2015-05-11  zram: correct offset usage in zram_bio_discard  (Weijie Yang)
We want to skip a physical block (PAGE_SIZE) that is only partially covered by the discard bio, so we check the remaining size and subtract it if we need to move on to the next physical block. The current offset usage in zram_bio_discard is incorrect and can break the filesystem on top of it. Consider the following scenario: on some architecture or config PAGE_SIZE is 64K, a filesystem is set up on the zram disk without PAGE_SIZE alignment, and a discard bio arrives with offset = 4K and size = 72K. Normally it should not discard any physical block, since it only partially covers two physical blocks. However, with the current offset usage it discards the second physical block and frees its memory, which breaks the filesystem. This patch corrects the offset usage in zram_bio_discard (see the sketch below). Signed-off-by: Weijie Yang <weijie.yang@samsung.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Bob Liu <bob.liu@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit 38515c73398a4c58059ecf1087e844561b58ee0f) Signed-off-by: Alex Shi <alex.shi@linaro.org>
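A minimal sketch of the corrected skip logic, using the pre-3.14 bio fields as noted in the backport conflicts (per-entry locking and the notify_free statistics are omitted; not the literal upstream hunk):

    static void zram_bio_discard(struct zram *zram, u32 index,
                                 int offset, struct bio *bio)
    {
        size_t n = bio->bi_size;

        /*
         * A physical page is only freed when the bio covers all of it,
         * so first skip the partially covered leading page, if any.
         */
        if (offset) {
            if (n <= (PAGE_SIZE - offset))
                return;
            n -= (PAGE_SIZE - offset);
            index++;
        }

        while (n >= PAGE_SIZE) {
            zram_free_page(zram, index);
            index++;
            n -= PAGE_SIZE;
        }
    }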
2015-05-11  zram: support REQ_DISCARD  (Joonsoo Kim)
zram is a RAM-based block device and can be used as the backend of a filesystem. When a filesystem deletes a file, it normally doesn't touch the file's data blocks; it just marks the file's metadata. This behavior is no problem on a disk-based block device, but it is a problem on a RAM-based block device, since we can't free the memory used for the data blocks. To overcome this disadvantage there is the REQ_DISCARD functionality: if the block device supports REQ_DISCARD and the filesystem is mounted with the discard option, the filesystem sends REQ_DISCARD to the block device whenever some data blocks are discarded. All we have to do is handle this request. This patch sets QUEUE_FLAG_DISCARD and handles REQ_DISCARD requests; with it, we can free memory used by zram once it is no longer needed. [akpm@linux-foundation.org: tweak comments] Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Jerome Marchand <jmarchan@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit f4659d8e620d08bd1a84a8aec5d2f5294a242764) Signed-off-by: Alex Shi <alex.shi@linaro.org> Conflicts: drivers/block/zram/zram_drv.c Conflicts solution: keep using the old bio struct and bio_for_each_segment()
2015-05-11  zram: use scnprintf() in attrs show() methods  (Sergey Senozhatsky)
sysfs.txt documentation lists the following requirements: - The buffer will always be PAGE_SIZE bytes in length. On i386, this is 4096. - show() methods should return the number of bytes printed into the buffer. This is the return value of scnprintf(). - show() should always use scnprintf(). Use scnprintf() in show() functions. Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Jerome Marchand <jmarchan@redhat.com> Cc: Nitin Gupta <ngupta@vflare.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit 56b4e8cb85827a2ccc4752a2a7148e56b62b7e96) Signed-off-by: Alex Shi <alex.shi@linaro.org>
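A minimal example of the resulting pattern for one of zram's simpler attributes (the exact attribute body may differ from the driver's):

    static ssize_t disksize_show(struct device *dev,
                                 struct device_attribute *attr, char *buf)
    {
        struct zram *zram = dev_to_zram(dev);

        /* buf is PAGE_SIZE bytes; return the number of bytes written */
        return scnprintf(buf, PAGE_SIZE, "%llu\n", zram->disksize);
    }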
2015-05-11  zram: propagate error to user  (Minchan Kim)
When zcomp is initialized as a single-stream backend, max_comp_streams can't be changed without a zram reset, but the current interface doesn't report any error to the user; it even changes the max_comp_streams value without any effect, which is very confusing. This patch prevents changing max_comp_streams when zcomp was initialized as a single-stream zcomp and reports the error to the user (e.g. to echo). [akpm@linux-foundation.org: don't return with the lock held, per Sergey] [fengguang.wu@intel.com: fix coccinelle warnings] Signed-off-by: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Jerome Marchand <jmarchan@redhat.com> Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit 60a726e33375a1096e85399cfa1327081b4c38be) Signed-off-by: Alex Shi <alex.shi@linaro.org>
2015-05-11  zram: return error-valued pointer from zcomp_create()  (Sergey Senozhatsky)
Instead of returning just NULL, return ERR_PTR from zcomp_create() if compressing backend creation has failed. ERR_PTR(-EINVAL) for unsupported compression algorithm request, ERR_PTR(-ENOMEM) for allocation (zcomp or compression stream) error. Perform IS_ERR() check of returned from zcomp_create() value in disksize_store() and set return code to PTR_ERR(). Change suggested by Jerome Marchand. [akpm@linux-foundation.org: clean up error recovery flow] Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Reported-by: Jerome Marchand <jmarchan@redhat.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit fcfa8d95cacf5cbbe6dee6b8d229fe86142266e0) Signed-off-by: Alex Shi <alex.shi@linaro.org>
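A minimal sketch of the caller side in disksize_store() after the change (local variable names assumed):

    comp = zcomp_create(zram->compressor, zram->max_comp_streams);
    if (IS_ERR(comp)) {
        pr_info("Cannot initialise %s compressing backend\n",
                zram->compressor);
        err = PTR_ERR(comp);  /* -EINVAL or -ENOMEM from zcomp_create() */
        goto out_free_meta;
    }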
2015-05-11  zram: move comp allocation out of init_lock  (Sergey Senozhatsky)
While fixing lockdep spew of ->init_lock reported by Sasha Levin [1], Minchan Kim noted [2] that it's better to move compression backend allocation (using GPF_KERNEL) out of the ->init_lock lock, same way as with zram_meta_alloc(), in order to prevent the same lockdep spew. [1] https://lkml.org/lkml/2014/2/27/337 [2] https://lkml.org/lkml/2014/3/3/32 Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Reported-by: Minchan Kim <minchan@kernel.org> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Sasha Levin <sasha.levin@oracle.com> Acked-by: Jerome Marchand <jmarchan@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit d61f98c70e8b0d324e8e83be2ed546d6295e63f3) Signed-off-by: Alex Shi <alex.shi@linaro.org>
2015-05-11  zram: add lz4 algorithm backend  (Sergey Senozhatsky)
Introduce LZ4 compression backend and make it available for selection. LZ4 support is optional and requires user to set ZRAM_LZ4_COMPRESS config option. The default compression backend is LZO. TEST (x86_64, core i5, 2 cores + 2 hyperthreading, zram disk size 1G, ext4 file system, 3 compression streams) iozone -t 3 -R -r 16K -s 60M -I +Z Test LZO LZ4 ---------------------------------------------- Initial write 1642744.62 1317005.09 Rewrite 2498980.88 1800645.16 Read 3957026.38 5877043.75 Re-read 3950997.38 5861847.00 Reverse Read 2937114.56 5047384.00 Stride read 2948163.19 4929587.38 Random read 3292692.69 4880793.62 Mixed workload 1545602.62 3502940.38 Random write 2448039.75 1758786.25 Pwrite 1670051.03 1338329.69 Pread 2530682.00 5097177.62 Fwrite 3232085.62 3275942.56 Fread 6306880.25 6645271.12 So on my system LZ4 is slower in write-only tests, while it performs better in read-only and mixed (reads + writes) tests. Official LZ4 benchmarks available here http://code.google.com/p/lz4/ (linux kernel uses revision r90). Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Jerome Marchand <jmarchan@redhat.com> Cc: Nitin Gupta <ngupta@vflare.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit 6e76668e415adf799839f0ab205142ad7002d260) Signed-off-by: Alex Shi <alex.shi@linaro.org>
2015-05-11  zram: make compression algorithm selection possible  (Sergey Senozhatsky)
Add and document `comp_algorithm' device attribute. This attribute allows to show supported compression and currently selected compression algorithms: cat /sys/block/zram0/comp_algorithm [lzo] lz4 and change selected compression algorithm: echo lzo > /sys/block/zram0/comp_algorithm Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Jerome Marchand <jmarchan@redhat.com> Cc: Nitin Gupta <ngupta@vflare.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit e46b8a030d76d3c94156c545c3f4c3676d813435) Signed-off-by: Alex Shi <alex.shi@linaro.org>
2015-05-11  zram: add set_max_streams knob  (Sergey Senozhatsky)
This patch allows changing max_comp_streams on an initialised zcomp. Introduce a zcomp set_max_streams() knob, with zcomp_strm_multi_set_max_streams() and zcomp_strm_single_set_max_streams() callbacks to change the stream limit for zcomp_strm_multi and zcomp_strm_single, respectively. set_max_streams for a single-stream zcomp does nothing. If the user has lowered the limit, then zcomp_strm_multi_set_max_streams() attempts to immediately free extra streams (as many as it can, depending on idle stream availability). Note, this patch does not allow changing the stream 'policy' from single to multi stream (or vice versa) on an already initialised compression backend. Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Jerome Marchand <jmarchan@redhat.com> Cc: Nitin Gupta <ngupta@vflare.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit fe8eb122c82b2049c460fc6df6e8583a2f935cff) Signed-off-by: Alex Shi <alex.shi@linaro.org>
2015-05-11  zram: add multi stream functionality  (Sergey Senozhatsky)
Existing zram (zcomp) implementation has only one compression stream (buffer and algorithm private part), so in order to prevent data corruption only one write (compress operation) can use this compression stream, forcing all concurrent write operations to wait for stream lock to be released. This patch changes zcomp to keep a compression streams list of user-defined size (via sysfs device attr). Each write operation still exclusively holds compression stream, the difference is that we can have N write operations (depending on size of streams list) executing in parallel. See TEST section later in commit message for performance data. Introduce struct zcomp_strm_multi and a set of functions to manage zcomp_strm stream access. zcomp_strm_multi has a list of idle zcomp_strm structs, spinlock to protect idle list and wait queue, making it possible to perform parallel compressions. The following set of functions added: - zcomp_strm_multi_find()/zcomp_strm_multi_release() find and release a compression stream, implement required locking - zcomp_strm_multi_create()/zcomp_strm_multi_destroy() create and destroy zcomp_strm_multi zcomp ->strm_find() and ->strm_release() callbacks are set during initialisation to zcomp_strm_multi_find()/zcomp_strm_multi_release() correspondingly. Each time zcomp issues a zcomp_strm_multi_find() call, the following set of operations performed: - spin lock strm_lock - if idle list is not empty, remove zcomp_strm from idle list, spin unlock and return zcomp stream pointer to caller - if idle list is empty, current adds itself to wait queue. it will be awaken by zcomp_strm_multi_release() caller. zcomp_strm_multi_release(): - spin lock strm_lock - add zcomp stream to idle list - spin unlock, wake up sleeper Minchan Kim reported that spinlock-based locking scheme has demonstrated a severe perfomance regression for single compression stream case, comparing to mutex-based (see https://lkml.org/lkml/2014/2/18/16) base spinlock mutex ==Initial write ==Initial write ==Initial write records: 5 records: 5 records: 5 avg: 1642424.35 avg: 699610.40 avg: 1655583.71 std: 39890.95(2.43%) std: 232014.19(33.16%) std: 52293.96 max: 1690170.94 max: 1163473.45 max: 1697164.75 min: 1568669.52 min: 573429.88 min: 1553410.23 ==Rewrite ==Rewrite ==Rewrite records: 5 records: 5 records: 5 avg: 1611775.39 avg: 501406.64 avg: 1684419.11 std: 17144.58(1.06%) std: 15354.41(3.06%) std: 18367.42 max: 1641800.95 max: 531356.78 max: 1706445.84 min: 1593515.27 min: 488817.78 min: 1655335.73 When only one compression stream available, mutex with spin on owner tends to perform much better than frequent wait_event()/wake_up(). This is why single stream implemented as a special case with mutex locking. Introduce and document zram device attribute max_comp_streams. This attr shows and stores current zcomp's max number of zcomp streams (max_strm). Extend zcomp's zcomp_create() with `max_strm' parameter. `max_strm' limits the number of zcomp_strm structs in compression backend's idle list (max_comp_streams). max_comp_streams used during initialisation as follows: -- passing to zcomp_create() max_strm equals to 1 will initialise zcomp using single compression stream zcomp_strm_single (mutex-based locking). -- passing to zcomp_create() max_strm greater than 1 will initialise zcomp using multi compression stream zcomp_strm_multi (spinlock-based locking). default max_comp_streams value is 1, meaning that zram with single stream will be initialised. 
Later patch will introduce configuration knob to change max_comp_streams on already initialised and used zcomp. TEST iozone -t 3 -R -r 16K -s 60M -I +Z test base 1 strm (mutex) 3 strm (spinlock) ----------------------------------------------------------------------- Initial write 589286.78 583518.39 718011.05 Rewrite 604837.97 596776.38 1515125.72 Random write 584120.11 595714.58 1388850.25 Pwrite 535731.17 541117.38 739295.27 Fwrite 1418083.88 1478612.72 1484927.06 Usage example: set max_comp_streams to 4 echo 4 > /sys/block/zram0/max_comp_streams show current max_comp_streams (default value is 1). cat /sys/block/zram0/max_comp_streams Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Jerome Marchand <jmarchan@redhat.com> Cc: Nitin Gupta <ngupta@vflare.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit beca3ec71fe5490ee9237dc42400f50402baf83e) Signed-off-by: Alex Shi <alex.shi@linaro.org>