path: root/drivers/md
Age | Commit message | Author
2013-04-08 | bcache: Use WARN_ONCE() instead of __WARN() | Kent Overstreet
Signed-off-by: Kent Overstreet <koverstreet@google.com>
2013-04-08 | bcache: Add missing #include <linux/prefetch.h> | Geert Uytterhoeven
m68k/allmodconfig:

drivers/md/bcache/bset.c: In function ‘bset_search_tree’:
drivers/md/bcache/bset.c:727: error: implicit declaration of function ‘prefetch’
drivers/md/bcache/btree.c: In function ‘bch_btree_node_get’:
drivers/md/bcache/btree.c:933: error: implicit declaration of function ‘prefetch’

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Kent Overstreet <koverstreet@google.com>
2013-04-08 | bcache: Sparse fixes | Kent Overstreet
Signed-off-by: Kent Overstreet <koverstreet@google.com>
2013-04-05 | dm cache: reduce bio front_pad size in writeback mode | Mike Snitzer
A recent patch to fix the dm cache target's writethrough mode extended the bio's front_pad to include a 1056-byte struct dm_bio_details. Writeback mode doesn't need this, so this patch reduces the per_bio_data_size to 16 bytes in this case instead of 1096. The dm_bio_details structure was added in "dm cache: fix writes to cache device in writethrough mode" which fixed commit e2e74d617e ("dm cache: fix race in writethrough implementation"). In writeback mode we avoid allocating the writethrough-specific members of the per_bio_data structure (the dm_bio_details structure included). Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-04-05 | dm cache: fix writes to cache device in writethrough mode | Darrick J. Wong
The dm-cache writethrough strategy introduced by commit e2e74d617eadc15 ("dm cache: fix race in writethrough implementation") issues a bio to the origin device, remaps and then issues the bio to the cache device. This more conservative in-series approach was selected to favor correctness over performance (of the previous parallel writethrough). However, this in-series implementation that reuses the same bio to write both the origin and cache device didn't take into account that the block layer's req_bio_endio() modifies a completing bio's bi_sector and bi_size. So the new writethrough strategy needs to preserve these bio fields, and restore them before submission to the cache device, otherwise nothing gets written to the cache (because bi_size is 0). This patch adds a struct dm_bio_details field to struct per_bio_data, and uses dm_bio_record() and dm_bio_restore() to ensure the bio is restored before reissuing to the cache device. Adding such a large structure to the per_bio_data is not ideal but we can improve this later, for now correctness is the important thing. This problem initially went unnoticed because the dm-cache test-suite uses a linear DM device for the dm-cache device's origin device. Writethrough worked as expected because DM submits a *clone* of the original bio, so the original bio which was reused for the cache was never touched. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-04-02 | Merge branch 'writeback-workqueue' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq into for-3.10/core | Jens Axboe
Tejun writes:

-----

This is the pull request for the earlier patchset[1] with the same name. It's only three patches (the first one was committed to workqueue tree) but the merge strategy is a bit involved due to the dependencies.

* Because the conversion needs features from wq/for-3.10, block/for-3.10/core is based on rc3, and wq/for-3.10 has conflicts with rc3, I pulled mainline (rc5) into wq/for-3.10 to prevent those workqueue conflicts from flaring up in block tree.

* Resolving the issue that Jan and Dave raised about debugging requires arch-wide changes. The patchset is being worked on[2] but it'll have to go through -mm after these changes show up in -next, and is not included in this pull request.

The three commits are located in the following git branch.

  git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git writeback-workqueue

Pulling it into block/for-3.10/core produces a conflict in drivers/md/raid5.c between the following two commits.

  e3620a3ad5 ("MD RAID5: Avoid accessing gendisk or queue structs when not available")
  2f6db2a707 ("raid5: use bio_reset()")

The conflict is trivial - one removes an "if ()" conditional while the other removes "rbi->bi_next = NULL" right above it. We just need to remove both. The merged branch is available at

  git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git block-test-merge

so that you can use it for verification. The test merge commit has proper merge description.

While these changes are a bit of pain to route, they make code simpler and even have, while minute, measurable performance gain[3] even on a workload which isn't particularly favorable to showing the benefits of this conversion.

----

Fixed up the conflict.

Conflicts: drivers/md/raid5.c

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-03-28 | bcache: Don't export utility code, prefix with bch_ | Kent Overstreet
Signed-off-by: Kent Overstreet <koverstreet@google.com> Cc: linux-bcache@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-03-25 | bcache: Fix for the build fixes | Kent Overstreet
Commit 82a84eaf7e51ba3da0c36cbc401034a4e943492d left a return 0 in closure_debug_init(). Whoops. Signed-off-by: Kent Overstreet <koverstreet@google.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-03-25 | bcache: Style/checkpatch fixes | Kent Overstreet
Took out some nested functions, and fixed some more checkpatch complaints. Signed-off-by: Kent Overstreet <koverstreet@google.com> Cc: linux-bcache@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-03-25 | bcache: Build fixes from test robot | Kent Overstreet
config: make ARCH=i386 allmodconfig

All error/warnings:

drivers/md/bcache/bset.c: In function 'bch_ptr_bad':
>> drivers/md/bcache/bset.c:164:2: warning: format '%li' expects argument of type 'long int', but argument 4 has type 'size_t' [-Wformat]
--
drivers/md/bcache/debug.c: In function 'bch_pbtree':
>> drivers/md/bcache/debug.c:86:4: warning: format '%li' expects argument of type 'long int', but argument 4 has type 'size_t' [-Wformat]
--
drivers/md/bcache/btree.c: In function 'bch_btree_read_done':
>> drivers/md/bcache/btree.c:245:8: warning: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'size_t' [-Wformat]
--
drivers/md/bcache/closure.o: In function `closure_debug_init':
>> (.init.text+0x0): multiple definition of `init_module'
>> drivers/md/bcache/super.o:super.c:(.init.text+0x0): first defined here

Signed-off-by: Kent Overstreet <koverstreet@google.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: linux-bcache@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
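The -Wformat warnings above arise because `size_t` is not `long` on every target. A minimal userspace sketch of the portable form follows; `toy_format_size()` is a hypothetical helper written for this illustration and is not taken from the bcache sources, and the sketch makes no claim about which conversion the actual fix used.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: formats a size_t without assuming its width. */
int toy_format_size(char *buf, size_t buflen, size_t value)
{
    assert(buf != NULL && buflen > 0);
    /*
     * The C99 'z' length modifier matches size_t on both 32-bit and
     * 64-bit targets; "%lu" only matches where size_t is unsigned long.
     */
    return snprintf(buf, buflen, "%zu", value);
}
```

With a 32-byte buffer, `toy_format_size(buf, sizeof(buf), 4096)` writes the text `4096` whether `size_t` is 32 or 64 bits wide.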
2013-03-23 | bcache: A block layer cache | Kent Overstreet
Does writethrough and writeback caching, handles unclean shutdown, and has a bunch of other nifty features motivated by real world usage. See the wiki at http://bcache.evilpiepirate.org for more. Signed-off-by: Kent Overstreet <koverstreet@google.com>
2013-03-23 | Merge tag 'md-3.9-fixes' of git://neil.brown.name/md | Linus Torvalds
Pull md fixes from NeilBrown:

"A few bugfixes for md
 - recent regressions in raid5
 - recent regressions in dmraid
 - a few instances of CONFIG_MULTICORE_RAID456 linger

Several tagged for -stable"

* tag 'md-3.9-fixes' of git://neil.brown.name/md:
  md: remove CONFIG_MULTICORE_RAID456 entirely
  md/raid5: ensure sync and DISCARD don't happen at the same time.
  MD: Prevent sysfs operations on uninitialized kobjects
  MD RAID5: Avoid accessing gendisk or queue structs when not available
  md/raid5: schedule_construction should abort if nothing to do.
2013-03-23 | block: Add bio_alloc_pages() | Kent Overstreet
More utility code to replace stuff that's getting open coded. Signed-off-by: Kent Overstreet <koverstreet@google.com> CC: Jens Axboe <axboe@kernel.dk> CC: NeilBrown <neilb@suse.de>
2013-03-23 | block: Convert some code to bio_for_each_segment_all() | Kent Overstreet
More prep work for immutable bvecs: A few places in the code were either open coding or using the wrong version - fix. After we introduce the bvec iter, it'll no longer be possible to modify the biovec through bio_for_each_segment_all() - it doesn't increment a pointer to the current bvec, you pass in a struct bio_vec (not a pointer) which is updated with what the current biovec would be (taking into account bi_bvec_done and bi_size). So because of that it's more worthwhile to be consistent about bio_for_each_segment()/bio_for_each_segment_all() usage. Signed-off-by: Kent Overstreet <koverstreet@google.com> CC: Jens Axboe <axboe@kernel.dk> CC: NeilBrown <neilb@suse.de> CC: Alasdair Kergon <agk@redhat.com> CC: dm-devel@redhat.com CC: Alexander Viro <viro@zeniv.linux.org.uk>
2013-03-23 | block: Add bio_for_each_segment_all() | Kent Overstreet
__bio_for_each_segment() iterates bvecs from the specified index instead of bio->bi_idx. Currently, the only usage is to walk all the bvecs after the bio has been advanced by specifying 0 index. For immutable bvecs, we need to split these apart; bio_for_each_segment() is going to have a different implementation. This will also help document the intent of code that's using it - bio_for_each_segment_all() is only legal to use for code that owns the bio. Signed-off-by: Kent Overstreet <koverstreet@google.com> CC: Jens Axboe <axboe@kernel.dk> CC: Neil Brown <neilb@suse.de> CC: Boaz Harrosh <bharrosh@panasas.com>
2013-03-23 | raid1: use bio_copy_data() | Kent Overstreet
This doesn't really delete any code _yet_, but once immutable bvecs are done we can just delete the rest of the code in that loop. Signed-off-by: Kent Overstreet <koverstreet@google.com> CC: Jens Axboe <axboe@kernel.dk> CC: NeilBrown <neilb@suse.de>
2013-03-23 | raid1: Refactor narrow_write_error() to not use bi_idx | Kent Overstreet
More bi_idx removal. This code was just open coding bio_clone(). This could probably be further improved by using bio_advance() instead of skipping over null pages, but that'd be a larger rework. Signed-off-by: Kent Overstreet <koverstreet@google.com> CC: Jens Axboe <axboe@kernel.dk> CC: NeilBrown <neilb@suse.de>
2013-03-23 | raid5: use bio_reset() | Kent Overstreet
Had to shuffle the code around a bit (where bi_rw and bi_end_io were set), but there shouldn't really be anything tricky here. Signed-off-by: Kent Overstreet <koverstreet@google.com> CC: Jens Axboe <axboe@kernel.dk> CC: NeilBrown <neilb@suse.de>
2013-03-23 | raid1: use bio_reset() | Kent Overstreet
Signed-off-by: Kent Overstreet <koverstreet@google.com> CC: Jens Axboe <axboe@kernel.dk> CC: NeilBrown <neilb@suse.de>
2013-03-23 | raid10: Use bio_reset() | Kent Overstreet
More prep work for immutable bio vecs, mainly getting rid of references to bi_idx. bio_reset was being open coded in a few places. The one in sync_request was a bit nontrivial to convert, so could use some extra eyeballs. Signed-off-by: Kent Overstreet <koverstreet@google.com> CC: Jens Axboe <axboe@kernel.dk> CC: NeilBrown <neilb@suse.de> Acked-by: NeilBrown <neilb@suse.de>
2013-03-23 | block: Add submit_bio_wait(), remove from md | Kent Overstreet
Random cleanup - this code was duplicated and it's not really specific to md. Also added the ability to return the actual error code. Signed-off-by: Kent Overstreet <koverstreet@google.com> CC: Jens Axboe <axboe@kernel.dk> CC: NeilBrown <neilb@suse.de> Acked-by: Tejun Heo <tj@kernel.org>
2013-03-23 | block: Remove bi_idx references | Kent Overstreet
For immutable bvecs, all bi_idx usage needs to be audited - so here we're removing all the unnecessary uses. Most of these are places where it was being initialized on a bio that was just allocated, a few others are conversions to standard macros. Signed-off-by: Kent Overstreet <koverstreet@google.com> CC: Jens Axboe <axboe@kernel.dk>
2013-03-23 | block: Change bio_split() to respect the current value of bi_idx | Kent Overstreet
In the current code bio_split() won't be seeing partially completed bios so this doesn't change any behaviour, but this makes the code a bit clearer as to what bio_split() actually requires. The immediate purpose of the patch is removing unnecessary bi_idx references, but the end goal is to allow partially completed bios to be submitted, which along with immutable biovecs enables efficient bio splitting. Some of the callers were (double) checking that bios could be split, so update their checks too. Signed-off-by: Kent Overstreet <koverstreet@google.com> CC: Jens Axboe <axboe@kernel.dk> CC: Lars Ellenberg <drbd-dev@lists.linbit.com> CC: Neil Brown <neilb@suse.de> CC: Martin K. Petersen <martin.petersen@oracle.com>
2013-03-23 | block: Use bio_sectors() more consistently | Kent Overstreet
Bunch of places in the code weren't using it where they could be - this'll reduce the size of the patch that puts bi_sector/bi_size/bi_idx into a struct bvec_iter. Signed-off-by: Kent Overstreet <koverstreet@google.com> CC: Jens Axboe <axboe@kernel.dk> CC: "Ed L. Cashin" <ecashin@coraid.com> CC: Nick Piggin <npiggin@kernel.dk> CC: Jiri Kosina <jkosina@suse.cz> CC: Jim Paris <jim@jtan.com> CC: Geoff Levand <geoff@infradead.org> CC: Alasdair Kergon <agk@redhat.com> CC: dm-devel@redhat.com CC: Neil Brown <neilb@suse.de> CC: Steven Rostedt <rostedt@goodmis.org> Acked-by: Ed Cashin <ecashin@coraid.com>
2013-03-23 | block: Add bio_end_sector() | Kent Overstreet
Just a little convenience macro - main reason to add it now is preparing for immutable bio vecs, it'll reduce the size of the patch that puts bi_sector/bi_size/bi_idx into a struct bvec_iter. Signed-off-by: Kent Overstreet <koverstreet@google.com> CC: Jens Axboe <axboe@kernel.dk> CC: Lars Ellenberg <drbd-dev@lists.linbit.com> CC: Jiri Kosina <jkosina@suse.cz> CC: Alasdair Kergon <agk@redhat.com> CC: dm-devel@redhat.com CC: Neil Brown <neilb@suse.de> CC: Martin Schwidefsky <schwidefsky@de.ibm.com> CC: Heiko Carstens <heiko.carstens@de.ibm.com> CC: linux-s390@vger.kernel.org CC: Chris Mason <chris.mason@fusionio.com> CC: Steven Whitehouse <swhiteho@redhat.com> Acked-by: Steven Whitehouse <swhiteho@redhat.com>
2013-03-23 | md: Convert md_trim_bio() to use bio_advance() | Kent Overstreet
Signed-off-by: Kent Overstreet <koverstreet@google.com> CC: Jens Axboe <axboe@kernel.dk> CC: NeilBrown <neilb@suse.de> Acked-by: NeilBrown <neilb@suse.de>
2013-03-20 | dm cache: policy ignore hints if generated by different version | Mike Snitzer
When reading the dm cache metadata from disk, ignore the policy hints unless they were generated by the same major version number of the same policy module. The hints are considered to be private data belonging to the specific module that generated them and there is no requirement for them to make sense to different versions of the policy that generated them. Policy modules are all required to work fine if no previous hints are supplied (or if existing hints are lost). Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
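The rule stated above (same policy module, same major version) can be sketched as a small predicate. The names `pv_policy_id` and `pv_hints_usable()` are hypothetical and exist only for this illustration; the on-disk layout and the real comparison code are not shown in this log.

```c
#include <assert.h>
#include <string.h>

#define PV_VERSION_SIZE 3

/* Hypothetical identity record: policy name plus a version triple. */
struct pv_policy_id {
    const char *name;
    unsigned version[PV_VERSION_SIZE]; /* major, minor, patchlevel */
};

/*
 * Returns 1 if hints saved by 'on_disk' may be handed to 'running',
 * 0 if they should be ignored. Minor and patchlevel are not compared.
 */
int pv_hints_usable(const struct pv_policy_id *on_disk,
                    const struct pv_policy_id *running)
{
    assert(on_disk != NULL && running != NULL);
    if (strcmp(on_disk->name, running->name) != 0)
        return 0; /* hints belong to a different policy module */
    return on_disk->version[0] == running->version[0];
}
```

Because policies must work without hints, returning 0 here costs only warm-up time, never correctness.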
2013-03-20 | dm cache: policy change version from string to integer set | Mike Snitzer
Separate dm cache policy version string into 3 unsigned numbers corresponding to major, minor and patchlevel and store them at the end of the on-disk metadata so we know which version of the policy generated the hints in case a future version wants to use them differently. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-20 | dm cache: fix race in writethrough implementation | Joe Thornber
We have found a race in the optimisation used in the dm cache writethrough implementation. Currently, dm core sends the cache target two bios, one for the origin device and one for the cache device and these are processed in parallel. This patch avoids the race by changing the code back to a simpler (slower) implementation which processes the two writes in series, one after the other, until we can develop a complete fix for the problem.

When the cache is in writethrough mode it needs to send WRITE bios to both the origin and cache devices. Previously we've been implementing this by having dm core query the cache target on every write to find out how many copies of the bio it wants. The cache will ask for two bios if the block is in the cache, and one otherwise.

The main problem with this is that it's racy. At the time this check is made the bio hasn't yet been submitted and so isn't being taken into account when quiescing a block for migration (promotion or demotion). This means a single bio may be submitted when two were needed because the block has since been promoted to the cache (catastrophic), or two bios where only one is needed (harmless).

I really don't want to start entering bios into the quiescing system (deferred_set) in the get_num_write_bios callback. Instead this patch simplifies things; only one bio is submitted by the core, this is first written to the origin and then the cache device in series. Obviously this will have a latency impact.

deferred_writethrough_bios is introduced to record bios that must be later issued to the cache device from the worker thread. This deferred submission, after the origin bio completes, is required given that we're in interrupt context (writethrough_endio).

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-20 | dm cache: metadata clear dirty bits on clean shutdown | Joe Thornber
When writing the dirty bitset to the metadata device on a clean shutdown, clear the dirty bits. Previously they were left indicating the cache was dirty. This led to confusion about whether there really was dirty data in the cache or not. (This was a harmless bug.) Reported-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-20 | dm cache: avoid calling policy destructor twice on error | Heinz Mauelshagen
If the cache policy's config values are not able to be set we must set the policy to NULL after destroying it in create_cache_policy() so we don't attempt to destroy it a second time later. Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
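The pattern described above, clearing the pointer after destroying the object so a later teardown does not destroy it again, can be sketched as follows. All names (`pd_policy`, `pd_cache`, `pd_policy_destroy()`, `pd_create_policy()`, `pd_cache_teardown()`) are hypothetical; this is a userspace model, not the dm-cache code.

```c
#include <assert.h>
#include <stdlib.h>

struct pd_policy { int placeholder; };

struct pd_cache {
    struct pd_policy *policy;
    int destroy_calls; /* counts real destructions, for illustration */
};

/* Destroys the policy at most once; the NULL store is the whole fix. */
void pd_policy_destroy(struct pd_cache *c)
{
    assert(c != NULL);
    if (c->policy == NULL)
        return;
    free(c->policy);
    c->policy = NULL;
    c->destroy_calls++;
}

/* Returns 0 on success, -1 if allocation or configuration fails. */
int pd_create_policy(struct pd_cache *c, int config_ok)
{
    c->policy = malloc(sizeof(*c->policy));
    if (c->policy == NULL)
        return -1;
    if (!config_ok) {
        pd_policy_destroy(c); /* leaves c->policy == NULL */
        return -1;
    }
    return 0;
}

/* Generic teardown path that runs after both success and failure. */
void pd_cache_teardown(struct pd_cache *c)
{
    pd_policy_destroy(c);
}
```

After a failed `pd_create_policy()`, calling `pd_cache_teardown()` is a no-op for the policy, so the destructor runs exactly once.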
2013-03-20 | dm cache: detect cache_create failure | Heinz Mauelshagen
Return error if cache_create() fails. A missing return check made cache_ctr continue even after an error in cache_create() resulting in the cache object being destroyed. So a simple failure like an odd number of cache policy config value arguments would result in an oops. Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-20 | dm cache: avoid 64 bit division on 32 bit | Joe Thornber
Squash various 32bit link errors.

>> on i386:
>> drivers/built-in.o: In function `is_discarded_oblock':
>> dm-cache-target.c:(.text+0x1ea28e): undefined reference to `__udivdi3'
...

Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-20 | dm verity: avoid deadlock | Mikulas Patocka
A deadlock was found in the prefetch code in the dm verity map function. This patch fixes this by transferring the prefetch to a worker thread and skipping it completely if kmalloc fails. If generic_make_request is called recursively, it queues the I/O request on the current->bio_list without making the I/O request and returns. The routine making the recursive call cannot wait for the I/O to complete. The deadlock occurs when one thread grabs the bufio_client mutex and waits for an I/O to complete but the I/O is queued on another thread's current->bio_list and is waiting to get the mutex held by the first thread. The fix recognises that prefetching is not essential. If memory can be allocated, it queues the prefetch request to the worker thread, but if not, it does nothing. Signed-off-by: Paul Taysom <taysom@chromium.org> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Cc: stable@kernel.org
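The "prefetch is optional" idea in the fix can be sketched in userspace C: the submitter tries to allocate a work item and simply skips the prefetch if allocation fails, and a separate drain step stands in for the worker thread. Every name here (`pf_work`, `pf_queue`, `pf_submit()`, `pf_drain()`, `pf_failing_alloc()`) is hypothetical, and the allocator is passed in only so the failure path can be exercised.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct pf_work {
    uint64_t block;
    uint32_t n_blocks;
    struct pf_work *next;
};

struct pf_queue {
    struct pf_work *head;
    unsigned queued;  /* prefetches handed to the worker */
    unsigned skipped; /* prefetches dropped on allocation failure */
};

typedef void *(*pf_alloc_fn)(size_t);

/* Test allocator that always fails, standing in for memory pressure. */
void *pf_failing_alloc(size_t size)
{
    (void)size;
    return NULL;
}

/* Never blocks and never reports failure: a missed prefetch is harmless. */
void pf_submit(struct pf_queue *q, pf_alloc_fn alloc,
               uint64_t block, uint32_t n_blocks)
{
    struct pf_work *w;

    assert(q != NULL && alloc != NULL);
    w = alloc(sizeof(*w));
    if (w == NULL) {
        q->skipped++;
        return;
    }
    w->block = block;
    w->n_blocks = n_blocks;
    w->next = q->head;
    q->head = w;
    q->queued++;
}

/* Stand-in for the worker thread: consumes and frees queued items. */
unsigned pf_drain(struct pf_queue *q)
{
    unsigned done = 0;

    while (q->head != NULL) {
        struct pf_work *w = q->head;

        q->head = w->next;
        free(w); /* a real worker would issue the prefetch I/O here */
        done++;
    }
    return done;
}
```

The design point is that the submitting path does no I/O and takes no lock that the I/O might need, so it cannot participate in the cycle the commit describes.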
2013-03-20 | dm thin: fix non power of two discard granularity calc | Joe Thornber
Fix a discard granularity calculation to work for non power of 2 block sizes. In order for thinp to passdown discard bios to the underlying data device, the data device must have a discard granularity that is a factor of the thinp block size. Originally this check was done by using bitops since the block_size was known to be a power of two. Introduced by commit f13945d75730081830b6f3360266950e2b7c9067 ("dm thin: support a non power of 2 discard_granularity"). Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
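To see why a bit-mask test stops working once sizes need not be powers of two, compare a modulo-based factor check with a mask-based one. Both functions below are hypothetical illustrations; in particular `gf_is_factor_mask()` is a generic stand-in for "a bitops check", not the code the commit replaced.

```c
#include <assert.h>
#include <stdint.h>

/* General check: n is a factor of block_size iff the remainder is zero. */
int gf_is_factor(uint64_t block_size, uint32_t n)
{
    assert(n != 0);
    return block_size % n == 0;
}

/* Mask shortcut: equivalent to the modulo form only when n is a power of two. */
int gf_is_factor_mask(uint64_t block_size, uint32_t n)
{
    assert(n != 0);
    return (block_size & (uint64_t)(n - 1)) == 0;
}
```

For a block size of 384 sectors and a granularity of 192 sectors, the modulo form reports a factor while the mask form does not, because 192 is not a power of two; with 1024 and 128 the two agree.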
2013-03-20 | dm thin: fix discard corruption | Joe Thornber
Fix a bug in dm_btree_remove that could leave leaf values with incorrect reference counts. The effect of this was that removal of a shared block could result in the space maps thinking the block was no longer used. More concretely, if you have a thin device and a snapshot of it, sending a discard to a shared region of the thin could corrupt the snapshot.

Thinp uses a 2-level nested btree to store its mappings. The first level is indexed by thin device, and the second level by logical block.

Often when we're removing an entry in this mapping tree we need to rebalance nodes, which can involve shadowing them, possibly creating a copy if the block is shared. If we do create a copy then children of that node need to have their reference counts incremented. In this way reference counts percolate down the tree as shared trees diverge.

The rebalance functions were incrementing the children at the appropriate time, but they were always assuming the children were internal nodes. This meant the leaf values (in our case packed block/flags entries) were not being incremented.

Cc: stable@vger.kernel.org
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-20 | md: remove CONFIG_MULTICORE_RAID456 entirely | Paul Bolle
One instance of this Kconfig macro remained after commit 51acbcec6c42b24482bac18e42befc822524535d ("md: remove CONFIG_MULTICORE_RAID456"). Remove that one too. And, while we're at it, also remove it from the defconfig files that carry it. Signed-off-by: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: NeilBrown <neilb@suse.de>
2013-03-20 | md/raid5: ensure sync and DISCARD don't happen at the same time. | NeilBrown
A number of problems can occur due to races between resync/recovery and discard.

- If sync_request calls handle_stripe() while a discard is happening on the stripe, it might call handle_stripe_clean_event before all of the individual discard requests have completed (so some devices are still locked, but not all). Since commit ca64cae96037de16e4af92678814f5d4bf0c1c65 ("md/raid5: Make sure we clear R5_Discard when discard is finished.") this will cause R5_Discard to be cleared for the parity device, so handle_stripe_clean_event() will not be called when the other devices do become unlocked, so their ->written will not be cleared. This ultimately leads to a WARN_ON in init_stripe and a lock-up.

- If handle_stripe_clean_event() does clear R5_UPTODATE at an awkward time for resync, it can lead to s->uptodate being less than disks in handle_parity_checks5(), which triggers a BUG (because it is one).

So:

- keep R5_Discard on the parity device until all other devices have completed their discard request
- make sure we don't try to have a 'discard' and a 'sync' action at the same time. This involves a new stripe flag so we know when a 'discard' is happening, and the use of R5_Overlap on the parity disk so that, when a discard is wanted while a sync is active, we know to wake up the discard at the appropriate time.

Discard support for RAID5 was added in 3.7, so this is suitable for any -stable kernel since 3.7.

Cc: stable@vger.kernel.org (v3.7+)
Reported-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Tested-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2013-03-20 | MD: Prevent sysfs operations on uninitialized kobjects | Jonathan Brassow
Device-mapper does not use sysfs; but when device-mapper is leveraging MD's RAID personalities, MD sometimes attempts to update sysfs. This patch adds checks for 'mddev->kobj.sd' in sysfs_[un]link_rdev to ensure it is about to operate on something valid. This patch also checks for 'mddev->kobj.sd' before calling 'sysfs_notify' in 'remove_and_add_spares'. Although 'sysfs_notify' already makes this check, doing so in 'remove_and_add_spares' prevents an additional mutex operation. Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: NeilBrown <neilb@suse.de>
2013-03-20 | MD RAID5: Avoid accessing gendisk or queue structs when not available | Jonathan Brassow
MD RAID5: Fix kernel oops when RAID4/5/6 is used via device-mapper Commit a9add5d (v3.8-rc1) added blktrace calls to the RAID4/5/6 driver. However, when device-mapper is used to create RAID4/5/6 arrays, the mddev->gendisk and mddev->queue fields are not setup. Therefore, calling things like trace_block_bio_remap will cause a kernel oops. This patch conditionalizes those calls on whether the proper fields exist to make the calls. (Device-mapper will call trace_block_bio_remap on its own.) This patch is suitable for the 3.8.y stable kernel. Cc: stable@vger.kernel.org (v3.8+) Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: NeilBrown <neilb@suse.de>
2013-03-20 | md/raid5: schedule_construction should abort if nothing to do. | NeilBrown
Since commit 1ed850f356a0a422013846b5291acff08815008b ("md/raid5: make sure to_read and to_write never go negative.") it has been possible for handle_stripe_dirtying to be called when there isn't actually any work to do. It then calls schedule_reconstruction() which will set R5_LOCKED on the parity block(s) even when nothing else is happening. This then causes problems in do_release_stripe(). So add checks to schedule_reconstruction() so that if it doesn't find anything to do, it just aborts. This bug was introduced in v3.7, so the patch is suitable for -stable kernels since then. Cc: stable@vger.kernel.org (v3.7+) Reported-by: majianpeng <majianpeng@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de>
2013-03-05 | Merge tag 'md-3.9' of git://neil.brown.name/md | Linus Torvalds
Pull md updates from NeilBrown:

"Mostly little bugfixes. Only "feature" is a new RAID10 layout which slightly improves the number of sets of devices that can concurrently fail, without data loss."

* tag 'md-3.9' of git://neil.brown.name/md:
  md: expedite metadata update when switching read-auto -> active
  md: remove CONFIG_MULTICORE_RAID456
  md/raid1,raid10: fix deadlock with freeze_array()
  md/raid0: improve error message when converting RAID4-with-spares to RAID0
  md: raid0: fix error return from create_stripe_zones.
  md: fix two bugs when attempting to resize RAID0 array.
  DM RAID: Add support for MD's RAID10 "far" and "offset" algorithms
  MD RAID10: Improve redundancy for 'far' and 'offset' algorithms (part 2)
  MD RAID10: Improve redundancy for 'far' and 'offset' algorithms (part 1)
  MD RAID10: Minor non-functional code changes
  md: raid1,10: Handle REQ_WRITE_SAME flag in write bios
  md: protect against crash upon fsync on ro array
2013-03-01 | dm cache: add cleaner policy | Heinz Mauelshagen
A simple cache policy that writes back all data to the origin. This is used to decommission a dm cache by emptying it. Signed-off-by: Heinz Mauelshagen <mauelshagen@redhat.com> Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-01 | dm cache: add mq policy | Joe Thornber
A cache policy that uses a multiqueue ordered by recent hit count to select which blocks should be promoted and demoted. This is meant to be a general purpose policy. It prioritises reads over writes. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-01 | dm: add cache target | Joe Thornber
Add a target that allows a fast device such as an SSD to be used as a cache for a slower device such as a disk. A plug-in architecture was chosen so that the decisions about which data to migrate and when are delegated to interchangeable tunable policy modules. The first general purpose module we have developed, called "mq" (multiqueue), follows in the next patch. Other modules are under development. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Heinz Mauelshagen <mauelshagen@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-01 | dm persistent data: add bitset | Joe Thornber
Add a persistent bitset as a wrapper around dm-array. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-01 | dm persistent data: add transactional array | Joe Thornber
Add a transactional array. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-01 | dm thin: remove cells from stack | Joe Thornber
This patch takes advantage of the new bio-prison interface where the memory is now passed in rather than using a mempool in bio-prison. This allows the map function to avoid performing potentially-blocking allocations that could lead to deadlocks: We want to avoid the cell allocation that is done in bio_detain. (The potential for mempool deadlocks still remains in other functions that use bio_detain.) Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-01 | dm bio prison: pass cell memory in | Joe Thornber
Change the dm_bio_prison interface so that instead of allocating memory internally, dm_bio_detain is supplied with a pre-allocated cell each time it is called. This enables a subsequent patch to move the allocation of the struct dm_bio_prison_cell outside the thin target's mapping function so it can no longer block there. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
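The interface change described above, where the caller supplies the cell memory instead of the callee allocating it, can be modelled in a few lines of userspace C. `bp_cell`, `bp_prison` and `bp_detain()` are hypothetical names, the list-based lookup is a simplification, and the return convention (1 for an existing cell, 0 when the supplied cell was installed) is this sketch's own choice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct bp_cell {
    uint64_t key;      /* identifies the block being held */
    unsigned holders;  /* how many callers are detained on this key */
    struct bp_cell *next;
};

struct bp_prison {
    struct bp_cell *head;
};

/*
 * Detain on 'key'. Performs no allocation: if the key is new, the
 * caller's pre-allocated cell is linked in; otherwise it stays unused
 * and remains the caller's to reuse or free.
 */
int bp_detain(struct bp_prison *p, uint64_t key,
              struct bp_cell *prealloc, struct bp_cell **result)
{
    struct bp_cell *c;

    assert(p != NULL && prealloc != NULL && result != NULL);
    for (c = p->head; c != NULL; c = c->next) {
        if (c->key == key) {
            c->holders++;
            *result = c;
            return 1; /* already detained; prealloc not consumed */
        }
    }
    prealloc->key = key;
    prealloc->holders = 1;
    prealloc->next = p->head;
    p->head = prealloc;
    *result = prealloc;
    return 0; /* prealloc is now owned by the prison */
}
```

Because `bp_detain()` cannot block on memory, the caller decides where the allocation happens, which is what lets a mapping path move it out of a context where blocking is unsafe.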
2013-03-01 | dm persistent data: add btree_walk | Joe Thornber
Add dm_btree_walk to iterate through the contents of a btree. This will be used by the dm cache target. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>