aboutsummaryrefslogtreecommitdiff
path: root/drivers/md/bitmap.c
AgeCommit message (Collapse)Author
2011-12-23md/bitmap: be more consistent when setting new bits in memory bitmap.NeilBrown
For each active region corresponding to a bit in the bitmap with have a 14bit counter (and some flags). This counts number of active writes + bit in the on-disk bitmap + delay-needed. The "delay-needed" is because we always want a delay before clearing a bit. So the number here is normally number of active writes plus 2. If there have been no writes for a while, we drop to 1. If still no writes we clear the bit and drop to 0. So for consistency, when setting bit from the on-disk bitmap or by request from user-space it is best to set the counter to '2' to start with. In particular we might also set the NEEDED_MASK flag at this time, and in all other cases NEEDED_MASK is only set when the counter is 2 or more. Signed-off-by: NeilBrown <neilb@suse.de>
2011-12-23md/bitmap: daemon_work cleanup.NeilBrown
We have a variable 'mddev' in this function, but repeatedly get the same value by dereferencing bitmap->mddev. There is room for simplification here... Signed-off-by: NeilBrown <neilb@suse.de>
2011-12-23md/bitmap: It is OK to clear bits during recovery.NeilBrown
commit d0a4bb492772ce5c4bdfba3744a99ed6f6fb238f introduced a regression which is annoying but fairly harmless. When writing to an array that is undergoing recovery (a spare in being integrated into the array), writing to the array will set bits in the bitmap, but they will not be cleared when the write completes. For bits covering areas that have not been recovered yet this is not a problem as the recovery will clear the bits. However bits set in already-recovered region will stay set and never be cleared. This doesn't risk data integrity. The only negatives are: - next time there is a crash, more resyncing than necessary will be done. - the bitmap doesn't look clean, which is confusing. While an array is recovering we don't want to update the 'events_cleared' setting in the bitmap but we do still want to clear bits that have very recently been set - providing they were written to the recovering device. So split those two needs - which previously both depended on 'success' and always clear the bit of the write went to all devices. Signed-off-by: NeilBrown <neilb@suse.de>
2011-11-23md/lock: ensure updates to page_attrs are properly locked.NeilBrown
Page attributes are set using __set_bit rather than set_bit as it normally called under a spinlock so the extra atomicity is not needed. However there are two places where we might set or clear page attributes without holding the spinlock. So add the spinlock in those cases. This might be the cause of occasional reports that bits a aren't getting clear properly - theory is that BITMAP_PAGE_PENDING gets lost when BITMAP_PAGE_NEEDWRITE is set or cleared. This is an inconvenience, not a threat to data safety. Signed-off-by: NeilBrown <neilb@suse.de>
2011-10-11md/bitmap remove fault injection options.NeilBrown
These are too hard to use to be much more than noise. Signed-off-by: NeilBrown <neilb@suse.de>
2011-10-11md: remove typedefs: mddev_t -> struct mddevNeilBrown
Having mddev_t and 'struct mddev_s' is ugly and not preferred Signed-off-by: NeilBrown <neilb@suse.de>
2011-10-11md: removing typedefs: mdk_rdev_t -> struct md_rdevNeilBrown
The typedefs are just annoying. 'mdk' probably refers to 'md_k.h' which used to be an include file that defined this thing. Signed-off-by: NeilBrown <neilb@suse.de>
2011-10-07md: remove PRINTK and dprintk debugging and use pr_debugNeilBrown
Being able to dynamically enable these make them much more useful. Signed-off-by: NeilBrown <neilb@suse.de>
2011-09-21md/bitmap: improve handling of 'allclean'.NeilBrown
The 'allclean' flag is used to cache the fact that there is nothing to do, so we can avoid waking up and scanning the bitmap regularly. The two sorts of pages that might need the attention of the bitmap daemon are BITMAP_PAGE_PENDING and BITMAP_PAGE_NEEDWRITE pages. So make sure allclean reflects exactly when there are none of those. So: set it before scanning all pages with either bit set. clear it whenever these bits are set clear it when we desire not to clear one of these bits. don't clear it any other time. Signed-off-by: NeilBrown <neilb@suse.de>
2011-09-21md/bitmap: rename and tidy up BITMAP_PAGE_CLEANNeilBrown
The flag 'BITMAP_PAGE_CLEAN' has a confusing name as it doesn't mean that the page is clean, but rather that there are counters in the page which allow bits in the bitmap to be cleared - i.e. maybe cleaning can happen. So change it to BITMAP_PAGE_PENDING and fix some irregularities: - Don't set it in bitmap_init_from_disk as bitmap_set_memory_bits sets it when needed - in bitmap_daemon_work, if we find a counter that is '1', but need_sync is set, then set BITMAP_PAGE_PENDING again (it was recently cleared) to ensure we don't forget about this bit. Signed-off-by: NeilBrown <neilb@suse.de>
2011-07-27MD bitmap: Revert DM dirty log hooksJonathan Brassow
Revert most of commit e384e58549a2e9a83071ad80280c1a9053cfd84c md/bitmap: prepare for storing write-intent-bitmap via dm-dirty-log. MD should not need to use DM's dirty log - we decided to use md's bitmaps instead. Keeping the DIV_ROUND_UP clean-ups that were part of commit e384e58549a2e9a83071ad80280c1a9053cfd84c, however. Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: NeilBrown <neilb@suse.de>
2011-07-27md: use proper little-endian bitopsAkinobu Mita
Using __test_and_{set,clear}_bit_le() with ignoring its return value can be replaced with __{set,clear}_bit_le(). Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: NeilBrown <neilb@suse.de> Cc: linux-raid@vger.kernel.org Signed-off-by: NeilBrown <neilb@suse.de>
2011-06-09md/bitmap: remove unused fields from struct bitmapNamhyung Kim
Get rid of ->syncchunk and ->counter_bits since they're never used. Also discard COUNTER_BYTE_RATIO which is unused. Signed-off-by: Namhyung Kim <namhyung@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de>
2011-06-09md/bitmap: use proper accessor macroNamhyung Kim
Use COUNTER()/NEEDED() macro instead of open-coding them. Signed-off-by: Namhyung Kim <namhyung@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de>
2011-06-09MD: use is_power_of_2 macroJonathan Brassow
Make use of is_power_of_2 macro. Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: NeilBrown <neilb@suse.de>
2011-06-09MD: support initial bitmap creation in-kernelJonathan Brassow
Add bitmap support to the device-mapper specific metadata area. This patch allows the creation of the bitmap metadata area upon initial array creation via device-mapper. Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: NeilBrown <neilb@suse.de>
2011-05-11md/bitmap: fix saving of events_cleared and other state.NeilBrown
If a bitmap is found to be 'stale' the events_cleared value is set to match 'events'. However if the array is degraded this does not get stored on disk. This can subsequently lead to incorrect behaviour. So change bitmap_update_sb to always update events_cleared in the superblock from the known events_cleared. For neatness also set ->state from ->flags. This requires updating ->state whenever we update ->flags, which makes sense anyway. This is suitable for any active -stable release. cc: stable@kernel.org Signed-off-by: NeilBrown <neilb@suse.de>
2011-03-24Merge branch 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-blockLinus Torvalds
* 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block: (65 commits) Documentation/iostats.txt: bit-size reference etc. cfq-iosched: removing unnecessary think time checking cfq-iosched: Don't clear queue stats when preempt. blk-throttle: Reset group slice when limits are changed blk-cgroup: Only give unaccounted_time under debug cfq-iosched: Don't set active queue in preempt block: fix non-atomic access to genhd inflight structures block: attempt to merge with existing requests on plug flush block: NULL dereference on error path in __blkdev_get() cfq-iosched: Don't update group weights when on service tree fs: assign sb->s_bdi to default_backing_dev_info if the bdi is going away block: Require subsystems to explicitly allocate bio_set integrity mempool jbd2: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging jbd: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging fs: make fsync_buffers_list() plug mm: make generic_writepages() use plugging blk-cgroup: Add unaccounted time to timeslice_used. block: fixup plugging stubs for !CONFIG_BLOCK block: remove obsolete comments for blkdev_issue_zeroout. blktrace: Use rq->cmd_flags directly in blk_add_trace_rq. ... Fix up conflicts in fs/{aio.c,super.c}
2011-03-23md: use little-endian bitopsAkinobu Mita
As a preparation for removing ext2 non-atomic bit operations from asm/bitops.h. This converts ext2 non-atomic bit operations to little-endian bit operations. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Acked-by: NeilBrown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-03-10block: kill off REQ_UNPLUGJens Axboe
With the plugging now being explicitly controlled by the submitter, callers need not pass down unplugging hints to the block layer. If they want to unplug, it's because they manually plugged on their own - in which case, they should just unplug at will. Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-10block: remove per-queue pluggingJens Axboe
Code has been converted over to the new explicit on-stack plugging, and delay users have been converted to use the new API for that. So lets kill off the old plugging along with aops->sync_page(). Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-01-14md: Don't let implementation detail of curr_resync leak out through sysfs.NeilBrown
mddev->curr_resync has artificial values of '1' and '2' which are used by the code which ensures only one resync is happening at a time on any given device. These values are internal and should never be exposed to user-space (except when translated appropriately as in the 'pending' status in /proc/mdstat). Unfortunately they are as ->curr_resync is assigned to ->curr_resync_completed and that value is directly visible through sysfs. So change the assignments to ->curr_resync_completed to get the same valued from elsewhere in a form that doesn't have the magic '1' or '2' values. Signed-off-by: NeilBrown <neilb@suse.de>
2011-01-14md: separate meta and data devsJonathan Brassow
Allow the metadata to be on a separate device from the data. This doesn't mean the data and metadata will by on separate physical devices - it simply gives device-mapper and userspace tools more flexibility. Signed-off-by: NeilBrown <neilb@suse.de>
2011-01-14md-new-param-to_sync_page_ioJonathan Brassow
Add new parameter to 'sync_page_io'. The new parameter allows us to distinguish between metadata and data operations. This becomes important later when we add the ability to use separate devices for data and metadata. Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
2010-10-29md: unplug writes to external bitmaps.NeilBrown
When writing to an 'external' bitmap we don't currently unplug the device before waiting, so we can get a 3msec delay each time; So use REQ_UNPLUG to force and unplug. Signed-off-by: NeilBrown <neilb@suse.de>
2010-10-28md: change type of first arg to sync_page_io.NeilBrown
Currently sync_page_io takes a 'bdev'. Every caller passes 'rdev->bdev'. We will soon want another field out of the rdev in sync_page_io, So just pass the rdev instead of the bdev out of it. Signed-off-by: NeilBrown <neilb@suse.de>
2010-10-28md: use sector_t in bitmap_get_counterNeilBrown
bitmap_get_counter returns the number of sectors covered by the counter in a pass-by-reference variable. In some cases this can be very large, so make it a sector_t for safety. Signed-off-by: NeilBrown <neilb@suse.de>
2010-10-07md: check return code of read_sb_pageVasiliy Kulikov
Function read_sb_page may return ERR_PTR(...). Check for it. Signed-off-by: Vasiliy Kulikov <segooon@gmail.com> Cc: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: NeilBrown <neilb@suse.de>
2010-08-30md: resolve confusion of MD_CHANGE_CLEANNeilBrown
MD_CHANGE_CLEAN is used for two different purposes and this leads to confusion. One of the purposes is largely mirrored by MD_CHANGE_PENDING which is not used for anything else, so have MD_CHANGE_PENDING take over that purpose fully. The two purposes are: 1/ tell md_update_sb that an update is needed and that it is just a clean/dirty transition. 2/ tell user-space that an transition from clean to dirty is pending (something wants to write), and tell te kernel (by clearin the flag) that the transition is OK. The first purpose remains wit MD_CHANGE_CLEAN, the second is moved fully to MD_CHANGE_PENDING. This means that various places which conditionally set or cleared MD_CHANGE_CLEAN no longer need to be conditional. Signed-off-by: NeilBrown <neilb@suse.de>
2010-07-26md/bitmap: separate out loading a bitmap from initialising the structures.NeilBrown
dm makes this distinction between ->ctr and ->resume, so we need to too. Also get the new bitmap_load to clear out the bitmap first, as this is most consistent with the dm suspend/resume approach Signed-off-by: NeilBrown <neilb@suse.de>
2010-07-26md/bitmap: prepare for storing write-intent-bitmap via dm-dirty-log.NeilBrown
This allows md/raid5 to fully work as a dm target. Normally md uses a 'filemap' which contains a list of pages of bits each of which may be written separately. dm-log uses and all-or-nothing approach to writing the log, so when using a dm-log, ->filemap is NULL and the flags normally stored in filemap_attr are stored in ->logattrs instead. Signed-off-by: NeilBrown <neilb@suse.de>
2010-07-26md/bitmap: optimise scanning of empty bitmaps.NeilBrown
A bitmap is stored as one page per 2048 bits. If none of the bits are set, the page is not allocated. When bitmap_get_counter finds that a page isn't allocate, it just reports that one bit work of space isn't flagged, rather than reporting that 2048 bits worth of space are unflagged. This can cause searches for flagged bits (e.g. bitmap_close_sync) to do more work than is really necessary. So change bitmap_get_counter (when creating) to report a number of blocks that more accurately reports the range of the device for which no counter currently exists. Signed-off-by: NeilBrown <neilb@suse.de>
2010-07-26md/bitmap: clean up plugging calls.NeilBrown
1/ use md_unplug in bitmap.c as we will soon be using bitmaps under arrays with no queue attached. 2/ Don't bother plugging the queue when we set a bit in the bitmap. The reason for this was to encourage as many bits as possible to get set before we unplug and write stuff out. However every personality already plugs the queue after bitmap_startwrite either directly (raid1/raid10) or be setting STRIPE_BIT_DELAY which causes the queue to be plugged later (raid5). Signed-off-by: NeilBrown <neilb@suse.de>
2010-07-26md/bitmap: reduce dependence on sysfs.NeilBrown
For dm-raid45 we will want to use bitmaps in dm-targets which don't have entries in sysfs, so cope with the mddev not living in sysfs. Signed-off-by: NeilBrown <neilb@suse.de>
2010-07-26md/bitmap: white space clean up and similar.NeilBrown
Fixes some whitespace problems Fixed some checkpatch.pl complaints. Replaced kmalloc ... memset(0), with kzalloc Fixed an unlikely memory leak on an error path. Reformatted a number of 'if/else' sets, sometimes replacing goto with an else clause. Removed some old comments and commented-out code. Signed-off-by: NeilBrown <neilb@suse.de>
2010-07-26md: be more careful setting MD_CHANGE_CLEANNeilBrown
When MD_CHANGE_CLEAN is set we might block in md_write_start. So we should only set it when fairly sure that something will clear it. There are two places where it is set so as to encourage a metadata update to record the progress of resync/recovery. This should only be done if the internal metadata update mechanisms are in use, which can be tested by by inspecting '->persistent'. Signed-off-by: NeilBrown <neilb@suse.de>
2010-05-21Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (69 commits) fix handling of offsets in cris eeprom.c, get rid of fake on-stack files get rid of home-grown mutex in cris eeprom.c switch ecryptfs_write() to struct inode *, kill on-stack fake files switch ecryptfs_get_locked_page() to struct inode * simplify access to ecryptfs inodes in ->readpage() and friends AFS: Don't put struct file on the stack Ban ecryptfs over ecryptfs logfs: replace inode uid,gid,mode initialization with helper function ufs: replace inode uid,gid,mode initialization with helper function udf: replace inode uid,gid,mode init with helper ubifs: replace inode uid,gid,mode initialization with helper function sysv: replace inode uid,gid,mode initialization with helper function reiserfs: replace inode uid,gid,mode initialization with helper function ramfs: replace inode uid,gid,mode initialization with helper function omfs: replace inode uid,gid,mode initialization with helper function bfs: replace inode uid,gid,mode initialization with helper function ocfs2: replace inode uid,gid,mode initialization with helper function nilfs2: replace inode uid,gid,mode initialization with helper function minix: replace inode uid,gid,mode init with helper ext4: replace inode uid,gid,mode init with helper ... Trivial conflict in fs/fs-writeback.c (mark bitfields unsigned)
2010-05-22Merge commit '3ff195b011d7decf501a4d55aeed312731094796' into for-linusNeilBrown
Conflicts: drivers/md/md.c - Resolved conflict in md_update_sb - Added extra 'NULL' arg to new instance of sysfs_get_dirent. Signed-off-by: NeilBrown <neilb@suse.de>
2010-05-21sanitize vfs_fsync calling conventionsChristoph Hellwig
Now that the last user passing a NULL file pointer is gone we can remove the redundant dentry argument and associated hacks inside vfs_fsynmc_range. The next step will be removig the dentry argument from ->fsync, but given the luck with the last round of method prototype changes I'd rather defer this until after the main merge window. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-05-21sysfs: Implement sysfs tagged directory support.Eric W. Biederman
The problem. When implementing a network namespace I need to be able to have multiple network devices with the same name. Currently this is a problem for /sys/class/net/*, /sys/devices/virtual/net/*, and potentially a few other directories of the form /sys/ ... /net/*. What this patch does is to add an additional tag field to the sysfs dirent structure. For directories that should show different contents depending on the context such as /sys/class/net/, and /sys/devices/virtual/net/ this tag field is used to specify the context in which those directories should be visible. Effectively this is the same as creating multiple distinct directories with the same name but internally to sysfs the result is nicer. I am calling the concept of a single directory that looks like multiple directories all at the same path in the filesystem tagged directories. For the networking namespace the set of directories whose contents I need to filter with tags can depend on the presence or absence of hotplug hardware or which modules are currently loaded. Which means I need a simple race free way to setup those directories as tagged. To achieve a reace free design all tagged directories are created and managed by sysfs itself. Users of this interface: - define a type in the sysfs_tag_type enumeration. - call sysfs_register_ns_types with the type and it's operations - sysfs_exit_ns when an individual tag is no longer valid - Implement mount_ns() which returns the ns of the calling process so we can attach it to a sysfs superblock. - Implement ktype.namespace() which returns the ns of a syfs kobject. Everything else is left up to sysfs and the driver layer. For the network namespace mount_ns and namespace() are essentially one line functions, and look to remain that. Tags are currently represented a const void * pointers as that is both generic, prevides enough information for equality comparisons, and is trivial to create for current users, as it is just the existing namespace pointer. The work needed in sysfs is more extensive. At each directory or symlink creating I need to check if the directory it is being created in is a tagged directory and if so generate the appropriate tag to place on the sysfs_dirent. Likewise at each symlink or directory removal I need to check if the sysfs directory it is being removed from is a tagged directory and if so figure out which tag goes along with the name I am deleting. Currently only directories which hold kobjects, and symlinks are supported. There is not enough information in the current file attribute interfaces to give us anything to discriminate on which makes it useless, and there are no potential users which makes it an uninteresting problem to solve. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Benjamin Thery <benjamin.thery@bull.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-18md/raid1: delay reads that could overtake behind-writes.NeilBrown
When a raid1 array is configured to support write-behind on some devices, it normally only reads from other devices. If all devices are write-behind (because the rest have failed) it is possible for a read request to be serviced before a behind-write request, which would appear as data corruption. So when forced to read from a WriteMostly device, wait for any write-behind to complete, and don't start any more behind-writes. Signed-off-by: NeilBrown <neilb@suse.de>
2010-05-18drivers/md: Remove unnecessary casts of void *H Hartley Sweeten
void pointers do not need to be cast to other pointer types. Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com> Signed-off-by: NeilBrown <neilb@suse.de>
2010-05-18md: expose max value of behind writes counterPaul Clements
Keep track of the maximum number of concurrent write-behind requests for an md array and exposed this number in sysfs at md/bitmap/max_backlog_used Writing any value to this file will clear it. This allows userspace to be involved in tuning bitmap/backlog. Signed-off-by: Paul Clements <paul.clements@steeleye.com> Signed-off-by: NeilBrown <neilb@suse.de>
2009-12-14md/bitmap: update dirty flag when bitmap bits are explicitly set.NeilBrown
There is a sysfs file which allows bits in the write-intent bitmap to be explicit set - indicating that the block is thought to be 'dirty'. When this happens we should really set recovery_cp backwards to include the block to reflect this dirtiness. In particular, a 'resync' process will refuse to start if recovery_cp is beyond the end of the array, so this is needed to allow a resync to be triggered. Signed-off-by: NeilBrown <neilb@suse.de>
2009-12-14md: Support write-intent bitmaps with externally managed metadata.NeilBrown
In this case, the metadata needs to not be in the same sector as the bitmap. md will not read/write any bitmap metadata. Config must be done via sysfs and when a recovery makes the array non-degraded again, writing 'true' to 'bitmap/can_clear' will allow bits in the bitmap to be cleared again. Signed-off-by: NeilBrown <neilb@suse.de>
2009-12-14md/bitmap: move setting of daemon_lastrun out of bitmap_read_sbNeilBrown
Setting daemon_lastrun really has nothing to do with reading the bitmap superblock, it just happens to be needed at the same time. bitmap_read_sb is about to become options, so move that code out to after the call to bitmap_read_sb. Signed-off-by: NeilBrown <neilb@suse.de>
2009-12-14md: support updating bitmap parameters via sysfs.NeilBrown
A new attribute directory 'bitmap' in 'md' is created which contains files for configuring the bitmap. 'location' identifies where the bitmap is, either 'none', or 'file' or 'sector offset from metadata'. Writing 'location' can create or remove a bitmap. Adding a 'file' bitmap this way is not yet supported. 'chunksize' and 'time_base' must be set before 'location' can be set. 'chunksize' can be set before creating a bitmap, but is currently always over-ridden by the bitmap superblock. 'time_base' and 'backlog' can be updated at any time. Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Andre Noll <maan@systemlinux.org>
2009-12-14md: support bitmap offset appropriate for external-metadata arrays.NeilBrown
For md arrays were metadata is managed externally, the kernel does not know about a superblock so the superblock offset is 0. If we want to have a write-intent-bitmap near the end of the devices of such an array, we should support sector_t sized offset. We need offset be possibly negative for when the bitmap is before the metadata, so use loff_t instead. Also add sanity check that bitmap does not overlap with data. Signed-off-by: NeilBrown <neilb@suse.de>
2009-12-14md: remove needless setting of thread->timeout in raid10_quiesceNeilBrown
As bitmap_create and bitmap_destroy already set thread->timeout as appropriate, there is no need to do it in raid10_quiesce. There is a possible need to wake the thread after the timeout has been set low, but it is better to do that where the timeout is actually set low, in bitmap_create. Signed-off-by: NeilBrown <neilb@suse.de>
2009-12-14md: change daemon_sleep to be in 'jiffies' rather than 'seconds'.NeilBrown
This removes a lot of multiplications by HZ. Signed-off-by: NeilBrown <neilb@suse.de>