aboutsummaryrefslogtreecommitdiff
path: root/fs/ocfs2
AgeCommit message (Collapse)Author
2011-07-20fs: always maintain i_dio_countChristoph Hellwig
Maintain i_dio_count for all filesystems, not just those using DIO_LOCKING. This these filesystems to also protect truncate against direct I/O requests by using common code. Right now the only non-DIO_LOCKING filesystem that appears to do so is XFS, which uses an opencoded variant of the i_dio_count scheme. Behaviour doesn't change for filesystems never calling inode_dio_wait. For ext4 behaviour changes when using the dioread_nonlock option, which previously was missing any protection between truncate and direct I/O reads. For ocfs2 that handcrafted i_dio_count manipulations are replaced with the common code now enable. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-20fs: move inode_dio_wait calls into ->setattrChristoph Hellwig
Let filesystems handle waiting for direct I/O requests themselves instead of doing it beforehand. This means filesystem-specific locks to prevent new dio referenes from appearing can be held. This is important to allow generalizing i_dio_count to non-DIO_LOCKING filesystems. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-20fs: kill i_alloc_semChristoph Hellwig
i_alloc_sem is a rather special rw_semaphore. It's the last one that may be released by a non-owner, and it's write side is always mirrored by real exclusion. It's intended use it to wait for all pending direct I/O requests to finish before starting a truncate. Replace it with a hand-grown construct: - exclusion for truncates is already guaranteed by i_mutex, so it can simply fall way - the reader side is replaced by an i_dio_count member in struct inode that counts the number of pending direct I/O requests. Truncate can't proceed as long as it's non-zero - when i_dio_count reaches non-zero we wake up a pending truncate using wake_up_bit on a new bit in i_flags - new references to i_dio_count can't appear while we are waiting for it to read zero because the direct I/O count always needs i_mutex (or an equivalent like XFS's i_iolock) for starting a new operation. This scheme is much simpler, and saves the space of a spinlock_t and a struct list_head in struct inode (typically 160 bits on a non-debug 64-bit system). Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-20new helpers: kern_path_create/user_path_createAl Viro
combination of kern_path_parent() and lookup_create(). Does *not* expose struct nameidata to caller. Syscalls converted to that... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-20->permission() sanitizing: don't pass flags to ->permission()Al Viro
not used by the instances anymore. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-20->permission() sanitizing: don't pass flags to generic_permission()Al Viro
redundant; all callers get it duplicated in mask & MAY_NOT_BLOCK and none of them removes that bit. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-20->permission() sanitizing: don't pass flags to ->check_acl()Al Viro
not used in the instances anymore. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-20->permission() sanitizing: pass MAY_NOT_BLOCK to ->check_acl()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-20kill check_acl callback of generic_permission()Al Viro
its value depends only on inode and does not change; we might as well store it in ->i_op->check_acl and be done with that. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-18security: new security_inode_init_security API adds function callbackMimi Zohar
This patch changes the security_inode_init_security API by adding a filesystem specific callback to write security extended attributes. This change is in preparation for supporting the initialization of multiple LSM xattrs and the EVM xattr. Initially the callback function walks an array of xattrs, writing each xattr separately, but could be optimized to write multiple xattrs at once. For existing security_inode_init_security() calls, which have not yet been converted to use the new callback function, such as those in reiserfs and ocfs2, this patch defines security_old_inode_init_security(). Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
2011-06-20treewide: remove duplicate includesVitaliy Ivanov
Many stupid corrections of duplicated includes based on the output of scripts/checkincludes.pl. Signed-off-by: Vitaliy Ivanov <vitalivanov@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2011-06-03more conservative S_NOSEC handlingAl Viro
Caching "we have already removed suid/caps" was overenthusiastic as merged. On network filesystems we might have had suid/caps set on another client, silently picked by this client on revalidate, all of that *without* clearing the S_NOSEC flag. AFAICS, the only reasonably sane way to deal with that is * new superblock flag; unless set, S_NOSEC is not going to be set. * local block filesystems set it in their ->mount() (more accurately, mount_bdev() does, so does btrfs ->mount(), users of mount_bdev() other than local block ones clear it) * if any network filesystem (or a cluster one) wants to use S_NOSEC, it'll need to set MS_NOSEC in sb->s_flags *AND* take care to clear S_NOSEC when inode attribute changes are picked from other clients. It's not an earth-shattering hole (anybody that can set suid on another client will almost certainly be able to write to the file before doing that anyway), but it's a bug that needs fixing. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-05-31ocfs2: use proper little-endian bitopsAkinobu Mita
Using __test_and_{set,clear}_bit_le() with ignoring its return value can be replaced with __{set,clear}_bit_le(). Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: ocfs2-devel@oss.oracle.com Signed-off-by: Joel Becker <jlbec@evilplan.org>
2011-05-31ocfs2: null deref on allocation errorDan Carpenter
The original code had a null derefence in the error handling. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Joel Becker <jlbec@evilplan.org>
2011-05-31ocfs2: checking the wrong variable in ocfs2_move_extent()Dan Carpenter
"new_phys_cpos" is always a valid pointer here. ocfs2_probe_alloc_group() allocates "*new_phys_cpos". Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Joel Becker <jlbec@evilplan.org>
2011-05-31ocfs2: Bugfix for hard readonly mountTiger Yang
ocfs2 cannot currently mount a device that is readonly at the media ("hard readonly"). Fix the broken places. see detail: http://oss.oracle.com/bugzilla/show_bug.cgi?id=1322 [ Description edited -- Joel ] Signed-off-by: Tiger Yang <tiger.yang@oracle.com> Reviewed-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <jlbec@evilplan.org>
2011-05-27Merge branch 'linux-next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2 * 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2: Ocfs2/move_extents: Validate moving goal after the adjustment. Ocfs2/move_extents: Avoid doing division in extent moving.
2011-05-27Merge branch 'move_extents' of git://oss.oracle.com/git/tye/linux-2.6 into ↵Joel Becker
ocfs2-merge-window
2011-05-27Ocfs2/move_extents: Validate moving goal after the adjustment.Tristan Ye
though the goal_to_be_moved will be validated again in following moving, it's still a good idea to validate it after adjustment at the very beginning, instead of validating it before adjustment. Signed-off-by: Tristan Ye <tristan.ye@oracle.com>
2011-05-27Ocfs2/move_extents: Avoid doing division in extent moving.Tristan Ye
It's not wise enough to do a 64bits division anywhere in kernside, replace it with a decent helper or proper shifts. Signed-off-by: Tristan Ye <tristan.ye@oracle.com>
2011-05-26Merge branch 'linux-next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2 * 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2: (28 commits) Ocfs2: Teach local-mounted ocfs2 to handle unwritten_extents correctly. ocfs2/dlm: Do not migrate resource to a node that is leaving the domain ocfs2/dlm: Add new dlm message DLM_BEGIN_EXIT_DOMAIN_MSG Ocfs2/move_extents: Set several trivial constraints for threshold. Ocfs2/move_extents: Let defrag handle partial extent moving. Ocfs2/move_extents: move/defrag extents within a certain range. Ocfs2/move_extents: helper to calculate the defraging length in one run. Ocfs2/move_extents: move entire/partial extent. Ocfs2/move_extents: helpers to update the group descriptor and global bitmap inode. Ocfs2/move_extents: helper to probe a proper region to move in an alloc group. Ocfs2/move_extents: helper to validate and adjust moving goal. Ocfs2/move_extents: find the victim alloc group, where the given #blk fits. Ocfs2/move_extents: defrag a range of extent. Ocfs2/move_extents: move a range of extent. Ocfs2/move_extents: lock allocators and reserve metadata blocks and data clusters for extents moving. Ocfs2/move_extents: Add basic framework and source files for extent moving. Ocfs2/move_extents: Adding new ioctl code 'OCFS2_IOC_MOVE_EXT' to ocfs2. Ocfs2/refcounttree: Publicize couple of funcs from refcounttree.c Ocfs2: Add a new code 'OCFS2_INFO_FREEFRAG' for o2info ioctl. Ocfs2: Add a new code 'OCFS2_INFO_FREEINODE' for o2info ioctl. ...
2011-05-26Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/djm/tmem * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/djm/tmem: xen: cleancache shim to Xen Transcendent Memory ocfs2: add cleancache support ext4: add cleancache support btrfs: add cleancache support ext3: add cleancache support mm/fs: add hooks to support cleancache mm: cleancache core ops functions and config fs: add field to superblock to support cleancache mm/fs: cleancache documentation Fix up trivial conflict in fs/btrfs/extent_io.c due to includes
2011-05-26ocfs2: add cleancache supportDan Magenheimer
This eighth patch of eight in this cleancache series "opts-in" cleancache for ocfs2. Clustered filesystems must explicitly enable cleancache by calling cleancache_init_shared_fs anytime an instance of the filesystem is mounted. Ocfs2 is currently the only user of the clustered filesystem interface but nevertheless, the cleancache hooks in the VFS layer are sufficient for ocfs2 including the matching cleancache_flush_fs hook which must be called on unmount. Details and a FAQ can be found in Documentation/vm/cleancache.txt [v8: trivial merge conflict update] [v5: jeremy@goop.org: simplify init hook and any future fs init changes] Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com> Reviewed-by: Jeremy Fitzhardinge <jeremy@goop.org> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Matthew Wilcox <matthew@wil.cx> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Rik Van Riel <riel@redhat.com> Cc: Jan Beulich <JBeulich@novell.com> Cc: Chris Mason <chris.mason@oracle.com> Cc: Andreas Dilger <adilger@sun.com> Cc: Ted Tso <tytso@mit.edu> Cc: Nitin Gupta <ngupta@vflare.org>
2011-05-25Merge branch 'move_extents' of git://oss.oracle.com/git/tye/linux-2.6 into ↵Joel Becker
ocfs2-merge-window Conflicts: fs/ocfs2/ioctl.c
2011-05-25Ocfs2: Teach local-mounted ocfs2 to handle unwritten_extents correctly.Tristan Ye
Oops, local-mounted of 'ocfs2_fops_no_plocks' is just missing the support of unwritten_extents/punching-hole due to no func pointer was given correctly to '.follocate' field. Signed-off-by: Tristan Ye <tristan.ye@oracle.com>
2011-05-25ocfs2/dlm: Do not migrate resource to a node that is leaving the domainSunil Mushran
During dlm domain shutdown, o2dlm has to free all the lock resources. Ones that have no locks and references are freed. Ones that have locks and/or references are migrated to another node. The first task in migration is finding a target. Currently we scan the lock resource and find one node that either has a lock or a reference. This is not very efficient in a parallel umount case as we might end up migrating the lock resource to a node which itself may have to migrate it to a third node. The patch scans the dlm->exit_domain_map to ensure the target node is not leaving the domain. If no valid target node is found, o2dlm does not migrate the resource but instead waits for the unlock and deref messages that will allow it to free the resource. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <jlbec@evilplan.org>
2011-05-25ocfs2/dlm: Add new dlm message DLM_BEGIN_EXIT_DOMAIN_MSGSunil Mushran
This patch adds a new dlm message DLM_BEGIN_EXIT_DOMAIN_MSG and ups the dlm protocol to 1.2. o2dlm sends this new message in dlm_unregister_domain() to mark the beginning of the exit domain. This message is sent to all nodes in the domain. Currently o2dlm has no way of informing other nodes of its impending exit. This information is useful as the other nodes could disregard the exiting node in certain operations. For example, in resource migration. If two or more nodes were umounting in parallel, it would be more efficient if o2dlm were to choose a non-exiting node to be the new master node rather than an exiting one. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Reviewed-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <jlbec@evilplan.org>
2011-05-25Ocfs2/move_extents: Set several trivial constraints for threshold.Tristan Ye
The threshold should be greater than clustersize and less than i_size. Signed-off-by: Tristan Ye <tristan.ye@oracle.com>
2011-05-25Ocfs2/move_extents: Let defrag handle partial extent moving.Tristan Ye
We're going to support partial extent moving, which may split entire extent movement into pieces to compromise the insuffice allocations, it eases the 'ENSPC' pain and makes the whole moving much less likely to fail, the downside is it may make the fs even more fragmented before moving, just let the userspace make a trade-off here. Signed-off-by: Tristan Ye <tristan.ye@oracle.com>
2011-05-25Ocfs2/move_extents: move/defrag extents within a certain range.Tristan Ye
the basic logic of moving extents for a file is pretty like punching-hole sequence, walk the extents within the range as user specified, calculating an appropriate len to defrag/move, then let ocfs2_defrag/move_extent() to do the actual moving. This func ends up setting 'OCFS2_MOVE_EXT_FL_COMPLETE' to userpace if operation gets done successfully. Signed-off-by: Tristan Ye <tristan.ye@oracle.com>
2011-05-25Ocfs2/move_extents: helper to calculate the defraging length in one run.Tristan Ye
The helper is to calculate the defrag length in one run according to a threshold, it will proceed doing defragmentation until the threshold was meet, and skip a LARGE extent if any. Signed-off-by: Tristan Ye <tristan.ye@oracle.com>
2011-05-25Ocfs2/move_extents: move entire/partial extent.Tristan Ye
ocfs2_move_extent() logic will validate the goal_offset_in_block, where extents to be moved, what's more, it also compromises a bit to probe the appropriate region around given goal_offset when the original goal is not able to fit the movement. Signed-off-by: Tristan Ye <tristan.ye@oracle.com>
2011-05-25Ocfs2/move_extents: helpers to update the group descriptor and global bitmap ↵Tristan Ye
inode. These helpers were actually borrowed from alloc.c, which may be publicized later. Signed-off-by: Tristan Ye <tristan.ye@oracle.com>
2011-05-25Ocfs2/move_extents: helper to probe a proper region to move in an alloc group.Tristan Ye
Before doing the movement of extents, we'd better probe the alloc group from 'goal_blk' for searching a contiguous region to fit the wanted movement, we even will have a best-effort try by compromising to a threshold around the given goal. Signed-off-by: Tristan Ye <tristan.ye@oracle.com>
2011-05-25Ocfs2/move_extents: helper to validate and adjust moving goal.Tristan Ye
First best-effort attempt to validate and adjust the goal (physical address in block), while it can't guarantee later operation can succeed all the time since global_bitmap may change a bit over time. Signed-off-by: Tristan Ye <tristan.ye@oracle.com>
2011-05-25Ocfs2/move_extents: find the victim alloc group, where the given #blk fits.Tristan Ye
This function tries locate the right alloc group, where a given physical block resides, it returns the caller a buffer_head of victim group descriptor, and also the offset of block in this group, by passing the block number. Signed-off-by: Tristan Ye <tristan.ye@oracle.com>
2011-05-25Ocfs2/move_extents: defrag a range of extent.Tristan Ye
It's a relatively complete function to accomplish defragmentation for entire or partial extent, one journal handle was kept during the operation, it was logically doing one more thing than ocfs2_move_extent() acutally, yes, it's claiming the new clusters itself;-) Signed-off-by: Tristan Ye <tristan.ye@oracle.com>
2011-05-25Ocfs2/move_extents: move a range of extent.Tristan Ye
The moving range of __ocfs2_move_extent() was within one extent always, it consists following parts: 1. Duplicates the clusters in pages to new_blkoffset, where extent to be moved. 2. Split the original extent with new extent, coalecse the nearby extents if possible. 3. Append old clusters to truncate log, or decrease_refcount if the extent was refcounted. Signed-off-by: Tristan Ye <tristan.ye@oracle.com>
2011-05-25Ocfs2/move_extents: lock allocators and reserve metadata blocks and data ↵Tristan Ye
clusters for extents moving. ocfs2_lock_allocators_move_extents() was like the common ocfs2_lock_allocators(), to lock metadata and data alloctors during extents moving, reserve appropriate metadata blocks and data clusters, also performa a best- effort to calculate the credits for journal transaction in one run of movement. Signed-off-by: Tristan Ye <tristan.ye@oracle.com>
2011-05-25Ocfs2/move_extents: Add basic framework and source files for extent moving.Tristan Ye
Adding new files move_extents.[c|h] and fill it with nothing but only a context structure. Signed-off-by: Tristan Ye <tristan.ye@oracle.com>
2011-05-25Ocfs2/move_extents: Adding new ioctl code 'OCFS2_IOC_MOVE_EXT' to ocfs2.Tristan Ye
Patch also manages to add a manipulative struture for this ioctl. Signed-off-by: Tristan Ye <tristan.ye@oracle.com>
2011-05-25Ocfs2/refcounttree: Publicize couple of funcs from refcounttree.cTristan Ye
The original goal of commonizing these funcs is to benefit defraging/extent_moving codes in the future, based on the fact that reflink and defragmentation having the same Copy-On-Wrtie mechanism. Signed-off-by: Tristan Ye <tristan.ye@oracle.com>
2011-05-25Ocfs2: Add a new code 'OCFS2_INFO_FREEFRAG' for o2info ioctl.Tristan Ye
This new code is a bit more complicated than former ones, the goal is to show user all statistics required to take a deep insight into filesystem on how the disk is being fragmentaed. The goal is achieved by scaning global bitmap from (cluster)group to group to figure out following factors in the filesystem: - How many free chunks in a fixed size as user requested. - How many real free chunks in all size. - Min/Max/Avg size(in) clusters of free chunks. - How do free chunks distribute(in size) in terms of a histogram, just like following: --------------------------------------------------------- Extent Size Range : Free extents Free Clusters Percent 32K... 64K- : 1 1 0.00% 1M... 2M- : 9 288 0.03% 8M... 16M- : 2 831 0.09% 32M... 64M- : 1 2047 0.23% 128M... 256M- : 1 8191 0.92% 256M... 512M- : 2 21706 2.43% 512M... 1024M- : 27 858623 96.29% --------------------------------------------------------- Userspace ioctl() call eventually gets the above info returned by passing a 'struct ocfs2_info_freefrag' with the chunk_size being specified first. Signed-off-by: Tristan Ye <tristan.ye@oracle.com>
2011-05-25Ocfs2: Add a new code 'OCFS2_INFO_FREEINODE' for o2info ioctl.Tristan Ye
The new code is dedicated to calculate free inodes number of all inode_allocs, then return the info to userpace in terms of an array. Specially, flag 'OCFS2_INFO_FL_NON_COHERENT', manipulated by '--cluster-coherent' from userspace, is now going to be involved. setting the flag on means no cluster coherency considered, usually, userspace tools choose none-coherency strategy by default for the sake of performace. Signed-off-by: Tristan Ye <tristan.ye@oracle.com>
2011-05-25Ocfs2: Using inline funcs to set/clear *FILLED* flags in info handler.Tristan Ye
It just removes some macros for the sake of typechecking gains. Signed-off-by: Tristan Ye <tristan.ye@oracle.com>
2011-05-23ocfs2: change incorrect 'extern' keyword to 'static' in dlmfsRobin Dong
Change function param_set_dlmfs_capabilities from 'extern' to 'static' since function param_get_dlmfs_capabilities is also 'static'. Signed-off-by: Robin Dong <sanbai@taobao.com> Signed-off-by: Joel Becker <jlbec@evilplan.org>
2011-05-23ocfs2/dlm: dlm_is_lockres_migrateable() returns booleanSunil Mushran
Patch cleans up the gunk added by commit 388c4bcb4e63e88fb1f312a2f5f9eb2623afcf5b. dlm_is_lockres_migrateable() now returns 1 if lockresource is deemed migrateable and 0 if not. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <jlbec@evilplan.org>
2011-05-23ocfs2: Add trace event for trim.Tao Ma
Add the corresponding trace event for trim. Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: Joel Becker <jlbec@evilplan.org>
2011-05-23ocfs2: Add FITRIM ioctl.Tao Ma
Add the corresponding ioctl function for FITRIM. Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: Joel Becker <jlbec@evilplan.org>
2011-05-23ocfs2: Add ocfs2_trim_fs for SSD trim support.Tao Ma
Add ocfs2_trim_fs to support trimming freed clusters in the volume. A range will be given and all the freed clusters greater than minlen will be discarded to the block layer. Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: Joel Becker <jlbec@evilplan.org>