aboutsummaryrefslogtreecommitdiff
path: root/fs/dcache.c
AgeCommit message (Collapse)Author
2018-12-20Merge tag 'v4.9.127' into v4.9-rtJulia Cartwright
This is the 4.9.127 stable release
2018-12-20Merge tag 'v4.9.120' into v4.9-rtJulia Cartwright
This is the 4.9.120 stable release
2018-09-15fs/dcache.c: fix kmemcheck splat at take_dentry_name_snapshot()Tetsuo Handa
[ Upstream commit 6cd00a01f0c1ae6a852b09c59b8dd55cc6c35d1d ] Since only dentry->d_name.len + 1 bytes out of DNAME_INLINE_LEN bytes are initialized at __d_alloc(), we can't copy the whole size unconditionally. WARNING: kmemcheck: Caught 32-bit read from uninitialized memory (ffff8fa27465ac50) 636f6e66696766732e746d70000000000010000000000000020000000188ffff i i i i i i i i i i i i i u u u u u u u u u u i i i i i u u u u ^ RIP: 0010:take_dentry_name_snapshot+0x28/0x50 RSP: 0018:ffffa83000f5bdf8 EFLAGS: 00010246 RAX: 0000000000000020 RBX: ffff8fa274b20550 RCX: 0000000000000002 RDX: ffffa83000f5be40 RSI: ffff8fa27465ac50 RDI: ffffa83000f5be60 RBP: ffffa83000f5bdf8 R08: ffffa83000f5be48 R09: 0000000000000001 R10: ffff8fa27465ac00 R11: ffff8fa27465acc0 R12: ffff8fa27465ac00 R13: ffff8fa27465acc0 R14: 0000000000000000 R15: 0000000000000000 FS: 00007f79737ac8c0(0000) GS:ffffffff8fc30000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffff8fa274c0b000 CR3: 0000000134aa7002 CR4: 00000000000606f0 take_dentry_name_snapshot+0x28/0x50 vfs_rename+0x128/0x870 SyS_rename+0x3b2/0x3d0 entry_SYSCALL_64_fastpath+0x1a/0xa4 0xffffffffffffffff Link: http://lkml.kernel.org/r/201709131912.GBG39012.QMJLOVFSFFOOtH@I-love.SAKURA.ne.jp Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Vegard Nossum <vegard.nossum@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-15make sure that __dentry_kill() always invalidates d_seq, unhashed or notAl Viro
commit 4c0d7cd5c8416b1ef41534d19163cb07ffaa03ab upstream. RCU pathwalk relies upon the assumption that anything that changes ->d_inode of a dentry will invalidate its ->d_seq. That's almost true - the one exception is that the final dput() of already unhashed dentry does *not* touch ->d_seq at all. Unhashing does, though, so for anything we'd found by RCU dcache lookup we are fine. Unfortunately, we can *start* with an unhashed dentry or jump into it. We could try and be careful in the (few) places where that could happen. Or we could just make the final dput() invalidate the damn thing, unhashed or not. The latter is much simpler and easier to backport, so let's do it that way. Reported-by: "Dae R. Jeong" <threeearcat@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-15root dentries need RCU-delayed freeingAl Viro
commit 90bad5e05bcdb0308cfa3d3a60f5c0b9c8e2efb3 upstream. Since mountpoint crossing can happen without leaving lazy mode, root dentries do need the same protection against having their memory freed without RCU delay as everything else in the tree. It's partially hidden by RCU delay between detaching from the mount tree and dropping the vfsmount reference, but the starting point of pathwalk can be on an already detached mount, in which case umount-caused RCU delay has already passed by the time the lazy pathwalk grabs rcu_read_lock(). If the starting point happens to be at the root of that vfsmount *and* that vfsmount covers the entire filesystem, we get trouble. Fixes: 48a066e72d97 ("RCU'd vsfmounts") Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-03Merge tag 'v4.9.104' into v4.9-rtJulia Cartwright
This is the 4.9.104 stable release
2018-05-30fs: dcache: Use READ_ONCE when accessing i_dir_seqWill Deacon
[ Upstream commit 8cc07c808c9d595e81cbe5aad419b7769eb2e5c9 ] i_dir_seq is subject to concurrent modification by a cmpxchg or store-release operation, so ensure that the relaxed access in d_alloc_parallel uses READ_ONCE. Reported-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-30fs: dcache: Avoid livelock between d_alloc_parallel and __d_addWill Deacon
[ Upstream commit 015555fd4d2930bc0c86952c46ad88b3392f66e4 ] If d_alloc_parallel runs concurrently with __d_add, it is possible for d_alloc_parallel to continuously retry whilst i_dir_seq has been incremented to an odd value by __d_add: CPU0: __d_add n = start_dir_add(dir); cmpxchg(&dir->i_dir_seq, n, n + 1) == n CPU1: d_alloc_parallel retry: seq = smp_load_acquire(&parent->d_inode->i_dir_seq) & ~1; hlist_bl_lock(b); bit_spin_lock(0, (unsigned long *)b); // Always succeeds CPU0: __d_lookup_done(dentry) hlist_bl_lock bit_spin_lock(0, (unsigned long *)b); // Never succeeds CPU1: if (unlikely(parent->d_inode->i_dir_seq != seq)) { hlist_bl_unlock(b); goto retry; } Since the simple bit_spin_lock used to implement hlist_bl_lock does not provide any fairness guarantees, then CPU1 can starve CPU0 of the lock and prevent it from reaching end_dir_add(dir), therefore CPU1 cannot exit its retry loop because the sequence number always has the bottom bit set. This patch resolves the livelock by not taking hlist_bl_lock in d_alloc_parallel if the sequence counter is odd, since any subsequent masked comparison with i_dir_seq will fail anyway. Cc: Peter Zijlstra <peterz@infradead.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Reported-by: Naresh Madhusudana <naresh.madhusudana@arm.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-30do d_instantiate/unlock_new_inode combinations safelyAl Viro
commit 1e2e547a93a00ebc21582c06ca3c6cfea2a309ee upstream. For anything NFS-exported we do _not_ want to unlock new inode before it has grown an alias; original set of fixes got the ordering right, but missed the nasty complication in case of lockdep being enabled - unlock_new_inode() does lockdep_annotate_inode_mutex_key(inode) which can only be done before anyone gets a chance to touch ->i_mutex. Unfortunately, flipping the order and doing unlock_new_inode() before d_instantiate() opens a window when mkdir can race with open-by-fhandle on a guessed fhandle, leading to multiple aliases for a directory inode and all the breakage that follows from that. Correct solution: a new primitive (d_instantiate_new()) combining these two in the right order - lockdep annotate, then d_instantiate(), then the rest of unlock_new_inode(). All combinations of d_instantiate() with unlock_new_inode() should be converted to that. Cc: stable@kernel.org # 2.6.29 and later Tested-by: Mike Marshall <hubcap@omnibond.com> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-08Merge tag 'v4.9.94' into v4.9-rtJulia Cartwright
This is the 4.9.94 stable release
2018-05-08Merge tag 'v4.9.89' into v4.9-rtJulia Cartwright
This is the 4.9.89 stable release
2018-04-13VFS: close race between getcwd() and d_move()NeilBrown
[ Upstream commit 61647823aa920e395afcce4b57c32afb51456cab ] d_move() will call __d_drop() and then __d_rehash() on the dentry being moved. This creates a small window when the dentry appears to be unhashed. Many tests of d_unhashed() are made under ->d_lock and so are safe from racing with this window, but some aren't. In particular, getcwd() calls d_unlinked() (which calls d_unhashed()) without d_lock protection, so it can race. This races has been seen in practice with lustre, which uses d_move() as part of name lookup. See: https://jira.hpdd.intel.com/browse/LU-9735 It could race with a regular rename(), and result in ENOENT instead of either the 'before' or 'after' name. The race can be demonstrated with a simple program which has two threads, one renaming a directory back and forth while another calls getcwd() within that directory: it should never fail, but does. See: https://patchwork.kernel.org/patch/9455345/ We could fix this race by taking d_lock and rechecking when d_unhashed() reports true. Alternately when can remove the window, which is the approach this patch takes. ___d_drop() is introduce which does *not* clear d_hash.pprev so the dentry still appears to be hashed. __d_drop() calls ___d_drop(), then clears d_hash.pprev. __d_move() now uses ___d_drop() and only clears d_hash.pprev when not rehashing. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-22lock_parent() needs to recheck if dentry got __dentry_kill'ed under itAl Viro
commit 3b821409632ab778d46e807516b457dfa72736ed upstream. In case when dentry passed to lock_parent() is protected from freeing only by the fact that it's on a shrink list and trylock of parent fails, we could get hit by __dentry_kill() (and subsequent dentry_kill(parent)) between unlocking dentry and locking presumed parent. We need to recheck that dentry is alive once we lock both it and parent *and* postpone rcu_read_unlock() until after that point. Otherwise we could return a pointer to struct dentry that already is rcu-scheduled for freeing, with ->d_lock held on it; caller's subsequent attempt to unlock it can end up with memory corruption. Cc: stable@vger.kernel.org # 3.12+, counting backports Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-20fs/dcache: disable preemption on i_dir_seq's write sideSebastian Andrzej Siewior
i_dir_seq is an opencoded seqcounter. Based on the code it looks like we could have two writers in parallel despite the fact that the d_lock is held. The problem is that during the write process on RT the preemption is still enabled and if this process is interrupted by a reader with RT priority then we lock up. To avoid that lock up I am disabling the preemption during the update. The rename of i_dir_seq is here to ensure to catch new write sides in future. Cc: stable-rt@vger.kernel.org Reported-by: Oleg.Karfich@wago.com Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-09-05Merge tag 'v4.9.41' into v4.9-rtSteven Rostedt (VMware)
This is the 4.9.41 stable release
2017-08-06dentry name snapshotsAl Viro
commit 49d31c2f389acfe83417083e1208422b4091cd9e upstream. take_dentry_name_snapshot() takes a safe snapshot of dentry name; if the name is a short one, it gets copied into caller-supplied structure, otherwise an extra reference to external name is grabbed (those are never modified). In either case the pointer to stable string is stored into the same structure. dentry must be held by the caller of take_dentry_name_snapshot(), but may be freely dropped afterwards - the snapshot will stay until destroyed by release_dentry_name_snapshot(). Intended use: struct name_snapshot s; take_dentry_name_snapshot(&s, dentry); ... access s.name ... release_dentry_name_snapshot(&s); Replaces fsnotify_oldname_...(), gets used in fsnotify to obtain the name to pass down with event. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-02Merge tag 'v4.9.39' into v4.9-rtSteven Rostedt (VMware)
This is the 4.9.39 stable release Conflicts: kernel/trace/trace.c
2017-07-21fs/dcache.c: fix spin lockup issue on nlru->lockSahitya Tummala
commit b17c070fb624cf10162cf92ea5e1ec25cd8ac176 upstream. __list_lru_walk_one() acquires nlru spin lock (nlru->lock) for longer duration if there are more number of items in the lru list. As per the current code, it can hold the spin lock for upto maximum UINT_MAX entries at a time. So if there are more number of items in the lru list, then "BUG: spinlock lockup suspected" is observed in the below path: spin_bug+0x90 do_raw_spin_lock+0xfc _raw_spin_lock+0x28 list_lru_add+0x28 dput+0x1c8 path_put+0x20 terminate_walk+0x3c path_lookupat+0x100 filename_lookup+0x6c user_path_at_empty+0x54 SyS_faccessat+0xd0 el0_svc_naked+0x24 This nlru->lock is acquired by another CPU in this path - d_lru_shrink_move+0x34 dentry_lru_isolate_shrink+0x48 __list_lru_walk_one.isra.10+0x94 list_lru_walk_node+0x40 shrink_dcache_sb+0x60 do_remount_sb+0xbc do_emergency_remount+0xb0 process_one_work+0x228 worker_thread+0x2e0 kthread+0xf4 ret_from_fork+0x10 Fix this lockup by reducing the number of entries to be shrinked from the lru list to 1024 at once. Also, add cond_resched() before processing the lru list again. Link: http://marc.info/?t=149722864900001&r=1&w=2 Link: http://lkml.kernel.org/r/1498707575-2472-1-git-send-email-stummala@codeaurora.org Signed-off-by: Sahitya Tummala <stummala@codeaurora.org> Suggested-by: Jan Kara <jack@suse.cz> Suggested-by: Vladimir Davydov <vdavydov.dev@gmail.com> Acked-by: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Alexander Polakov <apolyakov@beget.ru> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-30Merge tag 'v4.9.6' into linux-4.9.y-rtSebastian Andrzej Siewior
This is the 4.9.6 stable release Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2017-01-19mnt: Protect the mountpoint hashtable with mount_lockEric W. Biederman
commit 3895dbf8985f656675b5bde610723a29cbce3fa7 upstream. Protecting the mountpoint hashtable with namespace_sem was sufficient until a call to umount_mnt was added to mntput_no_expire. At which point it became possible for multiple calls of put_mountpoint on the same hash chain to happen on the same time. Kristen Johansen <kjlx@templeofstupid.com> reported: > This can cause a panic when simultaneous callers of put_mountpoint > attempt to free the same mountpoint. This occurs because some callers > hold the mount_hash_lock, while others hold the namespace lock. Some > even hold both. > > In this submitter's case, the panic manifested itself as a GP fault in > put_mountpoint() when it called hlist_del() and attempted to dereference > a m_hash.pprev that had been poisioned by another thread. Al Viro observed that the simple fix is to switch from using the namespace_sem to the mount_lock to protect the mountpoint hash table. I have taken Al's suggested patch moved put_mountpoint in pivot_root (instead of taking mount_lock an additional time), and have replaced new_mountpoint with get_mountpoint a function that does the hash table lookup and addition under the mount_lock. The introduction of get_mounptoint ensures that only the mount_lock is needed to manipulate the mountpoint hashtable. d_set_mounted is modified to only set DCACHE_MOUNTED if it is not already set. This allows get_mountpoint to use the setting of DCACHE_MOUNTED to ensure adding a struct mountpoint for a dentry happens exactly once. Fixes: ce07d891a089 ("mnt: Honor MNT_LOCKED when detaching mounts") Reported-by: Krister Johansen <kjlx@templeofstupid.com> Suggested-by: Al Viro <viro@ZenIV.linux.org.uk> Acked-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-12-23fs/dcache: use swait_queue instead of waitqueueSebastian Andrzej Siewior
__d_lookup_done() invokes wake_up_all() while holding a hlist_bl_lock() which disables preemption. As a workaround convert it to swait. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2016-12-23fs: dcache: Use cpu_chill() in trylock loopsThomas Gleixner
Retry loops on RT might loop forever when the modifying side was preempted. Use cpu_chill() instead of cpu_relax() to let the system make progress. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-12-23fs/dcache: init in_lookup_hashtableSebastian Andrzej Siewior
in_lookup_hashtable was introduced in commit 94bdd655caba ("parallel lookups machinery, part 3") and never initialized but since it is in the data it is all zeros. But we need this for -RT. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2016-08-07Merge branch 'for-linus-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull more vfs updates from Al Viro: "Assorted cleanups and fixes. In the "trivial API change" department - ->d_compare() losing 'parent' argument" * 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: cachefiles: Fix race between inactivating and culling a cache object 9p: use clone_fid() 9p: fix braino introduced in "9p: new helper - v9fs_parent_fid()" vfs: make dentry_needs_remove_privs() internal vfs: remove file_needs_remove_privs() vfs: fix deadlock in file_remove_privs() on overlayfs get rid of 'parent' argument of ->d_compare() cifs, msdos, vfat, hfs+: don't bother with parent in ->d_compare() affs ->d_compare(): don't bother with ->d_inode fold _d_rehash() and __d_rehash() together fold dentry_rcuwalk_invalidate() into its only remaining caller
2016-08-06Merge branch 'work.const-qstr' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull qstr constification updates from Al Viro: "Fairly self-contained bunch - surprising lot of places passes struct qstr * as an argument when const struct qstr * would suffice; it complicates analysis for no good reason. I'd prefer to feed that separately from the assorted fixes (those are in #for-linus and with somewhat trickier topology)" * 'work.const-qstr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: qstr: constify instances in adfs qstr: constify instances in lustre qstr: constify instances in f2fs qstr: constify instances in ext2 qstr: constify instances in vfat qstr: constify instances in procfs qstr: constify instances in fuse qstr constify instances in fs/dcache.c qstr: constify instances in nfs qstr: constify instances in ocfs2 qstr: constify instances in autofs4 qstr: constify instances in hfs qstr: constify instances in hfsplus qstr: constify instances in logfs qstr: constify dentry_init_security
2016-07-31get rid of 'parent' argument of ->d_compare()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-07-29fold _d_rehash() and __d_rehash() togetherAl Viro
The only place where we feed to __d_rehash() something other than d_hash(dentry->d_name.hash) is __d_move(), where we give it d_hash of another dentry. Postpone rehashing until we'd switched the names and we are rid of that exception, along with the need to keep _d_rehash() and __d_rehash() separate. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-07-29fold dentry_rcuwalk_invalidate() into its only remaining callerAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-07-28Merge branch 'work.misc' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs updates from Al Viro: "Assorted cleanups and fixes. Probably the most interesting part long-term is ->d_init() - that will have a bunch of followups in (at least) ceph and lustre, but we'll need to sort the barrier-related rules before it can get used for really non-trivial stuff. Another fun thing is the merge of ->d_iput() callers (dentry_iput() and dentry_unlink_inode()) and a bunch of ->d_compare() ones (all except the one in __d_lookup_lru())" * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (26 commits) fs/dcache.c: avoid soft-lockup in dput() vfs: new d_init method vfs: Update lookup_dcache() comment bdev: get rid of ->bd_inodes Remove last traces of ->sync_page new helper: d_same_name() dentry_cmp(): use lockless_dereference() instead of smp_read_barrier_depends() vfs: clean up documentation vfs: document ->d_real() vfs: merge .d_select_inode() into .d_real() unify dentry_iput() and dentry_unlink_inode() binfmt_misc: ->s_root is not going anywhere drop redundant ->owner initializations ufs: get rid of redundant checks orangefs: constify inode_operations missed comment updates from ->direct_IO() prototype change file_inode(f)->i_mapping is f->f_mapping trim fsnotify hooks a bit 9p: new helper - v9fs_parent_fid() debugfs: ->d_parent is never NULL or negative ...
2016-07-28Merge branch 'salted-string-hash'Linus Torvalds
This changes the vfs dentry hashing to mix in the parent pointer at the _beginning_ of the hash, rather than at the end. That actually improves both the hash and the code generation, because we can move more of the computation to the "static" part of the dcache setup, and do less at lookup runtime. It turns out that a lot of other hash users also really wanted to mix in a base pointer as a 'salt' for the hash, and so the slightly extended interface ends up working well for other cases too. Users that want a string hash that is purely about the string pass in a 'salt' pointer of NULL. * merge branch 'salted-string-hash': fs/dcache.c: Save one 32-bit multiply in dcache lookup vfs: make the string hashes salt the hash
2016-07-24fs/dcache.c: avoid soft-lockup in dput()Wei Fang
We triggered soft-lockup under stress test which open/access/write/close one file concurrently on more than five different CPUs: WARN: soft lockup - CPU#0 stuck for 11s! [who:30631] ... [<ffffffc0003986f8>] dput+0x100/0x298 [<ffffffc00038c2dc>] terminate_walk+0x4c/0x60 [<ffffffc00038f56c>] path_lookupat+0x5cc/0x7a8 [<ffffffc00038f780>] filename_lookup+0x38/0xf0 [<ffffffc000391180>] user_path_at_empty+0x78/0xd0 [<ffffffc0003911f4>] user_path_at+0x1c/0x28 [<ffffffc00037d4fc>] SyS_faccessat+0xb4/0x230 ->d_lock trylock may failed many times because of concurrently operations, and dput() may execute a long time. Fix this by replacing cpu_relax() with cond_resched(). dput() used to be sleepable, so make it sleepable again should be safe. Cc: <stable@vger.kernel.org> Signed-off-by: Wei Fang <fangwei1@huawei.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-07-24vfs: new d_init methodMiklos Szeredi
Allow filesystem to initialize dentry at allocation time. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-07-24Merge branch 'test.d_iput' into work.miscAl Viro
2016-07-20qstr constify instances in fs/dcache.cAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-06-30Merge branch 'd_real' of ↵Al Viro
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs into work.misc
2016-06-30new helper: d_same_name()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-06-30dentry_cmp(): use lockless_dereference() instead of smp_read_barrier_depends()He Kuang
lockless_dereference() was added which can be used in place of hard-coding smp_read_barrier_depends(). Signed-off-by: He Kuang <hekuang@huawei.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-06-30Merge branch 'for-linus' into work.miscAl Viro
2016-06-30vfs: merge .d_select_inode() into .d_real()Miklos Szeredi
The two methods essentially do the same: find the real dentry/inode belonging to an overlay dentry. The difference is in the usage: vfs_open() uses ->d_select_inode() and expects the function to perform copy-up if necessary based on the open flags argument. file_dentry() uses ->d_real() passing in the overlay dentry as well as the underlying inode. vfs_rename() uses ->d_select_inode() but passes zero flags. ->d_real() with a zero inode would have worked just as well here. This patch merges the functionality of ->d_select_inode() into ->d_real() by adding an 'open_flags' argument to the latter. [Al Viro] Make the signature of d_real() match that of ->d_real() again. And constify the inode argument, while we are at it. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-06-20fix idiotic braino in d_alloc_parallel()Al Viro
Check for d_unhashed() while searching in in-lookup hash was absolutely wrong. Worse, it masked a deadlock on dget() done under bitlock that nests inside ->d_lock. Thanks to J. R. Okajima for spotting it. Spotted-by: "J. R. Okajima" <hooanon05g@gmail.com> Wearing-brown-paperbag: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-06-11fs/dcache.c: Save one 32-bit multiply in dcache lookupGeorge Spelvin
Noe that we're mixing in the parent pointer earlier, we don't need to use hash_32() to mix its bits. Instead, we can just take the msbits of the hash value directly. For those applications which use the partial_name_hash(), move the multiply to end_name_hash. Signed-off-by: George Spelvin <linux@sciencehorizons.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-10vfs: make the string hashes salt the hashLinus Torvalds
We always mixed in the parent pointer into the dentry name hash, but we did it late at lookup time. It turns out that we can simplify that lookup-time action by salting the hash with the parent pointer early instead of late. A few other users of our string hashes also wanted to mix in their own pointers into the hash, and those are updated to use the same mechanism. Hash users that don't have any particular initial salt can just use the NULL pointer as a no-salt. Cc: Vegard Nossum <vegard.nossum@oracle.com> Cc: George Spelvin <linux@sciencehorizons.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-10much milder d_walk() raceAl Viro
d_walk() relies upon the tree not getting rearranged under it without rename_lock being touched. And we do grab rename_lock around the places that change the tree topology. Unfortunately, branch reordering is just as bad from d_walk() POV and we have two places that do it without touching rename_lock - one in handling of cursors (for ramfs-style directories) and another in autofs. autofs one is a separate story; this commit deals with the cursors. * mark cursor dentries explicitly at allocation time * make __dentry_kill() leave ->d_child.next pointing to the next non-cursor sibling, making sure that it won't be moved around unnoticed before the parent is relocked on ascend-to-parent path in d_walk(). * make d_walk() skip cursors explicitly; strictly speaking it's not necessary (all callbacks we pass to d_walk() are no-ops on cursors), but it makes analysis easier. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-06-07fix d_walk()/non-delayed __d_free() raceAl Viro
Ascend-to-parent logics in d_walk() depends on all encountered child dentries not getting freed without an RCU delay. Unfortunately, in quite a few cases it is not true, with hard-to-hit oopsable race as the result. Fortunately, the fix is simiple; right now the rule is "if it ever been hashed, freeing must be delayed" and changing it to "if it ever had a parent, freeing must be delayed" closes that hole and covers all cases the old rule used to cover. Moreover, pipes and sockets remain _not_ covered, so we do not introduce RCU delay in the cases which are the reason for having that delay conditional in the first place. Cc: stable@vger.kernel.org # v3.2+ (and watch out for __d_materialise_dentry()) Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-29unify dentry_iput() and dentry_unlink_inode()Al Viro
There is a lot of duplication between dentry_unlink_inode() and dentry_iput(). The only real difference is that dentry_unlink_inode() bumps ->d_seq and dentry_iput() doesn't. The argument of the latter is known to have been unhashed, so anybody who might've found it in RCU lookup would already be doomed to a ->d_seq mismatch. And we want to avoid pointless smp_rmb() there. This patch makes dentry_unlink_inode() bump ->d_seq only for hashed dentries. It's safe (d_delete() calls that sucker only if we are holding the only reference to dentry, so rehash is not going to happen) and it allows to use dentry_unlink_inode() in __dentry_kill() and get rid of dentry_iput(). The interesting question here is profiling; it *is* a hot path, and extra conditional jumps in there might or might not be painful. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-29trim fsnotify hooks a bitAl Viro
fsnotify_d_move()/__fsnotify_d_instantiate()/__fsnotify_update_dcache_flags() are identical to each other, regardless of the config. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-28Merge branch 'hash' of git://ftp.sciencehorizons.net/linuxLinus Torvalds
Pull string hash improvements from George Spelvin: "This series does several related things: - Makes the dcache hash (fs/namei.c) useful for general kernel use. (Thanks to Bruce for noticing the zero-length corner case) - Converts the string hashes in <linux/sunrpc/svcauth.h> to use the above. - Avoids 64-bit multiplies in hash_64() on 32-bit platforms. Two 32-bit multiplies will do well enough. - Rids the world of the bad hash multipliers in hash_32. This finishes the job started in commit 689de1d6ca95 ("Minimal fix-up of bad hashing behavior of hash_64()") The vast majority of Linux architectures have hardware support for 32x32-bit multiply and so derive no benefit from "simplified" multipliers. The few processors that do not (68000, h8/300 and some models of Microblaze) have arch-specific implementations added. Those patches are last in the series. - Overhauls the dcache hash mixing. The patch in commit 0fed3ac866ea ("namei: Improve hash mixing if CONFIG_DCACHE_WORD_ACCESS") was an off-the-cuff suggestion. Replaced with a much more careful design that's simultaneously faster and better. (My own invention, as there was noting suitable in the literature I could find. Comments welcome!) - Modify the hash_name() loop to skip the initial HASH_MIX(). This would let us salt the hash if we ever wanted to. - Sort out partial_name_hash(). The hash function is declared as using a long state, even though it's truncated to 32 bits at the end and the extra internal state contributes nothing to the result. And some callers do odd things: - fs/hfs/string.c only allocates 32 bits of state - fs/hfsplus/unicode.c uses it to hash 16-bit unicode symbols not bytes - Modify bytemask_from_count to handle inputs of 1..sizeof(long) rather than 0..sizeof(long)-1. This would simplify users other than full_name_hash" Special thanks to Bruce Fields for testing and finding bugs in v1. (I learned some humbling lessons about "obviously correct" code.) On the arch-specific front, the m68k assembly has been tested in a standalone test harness, I've been in contact with the Microblaze maintainers who mostly don't care, as the hardware multiplier is never omitted in real-world applications, and I haven't heard anything from the H8/300 world" * 'hash' of git://ftp.sciencehorizons.net/linux: h8300: Add <asm/hash.h> microblaze: Add <asm/hash.h> m68k: Add <asm/hash.h> <linux/hash.h>: Add support for architecture-specific functions fs/namei.c: Improve dcache hash function Eliminate bad hash multipliers from hash_32() and hash_64() Change hash_64() return value to 32 bits <linux/sunrpc/svcauth.h>: Define hash_str() in terms of hashlen_string() fs/namei.c: Add hashlen_string() function Pull out string hash to <linux/stringhash.h>
2016-05-28fs/namei.c: Add hashlen_string() functionGeorge Spelvin
We'd like to make more use of the highly-optimized dcache hash functions throughout the kernel, rather than have every subsystem create its own, and a function that hashes basic null-terminated strings is required for that. (The name is to emphasize that it returns both hash and length.) It's actually useful in the dcache itself, specifically d_alloc_name(). Other uses in the next patch. full_name_hash() is also tweaked to make it more generally useful: 1) Take a "char *" rather than "unsigned char *" argument, to be consistent with hash_name(). 2) Handle zero-length inputs. If we want more callers, we don't want to make them worry about corner cases. Signed-off-by: George Spelvin <linux@sciencehorizons.net>
2016-05-18Merge branch 'work.misc' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull misc vfs cleanups from Al Viro: "Assorted cleanups and fixes all over the place" * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: coredump: only charge written data against RLIMIT_CORE coredump: get rid of coredump_params->written ecryptfs_lookup(): try either only encrypted or plaintext name ecryptfs: avoid multiple aliases for directories bpf: reject invalid names right in ->lookup() __d_alloc(): treat NULL name as QSTR("/", 1) mtd: switch ubi_open_volume_path() to vfs_stat() mtd: switch open_mtd_by_chdev() to use of vfs_stat()
2016-05-02parallel lookups: actual switch to rwsemAl Viro
ta-da! The main issue is the lack of down_write_killable(), so the places like readdir.c switched to plain inode_lock(); once killable variants of rwsem primitives appear, that'll be dealt with. lockdep side also might need more work Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>