aboutsummaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2015-06-25Merge branch 'linux-linaro-lsk-v3.14' into linux-linaro-lsk-v3.14-androidlsk-v3.14-15.07-androidAlex Shi
Conflicts: fs/exec.c Solutions: follow commit d221244a7 sched: move no_new_privs into new atomic flags to use task_no_new_privs(current).
2015-06-10Merge branch 'linaro-android-3.14-lsk' of ↵lsk-v3.14-15.06-androidKevin Hilman
git://android.git.linaro.org/kernel/linaro-android into linux-linaro-lsk-v3.14-android * 'linaro-android-3.14-lsk' of git://android.git.linaro.org/kernel/linaro-android: fix: align closely to AOSP. sched: cpufreq: update power usage only if cpufreq_stat is enabled uid_cputime: Extends the cputime functionality to report power per uid sched: cpufreq: Adds a field cpu_power in the task_struct cpufreq_stats: Adds the fucntionality to load current values for each frequency for all the cores. New Build Breakage in branch: kernel-m-dev-tegra-flounder-3.10 @ 1960706 net/unix: sk_socket can disappear when state is unlocked selinux: enable genfscon labeling for sysfs and pstore files ext4: don't save the error information if the block device is read-only selinux: enable per-file labeling for debugfs files. cpufreq: interactive: Rearm governor timer at max freq cpufreq: interactive: Implement cluster-based min_sample_time cpufreq: interactive: Exercise hispeed settings at a policy level suspend: Return error when pending wakeup source is found. proc: uid_cputime: fix show_uid_stat permission nf: IDLETIMER: Fix broken uid field in the msg
2015-06-10Merge branch 'linux-3.14.y' of ↵lsk-v3.14-15.06Kevin Hilman
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable into linux-linaro-lsk-v3.14 * 'linux-3.14.y' of git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable: (63 commits) Linux 3.14.44 fs/binfmt_elf.c:load_elf_binary(): return -EINVAL on zero-length mappings vfs: read file_handle only once in handle_to_path drm/radeon: partially revert "fix VM_CONTEXT*_PAGE_TABLE_END_ADDR handling" drm/radeon: fix VM_CONTEXT*_PAGE_TABLE_END_ADDR handling drm/radeon: add new bonaire pci id ACPI / init: Fix the ordering of acpi_reserve_resources() sd: Disable support for 256 byte/sector disks storvsc: Set the SRB flags correctly when no data transfer is needed Input: elantech - fix semi-mt protocol for v3 HW rtlwifi: rtl8192cu: Fix kernel deadlock md/raid0: fix restore to sector variable in raid0_make_request md/raid5: don't record new size if resize_stripes fails. thermal: step_wise: Revert optimization svcrpc: fix potential GSSX_ACCEPT_SEC_CONTEXT decoding failures mm, numa: really disable NUMA balancing by default on single node machines tools/vm: fix page-flags build ARM: fix missing syscall trace exit ARM: dts: imx27: only map 4 Kbyte for fec registers mac80211: move WEP tailroom size check ...
2015-06-06fs/binfmt_elf.c:load_elf_binary(): return -EINVAL on zero-length mappingsAndrew Morton
commit 2b1d3ae940acd11be44c6eced5873d47c2e00ffa upstream. load_elf_binary() returns `retval', not `error'. Fixes: a87938b2e246b81b4fb ("fs/binfmt_elf.c: fix bug in loading of PIE binaries") Reported-by: James Hogan <james.hogan@imgtec.com> Cc: Michael Davidson <md@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-06-06vfs: read file_handle only once in handle_to_pathSasha Levin
commit 161f873b89136eb1e69477c847d5a5033239d9ba upstream. We used to read file_handle twice. Once to get the amount of extra bytes, and once to fetch the entire structure. This may be problematic since we do size verifications only after the first read, so if the number of extra bytes changes in userspace between the first and second calls, we'll have an incoherent view of file_handle. Instead, read the constant size once, and copy that over to the final structure without having to re-read it again. Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-06-06jbd2: fix r_count overflows leading to buffer overflow in journal recoveryDarrick J. Wong
commit e531d0bceb402e643a4499de40dd3fa39d8d2e43 upstream. The journal revoke block recovery code does not check r_count for sanity, which means that an evil value of r_count could result in the kernel reading off the end of the revoke table and into whatever garbage lies beyond. This could crash the kernel, so fix that. However, in testing this fix, I discovered that the code to write out the revoke tables also was not correctly checking to see if the block was full -- the current offset check is fine so long as the revoke table space size is a multiple of the record size, but this is not true when either journal_csum_v[23] are set. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-06-06ext4: check for zero length extent explicitlyEryu Guan
commit 2f974865ffdfe7b9f46a9940836c8b167342563d upstream. The following commit introduced a bug when checking for zero length extent 5946d08 ext4: check for overlapping extents in ext4_valid_extent_entries() Zero length extent could pass the check if lblock is zero. Adding the explicit check for zero length back. Signed-off-by: Eryu Guan <guaneryu@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-06-06ext4: fix NULL pointer dereference when journal restart failsLukas Czerner
commit 9d506594069355d1fb2de3f9104667312ff08ed3 upstream. Currently when journal restart fails, we'll have the h_transaction of the handle set to NULL to indicate that the handle has been effectively aborted. We handle this situation quietly in the jbd2_journal_stop() and just free the handle and exit because everything else has been done before we attempted (and failed) to restart the journal. Unfortunately there are a number of problems with that approach introduced with commit 41a5b913197c "jbd2: invalidate handle if jbd2_journal_restart() fails" First of all in ext4 jbd2_journal_stop() will be called through __ext4_journal_stop() where we would try to get a hold of the superblock by dereferencing h_transaction which in this case would lead to NULL pointer dereference and crash. In addition we're going to free the handle regardless of the refcount which is bad as well, because others up the call chain will still reference the handle so we might potentially reference already freed memory. Moreover it's expected that we'll get aborted handle as well as detached handle in some of the journalling function as the error propagates up the stack, so it's unnecessary to call WARN_ON every time we get detached handle. And finally we might leak some memory by forgetting to free reserved handle in jbd2_journal_stop() in the case where handle was detached from the transaction (h_transaction is NULL). Fix the NULL pointer dereference in __ext4_journal_stop() by just calling jbd2_journal_stop() quietly as suggested by Jan Kara. Also fix the potential memory leak in jbd2_journal_stop() and use proper handle refcounting before we attempt to free it to avoid use-after-free issues. And finally remove all WARN_ON(!transaction) from the code so that we do not get random traces when something goes wrong because when journal restart fails we will get to some of those functions. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-06-06d_walk() might skip too muchAl Viro
commit 2159184ea01e4ae7d15f2017e296d4bc82d5aeb0 upstream. when we find that a child has died while we'd been trying to ascend, we should go into the first live sibling itself, rather than its sibling. Off-by-one in question had been introduced in "deal with deadlock in d_walk()" and the fix needs to be backported to all branches this one has been backported to. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-06-06fs, omfs: add NULL terminator in the end up the token listSasha Levin
commit dcbff39da3d815f08750552fdd04f96b51751129 upstream. match_token() expects a NULL terminator at the end of the token list so that it would know where to stop. Not having one causes it to overrun to invalid memory. In practice, passing a mount option that omfs didn't recognize would sometimes panic the system. Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Bob Copeland <me@bobcopeland.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-05-26Merge branch 'linux-3.14.y' of ↵Shannon Zhao
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable into linux-linaro-lsk-v3.14 Conflicts: arch/arm/include/asm/kvm_mmu.h arch/arm/kvm/mmu.c arch/arm64/include/asm/kvm_arm.h arch/arm64/include/asm/kvm_mmu.h arch/arm64/kvm/sys_regs.c virt/kvm/arm/vgic.c
2015-05-22ext4: don't save the error information if the block device is read-onlyTheodore Ts'o
Google-Bug-Id: 20939131 Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2015-05-17deal with deadlock in d_walk()Al Viro
commit ca5358ef75fc69fee5322a38a340f5739d997c10 upstream. ... by not hitting rename_retry for reasons other than rename having happened. In other words, do _not_ restart when finding that between unlocking the child and locking the parent the former got into __dentry_kill(). Skip the killed siblings instead... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Cc: Ben Hutchings <ben@decadent.org.uk> [hujianyang: Backported to 3.14 refer to the work of Ben Hutchings in 3.2: - Adjust context to make __dentry_kill() apply to d_kill()] Signed-off-by: hujianyang <hujianyang@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-05-17mnt: Fix fs_fully_visible to verify the root directory is visibleEric W. Biederman
commit 7e96c1b0e0f495c5a7450dc4aa7c9a24ba4305bd upstream. This fixes a dumb bug in fs_fully_visible that allows proc or sys to be mounted if there is a bind mount of part of /proc/ or /sys/ visible. Reported-by: Eric Windisch <ewindisch@docker.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-05-17nilfs2: fix sanity check of btree level in nilfs_btree_root_broken()Ryusuke Konishi
commit d8fd150fe3935e1692bf57c66691e17409ebb9c1 upstream. The range check for b-tree level parameter in nilfs_btree_root_broken() is wrong; it accepts the case of "level == NILFS_BTREE_LEVEL_MAX" even though the level is limited to values in the range of 0 to (NILFS_BTREE_LEVEL_MAX - 1). Since the level parameter is read from storage device and used to index nilfs_btree_path array whose element count is NILFS_BTREE_LEVEL_MAX, it can cause memory overrun during btree operations if the boundary value is set to the level parameter on device. This fixes the broken sanity check and adds a comment to clarify that the upper bound NILFS_BTREE_LEVEL_MAX is exclusive. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-05-17ocfs2: dlm: fix race between purge and get lock resourceJunxiao Bi
commit b1432a2a35565f538586774a03bf277c27fc267d upstream. There is a race window in dlm_get_lock_resource(), which may return a lock resource which has been purged. This will cause the process to hang forever in dlmlock() as the ast msg can't be handled due to its lock resource not existing. dlm_get_lock_resource { ... spin_lock(&dlm->spinlock); tmpres = __dlm_lookup_lockres_full(dlm, lockid, namelen, hash); if (tmpres) { spin_unlock(&dlm->spinlock); >>>>>>>> race window, dlm_run_purge_list() may run and purge the lock resource spin_lock(&tmpres->spinlock); ... spin_unlock(&tmpres->spinlock); } } Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-05-13ext4: fix data corruption caused by unwritten and delayed extentsLukas Czerner
commit d2dc317d564a46dfc683978a2e5a4f91434e9711 upstream. Currently it is possible to lose whole file system block worth of data when we hit the specific interaction with unwritten and delayed extents in status extent tree. The problem is that when we insert delayed extent into extent status tree the only way to get rid of it is when we write out delayed buffer. However there is a limitation in the extent status tree implementation so that when inserting unwritten extent should there be even a single delayed block the whole unwritten extent would be marked as delayed. At this point, there is no way to get rid of the delayed extents, because there are no delayed buffers to write out. So when a we write into said unwritten extent we will convert it to written, but it still remains delayed. When we try to write into that block later ext4_da_map_blocks() will set the buffer new and delayed and map it to invalid block which causes the rest of the block to be zeroed loosing already written data. For now we can fix this by simply not allowing to set delayed status on written extent in the extent status tree. Also add WARN_ON() to make sure that we notice if this happens in the future. This problem can be easily reproduced by running the following xfs_io. xfs_io -f -c "pwrite -S 0xaa 4096 2048" \ -c "falloc 0 131072" \ -c "pwrite -S 0xbb 65536 2048" \ -c "fsync" /mnt/test/fff echo 3 > /proc/sys/vm/drop_caches xfs_io -c "pwrite -S 0xdd 67584 2048" /mnt/test/fff This can be theoretically also reproduced by at random by running fsx, but it's not very reliable, though on machines with bigger page size (like ppc) this can be seen more often (especially xfstest generic/127) Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-05-06Merge branch 'linux-3.14.y' of ↵Kevin Hilman
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable into linux-linaro-lsk-v3.14
2015-05-06fs: take i_mutex during prepare_binprm for set[ug]id executablesJann Horn
commit 8b01fc86b9f425899f8a3a8fc1c47d73c2c20543 upstream. This prevents a race between chown() and execve(), where chowning a setuid-user binary to root would momentarily make the binary setuid root. This patch was mostly written by Linus Torvalds. Signed-off-by: Jann Horn <jann@thejh.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Charles Williams <ciwillia@brocade.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-05-06RCU pathwalk breakage when running into a symlink overmounting somethingAl Viro
commit 3cab989afd8d8d1bc3d99fef0e7ed87c31e7b647 upstream. Calling unlazy_walk() in walk_component() and do_last() when we find a symlink that needs to be followed doesn't acquire a reference to vfsmount. That's fine when the symlink is on the same vfsmount as the parent directory (which is almost always the case), but it's not always true - one _can_ manage to bind a symlink on top of something. And in such cases we end up with excessive mntput(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-05-06ext4: make fsync to sync parent dir in no-journal for real this timeLukas Czerner
commit e12fb97222fc41e8442896934f76d39ef99b590a upstream. Previously commit 14ece1028b3ed53ffec1b1213ffc6acaf79ad77c added a support for for syncing parent directory of newly created inodes to make sure that the inode is not lost after a power failure in no-journal mode. However this does not work in majority of cases, namely: - if the directory has inline data - if the directory is already indexed - if the directory already has at least one block and: - the new entry fits into it - or we've successfully converted it to indexed So in those cases we might lose the inode entirely even after fsync in the no-journal mode. This also includes ext2 default mode obviously. I've noticed this while running xfstest generic/321 and even though the test should fail (we need to run fsck after a crash in no-journal mode) I could not find a newly created entries even when if it was fsynced before. Fix this by adjusting the ext4_add_entry() successful exit paths to set the inode EXT4_STATE_NEWENTRY so that fsync has the chance to fsync the parent directory as well. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Frank Mayhar <fmayhar@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-05-06fs/binfmt_elf.c: fix bug in loading of PIE binariesMichael Davidson
commit a87938b2e246b81b4fb713edb371a9fa3c5c3c86 upstream. With CONFIG_ARCH_BINFMT_ELF_RANDOMIZE_PIE enabled, and a normal top-down address allocation strategy, load_elf_binary() will attempt to map a PIE binary into an address range immediately below mm->mmap_base. Unfortunately, load_elf_ binary() does not take account of the need to allocate sufficient space for the entire binary which means that, while the first PT_LOAD segment is mapped below mm->mmap_base, the subsequent PT_LOAD segment(s) end up being mapped above mm->mmap_base into the are that is supposed to be the "gap" between the stack and the binary. Since the size of the "gap" on x86_64 is only guaranteed to be 128MB this means that binaries with large data segments > 128MB can end up mapping part of their data segment over their stack resulting in corruption of the stack (and the data segment once the binary starts to run). Any PIE binary with a data segment > 128MB is vulnerable to this although address randomization means that the actual gap between the stack and the end of the binary is normally greater than 128MB. The larger the data segment of the binary the higher the probability of failure. Fix this by calculating the total size of the binary in the same way as load_elf_interp(). Signed-off-by: Michael Davidson <md@google.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-05-06NFS: fix BUG() crash in notify_change() with patch to chown_common()Andrew Elble
commit c1b8940b42bb6487b10f2267a96b486276ce9ff7 upstream. We have observed a BUG() crash in fs/attr.c:notify_change(). The crash occurs during an rsync into a filesystem that is exported via NFS. 1.) fs/attr.c:notify_change() modifies the caller's version of attr. 2.) 6de0ec00ba8d ("VFS: make notify_change pass ATTR_KILL_S*ID to setattr operations") introduced a BUG() restriction such that "no function will ever call notify_change() with both ATTR_MODE and ATTR_KILL_S*ID set". Under some circumstances though, it will have assisted in setting the caller's version of attr to this very combination. 3.) 27ac0ffeac80 ("locks: break delegations on any attribute modification") introduced code to handle breaking delegations. This can result in notify_change() being re-called. attr _must_ be explicitly reset to avoid triggering the BUG() established in #2. 4.) The path that that triggers this is via fs/open.c:chmod_common(). The combination of attr flags set here and in the first call to notify_change() along with a later failed break_deleg_wait() results in notify_change() being called again via retry_deleg without resetting attr. Solution is to move retry_deleg in chmod_common() a bit further up to ensure attr is completely reset. There are other places where this seemingly could occur, such as fs/utimes.c:utimes_common(), but the attr flags are not initially set in such a way to trigger this. Fixes: 27ac0ffeac80 ("locks: break delegations on any attribute modification") Reported-by: Eric Meddaugh <etmsys@rit.edu> Tested-by: Eric Meddaugh <etmsys@rit.edu> Signed-off-by: Andrew Elble <aweits@rit.edu> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-05-06Btrfs: fix inode eviction infinite loop after extent_same ioctlFilipe Manana
commit 113e8283869b9855c8b999796aadd506bbac155f upstream. If we pass a length of 0 to the extent_same ioctl, we end up locking an extent range with a start offset greater then its end offset (if the destination file's offset is greater than zero). This results in a warning from extent_io.c:insert_state through the following call chain: btrfs_extent_same() btrfs_double_lock() lock_extent_range() lock_extent(inode->io_tree, offset, offset + len - 1) lock_extent_bits() __set_extent_bit() insert_state() --> WARN_ON(end < start) This leads to an infinite loop when evicting the inode. This is the same problem that my previous patch titled "Btrfs: fix inode eviction infinite loop after cloning into it" addressed but for the extent_same ioctl instead of the clone ioctl. Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: Omar Sandoval <osandov@osandov.com> Signed-off-by: Chris Mason <clm@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-05-06Btrfs: fix inode eviction infinite loop after cloning into itFilipe Manana
commit ccccf3d67294714af2d72a6fd6fd7d73b01c9329 upstream. If we attempt to clone a 0 length region into a file we can end up inserting a range in the inode's extent_io tree with a start offset that is greater then the end offset, which triggers immediately the following warning: [ 3914.619057] WARNING: CPU: 17 PID: 4199 at fs/btrfs/extent_io.c:435 insert_state+0x4b/0x10b [btrfs]() [ 3914.620886] BTRFS: end < start 4095 4096 (...) [ 3914.638093] Call Trace: [ 3914.638636] [<ffffffff81425fd9>] dump_stack+0x4c/0x65 [ 3914.639620] [<ffffffff81045390>] warn_slowpath_common+0xa1/0xbb [ 3914.640789] [<ffffffffa03ca44f>] ? insert_state+0x4b/0x10b [btrfs] [ 3914.642041] [<ffffffff810453f0>] warn_slowpath_fmt+0x46/0x48 [ 3914.643236] [<ffffffffa03ca44f>] insert_state+0x4b/0x10b [btrfs] [ 3914.644441] [<ffffffffa03ca729>] __set_extent_bit+0x107/0x3f4 [btrfs] [ 3914.645711] [<ffffffffa03cb256>] lock_extent_bits+0x65/0x1bf [btrfs] [ 3914.646914] [<ffffffff8142b2fb>] ? _raw_spin_unlock+0x28/0x33 [ 3914.648058] [<ffffffffa03cbac4>] ? test_range_bit+0xcc/0xde [btrfs] [ 3914.650105] [<ffffffffa03cb3c3>] lock_extent+0x13/0x15 [btrfs] [ 3914.651361] [<ffffffffa03db39e>] lock_extent_range+0x3d/0xcd [btrfs] [ 3914.652761] [<ffffffffa03de1fe>] btrfs_ioctl_clone+0x278/0x388 [btrfs] [ 3914.654128] [<ffffffff811226dd>] ? might_fault+0x58/0xb5 [ 3914.655320] [<ffffffffa03e0909>] btrfs_ioctl+0xb51/0x2195 [btrfs] (...) [ 3914.669271] ---[ end trace 14843d3e2e622fc1 ]--- This later makes the inode eviction handler enter an infinite loop that keeps dumping the following warning over and over: [ 3915.117629] WARNING: CPU: 22 PID: 4228 at fs/btrfs/extent_io.c:435 insert_state+0x4b/0x10b [btrfs]() [ 3915.119913] BTRFS: end < start 4095 4096 (...) [ 3915.137394] Call Trace: [ 3915.137913] [<ffffffff81425fd9>] dump_stack+0x4c/0x65 [ 3915.139154] [<ffffffff81045390>] warn_slowpath_common+0xa1/0xbb [ 3915.140316] [<ffffffffa03ca44f>] ? insert_state+0x4b/0x10b [btrfs] [ 3915.141505] [<ffffffff810453f0>] warn_slowpath_fmt+0x46/0x48 [ 3915.142709] [<ffffffffa03ca44f>] insert_state+0x4b/0x10b [btrfs] [ 3915.143849] [<ffffffffa03ca729>] __set_extent_bit+0x107/0x3f4 [btrfs] [ 3915.145120] [<ffffffffa038c1e3>] ? btrfs_kill_super+0x17/0x23 [btrfs] [ 3915.146352] [<ffffffff811548f6>] ? deactivate_locked_super+0x3b/0x50 [ 3915.147565] [<ffffffffa03cb256>] lock_extent_bits+0x65/0x1bf [btrfs] [ 3915.148785] [<ffffffff8142b7e2>] ? _raw_write_unlock+0x28/0x33 [ 3915.149931] [<ffffffffa03bc325>] btrfs_evict_inode+0x196/0x482 [btrfs] [ 3915.151154] [<ffffffff81168904>] evict+0xa0/0x148 [ 3915.152094] [<ffffffff811689e5>] dispose_list+0x39/0x43 [ 3915.153081] [<ffffffff81169564>] evict_inodes+0xdc/0xeb [ 3915.154062] [<ffffffff81154418>] generic_shutdown_super+0x49/0xef [ 3915.155193] [<ffffffff811546d1>] kill_anon_super+0x13/0x1e [ 3915.156274] [<ffffffffa038c1e3>] btrfs_kill_super+0x17/0x23 [btrfs] (...) [ 3915.167404] ---[ end trace 14843d3e2e622fc2 ]--- So just bail out of the clone ioctl if the length of the region to clone is zero, without locking any extent range, in order to prevent this issue (same behaviour as a pwrite with a 0 length for example). This is trivial to reproduce. For example, the steps for the test I just made for fstests: mkfs.btrfs -f SCRATCH_DEV mount SCRATCH_DEV $SCRATCH_MNT touch $SCRATCH_MNT/foo touch $SCRATCH_MNT/bar $CLONER_PROG -s 0 -d 4096 -l 0 $SCRATCH_MNT/foo $SCRATCH_MNT/bar umount $SCRATCH_MNT A test case for fstests follows soon. Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: Omar Sandoval <osandov@osandov.com> Signed-off-by: Chris Mason <clm@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-05-06btrfs: don't accept bare namespace as a valid xattrDavid Sterba
commit 3c3b04d10ff1811a27f86684ccd2f5ba6983211d upstream. Due to insufficient check in btrfs_is_valid_xattr, this unexpectedly works: $ touch file $ setfattr -n user. -v 1 file $ getfattr -d file user.="1" ie. the missing attribute name after the namespace. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=94291 Reported-by: William Douglas <william.douglas@intel.com> Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <clm@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-05-06Btrfs: fix log tree corruption when fs mounted with -o discardFilipe Manana
commit dcc82f4783ad91d4ab654f89f37ae9291cdc846a upstream. While committing a transaction we free the log roots before we write the new super block. Freeing the log roots implies marking the disk location of every node/leaf (metadata extent) as pinned before the new super block is written. This is to prevent the disk location of log metadata extents from being reused before the new super block is written, otherwise we would have a corrupted log tree if before the new super block is written a crash/reboot happens and the location of any log tree metadata extent ended up being reused and rewritten. Even though we pinned the log tree's metadata extents, we were issuing a discard against them if the fs was mounted with the -o discard option, resulting in corruption of the log tree if a crash/reboot happened before writing the new super block - the next time the fs was mounted, during the log replay process we would find nodes/leafs of the log btree with a content full of zeroes, causing the process to fail and require the use of the tool btrfs-zero-log to wipeout the log tree (and all data previously fsynced becoming lost forever). Fix this by not doing a discard when pinning an extent. The discard will be done later when it's safe (after the new super block is committed) at extent-tree.c:btrfs_finish_extent_commit(). Fixes: e688b7252f78 (Btrfs: fix extent pinning bugs in the tree log) Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Chris Mason <clm@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-04-29proc/pagemap: walk page tables under pte lockKonstantin Khlebnikov
commit 05fbf357d94152171bc50f8a369390f1f16efd89 upstream. Lockless access to pte in pagemap_pte_range() might race with page migration and trigger BUG_ON(!PageLocked()) in migration_entry_to_page(): CPU A (pagemap) CPU B (migration) lock_page() try_to_unmap(page, TTU_MIGRATION...) make_migration_entry() set_pte_at() <read *pte> pte_to_pagemap_entry() remove_migration_ptes() unlock_page() if(is_migration_entry()) migration_entry_to_page() BUG_ON(!PageLocked(page)) Also lockless read might be non-atomic if pte is larger than wordsize. Other pte walkers (smaps, numa_maps, clear_refs) already lock ptes. Fixes: 052fb0d635df ("proc: report file/anon bit in /proc/pid/pagemap") Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Reported-by: Andrey Ryabinin <a.ryabinin@samsung.com> Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org> Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: <stable@vger.kernel.org> [3.5+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-04-29mm: softdirty: unmapped addresses between VMAs are cleanPeter Feiner
commit 81d0fa623c5b8dbd5279d9713094b0f9b0a00fb4 upstream. If a /proc/pid/pagemap read spans a [VMA, an unmapped region, then a VM_SOFTDIRTY VMA], the virtual pages in the unmapped region are reported as softdirty. Here's a program to demonstrate the bug: int main() { const uint64_t PAGEMAP_SOFTDIRTY = 1ul << 55; uint64_t pme[3]; int fd = open("/proc/self/pagemap", O_RDONLY);; char *m = mmap(NULL, 3 * getpagesize(), PROT_READ, MAP_ANONYMOUS | MAP_SHARED, -1, 0); munmap(m + getpagesize(), getpagesize()); pread(fd, pme, 24, (unsigned long) m / getpagesize() * 8); assert(pme[0] & PAGEMAP_SOFTDIRTY); /* passes */ assert(!(pme[1] & PAGEMAP_SOFTDIRTY)); /* fails */ assert(pme[2] & PAGEMAP_SOFTDIRTY); /* passes */ return 0; } (Note that all pages in new VMAs are softdirty until cleared). Tested: Used the program given above. I'm going to include this code in a selftest in the future. [n-horiguchi@ah.jp.nec.com: prevent pagemap_pte_range() from overrunning] Signed-off-by: Peter Feiner <pfeiner@google.com> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Jamie Liu <jamieliu@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-04-29move d_rcu from overlapping d_child to overlapping d_aliasAl Viro
commit 946e51f2bf37f1656916eb75bd0742ba33983c28 upstream. move d_rcu from overlapping d_child to overlapping d_alias Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Cc: Ben Hutchings <ben@decadent.org.uk> [hujianyang: Backported to 3.14 refer to the work of Ben Hutchings in 3.2: - Apply name changes in all the different places we use d_alias and d_child - Move the WARN_ON() in __d_free() to d_free() as we don't have dentry_free()] Signed-off-by: hujianyang <hujianyang@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-04-20Merge branch 'linux-linaro-lsk-v3.14' of ↵Alex Shi
git://git.linaro.org/kernel/linux-linaro-stable into linux-linaro-lsk-v3.14
2015-04-20Merge branch 'linux-linaro-lsk-v3.14-android' of ↵Alex Shi
git://git.linaro.org/kernel/linux-linaro-stable into linux-linaro-lsk-v3.14-android
2015-04-20Merge branch 'linux-linaro-lsk-v3.14' into linux-linaro-lsk-v3.14-androidAlex Shi
2015-04-20 Merge tag 'v3.14.39' into linux-linaro-lsk-v3.14Alex Shi
This is the 3.14.39 stable release
2015-04-20Merge branch 'linux-linaro-lsk-v3.14' into linux-linaro-lsk-v3.14-androidAlex Shi
2015-04-20Merge branch 'v3.14/topic/overlayfs' into linux-linaro-lsk-v3.14Alex Shi
To fix fuse compiling issues.
2015-04-20fuse: add renameat2 supportMiklos Szeredi
Support RENAME_EXCHANGE and RENAME_NOREPLACE flags on the userspace ABI. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> (cherry picked from commit 1560c974dcd40a8d3f193283acd7cc6aee13dc13) Signed-off-by: Alex Shi <alex.shi@linaro.org> Conflicts: fs/fuse/dir.c include/uapi/linux/fuse.h
2015-04-20fuse: trust kernel i_ctime onlyMaxim Patlasov
Let the kernel maintain i_ctime locally: update i_ctime explicitly on truncate, fallocate, open(O_TRUNC), setxattr, removexattr, link, rename, unlink. The inode flag I_DIRTY_SYNC serves as indication that local i_ctime should be flushed to the server eventually. The patch sets the flag and updates i_ctime in course of operations listed above. Signed-off-by: Maxim Patlasov <MPatlasov@parallels.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> (cherry picked from commit 31f3267b4ba16b12fb9dd3b1953ea0f221cc2ab4) Signed-off-by: Alex Shi <alex.shi@linaro.org>
2015-04-20fuse: Trust kernel i_mtime onlyMaxim Patlasov
Let the kernel maintain i_mtime locally: - clear S_NOCMTIME - implement i_op->update_time() - flush mtime on fsync and last close - update i_mtime explicitly on truncate and fallocate Fuse inode flag FUSE_I_MTIME_DIRTY serves as indication that local i_mtime should be flushed to the server eventually. Signed-off-by: Maxim Patlasov <MPatlasov@parallels.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> (cherry picked from commit b0aa760652179072119582375f8dc896ed5b5dfd) Signed-off-by: Alex Shi <alex.shi@linaro.org>
2015-04-20fuse: Trust kernel i_size onlyPavel Emelyanov
Make fuse think that when writeback is on the inode's i_size is always up-to-date and not update it with the value received from the userspace. This is done because the page cache code may update i_size without letting the FS know. This assumption implies fixing the previously introduced short-read helper -- when a short read occurs the 'hole' is filled with zeroes. fuse_file_fallocate() is also fixed because now we should keep i_size up to date, so it must be updated if FUSE_FALLOCATE request succeeded. Signed-off-by: Maxim V. Patlasov <MPatlasov@parallels.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> (cherry picked from commit 8373200b124d03de7fa2e99be56de8642e604e9e) Signed-off-by: Alex Shi <alex.shi@linaro.org>
2015-04-20fuse: Prepare to handle short readsPavel Emelyanov
A helper which gets called when read reports less bytes than was requested. See patch "trust kernel i_size only" for details. Signed-off-by: Maxim Patlasov <MPatlasov@parallels.com> Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> (cherry picked from commit a92adc824ed5feaa2d4f7029f21170f574987aee) Signed-off-by: Alex Shi <alex.shi@linaro.org>
2015-04-20fuse: Linking file to inode helperPavel Emelyanov
When writeback is ON every writeable file should be in per-inode write list, not only mmap-ed ones. Thus introduce a helper for this linkage. Signed-off-by: Maxim Patlasov <MPatlasov@parallels.com> Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> (cherry picked from commit 650b22b941fa03590c4a3671e79ec2c96ea59e9a) Signed-off-by: Alex Shi <alex.shi@linaro.org>
2015-04-20fuse: Connection bit for enabling writebackPavel Emelyanov
Off (0) by default. Will be used in the next patches and will be turned on at the very end. Signed-off-by: Maxim Patlasov <MPatlasov@parallels.com> Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> (cherry picked from commit d5cd66c58edf10a7ee786659994595fd43995aab) Signed-off-by: Alex Shi <alex.shi@linaro.org>
2015-04-19ioctx_alloc(): fix vma (and file) leak on failureAl Viro
commit deeb8525f9bcea60f5e86521880c1161de7a5829 upstream. If we fail past the aio_setup_ring(), we need to destroy the mapping. We don't need to care about anybody having found ctx, or added requests to it, since the last failure exit is exactly the failure to make ctx visible to lookups. Reproducer (based on one by Joe Mario <jmario@redhat.com>): void count(char *p) { char s[80]; printf("%s: ", p); fflush(stdout); sprintf(s, "/bin/cat /proc/%d/maps|/bin/fgrep -c '/[aio] (deleted)'", getpid()); system(s); } int main() { io_context_t *ctx; int created, limit, i, destroyed; FILE *f; count("before"); if ((f = fopen("/proc/sys/fs/aio-max-nr", "r")) == NULL) perror("opening aio-max-nr"); else if (fscanf(f, "%d", &limit) != 1) fprintf(stderr, "can't parse aio-max-nr\n"); else if ((ctx = calloc(limit, sizeof(io_context_t))) == NULL) perror("allocating aio_context_t array"); else { for (i = 0, created = 0; i < limit; i++) { if (io_setup(1000, ctx + created) == 0) created++; } for (i = 0, destroyed = 0; i < created; i++) if (io_destroy(ctx[i]) == 0) destroyed++; printf("created %d, failed %d, destroyed %d\n", created, limit - created, destroyed); count("after"); } } Found-by: Joe Mario <jmario@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-04-19ocfs2: _really_ sync the right rangeAl Viro
commit 64b4e2526d1cf6e6a4db6213d6e2b6e6ab59479a upstream. "ocfs2 syncs the wrong range" had been broken; prior to it the code was doing the wrong thing in case of O_APPEND, all right, but _after_ it we were syncing the wrong range in 100% cases. *ppos, aka iocb->ki_pos is incremented prior to that point, so we are always doing sync on the area _after_ the one we'd written to. Spotted by Joseph Qi <joseph.qi@huawei.com> back in January; unfortunately, I'd missed his mail back then ;-/ Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-04-19cifs: fix use-after-free bug in find_writable_fileDavid Disseldorp
commit e1e9bda22d7ddf88515e8fe401887e313922823e upstream. Under intermittent network outages, find_writable_file() is susceptible to the following race condition, which results in a user-after-free in the cifs_writepages code-path: Thread 1 Thread 2 ======== ======== inv_file = NULL refind = 0 spin_lock(&cifs_file_list_lock) // invalidHandle found on openFileList inv_file = open_file // inv_file->count currently 1 cifsFileInfo_get(inv_file) // inv_file->count = 2 spin_unlock(&cifs_file_list_lock); cifs_reopen_file() cifs_close() // fails (rc != 0) ->cifsFileInfo_put() spin_lock(&cifs_file_list_lock) // inv_file->count = 1 spin_unlock(&cifs_file_list_lock) spin_lock(&cifs_file_list_lock); list_move_tail(&inv_file->flist, &cifs_inode->openFileList); spin_unlock(&cifs_file_list_lock); cifsFileInfo_put(inv_file); ->spin_lock(&cifs_file_list_lock) // inv_file->count = 0 list_del(&cifs_file->flist); // cleanup!! kfree(cifs_file); spin_unlock(&cifs_file_list_lock); spin_lock(&cifs_file_list_lock); ++refind; // refind = 1 goto refind_writable; At this point we loop back through with an invalid inv_file pointer and a refind value of 1. On second pass, inv_file is not overwritten on openFileList traversal, and is subsequently dereferenced. Signed-off-by: David Disseldorp <ddiss@suse.de> Reviewed-by: Jeff Layton <jlayton@samba.org> Signed-off-by: Steve French <smfrench@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-04-19cifs: smb2_clone_range() - exit on unhandled errorSachin Prabhu
commit 2477bc58d49edb1c0baf59df7dc093dce682af2b upstream. While attempting to clone a file on a samba server, we receive a STATUS_INVALID_DEVICE_REQUEST. This is mapped to -EOPNOTSUPP which isn't handled in smb2_clone_range(). We end up looping in the while loop making same call to the samba server over and over again. The proposed fix is to exit and return the error value when encountered with an unhandled error. Signed-off-by: Sachin Prabhu <sprabhu@redhat.com> Signed-off-by: Steve French <steve.french@primarydata.com> Signed-off-by: Steve French <smfrench@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-04-19btrfs: simplify insert_orphan_itemDavid Sterba
commit 9c4f61f01d269815bb7c37be3ede59c5587747c6 upstream. We can search and add the orphan item in one go, btrfs_insert_orphan_item will find out if the item already exists. Signed-off-by: David Sterba <dsterba@suse.cz> Cc: Chris Mason <clm@fb.com> Cc: Roman Mamedov <rm@romanrm.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-04-18Merge branch 'linux-linaro-lsk-v3.14' into linux-linaro-lsk-v3.14-androidAlex Shi
Conflicts: arch/x86/syscalls/syscall_64.tbl
2015-04-18Merge remote-tracking branch 'origin/v3.14/topic/overlayfs' into ↵Alex Shi
linux-linaro-lsk-v3.14