aboutsummaryrefslogtreecommitdiff
path: root/fs/nfs
AgeCommit message (Collapse)Author
2015-11-13xattr handlers: Pass handler to operations instead of flagsAndreas Gruenbacher
The xattr_handler operations are currently all passed a file system specific flags value which the operations can use to disambiguate between different handlers; some file systems use that to distinguish the xattr namespace, for example. In some oprations, it would be useful to also have access to the handler prefix. To allow that, pass a pointer to the handler to operations instead of the flags value alone. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-11-07Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge second patch-bomb from Andrew Morton: - most of the rest of MM - procfs - lib/ updates - printk updates - bitops infrastructure tweaks - checkpatch updates - nilfs2 update - signals - various other misc bits: coredump, seqfile, kexec, pidns, zlib, ipc, dma-debug, dma-mapping, ... * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (102 commits) ipc,msg: drop dst nil validation in copy_msg include/linux/zutil.h: fix usage example of zlib_adler32() panic: release stale console lock to always get the logbuf printed out dma-debug: check nents in dma_sync_sg* dma-mapping: tidy up dma_parms default handling pidns: fix set/getpriority and ioprio_set/get in PRIO_USER mode kexec: use file name as the output message prefix fs, seqfile: always allow oom killer seq_file: reuse string_escape_str() fs/seq_file: use seq_* helpers in seq_hex_dump() coredump: change zap_threads() and zap_process() to use for_each_thread() coredump: ensure all coredumping tasks have SIGNAL_GROUP_COREDUMP signal: remove jffs2_garbage_collect_thread()->allow_signal(SIGCONT) signal: introduce kernel_signal_stop() to fix jffs2_garbage_collect_thread() signal: turn dequeue_signal_lock() into kernel_dequeue_signal() signals: kill block_all_signals() and unblock_all_signals() nilfs2: fix gcc uninitialized-variable warnings in powerpc build nilfs2: fix gcc unused-but-set-variable warnings MAINTAINERS: nilfs2: add header file for tracing nilfs2: add tracepoints for analyzing reading and writing metadata files ...
2015-11-07Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial updates from Jiri Kosina: "Trivial stuff from trivial tree that can be trivially summed up as: - treewide drop of spurious unlikely() before IS_ERR() from Viresh Kumar - cosmetic fixes (that don't really affect basic functionality of the driver) for pktcdvd and bcache, from Julia Lawall and Petr Mladek - various comment / printk fixes and updates all over the place" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: bcache: Really show state of work pending bit hwmon: applesmc: fix comment typos Kconfig: remove comment about scsi_wait_scan module class_find_device: fix reference to argument "match" debugfs: document that debugfs_remove*() accepts NULL and error values net: Drop unlikely before IS_ERR(_OR_NULL) mm: Drop unlikely before IS_ERR(_OR_NULL) fs: Drop unlikely before IS_ERR(_OR_NULL) drivers: net: Drop unlikely before IS_ERR(_OR_NULL) drivers: misc: Drop unlikely before IS_ERR(_OR_NULL) UBI: Update comments to reflect UBI_METAONLY flag pktcdvd: drop null test before destroy functions
2015-11-06mm, page_alloc: distinguish between being unable to sleep, unwilling to ↵Mel Gorman
sleep and avoiding waking kswapd __GFP_WAIT has been used to identify atomic context in callers that hold spinlocks or are in interrupts. They are expected to be high priority and have access one of two watermarks lower than "min" which can be referred to as the "atomic reserve". __GFP_HIGH users get access to the first lower watermark and can be called the "high priority reserve". Over time, callers had a requirement to not block when fallback options were available. Some have abused __GFP_WAIT leading to a situation where an optimisitic allocation with a fallback option can access atomic reserves. This patch uses __GFP_ATOMIC to identify callers that are truely atomic, cannot sleep and have no alternative. High priority users continue to use __GFP_HIGH. __GFP_DIRECT_RECLAIM identifies callers that can sleep and are willing to enter direct reclaim. __GFP_KSWAPD_RECLAIM to identify callers that want to wake kswapd for background reclaim. __GFP_WAIT is redefined as a caller that is willing to enter direct reclaim and wake kswapd for background reclaim. This patch then converts a number of sites o __GFP_ATOMIC is used by callers that are high priority and have memory pools for those requests. GFP_ATOMIC uses this flag. o Callers that have a limited mempool to guarantee forward progress clear __GFP_DIRECT_RECLAIM but keep __GFP_KSWAPD_RECLAIM. bio allocations fall into this category where kswapd will still be woken but atomic reserves are not used as there is a one-entry mempool to guarantee progress. o Callers that are checking if they are non-blocking should use the helper gfpflags_allow_blocking() where possible. This is because checking for __GFP_WAIT as was done historically now can trigger false positives. Some exceptions like dm-crypt.c exist where the code intent is clearer if __GFP_DIRECT_RECLAIM is used instead of the helper due to flag manipulations. o Callers that built their own GFP flags instead of starting with GFP_KERNEL and friends now also need to specify __GFP_KSWAPD_RECLAIM. The first key hazard to watch out for is callers that removed __GFP_WAIT and was depending on access to atomic reserves for inconspicuous reasons. In some cases it may be appropriate for them to use __GFP_HIGH. The second key hazard is callers that assembled their own combination of GFP flags instead of starting with something like GFP_KERNEL. They may now wish to specify __GFP_KSWAPD_RECLAIM. It's almost certainly harmless if it's missed in most cases as other activity will wake kswapd. Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Vitaly Wool <vitalywool@gmail.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-05Merge branch 'next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull security subsystem update from James Morris: "This is mostly maintenance updates across the subsystem, with a notable update for TPM 2.0, and addition of Jarkko Sakkinen as a maintainer of that" * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (40 commits) apparmor: clarify CRYPTO dependency selinux: Use a kmem_cache for allocation struct file_security_struct selinux: ioctl_has_perm should be static selinux: use sprintf return value selinux: use kstrdup() in security_get_bools() selinux: use kmemdup in security_sid_to_context_core() selinux: remove pointless cast in selinux_inode_setsecurity() selinux: introduce security_context_str_to_sid selinux: do not check open perm on ftruncate call selinux: change CONFIG_SECURITY_SELINUX_CHECKREQPROT_VALUE default KEYS: Merge the type-specific data with the payload data KEYS: Provide a script to extract a module signature KEYS: Provide a script to extract the sys cert list from a vmlinux file keys: Be more consistent in selection of union members used certs: add .gitignore to stop git nagging about x509_certificate_list KEYS: use kvfree() in add_key Smack: limited capability for changing process label TPM: remove unnecessary little endian conversion vTPM: support little endian guests char: Drop owner assignment from i2c_driver ...
2015-11-05Merge tag 'locks-v4.4-1' of git://git.samba.org/jlayton/linuxLinus Torvalds
Pull file locking updates from Jeff Layton: "The largest series of changes is from Ben who offered up a set to add a new helper function for setting locks based on the type set in fl_flags. Dmitry also send in a fix for a potential race that he found with KTSAN" * tag 'locks-v4.4-1' of git://git.samba.org/jlayton/linux: locks: cleanup posix_lock_inode_wait and flock_lock_inode_wait Move locks API users to locks_lock_inode_wait() locks: introduce locks_lock_inode_wait() locks: Use more file_inode and fix a comment fs: fix data races on inode->i_flctx locks: change tracepoint for generic_add_lease
2015-10-22Move locks API users to locks_lock_inode_wait()Benjamin Coddington
Instead of having users check for FL_POSIX or FL_FLOCK to call the correct locks API function, use the check within locks_lock_inode_wait(). This allows for some later cleanup. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
2015-10-21KEYS: Merge the type-specific data with the payload dataDavid Howells
Merge the type-specific data with the payload data into one four-word chunk as it seems pointless to keep them separate. Use user_key_payload() for accessing the payloads of overloaded user-defined keys. Signed-off-by: David Howells <dhowells@redhat.com> cc: linux-cifs@vger.kernel.org cc: ecryptfs@vger.kernel.org cc: linux-ext4@vger.kernel.org cc: linux-f2fs-devel@lists.sourceforge.net cc: linux-nfs@vger.kernel.org cc: ceph-devel@vger.kernel.org cc: linux-ima-devel@lists.sourceforge.net
2015-10-06NFS: Fix a tracepoint NULL-pointer dereferenceAnna Schumaker
Running xfstest generic/013 with the tracepoint nfs:nfs4_open_file enabled produces a NULL-pointer dereference when calculating fileid and filehandle of the opened file. Fix this by checking if state is NULL before trying to use the inode pointer. Reported-by: Olga Kornievskaia <aglo@umich.edu> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-10-02nfs4: reset states to use open_stateid when returning delegation voluntarilyJeff Layton
When the client goes to return a delegation, it should always update any nfs4_state currently set up to use that delegation stateid to instead use the open stateid. It already does do this in some cases, particularly in the state recovery code, but not currently when the delegation is voluntarily returned (e.g. in advance of a RENAME). This causes the client to try to continue using the delegation stateid after the DELEGRETURN, e.g. in LAYOUTGET. Set the nfs4_state back to using the open stateid in nfs4_open_delegation_recall, just before clearing the NFS_DELEGATED_STATE bit. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-10-02NFSv4: Fix a nograce recovery hangBenjamin Coddington
Since commit 5cae02f42793130e1387f4ec09c4d07056ce9fa5 an OPEN_CONFIRM should have a privileged sequence in the recovery case to allow nograce recovery to proceed for NFSv4.0. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-10-02NFSv4.1: nfs4_opendata_check_deleg needs to handle NFS4_OPEN_CLAIM_DELEG_CUR_FHTrond Myklebust
We need to warn against broken NFSv4.1 servers that try to hand out delegations in response to NFS4_OPEN_CLAIM_DELEG_CUR_FH. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-10-02NFSv4: Don't try to reclaim unused state ownersTrond Myklebust
Currently, we don't test if the state owner is in use before we try to recover it. The problem is that if the refcount is zero, then the state owner will be waiting on the lru list for garbage collection. The expectation in that case is that if you bump the refcount, then you must also remove the state owner from the lru list. Otherwise the call to nfs4_put_state_owner will corrupt that list by trying to add our state owner a second time. Avoid the whole problem by just skipping state owners that hold no state. Reported-by: Andrew W Elble <aweits@rit.edu> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-10-02NFS: Fix a write performance regressionTrond Myklebust
If all other conditions in nfs_can_extend_write() are met, and there are no locks, then we should be able to assume close-to-open semantics and the ability to extend our write to cover the whole page. With this patch, the xfstests generic/074 test completes in 242s instead of >1400s on my test rig. Fixes: bd61e0a9c852 ("locks: convert posix locks to file_lock_context") Cc: Jeff Layton <jlayton@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-10-02NFS: Fix up page writeback accountingTrond Myklebust
Currently, we are crediting all the calls to nfs_writepages_callback() (i.e. the nfs_writepages() callback) to nfs_writepage(). Aside from being inconsistent with the behaviour of the equivalent readpage/readpages accounting, this also means that we cannot distinguish between bulk writes and single page writebacks (which confuses the 'nfsiostat -p' tool). Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-09-29fs: Drop unlikely before IS_ERR(_OR_NULL)Viresh Kumar
IS_ERR(_OR_NULL) already contain an 'unlikely' compiler flag and there is no need to do that again from its callers. Drop it. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Jeff Layton <jlayton@poochiereds.net> Reviewed-by: David Howells <dhowells@redhat.com> Reviewed-by: Steve French <smfrench@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2015-09-23NFS41: make close wait for layoutreturnPeng Tao
If we send a layoutreturn asynchronously before close, the close might reach server first and layoutreturn would fail with BADSTATEID because there is nothing keeping the layout stateid alive. Also do not pretend sending layoutreturn if we are not. Signed-off-by: Peng Tao <tao.peng@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-09-22NFS: Skip checking ds_cinfo.buckets when lseg's commit_through_mds is setKinglong Mee
When lseg's commit_through_mds is set, pnfs client always WARN once in nfs_direct_select_verf after checking ds_cinfo.nbuckets. nfs should use the DS verf except commit_through_mds is set for layout segment where nbuckets is zero. [17844.666094] ------------[ cut here ]------------ [17844.667071] WARNING: CPU: 0 PID: 21758 at /root/source/linux-pnfs/fs/nfs/direct.c:174 nfs_direct_select_verf+0x5a/0x70 [nfs]() [17844.668650] Modules linked in: nfs_layout_nfsv41_files(OE) nfsv4(OE) nfs(OE) fscache(E) nfsd(OE) xfs libcrc32c btrfs ppdev coretemp crct10dif_pclmul auth_rpcgss crc32_pclmul crc32c_intel nfs_acl ghash_clmulni_intel lockd vmw_balloon xor vmw_vmci grace raid6_pq shpchp sunrpc parport_pc i2c_piix4 parport vmwgfx drm_kms_helper ttm drm serio_raw mptspi e1000 scsi_transport_spi mptscsih mptbase ata_generic pata_acpi [last unloaded: fscache] [17844.686676] CPU: 0 PID: 21758 Comm: kworker/0:1 Tainted: G W OE 4.3.0-rc1-pnfs+ #245 [17844.687352] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/20/2014 [17844.698502] Workqueue: nfsiod rpc_async_release [sunrpc] [17844.699212] 0000000000000009 0000000043e58010 ffff8800454fbc10 ffffffff813680c4 [17844.699990] ffff8800454fbc48 ffffffff8108b49d ffff88004eb20000 ffff88004eb20000 [17844.700844] ffff880062e26000 0000000000000000 0000000000000001 ffff8800454fbc58 [17844.701637] Call Trace: [17844.725252] [<ffffffff813680c4>] dump_stack+0x19/0x25 [17844.732693] [<ffffffff8108b49d>] warn_slowpath_common+0x7d/0xb0 [17844.733855] [<ffffffff8108b5da>] warn_slowpath_null+0x1a/0x20 [17844.735015] [<ffffffffa04a27ca>] nfs_direct_select_verf+0x5a/0x70 [nfs] [17844.735999] [<ffffffffa04a2b83>] nfs_direct_set_hdr_verf+0x23/0x90 [nfs] [17844.736846] [<ffffffffa04a2e17>] nfs_direct_write_completion+0x227/0x260 [nfs] [17844.737782] [<ffffffffa04a433c>] nfs_pgio_release+0x1c/0x20 [nfs] [17844.738597] [<ffffffffa0502df3>] pnfs_generic_rw_release+0x23/0x30 [nfsv4] [17844.739486] [<ffffffffa01cbbea>] rpc_free_task+0x2a/0x70 [sunrpc] [17844.740326] [<ffffffffa01cbcd5>] rpc_async_release+0x15/0x20 [sunrpc] [17844.741173] [<ffffffff810a387c>] process_one_work+0x21c/0x4c0 [17844.741984] [<ffffffff810a37cd>] ? process_one_work+0x16d/0x4c0 [17844.742837] [<ffffffff810a3b6a>] worker_thread+0x4a/0x440 [17844.743639] [<ffffffff810a3b20>] ? process_one_work+0x4c0/0x4c0 [17844.744399] [<ffffffff810a3b20>] ? process_one_work+0x4c0/0x4c0 [17844.745176] [<ffffffff810a8d75>] kthread+0xf5/0x110 [17844.745927] [<ffffffff810a8c80>] ? kthread_create_on_node+0x240/0x240 [17844.747105] [<ffffffff8172ce1f>] ret_from_fork+0x3f/0x70 [17844.747856] [<ffffffff810a8c80>] ? kthread_create_on_node+0x240/0x240 [17844.748642] ---[ end trace 336a2845d42b83f0 ]--- Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-09-20NFSv4.x/pnfs: Don't try to recover stateids twice in layoutgetTrond Myklebust
If the current open or layout stateid doesn't match the stateid used in the layoutget RPC call, then don't try to recover it. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-09-20NFSv4: Recovery of recalled read delegations is brokenTrond Myklebust
When a read delegation is being recalled, and we're reclaiming the cached opens, we need to make sure that we only reclaim read-only modes. A previous attempt to do this, relied on retrieving the delegation type from the nfs4_opendata structure. Unfortunately, as Kinglong pointed out, this field can only be set when performing reboot recovery. Furthermore, if we call nfs4_open_recover(), then we end up clobbering the state->flags for all modes that we're not recovering... The fix is to have the delegation recall code pass this information to the recovery call, and then refactor the recovery code so that nfs4_open_delegation_recall() does not need to call nfs4_open_recover(). Reported-by: Kinglong Mee <kinglongmee@gmail.com> Fixes: 39f897fdbd46 ("NFSv4: When returning a delegation, don't...") Tested-by: Kinglong Mee <kinglongmee@gmail.com> Cc: NeilBrown <neilb@suse.com> Cc: stable@vger.kernel.org # v4.2+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-09-20NFS: Fix an infinite loop when layoutget fail with BAD_STATEIDKinglong Mee
If layouget fail with BAD_STATEID, restart should not using the old stateid. But, nfs client choose the layout stateid at first, and then the open stateid. To avoid the infinite loop of using bad stateid for layoutget, this patch sets the layout flag'ss NFS_LAYOUT_INVALID_STID bit to skip choosing the bad layout stateid. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-09-20NFS: Do cleanup before resetting pageio read/write to mdsKinglong Mee
There is a reference leak of layout segment after resetting pageio read/write to mds. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Cc: stable@vger.kernel.org # v4.0+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-09-17nfs/filelayout: Fix NULL reference caused by double freeing of fh_arrayKinglong Mee
If filelayout_decode_layout fail, _filelayout_free_lseg will causes a double freeing of fh_array. [ 1179.279800] BUG: unable to handle kernel NULL pointer dereference at (null) [ 1179.280198] IP: [<ffffffffa027222d>] filelayout_free_fh_array.isra.11+0x1d/0x70 [nfs_layout_nfsv41_files] [ 1179.281010] PGD 0 [ 1179.281443] Oops: 0000 [#1] [ 1179.281831] Modules linked in: nfs_layout_nfsv41_files(OE) nfsv4(OE) nfs(OE) fscache(E) xfs libcrc32c coretemp nfsd crct10dif_pclmul ppdev crc32_pclmul crc32c_intel auth_rpcgss ghash_clmulni_intel nfs_acl lockd vmw_balloon grace sunrpc parport_pc vmw_vmci parport shpchp i2c_piix4 vmwgfx drm_kms_helper ttm drm serio_raw mptspi scsi_transport_spi mptscsih e1000 mptbase ata_generic pata_acpi [last unloaded: fscache] [ 1179.283891] CPU: 0 PID: 13336 Comm: cat Tainted: G OE 4.3.0-rc1-pnfs+ #244 [ 1179.284323] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/20/2014 [ 1179.285206] task: ffff8800501d48c0 ti: ffff88003e3c4000 task.ti: ffff88003e3c4000 [ 1179.285668] RIP: 0010:[<ffffffffa027222d>] [<ffffffffa027222d>] filelayout_free_fh_array.isra.11+0x1d/0x70 [nfs_layout_nfsv41_files] [ 1179.286612] RSP: 0018:ffff88003e3c77f8 EFLAGS: 00010202 [ 1179.287092] RAX: 0000000000000000 RBX: ffff88001fe78900 RCX: 0000000000000000 [ 1179.287731] RDX: ffffea0000f40760 RSI: ffff88001fe789c8 RDI: ffff88001fe789c0 [ 1179.288383] RBP: ffff88003e3c7810 R08: ffffea0000f40760 R09: 0000000000000000 [ 1179.289170] R10: 0000000000000000 R11: 0000000000000001 R12: ffff88001fe789c8 [ 1179.289959] R13: ffff88001fe789c0 R14: ffff88004ec05a80 R15: ffff88004f935b88 [ 1179.290791] FS: 00007f4e66bb5700(0000) GS:ffffffff81c29000(0000) knlGS:0000000000000000 [ 1179.291580] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1179.292209] CR2: 0000000000000000 CR3: 00000000203f8000 CR4: 00000000001406f0 [ 1179.292731] Stack: [ 1179.293195] ffff88001fe78900 00000000000000d0 ffff88001fe78178 ffff88003e3c7868 [ 1179.293676] ffffffffa0272737 0000000000000001 0000000000000001 ffff88001fe78800 [ 1179.294151] 00000000614fffce ffffffff81727671 ffff88001fe78100 ffff88001fe78100 [ 1179.294623] Call Trace: [ 1179.295092] [<ffffffffa0272737>] filelayout_alloc_lseg+0xa7/0x2d0 [nfs_layout_nfsv41_files] [ 1179.295625] [<ffffffff81727671>] ? out_of_line_wait_on_bit+0x81/0xb0 [ 1179.296133] [<ffffffffa040407e>] pnfs_layout_process+0xae/0x320 [nfsv4] [ 1179.296632] [<ffffffffa03e0a01>] nfs4_proc_layoutget+0x2b1/0x360 [nfsv4] [ 1179.297134] [<ffffffffa0402983>] pnfs_update_layout+0x853/0xb30 [nfsv4] [ 1179.297632] [<ffffffffa039db24>] ? nfs_get_lock_context+0x74/0x170 [nfs] [ 1179.298158] [<ffffffffa0271807>] filelayout_pg_init_read+0x37/0x50 [nfs_layout_nfsv41_files] [ 1179.298834] [<ffffffffa03a72d9>] __nfs_pageio_add_request+0x119/0x460 [nfs] [ 1179.299385] [<ffffffffa03a6bd7>] ? nfs_create_request.part.9+0x37/0x2e0 [nfs] [ 1179.299872] [<ffffffffa03a7cc3>] nfs_pageio_add_request+0xa3/0x1b0 [nfs] [ 1179.300362] [<ffffffffa03a8635>] readpage_async_filler+0x85/0x260 [nfs] [ 1179.300907] [<ffffffff81180cb1>] read_cache_pages+0x91/0xd0 [ 1179.301391] [<ffffffffa03a85b0>] ? nfs_read_completion+0x220/0x220 [nfs] [ 1179.301867] [<ffffffffa03a8dc8>] nfs_readpages+0x128/0x200 [nfs] [ 1179.302330] [<ffffffff81180ef3>] __do_page_cache_readahead+0x203/0x280 [ 1179.302784] [<ffffffff81180dc8>] ? __do_page_cache_readahead+0xd8/0x280 [ 1179.303413] [<ffffffff81181116>] ondemand_readahead+0x1a6/0x2f0 [ 1179.303855] [<ffffffff81181371>] page_cache_sync_readahead+0x31/0x50 [ 1179.304286] [<ffffffff811750a6>] generic_file_read_iter+0x4a6/0x5c0 [ 1179.304711] [<ffffffffa03a0316>] ? __nfs_revalidate_mapping+0x1f6/0x240 [nfs] [ 1179.305132] [<ffffffffa039ccf2>] nfs_file_read+0x52/0xa0 [nfs] [ 1179.305540] [<ffffffff811e343c>] __vfs_read+0xcc/0x100 [ 1179.305936] [<ffffffff811e3d15>] vfs_read+0x85/0x130 [ 1179.306326] [<ffffffff811e4a98>] SyS_read+0x58/0xd0 [ 1179.306708] [<ffffffff8172caaf>] entry_SYSCALL_64_fastpath+0x12/0x76 [ 1179.307094] Code: c4 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 55 41 54 53 8b 07 49 89 f4 85 c0 74 47 48 8b 06 49 89 fd <48> 8b 38 48 85 ff 74 22 31 db eb 0c 48 63 d3 48 8b 3c d0 48 85 [ 1179.308357] RIP [<ffffffffa027222d>] filelayout_free_fh_array.isra.11+0x1d/0x70 [nfs_layout_nfsv41_files] [ 1179.309177] RSP <ffff88003e3c77f8> [ 1179.309582] CR2: 0000000000000000 Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-09-17nfs: fix v4.2 SEEK on files over 2 gigsJ. Bruce Fields
We're incorrectly assigning a loff_t return to an int. If SEEK_HOLE or SEEK_DATA returns an offset over 2^31 then the application will see a weird lseek() result (usually -EIO). Cc: stable@vger.kernel.org Fixes: bdcc2cd14e4e "NFSv4.2: handle NFS-specific llseek errors" Signed-off-by: J. Bruce Fields <bfields@redhat.com> Reviewed-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-09-17nfs: fix pg_test page count calculationPeng Tao
We really want sizeof(struct page *) instead. Otherwise we limit maximum IO size to 64 pages rather than 512 pages on a 64bit system. Fixes 2e11f829(nfs: cap request size to fit a kmalloced page array). Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Peng Tao <tao.peng@primarydata.com> Fixes: 2e11f8296d22 ("nfs: cap request size to fit a kmalloced page array") Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-09-17Failing to send a CLOSE if file is opened WRONLY and server reboots on a 4.x ↵Olga Kornievskaia
mount A test case is as the description says: open(foobar, O_WRONLY); sleep() --> reboot the server close(foobar) The bug is because in nfs4state.c in nfs4_reclaim_open_state() a few line before going to restart, there is clear_bit(NFS4CLNT_RECLAIM_NOGRACE, &state->flags). NFS4CLNT_RECLAIM_NOGRACE is a flag for the client states not open owner states. Value of NFS4CLNT_RECLAIM_NOGRACE is 4 which is the value of NFS_O_WRONLY_STATE in nfs4_state->flags. So clearing it wipes out state and when we go to close it, “call_close” doesn’t get set as state flag is not set and CLOSE doesn’t go on the wire. Signed-off-by: Olga Kornievskaia <aglo@umich.edu> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-09-07Merge tag 'nfs-for-4.3-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client updates from Trond Myklebust: "Highlights include: Stable patches: - Fix atomicity of pNFS commit list updates - Fix NFSv4 handling of open(O_CREAT|O_EXCL|O_RDONLY) - nfs_set_pgio_error sometimes misses errors - Fix a thinko in xs_connect() - Fix borkage in _same_data_server_addrs_locked() - Fix a NULL pointer dereference of migration recovery ops for v4.2 client - Don't let the ctime override attribute barriers. - Revert "NFSv4: Remove incorrect check in can_open_delegated()" - Ensure flexfiles pNFS driver updates the inode after write finishes - flexfiles must not pollute the attribute cache with attrbutes from the DS - Fix a protocol error in layoutreturn - Fix a protocol issue with NFSv4.1 CLOSE stateids Bugfixes + cleanups - pNFS blocks bugfixes from Christoph - Various cleanups from Anna - More fixes for delegation corner cases - Don't fsync twice for O_SYNC/IS_SYNC files - Fix pNFS and flexfiles layoutstats bugs - pnfs/flexfiles: avoid duplicate tracking of mirror data - pnfs: Fix layoutget/layoutreturn/return-on-close serialisation issues - pnfs/flexfiles: error handling retries a layoutget before fallback to MDS Features: - Full support for the OPEN NFS4_CREATE_EXCLUSIVE4_1 mode from Kinglong - More RDMA client transport improvements from Chuck - Removal of the deprecated ib_reg_phys_mr() and ib_rereg_phys_mr() verbs from the SUNRPC, Lustre and core infiniband tree. - Optimise away the close-to-open getattr if there is no cached data" * tag 'nfs-for-4.3-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (108 commits) NFSv4: Respect the server imposed limit on how many changes we may cache NFSv4: Express delegation limit in units of pages Revert "NFS: Make close(2) asynchronous when closing NFS O_DIRECT files" NFS: Optimise away the close-to-open getattr if there is no cached data NFSv4.1/flexfiles: Clean up ff_layout_write_done_cb/ff_layout_commit_done_cb NFSv4.1/flexfiles: Mark the layout for return in ff_layout_io_track_ds_error() nfs: Remove unneeded checking of the return value from scnprintf nfs: Fix truncated client owner id without proto type NFSv4.1/flexfiles: Mark layout for return if the mirrors are invalid NFSv4.1/flexfiles: RW layouts are valid only if all mirrors are valid NFSv4.1/flexfiles: Fix incorrect usage of pnfs_generic_mark_devid_invalid() NFSv4.1/flexfiles: Fix freeing of mirrors NFSv4.1/pNFS: Don't request a minimal read layout beyond the end of file NFSv4.1/pnfs: Handle LAYOUTGET return values correctly NFSv4.1/pnfs: Don't ask for a read layout for an empty file. NFSv4.1: Fix a protocol issue with CLOSE stateids NFSv4.1/flexfiles: Don't mark the entire deviceid as bad for file errors SUNRPC: Prevent SYN+SYNACK+RST storms SUNRPC: xs_reset_transport must mark the connection as disconnected NFSv4.1/pnfs: Ensure layoutreturn reserves space for the opaque payload ...
2015-09-07NFSv4: Respect the server imposed limit on how many changes we may cacheTrond Myklebust
The NFSv4 delegation spec allows the server to tell a client to limit how much data it cache after the file is closed. In return, the server guarantees enough free space to avoid ENOSPC situations, etc. Prior to this patch, we assumed we could always cache aggressively after close. Unfortunately, this causes problems with servers that set the limit to 0 and therefore do not offer any ENOSPC guarantees. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-09-07NFSv4: Express delegation limit in units of pagesTrond Myklebust
Since we're tracking modifications to the page cache on a per-page basis, it makes sense to express the limit to how much we may cache in units of pages. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-09-05Merge tag 'nfsd-4.3' of git://linux-nfs.org/~bfields/linuxLinus Torvalds
Pull nfsd updates from Bruce Fields: "Nothing major, but: - Add Jeff Layton as an nfsd co-maintainer: no change to existing practice, just an acknowledgement of the status quo. - Two patches ("nfsd: ensure that...") for a race overlooked by the state locking rewrite, causing a crash noticed by multiple users. - Lots of smaller bugfixes all over from Kinglong Mee. - From Jeff, some cleanup of server rpc code in preparation for possible shift of nfsd threads to workqueues" * tag 'nfsd-4.3' of git://linux-nfs.org/~bfields/linux: (52 commits) nfsd: deal with DELEGRETURN racing with CB_RECALL nfsd: return CLID_INUSE for unexpected SETCLIENTID_CONFIRM case nfsd: ensure that delegation stateid hash references are only put once nfsd: ensure that the ol stateid hash reference is only put once net: sunrpc: fix tracepoint Warning: unknown op '->' nfsd: allow more than one laundry job to run at a time nfsd: don't WARN/backtrace for invalid container deployment. fs: fix fs/locks.c kernel-doc warning nfsd: Add Jeff Layton as co-maintainer NFSD: Return word2 bitmask if setting security label in OPEN/CREATE NFSD: Set the attributes used to store the verifier for EXCLUSIVE4_1 nfsd: SUPPATTR_EXCLCREAT must be encoded before SECURITY_LABEL. nfsd: Fix an FS_LAYOUT_TYPES/LAYOUT_TYPES encode bug NFSD: Store parent's stat in a separate value nfsd: Fix two typos in comments lockd: NLM grace period shouldn't block NFSv4 opens nfsd: include linux/nfs4.h in export.h sunrpc: Switch to using hash list instead single list sunrpc/nfsd: Remove redundant code by exports seq_operations functions sunrpc: Store cache_detail in seq_file's private directly ...
2015-09-04Revert "NFS: Make close(2) asynchronous when closing NFS O_DIRECT files"Trond Myklebust
This reverts commit f895c53f8ace3c3e49ebf9def90e63fc6d46d2bf. This commit causes a NFSv4 regression in that close()+unlink() can end up failing. The reason is that we no longer have a guarantee that the CLOSE has completed on the server, meaning that the subsequent call to REMOVE may fail with NFS4ERR_FILE_OPEN if the server implements Windows unlink() semantics. Reported-by: <Olga Kornievskaia <aglo@umich.edu> Cc: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-09-04NFS: Optimise away the close-to-open getattr if there is no cached dataTrond Myklebust
If there is no cached data, then there is no need to track the file change attribute on close. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-09-02NFSv4.1/flexfiles: Clean up ff_layout_write_done_cb/ff_layout_commit_done_cbTrond Myklebust
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-09-02NFSv4.1/flexfiles: Mark the layout for return in ff_layout_io_track_ds_error()Trond Myklebust
When I/O cannot complete due to a fatal error on the DS, ensure that we invalidate the corresponding layout segment and return it. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-09-02Merge branch 'for-4.3/core' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull core block updates from Jens Axboe: "This first core part of the block IO changes contains: - Cleanup of the bio IO error signaling from Christoph. We used to rely on the uptodate bit and passing around of an error, now we store the error in the bio itself. - Improvement of the above from myself, by shrinking the bio size down again to fit in two cachelines on x86-64. - Revert of the max_hw_sectors cap removal from a revision again, from Jeff Moyer. This caused performance regressions in various tests. Reinstate the limit, bump it to a more reasonable size instead. - Make /sys/block/<dev>/queue/discard_max_bytes writeable, by me. Most devices have huge trim limits, which can cause nasty latencies when deleting files. Enable the admin to configure the size down. We will look into having a more sane default instead of UINT_MAX sectors. - Improvement of the SGP gaps logic from Keith Busch. - Enable the block core to handle arbitrarily sized bios, which enables a nice simplification of bio_add_page() (which is an IO hot path). From Kent. - Improvements to the partition io stats accounting, making it faster. From Ming Lei. - Also from Ming Lei, a basic fixup for overflow of the sysfs pending file in blk-mq, as well as a fix for a blk-mq timeout race condition. - Ming Lin has been carrying Kents above mentioned patches forward for a while, and testing them. Ming also did a few fixes around that. - Sasha Levin found and fixed a use-after-free problem introduced by the bio->bi_error changes from Christoph. - Small blk cgroup cleanup from Viresh Kumar" * 'for-4.3/core' of git://git.kernel.dk/linux-block: (26 commits) blk: Fix bio_io_vec index when checking bvec gaps block: Replace SG_GAPS with new queue limits mask block: bump BLK_DEF_MAX_SECTORS to 2560 Revert "block: remove artifical max_hw_sectors cap" blk-mq: fix race between timeout and freeing request blk-mq: fix buffer overflow when reading sysfs file of 'pending' Documentation: update notes in biovecs about arbitrarily sized bios block: remove bio_get_nr_vecs() fs: use helper bio_add_page() instead of open coding on bi_io_vec block: kill merge_bvec_fn() completely md/raid5: get rid of bio_fits_rdev() md/raid5: split bio for chunk_aligned_read block: remove split code in blkdev_issue_{discard,write_same} btrfs: remove bio splitting and merge_bvec_fn() calls bcache: remove driver private bio splitting code block: simplify bio_add_page() block: make generic_make_request handle arbitrarily sized bios blk-cgroup: Drop unlikely before IS_ERR(_OR_NULL) block: don't access bio->bi_error after bio_put() block: shrink struct bio down to 2 cache lines again ...
2015-09-01nfs: Remove unneeded checking of the return value from scnprintfKinglong Mee
The return value from scnprintf always less than the buffer length. So, result >= len always false. This patch removes those checking. int vscnprintf(char *buf, size_t size, const char *fmt, va_list args) { int i; i = vsnprintf(buf, size, fmt, args); if (likely(i < size)) return i; if (size != 0) return size - 1; return 0; } Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-09-01nfs: Fix truncated client owner id without proto typeKinglong Mee
The length of "Linux NFSv4.0 " is 14, not 10. Without this patch, I get a truncated client owner id as, "Linux NFSv4.0 ::1/::1" With this patch, "Linux NFSv4.0 ::1/::1 tcp" Fixes: a319268891 ("nfs: make nfs4_init_nonuniform_client_string use a dynamically allocated buffer") Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-09-01NFSv4.1/flexfiles: Mark layout for return if the mirrors are invalidTrond Myklebust
If a read-write layout has an invalid mirror, then we should mark it as invalid, and return it. If a read-only layout has an invalid mirror, then mark it as invalid and check if there is still at least one valid mirror before we return it. Note: Also fix incorrect use of pnfs_generic_mark_devid_invalid(). We really want nfs4_mark_deviceid_unavailable(). Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-09-01NFSv4.1/flexfiles: RW layouts are valid only if all mirrors are validTrond Myklebust
Unlike read layouts, the writeable layout cannot fall back to using only one of the mirrors. It need to write to all of them. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-09-01NFSv4.1/flexfiles: Fix incorrect usage of pnfs_generic_mark_devid_invalid()Trond Myklebust
Unlike the files layout, flexfiles does not test for the NFS_DEVICEID_INVALID flag. Instead it relies on NFS_DEVICEID_UNAVAILABLE. Fix is to replace with nfs4_mark_deviceid_unavailable(). Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-09-01NFSv4.1/flexfiles: Fix freeing of mirrorsTrond Myklebust
Mirrors are now shared objects, so we should not be freeing them directly inside ff_layout_free_lseg(). We should already be doing the right thing in _ff_layout_free_lseg(), so just let it handle things. Also ensure that ff_layout_free_mirror() frees the RPC credential if it is set. Fixes: 28a0d72c6867 ("Add refcounting to struct nfs4_ff_layout_mirror") Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-31NFSv4.1/pNFS: Don't request a minimal read layout beyond the end of fileTrond Myklebust
If we have a read layout, then sanity check the minimal layout length so that it does not extend beyond the end of file. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-31NFSv4.1/pnfs: Handle LAYOUTGET return values correctlyTrond Myklebust
According to RFC5661 section 18.43.3, if the server cannot satisfy the loga_minlength argument to LAYOUTGET, there are 2 cases: 1) If loga_minlength == 0, it returns NFS4ERR_LAYOUTTRYLATER 2) If loga_minlength != 0, it returns NFS4ERR_BADLAYOUT Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-31NFSv4.1/pnfs: Don't ask for a read layout for an empty file.Trond Myklebust
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-30NFSv4.1: Fix a protocol issue with CLOSE stateidsTrond Myklebust
According to RFC5661 Section 18.2.4, CLOSE is supposed to return the zero stateid. This means that nfs_clear_open_stateid_locked() cannot assume that the result stateid will always match the 'other' field of the existing open stateid when trying to determine a race with a parallel OPEN. Instead, we look at the argument, and check for matches. Cc: stable@vger.kernel.org # v4.0+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-30NFSv4.1/flexfiles: Don't mark the entire deviceid as bad for file errorsTrond Myklebust
If the file was fenced and/or has been deleted on the DS, then we want to retry pNFS after a layoutreturn with error report. If the server cannot fix the problem, then we rely on it to tell us so in the response to the LAYOUTGET. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-27NFSv4.1/pnfs: Ensure layoutreturn reserves space for the opaque payloadTrond Myklebust
The "FIXME" is outdated. Flexfiles does add a payload. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-27NFSv4.1/flexfiles: Fix a protocol error in layoutreturnTrond Myklebust
According to the flexfiles protocol, the layoutreturn should specify an array of errors in the following format: struct ff_ioerr4 { offset4 ffie_offset; length4 ffie_length; stateid4 ffie_stateid; device_error4 ffie_errors<>; }; This patch fixes up the code to ensure that our ffie_errors is indeed encoded as an array (albeit with only a single entry). Reported-by: Tom Haynes <thomas.haynes@primarydata.com> Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-27NFS: Send attributes in OPEN request for NFS4_CREATE_EXCLUSIVE4_1Kinglong Mee
Client sends a SETATTR request after OPEN for updating attributes. For create file with S_ISGID is set, the S_ISGID in SETATTR will be ignored at nfs server as chmod of no PERMISSION. v3, same as v2. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-27NFS: Get suppattr_exclcreat when getting server capabilitiesKinglong Mee
Create file with attributs as NFS4_CREATE_EXCLUSIVE4_1 mode depends on suppattr_exclcreat attribut. v3, same as v2. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>