path: root/fs/ext4
Age | Commit message | Author
2011-05-25 | ext4: add new function ext4_block_zero_page_range() | Allison Henderson
This patch modifies the existing ext4_block_truncate_page() function, which was used by the truncate code path and which zeroes out block-unaligned data, by adding a new length parameter, and renames it to ext4_block_zero_page_range(). This function can now be used to zero out the head of a block, the tail of a block, or the middle of a block. The ext4_block_truncate_page() function is now a wrapper to ext4_block_zero_page_range(). [ext4 punch hole patch series 2/5 v7] Signed-off-by: Allison Henderson <achender@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Mingming Cao <cmm@us.ibm.com>
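The head/tail/middle behaviour described above can be sketched in userspace C. This is only an illustration under stated assumptions, not the kernel function: `zero_block_range` and `zero_block_tail` are hypothetical names, the "file" is a plain byte buffer, and the block size is assumed to be a power of two.

```c
#include <stddef.h>
#include <string.h>

/*
 * Userspace sketch (hypothetical helper, not kernel code): zero `length`
 * bytes starting at byte offset `from`, but never past the end of the
 * block that contains `from`. Returns the number of bytes zeroed.
 * Assumes blocksize is a power of two and `file` covers that block.
 */
static size_t zero_block_range(unsigned char *file, size_t blocksize,
                               size_t from, size_t length)
{
    size_t offset = from & (blocksize - 1);  /* offset inside the block */
    size_t max = blocksize - offset;         /* bytes left in the block */

    if (length > max)
        length = max;
    memset(file + from, 0, length);
    return length;
}

/* Truncate-style wrapper: zero from `from` to the end of its block. */
static size_t zero_block_tail(unsigned char *file, size_t blocksize,
                              size_t from)
{
    return zero_block_range(file, blocksize, from,
                            blocksize - (from & (blocksize - 1)));
}
```

With the length parameter, the same helper covers the middle of a block (small `length`), the head (`from` block-aligned) and the tail (the wrapper), which is the reason the message gives for generalizing the truncate-only function.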
2011-05-25 | ext4: add flag to ext4_has_free_blocks | Allison Henderson
This patch adds an allocation request flag to the ext4_has_free_blocks function which enables the use of reserved blocks. This will allow a punch hole to proceed even if the disk is full. Punching a hole may require additional blocks to first split the extents. Because ext4_has_free_blocks is a low level function, the flag needs to be passed down through several functions listed below:
    ext4_ext_insert_extent
    ext4_ext_create_new_leaf
    ext4_ext_grow_indepth
    ext4_ext_split
    ext4_ext_new_meta_block
    ext4_mb_new_blocks
    ext4_claim_free_blocks
    ext4_has_free_blocks
[ext4 punch hole patch series 1/5 v7] Signed-off-by: Allison Henderson <achender@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Mingming Cao <cmm@us.ibm.com>
2011-05-24 | ext4: reserve inodes and feature code for 'quota' feature | Aditya Kali
I am working on a patch to add quota as a built-in feature for the ext4 filesystem. The implementation is based on the design given at https://ext4.wiki.kernel.org/index.php/Design_For_1st_Class_Quota_in_Ext4. This patch reserves the inode numbers 3 and 4 for quota purposes and also reserves the EXT4_FEATURE_RO_COMPAT_QUOTA feature code. Signed-off-by: Aditya Kali <adityakali@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-05-24 | ext4: add support for multiple mount protection | Johann Lombardi
Prevent an ext4 filesystem from being mounted multiple times. A sequence number is stored on disk and is periodically updated (every 5 seconds by default) by a mounted filesystem. At mount time, we now wait for s_mmp_update_interval seconds to make sure that the MMP sequence does not change. In case of failure, the nodename, bdevname and the time at which the MMP block was last updated are displayed. Signed-off-by: Andreas Dilger <adilger@whamcloud.com> Signed-off-by: Johann Lombardi <johann@whamcloud.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-05-24 | ext4: ensure f_bfree returned by ext4_statfs() is non-negative | Kazuya Mio
I found the issue that the number of free blocks went negative.

    # stat -f /mnt/mp1/
      File: "/mnt/mp1/"
        ID: e175ccb83a872efe Namelen: 255     Type: ext2/ext3
    Block size: 4096       Fundamental block size: 4096
    Blocks: Total: 258022     Free: -15        Available: -13122
    Inodes: Total: 65536      Free: 63029

f_bfree in struct statfs will go negative when the filesystem has few free blocks, because the number of dirty blocks is bigger than the number of free blocks in the following two cases.

CASE 1:
    ext4_da_writepages
    mpage_da_map_and_submit
    ext4_map_blocks
    ext4_ext_map_blocks
    ext4_mb_new_blocks
    ext4_mb_diskspace_used
    percpu_counter_sub(&sbi->s_freeblocks_counter, ac->ac_b_ex.fe_len);
    <--- interrupt statfs systemcall --->
    ext4_da_update_reserve_space
    percpu_counter_sub(&sbi->s_dirtyblocks_counter, used + ei->i_allocated_meta_blocks);

CASE 2:
    ext4_write_begin
    __block_write_begin
    ext4_map_blocks
    ext4_ext_map_blocks
    ext4_mb_new_blocks
    ext4_mb_diskspace_used
    percpu_counter_sub(&sbi->s_freeblocks_counter, ac->ac_b_ex.fe_len);
    <--- interrupt statfs systemcall --->
    percpu_counter_sub(&sbi->s_dirtyblocks_counter, reserv_blks);

To avoid the issue, this patch ensures that f_bfree is non-negative.
Signed-off-by: Kazuya Mio <k-mio@sx.jp.nec.com>
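The race above comes down to subtracting a dirty-block count that has not yet been lowered from a free-block count that already has. A minimal sketch of the clamping idea follows, assuming the reported value is "free minus dirty"; `reported_free_blocks` is a hypothetical name and the counters are plain integers rather than per-CPU counters.

```c
#include <stdint.h>

/*
 * Hypothetical helper: the free-block figure handed to statfs.
 * If a statfs call lands between the two counter updates, the
 * difference is transiently negative; report 0 in that window.
 */
static uint64_t reported_free_blocks(int64_t free_blocks, int64_t dirty_blocks)
{
    int64_t bfree = free_blocks - dirty_blocks;

    return bfree > 0 ? (uint64_t)bfree : 0;
}
```

Clamping hides the transient inconsistency from userspace without adding locking between the two counters.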
2011-05-24 | ext4: protect bb_first_free in ext4_trim_all_free() with group lock | Lukas Czerner
We should protect reading bd_info->bb_first_free with the group lock because otherwise we might miss some free blocks. This is not a big deal at all, but the change to do the right thing is really simple, so let's do that. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-05-24 | ext4: only load buddy bitmap in ext4_trim_fs() when it is needed | Lukas Czerner
Currently we load the buddy with ext4_mb_load_buddy() for every block group we go through in ext4_trim_fs(), in many cases just to find out that there is not enough free space to bother with. As Amir Goldstein suggested, we can use the bb_free information directly from ext4_group_info. This commit removes ext4_mb_load_buddy() from ext4_trim_fs() and instead gets the ext4_group_info via ext4_get_group_info() and uses the bb_free information directly from that. This avoids an unnecessary call to load the buddy in the case where the group does not have enough free space to trim. Loading the buddy is now moved to ext4_trim_all_free(). Tested by me with xfstests 251. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
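The "check the cheap counter first, load the expensive structure only if needed" pattern can be sketched as below. The struct and function names are hypothetical stand-ins, and the sketch assumes a group is worth trimming only when its free-block count is at least the requested minimum length.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-group summary kept in memory. */
struct group_info {
    uint32_t bb_free;   /* free blocks in this group */
};

/*
 * Count how many groups would need their buddy bitmap loaded when
 * trimming with a minimum extent length of `minlen` blocks. Groups
 * with too little free space are skipped without loading anything.
 */
static size_t count_buddy_loads(const struct group_info *groups,
                                size_t ngroups, uint32_t minlen)
{
    size_t loads = 0;

    for (size_t i = 0; i < ngroups; i++) {
        if (groups[i].bb_free < minlen)
            continue;   /* not enough free space to bother with */
        loads++;        /* the expensive load and trim would happen here */
    }
    return loads;
}
```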
2011-05-24 | ext4: fix waiting and sending of a barrier in ext4_sync_file() | Jan Kara
jbd2_log_start_commit() returns 1 only when we really start a transaction. But we also need to wait for a transaction when the commit is already running. Fix this problem by waiting for transaction commit unconditionally (which is just a quick check if the transaction is already committed). Also we have to be more careful with sending of a barrier because when a transaction is being committed in parallel to ext4_sync_file() running, we cannot be sure that the barrier the journalling code sends happens after we wrote all the data for fsync (note that not every data writeout needs to trigger metadata changes, thus a commit of some metadata changes can be running while other data is still being written out). So use the jbd2_will_send_data_barrier() helper to detect the common cases when we can be sure a barrier will be issued by the commit code, and issue the barrier ourselves in the remaining cases. Reported-by: Edward Goggin <egoggin@vmware.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-05-24 | ext4: fix ext4_ext_fiemap_cb() to handle blocks before request range correctly | Yongqiang Yang
To get delayed-extent information, ext4_ext_fiemap_cb() looks up the page cache; it thus collects information starting from a page's head block. If blocksize < pagesize, the beginning blocks of a page may lie before the request range. So ext4_ext_fiemap_cb() should proceed ignoring them, because they have been handled before. If no mapped buffer in the range is found in the 1st page, we need to look up the 2nd page, otherwise delayed-extents after a hole will be ignored. Without this patch, xfstests 225 will hang on ext4 with a 1K block size. Reported-by: Amir Goldstein <amir73il@users.sourceforge.net> Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
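To make the "blocks before the request range" case concrete, here is a small arithmetic sketch. The names are hypothetical; it assumes the page size and block size are powers of two, given as shifts, with the block size no larger than the page size.

```c
#include <stdint.h>

/* Result of mapping a requested block onto the page that holds it. */
struct page_blocks {
    uint64_t head_block;  /* first block stored in that page */
    uint64_t skip;        /* blocks in the page before the request */
};

/*
 * Hypothetical helper: with 1K blocks (blkbits = 10) and 4K pages
 * (page_shift = 12), a page holds 4 blocks, so a request starting at
 * block 6 lands in the page whose head block is 4, and the 2 blocks
 * before the request must be ignored.
 */
static struct page_blocks blocks_before_request(uint64_t start_block,
                                                unsigned page_shift,
                                                unsigned blkbits)
{
    unsigned shift = page_shift - blkbits;  /* log2(blocks per page) */
    struct page_blocks r;

    r.head_block = (start_block >> shift) << shift;
    r.skip = start_block - r.head_block;
    return r;
}
```

When the block size equals the page size, `skip` is always zero, which matches the message's point that the problem only shows up when blocksize < pagesize.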
2011-05-23 | ext4: use truncate_setsize() unconditionally | Theodore Ts'o
In commit c8d46e41 (ext4: Add flag to files with blocks intentionally past EOF), if the EOFBLOCKS_FL flag is set, we call ext4_truncate() before calling vmtruncate(). This caused any allocated but unwritten blocks created by calling fallocate() with the FALLOC_FL_KEEP_SIZE flag to be dropped. This was done to make sure that EOFBLOCKS_FL would not be cleared while still leaving blocks past i_size allocated. This was not necessary, since ext4_truncate() guarantees that blocks past i_size will be dropped, even in the case where truncate() has increased i_size before calling ext4_truncate(). So fix this by removing the EOFBLOCKS_FL special case treatment in ext4_setattr(). In addition, use truncate_setsize() followed by a call to ext4_truncate() instead of using vmtruncate(). This is more efficient since it skips the call to inode_newsize_ok(), which has been checked already by inode_change_ok(). This is also a win in the case where EOFBLOCKS_FL is set since it avoids calling ext4_truncate() twice. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-05-22 | ext4: fix unbalanced up_write() in ext4_ext_truncate() error path | Eric Gouriou
ext4_ext_truncate() should not invoke up_write(&EXT4_I(inode)->i_data_sem) when ext4_orphan_add() returns an error, as it hasn't performed a down_write() yet. This trivial patch fixes this by moving the up_write() invocation above the out_stop label. Signed-off-by: Eric Gouriou <egouriou@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-05-22 | ext4: count hits/misses of extent cache and expose in sysfs | Vivek Haldar
The number of hits and misses for each filesystem is exposed in /sys/fs/ext4/<dev>/extent_cache_{hits, misses}. Tested: fsstress, manual checks. Signed-off-by: Vivek Haldar <haldar@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-05-22 | ext4: make ext4_split_extent() handle error correctly | Yongqiang Yang
Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Mingming Cao <cmm@us.ibm.com>
2011-05-22 | ext4: don't show mount options in /proc/mounts if there is no journal | Theodore Ts'o
After creating an ext4 file system without a journal:

    # mke2fs -t ext4 -O ^has_journal /dev/sda
    # mount -t ext4 /dev/sda /test

the /proc/mounts will show:

    "/dev/sda /test ext4 rw,relatime,user_xattr,acl,barrier=1,data=writeback 0 0"

which can fool users into thinking that the fs is using writeback mode. So don't set the writeback option when the journal has not been enabled; we don't depend on the writeback option being set, since ext4_should_writeback_data() in ext4_jbd2.h tests to see if the journal is not present before returning true.
Reported-by: Robin Dong <sanbai@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-05-20 | ext4: fix possible use-after-free in ext4_remove_li_request() | Lukas Czerner
We need to take the reference to s_li_request after we take the mutex, because otherwise it might be freed in the meantime, resulting in access to already freed memory. Also we should protect the whole ext4_remove_li_request() because ext4_li_info might be in the process of being freed in ext4_lazyinit_thread(). Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Eric Sandeen <sandeen@redhat.com>
2011-05-20 | ext4: fix the mount option "init_itable=n" to work as expected for n=0 | Lukas Czerner
For some reason, when we set the mount option "init_itable=0" it behaves as if we had set init_itable=20, which is not right at all. Basically, when we set it to zero we are telling the lazyinit thread not to wait between zeroing the inode tables (except for cond_resched()), so this commit fixes that and removes the unnecessary condition. The 'n' should also be properly used on remount. When n is not set at all, the default multiplier EXT4_DEF_LI_WAIT_MULT is used instead. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reported-by: Eric Sandeen <sandeen@redhat.com>
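The distinction between "option given with n=0" and "option not given" can be sketched as follows. Everything here is an assumption for illustration: the function names are hypothetical, `DEF_LI_WAIT_MULT` is a placeholder whose value of 10 is assumed rather than taken from the message, and the wait is modelled as the multiplier times the time the previous zeroing took.

```c
#include <stdbool.h>

/* Placeholder default multiplier; the value 10 is an assumption. */
#define DEF_LI_WAIT_MULT 10u

/*
 * Hypothetical helper: pick the multiplier. Tracking "was the option
 * given" separately from its value is what lets n == 0 mean "do not
 * wait" instead of silently falling back to the default.
 */
static unsigned effective_wait_mult(bool option_given, unsigned n)
{
    return option_given ? n : DEF_LI_WAIT_MULT;
}

/* Assumed model: sleep for `mult` times the last zeroing duration. */
static unsigned long next_wait(unsigned long zeroing_time, unsigned mult)
{
    return zeroing_time * mult;
}
```

A value-only check such as "if n is zero, use the default" cannot tell the two situations apart, which is the kind of condition the message says was removed.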
2011-05-20 | ext4: Remove unnecessary wait_event ext4_run_lazyinit_thread() | Lukas Czerner
For some reason we have been waiting for the lazyinit thread to start in ext4_run_lazyinit_thread(), but this is not needed since it was just unnecessary complexity, so get rid of it. We can also remove li_task and li_wait_task since they are not used anymore. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Eric Sandeen <sandeen@redhat.com>
2011-05-20 | ext4: Use schedule_timeout_interruptible() for waiting in lazyinit thread | Lukas Czerner
In order to make lazyinit eat approx. 10% of io bandwidth at max, we sleep between zeroing each single inode table. For that purpose we use a timer which wakes up the thread when it expires. It is set via add_timer(), and this may cause trouble in the case that the thread has been woken up earlier and in the next iteration we call add_timer() on a still running timer, hence hitting the BUG_ON in add_timer(). We could fix that by using mod_timer() instead; however, we can use schedule_timeout_interruptible() for waiting and hence simplify things a lot. This commit replaces the old "waiting mechanism" with a simple schedule_timeout_interruptible(), setting the time to sleep. Hence we no longer need the li_wait_daemon wait queue and others, so get rid of them. Addresses-Red-Hat-Bugzilla: #699708 Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Eric Sandeen <sandeen@redhat.com>
2011-05-18 | ext4: wait for writeback to complete while making pages writable | Darrick J. Wong
In order to stabilize pages during disk writes, ext4_page_mkwrite must wait for writeback operations to complete before making a page writable. Furthermore, the function must return locked pages, and recheck the writeback status if the page lock is ever dropped. The "someone could wander in" part of this patch was suggested by Chris Mason. Signed-off-by: Darrick J. Wong <djwong@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-05-18 | ext4: clean up some wait_on_page_writeback calls | Darrick J. Wong
wait_on_page_writeback already checks the writeback bit, so callers of it needn't do that test. Signed-off-by: Darrick J. Wong <djwong@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-05-18 | ext4: don't warn about mnt_count if it has been disabled | Tao Ma
Currently, if we mkfs a new ext4 volume with s_max_mnt_count set to zero, and mount it for the first time, we will get the warning:

    maximal mount count reached, running e2fsck is recommended

It is really misleading. So change the check so that it won't warn in that case.
Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
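A minimal sketch of a check that stays quiet when the limit is disabled is shown below. The function name is hypothetical, and treating negative values as "disabled" in addition to zero is an assumption; the message itself only mentions zero.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: warn only when a positive maximum is configured
 * and the mount count has reached it. With max_mnt_count == 0 the naive
 * test `mnt_count >= max_mnt_count` is always true, which is how a
 * freshly created volume could trigger the warning on its first mount.
 */
static bool should_warn_mount_count(uint16_t mnt_count, int16_t max_mnt_count)
{
    return max_mnt_count > 0 && mnt_count >= (uint16_t)max_mnt_count;
}
```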
2011-05-16 | ext4: ext4_ext_convert_to_initialized bug found in extended FSX testing | Allison Henderson
This patch addresses bugs found while testing punch hole with the fsx test. The patch corrects the number of blocks that are zeroed out while splitting an extent, and also corrects the return value to return the number of blocks split out, instead of the number of blocks zeroed out. This patch has been tested in addition to the following patches:
    [Ext4 punch hole v7]
    [XFS Tests Punch Hole 1/1 v2] Add Punch Hole Testing to FSX
The test ran successfully for 24 hours.
Signed-off-by: Allison Henderson <achender@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-05-16 | ext4: fix oops in ext4_quota_off() | Amir Goldstein
If quota is not enabled when ext4_quota_off() is called, we must not dereference quota file inode since it is NULL. Check properly for this. This fixes a bug in commit 21f976975cbe (ext4: remove unnecessary [cm]time update of quota file), which was merged for 2.6.39-rc3. Reported-by: Amir Goldstein <amir73il@users.sf.net> Signed-off-by: Amir Goldstein <amir73il@users.sf.net> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-05-15 | ext4: don't dereference null pointer when make_indexed_dir() fails | Allison Henderson
Fix for a null pointer bug found while running punch hole tests. Signed-off-by: Allison Henderson <achender@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-05-09 | ext4: remove alloc_semp | Amir Goldstein
After taking care of all group init races, all that remains is to remove alloc_semp from ext4_allocation_context and ext4_buddy structs. Signed-off-by: Amir Goldstein <amir73il@users.sf.net> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-05-09 | ext4: teach ext4_mb_init_cache() to skip uptodate buddy caches | Amir Goldstein
After an online resize which adds new groups, some of the groups in a buddy page may be initialized and uptodate, while others (new ones) may be uninitialized. The indication for init of new block groups is when ext4_mb_init_cache() is called with an uptodate buddy page. In this case, initialized groups on that buddy page must be skipped when initializing the buddy cache. Signed-off-by: Amir Goldstein <amir73il@users.sf.net> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-05-09 | ext4: synchronize ext4_mb_init_group() with buddy page lock | Amir Goldstein
The old routines ext4_mb_[get|put]_buddy_cache_lock(), which used to take grp->alloc_sem for all groups on the buddy page have been replaced with the routines ext4_mb_[get|put]_buddy_page_lock(). The new routines take both buddy and bitmap page locks to protect against concurrent init of groups on the same buddy page. The GROUP_NEED_INIT flag is tested again under page lock to check if the group was initialized by another caller. Signed-off-by: Amir Goldstein <amir73il@users.sf.net> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-05-09 | ext4: implement ext4_add_groupblocks() by freeing blocks | Amir Goldstein
The old implementation used to take grp->alloc_sem and set the GROUP_NEED_INIT flag, so that the buddy cache would be reloaded. The new implementation updates the buddy cache by freeing the added blocks and making them available for use, so there is no need to reload the buddy cache and there is no need to take grp->alloc_sem. Signed-off-by: Amir Goldstein <amir73il@users.sf.net> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-05-09 | ext4: remove unneeded ext4_journal_get_undo_access | Theodore Ts'o
The block allocation code used to use jbd2_journal_get_undo_access as a way to make changes that wouldn't show up until the commit took place. The new multi-block allocation code has its own way of preventing newly freed blocks from getting reused until the commit takes place (it avoids updating the buddy bitmaps until the commit is done), so we don't need to use jbd2_journal_get_undo_access(), which has extra overhead compared to jbd2_journal_get_write_access(). There was one last vestigial use of ext4_journal_get_undo_access() in ext4_add_groupblocks(); change it to use ext4_journal_get_write_access() and then remove the ext4_journal_get_undo_access() support. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-05-09 | ext4: move ext4_add_groupblocks() to mballoc.c | Amir Goldstein
In preparation for the next patch, the function ext4_add_groupblocks() is moved to mballoc.c, where it could use some static functions. Signed-off-by: Amir Goldstein <amir73il@users.sf.net> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-05-09 | ext4: remove redundant #ifdef in super.c | Amerigo Wang
There is already an #ifdef CONFIG_QUOTA some lines above, so this one is totally useless. Signed-off-by: WANG Cong <amwang@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-05-09 | ext4: remove redundant check for first_not_zeroed in ext4_register_li_request | Tao Ma
We have checked first_not_zeroed == ngroups already above, so remove this redundant check. sbi->s_li_request = NULL above is also removed since it is NULL already. Cc: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-05-09 | ext4: use s_inodes_per_block directly in __ext4_get_inode_loc | Tao Ma
In __ext4_get_inode_loc, we calculate inodes_per_block every time by EXT4_BLOCK_SIZE(sb) / EXT4_INODE_SIZE(sb). AFAICS, this function is a hot path for ext4, so we'd better use s_inodes_per_block directly instead of calculating every time. Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
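For context, the sketch below shows the kind of lookup arithmetic where a precomputed inodes-per-block value is used. It is a simplified userspace illustration: the struct, the function name and the parameter layout are hypothetical, and the caller is assumed to pass the start of the inode table for the group that contains the inode.

```c
#include <stdint.h>

/* Where an inode lives: which block, and at what byte offset in it. */
struct inode_loc {
    uint64_t block;
    uint32_t offset;
};

/*
 * Hypothetical helper. `inodes_per_block` is passed in precomputed
 * (block size / inode size) instead of being divided out on every
 * call, which is the saving the commit message describes.
 */
static struct inode_loc locate_inode(uint32_t ino,
                                     uint32_t inodes_per_group,
                                     uint32_t inodes_per_block,
                                     uint32_t inode_size,
                                     uint64_t inode_table_start)
{
    uint32_t index = (ino - 1) % inodes_per_group;  /* inode numbers start at 1 */
    struct inode_loc loc;

    loc.block = inode_table_start + index / inodes_per_block;
    loc.offset = (index % inodes_per_block) * inode_size;
    return loc;
}
```

For example, with 4096-byte blocks and 256-byte inodes there are 16 inodes per block, so inode 18 of a group sits in the second block of the inode table at byte offset 256.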
2011-05-09 | ext4: use EXT4FS_DEBUG instead of EXT4_DEBUG in fsync.c | Tao Ma
We have EXT4FS_DEBUG for some old debug and CONFIG_EXT4_DEBUG for the new mballoc debug, but there isn't any EXT4_DEBUG. As CONFIG_EXT4_DEBUG seems to be only used in mballoc, use EXT4FS_DEBUG in fsync.c. [ It doesn't really matter; although I'm including this commit for consistency's sake. The whole point of the #ifdef's is to disable the debugging code. In general you're not going to want to enable all of the code protected by EXT4FS_DEBUG at the same time. -- Ted ] Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-05-03 | ext4: reimplement convert and split_unwritten | Yongqiang Yang
Reimplement ext4_ext_convert_to_initialized() and ext4_split_unwritten_extents() using ext4_split_extent(). Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Tested-by: Allison Henderson <achender@linux.vnet.ibm.com>
2011-05-03 | ext4: add ext4_split_extent_at() and ext4_split_extent() | Yongqiang Yang
Add two functions: ext4_split_extent_at(), which splits an extent into two extents at a given logical block, and ext4_split_extent(), which splits an extent into three extents. Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Tested-by: Allison Henderson <achender@linux.vnet.ibm.com>
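The two-way and three-way splits can be illustrated on a simplified extent record. This is a sketch under stated assumptions, not the kernel implementation: `struct extent` and both function names are hypothetical, the extent is a plain (logical start, length, physical start) triple, and journaling, extent-tree updates and the initialized/unwritten state are left out.

```c
#include <stdint.h>

/* Simplified extent: `len` blocks mapping lblk.. to pblk.. */
struct extent {
    uint32_t lblk;
    uint32_t len;
    uint64_t pblk;
};

/*
 * Split *ex at logical block `split`. On success *ex becomes the left
 * part, *right the right part, and 1 is returned. Returns 0 (and
 * changes nothing) if `split` is not strictly inside the extent.
 */
static int split_extent_at(struct extent *ex, uint32_t split,
                           struct extent *right)
{
    if (split <= ex->lblk || split >= ex->lblk + ex->len)
        return 0;

    uint32_t left_len = split - ex->lblk;

    right->lblk = split;
    right->len = ex->len - left_len;
    right->pblk = ex->pblk + left_len;
    ex->len = left_len;
    return 1;
}

/*
 * Carve the range [start, start + len) out of `ex` by splitting at
 * both ends. Writes up to three pieces to `out`, returns the count.
 */
static int split_extent(struct extent ex, uint32_t start, uint32_t len,
                        struct extent out[3])
{
    struct extent rest;
    int n = 0;

    if (split_extent_at(&ex, start, &rest)) {
        out[n++] = ex;
        ex = rest;
    }
    if (split_extent_at(&ex, start + len, &rest)) {
        out[n++] = ex;
        ex = rest;
    }
    out[n++] = ex;
    return n;
}
```

Building the three-way split from two calls to the two-way split keeps the boundary handling in one place; when the requested range touches either end of the extent, fewer than three pieces come back.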
2011-05-03 | ext4: add a function merging extents right and left | Yongqiang Yang
1) Rename ext4_ext_try_to_merge() to ext4_ext_try_to_merge_right().
2) Add a new function ext4_ext_try_to_merge() which tries to merge an extent both left and right.
3) Use the new function in ext4_ext_convert_unwritten_endio() and ext4_ext_insert_extent().
Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com> Tested-by: Allison Henderson <achender@linux.vnet.ibm.com>
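The relationship between a "merge right" primitive and a "merge both ways" wrapper can be sketched on an array of simplified extents. All names here are hypothetical, and the mergeability test is reduced to logical and physical contiguity; real ext4 applies further conditions (for example on extent state and maximum length) that this sketch ignores.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified extent: `len` blocks mapping lblk.. to pblk.. */
struct extent {
    uint32_t lblk;
    uint32_t len;
    uint64_t pblk;
};

/* Two extents merge if b continues a both logically and physically. */
static int can_merge(const struct extent *a, const struct extent *b)
{
    return a->lblk + a->len == b->lblk && a->pblk + a->len == b->pblk;
}

/* Merge exts[i] with its right neighbour if possible; 1 if merged. */
static int try_merge_right(struct extent *exts, size_t *n, size_t i)
{
    if (i + 1 >= *n || !can_merge(&exts[i], &exts[i + 1]))
        return 0;
    exts[i].len += exts[i + 1].len;
    /* close the gap left by the absorbed extent */
    memmove(&exts[i + 1], &exts[i + 2], (*n - i - 2) * sizeof(*exts));
    (*n)--;
    return 1;
}

/* Merging left is just "merge right" applied to the left neighbour. */
static int try_merge(struct extent *exts, size_t *n, size_t i)
{
    int merged = 0;

    if (i > 0 && try_merge_right(exts, n, i - 1)) {
        merged = 1;
        i--;            /* our extent now lives in the left slot */
    }
    if (try_merge_right(exts, n, i))
        merged = 1;
    return merged;
}
```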
2011-05-03 | ext4: fix deadlock in ext4_symlink() in ENOSPC conditions | Jan Kara
ext4_symlink() cannot call __page_symlink() with a transaction open. __page_symlink() calls ext4_write_begin(), which can wait for transaction commit if we are running out of space, thus causing a deadlock. Also, error recovery in ext4_truncate_failed_write() does not account for the transaction being already started (although I'm not aware of any particular deadlock here). Fix the problem by stopping the transaction before calling __page_symlink() (we have to be careful and put the inode on the orphan list so that it gets deleted in case of a crash) and starting another one after __page_symlink() returns for the addition of the symlink into a directory. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-05-03 | ext4: Fix fs corruption when make_indexed_dir() fails | Jan Kara
When make_indexed_dir() fails (e.g. because of ENOSPC) after it has allocated a block for the index tree root, we did not properly mark all changed buffers dirty. This led to only some of these buffers being written out, thus effectively corrupting the directory. Fix the issue by marking all changed data dirty even in the error case. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-05-03 | ext4: set extents flag when migrating file to use extents | Theodore Ts'o
Fix a typo that was introduced in commit 07a038245b (in 2.6.36) which caused the extents flag not to be set at the conclusion of converting an inode to use extents. Reported-by: Peter Uchno <peter.uchno@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-05-01 | ext4: remove dead code in ext4_has_free_blocks() | Shaohua Li
percpu_counter_sum_positive() never returns a negative value. Signed-off-by: Shaohua Li <shaohua.li@intel.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-04-30 | ext4: ignore errors when issuing discards | Theodore Ts'o
This is an effective revert of commit a30eec2a8: "ext4: stop issuing discards if not supported by device". The problem is that there are some devices that may return errors in response to a discard request sometimes but not others. (One example would be a hybrid dm device which concatenates an SSD and an HDD device.) By this logic, I also removed the error checking from ext4's FITRIM code, so that an error from a discard will not stop the FITRIM from trying to trim the rest of the file system. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-04-30 | ext4: don't set PageUptodate in ext4_end_bio() | Curt Wohlgemuth
In the bio completion routine, we should not be setting PageUptodate at all -- it's set at sys_write() time, and is unaffected by success/failure of the write to disk. This can cause a page corruption bug when the file system's block size is less than the architecture's VM page size. If we have only written a single block, we might end up setting the page's PageUptodate flag, indicating that the page is completely read into memory, which may not be true. This could cause subsequent reads to get bad data. This commit also takes the opportunity to clean up error handling in ext4_end_bio(), and remove some extraneous code:
- fixes ext4_end_bio() to set AS_EIO in the page->mapping->flags on error, which was left out by mistake. This is needed so that fsync() will return an error if there was an I/O error.
- remove the clear_buffer_dirty() call on unmapped buffers for each page.
- consolidate page/buffer error handling in a single section.
Signed-off-by: Curt Wohlgemuth <curtw@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reported-by: Jim Meyering <jim@meyering.net> Reported-by: Hugh Dickins <hughd@google.com> Cc: Mingming Cao <cmm@us.ibm.com>
2011-04-18 | ext4: check for ext[23] file system features when mounting as ext[23] | Theodore Ts'o
Provide better emulation for ext[23] mode by enforcing that the file system does not have any unsupported file system features as defined by ext[23] when emulating the ext[23] file system driver when CONFIG_EXT4_USE_FOR_EXT23 is defined. This causes the file system type information in /proc/mounts to be correct for the automatically mounted root file system. This also means that "mount -t ext2 /dev/sda /mnt" will fail if /dev/sda contains an ext3 or ext4 file system, just as one would expect if the original ext2 file system driver were in use. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-04-16 | ext4: release page cache in ext4_mb_load_buddy error path | Yang Ruirui
Add missing page_cache_release in the error path of ext4_mb_load_buddy Signed-off-by: Yang Ruirui <ruirui.r.yang@tieto.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org
2011-04-11 | Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 | Linus Torvalds
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
    ext4: fix data corruption regression by reverting commit 6de9843dab3f
    ext4: Allow indirect-block file to grow the file size to max file size
    ext4: allow an active handle to be started when freezing
    ext4: sync the directory inode in ext4_sync_parent()
    ext4: init timer earlier to avoid a kernel panic in __save_error_info
    jbd2: fix potential memory leak on transaction commit
    ext4: fix a double free in ext4_register_li_request
    ext4: fix credits computing for indirect mapped files
    ext4: remove unnecessary [cm]time update of quota file
    jbd2: move bdget out of critical section
2011-04-10 | ext4: fix data corruption regression by reverting commit 6de9843dab3f | Theodore Ts'o
Revert commit 6de9843dab3f2a1d4d66d80aa9e5782f80977d20, since it caused a data corruption regression with BitTorrent downloads. Thanks to Damien for discovering and bisecting to find the problem commit. https://bugzilla.kernel.org/show_bug.cgi?id=32972 Reported-by: Damien Grassart <damien@grassart.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-04-10 | ext4: Allow indirect-block file to grow the file size to max file size | Kazuya Mio
We can create a 4402345721856-byte file with indirect block mapping. However, if we grow an indirect-block file to that size with ftruncate(), we can see an ext4 warning. The following patch fixes this problem.

How to reproduce:

    # dd if=/dev/zero of=/mnt/mp1/hoge bs=1 count=0 seek=4402345721856
    0+0 records in
    0+0 records out
    0 bytes (0 B) copied, 0.000221428 s, 0.0 kB/s
    # tail -n 1 /var/log/messages
    Nov 25 15:10:27 test kernel: EXT4-fs warning (device sda8): ext4_block_to_path:345: block 1074791436 > max in inode 12

Signed-off-by: Kazuya Mio <k-mio@sx.jp.nec.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
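The two numbers in the message can be reproduced with a short calculation. The sketch assumes the classic indirect layout (12 direct blocks plus single, double and triple indirect trees of 32-bit block pointers) and a 4096-byte block size; the function name is hypothetical, and other limits may apply for other block sizes.

```c
#include <stdint.h>

/*
 * Number of data blocks addressable through the classic indirect
 * scheme: 12 direct blocks plus single, double and triple indirect
 * trees, with 4-byte block pointers.
 */
static uint64_t max_indirect_blocks(uint32_t blocksize)
{
    uint64_t ptrs = blocksize / 4;  /* pointers per indirect block */

    return 12 + ptrs + ptrs * ptrs + ptrs * ptrs * ptrs;
}
```

For 4096-byte blocks this gives 12 + 1024 + 1024^2 + 1024^3 = 1074791436 blocks, the block number in the warning, and 1074791436 * 4096 = 4402345721856 bytes, the file size used in the reproducer.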
2011-04-10 | ext4: allow an active handle to be started when freezing | Yongqiang Yang
ext4_journal_start_sb() should not prevent an active handle from being started due to s_frozen. Otherwise, a deadlock can easily happen; below is one such situation.

    ================================================
         freeze         |       truncate
    ================================================
                        |  ext4_ext_truncate()
        freeze_super()  |  starts a handle
        sets s_frozen   |
                        |  ext4_ext_truncate()
                        |  holds i_data_sem
        ext4_freeze()   |
     waits for updates  |
                        |  ext4_free_blocks()
                        |  calls dquot_free_block()
                        |
                        |  dquot_free_blocks()
                        |  calls ext4_dirty_inode()
                        |
                        |  ext4_dirty_inode()
                        |  tries to start an active
                        |  handle
                        |
                        |  block due to s_frozen
    ================================================

Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reported-by: Amir Goldstein <amir73il@users.sf.net> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Andreas Dilger <adilger@dilger.ca>
2011-04-10 | ext4: sync the directory inode in ext4_sync_parent() | Curt Wohlgemuth
ext4 has taken the stance that, in the absence of a journal, when an fsync/fdatasync of an inode is done, the parent directory should be sync'ed if this inode entry is new. ext4_sync_parent(), which implements this, does indeed sync the dirent pages for parent directories, but it does not sync the directory *inode*. This patch fixes this. Also now return error status from ext4_sync_parent(). I tested this using a power fail test, which panics a machine running a file server getting requests from a client. Without this patch, on about every other test run, the server is missing many, many files that had been synced. With this patch, on > 6 runs, I see zero files being lost. Google-Bug-Id: 4179519 Signed-off-by: Curt Wohlgemuth <curtw@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>