| 文件 | 最后提交记录 | 最后更新时间 |
|---|---|---|
iomap: build the block based code conditionally Only build the block based iomap code if CONFIG_BLOCK is set. Currently that is always the case, but it will change soon. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Link: https://lore.kernel.org/r/20211129102203.2243509-29-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com> | 4 年前 | |
iomap: account for unaligned end offsets when truncating read range stable inclusion from stable-v6.6.120 commit 011e356fe41e2d72a68468fec00ca70dd9b61769 category: bugfix bugzilla: https://atomgit.com/openeuler/kernel/issues/8839 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=011e356fe41e2d72a68468fec00ca70dd9b61769 -------------------------------- [ Upstream commit 9d875e0eef8ec15b6b1da0cb9a0f8ed13efee89e ] The end position to start truncating from may be at an offset into a block, which under the current logic would result in overtruncation. Adjust the calculation to account for unaligned end offsets. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Link: https://patch.msgid.link/20251111193658.3495942-3-joannelkoong@gmail.com Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> (cherry picked from commit 011e356fe41e2d72a68468fec00ca70dd9b61769) Signed-off-by: Wentao Guan <guanwentao@uniontech.com> | 2 个月前 | |
iomap: fix submission side handling of completion side errors stable inclusion from stable-v6.6.128 commit 8f3d79abdec0164fd09a31a9f70069b5bdf4f2ec category: bugfix bugzilla: https://atomgit.com/openeuler/kernel/issues/ Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=8f3d79abdec0164fd09a31a9f70069b5bdf4f2ec -------------------------------- commit 8f3d79abdec0164fd09a31a9f70069b5bdf4f2ec upstream. [ Upstream commit 4ad357e39b2ecd5da7bcc7e840ee24d179593cd5 ] The "if (dio->error)" in iomap_dio_bio_iter exists to stop submitting more bios when a completion already return an error. Commit cfe057f7db1f ("iomap_dio_actor(): fix iov_iter bugs") made it revert the iov by "copied", which is very wrong given that we've already consumed that range and submitted a bio for it. Fixes: cfe057f7db1f ("iomap_dio_actor(): fix iov_iter bugs") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Wang Hai <wanghai38@huawei.com> | 25 天前 | |
fs: Move many prototypes to pagemap.h These functions are page cache functionality and don't need to be declared in fs.h. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> | 4 年前 | |
iomap: write iomap validity checks A recent multithreaded write data corruption has been uncovered in the iomap write code. The core of the problem is partial folio writes can be flushed to disk while a new racing write can map it and fill the rest of the page: writeback new write allocate blocks blocks are unwritten submit IO ..... map blocks iomap indicates UNWRITTEN range loop { lock folio copyin data ..... IO completes runs unwritten extent conv blocks are marked written <iomap now stale> get next folio } Now add memory pressure such that memory reclaim evicts the partially written folio that has already been written to disk. When the new write finally gets to the last partial page of the new write, it does not find it in cache, so it instantiates a new page, sees the iomap is unwritten, and zeros the part of the page that it does not have data from. This overwrites the data on disk that was originally written. The full description of the corruption mechanism can be found here: https://lore.kernel.org/linux-xfs/20220817093627.GZ3600936@dread.disaster.area/ To solve this problem, we need to check whether the iomap is still valid after we lock each folio during the write. We have to do it after we lock the page so that we don't end up with state changes occurring while we wait for the folio to be locked. Hence we need a mechanism to be able to check that the cached iomap is still valid (similar to what we already do in buffered writeback), and we need a way for ->begin_write to back out and tell the high level iomap iterator that we need to remap the remaining write range. The iomap needs to grow some storage for the validity cookie that the filesystem provides to travel with the iomap. XFS, in particular, also needs to know some more information about what the iomap maps (attribute extents rather than file data extents) to for the validity cookie to cover all the types of iomaps we might need to validate. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> | 3 年前 | |
iomap: switch iomap_seek_data to use iomap_iter Rewrite iomap_seek_data to use iomap_iter. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> | 4 年前 | |
mm/swap: consider max pages in iomap_swapfile_add_extent When the max pages (last_page in the swap header + 1) is smaller than the total pages (inode size) of the swapfile, iomap_swapfile_activate overwrites sis->max with total pages. However, frontswap_map is a swap page state bitmap allocated using the initial sis->max page count read from the swap header. If swapfile activation increases sis->max, it's possible for the frontswap code to walk off the end of the bitmap, thereby corrupting kernel memory. [djwong: modify the description a bit; the original paragraph reads: "However, frontswap_map is allocated using max pages. When test and clear the sis offset, which is larger than max pages, of frontswap_map in __frontswap_invalidate_page(), neighbors of frontswap_map may be overwritten, i.e., slab is polluted." Note also that this bug resulted in a behavioral change: activating a swap file that was formatted and later extended results in all pages being activated, not the number of pages recorded in the swap header.] This fixes the issue by considering the limitation of max pages of swap info in iomap_swapfile_add_extent(). To reproduce the case, compile kernel with slub RED ZONE, then run test: $ sudo stress-ng -a 1 -x softlockup,resources -t 72h --metrics --times \ --verify -v -Y /root/tmpdir/stress-ng/stress-statistic-12.yaml \ --log-file /root/tmpdir/stress-ng/stress-logfile-12.txt \ --temp-path /root/tmpdir/stress-ng/ We'll get the error log as below: [ 1151.015141] ============================================================================= [ 1151.016489] BUG kmalloc-16 (Not tainted): Right Redzone overwritten [ 1151.017486] ----------------------------------------------------------------------------- [ 1151.017486] [ 1151.018997] Disabling lock debugging due to kernel taint [ 1151.019873] INFO: 0x0000000084e43932-0x0000000098d17cae @offset=7392. First byte 0x0 instead of 0xcc [ 1151.021303] INFO: Allocated in __do_sys_swapon+0xcf6/0x1170 age=43417 cpu=9 pid=3816 [ 1151.022538] __slab_alloc+0xe/0x20 [ 1151.023069] __kmalloc_node+0xfd/0x4b0 [ 1151.023704] __do_sys_swapon+0xcf6/0x1170 [ 1151.024346] do_syscall_64+0x33/0x40 [ 1151.024925] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 1151.025749] INFO: Freed in put_cred_rcu+0xa1/0xc0 age=43424 cpu=3 pid=2041 [ 1151.026889] kfree+0x276/0x2b0 [ 1151.027405] put_cred_rcu+0xa1/0xc0 [ 1151.027949] rcu_do_batch+0x17d/0x410 [ 1151.028566] rcu_core+0x14e/0x2b0 [ 1151.029084] __do_softirq+0x101/0x29e [ 1151.029645] asm_call_irq_on_stack+0x12/0x20 [ 1151.030381] do_softirq_own_stack+0x37/0x40 [ 1151.031037] do_softirq.part.15+0x2b/0x30 [ 1151.031710] __local_bh_enable_ip+0x4b/0x50 [ 1151.032412] copy_fpstate_to_sigframe+0x111/0x360 [ 1151.033197] __setup_rt_frame+0xce/0x480 [ 1151.033809] arch_do_signal+0x1a3/0x250 [ 1151.034463] exit_to_user_mode_prepare+0xcf/0x110 [ 1151.035242] syscall_exit_to_user_mode+0x27/0x190 [ 1151.035970] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 1151.036795] INFO: Slab 0x000000003b9de4dc objects=44 used=9 fp=0x00000000539e349e flags=0xfffffc0010201 [ 1151.038323] INFO: Object 0x000000004855ba01 @offset=7376 fp=0x0000000000000000 [ 1151.038323] [ 1151.039683] Redzone 000000008d0afd3d: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc ................ [ 1151.041180] Object 000000004855ba01: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 1151.042714] Redzone 0000000084e43932: 00 00 00 c0 cc cc cc cc ........ [ 1151.044120] Padding 000000000864c042: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ [ 1151.045615] CPU: 5 PID: 3816 Comm: stress-ng Tainted: G B 5.10.50+ #7 [ 1151.046846] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 [ 1151.048633] Call Trace: [ 1151.049072] dump_stack+0x57/0x6a [ 1151.049585] check_bytes_and_report+0xed/0x110 [ 1151.050320] check_object+0x1eb/0x290 [ 1151.050924] ? __x64_sys_swapoff+0x39a/0x540 [ 1151.051646] free_debug_processing+0x151/0x350 [ 1151.052333] __slab_free+0x21a/0x3a0 [ 1151.052938] ? _cond_resched+0x2d/0x40 [ 1151.053529] ? __vunmap+0x1de/0x220 [ 1151.054139] ? __x64_sys_swapoff+0x39a/0x540 [ 1151.054796] ? kfree+0x276/0x2b0 [ 1151.055307] kfree+0x276/0x2b0 [ 1151.055832] __x64_sys_swapoff+0x39a/0x540 [ 1151.056466] do_syscall_64+0x33/0x40 [ 1151.057084] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 1151.057866] RIP: 0033:0x150340b0ffb7 [ 1151.058481] Code: Unable to access opcode bytes at RIP 0x150340b0ff8d. [ 1151.059537] RSP: 002b:00007fff7f4ee238 EFLAGS: 00000246 ORIG_RAX: 00000000000000a8 [ 1151.060768] RAX: ffffffffffffffda RBX: 00007fff7f4ee66c RCX: 0000150340b0ffb7 [ 1151.061904] RDX: 000000000000000a RSI: 0000000000018094 RDI: 00007fff7f4ee860 [ 1151.063033] RBP: 00007fff7f4ef980 R08: 0000000000000000 R09: 0000150340a672bd [ 1151.064135] R10: 00007fff7f4edca0 R11: 0000000000000246 R12: 0000000000018094 [ 1151.065253] R13: 0000000000000005 R14: 000000000160d930 R15: 00007fff7f4ee66c [ 1151.066413] FIX kmalloc-16: Restoring 0x0000000084e43932-0x0000000098d17cae=0xcc [ 1151.066413] [ 1151.067890] FIX kmalloc-16: Object at 0x000000004855ba01 not freed Fixes: 67482129cdab ("iomap: add a swapfile activation function") Fixes: a45c0eccc564 ("iomap: move the swapfile code into a separate file") Signed-off-by: Gang Deng <gavin.dg@linux.alibaba.com> Signed-off-by: Xu Yu <xuyu@linux.alibaba.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> | 4 年前 | |
iomap: Add DIO tracepoints Add trace_iomap_dio_rw_begin, trace_iomap_dio_rw_queued and trace_iomap_dio_complete tracepoint. trace_iomap_dio_rw_queued is mostly only to know that the request was queued and -EIOCBQUEUED was returned. It is mostly trace_iomap_dio_rw_begin & trace_iomap_dio_complete which has all the details. <example output log> a.out-2073 [006] 134.225717: iomap_dio_rw_begin: dev 7:7 ino 0xe size 0x0 offset 0x0 length 0x1000 done_before 0x0 flags DIRECT|WRITE dio_flags DIO_FORCE_WAIT aio 1 a.out-2073 [006] 134.226234: iomap_dio_complete: dev 7:7 ino 0xe size 0x1000 offset 0x1000 flags DIRECT|WRITE aio 1 error 0 ret 4096 a.out-2074 [006] 136.225975: iomap_dio_rw_begin: dev 7:7 ino 0xe size 0x1000 offset 0x0 length 0x1000 done_before 0x0 flags DIRECT dio_flags aio 1 a.out-2074 [006] 136.226173: iomap_dio_rw_queued: dev 7:7 ino 0xe size 0x1000 offset 0x1000 length 0x0 ksoftirqd/3-31 [003] 136.226389: iomap_dio_complete: dev 7:7 ino 0xe size 0x1000 offset 0x1000 flags DIRECT aio 1 error 0 ret 4096 a.out-2075 [003] 141.674969: iomap_dio_rw_begin: dev 7:7 ino 0xe size 0x1000 offset 0x0 length 0x1000 done_before 0x0 flags DIRECT|WRITE dio_flags aio 1 a.out-2075 [003] 141.676085: iomap_dio_rw_queued: dev 7:7 ino 0xe size 0x1000 offset 0x1000 length 0x0 kworker/2:0-27 [002] 141.676432: iomap_dio_complete: dev 7:7 ino 0xe size 0x1000 offset 0x1000 flags DIRECT|WRITE aio 1 error 0 ret 4096 a.out-2077 [006] 143.443746: iomap_dio_rw_begin: dev 7:7 ino 0xe size 0x1000 offset 0x0 length 0x1000 done_before 0x0 flags DIRECT dio_flags aio 1 a.out-2077 [006] 143.443866: iomap_dio_rw_queued: dev 7:7 ino 0xe size 0x1000 offset 0x1000 length 0x0 ksoftirqd/5-41 [005] 143.444134: iomap_dio_complete: dev 7:7 ino 0xe size 0x1000 offset 0x1000 flags DIRECT aio 1 error 0 ret 4096 a.out-2078 [007] 146.716833: iomap_dio_rw_begin: dev 7:7 ino 0xe size 0x1000 offset 0x0 length 0x1000 done_before 0x0 flags DIRECT dio_flags aio 0 a.out-2078 [007] 146.717639: iomap_dio_complete: dev 7:7 ino 0xe size 0x1000 offset 0x1000 flags DIRECT aio 0 error 0 ret 4096 a.out-2079 [006] 148.972605: iomap_dio_rw_begin: dev 7:7 ino 0xe size 0x1000 offset 0x0 length 0x1000 done_before 0x0 flags DIRECT dio_flags aio 0 a.out-2079 [006] 148.973099: iomap_dio_complete: dev 7:7 ino 0xe size 0x1000 offset 0x1000 flags DIRECT aio 0 error 0 ret 4096 Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> [djwong: line up strings all prettylike] Signed-off-by: Darrick J. Wong <djwong@kernel.org> | 3 年前 | |
fs: iomap: Atomic write support mainline inclusion from mainline-v6.12-rc1 commit 9e0933c21c128d6d8ac4d8aae0babaf9a43100b8 category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8862 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9e0933c21c128d6d8ac4d8aae0babaf9a43100b8 -------------------------------- Support direct I/O atomic writes by producing a single bio with REQ_ATOMIC flag set. Initially FSes (XFS) should only support writing a single FS block atomically. As with any atomic write, we should produce a single bio which covers the complete write length. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: John Garry <john.g.garry@oracle.com> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> [djwong: clarify a couple of things in the docs] Signed-off-by: Darrick J. Wong <djwong@kernel.org> Conflicts: Documentation/filesystems/iomap/operations.rst Signed-off-by: Long Li <leo.lilong@huawei.com> | 1 个月前 |
| 文件 | 最后提交记录 | 最后更新时间 |
|---|---|---|
| 4 年前 | ||
| 2 个月前 | ||
| 25 天前 | ||
| 4 年前 | ||
| 3 年前 | ||
| 4 年前 | ||
| 4 年前 | ||
| 3 年前 | ||
| 1 个月前 |