| 文件 | 最后提交记录 | 最后更新时间 |
|---|---|---|
9p: fix cache/debug options printing in v9fs_show_options stable inclusion from stable-v6.6.120 commit f1fe47f592d3ff19bf203084b4b47aee70e3f08f category: bugfix bugzilla: https://atomgit.com/openeuler/kernel/issues/8839 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=f1fe47f592d3ff19bf203084b4b47aee70e3f08f -------------------------------- [ Upstream commit f0445613314f474c1a0ec6fa8a5cd153a618f1b6 ] commit 4eb3117888a92 changed the cache= option to accept either string shortcuts or bitfield values. It also changed /proc/mounts to emit the option as the hexadecimal numeric value rather than the shortcut string. However, by printing "cache=%x" without the leading 0x, shortcuts such as "cache=loose" will emit "cache=f" and 'f' is not a string that is parseable by kstrtoint(), so remounting may fail if a remount with "cache=f" is attempted. debug=%x has had the same problem since options have been displayed in c4fac9100456 ("9p: Implement show_options") Fix these by adding the 0x prefix to the hexadecimal value shown in /proc/mounts. Fixes: 4eb3117888a92 ("fs/9p: Rework cache modes and add new options to Documentation") Signed-off-by: Eric Sandeen <sandeen@redhat.com> Message-ID: <54b93378-dcf1-4b04-922d-c8b4393da299@redhat.com> [Dominique: use %#x at Al Viro's suggestion, also handle debug] Tested-by: Remi Pommarel <repk@triplefau.lt> Signed-off-by: Dominique Martinet <asmadeus@codewreck.org> Signed-off-by: Sasha Levin <sashal@kernel.org> (cherry picked from commit f1fe47f592d3ff19bf203084b4b47aee70e3f08f) Signed-off-by: Wentao Guan <guanwentao@uniontech.com> | 2 个月前 | |
fs/adfs: validate nzones in adfs_validate_bblk() stable inclusion from stable-v6.6.141 commit a11372a8b1ceaa5e950a84b3b5fbf8228f25e277 category: bugfix bugzilla: https://atomgit.com/openeuler/kernel/issues/ Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=a11372a8b1ceaa5e950a84b3b5fbf8228f25e277 -------------------------------- commit a11372a8b1ceaa5e950a84b3b5fbf8228f25e277 upstream. [ Upstream commit dd9d3e16c2d5fa166e13dce07413be51f42c8f5d ] Reject ADFS disc records with a zero zone count during boot block validation, before the disc record is used. When nzones is 0, adfs_read_map() passes it to kmalloc_array(0, ...) which returns ZERO_SIZE_PTR, and adfs_map_layout() then writes to dm[-1], causing an out-of-bounds write before the allocated buffer. adfs_validate_dr0() already rejects nzones != 1 for old-format images. Add the equivalent check to adfs_validate_bblk() for new-format images so that a crafted image with nzones == 0 is rejected at probe time. Found by syzkaller. Fixes: f6f14a0d71b0 ("fs/adfs: map: move map-specific sb initialisation to map.c") Signed-off-by: Bae Yeonju <iwasbaeyz@gmail.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Wang Hai <wanghai38@huawei.com> | 25 天前 | |
affs: don't write overlarge OFS data block size fields stable inclusion from stable-v6.6.87 commit 54fd5a5b7583f1964a79e9efe02e30e97b61eb95 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/ID6BMS Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=54fd5a5b7583f1964a79e9efe02e30e97b61eb95 -------------------------------- [ Upstream commit 011ea742a25a77bac3d995f457886a67d178c6f0 ] If a data sector on an OFS floppy contains a value > 0x1e8 (the largest amount of data that fits in the sector after its header), then an Amiga reading the file can return corrupt data, by taking the overlarge size at its word and reading past the end of the buffer it read the disk sector into! The cause: when affs_write_end_ofs() writes data to an OFS filesystem, the new size field for a data block was computed by adding the amount of data currently being written (into the block) to the existing value of the size field. This is correct if you're extending the file at the end, but if you seek backwards in the file and overwrite _existing_ data, it can lead to the size field being larger than the maximum legal value. This commit changes the calculation so that it sets the size field to the max of its previous size and the position within the block that we just wrote up to. Signed-off-by: Simon Tatham <anakin@pobox.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org> (cherry picked from commit 54fd5a5b7583f1964a79e9efe02e30e97b61eb95) Signed-off-by: Wentao Guan <guanwentao@uniontech.com> | 7 个月前 | |
afs: Fix potential null pointer dereference in afs_put_server stable inclusion from stable-v6.6.109 commit cab278cead49a547ac84c3e185f446f381303eae category: bugfix bugzilla: https://atomgit.com/openeuler/kernel/issues/8521 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=cab278cead49a547ac84c3e185f446f381303eae -------------------------------- commit 9158c6bb245113d4966df9b2ba602197a379412e upstream. afs_put_server() accessed server->debug_id before the NULL check, which could lead to a null pointer dereference. Move the debug_id assignment, ensuring we never dereference a NULL server pointer. Fixes: 2757a4dc1849 ("afs: Fix access after dec in put functions") Cc: stable@vger.kernel.org Signed-off-by: Zhen Ni <zhen.ni@easystack.cn> Acked-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeffrey Altman <jaltman@auristor.com> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit cab278cead49a547ac84c3e185f446f381303eae) Signed-off-by: Wentao Guan <guanwentao@uniontech.com> | 3 个月前 | |
Merge tag 'v6.6-vfs.autofs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull autofs fixes from Christian Brauner: "This fixes a memory leak in autofs reported by syzkaller and a missing conversion from uninterruptible to interruptible wake up when autofs is in catatonic mode" * tag 'v6.6-vfs.autofs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: autofs: use wake_up() instead of wake_up_interruptible(() autofs: fix memory leak of waitqueues in autofs_catatonic_mode | 2 年前 | |
Merge tag 'for-6.6/block-2023-08-28' of git://git.kernel.dk/linux Pull block updates from Jens Axboe: "Pretty quiet round for this release. This contains: - Add support for zoned storage to ublk (Andreas, Ming) - Series improving performance for drivers that mark themselves as needing a blocking context for issue (Bart) - Cleanup the flush logic (Chengming) - sed opal keyring support (Greg) - Fixes and improvements to the integrity support (Jinyoung) - Add some exports for bcachefs that we can hopefully delete again in the future (Kent) - deadline throttling fix (Zhiguo) - Series allowing building the kernel without buffer_head support (Christoph) - Sanitize the bio page adding flow (Christoph) - Write back cache fixes (Christoph) - MD updates via Song: - Fix perf regression for raid0 large sequential writes (Jan) - Fix split bio iostat for raid0 (David) - Various raid1 fixes (Heinz, Xueshi) - raid6test build fixes (WANG) - Deprecate bitmap file support (Christoph) - Fix deadlock with md sync thread (Yu) - Refactor md io accounting (Yu) - Various non-urgent fixes (Li, Yu, Jack) - Various fixes and cleanups (Arnd, Azeem, Chengming, Damien, Li, Ming, Nitesh, Ruan, Tejun, Thomas, Xu)" * tag 'for-6.6/block-2023-08-28' of git://git.kernel.dk/linux: (113 commits) block: use strscpy() to instead of strncpy() block: sed-opal: keyring support for SED keys block: sed-opal: Implement IOC_OPAL_REVERT_LSP block: sed-opal: Implement IOC_OPAL_DISCOVERY blk-mq: prealloc tags when increase tagset nr_hw_queues blk-mq: delete redundant tagset map update when fallback blk-mq: fix tags leak when shrink nr_hw_queues ublk: zoned: support REQ_OP_ZONE_RESET_ALL md: raid0: account for split bio in iostat accounting md/raid0: Fix performance regression for large sequential writes md/raid0: Factor out helper for mapping and submitting a bio md raid1: allow writebehind to work on any leg device set WriteMostly md/raid1: hold the barrier until handle_read_error() finishes md/raid1: free the r1bio before waiting for blocked rdev md/raid1: call free_r1bio() before allow_barrier() in raid_end_bio_io() blk-cgroup: Fix NULL deref caused by blkg_policy_data being installed before init drivers/rnbd: restore sysfs interface to rnbd-client md/raid5-cache: fix null-ptr-deref for r5l_flush_stripe_to_raid() raid6: test: only check for Altivec if building on powerpc hosts raid6: test: make sure all intermediate and artifact files are .gitignored ... | 2 年前 | |
bfs: Reconstruct file type when loading from disk stable inclusion from stable-v6.6.120 commit a9f626396bfe66f49b743601e862767928237cc0 category: bugfix bugzilla: https://atomgit.com/openeuler/kernel/issues/8839 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=a9f626396bfe66f49b743601e862767928237cc0 -------------------------------- [ Upstream commit 34ab4c75588c07cca12884f2bf6b0347c7a13872 ] syzbot is reporting that S_IFMT bits of inode->i_mode can become bogus when the S_IFMT bits of the 32bits "mode" field loaded from disk are corrupted or when the 32bits "attributes" field loaded from disk are corrupted. A documentation says that BFS uses only lower 9 bits of the "mode" field. But I can't find an explicit explanation that the unused upper 23 bits (especially, the S_IFMT bits) are initialized with 0. Therefore, ignore the S_IFMT bits of the "mode" field loaded from disk. Also, verify that the value of the "attributes" field loaded from disk is either BFS_VREG or BFS_VDIR (because BFS supports only regular files and the root directory). Reported-by: syzbot+895c23f6917da440ed0d@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=895c23f6917da440ed0d Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Link: https://patch.msgid.link/fabce673-d5b9-4038-8287-0fd65d80203b@I-love.SAKURA.ne.jp Reviewed-by: Tigran Aivazian <aivazian.tigran@gmail.com> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> (cherry picked from commit a9f626396bfe66f49b743601e862767928237cc0) Signed-off-by: Wentao Guan <guanwentao@uniontech.com> | 2 个月前 | |
| 24 天前 | ||
cachefiles: Fix the incorrect return value in cachefiles_ondemand_fd_write_iter() hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/ICJHNT -------------------------------- In __cachefiles_write(), if the return value of the write operation > 0, it is set to 0. This makes it impossible to distinguish scenarios where a partial write has occurred, and causes the outer function cachefiles_ondemand_fd_write_iter() to unconditionally return the full length specified by user space. Fix it by modifying "ret" to reflect the actual number of bytes written. Furthermore, returning a value greater than 0 from __cachefiles_write() does not affect other call paths, such as fscache_write_to_cache() and fscache_write(). Fixes: c8383054506c ("cachefiles: notify the user daemon when looking up cookie") Signed-off-by: Zizhi Wo <wozizhi@huawei.com> | 11 个月前 | |
ceph: fix a buffer leak in __ceph_setxattr() stable inclusion from stable-v6.6.141 commit 4bfdcefdaa6092a06cacd59389c7756b36e6de8c category: bugfix bugzilla: https://atomgit.com/openeuler/kernel/issues/ Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=4bfdcefdaa6092a06cacd59389c7756b36e6de8c -------------------------------- commit 5d3cc36b4e77a27ce7b686b7c59c7072bcb3fa8e upstream. The old_blob in __ceph_setxattr() can store ci->i_xattrs.prealloc_blob value during the retry. However, it is never called the ceph_buffer_put() for the old_blob object. This patch fixes the issue of the buffer leak. Cc: stable@vger.kernel.org Signed-off-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Reviewed-by: Alex Markuze <amarkuze@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Wang Hai <wanghai38@huawei.com> | 25 天前 | |
fs: move file_start_write() into vfs_iter_write() mainline inclusion from mainline-v6.8-rc1 commit 269aed7014b3db9acdbc5a5e163d8a6c62e0e770 category: feature bugzilla: https://gitee.com/src-openeuler/kernel/issues/IBHLU4 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=269aed7014b3db9acdbc5a5e163d8a6c62e0e770 -------------------------------- All the callers of vfs_iter_write() call file_start_write() just before calling vfs_iter_write() except for target_core_file's fd_do_rw(). Move file_start_write() from the callers into vfs_iter_write(). fd_do_rw() calls vfs_iter_write() with a non-regular file, so file_start_write() is a no-op. This is needed for fanotify "pre content" events. Suggested-by: Jan Kara <jack@suse.cz> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://lore.kernel.org/r/20231122122715.2561213-11-amir73il@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com> | 1 年前 | |
configfs: Do not override creating attribute file failure in populate_attrs() stable inclusion from stable-v6.6.95 commit 0df5e4c7de27dde3601e132632bb094afd464339 category: bugfix bugzilla: https://atomgit.com/openeuler/kernel/issues/8365 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=0df5e4c7de27dde3601e132632bb094afd464339 -------------------------------- commit f830edbae247b89228c3e09294151b21e0dc849c upstream. populate_attrs() may override failure for creating attribute files by success for creating subsequent bin attribute files, and have wrong return value. Fix by creating bin attribute files under successfully creating attribute files. Fixes: 03607ace807b ("configfs: implement binary attributes") Cc: stable@vger.kernel.org Reviewed-by: Joel Becker <jlbec@evilplan.org> Reviewed-by: Breno Leitao <leitao@debian.org> Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com> Link: https://lore.kernel.org/r/20250507-fix_configfs-v3-2-fe2d96de8dc4@quicinc.com Signed-off-by: Andreas Hindborg <a.hindborg@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit 0df5e4c7de27dde3601e132632bb094afd464339) Signed-off-by: Wentao Guan <guanwentao@uniontech.com> | 4 个月前 | |
cramfs: Verify inode mode when loading from disk stable inclusion from stable-v6.6.113 commit ab0d0138803cd4b6179cb0d0725df6e52d35ab92 category: bugfix bugzilla: https://atomgit.com/openeuler/kernel/issues/8637 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=ab0d0138803cd4b6179cb0d0725df6e52d35ab92 -------------------------------- [ Upstream commit 7f9d34b0a7cb93d678ee7207f0634dbf79e47fe5 ] The inode mode loaded from corrupted disk can be invalid. Do like what commit 0a9e74051313 ("isofs: Verify inode mode when loading from disk") does. Reported-by: syzbot <syzbot+895c23f6917da440ed0d@syzkaller.appspotmail.com> Closes: https://syzkaller.appspot.com/bug?extid=895c23f6917da440ed0d Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Link: https://lore.kernel.org/429b3ef1-13de-4310-9a8e-c2dc9a36234a@I-love.SAKURA.ne.jp Acked-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> (cherry picked from commit ab0d0138803cd4b6179cb0d0725df6e52d35ab92) Signed-off-by: Wentao Guan <guanwentao@uniontech.com> | 3 个月前 | |
fscrypt: Don't use problematic non-inline crypto engines mainline inclusion from mainline-v6.16-rc7 commit b41c1d8d07906786c60893980d52688f31d114a6 category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/ICQ8LW Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b41c1d8d07906786c60893980d52688f31d114a6 -------------------------------- Make fscrypt no longer use Crypto API drivers for non-inline crypto engines, even when the Crypto API prioritizes them over CPU-based code (which unfortunately it often does). These drivers tend to be really problematic, especially for fscrypt's workload. This commit has no effect on inline crypto engines, which are different and do work well. Specifically, exclude drivers that have CRYPTO_ALG_KERN_DRIVER_ONLY or CRYPTO_ALG_ALLOCATES_MEMORY set. (Later, CRYPTO_ALG_ASYNC should be excluded too. That's omitted for now to keep this commit backportable, since until recently some CPU-based code had CRYPTO_ALG_ASYNC set.) There are two major issues with these drivers: bugs and performance. First, these drivers tend to be buggy. They're fundamentally much more error-prone and harder to test than the CPU-based code. They often don't get tested before kernel releases, and even if they do, the crypto self-tests don't properly test these drivers. Released drivers have en/decrypted or hashed data incorrectly. These bugs cause issues for fscrypt users who often didn't even want to use these drivers, e.g.: - https://github.com/google/fscryptctl/issues/32 - https://github.com/google/fscryptctl/issues/9 - https://lore.kernel.org/r/PH0PR02MB731916ECDB6C613665863B6CFFAA2@PH0PR02MB7319.namprd02.prod.outlook.com These drivers have also similarly caused issues for dm-crypt users, including data corruption and deadlocks. Since Linux v5.10, dm-crypt has disabled most of them by excluding CRYPTO_ALG_ALLOCATES_MEMORY. Second, these drivers tend to be *much* slower than the CPU-based code. This may seem counterintuitive, but benchmarks clearly show it. There's a *lot* of overhead associated with going to a hardware driver, off the CPU, and back again. To prove this, I gathered as many systems with this type of crypto engine as I could, and I measured synchronous encryption of 4096-byte messages (which matches fscrypt's workload): Intel Emerald Rapids server: AES-256-XTS: xts-aes-vaes-avx512 16171 MB/s [CPU-based, Vector AES] qat_aes_xts 289 MB/s [Offload, Intel QuickAssist] Qualcomm SM8650 HDK: AES-256-XTS: xts-aes-ce 4301 MB/s [CPU-based, ARMv8 Crypto Extensions] xts-aes-qce 73 MB/s [Offload, Qualcomm Crypto Engine] i.MX 8M Nano LPDDR4 EVK: AES-256-XTS: xts-aes-ce 647 MB/s [CPU-based, ARMv8 Crypto Extensions] xts(ecb-aes-caam) 20 MB/s [Offload, CAAM] AES-128-CBC-ESSIV: essiv(cbc-aes-caam,sha256-lib) 23 MB/s [Offload, CAAM] STM32MP157F-DK2: AES-256-XTS: xts-aes-neonbs 13.2 MB/s [CPU-based, ARM NEON] xts(stm32-ecb-aes) 3.1 MB/s [Offload, STM32 crypto engine] AES-128-CBC-ESSIV: essiv(cbc-aes-neonbs,sha256-lib) 14.7 MB/s [CPU-based, ARM NEON] essiv(stm32-cbc-aes,sha256-lib) 3.2 MB/s [Offload, STM32 crypto engine] Adiantum: adiantum(xchacha12-arm,aes-arm,nhpoly1305-neon) 52.8 MB/s [CPU-based, ARM scalar + NEON] So, there was no case in which the crypto engine was even *close* to being faster. On the first three, which have AES instructions in the CPU, the CPU was 30 to 55 times faster (!). Even on STM32MP157F-DK2 which has a Cortex-A7 CPU that doesn't have AES instructions, AES was over 4 times faster on the CPU. And Adiantum encryption, which is what actually should be used on CPUs like that, was over 17 times faster. Other justifications that have been given for these non-inline crypto engines (almost always coming from the hardware vendors, not actual users) don't seem very plausible either: - The crypto engine throughput could be improved by processing multiple requests concurrently. Currently irrelevant to fscrypt, since it doesn't do that. This would also be complex, and unhelpful in many cases. 2 of the 4 engines I tested even had only one queue. - Some of the engines, e.g. STM32, support hardware keys. Also currently irrelevant to fscrypt, since it doesn't support these. Interestingly, the STM32 driver itself doesn't support this either. - Free up CPU for other tasks and/or reduce energy usage. Not very plausible considering the "short" message length, driver overhead, and scheduling overhead. There's just very little time for the CPU to do something else like run another task or enter low-power state, before the message finishes and it's time to process the next one. - Some of these engines resist power analysis and electromagnetic attacks, while the CPU-based crypto generally does not. In theory, this sounds great. In practice, if this benefit requires the use of an off-CPU offload that massively regresses performance and has a low-quality, buggy driver, the price for this hardening (which is not relevant to most fscrypt users, and tends to be incomplete) is just too high. Inline crypto engines are much more promising here, as are on-CPU solutions like RISC-V High Assurance Cryptography. Fixes: b30ab0e03407 ("ext4 crypto: add ext4 encryption facilities") Cc: stable@vger.kernel.org Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20250704070322.20692-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org> Conflicts: Documentation/filesystems/fscrypt.rst fs/crypto/fscrypt_private.h fs/crypto/hkdf.c [Context changed] Signed-off-by: Yongjian Sun <sunyongjian1@huawei.com> | 10 个月前 | |
debugfs: fix placement of EXPORT_SYMBOL_GPL for debugfs_create_str() stable inclusion from stable-v6.6.141 commit 6325eea40a959bfa44cdf00717930d4729fe8131 category: bugfix bugzilla: https://atomgit.com/openeuler/kernel/issues/ Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=6325eea40a959bfa44cdf00717930d4729fe8131 -------------------------------- commit 6325eea40a959bfa44cdf00717930d4729fe8131 upstream. [ Upstream commit 4afc929c0f74c4f22b055a82b371d50586da58ca ] The EXPORT_SYMBOL_GPL() for debugfs_create_str was placed incorrectly away from the function definition. Move it immediately below the debugfs_create_str() function where it belongs. Fixes: d60b59b96795 ("debugfs: Export debugfs_create_str symbol") Signed-off-by: Gui-Dong Han <hanguidong02@gmail.com> Link: https://patch.msgid.link/20260323085930.88894-3-hanguidong02@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Wang Hai <wanghai38@huawei.com> | 25 天前 | |
Merge tag 'v6.6-vfs.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull misc vfs updates from Christian Brauner: "This contains the usual miscellaneous features, cleanups, and fixes for vfs and individual filesystems. Features: - Block mode changes on symlinks and rectify our broken semantics - Report file modifications via fsnotify() for splice - Allow specifying an explicit timeout for the "rootwait" kernel command line option. This allows to timeout and reboot instead of always waiting indefinitely for the root device to show up - Use synchronous fput for the close system call Cleanups: - Get rid of open-coded lockdep workarounds for async io submitters and replace it all with a single consolidated helper - Simplify epoll allocation helper - Convert simple_write_begin and simple_write_end to use a folio - Convert page_cache_pipe_buf_confirm() to use a folio - Simplify __range_close to avoid pointless locking - Disable per-cpu buffer head cache for isolated cpus - Port ecryptfs to kmap_local_page() api - Remove redundant initialization of pointer buf in pipe code - Unexport the d_genocide() function which is only used within core vfs - Replace printk(KERN_ERR) and WARN_ON() with WARN() Fixes: - Fix various kernel-doc issues - Fix refcount underflow for eventfds when used as EFD_SEMAPHORE - Fix a mainly theoretical issue in devpts - Check the return value of __getblk() in reiserfs - Fix a racy assert in i_readcount_dec - Fix integer conversion issues in various functions - Fix LSM security context handling during automounts that prevented NFS superblock sharing" * tag 'v6.6-vfs.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (39 commits) cachefiles: use kiocb_{start,end}_write() helpers ovl: use kiocb_{start,end}_write() helpers aio: use kiocb_{start,end}_write() helpers io_uring: use kiocb_{start,end}_write() helpers fs: create kiocb_{start,end}_write() helpers fs: add kerneldoc to file_{start,end}_write() helpers io_uring: rename kiocb_end_write() local helper splice: Convert page_cache_pipe_buf_confirm() to use a folio libfs: Convert simple_write_begin and simple_write_end to use a folio fs/dcache: Replace printk and WARN_ON by WARN fs/pipe: remove redundant initialization of pointer buf fs: Fix kernel-doc warnings devpts: Fix kernel-doc warnings doc: idmappings: fix an error and rephrase a paragraph init: Add support for rootwait timeout parameter vfs: fix up the assert in i_readcount_dec fs: Fix one kernel-doc comment docs: filesystems: idmappings: clarify from where idmappings are taken fs/buffer.c: disable per-CPU buffer_head cache for isolated CPUs vfs, security: Fix automount superblock LSM init problem, preventing NFS sb sharing ... | 2 年前 | |
dlm: validate length in dlm_search_rsb_tree mainline inclusion from mainline-v7.0-rc1 commit 080e5563f878c64e697b89e7439d730d0daad882 category: bugfix bugzilla: https://atomgit.com/src-openeuler/kernel/issues/14672 CVE: CVE-2026-43125 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=080e5563f878c64e697b89e7439d730d0daad882 -------------------------------- The len parameter in dlm_dump_rsb_name() is not validated and comes from network messages. When it exceeds DLM_RESNAME_MAXLEN, it can cause out-of-bounds write in dlm_search_rsb_tree(). Add length validation to prevent potential buffer overflow. Signed-off-by: Ezrak1e <ezrakiez@gmail.com> Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com> Conflicts: fs/dlm/lock.c [Context conflicts due to commit 6c648035cbe7 ("dlm: switch to use rhashtable for rsbs") not merge.] Signed-off-by: Gu Bowen <gubowen5@huawei.com> | 25 天前 | |
fs: Create a generic is_dot_dotdot() utility stable inclusion from stable-v6.6.54 commit ef83620438d7b08cd66269c2262fcc2007cd5454 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IAZ3K2 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=ef83620438d7b08cd66269c2262fcc2007cd5454 -------------------------------- commit 42c3732fa8073717dd7d924472f1c0bc5b452fdc upstream. De-duplicate the same functionality in several places by hoisting the is_dot_dotdot() utility function into linux/fs.h. Suggested-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Acked-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Wen Zhiwei <wenzhiwei@kylinos.cn> | 1 年前 | |
efivarfs: fix error propagation in efivar_entry_get() mainline inclusion from mainline-v6.19-rc8 commit 4b22ec1685ce1fc0d862dcda3225d852fb107995 category: bugfix bugzilla: https://atomgit.com/src-openeuler/kernel/issues/13706 CVE: CVE-2026-23156 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4b22ec1685ce1fc0d862dcda3225d852fb107995 -------------------------------- efivar_entry_get() always returns success even if the underlying __efivar_entry_get() fails, masking errors. This may result in uninitialized heap memory being copied to userspace in the efivarfs_file_read() path. Fix it by returning the error from __efivar_entry_get(). Fixes: 2d82e6227ea1 ("efi: vars: Move efivar caching layer into efivarfs") Cc: <stable@vger.kernel.org> # v6.1+ Signed-off-by: Kohei Enju <kohei@enjuk.jp> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Yongjian Sun <sunyongjian1@huawei.com> | 3 个月前 | |
Merge tag 'for-6.6/block-2023-08-28' of git://git.kernel.dk/linux Pull block updates from Jens Axboe: "Pretty quiet round for this release. This contains: - Add support for zoned storage to ublk (Andreas, Ming) - Series improving performance for drivers that mark themselves as needing a blocking context for issue (Bart) - Cleanup the flush logic (Chengming) - sed opal keyring support (Greg) - Fixes and improvements to the integrity support (Jinyoung) - Add some exports for bcachefs that we can hopefully delete again in the future (Kent) - deadline throttling fix (Zhiguo) - Series allowing building the kernel without buffer_head support (Christoph) - Sanitize the bio page adding flow (Christoph) - Write back cache fixes (Christoph) - MD updates via Song: - Fix perf regression for raid0 large sequential writes (Jan) - Fix split bio iostat for raid0 (David) - Various raid1 fixes (Heinz, Xueshi) - raid6test build fixes (WANG) - Deprecate bitmap file support (Christoph) - Fix deadlock with md sync thread (Yu) - Refactor md io accounting (Yu) - Various non-urgent fixes (Li, Yu, Jack) - Various fixes and cleanups (Arnd, Azeem, Chengming, Damien, Li, Ming, Nitesh, Ruan, Tejun, Thomas, Xu)" * tag 'for-6.6/block-2023-08-28' of git://git.kernel.dk/linux: (113 commits) block: use strscpy() to instead of strncpy() block: sed-opal: keyring support for SED keys block: sed-opal: Implement IOC_OPAL_REVERT_LSP block: sed-opal: Implement IOC_OPAL_DISCOVERY blk-mq: prealloc tags when increase tagset nr_hw_queues blk-mq: delete redundant tagset map update when fallback blk-mq: fix tags leak when shrink nr_hw_queues ublk: zoned: support REQ_OP_ZONE_RESET_ALL md: raid0: account for split bio in iostat accounting md/raid0: Fix performance regression for large sequential writes md/raid0: Factor out helper for mapping and submitting a bio md raid1: allow writebehind to work on any leg device set WriteMostly md/raid1: hold the barrier until handle_read_error() finishes md/raid1: free the r1bio before waiting for blocked rdev md/raid1: call free_r1bio() before allow_barrier() in raid_end_bio_io() blk-cgroup: Fix NULL deref caused by blkg_policy_data being installed before init drivers/rnbd: restore sysfs interface to rnbd-client md/raid5-cache: fix null-ptr-deref for r5l_flush_stripe_to_raid() raid6: test: only check for Altivec if building on powerpc hosts raid6: test: make sure all intermediate and artifact files are .gitignored ... | 2 年前 | |
erofs: fix "BUG: Bad page state in z_erofs_do_read_page" stable inclusion from stable-v6.6.131 commit cfadf46a67b6810db4d4de83778341b28509a222 category: bugfix bugzilla: https://atomgit.com/openeuler/kernel/issues/ Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=cfadf46a67b6810db4d4de83778341b28509a222 -------------------------------- commit cfadf46a67b6810db4d4de83778341b28509a222 upstream. It's actually a stable-only issue from backporting 9e2f9d34dd12 ("erofs: handle overlapped pclusters out of crafted images properly") We missed to update oldpage after pcl->compressed_bvecs[nr].page is updated, so that the following cmpxchg() will fail; the original upstream commit doesn't behave like this due to new features and refactoring. This backport issue only impacts some specific crafted images and normal filesystems won't be impacted at all. Fixes: 1bf7e414cac3 ("erofs: handle overlapped pclusters out of crafted images properly") # 6.6.y Closes: https://syzkaller.appspot.com/bug?extid=b6353e35ae2bab997538 Reported-and-tested-by: syzbot+b6353e35ae2bab997538@syzkaller.appspotmail.com [1] [1] https://lore.kernel.org/r/69c3b299.a70a0220.234938.004b.GAE@google.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Wang Hai <wanghai38@huawei.com> | 25 天前 | |
exfat: fix remount failure in different process environments stable inclusion from stable-v6.6.120 commit 44c8dccb09a20567742605f0c71cac78b9c07836 category: bugfix bugzilla: https://atomgit.com/openeuler/kernel/issues/8839 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=44c8dccb09a20567742605f0c71cac78b9c07836 -------------------------------- [ Upstream commit 51fc7b4ce10ccab8ea5e4876bcdc42cf5202a0ef ] The kernel test robot reported that the exFAT remount operation failed. The reason for the failure was that the process's umask is different between mount and remount, causing fs_fmask and fs_dmask are changed. Potentially, both gid and uid may also be changed. Therefore, when initializing fs_context for remount, inherit these mount options from the options used during mount. Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202511251637.81670f5c-lkp@intel.com Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> (cherry picked from commit 44c8dccb09a20567742605f0c71cac78b9c07836) Signed-off-by: Wentao Guan <guanwentao@uniontech.com> | 2 个月前 | |
exportfs: remove kernel-doc warnings in exportfs Remove kernel-doc warning in exportfs: fs/exportfs/expfs.c:395: warning: Function parameter or member 'parent' not described in 'exportfs_encode_inode_fh' Signed-off-by: Zhu Wang <wangzhu9@huawei.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> | 2 年前 | |
ext2: reject inodes with zero i_nlink and valid mode in ext2_iget() stable inclusion from stable-v6.6.140 commit 32e0b925572686399243834ec99e2a9d85c62eae category: bugfix bugzilla: https://atomgit.com/openeuler/kernel/issues/ Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=32e0b925572686399243834ec99e2a9d85c62eae -------------------------------- commit 25947cc5b2374cd5bf627fe3141496444260d04f upstream. ext2_iget() already rejects inodes with i_nlink == 0 when i_mode is zero or i_dtime is set, treating them as deleted. However, the case of i_nlink == 0 with a non-zero mode and zero dtime slips through. Since ext2 has no orphan list, such a combination can only result from filesystem corruption - a legitimate inode deletion always sets either i_dtime or clears i_mode before freeing the inode. A crafted image can exploit this gap to present such an inode to the VFS, which then triggers WARN_ON inside drop_nlink() (fs/inode.c) via ext2_unlink(), ext2_rename() and ext2_rmdir(): WARNING: CPU: 3 PID: 609 at fs/inode.c:336 drop_nlink+0xad/0xd0 fs/inode.c:336 CPU: 3 UID: 0 PID: 609 Comm: syz-executor Not tainted 6.12.77+ #1 Call Trace: <TASK> inode_dec_link_count include/linux/fs.h:2518 [inline] ext2_unlink+0x26c/0x300 fs/ext2/namei.c:295 vfs_unlink+0x2fc/0x9b0 fs/namei.c:4477 do_unlinkat+0x53e/0x730 fs/namei.c:4541 __x64_sys_unlink+0xc6/0x110 fs/namei.c:4587 do_syscall_64+0xf5/0x220 arch/x86/entry/common.c:78 entry_SYSCALL_64_after_hwframe+0x77/0x7f </TASK> WARNING: CPU: 0 PID: 646 at fs/inode.c:336 drop_nlink+0xad/0xd0 fs/inode.c:336 CPU: 0 UID: 0 PID: 646 Comm: syz.0.17 Not tainted 6.12.77+ #1 Call Trace: <TASK> inode_dec_link_count include/linux/fs.h:2518 [inline] ext2_rename+0x35e/0x850 fs/ext2/namei.c:374 vfs_rename+0xf2f/0x2060 fs/namei.c:5021 do_renameat2+0xbe2/0xd50 fs/namei.c:5178 __x64_sys_rename+0x7e/0xa0 fs/namei.c:5223 do_syscall_64+0xf5/0x220 arch/x86/entry/common.c:78 entry_SYSCALL_64_after_hwframe+0x77/0x7f </TASK> WARNING: CPU: 0 PID: 634 at fs/inode.c:336 drop_nlink+0xad/0xd0 fs/inode.c:336 CPU: 0 UID: 0 PID: 634 Comm: syz-executor Not tainted 6.12.77+ #1 Call Trace: <TASK> inode_dec_link_count include/linux/fs.h:2518 [inline] ext2_rmdir+0xca/0x110 fs/ext2/namei.c:311 vfs_rmdir+0x204/0x690 fs/namei.c:4348 do_rmdir+0x372/0x3e0 fs/namei.c:4407 __x64_sys_unlinkat+0xf0/0x130 fs/namei.c:4577 do_syscall_64+0xf5/0x220 arch/x86/entry/common.c:78 entry_SYSCALL_64_after_hwframe+0x77/0x7f </TASK> Extend the existing i_nlink == 0 check to also catch this case, reporting the corruption via ext2_error() and returning -EFSCORRUPTED. This rejects the inode at load time and prevents it from reaching any of the namei.c paths. Found by Linux Verification Center (linuxtesting.org) with Syzkaller. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Cc: stable@vger.kernel.org Signed-off-by: Vasiliy Kovalev <kovalev@altlinux.org> Link: https://patch.msgid.link/20260404152011.2590197-1-kovalev@altlinux.org Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Wang Hai <wanghai38@huawei.com> | 25 天前 | |
ext4: handle wraparound when searching for blocks for indirect mapped blocks mainline inclusion from mainline-v7.0-rc4 commit bb81702370fad22c06ca12b6e1648754dbc37e0f category: bugfix bugzilla: https://atomgit.com/src-openeuler/kernel/issues/15219 CVE: CVE-2026-45899 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=bb81702370fad22c06ca12b6e1648754dbc37e0f -------------------------------- Commit 4865c768b563 ("ext4: always allocate blocks only from groups inode can use") restricts what blocks will be allocated for indirect block based files to block numbers that fit within 32-bit block numbers. However, when using a review bot running on the latest Gemini LLM to check this commit when backporting into an LTS based kernel, it raised this concern: If ac->ac_g_ex.fe_group is >= ngroups (for instance, if the goal group was populated via stream allocation from s_mb_last_groups), then start will be >= ngroups. Does this allow allocating blocks beyond the 32-bit limit for indirect block mapped files? The commit message mentions that ext4_mb_scan_groups_linear() takes care to not select unsupported groups. However, its loop uses group = *start, and the very first iteration will call ext4_mb_scan_group() with this unsupported group because next_linear_group() is only called at the end of the iteration. After reviewing the code paths involved and considering the LLM review, I determined that this can happen when there is a file system where some files/directories are extent-mapped and others are indirect-block mapped. To address this, add a safety clamp in ext4_mb_scan_groups(). Fixes: 4865c768b563 ("ext4: always allocate blocks only from groups inode can use") Cc: Jan Kara <jack@suse.cz> Reviewed-by: Baokun Li <libaokun@linux.alibaba.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Link: https://patch.msgid.link/20260326045834.1175822-1-tytso@mit.edu Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Long Li <leo.lilong@huawei.com> | 5 天前 | |
f2fs: fix false alarm of lockdep on cp_global_sem lock stable inclusion from stable-v6.6.141 commit 8358a142f2a1876f929ef1da25c0cedaf59b4caa category: bugfix bugzilla: https://atomgit.com/openeuler/kernel/issues/ Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=8358a142f2a1876f929ef1da25c0cedaf59b4caa -------------------------------- commit 8358a142f2a1876f929ef1da25c0cedaf59b4caa upstream. [ Upstream commit 6a5e3de9c2bb0b691d16789a5d19e9276a09b308 ] lockdep reported a potential deadlock: a) TCMU device removal context: - call del_gendisk() to get q->q_usage_counter - call start_flush_work() to get work_completion of wb->dwork b) f2fs writeback context: - in wb_workfn(), which holds work_completion of wb->dwork - call f2fs_balance_fs() to get sbi->gc_lock c) f2fs vfs_write context: - call f2fs_gc() to get sbi->gc_lock - call f2fs_write_checkpoint() to get sbi->cp_global_sem d) f2fs mount context: - call recover_fsync_data() to get sbi->cp_global_sem - call f2fs_check_and_fix_write_pointer() to call blkdev_report_zones() that goes down to blk_mq_alloc_request and get q->q_usage_counter Original callstack is in Closes tag. However, I think this is a false alarm due to before mount returns successfully (context d), we can not access file therein via vfs_write (context c). Let's introduce per-sb cp_global_sem_key, and assign the key for cp_global_sem, so that lockdep can recognize cp_global_sem from different super block correctly. A lot of work are done by Shin'ichiro Kawasaki, thanks a lot for the work. Fixes: c426d99127b1 ("f2fs: Check write pointer consistency of open zones") Cc: stable@kernel.org Reported-and-tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Closes: https://lore.kernel.org/linux-f2fs-devel/20260218125237.3340441-1-shinichiro.kawasaki@wdc.com Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> [ re-anchored lockdep_register_key after init_f2fs_rwsem and placed lockdep_unregister_key before kfree(sbi) in f2fs_put_super instead of kill_f2fs_super ] Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Wang Hai <wanghai38@huawei.com> | 25 天前 | |
fat: avoid parent link count underflow in rmdir mainline inclusion from mainline-v7.0-rc1 commit 8cafcb881364af5ef3a8b9fed4db254054033d8a category: bugfix bugzilla: https://atomgit.com/src-openeuler/kernel/issues/13898 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8cafcb881364af5ef3a8b9fed4db254054033d8a -------------------------------- Corrupted FAT images can leave a directory inode with an incorrect i_nlink (e.g. 2 even though subdirectories exist). rmdir then unconditionally calls drop_nlink(dir) and can drive i_nlink to 0, triggering the WARN_ON in drop_nlink(). Add a sanity check in vfat_rmdir() and msdos_rmdir(): only drop the parent link count when it is at least 3, otherwise report a filesystem error. Link: https://lkml.kernel.org/r/20260101111148.1437-1-zhiyuzhang999@gmail.com Fixes: 9a53c3a783c2 ("[PATCH] r/o bind mounts: unlink: monitor i_nlink") Signed-off-by: Zhiyu Zhang <zhiyuzhang999@gmail.com> Reported-by: Zhiyu Zhang <zhiyuzhang999@gmail.com> Closes: https://lore.kernel.org/linux-fsdevel/aVN06OKsKxZe6-Kv@casper.infradead.org/T/#t Tested-by: Zhiyu Zhang <zhiyuzhang999@gmail.com> Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Yongjian Sun <sunyongjian1@huawei.com> | 3 个月前 | |
Merge tag 'for-6.6/block-2023-08-28' of git://git.kernel.dk/linux Pull block updates from Jens Axboe: "Pretty quiet round for this release. This contains: - Add support for zoned storage to ublk (Andreas, Ming) - Series improving performance for drivers that mark themselves as needing a blocking context for issue (Bart) - Cleanup the flush logic (Chengming) - sed opal keyring support (Greg) - Fixes and improvements to the integrity support (Jinyoung) - Add some exports for bcachefs that we can hopefully delete again in the future (Kent) - deadline throttling fix (Zhiguo) - Series allowing building the kernel without buffer_head support (Christoph) - Sanitize the bio page adding flow (Christoph) - Write back cache fixes (Christoph) - MD updates via Song: - Fix perf regression for raid0 large sequential writes (Jan) - Fix split bio iostat for raid0 (David) - Various raid1 fixes (Heinz, Xueshi) - raid6test build fixes (WANG) - Deprecate bitmap file support (Christoph) - Fix deadlock with md sync thread (Yu) - Refactor md io accounting (Yu) - Various non-urgent fixes (Li, Yu, Jack) - Various fixes and cleanups (Arnd, Azeem, Chengming, Damien, Li, Ming, Nitesh, Ruan, Tejun, Thomas, Xu)" * tag 'for-6.6/block-2023-08-28' of git://git.kernel.dk/linux: (113 commits) block: use strscpy() to instead of strncpy() block: sed-opal: keyring support for SED keys block: sed-opal: Implement IOC_OPAL_REVERT_LSP block: sed-opal: Implement IOC_OPAL_DISCOVERY blk-mq: prealloc tags when increase tagset nr_hw_queues blk-mq: delete redundant tagset map update when fallback blk-mq: fix tags leak when shrink nr_hw_queues ublk: zoned: support REQ_OP_ZONE_RESET_ALL md: raid0: account for split bio in iostat accounting md/raid0: Fix performance regression for large sequential writes md/raid0: Factor out helper for mapping and submitting a bio md raid1: allow writebehind to work on any leg device set WriteMostly md/raid1: hold the barrier until handle_read_error() finishes md/raid1: free the r1bio before waiting for blocked rdev md/raid1: call free_r1bio() before allow_barrier() in raid_end_bio_io() blk-cgroup: Fix NULL deref caused by blkg_policy_data being installed before init drivers/rnbd: restore sysfs interface to rnbd-client md/raid5-cache: fix null-ptr-deref for r5l_flush_stripe_to_raid() raid6: test: only check for Altivec if building on powerpc hosts raid6: test: make sure all intermediate and artifact files are .gitignored ... | 2 年前 | |
fscache: clean up for fscache_clear_volume_priv hulk inclusion category: cleanup bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT -------------------------------- In the previous patch, we introduced mutex_lock to prevent concurrency between cachefiles_free_volume() and cachefiles_withdraw_volume(). Now in fscache_clear_volume_priv(), there is no need to increase fscache_volume->n_accesses to prevent concurrency. Remove the related code. Signed-off-by: Zizhi Wo <wozizhi@huawei.com> | 1 年前 | |
| 1 天前 | ||
gfs2: prevent NULL pointer dereference during unmount stable inclusion from stable-v6.6.141 commit 233a0945a4b1dbe3f38c30afb7d05b76c67f1193 category: bugfix bugzilla: https://atomgit.com/openeuler/kernel/issues/ Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=233a0945a4b1dbe3f38c30afb7d05b76c67f1193 -------------------------------- commit 233a0945a4b1dbe3f38c30afb7d05b76c67f1193 upstream. [ Upstream commit 74b4dbb946060a3233604d91859a9abd3708141d ] When flushing out outstanding glock work during an unmount, gfs2_log_flush() can be called when sdp->sd_jdesc has already been deallocated and sdp->sd_jdesc is NULL. Commit 35264909e9d1 ("gfs2: Fix NULL pointer dereference in gfs2_log_flush") added a check for that to gfs2_log_flush() itself, but it missed the sdp->sd_jdesc dereference in gfs2_log_release(). Fix that. Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <error27@gmail.com> Closes: https://lore.kernel.org/r/202604071139.HNJiCaAi-lkp@intel.com/ Fixes: 35264909e9d1 ("gfs2: Fix NULL pointer dereference in gfs2_log_flush") Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Wang Hai <wanghai38@huawei.com> | 25 天前 | |
hfs: fix KMSAN uninit-value issue in hfs_find_set_zero_bits() stable inclusion from stable-v6.6.115 commit cfafefcb0e1fc60135f7040f4aed0a4aef4f76ca category: bugfix bugzilla: https://atomgit.com/openeuler/kernel/issues/8637 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=cfafefcb0e1fc60135f7040f4aed0a4aef4f76ca -------------------------------- [ Upstream commit 2048ec5b98dbdfe0b929d2e42dc7a54c389c53dd ] The syzbot reported issue in hfs_find_set_zero_bits(): ===================================================== BUG: KMSAN: uninit-value in hfs_find_set_zero_bits+0x74d/0xb60 fs/hfs/bitmap.c:45 hfs_find_set_zero_bits+0x74d/0xb60 fs/hfs/bitmap.c:45 hfs_vbm_search_free+0x13c/0x5b0 fs/hfs/bitmap.c:151 hfs_extend_file+0x6a5/0x1b00 fs/hfs/extent.c:408 hfs_get_block+0x435/0x1150 fs/hfs/extent.c:353 __block_write_begin_int+0xa76/0x3030 fs/buffer.c:2151 block_write_begin fs/buffer.c:2262 [inline] cont_write_begin+0x10e1/0x1bc0 fs/buffer.c:2601 hfs_write_begin+0x85/0x130 fs/hfs/inode.c:52 cont_expand_zero fs/buffer.c:2528 [inline] cont_write_begin+0x35a/0x1bc0 fs/buffer.c:2591 hfs_write_begin+0x85/0x130 fs/hfs/inode.c:52 hfs_file_truncate+0x1d6/0xe60 fs/hfs/extent.c:494 hfs_inode_setattr+0x964/0xaa0 fs/hfs/inode.c:654 notify_change+0x1993/0x1aa0 fs/attr.c:552 do_truncate+0x28f/0x310 fs/open.c:68 do_ftruncate+0x698/0x730 fs/open.c:195 do_sys_ftruncate fs/open.c:210 [inline] __do_sys_ftruncate fs/open.c:215 [inline] __se_sys_ftruncate fs/open.c:213 [inline] __x64_sys_ftruncate+0x11b/0x250 fs/open.c:213 x64_sys_call+0xfe3/0x3db0 arch/x86/include/generated/asm/syscalls_64.h:78 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xd9/0x210 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f Uninit was created at: slab_post_alloc_hook mm/slub.c:4154 [inline] slab_alloc_node mm/slub.c:4197 [inline] __kmalloc_cache_noprof+0x7f7/0xed0 mm/slub.c:4354 kmalloc_noprof include/linux/slab.h:905 [inline] hfs_mdb_get+0x1cc8/0x2a90 fs/hfs/mdb.c:175 hfs_fill_super+0x3d0/0xb80 fs/hfs/super.c:337 get_tree_bdev_flags+0x6e3/0x920 fs/super.c:1681 get_tree_bdev+0x38/0x50 fs/super.c:1704 hfs_get_tree+0x35/0x40 fs/hfs/super.c:388 vfs_get_tree+0xb0/0x5c0 fs/super.c:1804 do_new_mount+0x738/0x1610 fs/namespace.c:3902 path_mount+0x6db/0x1e90 fs/namespace.c:4226 do_mount fs/namespace.c:4239 [inline] __do_sys_mount fs/namespace.c:4450 [inline] __se_sys_mount+0x6eb/0x7d0 fs/namespace.c:4427 __x64_sys_mount+0xe4/0x150 fs/namespace.c:4427 x64_sys_call+0xfa7/0x3db0 arch/x86/include/generated/asm/syscalls_64.h:166 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xd9/0x210 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f CPU: 1 UID: 0 PID: 12609 Comm: syz.1.2692 Not tainted 6.16.0-syzkaller #0 PREEMPT(none) Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/12/2025 ===================================================== The HFS_SB(sb)->bitmap buffer is allocated in hfs_mdb_get(): HFS_SB(sb)->bitmap = kmalloc(8192, GFP_KERNEL); Finally, it can trigger the reported issue because kmalloc() doesn't clear the allocated memory. If allocated memory contains only zeros, then everything will work pretty fine. But if the allocated memory contains the "garbage", then it can affect the bitmap operations and it triggers the reported issue. This patch simply exchanges the kmalloc() on kzalloc() with the goal to guarantee the correctness of bitmap operations. Because, newly created allocation bitmap should have all available blocks free. Potentially, initialization bitmap's read operation could not fill the whole allocated memory and "garbage" in the not initialized memory will be the reason of volume coruptions and file system driver bugs. Reported-by: syzbot <syzbot+773fa9d79b29bd8b6831@syzkaller.appspotmail.com> Closes: https://syzkaller.appspot.com/bug?extid=773fa9d79b29bd8b6831 Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com> cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> cc: Yangtao Li <frank.li@vivo.com> cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/20250820230636.179085-1-slava@dubeyko.com Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com> Signed-off-by: Sasha Levin <sashal@kernel.org> (cherry picked from commit cfafefcb0e1fc60135f7040f4aed0a4aef4f76ca) Signed-off-by: Wentao Guan <guanwentao@uniontech.com> | 3 个月前 | |
hfsplus: fix held lock freed on hfsplus_fill_super() stable inclusion from stable-v6.6.140 commit 3ca80e3012c8be85b4f8d0d20eac8d3b17ff257e category: bugfix bugzilla: https://atomgit.com/openeuler/kernel/issues/ Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=3ca80e3012c8be85b4f8d0d20eac8d3b17ff257e -------------------------------- commit 3ca80e3012c8be85b4f8d0d20eac8d3b17ff257e upstream. [ Upstream commit 90c500e4fd83fa33c09bc7ee23b6d9cc487ac733 ] hfsplus_fill_super() calls hfs_find_init() to initialize a search structure, which acquires tree->tree_lock. If the subsequent call to hfsplus_cat_build_key() fails, the function jumps to the out_put_root error label without releasing the lock. The later cleanup path then frees the tree data structure with the lock still held, triggering a held lock freed warning. Fix this by adding the missing hfs_find_exit(&fd) call before jumping to the out_put_root error label. This ensures that tree->tree_lock is properly released on the error path. The bug was originally detected on v6.13-rc1 using an experimental static analysis tool we are developing, and we have verified that the issue persists in the latest mainline kernel. The tool is specifically designed to detect memory management issues. It is currently under active development and not yet publicly available. We confirmed the bug by runtime testing under QEMU with x86_64 defconfig, lockdep enabled, and CONFIG_HFSPLUS_FS=y. To trigger the error path, we used GDB to dynamically shrink the max_unistr_len parameter to 1 before hfsplus_asc2uni() is called. This forces hfsplus_asc2uni() to naturally return -ENAMETOOLONG, which propagates to hfsplus_cat_build_key() and exercises the faulty error path. The following warning was observed during mount: ========================= WARNING: held lock freed! 7.0.0-rc3-00016-gb4f0dd314b39 #4 Not tainted ------------------------- mount/174 is freeing memory ffff888103f92000-ffff888103f92fff, with a lock still held there! ffff888103f920b0 (&tree->tree_lock){+.+.}-{4:4}, at: hfsplus_find_init+0x154/0x1e0 2 locks held by mount/174: #0: ffff888103f960e0 (&type->s_umount_key#42/1){+.+.}-{4:4}, at: alloc_super.constprop.0+0x167/0xa40 #1: ffff888103f920b0 (&tree->tree_lock){+.+.}-{4:4}, at: hfsplus_find_init+0x154/0x1e0 stack backtrace: CPU: 2 UID: 0 PID: 174 Comm: mount Not tainted 7.0.0-rc3-00016-gb4f0dd314b39 #4 PREEMPT(lazy) Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x82/0xd0 debug_check_no_locks_freed+0x13a/0x180 kfree+0x16b/0x510 ? hfsplus_fill_super+0xcb4/0x18a0 hfsplus_fill_super+0xcb4/0x18a0 ? __pfx_hfsplus_fill_super+0x10/0x10 ? srso_return_thunk+0x5/0x5f ? bdev_open+0x65f/0xc30 ? srso_return_thunk+0x5/0x5f ? pointer+0x4ce/0xbf0 ? trace_contention_end+0x11c/0x150 ? __pfx_pointer+0x10/0x10 ? srso_return_thunk+0x5/0x5f ? bdev_open+0x79b/0xc30 ? srso_return_thunk+0x5/0x5f ? srso_return_thunk+0x5/0x5f ? vsnprintf+0x6da/0x1270 ? srso_return_thunk+0x5/0x5f ? __mutex_unlock_slowpath+0x157/0x740 ? __pfx_vsnprintf+0x10/0x10 ? srso_return_thunk+0x5/0x5f ? srso_return_thunk+0x5/0x5f ? mark_held_locks+0x49/0x80 ? srso_return_thunk+0x5/0x5f ? srso_return_thunk+0x5/0x5f ? irqentry_exit+0x17b/0x5e0 ? trace_irq_disable.constprop.0+0x116/0x150 ? __pfx_hfsplus_fill_super+0x10/0x10 ? __pfx_hfsplus_fill_super+0x10/0x10 get_tree_bdev_flags+0x302/0x580 ? __pfx_get_tree_bdev_flags+0x10/0x10 ? vfs_parse_fs_qstr+0x129/0x1a0 ? __pfx_vfs_parse_fs_qstr+0x3/0x10 vfs_get_tree+0x89/0x320 fc_mount+0x10/0x1d0 path_mount+0x5c5/0x21c0 ? __pfx_path_mount+0x10/0x10 ? trace_irq_enable.constprop.0+0x116/0x150 ? trace_irq_enable.constprop.0+0x116/0x150 ? srso_return_thunk+0x5/0x5f ? srso_return_thunk+0x5/0x5f ? kmem_cache_free+0x307/0x540 ? user_path_at+0x51/0x60 ? __x64_sys_mount+0x212/0x280 ? srso_return_thunk+0x5/0x5f __x64_sys_mount+0x212/0x280 ? __pfx___x64_sys_mount+0x10/0x10 ? srso_return_thunk+0x5/0x5f ? trace_irq_enable.constprop.0+0x116/0x150 ? srso_return_thunk+0x5/0x5f do_syscall_64+0x111/0x680 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7ffacad55eae Code: 48 8b 0d 85 1f 0f 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 a5 00 00 8 RSP: 002b:00007fff1ab55718 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007ffacad55eae RDX: 000055740c64e5b0 RSI: 000055740c64e630 RDI: 000055740c651ab0 RBP: 000055740c64e380 R08: 0000000000000000 R09: 0000000000000001 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 000055740c64e5b0 R14: 000055740c651ab0 R15: 000055740c64e380 </TASK> After applying this patch, the warning no longer appears. Fixes: 89ac9b4d3d1a ("hfsplus: fix longname handling") CC: stable@vger.kernel.org Signed-off-by: Zilin Guan <zilin@seu.edu.cn> Reviewed-by: Viacheslav Dubeyko <slava@dubeyko.com> Tested-by: Viacheslav Dubeyko <slava@dubeyko.com> Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Wang Hai <wanghai38@huawei.com> | 25 天前 | |
um: hostfs: avoid issues on inode number reuse by host stable inclusion from stable-v6.6.87 commit 4ee8160c47e0c0413300df2f7d1423ea275d0477 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/ID6BMS Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=4ee8160c47e0c0413300df2f7d1423ea275d0477 -------------------------------- [ Upstream commit 0bc754d1e31f40f4a343b692096d9e092ccc0370 ] Some file systems (e.g. ext4) may reuse inode numbers once the inode is not in use anymore. Usually hostfs will keep an FD open for each inode, but this is not always the case. In the case of sockets, this cannot even be done properly. As such, the following sequence of events was possible: * application creates and deletes a socket * hostfs creates/deletes the socket on the host * inode is still in the hostfs cache * hostfs creates a new file * ext4 on the outside reuses the inode number * hostfs finds the socket inode for the newly created file * application receives -ENXIO when opening the file As mentioned, this can only happen if the deleted file is a special file that is never opened on the host (i.e. no .open fop). As such, to prevent issues, it is sufficient to check that the inode has the expected type. That said, also add a check for the inode birth time, just to be on the safe side. Fixes: 74ce793bcbde ("hostfs: Fix ephemeral inodes") Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Reviewed-by: Mickaël Salaün <mic@digikod.net> Tested-by: Mickaël Salaün <mic@digikod.net> Link: https://patch.msgid.link/20250214092822.1241575-1-benjamin@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org> (cherry picked from commit 4ee8160c47e0c0413300df2f7d1423ea275d0477) Signed-off-by: Wentao Guan <guanwentao@uniontech.com> | 7 个月前 | |
fs/hpfs: Fix error code for new_inode() failure in mkdir/create/mknod/symlink stable inclusion from stable-v6.6.117 commit c34d6dd9ab35516f1419aaaa390fa5ffb24bffbf category: bugfix bugzilla: https://atomgit.com/openeuler/kernel/issues/8763 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=c34d6dd9ab35516f1419aaaa390fa5ffb24bffbf -------------------------------- [ Upstream commit 32058c38d3b79a28963a59ac0353644dc24775cd ] The function call new_inode() is a primitive for allocating an inode in memory, rather than planning disk space for it. Therefore, -ENOMEM should be returned as the error code rather than -ENOSPC. To be specific, new_inode()'s call path looks like this: new_inode new_inode_pseudo alloc_inode ops->alloc_inode (hpfs_alloc_inode) alloc_inode_sb kmem_cache_alloc_lru Therefore, the failure of new_inode() indicates a memory presure issue (-ENOMEM), not a lack of disk space. However, the current implementation of hpfs_mkdir/create/mknod/symlink incorrectly returns -ENOSPC when new_inode() fails. This patch fix this by set err to -ENOMEM before the goto statement. BTW, we also noticed that other nested calls within these four functions, like hpfs_alloc_f/dnode and hpfs_add_dirent, might also fail due to memory presure. But similarly, only -ENOSPC is returned. Addressing these will involve code modifications in other functions, and we plan to submit dedicated patches for these issues in the future. For this patch, we focus on new_inode(). Signed-off-by: Yikang Yue <yikangy2@illinois.edu> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org> (cherry picked from commit c34d6dd9ab35516f1419aaaa390fa5ffb24bffbf) Signed-off-by: Wentao Guan <guanwentao@uniontech.com> | 3 个月前 | |
mm: update memfd seal write check to include F_SEAL_WRITE stable inclusion from stable-v6.6.103 commit 87a75f68eaba1a2edcd415e6a21acafe96487dcc category: bugfix bugzilla: https://atomgit.com/openeuler/kernel/issues/8365 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=87a75f68eaba1a2edcd415e6a21acafe96487dcc -------------------------------- From: Lorenzo Stoakes <lstoakes@gmail.com> [ Upstream commit 28464bbb2ddc199433383994bcb9600c8034afa1 ] The seal_check_future_write() function is called by shmem_mmap() or hugetlbfs_file_mmap() to disallow any future writable mappings of an memfd sealed this way. The F_SEAL_WRITE flag is not checked here, as that is handled via the mapping->i_mmap_writable mechanism and so any attempt at a mapping would fail before this could be run. However we intend to change this, meaning this check can be performed for F_SEAL_WRITE mappings also. The logic here is equally applicable to both flags, so update this function to accommodate both and rename it accordingly. Link: https://lkml.kernel.org/r/913628168ce6cce77df7d13a63970bae06a526e0.1697116581.git.lstoakes@gmail.com Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Andy Lutomirski <luto@kernel.org> Cc: Christian Brauner <brauner@kernel.org> Cc: Hugh Dickins <hughd@google.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Muchun Song <muchun.song@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: stable@vger.kernel.org Signed-off-by: Isaac J. Manjarres <isaacmanjarres@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit 87a75f68eaba1a2edcd415e6a21acafe96487dcc) Signed-off-by: Wentao Guan <guanwentao@uniontech.com> | 4 个月前 | |
iomap: fix submission side handling of completion side errors stable inclusion from stable-v6.6.128 commit 8f3d79abdec0164fd09a31a9f70069b5bdf4f2ec category: bugfix bugzilla: https://atomgit.com/openeuler/kernel/issues/ Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=8f3d79abdec0164fd09a31a9f70069b5bdf4f2ec -------------------------------- commit 8f3d79abdec0164fd09a31a9f70069b5bdf4f2ec upstream. [ Upstream commit 4ad357e39b2ecd5da7bcc7e840ee24d179593cd5 ] The "if (dio->error)" in iomap_dio_bio_iter exists to stop submitting more bios when a completion already return an error. Commit cfe057f7db1f ("iomap_dio_actor(): fix iov_iter bugs") made it revert the iov by "copied", which is very wrong given that we've already consumed that range and submitted a bio for it. Fixes: cfe057f7db1f ("iomap_dio_actor(): fix iov_iter bugs") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Wang Hai <wanghai38@huawei.com> | 25 天前 | |
isofs: validate block number from NFS file handle in isofs_export_iget stable inclusion from stable-v6.6.140 commit bb0988ed4f2e26d59bbb58f644cb3a55b7521e21 category: bugfix bugzilla: https://atomgit.com/openeuler/kernel/issues/ Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=bb0988ed4f2e26d59bbb58f644cb3a55b7521e21 -------------------------------- commit 24376458138387fb251e782e624c7776e9826796 upstream. isofs_fh_to_dentry() and isofs_fh_to_parent() pass an attacker- controlled block number (ifid->block or ifid->parent_block) from the NFS file handle to isofs_export_iget(), which only rejects block == 0 before calling isofs_iget() and ultimately sb_bread(). A crafted file handle with fh_len sufficient to pass the check added by commit 0405d4b63d08 ("isofs: Prevent the use of too small fid") can still drive the server to read any in-range block on the backing device as if it were an iso_directory_record. That earlier fix was assigned CVE-2025-37780. sb_bread() on an out-of-range block returns NULL cleanly via the EIO path, so there is no memory-safety violation. For in-range reads of adjacent-partition data on the same block device, the unrelated bytes end up in iso_inode_info fields that reach the NFS client as dentry metadata. The deployment surface (isofs exported over NFS from loop-mounted images) is narrow and requires an authenticated NFS peer, but the malformed-file-handle class is reportable as hardening next to the existing CVE-2025-37780 fix. Reject block >= ISOFS_SB(sb)->s_nzones in isofs_export_iget() so the check covers both isofs_fh_to_dentry() and isofs_fh_to_parent() call sites with a single line. Fixes: 0405d4b63d08 ("isofs: Prevent the use of too small fid") Cc: stable@vger.kernel.org Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com> Link: https://patch.msgid.link/20260419212155.2169382-3-michael.bommarito@gmail.com Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Wang Hai <wanghai38@huawei.com> | 25 天前 | |
jbd2: gracefully abort on checkpointing state corruptions stable inclusion from stable-v6.6.131 commit d536a00f1b4512fc99098404a054eb6dd27f80cc category: bugfix bugzilla: https://atomgit.com/openeuler/kernel/issues/ Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=d536a00f1b4512fc99098404a054eb6dd27f80cc -------------------------------- commit bac3190a8e79beff6ed221975e0c9b1b5f2a21da upstream. This patch targets two internal state machine invariants in checkpoint.c residing inside functions that natively return integer error codes. - In jbd2_cleanup_journal_tail(): A blocknr of 0 indicates a severely corrupted journal superblock. Replaced the J_ASSERT with a WARN_ON_ONCE and a graceful journal abort, returning -EFSCORRUPTED. - In jbd2_log_do_checkpoint(): Replaced the J_ASSERT_BH checking for an unexpected buffer_jwrite state. If the warning triggers, we explicitly drop the just-taken get_bh() reference and call __flush_batch() to safely clean up any previously queued buffers in the j_chkpt_bhs array, preventing a memory leak before returning -EFSCORRUPTED. Signed-off-by: Milos Nikic <nikic.milos@gmail.com> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Baokun Li <libaokun@linux.alibaba.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20260311041548.159424-1-nikic.milos@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Wang Hai <wanghai38@huawei.com> | 25 天前 | |
jffs2: check jffs2_prealloc_raw_node_refs() result in few other places stable inclusion from stable-v6.6.95 commit cd42ddddd70abc7127c12b96c8c85dbd080ea56f category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/ICLHX6 CVE: CVE-2025-38328 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=cd42ddddd70abc7127c12b96c8c85dbd080ea56f -------------------------------- commit 2b6d96503255a3ed676cd70f8368870c6d6a25c6 upstream. Fuzzing hit another invalid pointer dereference due to the lack of checking whether jffs2_prealloc_raw_node_refs() completed successfully. Subsequent logic implies that the node refs have been allocated. Handle that. The code is ready for propagating the error upwards. KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f] CPU: 1 PID: 5835 Comm: syz-executor145 Not tainted 5.10.234-syzkaller #0 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014 RIP: 0010:jffs2_link_node_ref+0xac/0x690 fs/jffs2/nodelist.c:600 Call Trace: jffs2_mark_erased_block fs/jffs2/erase.c:460 [inline] jffs2_erase_pending_blocks+0x688/0x1860 fs/jffs2/erase.c:118 jffs2_garbage_collect_pass+0x638/0x1a00 fs/jffs2/gc.c:253 jffs2_reserve_space+0x3f4/0xad0 fs/jffs2/nodemgmt.c:167 jffs2_write_inode_range+0x246/0xb50 fs/jffs2/write.c:362 jffs2_write_end+0x712/0x1110 fs/jffs2/file.c:302 generic_perform_write+0x2c2/0x500 mm/filemap.c:3347 __generic_file_write_iter+0x252/0x610 mm/filemap.c:3465 generic_file_write_iter+0xdb/0x230 mm/filemap.c:3497 call_write_iter include/linux/fs.h:2039 [inline] do_iter_readv_writev+0x46d/0x750 fs/read_write.c:740 do_iter_write+0x18c/0x710 fs/read_write.c:866 vfs_writev+0x1db/0x6a0 fs/read_write.c:939 do_pwritev fs/read_write.c:1036 [inline] __do_sys_pwritev fs/read_write.c:1083 [inline] __se_sys_pwritev fs/read_write.c:1078 [inline] __x64_sys_pwritev+0x235/0x310 fs/read_write.c:1078 do_syscall_64+0x30/0x40 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x67/0xd1 Found by Linux Verification Center (linuxtesting.org) with Syzkaller. Fixes: 2f785402f39b ("[JFFS2] Reduce visibility of raw_node_ref to upper layers of JFFS2 code.") Fixes: f560928baa60 ("[JFFS2] Allocate node_ref for wasted space when skipping to page boundary") Cc: stable@vger.kernel.org Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru> Reviewed-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Qingshuang Fu <fuqingshuang@kylinos.cn> | 7 个月前 | |
jfs: nlink overflow in jfs_rename stable inclusion from stable-v6.6.128 commit f70fcbc2ac7c24f087a2c895c5753aa730b1e479 category: bugfix bugzilla: https://atomgit.com/openeuler/kernel/issues/ Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=f70fcbc2ac7c24f087a2c895c5753aa730b1e479 -------------------------------- commit f70fcbc2ac7c24f087a2c895c5753aa730b1e479 upstream. [ Upstream commit 9218dc26fd922b09858ecd3666ed57dfd8098da8 ] If nlink is maximal for a directory (-1) and inside that directory you perform a rename for some child directory (not moving from the parent), then the nlink of the first directory is first incremented and later decremented. Normally this is fine, but when nlink = -1 this causes a wrap around to 0, and then drop_nlink issues a warning. After applying the patch syzbot no longer issues any warnings. I also ran some basic fs tests to look for any regressions. Signed-off-by: Jori Koolstra <jkoolstra@xs4all.nl> Reported-by: syzbot+9131ddfd7870623b719f@syzkaller.appspotmail.com Closes: https://syzbot.org/bug?extid=9131ddfd7870623b719f Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Wang Hai <wanghai38@huawei.com> | 25 天前 | |
kernfs: Move dput() outside of the RCU section. mainline inclusion from mainline-v6.15-rc1 commit c5020c5be9d266f66fa5ba3286f0e8d2d2265970 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/8967 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c5020c5be9d266f66fa5ba3286f0e8d2d2265970 --------------------------- Al Viro pointed out that dput() might sleep and must not be invoked within an RCU section. Keep only find_next_ancestor() winthin the RCU section. Correct the wording in the comment. Fixes: 6ef5b6fae3040 ("kernfs: Drop kernfs_rwsem while invoking lookup_positive_unlocked().") Reported-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://lore.kernel.org/r/20250221084232.xksA_IQ4@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Jason Zeng <jason.zeng@intel.com> | 1 个月前 | |
lockd: fix vfs_test_lock() calls stable inclusion from stable-v6.6.120 commit 46b9fd1433d2674e36cd78dcb8e8765106330d38 category: bugfix bugzilla: https://atomgit.com/openeuler/kernel/issues/8839 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=46b9fd1433d2674e36cd78dcb8e8765106330d38 -------------------------------- [ Upstream commit a49a2a1baa0c553c3548a1c414b6a3c005a8deba ] Usage of vfs_test_lock() is somewhat confused. Documentation suggests it is given a "lock" but this is not the case. It is given a struct file_lock which contains some details of the sort of lock it should be looking for. In particular passing a "file_lock" containing fl_lmops or fl_ops is meaningless and possibly confusing. This is particularly problematic in lockd. nlmsvc_testlock() receives an initialised "file_lock" from xdr-decode, including manager ops and an owner. It then mistakenly passes this to vfs_test_lock() which might replace the owner and the ops. This can lead to confusion when freeing the lock. The primary role of the 'struct file_lock' passed to vfs_test_lock() is to report a conflicting lock that was found, so it makes more sense for nlmsvc_testlock() to pass "conflock", which it uses for returning the conflicting lock. With this change, freeing of the lock is not confused and code in __nlm4svc_proc_test() and __nlmsvc_proc_test() can be simplified. Documentation for vfs_test_lock() is improved to reflect its real purpose, and a WARN_ON_ONCE() is added to avoid a similar problem in the future. Reported-by: Olga Kornievskaia <okorniev@redhat.com> Closes: https://lore.kernel.org/all/20251021130506.45065-1-okorniev@redhat.com Signed-off-by: NeilBrown <neil@brown.name> Fixes: 20fa19027286 ("nfs: add export operations") Cc: stable@vger.kernel.org Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> [ adapted fl->c.flc_* field references to flat fl->fl_* naming convention ] Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit 46b9fd1433d2674e36cd78dcb8e8765106330d38) Signed-off-by: Wentao Guan <guanwentao@uniontech.com> | 2 个月前 | |
mfs: refactor the event post check function kylin inclusion category: cleanup bugzilla: https://atomgit.com/src-openeuler/kernel/issues/15610 ------------------ Refactor the event post check function, make the code more concise. Signed-off-by: Gang He <hegang@kylinos.cn> | 5 天前 | |
minix: Add required sanity checking to minix_check_superblock() stable inclusion from stable-v6.6.128 commit 2bb588cede1c1969e49c0a2822c8cb8b346b7682 category: bugfix bugzilla: https://atomgit.com/openeuler/kernel/issues/ Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=2bb588cede1c1969e49c0a2822c8cb8b346b7682 -------------------------------- commit 2bb588cede1c1969e49c0a2822c8cb8b346b7682 upstream. [ Upstream commit 8c97a6ddc95690a938ded44b4e3202f03f15078c ] The fs/minix implementation of the minix filesystem does not currently support any other value for s_log_zone_size than 0. This is also the only value supported in util-linux; see mkfs.minix.c line 511. In addition, this patch adds some sanity checking for the other minix superblock fields, and moves the minix_blocks_needed() checks for the zmap and imap also to minix_check_super_block(). This also closes a related syzbot bug report. Signed-off-by: Jori Koolstra <jkoolstra@xs4all.nl> Link: https://patch.msgid.link/20251208153947.108343-1-jkoolstra@xs4all.nl Reviewed-by: Jan Kara <jack@suse.cz> Reported-by: syzbot+5ad0824204c7bf9b67f2@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=5ad0824204c7bf9b67f2 Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Wang Hai <wanghai38@huawei.com> | 25 天前 | |
netfs: Fix potential uninitialised var in netfs_extract_user_iter() stable inclusion from stable-v6.6.141 commit f9957ea12103da2683e2a8c82bd1e605585d4859 category: bugfix bugzilla: https://atomgit.com/openeuler/kernel/issues/ Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=f9957ea12103da2683e2a8c82bd1e605585d4859 -------------------------------- commit 7e3d8db899d54af39fafb2eb3392b0cdae9973b5 upstream. In netfs_extract_user_iter(), if it's given a zero-length iterator, it will fall through the loop without setting ret, and so the error handling behaviour will be undefined, depending on whether ret happens to be negative. The value of ret then propagates back up the callstack. Fix this by presetting ret to 0. Fixes: 85dd2c8ff368 ("netfs: Add a function to extract a UBUF or IOVEC into a BVEC iterator") Closes: https://sashiko.dev/#/patchset/20260414082004.3756080-1-dhowells%40redhat.com Signed-off-by: David Howells <dhowells@redhat.com> Link: https://patch.msgid.link/20260512123404.719402-9-dhowells@redhat.com cc: Paulo Alcantara <pc@manguebit.org> cc: Matthew Wilcox <willy@infradead.org> cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Wang Hai <wanghai38@huawei.com> | 25 天前 | |
nfs/blocklayout: Fix compilation error ( make W=1) in bl_write_pagelist() stable inclusion from stable-v6.6.141 commit a7fd0d0cb43f3b930e31575154feba8b3261e389 category: bugfix bugzilla: https://atomgit.com/openeuler/kernel/issues/ Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=a7fd0d0cb43f3b930e31575154feba8b3261e389 -------------------------------- commit a7fd0d0cb43f3b930e31575154feba8b3261e389 upstream. [ Upstream commit f83c8dda456ce4863f346aa26d88efa276eda35d ] Clang compiler is not happy about set but unused variable (when dprintk() is no-op): .../blocklayout/blocklayout.c:384:9: error: variable 'count' set but not used [-Werror,-Wunused-but-set-variable] Remove a leftover from the previous cleanup. Fixes: 3a6fd1f004fc ("pnfs/blocklayout: remove read-modify-write handling in bl_write_pagelist") Acked-by: Anna Schumaker <anna.schumkaer@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Wang Hai <wanghai38@huawei.com> | 25 天前 | |
NFSv4.2: remove MODULE_LICENSE in non-modules Since commit 8b41fc4454e ("kbuild: create modules.builtin without Makefile.modbuiltin or tristate.conf"), MODULE_LICENSE declarations are used to identify modules. As a consequence, uses of the macro in non-modules will cause modprobe to misidentify their containing object file as a module when it is not (false positives), and modprobe might succeed rather than failing with a suitable error message. So remove it in the files in this commit, none of which can be built as modules. Signed-off-by: Nick Alcock <nick.alcock@oracle.com> Suggested-by: Luis Chamberlain <mcgrof@kernel.org> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: linux-modules@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: Hitomi Hasegawa <hasegawa-hitomi@fujitsu.com> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Anna Schumaker <anna@kernel.org> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Jeff Layton <jlayton@kernel.org> Cc: linux-nfs@vger.kernel.org Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> | 3 年前 | |
nfsd: fix return error code for nfsd_map_name_to_[ug]id stable inclusion from stable-v6.6.128 commit 13c1f31f777cdf30ab12f39a34500ab8f8a44a85 category: bugfix bugzilla: https://atomgit.com/openeuler/kernel/issues/ Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=13c1f31f777cdf30ab12f39a34500ab8f8a44a85 -------------------------------- commit 13c1f31f777cdf30ab12f39a34500ab8f8a44a85 upstream. [ Upstream commit 404d779466646bf1461f2090ff137e99acaecf42 ] idmap lookups can time out while the cache is waiting for a userspace upcall reply. In that case cache_check() returns -ETIMEDOUT to callers. The nfsd_map_name_to_[ug]id functions currently proceed with attempting to map the id to a kuid despite a potentially temporary failure to perform the idmap lookup. This results in the code returning the error NFSERR_BADOWNER which can cause client operations to return to userspace with failure. Fix this by returning the failure status before attempting kuid mapping. This will return NFSERR_JUKEBOX on idmap lookup timeout so that clients can retry the operation instead of aborting it. Fixes: 65e10f6d0ab0 ("nfsd: Convert idmap to use kuids and kgids") Cc: stable@vger.kernel.org Signed-off-by: Anthony Iliopoulos <ailiop@suse.com> Reviewed-by: NeilBrown <neil@brown.name> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Wang Hai <wanghai38@huawei.com> | 25 天前 | |
nilfs2: reject zero bd_oblocknr in nilfs_ioctl_mark_blocks_dirty() stable inclusion from stable-v6.6.141 commit b88f905d4449b70da6bda547be546e365e44352e category: bugfix bugzilla: https://atomgit.com/openeuler/kernel/issues/ Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=b88f905d4449b70da6bda547be546e365e44352e -------------------------------- commit b88f905d4449b70da6bda547be546e365e44352e upstream. [ Upstream commit be3e5d10643d3be1cbac9d9939f220a99253f980 ] nilfs_ioctl_mark_blocks_dirty() uses bd_oblocknr to detect dead blocks by comparing it with the current block number bd_blocknr. If they differ, the block is considered dead and skipped. However, bd_oblocknr should never be 0 since block 0 typically stores the primary superblock and is never a valid GC target block. A corrupted ioctl request with bd_oblocknr set to 0 causes the comparison to incorrectly match when the lookup returns -ENOENT and sets bd_blocknr to 0, bypassing the dead block check and calling nilfs_bmap_mark() on a non-existent block. This causes nilfs_btree_do_lookup() to return -ENOENT, triggering the WARN_ON(ret == -ENOENT). Fix this by rejecting ioctl requests with bd_oblocknr set to 0 at the beginning of each iteration. [ryusuke: slightly modified the commit message and comments for accuracy] Fixes: 7942b919f732 ("nilfs2: ioctl operations") Reported-by: syzbot+98a040252119df0506f8@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=98a040252119df0506f8 Suggested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Deepanshu Kartikey <Kartikey406@gmail.com> Reported-by: syzbot+466a45fcfb0562f5b9a0@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=466a45fcfb0562f5b9a0 Cc: Junjie Cao <junjie.cao@linux.dev> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Wang Hai <wanghai38@huawei.com> | 25 天前 | |
fs/nls: Fix inconsistency between utf8_to_utf32() and utf32_to_utf8() stable inclusion from stable-v6.6.120 commit 4543d9ccd99e77a7697a52be73c0d297ccf6628e category: bugfix bugzilla: https://atomgit.com/openeuler/kernel/issues/8839 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=4543d9ccd99e77a7697a52be73c0d297ccf6628e -------------------------------- [ Upstream commit c36f9d7b2869a003a2f7d6ff2c6bac9e62fd7d68 ] After commit 25524b619029 ("fs/nls: Fix utf16 to utf8 conversion"), the return values of utf8_to_utf32() and utf32_to_utf8() are inconsistent when encountering an error: utf8_to_utf32() returns -1, while utf32_to_utf8() returns errno codes. Fix this inconsistency by modifying utf8_to_utf32() to return errno codes as well. Fixes: 25524b619029 ("fs/nls: Fix utf16 to utf8 conversion") Suggested-by: Andy Shevchenko <andriy.shevchenko@intel.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com> Signed-off-by: Armin Wolf <W_Armin@gmx.de> Link: https://patch.msgid.link/20251129111535.8984-1-W_Armin@gmx.de Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org> (cherry picked from commit 4543d9ccd99e77a7697a52be73c0d297ccf6628e) Signed-off-by: Wentao Guan <guanwentao@uniontech.com> | 2 个月前 | |
fsnotify: fix inode reference leak in fsnotify_recalc_mask() stable inclusion from stable-v6.12.91 commit 8c8afa6444e6bdc145d2bf2f3aeeca6da3e36b42 category: bugfix bugzilla: https://atomgit.com/src-openeuler/kernel/issues/15692 Reference: https://web.git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=8c8afa6444e6bdc145d2bf2f3aeeca6da3e36b42 -------------------------------- [ Upstream commit 4aca914ac152f5d055ddcb36704d1e539ac08977 ] fsnotify_recalc_mask() fails to handle the return value of __fsnotify_recalc_mask(), which may return an inode pointer that needs to be released via fsnotify_drop_object() when the connector's HAS_IREF flag transitions from set to cleared. This manifests as a hung task with the following call trace: INFO: task umount:1234 blocked for more than 120 seconds. Call Trace: __schedule schedule fsnotify_sb_delete generic_shutdown_super kill_anon_super cleanup_mnt task_work_run do_exit do_group_exit The race window that triggers the iref leak: Thread A (adding mark) Thread B (removing mark) ────────────────────── ──────────────────────── fsnotify_add_mark_locked(): fsnotify_add_mark_list(): spin_lock(conn->lock) add mark_B(evictable) to list spin_unlock(conn->lock) return /* ---- gap: no lock held ---- */ fsnotify_detach_mark(mark_A): spin_lock(mark_A->lock) clear ATTACHED flag on mark_A spin_unlock(mark_A->lock) fsnotify_put_mark(mark_A) fsnotify_recalc_mask(): spin_lock(conn->lock) __fsnotify_recalc_mask(): /* mark_A skipped: ATTACHED cleared */ /* only mark_B(evictable) remains */ want_iref = false has_iref = true /* not yet cleared */ -> HAS_IREF transitions true -> false -> returns inode pointer spin_unlock(conn->lock) /* BUG: return value discarded! * iput() and fsnotify_put_sb_watched_objects() * are never called */ Fix this by deferring the transition true -> false of HAS_IREF flag from fsnotify_recalc_mask() (Thread A) to fsnotify_put_mark() (thread B). Fixes: c3638b5b1374 ("fsnotify: allow adding an inode mark without pinning inode") Signed-off-by: Xin Yin <yinxin.x@bytedance.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://patch.msgid.link/CAOQ4uxiPsbHb0o5voUKyPFMvBsDkG914FYDcs4C5UpBMNm0Vcg@mail.gmail.com Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Sasha Levin <sashal@kernel.org> Conflicts: fs/notify/mark.c [Commit 35ceae44742e ("fsnotify: Avoid data race between fsnotify_recalc_mask() and fsnotify_object_watched()") has not mergfed, not affect to this patch.] Signed-off-by: Zizhi Wo <wozizhi@huawei.com> | 3 天前 | |
Merge tag 'for-6.6/block-2023-08-28' of git://git.kernel.dk/linux Pull block updates from Jens Axboe: "Pretty quiet round for this release. This contains: - Add support for zoned storage to ublk (Andreas, Ming) - Series improving performance for drivers that mark themselves as needing a blocking context for issue (Bart) - Cleanup the flush logic (Chengming) - sed opal keyring support (Greg) - Fixes and improvements to the integrity support (Jinyoung) - Add some exports for bcachefs that we can hopefully delete again in the future (Kent) - deadline throttling fix (Zhiguo) - Series allowing building the kernel without buffer_head support (Christoph) - Sanitize the bio page adding flow (Christoph) - Write back cache fixes (Christoph) - MD updates via Song: - Fix perf regression for raid0 large sequential writes (Jan) - Fix split bio iostat for raid0 (David) - Various raid1 fixes (Heinz, Xueshi) - raid6test build fixes (WANG) - Deprecate bitmap file support (Christoph) - Fix deadlock with md sync thread (Yu) - Refactor md io accounting (Yu) - Various non-urgent fixes (Li, Yu, Jack) - Various fixes and cleanups (Arnd, Azeem, Chengming, Damien, Li, Ming, Nitesh, Ruan, Tejun, Thomas, Xu)" * tag 'for-6.6/block-2023-08-28' of git://git.kernel.dk/linux: (113 commits) block: use strscpy() to instead of strncpy() block: sed-opal: keyring support for SED keys block: sed-opal: Implement IOC_OPAL_REVERT_LSP block: sed-opal: Implement IOC_OPAL_DISCOVERY blk-mq: prealloc tags when increase tagset nr_hw_queues blk-mq: delete redundant tagset map update when fallback blk-mq: fix tags leak when shrink nr_hw_queues ublk: zoned: support REQ_OP_ZONE_RESET_ALL md: raid0: account for split bio in iostat accounting md/raid0: Fix performance regression for large sequential writes md/raid0: Factor out helper for mapping and submitting a bio md raid1: allow writebehind to work on any leg device set WriteMostly md/raid1: hold the barrier until handle_read_error() finishes md/raid1: free the r1bio before waiting for blocked rdev md/raid1: call free_r1bio() before allow_barrier() in raid_end_bio_io() blk-cgroup: Fix NULL deref caused by blkg_policy_data being installed before init drivers/rnbd: restore sysfs interface to rnbd-client md/raid5-cache: fix null-ptr-deref for r5l_flush_stripe_to_raid() raid6: test: only check for Altivec if building on powerpc hosts raid6: test: make sure all intermediate and artifact files are .gitignored ... | 2 年前 | |
| 16 天前 | ||
ocfs2: validate group add input before caching stable inclusion from stable-v6.6.141 commit b9ae3942deec4c9e3fa2070521f90910f7490011 category: bugfix bugzilla: https://atomgit.com/openeuler/kernel/issues/ Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=b9ae3942deec4c9e3fa2070521f90910f7490011 -------------------------------- commit b9ae3942deec4c9e3fa2070521f90910f7490011 upstream. [ Upstream commit 70b672833f4025341c11b22c7f83778a5cd611bc ] [BUG] OCFS2_IOC_GROUP_ADD can trigger a BUG_ON in ocfs2_set_new_buffer_uptodate(): kernel BUG at fs/ocfs2/uptodate.c:509! Oops: invalid opcode: 0000 [#1] SMP KASAN NOPTI RIP: 0010:ocfs2_set_new_buffer_uptodate+0x194/0x1e0 fs/ocfs2/uptodate.c:509 Code: ffffe88f 42b9fe4c 89e64889 dfe8b4df Call Trace: ocfs2_group_add+0x3f1/0x1510 fs/ocfs2/resize.c:507 ocfs2_ioctl+0x309/0x6e0 fs/ocfs2/ioctl.c:887 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:597 [inline] __se_sys_ioctl fs/ioctl.c:583 [inline] __x64_sys_ioctl+0x197/0x1e0 fs/ioctl.c:583 x64_sys_call+0x1144/0x26a0 arch/x86/include/generated/asm/syscalls_64.h:17 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0x93/0xf80 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x76/0x7e RIP: 0033:0x7bbfb55a966d [CAUSE] ocfs2_group_add() calls ocfs2_set_new_buffer_uptodate() on a user-controlled group block before ocfs2_verify_group_and_input() validates that block number. That helper is only valid for newly allocated metadata and asserts that the block is not already present in the chosen metadata cache. The code also uses INODE_CACHE(inode) even though the group descriptor belongs to main_bm_inode and later journal accesses use that cache context instead. [FIX] Validate the on-disk group descriptor before caching it, then add it to the metadata cache tracked by INODE_CACHE(main_bm_inode). Keep the validation failure path separate from the later cleanup path so we only remove the buffer from that cache after it has actually been inserted. This keeps the group buffer lifetime consistent across validation, journaling, and cleanup. Link: https://lkml.kernel.org/r/20260410020209.3786348-1-gality369@gmail.com Fixes: 7909f2bf8353 ("[PATCH 2/2] ocfs2: Implement group add for online resize") Signed-off-by: ZhengYuan Huang <gality369@gmail.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Cc: Heming Zhao <heming.zhao@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Wang Hai <wanghai38@huawei.com> | 25 天前 | |
fs/omfs: reject s_sys_blocksize smaller than OMFS_DIR_START stable inclusion from stable-v6.6.141 commit 131ea3e57fc22936ed0e2c8330f2e36106172f51 category: bugfix bugzilla: https://atomgit.com/openeuler/kernel/issues/ Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=131ea3e57fc22936ed0e2c8330f2e36106172f51 -------------------------------- commit 131ea3e57fc22936ed0e2c8330f2e36106172f51 upstream. [ Upstream commit 0621c385fda1376e967f37ccd534c26c3e511d14 ] omfs_fill_super() rejects oversized s_sys_blocksize values (> PAGE_SIZE), but it does not reject values smaller than OMFS_DIR_START (0x1b8 = 440). Later, omfs_make_empty() uses sbi->s_sys_blocksize - OMFS_DIR_START as the length argument to memset(). Since s_sys_blocksize is u32, a crafted filesystem image with s_sys_blocksize < OMFS_DIR_START causes an unsigned underflow there, wrapping to a value near 2^32. That drives a ~4 GiB memset() from bh->b_data + OMFS_DIR_START and overwrites kernel memory far beyond the backing block buffer. Add the corresponding lower-bound check alongside the existing upper-bound check in omfs_fill_super(), so that malformed images are rejected during superblock validation before any filesystem data is processed. Fixes: a3ab7155ea21 ("omfs: add directory routines") Signed-off-by: Hyungjung Joo <jhj140711@gmail.com> Link: https://patch.msgid.link/20260317054827.1822061-1-jhj140711@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Wang Hai <wanghai38@huawei.com> | 25 天前 | |
openpromfs: finish conversion to the new mount API stable inclusion from stable-v6.6.33 commit d142957377c291478524269495929cb1e31920ba category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IAD6H2 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=d142957377c291478524269495929cb1e31920ba -------------------------------- [ Upstream commit 8f27829974b025d4df2e78894105d75e3bf349f0 ] The original mount API conversion inexplicably left out the change from ->remount_fs to ->reconfigure; do that now. Fixes: 7ab2fa7693c3 ("vfs: Convert openpromfs to use the new mount API") Signed-off-by: Eric Sandeen <sandeen@redhat.com> Link: https://lore.kernel.org/r/90b968aa-c979-420f-ba37-5acc3391b28f@redhat.com Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Wang Hai <wanghai38@huawei.com> | 1 年前 | |
orangefs: fix xattr related buffer overflow... stable inclusion from stable-v6.6.117 commit e09a096104fc65859422817fb2211f35855983fe category: bugfix bugzilla: https://atomgit.com/openeuler/kernel/issues/8763 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=e09a096104fc65859422817fb2211f35855983fe -------------------------------- [ Upstream commit 025e880759c279ec64d0f754fe65bf45961da864 ] Willy Tarreau <w@1wt.eu> forwarded me a message from Disclosure <disclosure@aisle.com> with the following warning: > The helper xattr_key() uses the pointer variable in the loop condition > rather than dereferencing it. As key is incremented, it remains non-NULL > (until it runs into unmapped memory), so the loop does not terminate on > valid C strings and will walk memory indefinitely, consuming CPU or hanging > the thread. I easily reproduced this with setfattr and getfattr, causing a kernel oops, hung user processes and corrupted orangefs files. Disclosure sent along a diff (not a patch) with a suggested fix, which I based this patch on. After xattr_key started working right, xfstest generic/069 exposed an xattr related memory leak that lead to OOM. xattr_key returns a hashed key. When adding xattrs to the orangefs xattr cache, orangefs used hash_add, a kernel hashing macro. hash_add also hashes the key using hash_log which resulted in additions to the xattr cache going to the wrong hash bucket. generic/069 tortures a single file and orangefs does a getattr for the xattr "security.capability" every time. Orangefs negative caches on xattrs which includes a kmalloc. Since adds to the xattr cache were going to the wrong bucket, every getattr for "security.capability" resulted in another kmalloc, none of which were ever freed. I changed the two uses of hash_add to hlist_add_head instead and the memory leak ceased and generic/069 quit throwing furniture. Signed-off-by: Mike Marshall <hubcap@omnibond.com> Reported-by: Stanislav Fort of Aisle Research <stanislav.fort@aisle.com> Signed-off-by: Sasha Levin <sashal@kernel.org> (cherry picked from commit e09a096104fc65859422817fb2211f35855983fe) Signed-off-by: Wentao Guan <guanwentao@uniontech.com> | 3 个月前 | |
ovl: Fix uninit-value in ovl_fill_real stable inclusion from stable-v6.6.128 commit 43b6f69e18063ae01911990f8726e9927e070ebf category: bugfix bugzilla: https://atomgit.com/openeuler/kernel/issues/ Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=43b6f69e18063ae01911990f8726e9927e070ebf -------------------------------- commit 43b6f69e18063ae01911990f8726e9927e070ebf upstream. [ Upstream commit 1992330d90dd766fcf1730fd7bf2d6af65370ac4 ] Syzbot reported a KMSAN uninit-value issue in ovl_fill_real. This iusse's call chain is: __do_sys_getdents64() -> iterate_dir() ... -> ext4_readdir() -> fscrypt_fname_alloc_buffer() // alloc -> fscrypt_fname_disk_to_usr // write without tail '\0' -> dir_emit() -> ovl_fill_real() // read by strcmp() The string is used to store the decrypted directory entry name for an encrypted inode. As shown in the call chain, fscrypt_fname_disk_to_usr() write it without null-terminate. However, ovl_fill_real() uses strcmp() to compare the name against "..", which assumes a null-terminated string and may trigger a KMSAN uninit-value warning when the buffer tail contains uninit data. Reported-by: syzbot+d130f98b2c265fae5297@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=d130f98b2c265fae5297 Fixes: 4edb83bb1041 ("ovl: constant d_ino for non-merge dirs") Signed-off-by: Qing Wang <wangqing7171@gmail.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://patch.msgid.link/20260128132406.23768-2-amir73il@gmail.com Acked-by: Miklos Szeredi <mszeredi@redhat.com> Reviewed-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Wang Hai <wanghai38@huawei.com> | 25 天前 | |
mm/cma_folio: make CMA folio allocation per-task opt-in via procfs euleros inclusion category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/9121 --------------------------------------------- Add task_struct::folio_cma_enabled and expose it through /proc/<pid>/folio_cma_enabled. alloc_cma_folio_mpol() now consults current->folio_cma_enabled before attempting a CMA folio allocation; if unset, the request falls back to the normal page allocator. This lets administrators opt individual processes into CMA folio usage instead of having it enabled globally for all eligible allocations. Signed-off-by: Ni Cunshu <nicunshu@huawei.com> | 1 个月前 | |
pstore/ram: fix resource leak when ioremap() fails mainline inclusion from mainline-v7.1-rc1 commit 2ddb69f686ef7a621645e97fc7329c50edf5d0e5 category: bugfix bugzilla: https://atomgit.com/src-openeuler/kernel/issues/15064 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=2ddb69f686ef7a621645e97fc7329c50edf5d0e5 ---------------------------------------------------------------------- In persistent_ram_iomap(), ioremap() or ioremap_wc() may return NULL on failure. Currently, if this happens, the function returns NULL without releasing the memory region acquired by request_mem_region(). This leads to a resource leak where the memory region remains reserved but unusable. Additionally, the caller persistent_ram_buffer_map() handles NULL correctly by returning -ENOMEM, but without this check, a NULL return combined with request_mem_region() succeeding leaves resources in an inconsistent state. This is the ioremap() counterpart to commit 05363abc7625 ("pstore: ram_core: fix incorrect success return when vmap() fails") which fixed a similar issue in the vmap() path. Fixes: 404a6043385d ("staging: android: persistent_ram: handle reserving and mapping memory") Signed-off-by: Cole Leavitt <cole@unwrap.rs> Link: https://patch.msgid.link/20260225235406.11790-1-cole@unwrap.rs Signed-off-by: Kees Cook <kees@kernel.org> Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> | 1 个月前 | |
Merge tag 'for-6.6/block-2023-08-28' of git://git.kernel.dk/linux Pull block updates from Jens Axboe: "Pretty quiet round for this release. This contains: - Add support for zoned storage to ublk (Andreas, Ming) - Series improving performance for drivers that mark themselves as needing a blocking context for issue (Bart) - Cleanup the flush logic (Chengming) - sed opal keyring support (Greg) - Fixes and improvements to the integrity support (Jinyoung) - Add some exports for bcachefs that we can hopefully delete again in the future (Kent) - deadline throttling fix (Zhiguo) - Series allowing building the kernel without buffer_head support (Christoph) - Sanitize the bio page adding flow (Christoph) - Write back cache fixes (Christoph) - MD updates via Song: - Fix perf regression for raid0 large sequential writes (Jan) - Fix split bio iostat for raid0 (David) - Various raid1 fixes (Heinz, Xueshi) - raid6test build fixes (WANG) - Deprecate bitmap file support (Christoph) - Fix deadlock with md sync thread (Yu) - Refactor md io accounting (Yu) - Various non-urgent fixes (Li, Yu, Jack) - Various fixes and cleanups (Arnd, Azeem, Chengming, Damien, Li, Ming, Nitesh, Ruan, Tejun, Thomas, Xu)" * tag 'for-6.6/block-2023-08-28' of git://git.kernel.dk/linux: (113 commits) block: use strscpy() to instead of strncpy() block: sed-opal: keyring support for SED keys block: sed-opal: Implement IOC_OPAL_REVERT_LSP block: sed-opal: Implement IOC_OPAL_DISCOVERY blk-mq: prealloc tags when increase tagset nr_hw_queues blk-mq: delete redundant tagset map update when fallback blk-mq: fix tags leak when shrink nr_hw_queues ublk: zoned: support REQ_OP_ZONE_RESET_ALL md: raid0: account for split bio in iostat accounting md/raid0: Fix performance regression for large sequential writes md/raid0: Factor out helper for mapping and submitting a bio md raid1: allow writebehind to work on any leg device set WriteMostly md/raid1: hold the barrier until handle_read_error() finishes md/raid1: free the r1bio before waiting for blocked rdev md/raid1: call free_r1bio() before allow_barrier() in raid_end_bio_io() blk-cgroup: Fix NULL deref caused by blkg_policy_data being installed before init drivers/rnbd: restore sysfs interface to rnbd-client md/raid5-cache: fix null-ptr-deref for r5l_flush_stripe_to_raid() raid6: test: only check for Altivec if building on powerpc hosts raid6: test: make sure all intermediate and artifact files are .gitignored ... | 2 年前 | |
Merge tag 'for-6.6/block-2023-08-28' of git://git.kernel.dk/linux Pull block updates from Jens Axboe: "Pretty quiet round for this release. This contains: - Add support for zoned storage to ublk (Andreas, Ming) - Series improving performance for drivers that mark themselves as needing a blocking context for issue (Bart) - Cleanup the flush logic (Chengming) - sed opal keyring support (Greg) - Fixes and improvements to the integrity support (Jinyoung) - Add some exports for bcachefs that we can hopefully delete again in the future (Kent) - deadline throttling fix (Zhiguo) - Series allowing building the kernel without buffer_head support (Christoph) - Sanitize the bio page adding flow (Christoph) - Write back cache fixes (Christoph) - MD updates via Song: - Fix perf regression for raid0 large sequential writes (Jan) - Fix split bio iostat for raid0 (David) - Various raid1 fixes (Heinz, Xueshi) - raid6test build fixes (WANG) - Deprecate bitmap file support (Christoph) - Fix deadlock with md sync thread (Yu) - Refactor md io accounting (Yu) - Various non-urgent fixes (Li, Yu, Jack) - Various fixes and cleanups (Arnd, Azeem, Chengming, Damien, Li, Ming, Nitesh, Ruan, Tejun, Thomas, Xu)" * tag 'for-6.6/block-2023-08-28' of git://git.kernel.dk/linux: (113 commits) block: use strscpy() to instead of strncpy() block: sed-opal: keyring support for SED keys block: sed-opal: Implement IOC_OPAL_REVERT_LSP block: sed-opal: Implement IOC_OPAL_DISCOVERY blk-mq: prealloc tags when increase tagset nr_hw_queues blk-mq: delete redundant tagset map update when fallback blk-mq: fix tags leak when shrink nr_hw_queues ublk: zoned: support REQ_OP_ZONE_RESET_ALL md: raid0: account for split bio in iostat accounting md/raid0: Fix performance regression for large sequential writes md/raid0: Factor out helper for mapping and submitting a bio md raid1: allow writebehind to work on any leg device set WriteMostly md/raid1: hold the barrier until handle_read_error() finishes md/raid1: free the r1bio before waiting for blocked rdev md/raid1: call free_r1bio() before allow_barrier() in raid_end_bio_io() blk-cgroup: Fix NULL deref caused by blkg_policy_data being installed before init drivers/rnbd: restore sysfs interface to rnbd-client md/raid5-cache: fix null-ptr-deref for r5l_flush_stripe_to_raid() raid6: test: only check for Altivec if building on powerpc hosts raid6: test: make sure all intermediate and artifact files are .gitignored ... | 2 年前 | |
quota: Fix race of dquot_scan_active() with quota deactivation stable inclusion from stable-v6.6.141 commit 6678dde265708003c2b42551af4a2e3cb05decd5 category: bugfix bugzilla: https://atomgit.com/openeuler/kernel/issues/ Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=6678dde265708003c2b42551af4a2e3cb05decd5 -------------------------------- commit 6678dde265708003c2b42551af4a2e3cb05decd5 upstream. [ Upstream commit e93ab401da4b2e2c1b8ef2424de2f238d51c8b2d ] dquot_scan_active() can race with quota deactivation in quota_release_workfn() like: CPU0 (quota_release_workfn) CPU1 (dquot_scan_active) ============================== ============================== spin_lock(&dq_list_lock); list_replace_init( &releasing_dquots, &rls_head); /* dquot X on rls_head, dq_count == 0, DQ_ACTIVE_B still set */ spin_unlock(&dq_list_lock); synchronize_srcu(&dquot_srcu); spin_lock(&dq_list_lock); list_for_each_entry(dquot, &inuse_list, dq_inuse) { /* finds dquot X */ dquot_active(X) -> true atomic_inc(&X->dq_count); } spin_unlock(&dq_list_lock); spin_lock(&dq_list_lock); dquot = list_first_entry(&rls_head); WARN_ON_ONCE(atomic_read(&dquot->dq_count)); The problem is not only a cosmetic one as under memory pressure the caller of dquot_scan_active() can end up working on freed dquot. Fix the problem by making sure the dquot is removed from releasing list when we acquire a reference to it. Fixes: 869b6ea1609f ("quota: Fix slow quotaoff") Reported-by: Sam Sun <samsun1006219@gmail.com> Link: https://lore.kernel.org/all/CAEkJfYPTt3uP1vAYnQ5V2ZWn5O9PLhhGi5HbOcAzyP9vbXyjeg@mail.gmail.com Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Wang Hai <wanghai38@huawei.com> | 25 天前 | |
ramfs: convert to ctime accessor functions In later patches, we're going to change how the inode's ctime field is used. Switch to using accessor functions instead of raw accesses of inode->i_ctime. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Message-Id: <20230705190309.579783-69-jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org> | 2 年前 | |
reiserfs: fix uninit-value in comp_keys stable inclusion from stable-v6.6.47 commit ca9b877a2e2c04fa28e2aa17ce14d0633cbb73b4 bugzilla: https://gitee.com/openeuler/kernel/issues/IAHMJO Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=ca9b877a2e2c04fa28e2aa17ce14d0633cbb73b4 -------------------------------- [ Upstream commit dd8f87f21dc3da2eaf46e7401173f935b90b13a8 ] The cpu_key was not initialized in reiserfs_delete_solid_item(), which triggered this issue. Reported-and-tested-by: <syzbot+b3b14fb9f8a14c5d0267@syzkaller.appspotmail.com> Signed-off-by: Edward Adam Davis <eadavis@qq.com> Link: https://lore.kernel.org/r/tencent_9EA7E746DE92DBC66049A62EDF6ED64CA706@qq.com Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: ZhangPeng <zhangpeng362@huawei.com> | 1 年前 | |
fs/resctrl: Add boiler plate for external resctrl code mainline inclusion from mainline-v6.16-rc1 commit bff70402d6d67843fe319338e4c56e1cba13fbd8 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/8967 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=bff70402d6d67843fe319338e4c56e1cba13fbd8 --------------------------- Add Makefile and Kconfig for fs/resctrl. Add ARCH_HAS_CPU_RESCTRL for the common parts of the resctrl interface and make X86_CPU_RESCTRL select this. Adding an include of asm/resctrl.h to linux/resctrl.h allows the /fs/resctrl files to switch over to using this header instead. Co-developed-by: Dave Martin <Dave.Martin@arm.com> Signed-off-by: Dave Martin <Dave.Martin@arm.com> Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64 Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Amit Singh Tomar <amitsinght@marvell.com> # arm64 Tested-by: Shanker Donthineni <sdonthineni@nvidia.com> # arm64 Tested-by: Babu Moger <babu.moger@amd.com> Tested-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/20250515165855.31452-16-james.morse@arm.com Conflicts: arch/Kconfig arch/x86/Kconfig fs/Makefile [jz: rename upstream fs/resctrl to fs/resctrl_x86, and use CONFIG_RESCTRL_FS_X86 to differentiate with mpam resctrl] Signed-off-by: Jason Zeng <jason.zeng@intel.com> | 1 个月前 | |
fs/resctrl: Move RMID initialization to first mount mainline inclusion from mainline-v7.0-rc1 commit d0891647fbc6e931f27517364cbc4ee1811d76db category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/8967 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d0891647fbc6e931f27517364cbc4ee1811d76db -------------------------------- L3 monitor features are enumerated during resctrl initialization and rmid_ptrs[] that tracks all RMIDs and depends on the number of supported RMIDs is allocated during this time. Telemetry monitor features are enumerated during first resctrl mount and may support a different number of RMIDs compared to L3 monitor features. Delay allocation and initialization of rmid_ptrs[] until first mount. Since the number of RMIDs cannot change on later mounts, keep the same set of rmid_ptrs[] until resctrl_exit(). This is required because the limbo handler keeps running after resctrl is unmounted and needs to access rmid_ptrs[] as it keeps tracking busy RMIDs after unmount. Rename routines to match what they now do: dom_data_init() -> setup_rmid_lru_list() dom_data_exit() -> free_rmid_lru_list() Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com Signed-off-by: Jason Zeng <jason.zeng@intel.com> | 1 个月前 | |
romfs: check sb_set_blocksize() return value stable inclusion from stable-v6.6.127 commit cbd9931e6456822067725354d83446c5bb813030 category: bugfix bugzilla: https://atomgit.com/openeuler/kernel/issues/9250 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=cbd9931e6456822067725354d83446c5bb813030 -------------------------------- [ Upstream commit ab7ad7abb3660c58ffffdf07ff3bb976e7e0afa0 ] romfs_fill_super() ignores the return value of sb_set_blocksize(), which can fail if the requested block size is incompatible with the block device's configuration. This can be triggered by setting a loop device's block size larger than PAGE_SIZE using ioctl(LOOP_SET_BLOCK_SIZE, 32768), then mounting a romfs filesystem on that device. When sb_set_blocksize(sb, ROMBSIZE) is called with ROMBSIZE=4096 but the device has logical_block_size=32768, bdev_validate_blocksize() fails because the requested size is smaller than the device's logical block size. sb_set_blocksize() returns 0 (failure), but romfs ignores this and continues mounting. The superblock's block size remains at the device's logical block size (32768). Later, when sb_bread() attempts I/O with this oversized block size, it triggers a kernel BUG in folio_set_bh(): kernel BUG at fs/buffer.c:1582! BUG_ON(size > PAGE_SIZE); Fix by checking the return value of sb_set_blocksize() and failing the mount with -EINVAL if it returns 0. Reported-by: syzbot+9c4e33e12283d9437c25@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=9c4e33e12283d9437c25 Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com> Link: https://patch.msgid.link/20260113084037.1167887-1-kartikey406@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> (cherry picked from commit cbd9931e6456822067725354d83446c5bb813030) Signed-off-by: Wentao Guan <guanwentao@uniontech.com> | 1 个月前 | |
smb/client: fix possible infinite loop and oob read in symlink_data() stable inclusion from stable-v6.6.141 commit b41598bf54b3fe528994e573df6008f8f4d0a4f4 category: bugfix bugzilla: https://atomgit.com/src-openeuler/kernel/issues/15669 CVE: CVE-2026-52967 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=b41598bf54b3fe528994e573df6008f8f4d0a4f4 -------------------------------- commit 7d9a7f1f96cd617ee9e75bb22217c709038e26b8 upstream. On 32-bit architectures, the infinite loop is as follows: len = p->ErrorDataLength == 0xfffffff8 u8 *next = p->ErrorContextData + len next == p On 32-bit architectures, the out-of-bounds read is as follows: len = p->ErrorDataLength == 0xfffffff0 u8 *next = p->ErrorContextData + len next == (u8 *)p - 8 Reported-by: ChenXiaoSong <chenxiaosong@kylinos.cn> Fixes: 76894f3e2f71 ("cifs: improve symlink handling for smb2+") Cc: stable@vger.kernel.org Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: ChenXiaoSong <chenxiaosong@kylinos.cn> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Conflicts: fs/smb/client/smb2file.c [ Context conflict ] Signed-off-by: Zizhi Wo <wozizhi@huawei.com> | 3 天前 | |
Squashfs: check metadata block offset is within range stable inclusion from stable-v6.6.130 commit 6b847d65f5b0065e02080c61fad93d57d6686383 category: bugfix bugzilla: https://atomgit.com/src-openeuler/kernel/issues/14019 CVE: CVE-2026-23388 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=6b847d65f5b0065e02080c61fad93d57d6686383 -------------------------------- commit fdb24a820a5832ec4532273282cbd4f22c291a0d upstream. Syzkaller reports a "general protection fault in squashfs_copy_data" This is ultimately caused by a corrupted index look-up table, which produces a negative metadata block offset. This is subsequently passed to squashfs_copy_data (via squashfs_read_metadata) where the negative offset causes an out of bounds access. The fix is to check that the offset is within range in squashfs_read_metadata. This will trap this and other cases. Link: https://lkml.kernel.org/r/20260217050955.138351-1-phillip@squashfs.org.uk Fixes: f400e12656ab ("Squashfs: cache operations") Reported-by: syzbot+a9747fe1c35a5b115d3f@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/699234e2.a70a0220.2c38d7.00e2.GAE@google.com/ Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Long Li <leo.lilong@huawei.com> | 2 个月前 | |
kernfs: Use RCU to access kernfs_node::name. mainline inclusion from mainline-v6.15-rc1 commit 741c10b096bc4dd79cd9f215b6ef173bb953e75c category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/8967 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=741c10b096bc4dd79cd9f215b6ef173bb953e75c --------------------------- Using RCU lifetime rules to access kernfs_node::name can avoid the trouble with kernfs_rename_lock in kernfs_name() and kernfs_path_from_node() if the fs was created with KERNFS_ROOT_INVARIANT_PARENT. This is usefull as it allows to implement kernfs_path_from_node() only with RCU protection and avoiding kernfs_rename_lock. The lock is only required if the __parent node can be changed and the function requires an unchanged hierarchy while it iterates from the node to its parent. The change is needed to allow the lookup of the node's path (kernfs_path_from_node()) from context which runs always with disabled preemption and or interrutps even on PREEMPT_RT. The problem is that kernfs_rename_lock becomes a sleeping lock on PREEMPT_RT. I went through all ::name users and added the required access for the lookup with a few extensions: - rdtgroup_pseudo_lock_create() drops all locks and then uses the name later on. resctrl supports rename with different parents. Here I made a temporal copy of the name while it is used outside of the lock. - kernfs_rename_ns() accepts NULL as new_parent. This simplifies sysfs_move_dir_ns() where it can set NULL in order to reuse the current name. - kernfs_rename_ns() is only using kernfs_rename_lock if the parents are different. All users use either kernfs_rwsem (for stable path view) or just RCU for the lookup. The ::name uses always RCU free. Use RCU lifetime guarantees to access kernfs_node::name. Suggested-by: Tejun Heo <tj@kernel.org> Acked-by: Tejun Heo <tj@kernel.org> Reported-by: syzbot+6ea37e2e6ffccf41a7e6@syzkaller.appspotmail.com Closes: https://lore.kernel.org/lkml/67251dc6.050a0220.529b6.015e.GAE@google.com/ Reported-by: Hillf Danton <hdanton@sina.com> Closes: https://lore.kernel.org/20241102001224.2789-1-hdanton@sina.com Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://lore.kernel.org/r/20250213145023.2820193-7-bigeasy@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Conflicts: fs/kernfs/file.c [jz: - resolve context conflict - add asm/sections.h header file inclusion because is_kernel_rodata() need this. Upstream doesn't need this inclusion because its linux/security.h indirectly included asm/sections.h - also adapt code in fs/resctrl] Signed-off-by: Jason Zeng <jason.zeng@intel.com> | 1 个月前 | |
sysv: don't call sb_bread() with pointers_lock held stable inclusion from stable-v6.6.27 commit 89e8524135a3902e7563a5a59b7b5ec1bf4904ac bugzilla: https://gitee.com/openeuler/kernel/issues/I9MPZ8 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=89e8524135a3902e7563a5a59b7b5ec1bf4904ac -------------------------------- [ Upstream commit f123dc86388cb669c3d6322702dc441abc35c31e ] syzbot is reporting sleep in atomic context in SysV filesystem [1], for sb_bread() is called with rw_spinlock held. A "write_lock(&pointers_lock) => read_lock(&pointers_lock) deadlock" bug and a "sb_bread() with write_lock(&pointers_lock)" bug were introduced by "Replace BKL for chain locking with sysvfs-private rwlock" in Linux 2.5.12. Then, "[PATCH] err1-40: sysvfs locking fix" in Linux 2.6.8 fixed the former bug by moving pointers_lock lock to the callers, but instead introduced a "sb_bread() with read_lock(&pointers_lock)" bug (which made this problem easier to hit). Al Viro suggested that why not to do like get_branch()/get_block()/ find_shared() in Minix filesystem does. And doing like that is almost a revert of "[PATCH] err1-40: sysvfs locking fix" except that get_branch() from with find_shared() is called without write_lock(&pointers_lock). Reported-by: syzbot <syzbot+69b40dc5fd40f32c199f@syzkaller.appspotmail.com> Link: https://syzkaller.appspot.com/bug?extid=69b40dc5fd40f32c199f Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Link: https://lore.kernel.org/r/0d195f93-a22a-49a2-0020-103534d6f7f6@I-love.SAKURA.ne.jp Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: ZhangPeng <zhangpeng362@huawei.com> | 2 年前 | |
eventfs: Use list_add_tail_rcu() for SRCU-protected children list stable inclusion from stable-v6.6.141 commit 958e032618c8016efa1f5993b7cdd8297dd900d6 category: bugfix bugzilla: https://atomgit.com/openeuler/kernel/issues/ Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=958e032618c8016efa1f5993b7cdd8297dd900d6 -------------------------------- commit 958e032618c8016efa1f5993b7cdd8297dd900d6 upstream. [ Upstream commit f67950b2887fa10df50c4317a1fe98a65bc6875b ] Commit d2603279c7d6 ("eventfs: Use list_del_rcu() for SRCU protected list variable") converted the removal side to pair with the list_for_each_entry_srcu() walker in eventfs_iterate(). The insertion in eventfs_create_dir() was left as a plain list_add_tail(), which on weakly-ordered architectures can expose a new entry to the SRCU reader before its list pointers and fields are observable. Use list_add_tail_rcu() so the publication pairs with the existing list_del_rcu() and list_for_each_entry_srcu(). Fixes: 43aa6f97c2d0 ("eventfs: Get rid of dentry pointers without refcounts") Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20260418152251.199343-1-devnexen@gmail.com Signed-off-by: David Carlier <devnexen@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> [ adapted scoped_guard(mutex, &eventfs_mutex) block to explicit mutex_lock()/mutex_unlock() pair ] Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Wang Hai <wanghai38@huawei.com> | 25 天前 | |
mm: add KABI_* macros to preserve KABI hulk inclusion category: performance bugzilla: https://gitee.com/openeuler/kernel/issues/IC3A7I ------------------------------------------------- The shrinker patchset changed the kAPI. Add KABI markups to prevent CRC symbol changes. Signed-off-by: Mauro Carvalho Chehab <m.chehab@huawei.com> | 8 个月前 | |
udf: fix partition descriptor append bookkeeping stable inclusion from stable-v6.6.140 commit 058b451b1039f056d1362c4fec2229e522366ab0 category: bugfix bugzilla: https://atomgit.com/openeuler/kernel/issues/ Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=058b451b1039f056d1362c4fec2229e522366ab0 -------------------------------- commit 058b451b1039f056d1362c4fec2229e522366ab0 upstream. [ Upstream commit 08841b06fa64d8edbd1a21ca6e613420c90cc4b8 ] Mounting a crafted UDF image with repeated partition descriptors can trigger a heap out-of-bounds write in part_descs_loc[]. handle_partition_descriptor() deduplicates entries by partition number, but appended slots never record partnum. As a result duplicate Partition Descriptors are appended repeatedly and num_part_descs keeps growing. Once the table is full, the growth path still sizes the allocation from partnum even though inserts are indexed by num_part_descs. If partnum is already aligned to PART_DESC_ALLOC_STEP, ALIGN(partnum, step) can keep the old capacity and the next append writes past the end of the table. Store partnum in the appended slot and size growth from the next append count so deduplication and capacity tracking follow the same model. Fixes: ee4af50ca94f ("udf: Fix mounting of Win7 created UDF filesystems") Cc: stable@vger.kernel.org Signed-off-by: Seohyeon Maeng <bioloidgp@gmail.com> Link: https://patch.msgid.link/20260310081652.21220-1-bioloidgp@gmail.com Signed-off-by: Jan Kara <jack@suse.cz> [ replaced kzalloc_objs() helper with equivalent kcalloc() ] Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Wang Hai <wanghai38@huawei.com> | 25 天前 | |
Merge tag 'for-6.6/block-2023-08-28' of git://git.kernel.dk/linux Pull block updates from Jens Axboe: "Pretty quiet round for this release. This contains: - Add support for zoned storage to ublk (Andreas, Ming) - Series improving performance for drivers that mark themselves as needing a blocking context for issue (Bart) - Cleanup the flush logic (Chengming) - sed opal keyring support (Greg) - Fixes and improvements to the integrity support (Jinyoung) - Add some exports for bcachefs that we can hopefully delete again in the future (Kent) - deadline throttling fix (Zhiguo) - Series allowing building the kernel without buffer_head support (Christoph) - Sanitize the bio page adding flow (Christoph) - Write back cache fixes (Christoph) - MD updates via Song: - Fix perf regression for raid0 large sequential writes (Jan) - Fix split bio iostat for raid0 (David) - Various raid1 fixes (Heinz, Xueshi) - raid6test build fixes (WANG) - Deprecate bitmap file support (Christoph) - Fix deadlock with md sync thread (Yu) - Refactor md io accounting (Yu) - Various non-urgent fixes (Li, Yu, Jack) - Various fixes and cleanups (Arnd, Azeem, Chengming, Damien, Li, Ming, Nitesh, Ruan, Tejun, Thomas, Xu)" * tag 'for-6.6/block-2023-08-28' of git://git.kernel.dk/linux: (113 commits) block: use strscpy() to instead of strncpy() block: sed-opal: keyring support for SED keys block: sed-opal: Implement IOC_OPAL_REVERT_LSP block: sed-opal: Implement IOC_OPAL_DISCOVERY blk-mq: prealloc tags when increase tagset nr_hw_queues blk-mq: delete redundant tagset map update when fallback blk-mq: fix tags leak when shrink nr_hw_queues ublk: zoned: support REQ_OP_ZONE_RESET_ALL md: raid0: account for split bio in iostat accounting md/raid0: Fix performance regression for large sequential writes md/raid0: Factor out helper for mapping and submitting a bio md raid1: allow writebehind to work on any leg device set WriteMostly md/raid1: hold the barrier until handle_read_error() finishes md/raid1: free the r1bio before waiting for blocked rdev md/raid1: call free_r1bio() before allow_barrier() in raid_end_bio_io() blk-cgroup: Fix NULL deref caused by blkg_policy_data being installed before init drivers/rnbd: restore sysfs interface to rnbd-client md/raid5-cache: fix null-ptr-deref for r5l_flush_stripe_to_raid() raid6: test: only check for Altivec if building on powerpc hosts raid6: test: make sure all intermediate and artifact files are .gitignored ... | 2 年前 | |
unicode: Fix utf8_load() error path stable inclusion from stable-v6.6.64 commit c4b6c1781f6cc4e2283120ac8d873864b8056f21 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IBL4B6 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=c4b6c1781f6cc4e2283120ac8d873864b8056f21 -------------------------------- [ Upstream commit 156bb2c569cd869583c593d27a5bd69e7b2a4264 ] utf8_load() requests the symbol "utf8_data_table" and then checks if the requested UTF-8 version is supported. If it's unsupported, it tries to put the data table using symbol_put(). If an unsupported version is requested, symbol_put() fails like this: kernel BUG at kernel/module/main.c:786! RIP: 0010:__symbol_put+0x93/0xb0 Call Trace: <TASK> ? __die_body.cold+0x19/0x27 ? die+0x2e/0x50 ? do_trap+0xca/0x110 ? do_error_trap+0x65/0x80 ? __symbol_put+0x93/0xb0 ? exc_invalid_op+0x51/0x70 ? __symbol_put+0x93/0xb0 ? asm_exc_invalid_op+0x1a/0x20 ? __pfx_cmp_name+0x10/0x10 ? __symbol_put+0x93/0xb0 ? __symbol_put+0x62/0xb0 utf8_load+0xf8/0x150 That happens because symbol_put() expects the unique string that identify the symbol, instead of a pointer to the loaded symbol. Fix that by using such string. Fixes: 2b3d04787012 ("unicode: Add utf8-data module") Signed-off-by: André Almeida <andrealmeid@igalia.com> Reviewed-by: Theodore Ts'o <tytso@mit.edu> Link: https://lore.kernel.org/r/20240902225511.757831-2-andrealmeid@igalia.com Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Wen Zhiwei <wenzhiwei@kylinos.cn> | 1 年前 | |
vboxsf: Add __nonstring annotations for unterminated strings mainline inclusion from mainline-v6.14-rc7 commit 986a6f5eacb900ea0f6036ef724b26e76be40f65 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/ICCVOJ Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=986a6f5eacb900ea0f6036ef724b26e76be40f65 -------------------------------- When a character array without a terminating NUL character has a static initializer, GCC 15's -Wunterminated-string-initialization will only warn if the array lacks the "nonstring" attribute[1]. Mark the arrays with __nonstring to and correctly identify the char array as "not a C string" and thereby eliminate the warning. This effectively reverts the change in 4e7487245abc ("vboxsf: fix building with GCC 15"), to add the annotation that has other uses (i.e. warning if the string is ever used with C string APIs). Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117178 [1] Cc: Hans de Goede <hdegoede@redhat.com> Cc: Brahmajit Das <brahmajit.xyz@gmail.com> Cc: Christian Brauner <brauner@kernel.org> Cc: linux-fsdevel@vger.kernel.org Signed-off-by: Kees Cook <kees@kernel.org> Link: https://lore.kernel.org/r/20250310222530.work.374-kees@kernel.org Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Christian Brauner <brauner@kernel.org> (cherry picked from commit 986a6f5eacb900ea0f6036ef724b26e76be40f65) Signed-off-by: Wentao Guan <guanwentao@uniontech.com> | 1 年前 | |
fsverity: use register_sysctl_init() to avoid kmemleak warning stable inclusion from stable-v6.6.34 commit d171c85d74c6fdc84b8082cb034c230b353b8f6a category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IAD6H2 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=d171c85d74c6fdc84b8082cb034c230b353b8f6a -------------------------------- commit ee5814dddefbaa181cb247a75676dd5103775db1 upstream. Since the fsverity sysctl registration runs as a builtin initcall, there is no corresponding sysctl deregistration and the resulting struct ctl_table_header is not used. This can cause a kmemleak warning just after the system boots up. (A pointer to the ctl_table_header is stored in the fsverity_sysctl_header static variable, which kmemleak should detect; however, the compiler can optimize out that variable.) Avoid the kmemleak warning by using register_sysctl_init() which is intended for use by builtin initcalls and uses kmemleak_not_leak(). Reported-by: Yi Zhang <yi.zhang@redhat.com> Closes: https://lore.kernel.org/r/CAHj4cs8DTSvR698UE040rs_pX1k-WVe7aR6N2OoXXuhXJPDC-w@mail.gmail.com Cc: stable@vger.kernel.org Reviewed-by: Joel Granados <j.granados@samsung.com> Link: https://lore.kernel.org/r/20240501025331.594183-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Wang Hai <wanghai38@huawei.com> | 1 年前 | |
| 4 天前 | ||
iomap: pass the length of the dirty region to ->map_blocks mainline inclusion from mainline-v6.9-rc1 commit 19871b5c7a003946d3cd4209a348ab7c0df5dbad category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I9DN5Z CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=19871b5c7a003946d3cd4209a348ab7c0df5dbad -------------------------------- Let the file system know how much dirty data exists at the passed in offset. This allows file systems to allocate the right amount of space that actually is written back if they can't eagerly convert (e.g. because they don't support unwritten extents). Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20231207072710.176093-15-hch@lst.de Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Zhang Yi <yi.zhang@huawei.com> | 2 年前 | |
fs/resctrl: Add boiler plate for external resctrl code mainline inclusion from mainline-v6.16-rc1 commit bff70402d6d67843fe319338e4c56e1cba13fbd8 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/8967 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=bff70402d6d67843fe319338e4c56e1cba13fbd8 --------------------------- Add Makefile and Kconfig for fs/resctrl. Add ARCH_HAS_CPU_RESCTRL for the common parts of the resctrl interface and make X86_CPU_RESCTRL select this. Adding an include of asm/resctrl.h to linux/resctrl.h allows the /fs/resctrl files to switch over to using this header instead. Co-developed-by: Dave Martin <Dave.Martin@arm.com> Signed-off-by: Dave Martin <Dave.Martin@arm.com> Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64 Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Amit Singh Tomar <amitsinght@marvell.com> # arm64 Tested-by: Shanker Donthineni <sdonthineni@nvidia.com> # arm64 Tested-by: Babu Moger <babu.moger@amd.com> Tested-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/20250515165855.31452-16-james.morse@arm.com Conflicts: arch/Kconfig arch/x86/Kconfig fs/Makefile [jz: rename upstream fs/resctrl to fs/resctrl_x86, and use CONFIG_RESCTRL_FS_X86 to differentiate with mpam resctrl] Signed-off-by: Jason Zeng <jason.zeng@intel.com> | 1 个月前 | |
Revert "Kconfig: regularize selection of CONFIG_BINFMT_ELF" hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I8S6IA ------------------------------------------------- This reverts commit 41026c343540e33627e23c8a91ebb679a7c0f89c. Since commit 7de3ab4c3dd9 ("arm64: introduce binfmt_elf32.c"), whether it is ILP32 or AARCH32_EL0, fs/compat_binfmt.c is not used for arm64. So it is not reasonable to be default y for COMPAT_BINFMT_ELF and revert it to to make all architectures to individually select it. Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> | 2 年前 | |
fs/resctrl: Add boiler plate for external resctrl code mainline inclusion from mainline-v6.16-rc1 commit bff70402d6d67843fe319338e4c56e1cba13fbd8 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/8967 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=bff70402d6d67843fe319338e4c56e1cba13fbd8 --------------------------- Add Makefile and Kconfig for fs/resctrl. Add ARCH_HAS_CPU_RESCTRL for the common parts of the resctrl interface and make X86_CPU_RESCTRL select this. Adding an include of asm/resctrl.h to linux/resctrl.h allows the /fs/resctrl files to switch over to using this header instead. Co-developed-by: Dave Martin <Dave.Martin@arm.com> Signed-off-by: Dave Martin <Dave.Martin@arm.com> Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64 Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Amit Singh Tomar <amitsinght@marvell.com> # arm64 Tested-by: Shanker Donthineni <sdonthineni@nvidia.com> # arm64 Tested-by: Babu Moger <babu.moger@amd.com> Tested-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/20250515165855.31452-16-james.morse@arm.com Conflicts: arch/Kconfig arch/x86/Kconfig fs/Makefile [jz: rename upstream fs/resctrl to fs/resctrl_x86, and use CONFIG_RESCTRL_FS_X86 to differentiate with mpam resctrl] Signed-off-by: Jason Zeng <jason.zeng@intel.com> | 1 个月前 | |
fs: Initial atomic write support mainline inclusion from mainline-v6.10-rc3 commit c34fc6f26ab86d03a2d47446f42b6cd492dfdc56 category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8862 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c34fc6f26ab86d03a2d47446f42b6cd492dfdc56 -------------------------------- An atomic write is a write issued with torn-write protection, meaning that for a power failure or any other hardware failure, all or none of the data from the write will be stored, but never a mix of old and new data. Userspace may add flag RWF_ATOMIC to pwritev2() to indicate that the write is to be issued with torn-write prevention, according to special alignment and length rules. For any syscall interface utilizing struct iocb, add IOCB_ATOMIC for iocb->ki_flags field to indicate the same. A call to statx will give the relevant atomic write info for a file: - atomic_write_unit_min - atomic_write_unit_max - atomic_write_segments_max Both min and max values must be a power-of-2. Applications can avail of atomic write feature by ensuring that the total length of a write is a power-of-2 in size and also sized between atomic_write_unit_min and atomic_write_unit_max, inclusive. Applications must ensure that the write is at a naturally-aligned offset in the file wrt the total write length. The value in atomic_write_segments_max indicates the upper limit for IOV_ITER iovcnt. Add file mode flag FMODE_CAN_ATOMIC_WRITE, so files which do not have the flag set will have RWF_ATOMIC rejected and not just ignored. Add a type argument to kiocb_set_rw_flags() to allows reads which have RWF_ATOMIC set to be rejected. Helper function generic_atomic_write_valid() can be used by FSes to verify compliant writes. There we check for iov_iter type is for ubuf, which implies iovcnt==1 for pwritev2(), which is an initial restriction for atomic_write_segments_max. Initially the only user will be bdev file operations write handler. We will rely on the block BIO submission path to ensure write sizes are compliant for the bdev, so we don't need to check atomic writes sizes yet. Signed-off-by: Prasad Singamsetty <prasad.singamsetty@oracle.com> jpg: merge into single patch and much rewrite Acked-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: John Garry <john.g.garry@oracle.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Link: https://lore.kernel.org/r/20240620125359.2684798-4-john.g.garry@oracle.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Conflicts: fs/read_write.c include/linux/fs.h include/uapi/linux/fs.h io_uring/rw.c Signed-off-by: Long Li <leo.lilong@huawei.com> | 1 个月前 | |
fs: Rename anon_inode_getfile_secure() and anon_inode_getfd_secure() mainline inclusion from mainline-v6.8-rc1 commit 4f0b9194bc119a9850a99e5e824808e2f468c348 category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8403 CVE: NA Reference: https://github.com/torvalds/linux/commit/4f0b9194bc119a9850a99e5e824808e2f468c348 -------------------------------- commit 4f0b9194bc119a9850a99e5e824808e2f468c348 upstream The call to the inode_init_security_anon() LSM hook is not the sole reason to use anon_inode_getfile_secure() or anon_inode_getfd_secure(). For example, the functions also allow one to create a file with non-zero size, without needing a full-blown filesystem. In this case, you don't need a "secure" version, just unique inodes; the current name of the functions is confusing and does not explain well the difference with the more "standard" anon_inode_getfile() and anon_inode_getfd(). Of course, there is another side of the coin; neither io_uring nor userfaultfd strictly speaking need distinct inodes, and it is not that clear anymore that anon_inode_create_get{file,fd}() allow the LSM to intercept and block the inode's creation. If one was so inclined, anon_inode_getfile_secure() and anon_inode_getfd_secure() could be kept, using the shared inode or a new one depending on CONFIG_SECURITY. However, this is probably overkill, and potentially a cause of bugs in different configurations. Therefore, just add a comment to io_uring and userfaultfd explaining the choice of the function. While at it, remove the export for what is now anon_inode_create_getfd(). There is no in-tree module that uses it, and the old name is gone anyway. If anybody actually needs the symbol, they can ask or they can just use anon_inode_create_getfile(), which will be exported very soon for use in KVM. [Backport changes] 1. In file "io_uring/io_uring.c" function io_uring_get_file() retaines the changes made in commit 6fc19b3 and matches the latest changes made in upstream commit 09d1c6a, in which instead of returning variable assigned with older struct, function directly returns struct anon_inode_create_getfile which also reflects the final version. Suggested-by: Christian Brauner <brauner@kernel.org> Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: PvsNarasimha <PVS.NarasimhaRao@amd.com> | 3 个月前 | |
Merge tag 'v6.6-vfs.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull misc vfs updates from Christian Brauner: "This contains the usual miscellaneous features, cleanups, and fixes for vfs and individual filesystems. Features: - Block mode changes on symlinks and rectify our broken semantics - Report file modifications via fsnotify() for splice - Allow specifying an explicit timeout for the "rootwait" kernel command line option. This allows to timeout and reboot instead of always waiting indefinitely for the root device to show up - Use synchronous fput for the close system call Cleanups: - Get rid of open-coded lockdep workarounds for async io submitters and replace it all with a single consolidated helper - Simplify epoll allocation helper - Convert simple_write_begin and simple_write_end to use a folio - Convert page_cache_pipe_buf_confirm() to use a folio - Simplify __range_close to avoid pointless locking - Disable per-cpu buffer head cache for isolated cpus - Port ecryptfs to kmap_local_page() api - Remove redundant initialization of pointer buf in pipe code - Unexport the d_genocide() function which is only used within core vfs - Replace printk(KERN_ERR) and WARN_ON() with WARN() Fixes: - Fix various kernel-doc issues - Fix refcount underflow for eventfds when used as EFD_SEMAPHORE - Fix a mainly theoretical issue in devpts - Check the return value of __getblk() in reiserfs - Fix a racy assert in i_readcount_dec - Fix integer conversion issues in various functions - Fix LSM security context handling during automounts that prevented NFS superblock sharing" * tag 'v6.6-vfs.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (39 commits) cachefiles: use kiocb_{start,end}_write() helpers ovl: use kiocb_{start,end}_write() helpers aio: use kiocb_{start,end}_write() helpers io_uring: use kiocb_{start,end}_write() helpers fs: create kiocb_{start,end}_write() helpers fs: add kerneldoc to file_{start,end}_write() helpers io_uring: rename kiocb_end_write() local helper splice: Convert page_cache_pipe_buf_confirm() to use a folio libfs: Convert simple_write_begin and simple_write_end to use a folio fs/dcache: Replace printk and WARN_ON by WARN fs/pipe: remove redundant initialization of pointer buf fs: Fix kernel-doc warnings devpts: Fix kernel-doc warnings doc: idmappings: fix an error and rephrase a paragraph init: Add support for rootwait timeout parameter vfs: fix up the assert in i_readcount_dec fs: Fix one kernel-doc comment docs: filesystems: idmappings: clarify from where idmappings are taken fs/buffer.c: disable per-CPU buffer_head cache for isolated CPUs vfs, security: Fix automount superblock LSM init problem, preventing NFS sb sharing ... | 2 年前 | |
backing-file: convert to using fops->splice_write mainline inclusion from mainline-v6.11-rc6 commit 996b37da1e0f51314d4186b326742c2a95a9f0dd category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/IBHLU4 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=996b37da1e0f51314d4186b326742c2a95a9f0dd -------------------------------- Filesystems may define their own splice write. Therefore, use the file fops instead of invoking iter_file_splice_write() directly. Signed-off-by: Ed Tsai <ed.tsai@mediatek.com> Link: https://lore.kernel.org/r/20240708072208.25244-1-ed.tsai@mediatek.com Fixes: 5ca73468612d ("fuse: implement splice read/write passthrough") Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com> | 1 年前 | |
fs: drop the timespec64 argument from update_time Now that all of the update_time operations are prepared for it, we can drop the timespec64 argument from the update_time operation. Do that and remove it from some associated functions like inode_update_time and inode_needs_update_time. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Message-Id: <20230807-mgctime-v7-8-d1dec143a704@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org> | 2 年前 | |
regset: use kvzalloc() for regset_get_alloc() stable inclusion from stable-v6.6.140 commit abc6bdcbc0456014a74a49a8319e1450d45dda5c category: bugfix bugzilla: https://atomgit.com/openeuler/kernel/issues/ Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=abc6bdcbc0456014a74a49a8319e1450d45dda5c -------------------------------- commit 6b839b3b76cf17296ebd4a893841f32cae08229c upstream. While browsing through ChromeOS crash reports, I found one with an allocation failure that looked like this: chrome: page allocation failure: order:7, mode:0x40dc0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO), nodemask=(null),cpuset=urgent,mems_allowed=0 CPU: 7 PID: 3295 Comm: chrome Not tainted 5.15.133-20574-g8044615ac35c #1 (HASH:1162 1) Hardware name: Google Lazor (rev3 - 8) with KB Backlight (DT) Call trace: ... warn_alloc+0x104/0x174 __alloc_pages+0x5f0/0x6e4 kmalloc_order+0x44/0x98 kmalloc_order_trace+0x34/0x124 __kmalloc+0x228/0x36c __regset_get+0x68/0xcc regset_get_alloc+0x1c/0x28 elf_core_dump+0x3d8/0xd8c do_coredump+0xeb8/0x1378 get_signal+0x14c/0x804 ... An order 7 allocation is (1 << 7) contiguous pages, or 512K. It's not a surprise that this allocation failed on a system that's been running for a while. More digging showed that it was fairly easy to see the order 7 allocation by just sending a SIGQUIT to chrome (or other processes) to generate a core dump. The actual amount being allocated was 279,584 bytes and it was for "core_note_type" NT_ARM_SVE. There was quite a bit of discussion [1] on the mailing lists in response to my v1 patch attempting to switch to vmalloc. The overall conclusion was that we could likely reduce the 279,584 byte allocation by quite a bit and Mark Brown has sent a patch to that effect [2]. However even with the 279,584 byte allocation gone there are still 65,552 byte allocations. These are just barely more than the 65,536 bytes and thus would require an order 5 allocation. An order 5 allocation is still something to avoid unless necessary and nothing needs the memory here to be contiguous. Change the allocation to kvzalloc() which should still be efficient for small allocations but doesn't force the memory subsystem to work hard (and maybe fail) at getting a large contiguous chunk. [1] https://lore.kernel.org/r/20240201171159.1.Id9ad163b60d21c9e56c2d686b0cc9083a8ba7924@changeid [2] https://lore.kernel.org/r/20240203-arm64-sve-ptrace-regset-size-v1-1-2c3ba1386b9e@kernel.org Link: https://lkml.kernel.org/r/20240205092626.v2.1.Id9ad163b60d21c9e56c2d686b0cc9083a8ba7924@changeid Signed-off-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Dave Martin <Dave.Martin@arm.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Jan Kara <jack@suse.cz> Cc: Kees Cook <keescook@chromium.org> Cc: Mark Brown <broonie@kernel.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Wen Yang <wen.yang@linux.dev> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Wang Hai <wanghai38@huawei.com> | 25 天前 | |
binfmt_elf: Wire up AT_HWCAP3 at AT_HWCAP4 mainline inclusion from mainline-v6.13-rc1 commit 4e6e8c2b757f382684abc4765202cd25c221dea1 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IC76OZ CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4e6e8c2b757f382684abc4765202cd25c221dea1 ---------------------------------------------------------------------- AT_HWCAP3 and AT_HWCAP4 were recently defined for use on PowerPC in commit 3281366a8e79 ("uapi/auxvec: Define AT_HWCAP3 and AT_HWCAP4 aux vector, entries"). Since we want to start using AT_HWCAP3 on arm64 add support for exposing both these new hwcaps via binfmt_elf. Signed-off-by: Mark Brown <broonie@kernel.org> Acked-by: Kees Cook <kees@kernel.org> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Hongye Lin <linhongye@h-partners.com> | 1 年前 | |
binfmt_elf: Introduce KUnit test Adds simple KUnit test for some binfmt_elf internals: specifically a regression test for the problem fixed by commit 8904d9cd90ee ("ELF: fix overflow in total mapping size calculation"). $ ./tools/testing/kunit/kunit.py run --arch x86_64 \ --kconfig_add CONFIG_IA32_EMULATION=y '*binfmt_elf' ... [19:41:08] ================== binfmt_elf (1 subtest) ================== [19:41:08] [PASSED] total_mapping_size_test [19:41:08] =================== [PASSED] binfmt_elf ==================== [19:41:08] ============== compat_binfmt_elf (1 subtest) =============== [19:41:08] [PASSED] total_mapping_size_test [19:41:08] ================ [PASSED] compat_binfmt_elf ================ [19:41:08] ============================================================ [19:41:08] Testing complete. Passed: 2, Failed: 0, Crashed: 0, Skipped: 0, Errors: 0 Cc: Eric Biederman <ebiederm@xmission.com> Cc: David Gow <davidgow@google.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: "Magnus Groß" <magnus.gross@rwth-aachen.de> Cc: kunit-dev@googlegroups.com Cc: linux-fsdevel@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> --- v1: https://lore.kernel.org/lkml/20220224054332.1852813-1-keescook@chromium.org v2: - improve commit log - fix comment URL (Daniel) - drop redundant KUnit Kconfig help info (Daniel) - note in Kconfig help that COMPAT builds add a compat test (David) | 4 年前 | |
binfmt_flat: Fix integer overflow bug on 32 bit systems stable inclusion from stable-v6.6.78 commit d17ca8f2dfcf423c439859995910a20e38b86f00 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IBNTYM Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=d17ca8f2dfcf423c439859995910a20e38b86f00 -------------------------------- commit 55cf2f4b945f6a6416cc2524ba740b83cc9af25a upstream. Most of these sizes and counts are capped at 256MB so the math doesn't result in an integer overflow. The "relocs" count needs to be checked as well. Otherwise on 32bit systems the calculation of "full_data" could be wrong. full_data = data_len + relocs * sizeof(unsigned long); Fixes: c995ee28d29d ("binfmt_flat: prevent kernel dammage from corrupted executable headers") Cc: stable@vger.kernel.org Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Acked-by: Nicolas Pitre <npitre@baylibre.com> Link: https://lore.kernel.org/r/5be17f6c-5338-43be-91ef-650153b975cb@stanley.mountain Signed-off-by: Kees Cook <kees@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Tengda Wu <wutengda2@huawei.com> | 1 年前 | |
binfmt_misc: restore write access before closing files opened by open_exec() mainline inclusion from mainline-v6.18-rc7 commit 90f601b497d76f40fa66795c3ecf625b6aced9fd category: bugfix bugzilla: https://atomgit.com/src-openeuler/kernel/issues/11533 CVE: CVE-2025-68239 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=90f601b497d76f40fa66795c3ecf625b6aced9fd -------------------------------- bm_register_write() opens an executable file using open_exec(), which internally calls do_open_execat() and denies write access on the file to avoid modification while it is being executed. However, when an error occurs, bm_register_write() closes the file using filp_close() directly. This does not restore the write permission, which may cause subsequent write operations on the same file to fail. Fix this by calling exe_file_allow_write_access() before filp_close() to restore the write permission properly. Fixes: e7850f4d844e ("binfmt_misc: fix possible deadlock in bm_register_write") Signed-off-by: Zilin Guan <zilin@seu.edu.cn> Link: https://patch.msgid.link/20251105022923.1813587-1-zilin@seu.edu.cn Signed-off-by: Christian Brauner <brauner@kernel.org> Conflicts: fs/binfmt_misc.c [Context conflicts as exe_file_allow_write_access() is introduced in commit 0357ef03c94e ("fs: don't block write during exec on pre-content watched files"), which is not merged. Use allow_write_access() instead.] Signed-off-by: Pan Taixi <pantaixi1@huawei.com> | 5 个月前 | |
Merge branch 'akpm' (patches from Andrew) Merge yet more updates from Andrew Morton: - More MM work. 100ish more to go. Mike Rapoport's "mm: remove __ARCH_HAS_5LEVEL_HACK" series should fix the current ppc issue - Various other little subsystems * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (127 commits) lib/ubsan.c: fix gcc-10 warnings tools/testing/selftests/vm: remove duplicate headers selftests: vm: pkeys: fix multilib builds for x86 selftests: vm: pkeys: use the correct page size on powerpc selftests/vm/pkeys: override access right definitions on powerpc selftests/vm/pkeys: test correct behaviour of pkey-0 selftests/vm/pkeys: introduce a sub-page allocator selftests/vm/pkeys: detect write violation on a mapped access-denied-key page selftests/vm/pkeys: associate key on a mapped page and detect write violation selftests/vm/pkeys: associate key on a mapped page and detect access violation selftests/vm/pkeys: improve checks to determine pkey support selftests/vm/pkeys: fix assertion in test_pkey_alloc_exhaust() selftests/vm/pkeys: fix number of reserved powerpc pkeys selftests/vm/pkeys: introduce powerpc support selftests/vm/pkeys: introduce generic pkey abstractions selftests: vm: pkeys: use the correct huge page size selftests/vm/pkeys: fix alloc_random_pkey() to make it really random selftests/vm/pkeys: fix assertion in pkey_disable_set/clear() selftests/vm/pkeys: fix pkey_disable_clear() selftests: vm: pkeys: add helpers for pkey bits ... | 5 年前 | |
fs/buffer: add alert in try_to_free_buffers() for folios without buffers stable inclusion from stable-v6.6.128 commit c1b6227555c52781178132b7a06466711855795c category: bugfix bugzilla: https://atomgit.com/openeuler/kernel/issues/ Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=c1b6227555c52781178132b7a06466711855795c -------------------------------- commit c1b6227555c52781178132b7a06466711855795c upstream. [ Upstream commit b68f91ef3b3fe82ad78c417de71b675699a8467c ] try_to_free_buffers() can be called on folios with no buffers attached when filemap_release_folio() is invoked on a folio belonging to a mapping with AS_RELEASE_ALWAYS set but no release_folio operation defined. In such cases, folio_needs_release() returns true because of the AS_RELEASE_ALWAYS flag, but the folio has no private buffer data. This causes try_to_free_buffers() to call drop_buffers() on a folio with no buffers, leading to a null pointer dereference. Adding a check in try_to_free_buffers() to return early if the folio has no buffers attached, with WARN_ON_ONCE() to alert about the misconfiguration. This provides defensive hardening. Signed-off-by: Deepakkumar Karn <dkarn@redhat.com> Link: https://patch.msgid.link/20251211131211.308021-1-dkarn@redhat.com Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Wang Hai <wanghai38@huawei.com> | 25 天前 | |
vfs: Replace all non-returning strlcpy with strscpy strlcpy() reads the entire source buffer first. This read may exceed the destination size limit. This is both inefficient and can lead to linear read overflows if a source string is not NUL-terminated [1]. In an effort to remove strlcpy() completely [2], replace strlcpy() here with strscpy(). No return values were used, so direct replacement is safe. [1] https://www.kernel.org/doc/html/latest/process/deprecated.html#strlcpy [2] https://github.com/KSPP/linux/issues/89 Signed-off-by: Azeem Shaikh <azeemshaikh38@gmail.com> Reviewed-by: Kees Cook <keescook@chromium.org> Message-Id: <20230510221119.3508930-1-azeemshaikh38@gmail.com> Signed-off-by: Christian Brauner <brauner@kernel.org> | 3 年前 | |
binfmt_elf: Wire up AT_HWCAP3 at AT_HWCAP4 mainline inclusion from mainline-v6.13-rc1 commit 4e6e8c2b757f382684abc4765202cd25c221dea1 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IC76OZ CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4e6e8c2b757f382684abc4765202cd25c221dea1 ---------------------------------------------------------------------- AT_HWCAP3 and AT_HWCAP4 were recently defined for use on PowerPC in commit 3281366a8e79 ("uapi/auxvec: Define AT_HWCAP3 and AT_HWCAP4 aux vector, entries"). Since we want to start using AT_HWCAP3 on arm64 add support for exposing both these new hwcaps via binfmt_elf. Signed-off-by: Mark Brown <broonie@kernel.org> Acked-by: Kees Cook <kees@kernel.org> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Hongye Lin <linhongye@h-partners.com> | 1 年前 | |
coredump: skip coredump during critical error if task facing critical RAS hulk inclusion category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8787 ------------------------------------------ After a critical RAS event is triggered, user data for this task becomes inaccessible. In critical error scenarios, user memory as a whole becomes inaccessible, and therefore, the coredump must be skipped. Furthermore, dump_vma_snapshot performs a copy operation specifically on those VMA(s) that are considered "potentially" containing an ELF header. This process involves multiple accesses to user memory. Signed-off-by: Wupeng Ma <mawupeng1@huawei.com> | 3 个月前 | |
fs: d_path: include internal.h make W=1 warns about a missing prototype that is defined but not visible at point where simple_dname() is defined: fs/d_path.c:317:7: error: no previous prototype for 'simple_dname' [-Werror=missing-prototypes] Signed-off-by: Arnd Bergmann <arnd@arndb.de> Message-Id: <20230516195444.551461-1-arnd@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org> | 3 年前 | |
dax: skip read lock assertion for read-only filesystems stable inclusion from stable-v6.6.114 commit 627a7ebd8954ca2281c36ee706bff094c1949428 category: bugfix bugzilla: https://atomgit.com/openeuler/kernel/issues/8637 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=627a7ebd8954ca2281c36ee706bff094c1949428 -------------------------------- [ Upstream commit 154d1e7ad9e5ce4b2aaefd3862b3dba545ad978d ] The commit 168316db3583("dax: assert that i_rwsem is held exclusive for writes") added lock assertions to ensure proper locking in DAX operations. However, these assertions trigger false-positive lockdep warnings since read lock is unnecessary on read-only filesystems(e.g., erofs). This patch skips the read lock assertion for read-only filesystems, eliminating the spurious warnings while maintaining the integrity checks for writable filesystems. Fixes: 168316db3583 ("dax: assert that i_rwsem is held exclusive for writes") Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Friendy Su <friendy.su@sony.com> Reviewed-by: Daniel Palmer <daniel.palmer@sony.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> (cherry picked from commit 627a7ebd8954ca2281c36ee706bff094c1949428) Signed-off-by: Wentao Guan <guanwentao@uniontech.com> | 3 个月前 | |
dcache: Limit the minimal number of bucket to two mailist inclusion category: bugfix bugzilla: https://atomgit.com/src-openeuler/kernel/issues/13644 Reference: https://lore.kernel.org/linux-fsdevel/20260130034853.215819-1-chengzhihao1@huawei.com/ -------------------------------- There is an OOB read problem on dentry_hashtable when user sets 'dhash_entries=1': BUG: unable to handle page fault for address: ffff888b30b774b0 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page Oops: Oops: 0000 [#1] SMP PTI RIP: 0010:__d_lookup+0x56/0x120 Call Trace: d_lookup.cold+0x16/0x5d lookup_dcache+0x27/0xf0 lookup_one_qstr_excl+0x2a/0x180 start_dirop+0x55/0xa0 simple_start_creating+0x8d/0xa0 debugfs_start_creating+0x8c/0x180 debugfs_create_dir+0x1d/0x1c0 pinctrl_init+0x6d/0x140 do_one_initcall+0x6d/0x3d0 kernel_init_freeable+0x39f/0x460 kernel_init+0x2a/0x260 There will be only one bucket in dentry_hashtable when dhash_entries is set as one, and d_hash_shift is calculated as 32 by dcache_init(). Then, following process will access more than one buckets(which memory region is not allocated) in dentry_hashtable: d_lookup b = d_hash(hash) dentry_hashtable + ((u32)hashlen >> d_hash_shift) // The C standard defines the behavior of right shift amounts // exceeding the bit width of the operand as undefined. The // result of '(u32)hashlen >> d_hash_shift' becomes 'hashlen', // so 'b' will point to an unallocated memory region. hlist_bl_for_each_entry_rcu(b) hlist_bl_first_rcu(head) h->first // read OOB! Fix it by limiting the minimal number of dentry_hashtable bucket to two, so that 'd_hash_shift' won't exceeds the bit width of type u32. Fixes: ceb5bdc2d246 ("fs: dcache per-bucket dcache hash locking") Cc: stable@vger.kernel.org Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> | 3 个月前 | |
Merge tag 'mm-stable-2023-06-24-19-15' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull mm updates from Andrew Morton: - Yosry Ahmed brought back some cgroup v1 stats in OOM logs - Yosry has also eliminated cgroup's atomic rstat flushing - Nhat Pham adds the new cachestat() syscall. It provides userspace with the ability to query pagecache status - a similar concept to mincore() but more powerful and with improved usability - Mel Gorman provides more optimizations for compaction, reducing the prevalence of page rescanning - Lorenzo Stoakes has done some maintanance work on the get_user_pages() interface - Liam Howlett continues with cleanups and maintenance work to the maple tree code. Peng Zhang also does some work on maple tree - Johannes Weiner has done some cleanup work on the compaction code - David Hildenbrand has contributed additional selftests for get_user_pages() - Thomas Gleixner has contributed some maintenance and optimization work for the vmalloc code - Baolin Wang has provided some compaction cleanups, - SeongJae Park continues maintenance work on the DAMON code - Huang Ying has done some maintenance on the swap code's usage of device refcounting - Christoph Hellwig has some cleanups for the filemap/directio code - Ryan Roberts provides two patch series which yield some rationalization of the kernel's access to pte entries - use the provided APIs rather than open-coding accesses - Lorenzo Stoakes has some fixes to the interaction between pagecache and directio access to file mappings - John Hubbard has a series of fixes to the MM selftesting code - ZhangPeng continues the folio conversion campaign - Hugh Dickins has been working on the pagetable handling code, mainly with a view to reducing the load on the mmap_lock - Catalin Marinas has reduced the arm64 kmalloc() minimum alignment from 128 to 8 - Domenico Cerasuolo has improved the zswap reclaim mechanism by reorganizing the LRU management - Matthew Wilcox provides some fixups to make gfs2 work better with the buffer_head code - Vishal Moola also has done some folio conversion work - Matthew Wilcox has removed the remnants of the pagevec code - their functionality is migrated over to struct folio_batch * tag 'mm-stable-2023-06-24-19-15' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (380 commits) mm/hugetlb: remove hugetlb_set_page_subpool() mm: nommu: correct the range of mmap_sem_read_lock in task_mem() hugetlb: revert use of page_cache_next_miss() Revert "page cache: fix page_cache_next/prev_miss off by one" mm/vmscan: fix root proactive reclaim unthrottling unbalanced node mm: memcg: rename and document global_reclaim() mm: kill [add|del]_page_to_lru_list() mm: compaction: convert to use a folio in isolate_migratepages_block() mm: zswap: fix double invalidate with exclusive loads mm: remove unnecessary pagevec includes mm: remove references to pagevec mm: rename invalidate_mapping_pagevec to mapping_try_invalidate mm: remove struct pagevec net: convert sunrpc from pagevec to folio_batch i915: convert i915_gpu_error to use a folio_batch pagevec: rename fbatch_count() mm: remove check_move_unevictable_pages() drm: convert drm_gem_put_pages() to use a folio_batch i915: convert shmem_sg_free_table() to use a folio_batch scatterlist: add sg_set_folio() ... | 2 年前 | |
fs/dirty_pages: dump the number of dirty pages for each inode hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I8OHM4?from=project-issue CVE: NA ---------------------------------------- In order to analyse the IO performance when using buffer IO, it's useful to obtain the number of dirty pages for each inode in the filesystem. This feature depends on 'CONFIG_DIRTY_PAGES'. It creates 3 interfaces by using procfs. /proc/dirty/buffer_size for buffer allocation and release, /proc/dirty/page_threshold to filter result and /proc/dirty/dirty_list to get dirty pages. Signed-off-by: Zizhi Wo <wozizhi@huawei.com> | 2 年前 | |
fs: drop_caches: draining pages before dropping caches We expect a file page access after dropping caches should be a major fault, but sometimes it's still a minor fault. That's because a file page can't be dropped if it's in a per-cpu pagevec. Draining all pages from per-cpu pagevec to lru list before trying to drop caches. Link: https://lkml.kernel.org/r/20230630092203.16080-1-andrew.yang@mediatek.com Signed-off-by: Andrew Yang <andrew.yang@mediatek.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Matthias Brugger <matthias.bgg@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> | 2 年前 | |
eventfd: prevent underflow for eventfd semaphores For eventfd with flag EFD_SEMAPHORE, when its ctx->count is 0, calling eventfd_ctx_do_read will cause ctx->count to overflow to ULLONG_MAX. An underflow can happen with EFD_SEMAPHORE eventfds in at least the following three subsystems: (1) virt/kvm/eventfd.c (2) drivers/vfio/virqfd.c (3) drivers/virt/acrn/irqfd.c where (2) and (3) are just modeled after (1). An eventfd must be specified for use with the KVM_IRQFD ioctl(). This can also be an EFD_SEMAPHORE eventfd. When the eventfd count is zero or has been decremented to zero an underflow can be triggered when the irqfd is shut down by raising the KVM_IRQFD_FLAG_DEASSIGN flag in the KVM_IRQFD ioctl(): // ctx->count == 0 kvm_vm_ioctl() -> kvm_irqfd() -> kvm_irqfd_deassign() -> irqfd_deactivate() -> irqfd_shutdown() -> eventfd_ctx_remove_wait_queue(&cnt) -> eventfd_ctx_do_read(&cnt) Userspace polling on the eventfd wouldn't notice the underflow because 1 is always returned as the value from eventfd_read() while ctx->count would've underflowed. It's not a huge deal because this should only be happening when the irqfd is shutdown but we should still fix it and avoid the spurious wakeup. Fixes: cb289d6244a3 ("eventfd - allow atomic read and waitqueue remove") Signed-off-by: Wen Yang <wenyang.linux@foxmail.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Jens Axboe <axboe@kernel.dk> Cc: Christian Brauner <brauner@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Dylan Yudaken <dylany@fb.com> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Matthew Wilcox <willy@infradead.org> Cc: linux-fsdevel@vger.kernel.org Cc: linux-kernel@vger.kernel.org Message-Id: <tencent_7588DFD1F365950A757310D764517A14B306@qq.com> [brauner: rewrite commit message and add explanation how this underflow can happen] Signed-off-by: Christian Brauner <brauner@kernel.org> | 2 年前 | |
eventpoll: fix ep_remove struct eventpoll / struct file UAF stable inclusion from stable-v6.18.33 commit ef4ca02e95363e78977ca04340d44fe3b4b2b81f category: bugfix bugzilla: https://atomgit.com/src-openeuler/kernel/issues/15497 CVE: CVE-2026-46242 Reference: https://web.git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=ef4ca02e95363e78977ca04340d44fe3b4b2b81f -------------------------------- [ Upstream commit a6dc643c69311677c574a0f17a3f4d66a5f3744b ] ep_remove() (via ep_remove_file()) cleared file->f_ep under file->f_lock but then kept using @file inside the critical section (is_file_epoll(), hlist_del_rcu() through the head, spin_unlock). A concurrent __fput() taking the eventpoll_release() fastpath in that window observed the transient NULL, skipped eventpoll_release_file() and ran to f_op->release / file_free(). For the epoll-watches-epoll case, f_op->release is ep_eventpoll_release() -> ep_clear_and_put() -> ep_free(), which kfree()s the watched struct eventpoll. Its embedded ->refs hlist_head is exactly where epi->fllink.pprev points, so the subsequent hlist_del_rcu()'s "*pprev = next" scribbles into freed kmalloc-192 memory. In addition, struct file is SLAB_TYPESAFE_BY_RCU, so the slot backing @file could be recycled by alloc_empty_file() -- reinitializing f_lock and f_ep -- while ep_remove() is still nominally inside that lock. The upshot is an attacker-controllable kmem_cache_free() against the wrong slab cache. Pin @file via epi_fget() at the top of ep_remove() and gate the critical section on the pin succeeding. With the pin held @file cannot reach refcount zero, which holds __fput() off and transitively keeps the watched struct eventpoll alive across the hlist_del_rcu() and the f_lock use, closing both UAFs. If the pin fails @file has already reached refcount zero and its __fput() is in flight. Because we bailed before clearing f_ep, that path takes the eventpoll_release() slow path into eventpoll_release_file() and blocks on ep->mtx until the waiter side's ep_clear_and_put() drops it. The bailed epi's share of ep->refcount stays intact, so the trailing ep_refcount_dec_and_test() in ep_clear_and_put() cannot free the eventpoll out from under eventpoll_release_file(); the orphaned epi is then cleaned up there. A successful pin also proves we are not racing eventpoll_release_file() on this epi, so drop the now-redundant re-check of epi->dying under f_lock. The cheap lockless READ_ONCE(epi->dying) fast-path bailout stays. Fixes: 58c9b016e128 ("epoll: use refcount to reduce ep_mutex contention") Reported-by: Jaeyoung Chung <jjy600901@snu.ac.kr> Link: https://patch.msgid.link/20260423-work-epoll-uaf-v1-6-2470f9eec0f5@kernel.org Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Conflicts: fs/eventpoll.c [Cleanup argument(fput) not a function on file. Call fput() manually.] Signed-off-by: Zizhi Wo <wozizhi@huawei.com> | 25 天前 | |
exec: Fix incorrect type for ret stable inclusion from stable-v6.6.115 commit 813d3d18cfe4759796f096dbd85f9591a6b8aed7 category: bugfix bugzilla: https://atomgit.com/openeuler/kernel/issues/8637 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=813d3d18cfe4759796f096dbd85f9591a6b8aed7 -------------------------------- [ Upstream commit 5e088248375d171b80c643051e77ade6b97bc386 ] In the setup_arg_pages(), ret is declared as an unsigned long. The ret might take a negative value. Therefore, its type should be changed to int. Signed-off-by: Xichao Zhao <zhao.xichao@vivo.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20250825073609.219855-1-zhao.xichao@vivo.com Signed-off-by: Kees Cook <kees@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> (cherry picked from commit 813d3d18cfe4759796f096dbd85f9591a6b8aed7) Signed-off-by: Wentao Guan <guanwentao@uniontech.com> | 3 个月前 | |
fs/fcntl: fix SOFTIRQ-unsafe lock order in fasync signaling stable inclusion from stable-v6.6.143 commit b5fa9e32fb6718f70c986ee14dd5d01b4846f331 category: bugfix bugzilla: https://atomgit.com/src-openeuler/kernel/issues/15648 CVE: CVE-2026-52946 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=b5fa9e32fb6718f70c986ee14dd5d01b4846f331 -------------------------------- commit 00633c4683828acd5256fa8d5163f440d74bbe71 upstream. A SOFTIRQ-safe to SOFTIRQ-unsafe lock order deadlock can occur in send_sigio() and send_sigurg() when a process group receives a signal. When FASYNC is configured for a process group (PIDTYPE_PGID), both functions use read_lock(&tasklist_lock) to traverse the task list. However, they are frequently called from softirq context: - send_sigio() via input_inject_event -> kill_fasync - send_sigurg() via tcp_check_urg -> sk_send_sigurg (NET_RX_SOFTIRQ) The deadlock is caused by the rwlock writer fairness mechanism: 1. CPU 0 (process context) holds read_lock(&tasklist_lock) in do_wait(). 2. CPU 1 (process context) attempts write_lock(&tasklist_lock) in fork() or exit() and spins, which blocks all new readers. 3. CPU 0 is interrupted by a softirq (e.g., TCP URG packet reception). 4. The softirq calls send_sigurg() and attempts to acquire read_lock(&tasklist_lock), deadlocking because CPU 1 is waiting. Since PID hashing and do_each_pid_task() traversals are already RCU-protected, the read_lock on tasklist_lock is no longer strictly required for safe traversal. Fix this by replacing tasklist_lock with rcu_read_lock(), aligning the process group signaling path with the single-PID path. This also mitigates a potential remote denial of service vector via TCP URG packets. Lockdep splat: ===================================================== WARNING: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected [...] Chain exists of: &dev->event_lock --> &f_owner->lock --> tasklist_lock Possible interrupt unsafe locking scenario: CPU0 CPU1 ---- ---- lock(tasklist_lock); local_irq_disable(); lock(&dev->event_lock); lock(&f_owner->lock); <Interrupt> lock(&dev->event_lock); *** DEADLOCK *** Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Mingyu Wang <25181214217@stu.xidian.edu.cn> Link: https://patch.msgid.link/20260523135210.590928-1-w15303746062@163.com Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Zizhi Wo <wozizhi@huawei.com> | 3 天前 | |
fs: Annotate struct file_handle with __counted_by() and use struct_size() stable inclusion from stable-v6.6.47 commit 107449cfb2176318600e4763c9d9d267b32b636a bugzilla: https://gitee.com/openeuler/kernel/issues/IAHMJO Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=107449cfb2176318600e4763c9d9d267b32b636a -------------------------------- [ Upstream commit 68d6f4f3fbd9b1baae53e7cf33fb3362b5a21494 ] Prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). While there, use struct_size() helper, instead of the open-coded version. [brauner@kernel.org: contains a fix by Edward for an OOB access] Reported-by: syzbot+4139435cb1b34cf759c2@syzkaller.appspotmail.com Signed-off-by: Edward Adam Davis <eadavis@qq.com> Link: https://lore.kernel.org/r/tencent_A7845DD769577306D813742365E976E3A205@qq.com Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Link: https://lore.kernel.org/r/ZgImCXTdGDTeBvSS@neat Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: ZhangPeng <zhangpeng362@huawei.com> | 1 年前 | |
alloc_fdtable(): change calling conventions. stable inclusion from stable-v6.6.103 commit 40b36d9a612b8c099b639fe5a10059ad3638d9d1 category: bugfix bugzilla: https://atomgit.com/openeuler/kernel/issues/9181 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=40b36d9a612b8c099b639fe5a10059ad3638d9d1 -------------------------------- [ Upstream commit 1d3b4bec3ce55e0c46cdce7d0402dbd6b4af3a3d ] First of all, tell it how many slots do we want, not which slot is wanted. It makes one caller (dup_fd()) more straightforward and doesn't harm another (expand_fdtable()). Furthermore, make it return ERR_PTR() on failure rather than returning NULL. Simplifies the callers. Simplify the size calculation, while we are at it - note that we always have slots_wanted greater than BITS_PER_LONG. What the rules boil down to is * use the smallest power of two large enough to give us that many slots * on 32bit skip 64 and 128 - the minimal capacity we want there is 256 slots (i.e. 1Kb fd array). * on 64bit don't skip anything, the minimal capacity is 128 - and we'll never be asked for 64 or less. 128 slots means 1Kb fd array, again. * on 128bit, if that ever happens, don't skip anything - we'll never be asked for 128 or less, so the fd array allocation will be at least 2Kb. Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Sasha Levin <sashal@kernel.org> Conflicts: fs/file.c [723cc66127e1 ("filescgroup: add adapter for legacy and misc cgroup") has introduced attitional logic, a simple adaptation has been made to the conflicts in dup_fd().] Signed-off-by: Zizhi Wo <wozizhi@huawei.com> | 1 个月前 | |
fs: store real path instead of fake path in backing file f_path mainline inclusion from mainline-v6.7-rc1 commit def3ae83da02f87005210fa3d448c5dd37ba4105 category: feature bugzilla: https://gitee.com/src-openeuler/kernel/issues/IBHLU4 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=def3ae83da02f87005210fa3d448c5dd37ba4105 -------------------------------- A backing file struct stores two path's, one "real" path that is referring to f_inode and one "fake" path, which should be displayed to users in /proc/<pid>/maps. There is a lot more potential code that needs to know the "real" path, then code that needs to know the "fake" path. Instead of code having to request the "real" path with file_real_path(), store the "real" path in f_path and require code that needs to know the "fake" path request it with file_user_path(). Replace the file_real_path() helper with a simple const accessor f_path(). After this change, file_dentry() is not expected to observe any files with overlayfs f_path and real f_inode, so the call to ->d_real() should not be needed. Leave the ->d_real() call for now and add an assertion in ovl_d_real() to catch if we made wrong assumptions. Suggested-by: Miklos Szeredi <miklos@szeredi.hu> Link: https://lore.kernel.org/r/CAJfpegtt48eXhhjDFA1ojcHPNKj3Go6joryCPtEFAKpocyBsnw@mail.gmail.com/ Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://lore.kernel.org/r/20231009153712.1566422-4-amir73il@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com> | 1 年前 | |
cgroup/misc: support cgroup misc to control fd hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I8YC6O ---------------------------------------------------------------------- Cgroup misc provides the resource limiting and tracking mechanism for the scalar resources, it is more stable and concise to control fd than filescgroup(legacy-filescontrol), which controls fd as independent cgroup, and filescgroup will be abandoned in the future. We can use "filecg_misc" in cmdline to select misc-filescontrol to control fd. Notice: There is a difference between legacy-filescontrol and misc-filescontrol. Using misc-filescontrol, fd is charged to cgroup in which it is used first until it is freed. Migrating a process to different cgroup does not move to the charge to the destination. However, using legacy-filescontrol, migrating a process to different cgroup moves to the charge to destination. In cgroup v2, "dfl_cftypes" of filescgroup has been removed, it means that filescgroup will never be supported in cgroup v2. Interfaces of misc-filescontrol are shown as below: misc.capacity A read-only flat-keyed file shown only in the root cgroup. Getting fd available on the platform. Constantly, it is unlimited(U64_MAX). $ cat misc.capacity fd 18446744073709551615 misc.current Getting the current usage of the fd in the cgroup and its children. $ cat misc.current fd 3 misc.max A read-write flat-keyed file shown in the non root cgroups. Get/set maximum usage of the fd in the cgroup and its children. $ cat misc.max fd max Limit can be set by:: # echo fd 1024 > misc.max Signed-off-by: Chen Ridong <chenridong@huawei.com> | 2 年前 | |
fs/filesystems: Fix potential unsigned integer underflow in fs_name() stable inclusion from stable-v6.6.94 commit 84ead78a3cf8a0ffad72b6c6e19891a41c90b91c category: bugfix bugzilla: https://atomgit.com/openeuler/kernel/issues/8365 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=84ead78a3cf8a0ffad72b6c6e19891a41c90b91c -------------------------------- [ Upstream commit 1363c134ade81e425873b410566e957fecebb261 ] fs_name() has @index as unsigned int, so there is underflow risk for operation '@index--'. Fix by breaking the for loop when '@index == 0' which is also more proper than '@index <= 0' for unsigned integer comparison. Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com> Link: https://lore.kernel.org/20250410-fix_fs-v1-1-7c14ccc8ebaa@quicinc.com Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> (cherry picked from commit 84ead78a3cf8a0ffad72b6c6e19891a41c90b91c) Signed-off-by: Wentao Guan <guanwentao@uniontech.com> | 5 个月前 | |
writeback: fix 100% CPU usage when dirtytime_expire_interval is 0 stable inclusion from stable-v6.6.123 commit 73408fa92742b88433ee23dd3136908fef13c307 category: bugfix bugzilla: https://atomgit.com/openeuler/kernel/issues/9158 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=73408fa92742b88433ee23dd3136908fef13c307 -------------------------------- [ Upstream commit 543467d6fe97e27e22a26e367fda972dbefebbff ] When vm.dirtytime_expire_seconds is set to 0, wakeup_dirtytime_writeback() schedules delayed work with a delay of 0, causing immediate execution. The function then reschedules itself with 0 delay again, creating an infinite busy loop that causes 100% kworker CPU usage. Fix by: - Only scheduling delayed work in wakeup_dirtytime_writeback() when dirtytime_expire_interval is non-zero - Cancelling the delayed work in dirtytime_interval_handler() when the interval is set to 0 - Adding a guard in start_dirtytime_writeback() for defensive coding Tested by booting kernel in QEMU with virtme-ng: - Before fix: kworker CPU spikes to ~73% - After fix: CPU remains at normal levels - Setting interval back to non-zero correctly resumes writeback Fixes: a2f4870697a5 ("fs: make sure the timestamps for lazytime inodes eventually get written") Cc: stable@vger.kernel.org Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220227 Signed-off-by: Laveesh Bansal <laveeshb@laveeshbansal.com> Link: https://patch.msgid.link/20260106145059.543282-2-laveeshb@laveeshbansal.com Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org> [ adapted system_percpu_wq to system_wq for the workqueue used in dirtytime_interval_handler() ] Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit 73408fa92742b88433ee23dd3136908fef13c307) Signed-off-by: Wentao Guan <guanwentao@uniontech.com> | 1 个月前 | |
fs: factor out vfs_parse_monolithic_sep() helper Factor out vfs_parse_monolithic_sep() from generic_parse_monolithic(), so filesystems could use it with a custom option separator callback. Acked-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Amir Goldstein <amir73il@gmail.com> | 2 年前 | |
erofs: add erofs_ondemand switch hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT -------------------------------- In order to make the user state better control the characteristics of the on demand load mode, a dynamic switch about the on-demand mode is added. The switch named cachefiles_ondemand_enabled is closed by default, and is isolated by CONFIG_CACHEFILES_ONDEMAND. Users can open it by echo 1/on/ON/y/Y to /sys/module/fs_ctl/parameters/cachefiles_ondemand_enabled. Signed-off-by: Zizhi Wo <wozizhi@huawei.com> | 1 年前 | |
ext4: journal_path mount options should follow links Before the commit 461c3af045d3 ("ext4: Change handle_mount_opt() to use fs_parameter") ext4 mount option journal_path did follow links in the provided path. Bring this behavior back by allowing to pass pathwalk flags to fs_lookup_param(). Fixes: 461c3af045d3 ("ext4: Change handle_mount_opt() to use fs_parameter") Signed-off-by: Lukas Czerner <lczerner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Link: https://lore.kernel.org/r/20221004135803.32283-1-lczerner@redhat.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org | 3 年前 | |
switch the remnants of releasing the mountpoint away from fs_pin We used to need rather convoluted ordering trickery to guarantee that dput() of ex-mountpoints happens before the final mntput() of the same. Since we don't need that anymore, there's no point playing with fs_pin for that. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> | 6 年前 | |
fs: add <linux/init_task.h> for 'init_fs' stable inclusion from stable-v6.6.128 commit 5560116126da42de09c784f3cba55c8f6fbabd61 category: bugfix bugzilla: https://atomgit.com/openeuler/kernel/issues/ Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=5560116126da42de09c784f3cba55c8f6fbabd61 -------------------------------- commit 5560116126da42de09c784f3cba55c8f6fbabd61 upstream. [ Upstream commit 589cff4975afe1a4eaaa1d961652f50b1628d78d ] The init_fs symbol is defined in <linux/init_task.h> but was not included in fs/fs_struct.c so fix by adding the include. Fixes the following sparse warning: fs/fs_struct.c:150:18: warning: symbol 'init_fs' was not declared. Should it be static? Fixes: 3e93cd671813e ("Take fs_struct handling to new file") Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk> Link: https://patch.msgid.link/20260108115856.238027-1-ben.dooks@codethink.co.uk Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Wang Hai <wanghai38@huawei.com> | 25 天前 | |
fs: common implementation of file type Many file systems use a copy&paste implementation of dirent to on-disk file type conversions. Create a common implementation to be used by file systems with some useful conversion helpers to reduce open coded file type conversions in file system code. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Phillip Potter <phil@philpotter.co.uk> Signed-off-by: Jan Kara <jack@suse.cz> | 7 年前 | |
fscontext: do not consume log entries when returning -EMSGSIZE stable inclusion from stable-v6.6.113 commit 9f13f727bed6819f43e12c7d21ab6721c8d3f976 category: bugfix bugzilla: https://atomgit.com/openeuler/kernel/issues/8637 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=9f13f727bed6819f43e12c7d21ab6721c8d3f976 -------------------------------- commit 72d271a7baa7062cb27e774ac37c5459c6d20e22 upstream. Userspace generally expects APIs that return -EMSGSIZE to allow for them to adjust their buffer size and retry the operation. However, the fscontext log would previously clear the message even in the -EMSGSIZE case. Given that it is very cheap for us to check whether the buffer is too small before we remove the message from the ring buffer, let's just do that instead. While we're at it, refactor some fscontext_read() into a separate helper to make the ring buffer logic a bit easier to read. Fixes: 007ec26cdc9f ("vfs: Implement logging through fs_context") Cc: David Howells <dhowells@redhat.com> Cc: stable@vger.kernel.org # v5.2+ Signed-off-by: Aleksa Sarai <cyphar@cyphar.com> Link: https://lore.kernel.org/20250807-fscontext-log-cleanups-v3-1-8d91d6242dc3@cyphar.com Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit 9f13f727bed6819f43e12c7d21ab6721c8d3f976) Signed-off-by: Wentao Guan <guanwentao@uniontech.com> | 3 个月前 | |
fs: port ->permission() to pass mnt_idmap Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> | 3 年前 | |
fs: rename __mnt_{want,drop}_write*() helpers mainline inclusion from mainline-v6.7-rc1 commit 3e15dcf77b23b8e9b9b7f3c0d4def8fe9c12c534 category: feature bugzilla: https://gitee.com/src-openeuler/kernel/issues/IBHLU4 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3e15dcf77b23b8e9b9b7f3c0d4def8fe9c12c534 -------------------------------- Before exporting these helpers to modules, make their names more meaningful. The names mnt_{get,put)_write_access*() were chosen, because they rhyme with the inode {get,put)_write_access() helpers, which have a very close meaning for the inode object. Suggested-by: Christian Brauner <brauner@kernel.org> Link: https://lore.kernel.org/r/20230817-anfechtbar-ruhelosigkeit-8c6cca8443fc@brauner/ Signed-off-by: Amir Goldstein <amir73il@gmail.com> Message-Id: <20230908132900.2983519-2-amir73il@gmail.com> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com> | 1 年前 | |
mount: add OPEN_TREE_NAMESPACE mainline inclusion from mainline-v7.0-rc1 commit 9b8a0ba68246a61d903ce62c35c303b1501df28b category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/9218 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9b8a0ba68246a61d903ce62c35c303b1501df28b -------------------------------- When creating containers the setup usually involves using CLONE_NEWNS via clone3() or unshare(). This copies the caller's complete mount namespace. The runtime will also assemble a new rootfs and then use pivot_root() to switch the old mount tree with the new rootfs. Afterward it will recursively umount the old mount tree thereby getting rid of all mounts. On a basic system here where the mount table isn't particularly large this still copies about 30 mounts. Copying all of these mounts only to get rid of them later is pretty wasteful. This is exacerbated if intermediary mount namespaces are used that only exist for a very short amount of time and are immediately destroyed again causing a ton of mounts to be copied and destroyed needlessly. With a large mount table and a system where thousands or ten-thousands of containers are spawned in parallel this quickly becomes a bottleneck increasing contention on the semaphore. Extend open_tree() with a new OPEN_TREE_NAMESPACE flag. Similar to OPEN_TREE_CLONE only the indicated mount tree is copied. Instead of returning a file descriptor referring to that mount tree OPEN_TREE_NAMESPACE will cause open_tree() to return a file descriptor to a new mount namespace. In that new mount namespace the copied mount tree has been mounted on top of a copy of the real rootfs. The caller can setns() into that mount namespace and perform any additionally required setup such as move_mount() detached mounts in there. This allows OPEN_TREE_NAMESPACE to function as a combined unshare(CLONE_NEWNS) and pivot_root(). A caller may for example choose to create an extremely minimal rootfs: fd_mntns = open_tree(-EBADF, "/var/lib/containers/wootwoot", OPEN_TREE_NAMESPACE); This will create a mount namespace where "wootwoot" has become the rootfs mounted on top of the real rootfs. The caller can now setns() into this new mount namespace and assemble additional mounts. This also works with user namespaces: unshare(CLONE_NEWUSER); fd_mntns = open_tree(-EBADF, "/var/lib/containers/wootwoot", OPEN_TREE_NAMESPACE); which creates a new mount namespace owned by the earlier created user namespace with "wootwoot" as the rootfs mounted on top of the real rootfs. Link: https://patch.msgid.link/20251229-work-empty-namespace-v1-1-bfb24c7b061f@kernel.org Tested-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Aleksa Sarai <cyphar@cyphar.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Suggested-by: Christian Brauner <brauner@kernel.org> Suggested-by: Aleksa Sarai <cyphar@cyphar.com> Signed-off-by: Christian Brauner <brauner@kernel.org> Conflicts: fs/internal.h fs/namespace.c fs/nsfs.c [1. This kernel version does not have mnt_add_to_ns(), as commit 2eea9ce4310d ("mounts: keep list of mounts in an rbtree") not merged. Implemented mnt_add_tree_to_ns(). 2. This kernel version does not have __ns_tree_add_raw(), as commit 885fc8ac0a4d ("nstree: make iterator generic") not merged. Not affect to this patch. 3. This kernel version does not have path_from_stashed(), as commit 07fd7c329839 ("libfs: add path_from_stashed()") not merged. Implemented similar logic in open_namespace_file(). 4. There are a few other minor conflicts that do not affect the patch.] Signed-off-by: Zizhi Wo <wozizhi@huawei.com> Signed-off-by: Zizhi Wo <wozizhi@huawei.com> | 30 天前 | |
lsm: new security_file_ioctl_compat() hook stable inclusion from stable-v6.6.15 commit 820831de220c89cd618cf2529c3d31df9635f708 bugzilla: https://gitee.com/openeuler/kernel/issues/I99TJK Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=820831de220c89cd618cf2529c3d31df9635f708 -------------------------------- commit f1bb47a31dff6d4b34fb14e99850860ee74bb003 upstream. Some ioctl commands do not require ioctl permission, but are routed to other permissions such as FILE_GETATTR or FILE_SETATTR. This routing is done by comparing the ioctl cmd to a set of 64-bit flags (FS_IOC_*). However, if a 32-bit process is running on a 64-bit kernel, it emits 32-bit flags (FS_IOC32_*) for certain ioctl operations. These flags are being checked erroneously, which leads to these ioctl operations being routed to the ioctl permission, rather than the correct file permissions. This was also noted in a RED-PEN finding from a while back - "/* RED-PEN how should LSM module know it's handling 32bit? */". This patch introduces a new hook, security_file_ioctl_compat(), that is called from the compat ioctl syscall. All current LSMs have been changed to support this hook. Reviewing the three places where we are currently using security_file_ioctl(), it appears that only SELinux needs a dedicated compat change; TOMOYO and SMACK appear to be functional without any change. Cc: stable@vger.kernel.org Fixes: 0b24dcb7f2f7 ("Revert "selinux: simplify ioctl checking"") Signed-off-by: Alfred Piccioni <alpic@google.com> Reviewed-by: Stephen Smalley <stephen.smalley.work@gmail.com> [PM: subject tweak, line length fixes, and alignment corrections] Signed-off-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: ZhangPeng <zhangpeng362@huawei.com> | 2 年前 | |
fs: Fix kernel-doc warnings These have a variety of causes and a corresponding variety of solutions. Signed-off-by: "Matthew Wilcox (Oracle)" <willy@infradead.org> Message-Id: <20230818200824.2720007-1-willy@infradead.org> Signed-off-by: Christian Brauner <brauner@kernel.org> | 2 年前 | |
cgroup/misc: support cgroup misc to control fd hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I8YC6O ---------------------------------------------------------------------- Cgroup misc provides the resource limiting and tracking mechanism for the scalar resources, it is more stable and concise to control fd than filescgroup(legacy-filescontrol), which controls fd as independent cgroup, and filescgroup will be abandoned in the future. We can use "filecg_misc" in cmdline to select misc-filescontrol to control fd. Notice: There is a difference between legacy-filescontrol and misc-filescontrol. Using misc-filescontrol, fd is charged to cgroup in which it is used first until it is freed. Migrating a process to different cgroup does not move to the charge to the destination. However, using legacy-filescontrol, migrating a process to different cgroup moves to the charge to destination. In cgroup v2, "dfl_cftypes" of filescgroup has been removed, it means that filescgroup will never be supported in cgroup v2. Interfaces of misc-filescontrol are shown as below: misc.capacity A read-only flat-keyed file shown only in the root cgroup. Getting fd available on the platform. Constantly, it is unlimited(U64_MAX). $ cat misc.capacity fd 18446744073709551615 misc.current Getting the current usage of the fd in the cgroup and its children. $ cat misc.current fd 3 misc.max A read-write flat-keyed file shown in the non root cgroups. Get/set maximum usage of the fd in the cgroup and its children. $ cat misc.max fd max Limit can be set by:: # echo fd 1024 > misc.max Signed-off-by: Chen Ridong <chenridong@huawei.com> | 2 年前 | |
better lockdep annotations for simple_recursive_removal() stable inclusion from stable-v6.6.103 commit 847a204d3067487f116e4d89ffa60767d7f2fc0b category: bugfix bugzilla: https://atomgit.com/openeuler/kernel/issues/8365 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=847a204d3067487f116e4d89ffa60767d7f2fc0b -------------------------------- [ Upstream commit 2a8061ee5e41034eb14170ec4517b5583dbeff9f ] We want a class that nests outside of I_MUTEX_NORMAL (for the sake of callbacks that might want to lock the victim) and inside I_MUTEX_PARENT (so that a variant of that could be used with parent of the victim held locked by the caller). In reality, simple_recursive_removal() * never holds two locks at once * holds the lock on parent of dentry passed to callback * is used only on the trees with fixed topology, so the depths are not changing. So the locking order is actually fine. AFAICS, the best solution is to assign I_MUTEX_CHILD to the locks grabbed by that thing. Reported-by: syzbot+169de184e9defe7fe709@syzkaller.appspotmail.com Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Sasha Levin <sashal@kernel.org> (cherry picked from commit 847a204d3067487f116e4d89ffa60767d7f2fc0b) Signed-off-by: Wentao Guan <guanwentao@uniontech.com> | 4 个月前 | |
lockd: fix vfs_test_lock() calls stable inclusion from stable-v6.6.120 commit 46b9fd1433d2674e36cd78dcb8e8765106330d38 category: bugfix bugzilla: https://atomgit.com/openeuler/kernel/issues/8839 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=46b9fd1433d2674e36cd78dcb8e8765106330d38 -------------------------------- [ Upstream commit a49a2a1baa0c553c3548a1c414b6a3c005a8deba ] Usage of vfs_test_lock() is somewhat confused. Documentation suggests it is given a "lock" but this is not the case. It is given a struct file_lock which contains some details of the sort of lock it should be looking for. In particular passing a "file_lock" containing fl_lmops or fl_ops is meaningless and possibly confusing. This is particularly problematic in lockd. nlmsvc_testlock() receives an initialised "file_lock" from xdr-decode, including manager ops and an owner. It then mistakenly passes this to vfs_test_lock() which might replace the owner and the ops. This can lead to confusion when freeing the lock. The primary role of the 'struct file_lock' passed to vfs_test_lock() is to report a conflicting lock that was found, so it makes more sense for nlmsvc_testlock() to pass "conflock", which it uses for returning the conflicting lock. With this change, freeing of the lock is not confused and code in __nlm4svc_proc_test() and __nlmsvc_proc_test() can be simplified. Documentation for vfs_test_lock() is improved to reflect its real purpose, and a WARN_ON_ONCE() is added to avoid a similar problem in the future. Reported-by: Olga Kornievskaia <okorniev@redhat.com> Closes: https://lore.kernel.org/all/20251021130506.45065-1-okorniev@redhat.com Signed-off-by: NeilBrown <neil@brown.name> Fixes: 20fa19027286 ("nfs: add export operations") Cc: stable@vger.kernel.org Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> [ adapted fl->c.flc_* field references to flat fl->fl_* naming convention ] Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit 46b9fd1433d2674e36cd78dcb8e8765106330d38) Signed-off-by: Wentao Guan <guanwentao@uniontech.com> | 2 个月前 | |
fs/mbcache: cancel shrink work before destroying the cache mainline inclusion from mainline-v7.1-rc1 commit d227786ab1119669df4dc333a61510c52047cce4 category: bugfix bugzilla: https://atomgit.com/src-openeuler/kernel/issues/15965 CVE: CVE-2026-53129 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d227786ab1119669df4dc333a61510c52047cce4 -------------------------------- mb_cache_destroy() calls shrinker_free() and then frees all cache entries and the cache itself, but it does not cancel the pending c_shrink_work work item first. If mb_cache_entry_create() schedules c_shrink_work via schedule_work() and the work item is still pending or running when mb_cache_destroy() runs, mb_cache_shrink_worker() will access the cache after its memory has been freed, causing a use-after-free. This is only reachable by a privileged user (root or CAP_SYS_ADMIN) who can trigger the last put of a mounted ext2/ext4/ocfs2 filesystem. Cancel the work item with cancel_work_sync() before calling shrinker_free(), ensuring the worker has finished and will not be rescheduled before the cache is torn down. Fixes: c2f3140fe2ec ("mbcache2: limit cache size") Signed-off-by: Hyungjung Joo <jhj140711@gmail.com> Link: https://patch.msgid.link/20260317054556.1821600-1-jhj140711@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Zizhi Wo <wozizhi@huawei.com> | 2 天前 | |
cgroup/misc: support cgroup misc to control fd hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I8YC6O ---------------------------------------------------------------------- Cgroup misc provides the resource limiting and tracking mechanism for the scalar resources, it is more stable and concise to control fd than filescgroup(legacy-filescontrol), which controls fd as independent cgroup, and filescgroup will be abandoned in the future. We can use "filecg_misc" in cmdline to select misc-filescontrol to control fd. Notice: There is a difference between legacy-filescontrol and misc-filescontrol. Using misc-filescontrol, fd is charged to cgroup in which it is used first until it is freed. Migrating a process to different cgroup does not move to the charge to the destination. However, using legacy-filescontrol, migrating a process to different cgroup moves to the charge to destination. In cgroup v2, "dfl_cftypes" of filescgroup has been removed, it means that filescgroup will never be supported in cgroup v2. Interfaces of misc-filescontrol are shown as below: misc.capacity A read-only flat-keyed file shown only in the root cgroup. Getting fd available on the platform. Constantly, it is unlimited(U64_MAX). $ cat misc.capacity fd 18446744073709551615 misc.current Getting the current usage of the fd in the cgroup and its children. $ cat misc.current fd 3 misc.max A read-write flat-keyed file shown in the non root cgroups. Get/set maximum usage of the fd in the cgroup and its children. $ cat misc.max fd max Limit can be set by:: # echo fd 1024 > misc.max Signed-off-by: Chen Ridong <chenridong@huawei.com> | 2 年前 | |
fs: move mnt_idmap Now that we converted everything to just rely on struct mnt_idmap move it all into a separate file. This ensure that no code can poke around in struct mnt_idmap without any dedicated helpers and makes it easier to extend it in the future. Filesystems will now not be able to conflate mount and filesystem idmappings as they are two distinct types and require distinct helpers that cannot be used interchangeably. We are now also able to extend struct mnt_idmap as we see fit. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> | 3 年前 | |
fs: move {lock, unlock}_mount_hash to fs/mount.h hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I8PWEP CVE: NA -------------------------------------- commit d033cb6784c4 ("mount: make {lock,unlock}_mount_hash() static") moves {lock, unlock}_mount_hash to fs/namespace.c due to the two functions are only used in the file. memcg_memfs_info feature needs to reference the two functions, move them back to fs/mount.h. Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com> | 2 年前 | |
mpage: fix softlockup in mpage_readahead() hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IBYXQD --------------------------- When using null_blk with a large readahead window, calling madvise to trigger readahead can lead to system softlocks. This occurs because the CPU can remain in the mpage_readahead loop for extended periods without rescheduling, preventing other processes from running. The issue manifests as follows: Call Trace: bio_add_page+0x7f/0x120 bio_add_folio+0x24/0x40 do_mpage_readpage+0x6de/0xbc0 xas_load+0x1e/0x240 xa_load+0x8f/0xf0 mpage_readahead+0xe7/0x230 __pfx_blkdev_get_block+0x10/0x10 blkdev_readahead+0x14/0x30 read_pages+0x71/0x460 folio_add_lru+0x6a/0x140 ? filemap_add_folio+0xb6/0x190 page_cache_ra_unbounded+0x197/0x2a0 do_page_cache_ra+0x42/0x70 page_cache_ra_order+0x360/0x610 filemap_fault+0x7d8/0x1200 __do_fault+0x3e/0x380 do_pte_missing+0x628/0x1cd0 ? __pte_offset_map+0x1a/0x200 __handle_mm_fault+0x7f2/0x19b0 ? __pte_offset_map+0x1a/0x200 ? __pte_offset_map_lock+0x78/0x170 handle_mm_fault+0x275/0x560 __get_user_pages+0x1f0/0xbd0 faultin_page_range+0xae/0x460 madvise_vma_behavior+0x62e/0xfa0 ? find_vma_prev+0x85/0xe0 do_madvise+0x1eb/0x5a0 __x64_sys_madvise+0x2b/0x40 x64_sys_call+0x23b3/0x4620 do_syscall_64+0x9e/0x1b0 ? irqentry_exit_to_user_mode+0x7d/0x300 entry_SYSCALL_64_after_hwframe+0x78/0xe2 Add a cond_resched() call at the beginning of each loop iteration to allow the CPU to schedule other tasks when necessary, avoiding potential softlocks during heavy readahead operations. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Zheng Qixing <zhengqixing@huawei.com> | 1 年前 | |
openat2: don't trigger automounts with RESOLVE_NO_XDEV stable inclusion from stable-v6.6.113 commit dd21dc8d74518e0f96b7f2c7fd9e18ce539e5278 category: bugfix bugzilla: https://atomgit.com/openeuler/kernel/issues/8637 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=dd21dc8d74518e0f96b7f2c7fd9e18ce539e5278 -------------------------------- commit 042a60680de43175eb4df0977ff04a4eba9da082 upstream. openat2 had a bug: if we pass RESOLVE_NO_XDEV, then openat2 doesn't traverse through automounts, but may still trigger them. (See the link for full bug report with reproducer.) This commit fixes this bug. Link: https://lore.kernel.org/linux-fsdevel/20250817075252.4137628-1-safinaskar@zohomail.com/ Fixes: fddb5d430ad9fa91b49b1 ("open: introduce openat2(2) syscall") Reviewed-by: Aleksa Sarai <cyphar@cyphar.com> Cc: stable@vger.kernel.org Signed-off-by: Askar Safin <safinaskar@zohomail.com> Link: https://lore.kernel.org/20250825181233.2464822-5-safinaskar@zohomail.com Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit dd21dc8d74518e0f96b7f2c7fd9e18ce539e5278) Signed-off-by: Wentao Guan <guanwentao@uniontech.com> | 3 个月前 | |
mount: hold namespace_sem across copy in create_new_namespace() mainline inclusion from mainline-v7.0-rc2 commit a41dbf5e004edbe1260883c43a8bd134d9cb0c1c category: bugfix bugzilla: https://atomgit.com/openeuler/kernel/issues/9218 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a41dbf5e004edbe1260883c43a8bd134d9cb0c1c -------------------------------- Fix an oversight when creating a new mount namespace. If someone had the bright idea to make the real rootfs a shared or dependent mount and it is later copied the copy will become a peer of the old real rootfs mount or a dependent mount of it. The namespace semaphore is dropped and we use mount lock exact to lock the new real root mount. If that fails or the subsequent do_loopback() fails we rely on the copy of the real root mount to be cleaned up by path_put(). The problem is that this doesn't deal with mount propagation and will leave the mounts linked in the propagation lists. When creating a new mount namespace create_new_namespace() first acquires namespace_sem to clone the nullfs root, drops it, then reacquires it via LOCK_MOUNT_EXACT which takes inode_lock first to respect the inode_lock -> namespace_sem lock ordering. This drop-and-reacquire pattern is fragile and was the source of the propagation cleanup bug fixed in the preceding commit. Extend lock_mount_exact() with a copy_mount mode that clones the mount under the locks atomically. When copy_mount is true, path_overmounted() is skipped since we're copying the mount, not mounting on top of it - the nullfs root always has rootfs mounted on top so the check would always fail. If clone_mnt() fails after get_mountpoint() has pinned the mountpoint, __unlock_mount() is used to properly unpin the mountpoint and release both locks. This allows create_new_namespace() to use LOCK_MOUNT_EXACT_COPY which takes inode_lock and namespace_sem once and holds them throughout the clone and subsequent mount operations, eliminating the drop-and-reacquire pattern entirely. Reported-by: syzbot+a89f9434fb5a001ccd58@syzkaller.appspotmail.com Fixes: 9b8a0ba68246 ("mount: add OPEN_TREE_NAMESPACE") # mainline only Link: https://lore.kernel.org/699047f6.050a0220.2757fb.0024.GAE@google.com Signed-off-by: Christian Brauner <brauner@kernel.org> Conflicts: fs/namespace.c [Simple context conflicts, not affect this patch.] Signed-off-by: Zizhi Wo <wozizhi@huawei.com> | 30 天前 | |
mount: add OPEN_TREE_NAMESPACE mainline inclusion from mainline-v7.0-rc1 commit 9b8a0ba68246a61d903ce62c35c303b1501df28b category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/9218 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9b8a0ba68246a61d903ce62c35c303b1501df28b -------------------------------- When creating containers the setup usually involves using CLONE_NEWNS via clone3() or unshare(). This copies the caller's complete mount namespace. The runtime will also assemble a new rootfs and then use pivot_root() to switch the old mount tree with the new rootfs. Afterward it will recursively umount the old mount tree thereby getting rid of all mounts. On a basic system here where the mount table isn't particularly large this still copies about 30 mounts. Copying all of these mounts only to get rid of them later is pretty wasteful. This is exacerbated if intermediary mount namespaces are used that only exist for a very short amount of time and are immediately destroyed again causing a ton of mounts to be copied and destroyed needlessly. With a large mount table and a system where thousands or ten-thousands of containers are spawned in parallel this quickly becomes a bottleneck increasing contention on the semaphore. Extend open_tree() with a new OPEN_TREE_NAMESPACE flag. Similar to OPEN_TREE_CLONE only the indicated mount tree is copied. Instead of returning a file descriptor referring to that mount tree OPEN_TREE_NAMESPACE will cause open_tree() to return a file descriptor to a new mount namespace. In that new mount namespace the copied mount tree has been mounted on top of a copy of the real rootfs. The caller can setns() into that mount namespace and perform any additionally required setup such as move_mount() detached mounts in there. This allows OPEN_TREE_NAMESPACE to function as a combined unshare(CLONE_NEWNS) and pivot_root(). A caller may for example choose to create an extremely minimal rootfs: fd_mntns = open_tree(-EBADF, "/var/lib/containers/wootwoot", OPEN_TREE_NAMESPACE); This will create a mount namespace where "wootwoot" has become the rootfs mounted on top of the real rootfs. The caller can now setns() into this new mount namespace and assemble additional mounts. This also works with user namespaces: unshare(CLONE_NEWUSER); fd_mntns = open_tree(-EBADF, "/var/lib/containers/wootwoot", OPEN_TREE_NAMESPACE); which creates a new mount namespace owned by the earlier created user namespace with "wootwoot" as the rootfs mounted on top of the real rootfs. Link: https://patch.msgid.link/20251229-work-empty-namespace-v1-1-bfb24c7b061f@kernel.org Tested-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Aleksa Sarai <cyphar@cyphar.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Suggested-by: Christian Brauner <brauner@kernel.org> Suggested-by: Aleksa Sarai <cyphar@cyphar.com> Signed-off-by: Christian Brauner <brauner@kernel.org> Conflicts: fs/internal.h fs/namespace.c fs/nsfs.c [1. This kernel version does not have mnt_add_to_ns(), as commit 2eea9ce4310d ("mounts: keep list of mounts in an rbtree") not merged. Implemented mnt_add_tree_to_ns(). 2. This kernel version does not have __ns_tree_add_raw(), as commit 885fc8ac0a4d ("nstree: make iterator generic") not merged. Not affect to this patch. 3. This kernel version does not have path_from_stashed(), as commit 07fd7c329839 ("libfs: add path_from_stashed()") not merged. Implemented similar logic in open_namespace_file(). 4. There are a few other minor conflicts that do not affect the patch.] Signed-off-by: Zizhi Wo <wozizhi@huawei.com> Signed-off-by: Zizhi Wo <wozizhi@huawei.com> | 30 天前 | |
allow finish_no_open(file, ERR_PTR(-E...)) stable inclusion from stable-v6.6.117 commit 2611313d3bf45e66f5dd19b660260ed190de0473 category: bugfix bugzilla: https://atomgit.com/openeuler/kernel/issues/8763 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=2611313d3bf45e66f5dd19b660260ed190de0473 -------------------------------- [ Upstream commit fe91e078b60d1beabf5cef4a37c848457a6d2dfb ] ... allowing any ->lookup() return value to be passed to it. Reviewed-by: NeilBrown <neil@brown.name> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Sasha Levin <sashal@kernel.org> (cherry picked from commit 2611313d3bf45e66f5dd19b660260ed190de0473) Signed-off-by: Wentao Guan <guanwentao@uniontech.com> | 3 个月前 | |
fs/pipe: use spinlock in pipe_read() only if there is a watch_queue mainline inclusion from mainline-v6.7-rc1 commit 478dbf12176700f28d836dd03ae93a6888278230 category: performance bugzilla: https://gitee.com/src-openeuler/kernel/issues/ICZJGM CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=478dbf12176700f28d836dd03ae93a6888278230 -------------------------------- If there is no watch_queue, holding the pipe mutex is enough to prevent concurrent writes, and we can avoid the spinlock. O_NOTIFICATION_QUEUE is an exotic and rarely used feature, and of all the pipes that exist at any given time, only very few actually have a watch_queue, therefore it appears worthwile to optimize the common case. This patch does not optimize pipe_resize_ring() where the spinlocks could be avoided as well; that does not seem like a worthwile optimization because this function is not called often. Related commits: - commit 8df441294dd3 ("pipe: Check for ring full inside of the spinlock in pipe_write()") - commit b667b8673443 ("pipe: Advance tail pointer inside of wait spinlock in pipe_read()") - commit 189b0ddc2451 ("pipe: Fix missing lock in pipe_resize_ring()") Signed-off-by: Max Kellermann <max.kellermann@ionos.com> Message-Id: <20230921075755.1378787-4-max.kellermann@ionos.com> Reviewed-by: David Howells <dhowells@redhat.com> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com> | 8 个月前 | |
do_make_slave(): choose new master sanely mainline inclusion from mainline-v6.17-rc1 commit 955336e204ab59301ff8b1f75a98a226f5a98782 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/ID192S CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=955336e204ab59301ff8b1f75a98a226f5a98782 -------------------------------- When mount changes propagation type so that it doesn't propagate events any more (MS_PRIVATE, MS_SLAVE, MS_UNBINDABLE), we need to make sure that event propagation between other mounts is unaffected. We need to make sure that events from peers and master of that mount (if any) still reach everything that used to be on its ->mnt_slave_list. If mount has neither peers nor master, we simply need to dissolve its ->mnt_slave_list and clear ->mnt_master of everything in there. If mount has peers, we transfer everything in ->mnt_slave_list of this mount into that of some of those peers (and adjust ->mnt_master accordingly). If mount has a master but no peers, we transfer everything in ->mnt_slave_list of this mount into that of its master (adjusting ->mnt_master, etc.). There are two problems with the current implementation: * there's a long-obsolete logics in choosing the peer - once upon a time it made sense to prefer the peer that had the same ->mnt_root as our mount, but that had been pointless since 2014 ("smarter propagate_mnt()") * the most common caller of that thing is umount_tree() taking the mounts out of propagation graph. In that case it's possible to have ->mnt_slave_list contents moved many times, since the replacement master is likely to be taken out by the same umount_tree(), etc. Take the choice of replacement master into a separate function (propagation_source()) and teach it to skip the candidates that are going to be taken out. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Conflicts: fs/pnode.c [adapt for will_be_unmounted] Signed-off-by: Yang Erkun <yangerkun@huawei.com> | 8 个月前 | |
fs: allow to mount beneath top mount Various distributions are adding or are in the process of adding support for system extensions and in the future configuration extensions through various tools. A more detailed explanation on system and configuration extensions can be found on the manpage which is listed below at [1]. System extension images may – dynamically at runtime — extend the /usr/ and /opt/ directory hierarchies with additional files. This is particularly useful on immutable system images where a /usr/ and/or /opt/ hierarchy residing on a read-only file system shall be extended temporarily at runtime without making any persistent modifications. When one or more system extension images are activated, their /usr/ and /opt/ hierarchies are combined via overlayfs with the same hierarchies of the host OS, and the host /usr/ and /opt/ overmounted with it ("merging"). When they are deactivated, the mount point is disassembled — again revealing the unmodified original host version of the hierarchy ("unmerging"). Merging thus makes the extension's resources suddenly appear below the /usr/ and /opt/ hierarchies as if they were included in the base OS image itself. Unmerging makes them disappear again, leaving in place only the files that were shipped with the base OS image itself. System configuration images are similar but operate on directories containing system or service configuration. On nearly all modern distributions mount propagation plays a crucial role and the rootfs of the OS is a shared mount in a peer group (usually with peer group id 1): TARGET SOURCE FSTYPE PROPAGATION MNT_ID PARENT_ID / / ext4 shared:1 29 1 On such systems all services and containers run in a separate mount namespace and are pivot_root()ed into their rootfs. A separate mount namespace is almost always used as it is the minimal isolation mechanism services have. But usually they are even much more isolated up to the point where they almost become indistinguishable from containers. Mount propagation again plays a crucial role here. The rootfs of all these services is a slave mount to the peer group of the host rootfs. This is done so the service will receive mount propagation events from the host when certain files or directories are updated. In addition, the rootfs of each service, container, and sandbox is also a shared mount in its separate peer group: TARGET SOURCE FSTYPE PROPAGATION MNT_ID PARENT_ID / / ext4 shared:24 master:1 71 47 For people not too familiar with mount propagation, the master:1 means that this is a slave mount to peer group 1. Which as one can see is the host rootfs as indicated by shared:1 above. The shared:24 indicates that the service rootfs is a shared mount in a separate peer group with peer group id 24. A service may run other services. Such nested services will also have a rootfs mount that is a slave to the peer group of the outer service rootfs mount. For containers things are just slighly different. A container's rootfs isn't a slave to the service's or host rootfs' peer group. The rootfs mount of a container is simply a shared mount in its own peer group: TARGET SOURCE FSTYPE PROPAGATION MNT_ID PARENT_ID /home/ubuntu/debian-tree / ext4 shared:99 61 60 So whereas services are isolated OS components a container is treated like a separate world and mount propagation into it is restricted to a single well known mount that is a slave to the peer group of the shared mount /run on the host: TARGET SOURCE FSTYPE PROPAGATION MNT_ID PARENT_ID /propagate/debian-tree /run/host/incoming tmpfs master:5 71 68 Here, the master:5 indicates that this mount is a slave to the peer group with peer group id 5. This allows to propagate mounts into the container and served as a workaround for not being able to insert mounts into mount namespaces directly. But the new mount api does support inserting mounts directly. For the interested reader the blogpost in [2] might be worth reading where I explain the old and the new approach to inserting mounts into mount namespaces. Containers of course, can themselves be run as services. They often run full systems themselves which means they again run services and containers with the exact same propagation settings explained above. The whole system is designed so that it can be easily updated, including all services in various fine-grained ways without having to enter every single service's mount namespace which would be prohibitively expensive. The mount propagation layout has been carefully chosen so it is possible to propagate updates for system extensions and configurations from the host into all services. The simplest model to update the whole system is to mount on top of /usr, /opt, or /etc on the host. The new mount on /usr, /opt, or /etc will then propagate into every service. This works cleanly the first time. However, when the system is updated multiple times it becomes necessary to unmount the first update on /opt, /usr, /etc and then propagate the new update. But this means, there's an interval where the old base system is accessible. This has to be avoided to protect against downgrade attacks. The vfs already exposes a mechanism to userspace whereby mounts can be mounted beneath an existing mount. Such mounts are internally referred to as "tucked". The patch series exposes the ability to mount beneath a top mount through the new MOVE_MOUNT_BENEATH flag for the move_mount() system call. This allows userspace to seamlessly upgrade mounts. After this series the only thing that will have changed is that mounting beneath an existing mount can be done explicitly instead of just implicitly. Today, there are two scenarios where a mount can be mounted beneath an existing mount instead of on top of it: (1) When a service or container is started in a new mount namespace and pivot_root()s into its new rootfs. The way this is done is by mounting the new rootfs beneath the old rootfs: fd_newroot = open("/var/lib/machines/fedora", ...); fd_oldroot = open("/", ...); fchdir(fd_newroot); pivot_root(".", "."); After the pivot_root(".", ".") call the new rootfs is mounted beneath the old rootfs which can then be unmounted to reveal the underlying mount: fchdir(fd_oldroot); umount2(".", MNT_DETACH); Since pivot_root() moves the caller into a new rootfs no mounts must be propagated out of the new rootfs as a consequence of the pivot_root() call. Thus, the mounts cannot be shared. (2) When a mount is propagated to a mount that already has another mount mounted on the same dentry. The easiest example for this is to create a new mount namespace. The following commands will create a mount namespace where the rootfs mount / will be a slave to the peer group of the host rootfs / mount's peer group. IOW, it will receive propagation from the host: mount --make-shared / unshare --mount --propagation=slave Now a new mount on the /mnt dentry in that mount namespace is created. (As it can be confusing it should be spelled out that the tmpfs mount on the /mnt dentry that was just created doesn't propagate back to the host because the rootfs mount / of the mount namespace isn't a peer of the host rootfs.): mount -t tmpfs tmpfs /mnt TARGET SOURCE FSTYPE PROPAGATION └─/mnt tmpfs tmpfs Now another terminal in the host mount namespace can observe that the mount indeed hasn't propagated back to into the host mount namespace. A new mount can now be created on top of the /mnt dentry with the rootfs mount / as its parent: mount --bind /opt /mnt TARGET SOURCE FSTYPE PROPAGATION └─/mnt /dev/sda2[/opt] ext4 shared:1 The mount namespace that was created earlier can now observe that the bind mount created on the host has propagated into it: TARGET SOURCE FSTYPE PROPAGATION └─/mnt /dev/sda2[/opt] ext4 master:1 └─/mnt tmpfs tmpfs But instead of having been mounted on top of the tmpfs mount at the /mnt dentry the /opt mount has been mounted on top of the rootfs mount at the /mnt dentry. And the tmpfs mount has been remounted on top of the propagated /opt mount at the /opt dentry. So in other words, the propagated mount has been mounted beneath the preexisting mount in that mount namespace. Mount namespaces make this easy to illustrate but it's also easy to mount beneath an existing mount in the same mount namespace (The following example assumes a shared rootfs mount / with peer group id 1): mount --bind /opt /opt TARGET SOURCE FSTYPE MNT_ID PARENT_ID PROPAGATION └─/opt /dev/sda2[/opt] ext4 188 29 shared:1 If another mount is mounted on top of the /opt mount at the /opt dentry: mount --bind /tmp /opt The following clunky mount tree will result: TARGET SOURCE FSTYPE MNT_ID PARENT_ID PROPAGATION └─/opt /dev/sda2[/tmp] ext4 405 29 shared:1 └─/opt /dev/sda2[/opt] ext4 188 405 shared:1 └─/opt /dev/sda2[/tmp] ext4 404 188 shared:1 The /tmp mount is mounted beneath the /opt mount and another copy is mounted on top of the /opt mount. This happens because the rootfs / and the /opt mount are shared mounts in the same peer group. When the new /tmp mount is supposed to be mounted at the /opt dentry then the /tmp mount first propagates to the root mount at the /opt dentry. But there already is the /opt mount mounted at the /opt dentry. So the old /opt mount at the /opt dentry will be mounted on top of the new /tmp mount at the /tmp dentry, i.e. @opt->mnt_parent is @tmp and @opt->mnt_mountpoint is /tmp (Note that @opt->mnt_root is /opt which is what shows up as /opt under SOURCE). So again, a mount will be mounted beneath a preexisting mount. (Fwiw, a few iterations of mount --bind /opt /opt in a loop on a shared rootfs is a good example of what could be referred to as mount explosion.) The main point is that such mounts allows userspace to umount a top mount and reveal an underlying mount. So for example, umounting the tmpfs mount on /mnt that was created in example (1) using mount namespaces reveals the /opt mount which was mounted beneath it. In (2) where a mount was mounted beneath the top mount in the same mount namespace unmounting the top mount would unmount both the top mount and the mount beneath. In the process the original mount would be remounted on top of the rootfs mount / at the /opt dentry again. This again, is a result of mount propagation only this time it's umount propagation. However, this can be avoided by simply making the parent mount / of the @opt mount a private or slave mount. Then the top mount and the original mount can be unmounted to reveal the mount beneath. These two examples are fairly arcane and are merely added to make it clear how mount propagation has effects on current and future features. More common use-cases will just be things like: mount -t btrfs /dev/sdA /mnt mount -t xfs /dev/sdB --beneath /mnt umount /mnt after which we'll have updated from a btrfs filesystem to a xfs filesystem without ever revealing the underlying mountpoint. The crux is that the proposed mechanism already exists and that it is so powerful as to cover cases where mounts are supposed to be updated with new versions. Crucially, it offers an important flexibility. Namely that updates to a system may either be forced or can be delayed and the umount of the top mount be left to a service if it is a cooperative one. This adds a new flag to move_mount() that allows to explicitly move a beneath the top mount adhering to the following semantics: * Mounts cannot be mounted beneath the rootfs. This restriction encompasses the rootfs but also chroots via chroot() and pivot_root(). To mount a mount beneath the rootfs or a chroot, pivot_root() can be used as illustrated above. * The source mount must be a private mount to force the kernel to allocate a new, unused peer group id. This isn't a required restriction but a voluntary one. It avoids repeating a semantical quirk that already exists today. If bind mounts which already have a peer group id are inserted into mount trees that have the same peer group id this can cause a lot of mount propagation events to be generated (For example, consider running mount --bind /opt /opt in a loop where the parent mount is a shared mount.). * Avoid getting rid of the top mount in the kernel. Cooperative services need to be able to unmount the top mount themselves. This also avoids a good deal of additional complexity. The umount would have to be propagated which would be another rather expensive operation. So namespace_lock() and lock_mount_hash() would potentially have to be held for a long time for both a mount and umount propagation. That should be avoided. * The path to mount beneath must be mounted and attached. * The top mount and its parent must be in the caller's mount namespace and the caller must be able to mount in that mount namespace. * The caller must be able to unmount the top mount to prove that they could reveal the underlying mount. * The propagation tree is calculated based on the destination mount's parent mount and the destination mount's mountpoint on the parent mount. Of course, if the parent of the destination mount and the destination mount are shared mounts in the same peer group and the mountpoint of the new mount to be mounted is a subdir of their ->mnt_root then both will receive a mount of /opt. That's probably easier to understand with an example. Assuming a standard shared rootfs /: mount --bind /opt /opt mount --bind /tmp /opt will cause the same mount tree as: mount --bind /opt /opt mount --beneath /tmp /opt because both / and /opt are shared mounts/peers in the same peer group and the /opt dentry is a subdirectory of both the parent's and the child's ->mnt_root. If a mount tree like that is created it almost always is an accident or abuse of mount propagation. Realistically what most people probably mean in this scenarios is: mount --bind /opt /opt mount --make-private /opt mount --make-shared /opt This forces the allocation of a new separate peer group for the /opt mount. Aferwards a mount --bind or mount --beneath actually makes sense as the / and /opt mount belong to different peer groups. Before that it's likely just confusion about what the user wanted to achieve. * Refuse MOVE_MOUNT_BENEATH if: (1) the @mnt_from has been overmounted in between path resolution and acquiring @namespace_sem when locking @mnt_to. This avoids the proliferation of shadow mounts. (2) if @to_mnt is moved to a different mountpoint while acquiring @namespace_sem to lock @to_mnt. (3) if @to_mnt is unmounted while acquiring @namespace_sem to lock @to_mnt. (4) if the parent of the target mount propagates to the target mount at the same mountpoint. This would mean mounting @mnt_from on @mnt_to->mnt_parent and then propagating a copy @c of @mnt_from onto @mnt_to. This defeats the whole purpose of mounting @mnt_from beneath @mnt_to. (5) if the parent mount @mnt_to->mnt_parent propagates to @mnt_from at the same mountpoint. If @mnt_to->mnt_parent propagates to @mnt_from this would mean propagating a copy @c of @mnt_from on top of @mnt_from. Afterwards @mnt_from would be mounted on top of @mnt_to->mnt_parent and @mnt_to would be unmounted from @mnt->mnt_parent and remounted on @mnt_from. But since @c is already mounted on @mnt_from, @mnt_to would ultimately be remounted on top of @c. Afterwards, @mnt_from would be covered by a copy @c of @mnt_from and @c would be covered by @mnt_from itself. This defeats the whole purpose of mounting @mnt_from beneath @mnt_to. Cases (1) to (3) are required as they deal with races that would cause bugs or unexpected behavior for users. Cases (4) and (5) refuse semantical quirks that would not be a bug but would cause weird mount trees to be created. While they can already be created via other means (mount --bind /opt /opt x n) there's no reason to repeat past mistakes in new features. Link: https://man7.org/linux/man-pages/man8/systemd-sysext.8.html [1] Link: https://brauner.io/2023/02/28/mounting-into-mount-namespaces.html [2] Link: https://github.com/flatcar/sysext-bakery Link: https://fedoraproject.org/wiki/Changes/Unified_Kernel_Support_Phase_1 Link: https://fedoraproject.org/wiki/Changes/Unified_Kernel_Support_Phase_2 Link: https://github.com/systemd/systemd/pull/26013 Reviewed-by: Seth Forshee (DigitalOcean) <sforshee@kernel.org> Message-Id: <20230202-fs-move-mount-replace-v4-4-98f3d80d7eaa@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org> | 3 年前 | |
fs: convert to ctime accessor functions In later patches, we're going to change how the inode's ctime field is used. Switch to using accessor functions instead of raw accesses of inode->i_ctime. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jeff Layton <jlayton@kernel.org> Message-Id: <20230705190309.579783-23-jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org> | 2 年前 | |
tty, proc, kernfs, random: Use copy_splice_read() Use copy_splice_read() for tty, procfs, kernfs and random files rather than going through generic_file_splice_read() as they just copy the file into the output buffer and don't splice pages. This avoids the need for them to have a ->read_folio() to satisfy filemap_splice_read(). Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> cc: Christoph Hellwig <hch@lst.de> cc: Jens Axboe <axboe@kernel.dk> cc: Al Viro <viro@zeniv.linux.org.uk> cc: John Hubbard <jhubbard@nvidia.com> cc: David Hildenbrand <david@redhat.com> cc: Matthew Wilcox <willy@infradead.org> cc: Miklos Szeredi <miklos@szeredi.hu> cc: Arnd Bergmann <arnd@arndb.de> cc: linux-block@vger.kernel.org cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org Link: https://lore.kernel.org/r/20230522135018.2742245-13-dhowells@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk> | 3 年前 | |
fs: Export generic_atomic_write_valid() mainline inclusion from mainline-v6.12-rc1 commit a570bad16b9f5252db2f342622bd71febb39a19c category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8862 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a570bad16b9f5252db2f342622bd71febb39a19c -------------------------------- The XFS code will need this. Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: John Garry <john.g.garry@oracle.com> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Conflicts: fs/read_write.c Signed-off-by: Long Li <leo.lilong@huawei.com> | 1 个月前 | |
vfs: get rid of old '->iterate' directory operation All users now just use '->iterate_shared()', which only takes the directory inode lock for reading. Filesystems that never got convered to shared mode now instead use a wrapper that drops the lock, re-takes it in write mode, calls the old function, and then downgrades the lock back to read mode. This way the VFS layer and other callers no longer need to care about filesystems that never got converted to the modern era. The filesystems that use the new wrapper are ceph, coda, exfat, jfs, ntfs, ocfs2, overlayfs, and vboxsf. Honestly, several of them look like they really could just iterate their directories in shared mode and skip the wrapper entirely, but the point of this change is to not change semantics or fix filesystems that haven't been fixed in the last 7+ years, but to finally get rid of the dual iterators. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Christian Brauner <brauner@kernel.org> | 2 年前 | |
remap_range: merge do_clone_file_range() into vfs_clone_file_range() mainline inclusion from mainline-v6.8-rc5 commit 853b8d7597eea4ccaaefbcf0942cd42fc86d542a category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/IBHLU4 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=853b8d7597eea4ccaaefbcf0942cd42fc86d542a -------------------------------- commit dfad37051ade ("remap_range: move permission hooks out of do_clone_file_range()") moved the permission hooks from do_clone_file_range() out to its caller vfs_clone_file_range(), but left all the fast sanity checks in do_clone_file_range(). This makes the expensive security hooks be called in situations that they would not have been called before (e.g. fs does not support clone). The only reason for the do_clone_file_range() helper was that overlayfs did not use to be able to call vfs_clone_file_range() from copy up context with sb_writers lock held. However, since commit c63e56a4a652 ("ovl: do not open/llseek lower file with upper sb_writers held"), overlayfs just uses an open coded version of vfs_clone_file_range(). Merge_clone_file_range() into vfs_clone_file_range(), restoring the original order of checks as it was before the regressing commit and adapt the overlayfs code to call vfs_clone_file_range() before the permission hooks that were added by commit ca7ab482401c ("ovl: add permission hooks outside of do_splice_direct()"). Note that in the merge of do_clone_file_range(), the file_start_write() context was reduced to cover ->remap_file_range() without holding it over the permission hooks, which was the reason for doing the regressing commit in the first place. Reported-and-tested-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202401312229.eddeb9a6-oliver.sang@intel.com Fixes: dfad37051ade ("remap_range: move permission hooks out of do_clone_file_range()") Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://lore.kernel.org/r/20240202102258.1582671-1-amir73il@gmail.com Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com> | 1 年前 | |
hrtimer: Use and report correct timerslack values for realtime tasks stable inclusion from stable-v6.6.84 commit 472173544e74709590b3827a9a32e7ce0387624f category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/ICCVOJ Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=472173544e74709590b3827a9a32e7ce0387624f -------------------------------- commit ed4fb6d7ef68111bb539283561953e5c6e9a6e38 upstream. The timerslack_ns setting is used to specify how much the hardware timers should be delayed, to potentially dispatch multiple timers in a single interrupt. This is a performance optimization. Timers of realtime tasks (having a realtime scheduling policy) should not be delayed. This logic was inconsitently applied to the hrtimers, leading to delays of realtime tasks which used timed waits for events (e.g. condition variables). Due to the downstream override of the slack for rt tasks, the procfs reported incorrect (non-zero) timerslack_ns values. This is changed by setting the timer_slack_ns task attribute to 0 for all tasks with a rt policy. By that, downstream users do not need to specially handle rt tasks (w.r.t. the slack), and the procfs entry shows the correct value of "0". Setting non-zero slack values (either via procfs or PR_SET_TIMERSLACK) on tasks with a rt policy is ignored, as stated in "man 2 PR_SET_TIMERSLACK": Timer slack is not applied to threads that are scheduled under a real-time scheduling policy (see sched_setscheduler(2)). The special handling of timerslack on rt tasks in downstream users is removed as well. Signed-off-by: Felix Moessbauer <felix.moessbauer@siemens.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20240814121032.368444-2-felix.moessbauer@siemens.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit 472173544e74709590b3827a9a32e7ce0387624f) Signed-off-by: Wentao Guan <guanwentao@uniontech.com> | 1 年前 | |
use less confusing names for iov_iter direction initializers READ/WRITE proved to be actively confusing - the meanings are "data destination, as used with read(2)" and "data source, as used with write(2)", but people keep interpreting those as "we read data from it" and "we write data to it", i.e. exactly the wrong way. Call them ITER_DEST and ITER_SOURCE - at least that is harder to misinterpret... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> | 3 年前 | |
Merge branch 'signal-for-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull signal/exit/ptrace updates from Eric Biederman: "This set of changes deletes some dead code, makes a lot of cleanups which hopefully make the code easier to follow, and fixes bugs found along the way. The end-game which I have not yet reached yet is for fatal signals that generate coredumps to be short-circuit deliverable from complete_signal, for force_siginfo_to_task not to require changing userspace configured signal delivery state, and for the ptrace stops to always happen in locations where we can guarantee on all architectures that the all of the registers are saved and available on the stack. Removal of profile_task_ext, profile_munmap, and profile_handoff_task are the big successes for dead code removal this round. A bunch of small bug fixes are included, as most of the issues reported were small enough that they would not affect bisection so I simply added the fixes and did not fold the fixes into the changes they were fixing. There was a bug that broke coredumps piped to systemd-coredump. I dropped the change that caused that bug and replaced it entirely with something much more restrained. Unfortunately that required some rebasing. Some successes after this set of changes: There are few enough calls to do_exit to audit in a reasonable amount of time. The lifetime of struct kthread now matches the lifetime of struct task, and the pointer to struct kthread is no longer stored in set_child_tid. The flag SIGNAL_GROUP_COREDUMP is removed. The field group_exit_task is removed. Issues where task->exit_code was examined with signal->group_exit_code should been examined were fixed. There are several loosely related changes included because I am cleaning up and if I don't include them they will probably get lost. The original postings of these changes can be found at: https://lkml.kernel.org/r/87a6ha4zsd.fsf@email.froward.int.ebiederm.org https://lkml.kernel.org/r/87bl1kunjj.fsf@email.froward.int.ebiederm.org https://lkml.kernel.org/r/87r19opkx1.fsf_-_@email.froward.int.ebiederm.org I trimmed back the last set of changes to only the obviously correct once. Simply because there was less time for review than I had hoped" * 'signal-for-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (44 commits) ptrace/m68k: Stop open coding ptrace_report_syscall ptrace: Remove unused regs argument from ptrace_report_syscall ptrace: Remove second setting of PT_SEIZED in ptrace_attach taskstats: Cleanup the use of task->exit_code exit: Use the correct exit_code in /proc/<pid>/stat exit: Fix the exit_code for wait_task_zombie exit: Coredumps reach do_group_exit exit: Remove profile_handoff_task exit: Remove profile_task_exit & profile_munmap signal: clean up kernel-doc comments signal: Remove the helper signal_group_exit signal: Rename group_exit_task group_exec_task coredump: Stop setting signal->group_exit_task signal: Remove SIGNAL_GROUP_COREDUMP signal: During coredumps set SIGNAL_GROUP_EXIT in zap_process signal: Make coredump handling explicit in complete_signal signal: Have prepare_signal detect coredumps using signal->core_state signal: Have the oom killer detect coredumps using signal->core_state exit: Move force_uaccess back into do_exit exit: Guarantee make_task_dead leaks the tsk when calling do_task_exit ... | 4 年前 | |
splice: remove duplicate noinline from pipe_clear_nowait stable inclusion from stable-v6.6.89 commit 9e0d94a292227d1ef89fe1e241971611cbd02325 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/ID73WA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=9e0d94a292227d1ef89fe1e241971611cbd02325 -------------------------------- [ Upstream commit e6f141b332ddd9007756751b6afd24f799488fd8 ] pipe_clear_nowait has two noinline macros, but we only need one. I checked the whole tree, and this is the only occurrence: $ grep -r "noinline .* noinline" fs/splice.c:static noinline void noinline pipe_clear_nowait(struct file *file) $ Fixes: 0f99fc513ddd ("splice: clear FMODE_NOWAIT on file if splice/vmsplice is used") Signed-off-by: "T.J. Mercier" <tjmercier@google.com> Link: https://lore.kernel.org/20250423180025.2627670-1-tjmercier@google.com Reviewed-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> (cherry picked from commit 9e0d94a292227d1ef89fe1e241971611cbd02325) Signed-off-by: Wentao Guan <guanwentao@uniontech.com> | 7 个月前 | |
fs: convert to ctime accessor functions In later patches, we're going to change how the inode's ctime field is used. Switch to using accessor functions instead of raw accesses of inode->i_ctime. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jeff Layton <jlayton@kernel.org> Message-Id: <20230705190309.579783-23-jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org> | 2 年前 | |
fs: Add initial atomic write support info to statx mainline inclusion from mainline-v6.10-rc3 commit 0f9ca80fa4f9670ba09721e4e36b8baf086a500c category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8862 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0f9ca80fa4f9670ba09721e4e36b8baf086a500c -------------------------------- Extend statx system call to return additional info for atomic write support support for a file. Helper function generic_fill_statx_atomic_writes() can be used by FSes to fill in the relevant statx fields. For now atomic_write_segments_max will always be 1, otherwise some rules would need to be imposed on iovec length and alignment, which we don't want now. Signed-off-by: Prasad Singamsetty <prasad.singamsetty@oracle.com> jpg: relocate bdev support to another patch Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: John Garry <john.g.garry@oracle.com> Acked-by: Darrick J. Wong <djwong@kernel.org> Link: https://lore.kernel.org/r/20240620125359.2684798-5-john.g.garry@oracle.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Conflicts: include/linux/stat.h include/uapi/linux/stat.h fs/stat.c Signed-off-by: Long Li <leo.lilong@huawei.com> | 1 个月前 | |
statfs: enforce statfs[64] structure initialization s390's struct statfs and struct statfs64 contain padding, which field-by-field copying does not set. Initialize the respective structs with zeros before filling them and copying them to userspace, like it's already done for the compat versions of these structs. Found by KMSAN. [agordeev@linux.ibm.com: fixed typo in patch description] Acked-by: Heiko Carstens <hca@linux.ibm.com> Cc: stable@vger.kernel.org # v4.14+ Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Link: https://lore.kernel.org/r/20230504144021.808932-2-iii@linux.ibm.com Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> | 3 年前 | |
mm: add KABI_* macros to preserve KABI hulk inclusion category: performance bugzilla: https://gitee.com/openeuler/kernel/issues/IC3A7I ------------------------------------------------- The shrinker patchset changed the kAPI. Add KABI markups to prevent CRC symbol changes. Signed-off-by: Mauro Carvalho Chehab <m.chehab@huawei.com> | 8 个月前 | |
riscv: compat: syscall: Add compat_sys_call_table implementation Implement compat sys_call_table and some system call functions: truncate64, ftruncate64, fallocate, pread64, pwrite64, sync_file_range, readahead, fadvise64_64 which need argument translation. Signed-off-by: Guo Ren <guoren@linux.alibaba.com> Signed-off-by: Guo Ren <guoren@kernel.org> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Tested-by: Heiko Stuebner <heiko@sntech.de> Link: https://lore.kernel.org/r/20220405071314.3225832-12-guoren@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> | 4 年前 | |
sysctl: Refactor base paths registrations This is part of the general push to deprecate register_sysctl_paths and register_sysctl_table. The old way of doing this through register_sysctl_base and DECLARE_SYSCTL_BASE macro is replaced with a call to register_sysctl_init. The 5 base paths affected are: "kernel", "vm", "debug", "dev" and "fs". We remove the register_sysctl_base function and the DECLARE_SYSCTL_BASE macro since they are no longer needed. In order to quickly acertain that the paths did not actually change I executed find /proc/sys/ | sha1sum and made sure that the sha was the same before and after the commit. We end up saving 563 bytes with this change: ./scripts/bloat-o-meter vmlinux.0.base vmlinux.1.refactor-base-paths add/remove: 0/5 grow/shrink: 2/0 up/down: 77/-640 (-563) Function old new delta sysctl_init_bases 55 111 +56 init_fs_sysctls 12 33 +21 vm_base_table 128 - -128 kernel_base_table 128 - -128 fs_base_table 128 - -128 dev_base_table 128 - -128 debug_base_table 128 - -128 Total: Before=21258215, After=21257652, chg -0.00% [mcgrof: modified to use register_sysctl_init() over register_sysctl() and add bloat-o-meter stats] Signed-off-by: Joel Granados <j.granados@samsung.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Tested-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Christian Brauner <brauner@kernel.org> | 3 年前 | |
timerfd: Provide timerfd_resume() Resuming timekeeping is a clock-was-set event and uses the clock-was-set notification mechanism. This is in the way of making the clock-was-set update for hrtimers selective so unnecessary IPIs are avoided when a CPU base does not have timers queued which are affected by the clock setting. Provide a seperate timerfd_resume() interface so the resume logic and the clock-was-set mechanism can be distangled in the core code. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20210713135158.395287410@linutronix.de | 4 年前 | |
userfaultfd: allow registration of ranges below mmap_min_addr stable inclusion from stable-v6.6.140 commit f3deabe0f5ac920412500e71452fd9d0a68379ba category: bugfix bugzilla: https://atomgit.com/openeuler/kernel/issues/ Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=f3deabe0f5ac920412500e71452fd9d0a68379ba -------------------------------- commit 161ce69c2c89781784b945d8e281ff2da9dede9c upstream. The current implementation of validate_range() in fs/userfaultfd.c performs a hard check against mmap_min_addr. This is redundant because UFFDIO_REGISTER operates on memory ranges that must already be backed by a VMA. Enforcing mmap_min_addr or capability checks again in userfaultfd is unnecessary and prevents applications like binary compilers from using UFFD for valid memory regions mapped by application. Remove the redundant check for mmap_min_addr. We started using UFFD instead of the classic mprotect approach in the binary translator to track application writes. During development, we encountered this bug. The translator cannot control where the translated application chooses to map its memory and if the app requires a low-address area, UFFD fails, whereas mprotect would work just fine. I believe this is a genuine logic bug rather than an improvement, and I would appreciate including the fix in stable. Link: https://lore.kernel.org/20260409103345.15044-1-komlomal@gmail.com Fixes: 86039bd3b4e6 ("userfaultfd: add new syscall to provide memory externalization") Signed-off-by: Denis M. Karpov <komlomal@gmail.com> Reviewed-by: Lorenzo Stoakes <ljs@kernel.org> Acked-by: Harry Yoo (Oracle) <harry@kernel.org> Reviewed-by: Pedro Falcato <pfalcato@suse.de> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Jann Horn <jannh@google.com> Cc: Peter Xu <peterx@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Wang Hai <wanghai38@huawei.com> | 25 天前 | |
Merge tag 'fs.idmapped.v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping Pull vfs idmapping updates from Christian Brauner: - Last cycle we introduced the dedicated struct mnt_idmap type for mount idmapping and the required infrastucture in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). As promised in last cycle's pull request message this converts everything to rely on struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevant on the mount level. Especially for non-vfs developers without detailed knowledge in this area this was a potential source for bugs. This finishes the conversion. Instead of passing the plain namespace around this updates all places that currently take a pointer to a mnt_userns with a pointer to struct mnt_idmap. Now that the conversion is done all helpers down to the really low-level helpers only accept a struct mnt_idmap argument instead of two namespace arguments. Conflating mount and other idmappings will now cause the compiler to complain loudly thus eliminating the possibility of any bugs. This makes it impossible for filesystem developers to mix up mount and filesystem idmappings as they are two distinct types and require distinct helpers that cannot be used interchangeably. Everything associated with struct mnt_idmap is moved into a single separate file. With that change no code can poke around in struct mnt_idmap. It can only be interacted with through dedicated helpers. That means all filesystems are and all of the vfs is completely oblivious to the actual implementation of idmappings. We are now also able to extend struct mnt_idmap as we see fit. For example, we can decouple it completely from namespaces for users that don't require or don't want to use them at all. We can also extend the concept of idmappings so we can cover filesystem specific requirements. In combination with the vfs{g,u}id_t work we finished in v6.2 this makes this feature substantially more robust and thus difficult to implement wrong by a given filesystem and also protects the vfs. - Enable idmapped mounts for tmpfs and fulfill a longstanding request. A long-standing request from users had been to make it possible to create idmapped mounts for tmpfs. For example, to share the host's tmpfs mount between multiple sandboxes. This is a prerequisite for some advanced Kubernetes cases. Systemd also has a range of use-cases to increase service isolation. And there are more users of this. However, with all of the other work going on this was way down on the priority list but luckily someone other than ourselves picked this up. As usual the patch is tiny as all the infrastructure work had been done multiple kernel releases ago. In addition to all the tests that we already have I requested that Rodrigo add a dedicated tmpfs testsuite for idmapped mounts to xfstests. It is to be included into xfstests during the v6.3 development cycle. This should add a slew of additional tests. * tag 'fs.idmapped.v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping: (26 commits) shmem: support idmapped mounts for tmpfs fs: move mnt_idmap fs: port vfs{g,u}id helpers to mnt_idmap fs: port fs{g,u}id helpers to mnt_idmap fs: port i_{g,u}id_into_vfs{g,u}id() to mnt_idmap fs: port i_{g,u}id_{needs_}update() to mnt_idmap quota: port to mnt_idmap fs: port privilege checking helpers to mnt_idmap fs: port inode_owner_or_capable() to mnt_idmap fs: port inode_init_owner() to mnt_idmap fs: port acl to mnt_idmap fs: port xattr to mnt_idmap fs: port ->permission() to pass mnt_idmap fs: port ->fileattr_set() to pass mnt_idmap fs: port ->set_acl() to pass mnt_idmap fs: port ->get_acl() to pass mnt_idmap fs: port ->tmpfile() to pass mnt_idmap fs: port ->rename() to pass mnt_idmap fs: port ->mknod() to pass mnt_idmap fs: port ->mkdir() to pass mnt_idmap ... | 3 年前 | |
xattr: switch to CLASS(fd) mainline inclusion from mainline-v6.13-rc1 commit a71874379ec8c6e788a61d71b3ad014a8d9a5c08 category: bugfix bugzilla: https://atomgit.com/src-openeuler/kernel/issues/13855 CVE: CVE-2024-14027 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a71874379ec8c6e788a61d71b3ad014a8d9a5c08 -------------------------------- Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Conflicts: fs/xattr.c [mainline 88a2f6468d01 not applied] Signed-off-by: Yongjian Sun <sunyongjian1@huawei.com> | 3 个月前 |
| 文件 | 最后提交记录 | 最后更新时间 |
|---|---|---|
| 2 个月前 | ||
| 25 天前 | ||
| 7 个月前 | ||
| 3 个月前 | ||
| 2 年前 | ||
| 2 年前 | ||
| 2 个月前 | ||
| 24 天前 | ||
| 11 个月前 | ||
| 25 天前 | ||
| 1 年前 | ||
| 4 个月前 | ||
| 3 个月前 | ||
| 10 个月前 | ||
| 25 天前 | ||
| 2 年前 | ||
| 25 天前 | ||
| 1 年前 | ||
| 3 个月前 | ||
| 2 年前 | ||
| 25 天前 | ||
| 2 个月前 | ||
| 2 年前 | ||
| 25 天前 | ||
| 5 天前 | ||
| 25 天前 | ||
| 3 个月前 | ||
| 2 年前 | ||
| 1 年前 | ||
| 1 天前 | ||
| 25 天前 | ||
| 3 个月前 | ||
| 25 天前 | ||
| 7 个月前 | ||
| 3 个月前 | ||
| 4 个月前 | ||
| 25 天前 | ||
| 25 天前 | ||
| 25 天前 | ||
| 7 个月前 | ||
| 25 天前 | ||
| 1 个月前 | ||
| 2 个月前 | ||
| 5 天前 | ||
| 25 天前 | ||
| 25 天前 | ||
| 25 天前 | ||
| 3 年前 | ||
| 25 天前 | ||
| 25 天前 | ||
| 2 个月前 | ||
| 3 天前 | ||
| 2 年前 | ||
| 16 天前 | ||
| 25 天前 | ||
| 25 天前 | ||
| 1 年前 | ||
| 3 个月前 | ||
| 25 天前 | ||
| 1 个月前 | ||
| 1 个月前 | ||
| 2 年前 | ||
| 2 年前 | ||
| 25 天前 | ||
| 2 年前 | ||
| 1 年前 | ||
| 1 个月前 | ||
| 1 个月前 | ||
| 1 个月前 | ||
| 3 天前 | ||
| 2 个月前 | ||
| 1 个月前 | ||
| 2 年前 | ||
| 25 天前 | ||
| 8 个月前 | ||
| 25 天前 | ||
| 2 年前 | ||
| 1 年前 | ||
| 1 年前 | ||
| 1 年前 | ||
| 4 天前 | ||
| 2 年前 | ||
| 1 个月前 | ||
| 2 年前 | ||
| 1 个月前 | ||
| 1 个月前 | ||
| 3 个月前 | ||
| 2 年前 | ||
| 1 年前 | ||
| 2 年前 | ||
| 25 天前 | ||
| 1 年前 | ||
| 4 年前 | ||
| 1 年前 | ||
| 5 个月前 | ||
| 5 年前 | ||
| 25 天前 | ||
| 3 年前 | ||
| 1 年前 | ||
| 3 个月前 | ||
| 3 年前 | ||
| 3 个月前 | ||
| 3 个月前 | ||
| 2 年前 | ||
| 2 年前 | ||
| 2 年前 | ||
| 2 年前 | ||
| 25 天前 | ||
| 3 个月前 | ||
| 3 天前 | ||
| 1 年前 | ||
| 1 个月前 | ||
| 1 年前 | ||
| 2 年前 | ||
| 5 个月前 | ||
| 1 个月前 | ||
| 2 年前 | ||
| 1 年前 | ||
| 3 年前 | ||
| 6 年前 | ||
| 25 天前 | ||
| 7 年前 | ||
| 3 个月前 | ||
| 3 年前 | ||
| 1 年前 | ||
| 30 天前 | ||
| 2 年前 | ||
| 2 年前 | ||
| 2 年前 | ||
| 4 个月前 | ||
| 2 个月前 | ||
| 2 天前 | ||
| 2 年前 | ||
| 3 年前 | ||
| 2 年前 | ||
| 1 年前 | ||
| 3 个月前 | ||
| 30 天前 | ||
| 30 天前 | ||
| 3 个月前 | ||
| 8 个月前 | ||
| 8 个月前 | ||
| 3 年前 | ||
| 2 年前 | ||
| 3 年前 | ||
| 1 个月前 | ||
| 2 年前 | ||
| 1 年前 | ||
| 1 年前 | ||
| 3 年前 | ||
| 4 年前 | ||
| 7 个月前 | ||
| 2 年前 | ||
| 1 个月前 | ||
| 3 年前 | ||
| 8 个月前 | ||
| 4 年前 | ||
| 3 年前 | ||
| 4 年前 | ||
| 25 天前 | ||
| 3 年前 | ||
| 3 个月前 |