kernel/fs/fscache · openEuler/kernel - AtomGit

Zizhi Wo · fscache: clean up for fscache_clear_volume_priv

b49c0e63 · created on December 17, 2024

File	Last commit	Last updated
Kconfig	fscache: remove FSCACHE_OLD_API Kconfig option Commit 01491a756578 ("fscache, cachefiles: Disable configuration") added the FSCACHE_OLD_API configuration when rewritten. Now, it's not used any more. Remove it. Signed-off-by: Yue Hu <huyue2@coolpad.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: linux-cachefs@redhat.com Link: https://listman.redhat.com/archives/linux-cachefs/2022-March/006647.html # v1	4 years ago
Makefile	fscache: Provide a means to begin an operation Provide a function to begin a read operation: int fscache_begin_read_operation(struct netfs_cache_resources *cres, struct fscache_cookie *cookie) This is primarily intended to be called by network filesystems on behalf of netfslib, but may also be called to use the I/O access functions directly. It attaches the resources required by the cache to cres struct from the supplied cookie. This holds access to the cache behind the cookie for the duration of the operation and forces cache withdrawal and cookie invalidation to perform synchronisation on the operation. cres->inval_counter is set from the cookie at this point so that it can be compared at the end of the operation. Note that this does not guarantee that the cache state is fully set up and able to perform I/O immediately; looking up and creation may be left in progress in the background. The operations intended to be called by the network filesystem, such as reading and writing, are expected to wait for the cookie to move to the correct state. This will, however, potentially sleep, waiting for a certain minimum state to be set or for operations such as invalidate to advance far enough that I/O can resume. Also provide a function for the cache to call to wait for the cache object to get to a state where it can be used for certain things: bool fscache_wait_for_operation(struct netfs_cache_resources *cres, enum fscache_want_stage stage); This looks at the cache resources provided by the begin function and waits for them to get to an appropriate stage. There's a choice of wanting just some parameters (FSCACHE_WANT_PARAM) or the ability to do I/O (FSCACHE_WANT_READ or FSCACHE_WANT_WRITE).
Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com Link: https://lore.kernel.org/r/163819603692.215744.146724961588817028.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/163906910672.143852.13856103384424986357.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/163967110245.1823006.2239170567540431836.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/164021513617.640689.16627329360866150606.stgit@warthog.procyon.org.uk/ # v4	4 years ago
cache.c	fscache: Add the synchronous waiting mechanism for the volume unhash in erofs ondemand mode hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT -------------------------------- This patch adds the synchronous waiting mechanism for the volume unhash in erofs ondemand mode. There are 2 main reasons: 1) Currently a volume's unhash relies on its own reference count dropping to 0, which partly depends on user space releasing the fd. That is, if user space fails to close the fd properly, the volume cannot be unhashed. The next time a volume with the same name is mounted, the system waits for the release of the previous volume. However, the write semaphore of the corresponding superblock is not released, so when the superblocks are traversed a hung task occurs because the read semaphore cannot be obtained: [mount] vfs_get_super, sget_fc, alloc_super, down_write_nested (pins sb->s_umount), list_add_tail(&s->s_list, &super_blocks), erofs_fc_fill_super, fscache_hash_volume (waits for the volume); [sync] ksys_sync, iterate_supers, super_lock, ..., down_read (hung task). 2) During the umount process, the object generates the cache directory entry (cachefiles_commit_tmpfile), but this is not synchronized. After umount is complete, the user may see that the cache directory has not been generated. When inuse() is called from user mode, the cache directory cannot be found. The solution: 1) For the erofs on-demand loading scenario, "n_hash_cookies" has been introduced in "struct fscache_volume"; it is increased whenever a child cookie is hashed and decreased when a cookie is unhashed. When it returns to zero, the volume is woken up to unhash. The FSCACHE_CACHE_SYNC_VOLUME_UNHASH flag is introduced in "struct fscache_cache" to indicate whether this feature is enabled.
2) cachefiles_free_volume() needs to be called to ensure that the next mount succeeds; otherwise -ENODATA will be returned because cache_priv will not be created in the new volume and the object may not be initialized. To prevent a use-after-free caused by kfree(volume)/kfree(cache), "ref" and "lock" have been introduced in "struct cachefiles_volume". There are three benefits to this: 1) The unhash of the volume does not depend on whether the fd is handled correctly by userland. If the fd is not closed after umount, it does not matter if it continues to be used, because object->file is already NULL, as cachefiles_clean_up_object() is called before fscache_unhash_cookie(). 2) The cache directory is guaranteed to be generated before the umount completes unless the process is interrupted. This is because cachefiles_commit_tmpfile() is called before fscache_unhash_cookie(). 3) Before this patch, calling inuse() on a cache entry after umount could still return -EBUSY, because the order of umount and cachefiles_put_directory() is indeterminate. Now, thanks to volume->lock, calling cull/inuse after umount has finished will not return an error. Fixes: 62ab63352350 ("fscache: Implement volume registration") Signed-off-by: Zizhi Wo <wozizhi@huawei.com>	1 year ago
cookie.c	fscache: Add the synchronous waiting mechanism for the volume unhash in erofs ondemand mode hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT -------------------------------- This patch adds the synchronous waiting mechanism for the volume unhash in erofs ondemand mode. There are 2 main reasons: 1) Currently a volume's unhash relies on its own reference count dropping to 0, which partly depends on user space releasing the fd. That is, if user space fails to close the fd properly, the volume cannot be unhashed. The next time a volume with the same name is mounted, the system waits for the release of the previous volume. However, the write semaphore of the corresponding superblock is not released, so when the superblocks are traversed a hung task occurs because the read semaphore cannot be obtained: [mount] vfs_get_super, sget_fc, alloc_super, down_write_nested (pins sb->s_umount), list_add_tail(&s->s_list, &super_blocks), erofs_fc_fill_super, fscache_hash_volume (waits for the volume); [sync] ksys_sync, iterate_supers, super_lock, ..., down_read (hung task). 2) During the umount process, the object generates the cache directory entry (cachefiles_commit_tmpfile), but this is not synchronized. After umount is complete, the user may see that the cache directory has not been generated. When inuse() is called from user mode, the cache directory cannot be found. The solution: 1) For the erofs on-demand loading scenario, "n_hash_cookies" has been introduced in "struct fscache_volume"; it is increased whenever a child cookie is hashed and decreased when a cookie is unhashed. When it returns to zero, the volume is woken up to unhash. The FSCACHE_CACHE_SYNC_VOLUME_UNHASH flag is introduced in "struct fscache_cache" to indicate whether this feature is enabled.
2) cachefiles_free_volume() needs to be called to ensure that the next mount succeeds; otherwise -ENODATA will be returned because cache_priv will not be created in the new volume and the object may not be initialized. To prevent a use-after-free caused by kfree(volume)/kfree(cache), "ref" and "lock" have been introduced in "struct cachefiles_volume". There are three benefits to this: 1) The unhash of the volume does not depend on whether the fd is handled correctly by userland. If the fd is not closed after umount, it does not matter if it continues to be used, because object->file is already NULL, as cachefiles_clean_up_object() is called before fscache_unhash_cookie(). 2) The cache directory is guaranteed to be generated before the umount completes unless the process is interrupted. This is because cachefiles_commit_tmpfile() is called before fscache_unhash_cookie(). 3) Before this patch, calling inuse() on a cache entry after umount could still return -EBUSY, because the order of umount and cachefiles_put_directory() is indeterminate. Now, thanks to volume->lock, calling cull/inuse after umount has finished will not return an error. Fixes: 62ab63352350 ("fscache: Implement volume registration") Signed-off-by: Zizhi Wo <wozizhi@huawei.com>	1 year ago
internal.h	netfs, fscache: export fscache_put_volume() and add fscache_try_get_volume() stable inclusion from stable-v6.6.42 commit 5c5a577a2cc62969450650a076be7183339c0857 bugzilla: https://gitee.com/openeuler/kernel/issues/IAHMJO Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=5c5a577a2cc62969450650a076be7183339c0857 -------------------------------- [ Upstream commit 85b08b31a22b481ec6528130daf94eee4452e23f ] Export fscache_put_volume() and add fscache_try_get_volume() helper function to allow cachefiles to get/put fscache_volume via linux/fscache-cache.h. Signed-off-by: Baokun Li <libaokun1@huawei.com> Link: https://lore.kernel.org/r/20240628062930.2467993-2-libaokun@huaweicloud.com Signed-off-by: Christian Brauner <brauner@kernel.org> Stable-dep-of: 522018a0de6b ("cachefiles: fix slab-use-after-free in fscache_withdraw_volume()") Stable-dep-of: 5d8f80578907 ("cachefiles: fix slab-use-after-free in cachefiles_withdraw_cookie()") Signed-off-by: Baokun Li <libaokun1@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: ZhangPeng <zhangpeng362@huawei.com>	1 year ago
io.c	use less confusing names for iov_iter direction initializers READ/WRITE proved to be actively confusing - the meanings are "data destination, as used with read(2)" and "data source, as used with write(2)", but people keep interpreting those as "we read data from it" and "we write data to it", i.e. exactly the wrong way. Call them ITER_DEST and ITER_SOURCE - at least that is harder to misinterpret... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	3 years ago
main.c	fscache: delete fscache_cookie_lru_timer when fscache exits to avoid UAF stable inclusion from stable-v6.6.52 commit e0d724932ad12e3528f4ce97fc0f6078d0cce4bc category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/IARWDH CVE: CVE-2024-46786 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/fs/fscache?h=v6.6.52&id=e0d724932ad12e3528f4ce97fc0f6078d0cce4bc -------------------------------- commit 72a6e22c604c95ddb3b10b5d3bb85b6ff4dbc34f upstream. The fscache_cookie_lru_timer is initialized when the fscache module is inserted, but is not deleted when the fscache module is removed. If timer_reduce() is called before removing the fscache module, the fscache_cookie_lru_timer will be added to the timer list of the current cpu. Afterwards, a use-after-free will be triggered in the softIRQ after removing the fscache module, as follows: ================================================================== BUG: unable to handle page fault for address: fffffbfff803c9e9 PF: supervisor read access in kernel mode PF: error_code(0x0000) - not-present page PGD 21ffea067 P4D 21ffea067 PUD 21ffe6067 PMD 110a7c067 PTE 0 Oops: Oops: 0000 [#1] PREEMPT SMP KASAN PTI CPU: 1 UID: 0 PID: 0 Comm: swapper/1 Tainted: G W 6.11.0-rc3 #855 Tainted: [W]=WARN RIP: 0010:__run_timer_base.part.0+0x254/0x8a0 Call Trace: <IRQ> tmigr_handle_remote_up+0x627/0x810 __walk_groups.isra.0+0x47/0x140 tmigr_handle_remote+0x1fa/0x2f0 handle_softirqs+0x180/0x590 irq_exit_rcu+0x84/0xb0 sysvec_apic_timer_interrupt+0x6e/0x90 </IRQ> <TASK> asm_sysvec_apic_timer_interrupt+0x1a/0x20 RIP: 0010:default_idle+0xf/0x20 default_idle_call+0x38/0x60 do_idle+0x2b5/0x300 cpu_startup_entry+0x54/0x60 start_secondary+0x20d/0x280 common_startup_64+0x13e/0x148 </TASK> Modules linked in: [last unloaded: netfs] ================================================================== Therefore delete fscache_cookie_lru_timer when removing the fscache module.
Fixes: 12bb21a29c19 ("fscache: Implement cookie user counting and resource pinning") Cc: stable@kernel.org Signed-off-by: Baokun Li <libaokun1@huawei.com> Link: https://lore.kernel.org/r/20240826112056.2458299-1-libaokun@huaweicloud.com Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>	1 year ago
proc.c	fscache: Implement cookie registration Add functions to the fscache API to allow data file cookies to be acquired and relinquished by the network filesystem. It is intended that the filesystem will create such cookies per-inode under a volume. To request a cookie, the filesystem should call: struct fscache_cookie *fscache_acquire_cookie(struct fscache_volume *volume, u8 advice, const void *index_key, size_t index_key_len, const void *aux_data, size_t aux_data_len, loff_t object_size) The filesystem must first have created a volume cookie, which is passed in here. If it passes in NULL then the function will just return a NULL cookie. A binary key should be passed in index_key and is of size index_key_len. This is saved in the cookie and is used to locate the associated data in the cache. A coherency data buffer of size aux_data_len will be allocated and initialised from the buffer pointed to by aux_data. This is used to validate cache objects when they're opened and is stored on disk with them when they're committed. The data is stored in the cookie and will be updateable by various functions in later patches. The object_size must also be given. This is also used to perform a coherency check and to size the backing storage appropriately. This function disallows a cookie from being acquired twice in parallel, though it will cause the second user to wait if the first is busy relinquishing its cookie. When a network filesystem has finished with a cookie, it should call: void fscache_relinquish_cookie(struct fscache_cookie *cookie, bool retire) If retire is true, any backing data will be discarded immediately. Changes ======= ver #3: - fscache_hash()'s size parameter is now in bytes. Use __le32 as the unit to round up to. - When comparing cookies, simply see if the attributes are the same rather than subtracting them to produce a strcmp-style return[1]. - Add a check to see if the cookie is still hashed at the point of freeing.
ver #2: - Don't hold n_accesses elevated whilst cache is bound to a cookie, but rather add a flag that prevents the state machine from being queued when n_accesses reaches 0. - Remove the unused cookie pointer field from the fscache_acquire tracepoint. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com Link: https://lore.kernel.org/r/CAHk-=whtkzB446+hX0zdLsdcUJsJ=8_-0S1mE_R+YurThfUbLA@mail.gmail.com/ [1] Link: https://lore.kernel.org/r/163819590658.215744.14934902514281054323.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/163906891983.143852.6219772337558577395.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/163967088507.1823006.12659006350221417165.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/164021498432.640689.12743483856927722772.stgit@warthog.procyon.org.uk/ # v4	4 years ago
stats.c	fscache, cachefiles: Display stat of culling events Add a stat counter of culling events whereby the cache backend culls a file to make space (when asked by cachefilesd in this case) and display in /proc/fs/fscache/stats. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com Link: https://lore.kernel.org/r/163819654165.215744.3797804661644212436.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/163906961387.143852.9291157239960289090.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/163967168266.1823006.14436200166581605746.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/164021567619.640689.4339228906248763197.stgit@warthog.procyon.org.uk/ # v4	4 years ago
volume.c	fscache: clean up for fscache_clear_volume_priv hulk inclusion category: cleanup bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT -------------------------------- In the previous patch, we introduced mutex_lock to prevent concurrency between cachefiles_free_volume() and cachefiles_withdraw_volume(). Now in fscache_clear_volume_priv(), there is no need to increase fscache_volume->n_accesses to prevent concurrency. Remove the related code. Signed-off-by: Zizhi Wo <wozizhi@huawei.com>	1 year ago
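
The cache.c/cookie.c commit above ties the volume unhash to a count of hashed child cookies instead of to the volume's own reference count. The following is a rough, single-threaded userspace model of that idea, not the kernel code. Only the name n_hash_cookies comes from the commit message; the struct, the other fields and every function name are hypothetical, and the real patch sleeps and wakes a waiter where this sketch merely records state.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct fscache_volume. Only
 * n_hash_cookies is named in the commit message; the other fields are
 * invented for this sketch. */
struct model_volume {
	int  n_hash_cookies;  /* child cookies currently hashed */
	bool unhash_waiting;  /* an unmount is waiting to unhash the volume */
	bool hashed;          /* volume is still in the (modelled) hash table */
};

static void model_volume_init(struct model_volume *v)
{
	v->n_hash_cookies = 0;
	v->unhash_waiting = false;
	v->hashed = true;
}

/* A child cookie was hashed under the volume. */
static void model_cookie_hashed(struct model_volume *v)
{
	v->n_hash_cookies++;
}

/* A child cookie was unhashed; the last one lets a waiting unhash finish. */
static void model_cookie_unhashed(struct model_volume *v)
{
	assert(v->n_hash_cookies > 0);
	if (--v->n_hash_cookies == 0 && v->unhash_waiting) {
		v->unhash_waiting = false;
		v->hashed = false;
	}
}

/* Unmount path: unhash now if no child cookie is hashed, otherwise record
 * that an unhash is pending. Returns true if it completed immediately. */
static bool model_volume_request_unhash(struct model_volume *v)
{
	if (v->n_hash_cookies == 0) {
		v->hashed = false;
		return true;
	}
	v->unhash_waiting = true;
	return false;
}
```

In this model the volume leaves the hash table as soon as the last child cookie is unhashed, regardless of whether anything else still holds a reference to it. That is the property the commit message relies on to stop a later mount of the same volume name from waiting indefinitely.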