kernel/fs/cachefiles · openEuler/kernel - AtomGit

ZZizhi Wocachefiles: Fix the incorrect return value in cachefiles_ondemand_fd_write_iter()

f2009c26创建于 2025年7月3日历史提交

文件	最后提交记录	最后更新时间
Kconfig	cachefiles: notify the user daemon when looking up cookie Fscache/CacheFiles used to serve as a local cache for a remote networking fs. A new on-demand read mode will be introduced for CacheFiles, which can boost the scenario where on-demand read semantics are needed, e.g. container image distribution. The essential difference between these two modes is seen when a cache miss occurs: In the original mode, the netfs will fetch the data from the remote server and then write it to the cache file; in on-demand read mode, fetching the data and writing it into the cache is delegated to a user daemon. As the first step, notify the user daemon when looking up cookie. In this case, an anonymous fd is sent to the user daemon, through which the user daemon can write the fetched data to the cache file. Since the user daemon may move the anonymous fd around, e.g. through dup(), an object ID uniquely identifying the cache file is also attached. Also add one advisory flag (FSCACHE_ADV_WANT_CACHE_SIZE) suggesting that the cache file size shall be retrieved at runtime. This helps the scenario where one cache file contains multiple netfs files, e.g. for the purpose of deduplication. In this case, netfs itself has no idea the size of the cache file, whilst the user daemon should give the hint on it. Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com> Link: https://lore.kernel.org/r/20220509074028.74954-3-jefflexu@linux.alibaba.com Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>	4 年前
Makefile	cachefiles: notify the user daemon when looking up cookie Fscache/CacheFiles used to serve as a local cache for a remote networking fs. A new on-demand read mode will be introduced for CacheFiles, which can boost the scenario where on-demand read semantics are needed, e.g. container image distribution. The essential difference between these two modes is seen when a cache miss occurs: In the original mode, the netfs will fetch the data from the remote server and then write it to the cache file; in on-demand read mode, fetching the data and writing it into the cache is delegated to a user daemon. As the first step, notify the user daemon when looking up cookie. In this case, an anonymous fd is sent to the user daemon, through which the user daemon can write the fetched data to the cache file. Since the user daemon may move the anonymous fd around, e.g. through dup(), an object ID uniquely identifying the cache file is also attached. Also add one advisory flag (FSCACHE_ADV_WANT_CACHE_SIZE) suggesting that the cache file size shall be retrieved at runtime. This helps the scenario where one cache file contains multiple netfs files, e.g. for the purpose of deduplication. In this case, netfs itself has no idea the size of the cache file, whilst the user daemon should give the hint on it. Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com> Link: https://lore.kernel.org/r/20220509074028.74954-3-jefflexu@linux.alibaba.com Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>	4 年前
cache.c	fscache: Add the synchronous waiting mechanism for the volume unhash in erofs ondemand mode hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT -------------------------------- This patch adds the synchronous waiting mechanism for the volume unhash in erofs ondemand mode. There are 2 main reasons: 1) Currently volume's unhash relies on its own reference count to be 0, partly depending on the user state release fd. That is, if the user state fails to turn off fd properly, then the volume cannot be unhashed. Next time when mount the specified volume with the same name, the system will waits for the release of the previous volume. However, the write semaphore of the corresponding superblock is not released. When traversing the sb, hung task will occur because the read semaphore cannot be obtained: [mount] [sync] vfs_get_super ksys_sync sget_fc iterate_supers alloc_super down_write_nested ----pin sb->s_umount list_add_tail(&s->s_list, &super_blocks) erofs_fc_fill_super super_lock ... down_read ----hungtask fscache_hash_volume ----wait for volume 2) During the umount process, object generates the cache directory entry (cachefiles_commit_tmpfile), but it is not synchronized. After umount is complete, the user may see that the cache directory is not generated. When inuse() is called in user mode, the cache directory cannot be found. The solution: 1) For the erofs on-demand loading scenario, "n_hash_cookies" has been introduced in "struct fscache_volume", increased whenever there is a child cookie hashed and decreased if the cookie has unhashed. When it returns zero, the volume is awakened with unhash. FSCACHE_CACHE_SYNC_VOLUME_UNHASH flag is introduced in "struct fscache_cache", which is used to indicate whether this feature is enabled. 2) cachefiles_free_volume() need to be called to ensure the next mount successful, otherwise -ENODATA will be returned because cache_priv will not be created in new volume and the object may not be initialized. To prevent use-after-free issue caused by the kfree(volume)/kfree(cache), "ref" and "lock" have been introduced in "struct cachefiles_volume". There are three benefits to this: 1) The unhash of volume does not depend on whether the fd is handled correctly by the userland. If fd is not closed after umount, it does not matter if it continues to be used, because object->file is already NULL as cachefiles_clean_up_object() is called before fscache_unhash_cookie(). 2) The cache directory can be guaranteed to be generated before the umount completes unless the process is interruped. This is because cachefiles_commit_tmpfile() is called before fscache_unhash_cookie(). 3) Before this patch, it is possible that after umount, calling inuse() on a cache entry may still shows -EBUSY, because the order of umount and cachefiles_put_directory() is indeterminate. Now thanks to volume->lock, calling cull/inuse after umount is finished will not return an error. Fixes: 62ab63352350 ("fscache: Implement volume registration") Signed-off-by: Zizhi Wo <wozizhi@huawei.com>	1 年前
daemon.c	cachefiles: Parse the "secctx" immediately stable inclusion from stable-v6.6.74 commit 933689000dff37b855a8d4ffc67d3c43ed8a17e4 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IC9PNO Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=933689000dff37b855a8d4ffc67d3c43ed8a17e4 -------------------------------- [ Upstream commit e5a8b6446c0d370716f193771ccacf3260a57534 ] Instead of storing an opaque string, call security_secctx_to_secid() right in the "secctx" command handler and store only the numeric "secid". This eliminates an unnecessary string allocation and allows the daemon to receive errors when writing the "secctx" command instead of postponing the error to the "bind" command handler. For example, if the kernel was built without `CONFIG_SECURITY`, "bind" will return `EOPNOTSUPP`, but the daemon doesn't know why. With this patch, the "secctx" will instead return `EOPNOTSUPP` which is the right context for this error. This patch adds a boolean flag `have_secid` because I'm not sure if we can safely assume that zero is the special secid value for "not set". This appears to be true for SELinux, Smack and AppArmor, but since this attribute is not documented, I'm unable to derive a stable guarantee for that. Signed-off-by: Max Kellermann <max.kellermann@ionos.com> Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/20241209141554.638708-1-max.kellermann@ionos.com/ Link: https://lore.kernel.org/r/20241213135013.2964079-6-dhowells@redhat.com Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Conflicts: fs/cachefiles/daemon.c [Different error codes do not affect the patch.] Signed-off-by: Zizhi Wo <wozizhi@huawei.com>	1 年前
error_inject.c	fs/cachefiles: simplify one-level sysctl registration for cachefiles_sysctls There is no need to declare an extra tables to just create directory, this can be easily be done with a prefix path with register_sysctl(). Simplify this registration. Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>	3 年前
interface.c	erofs/cachefiles: Change the unmark inuse sequence in erofs ondemand mode hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IC349R -------------------------------- In the current erofs ondemand loading mode, the close request is sent to userspace before calling the unmark inuse interface. This means userspace cannot reliably perform culling immediately upon receiving the close req. To address this, a new cookie flag is introduced to allow calling cachefiles_unmark_inode_in_use() before sending the close request. This enables userspace to safely perform culling after the notification. Signed-off-by: Zizhi Wo <wozizhi@huawei.com>	1 年前
internal.h	cachefiles: Parse the "secctx" immediately stable inclusion from stable-v6.6.74 commit 933689000dff37b855a8d4ffc67d3c43ed8a17e4 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IC9PNO Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=933689000dff37b855a8d4ffc67d3c43ed8a17e4 -------------------------------- [ Upstream commit e5a8b6446c0d370716f193771ccacf3260a57534 ] Instead of storing an opaque string, call security_secctx_to_secid() right in the "secctx" command handler and store only the numeric "secid". This eliminates an unnecessary string allocation and allows the daemon to receive errors when writing the "secctx" command instead of postponing the error to the "bind" command handler. For example, if the kernel was built without `CONFIG_SECURITY`, "bind" will return `EOPNOTSUPP`, but the daemon doesn't know why. With this patch, the "secctx" will instead return `EOPNOTSUPP` which is the right context for this error. This patch adds a boolean flag `have_secid` because I'm not sure if we can safely assume that zero is the special secid value for "not set". This appears to be true for SELinux, Smack and AppArmor, but since this attribute is not documented, I'm unable to derive a stable guarantee for that. Signed-off-by: Max Kellermann <max.kellermann@ionos.com> Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/20241209141554.638708-1-max.kellermann@ionos.com/ Link: https://lore.kernel.org/r/20241213135013.2964079-6-dhowells@redhat.com Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Conflicts: fs/cachefiles/daemon.c [Different error codes do not affect the patch.] Signed-off-by: Zizhi Wo <wozizhi@huawei.com>	1 年前
io.c	cachefiles: Fix the incorrect return value in cachefiles_ondemand_fd_write_iter() hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/ICJHNT -------------------------------- In __cachefiles_write(), if the return value of the write operation > 0, it is set to 0. This makes it impossible to distinguish scenarios where a partial write has occurred, and causes the outer function cachefiles_ondemand_fd_write_iter() to unconditionally return the full length specified by user space. Fix it by modifying "ret" to reflect the actual number of bytes written. Furthermore, returning a value greater than 0 from __cachefiles_write() does not affect other call paths, such as fscache_write_to_cache() and fscache_write(). Fixes: c8383054506c ("cachefiles: notify the user daemon when looking up cookie") Signed-off-by: Zizhi Wo <wozizhi@huawei.com>	11 个月前
key.c	cachefiles: Implement key to filename encoding Implement a function to encode a binary cookie key as something that can be used as a filename. Four options are considered: (1) All printable chars with no '/' characters. Prepend a 'D' to indicate the encoding but otherwise use as-is. (2) Appears to be an array of __be32. Encode as 'S' plus a list of hex-encoded 32-bit ints separated by commas. If a number is 0, it is rendered as "" instead of "0". (3) Appears to be an array of __le32. Encoded as (2) but with a 'T' encoding prefix. (4) Encoded as base64 with an 'E' prefix plus a second char indicating how much padding is involved. A non-standard base64 encoding is used because '/' cannot be used in the encoded form. If (1) is not possible, whichever of (2), (3) or (4) produces the shortest string is selected (hex-encoding a number may be less dense than base64 encoding it). Note that the prefix characters have to be selected from the set [DEIJST@] lest cachefilesd remove the files because it recognise the name. Changes ======= ver #2: - Fix a short allocation that didn't allow for a string terminator[1] Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com Link: https://lore.kernel.org/r/bcefb8f2-576a-b3fc-cc29-89808ebfd7c1@linux.alibaba.com/ [1] Link: https://lore.kernel.org/r/163819640393.215744.15212364106412961104.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/163906940529.143852.17352132319136117053.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/163967149827.1823006.6088580775428487961.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/164021549223.640689.14762875188193982341.stgit@warthog.procyon.org.uk/ # v4	4 年前
main.c	cachefiles: Implement object lifecycle funcs Implement allocate, get, see and put functions for the cachefiles_object struct. The members of the struct we're going to need are also added. Additionally, implement a lifecycle tracepoint. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com Link: https://lore.kernel.org/r/163819639457.215744.4600093239395728232.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/163906939569.143852.3594314410666551982.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/163967148857.1823006.6332962598220464364.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/164021547762.640689.8422781599594931000.stgit@warthog.procyon.org.uk/ # v4	4 年前
namei.c	cachefiles: Clean up in cachefiles_commit_tmpfile() mainline inclusion from mainline-v6.13-rc1 commit 09ecf8f5505465b5527a39dff4b159af62306eee category: cleanup bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=09ecf8f5505465b5527a39dff4b159af62306eee -------------------------------- Currently, cachefiles_commit_tmpfile() will only be called if object->flags is set to CACHEFILES_OBJECT_USING_TMPFILE. Only cachefiles_create_file() and cachefiles_invalidate_cookie() set this flag. Both of these functions replace object->file with the new tmpfile, and both are called by fscache_cookie_state_machine(), so there are no concurrency issues. So the equation "d_backing_inode(dentry) == file_inode(object->file)" in cachefiles_commit_tmpfile() will never hold true according to the above conditions. This patch removes this part of the redundant code and does not involve any other logical changes. Signed-off-by: Zizhi Wo <wozizhi@huawei.com> Link: https://lore.kernel.org/r/20241107110649.3980193-4-wozizhi@huawei.com Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Christian Brauner <brauner@kernel.org>	1 年前
ondemand.c	cachefiles: Fix the incorrect return value in cachefiles_ondemand_fd_write_iter() hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/ICJHNT -------------------------------- In __cachefiles_write(), if the return value of the write operation > 0, it is set to 0. This makes it impossible to distinguish scenarios where a partial write has occurred, and causes the outer function cachefiles_ondemand_fd_write_iter() to unconditionally return the full length specified by user space. Fix it by modifying "ret" to reflect the actual number of bytes written. Furthermore, returning a value greater than 0 from __cachefiles_write() does not affect other call paths, such as fscache_write_to_cache() and fscache_write(). Fixes: c8383054506c ("cachefiles: notify the user daemon when looking up cookie") Signed-off-by: Zizhi Wo <wozizhi@huawei.com>	11 个月前
security.c	cachefiles: Parse the "secctx" immediately stable inclusion from stable-v6.6.74 commit 933689000dff37b855a8d4ffc67d3c43ed8a17e4 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IC9PNO Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=933689000dff37b855a8d4ffc67d3c43ed8a17e4 -------------------------------- [ Upstream commit e5a8b6446c0d370716f193771ccacf3260a57534 ] Instead of storing an opaque string, call security_secctx_to_secid() right in the "secctx" command handler and store only the numeric "secid". This eliminates an unnecessary string allocation and allows the daemon to receive errors when writing the "secctx" command instead of postponing the error to the "bind" command handler. For example, if the kernel was built without `CONFIG_SECURITY`, "bind" will return `EOPNOTSUPP`, but the daemon doesn't know why. With this patch, the "secctx" will instead return `EOPNOTSUPP` which is the right context for this error. This patch adds a boolean flag `have_secid` because I'm not sure if we can safely assume that zero is the special secid value for "not set". This appears to be true for SELinux, Smack and AppArmor, but since this attribute is not documented, I'm unable to derive a stable guarantee for that. Signed-off-by: Max Kellermann <max.kellermann@ionos.com> Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/20241209141554.638708-1-max.kellermann@ionos.com/ Link: https://lore.kernel.org/r/20241213135013.2964079-6-dhowells@redhat.com Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Conflicts: fs/cachefiles/daemon.c [Different error codes do not affect the patch.] Signed-off-by: Zizhi Wo <wozizhi@huawei.com>	1 年前
volume.c	fscache: Add the synchronous waiting mechanism for the volume unhash in erofs ondemand mode hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT -------------------------------- This patch adds the synchronous waiting mechanism for the volume unhash in erofs ondemand mode. There are 2 main reasons: 1) Currently volume's unhash relies on its own reference count to be 0, partly depending on the user state release fd. That is, if the user state fails to turn off fd properly, then the volume cannot be unhashed. Next time when mount the specified volume with the same name, the system will waits for the release of the previous volume. However, the write semaphore of the corresponding superblock is not released. When traversing the sb, hung task will occur because the read semaphore cannot be obtained: [mount] [sync] vfs_get_super ksys_sync sget_fc iterate_supers alloc_super down_write_nested ----pin sb->s_umount list_add_tail(&s->s_list, &super_blocks) erofs_fc_fill_super super_lock ... down_read ----hungtask fscache_hash_volume ----wait for volume 2) During the umount process, object generates the cache directory entry (cachefiles_commit_tmpfile), but it is not synchronized. After umount is complete, the user may see that the cache directory is not generated. When inuse() is called in user mode, the cache directory cannot be found. The solution: 1) For the erofs on-demand loading scenario, "n_hash_cookies" has been introduced in "struct fscache_volume", increased whenever there is a child cookie hashed and decreased if the cookie has unhashed. When it returns zero, the volume is awakened with unhash. FSCACHE_CACHE_SYNC_VOLUME_UNHASH flag is introduced in "struct fscache_cache", which is used to indicate whether this feature is enabled. 2) cachefiles_free_volume() need to be called to ensure the next mount successful, otherwise -ENODATA will be returned because cache_priv will not be created in new volume and the object may not be initialized. To prevent use-after-free issue caused by the kfree(volume)/kfree(cache), "ref" and "lock" have been introduced in "struct cachefiles_volume". There are three benefits to this: 1) The unhash of volume does not depend on whether the fd is handled correctly by the userland. If fd is not closed after umount, it does not matter if it continues to be used, because object->file is already NULL as cachefiles_clean_up_object() is called before fscache_unhash_cookie(). 2) The cache directory can be guaranteed to be generated before the umount completes unless the process is interruped. This is because cachefiles_commit_tmpfile() is called before fscache_unhash_cookie(). 3) Before this patch, it is possible that after umount, calling inuse() on a cache entry may still shows -EBUSY, because the order of umount and cachefiles_put_directory() is indeterminate. Now thanks to volume->lock, calling cull/inuse after umount is finished will not return an error. Fixes: 62ab63352350 ("fscache: Implement volume registration") Signed-off-by: Zizhi Wo <wozizhi@huawei.com>	1 年前
xattr.c	cachefiles: Fix non-taking of sb_writers around set/removexattr stable inclusion from stable-v6.6.54 commit 81b048b9484bf8b3c0ad6e901a6b79fb941173b0 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IAZ3K2 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=81b048b9484bf8b3c0ad6e901a6b79fb941173b0 -------------------------------- [ Upstream commit 80887f31672970abae3aaa9cf62ac72a124e7c89 ] Unlike other vfs_xxxx() calls, vfs_setxattr() and vfs_removexattr() don't take the sb_writers lock, so the caller should do it for them. Fix cachefiles to do this. Fixes: 9ae326a69004 ("CacheFiles: A cache that backs onto a mounted filesystem") Signed-off-by: David Howells <dhowells@redhat.com> cc: Christian Brauner <brauner@kernel.org> cc: Gao Xiang <xiang@kernel.org> cc: netfs@lists.linux.dev cc: linux-erofs@lists.ozlabs.org cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/20240814203850.2240469-3-dhowells@redhat.com/ # v2 Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Wen Zhiwei <wenzhiwei@kylinos.cn>	1 年前