文件最后提交记录最后更新时间
partitions: mac: fix handling of bogus partition table mainline inclusion from mainline-v6.10-rc2 commit 80e648042e512d5a767da251d44132553fe04ae0 category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/IBPC5K CVE: CVE-2025-21772 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=80e648042e512d5a767da251d44132553fe04ae0 -------------------------------- Fix several issues in partition probing: - The bailout for a bad partoffset must use put_dev_sector(), since the preceding read_part_sector() succeeded. - If the partition table claims a silly sector size like 0xfff bytes (which results in partition table entries straddling sector boundaries), bail out instead of accessing out-of-bounds memory. - We must not assume that the partition table contains proper NUL termination - use strnlen() and strncmp() instead of strlen() and strcmp(). Cc: stable@vger.kernel.org Signed-off-by: Jann Horn <jannh@google.com> Link: https://lore.kernel.org/r/20250214-partition-mac-v1-1-c1c626dffbd5@google.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Long Li <leo.lilong@huawei.com>1 年前
block: support to dispatch bio asynchronously hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IA5AEP CVE: NA -------------------------------- In certain environments, specific CPUs handle a large number of tasks and become bottlenecks, affecting overall system performance. This commit introduces a new feature that enables asynchronous I/O dispatch to designated CPUs, thereby relieving the pressure on the busy CPUs. Signed-off-by: Li Nan <linan122@huawei.com>1 年前
treewide: replace '---help---' in Kconfig files with 'help' Since commit 84af7a6194e4 ("checkpatch: kconfig: prefer 'help' over '---help---'"), the number of '---help---' has been gradually decreasing, but there are still more than 2400 instances. This commit finishes the conversion. While I touched the lines, I also fixed the indentation. There are a variety of indentation styles found. a) 4 spaces + '---help---' b) 7 spaces + '---help---' c) 8 spaces + '---help---' d) 1 space + 1 tab + '---help---' e) 1 tab + '---help---' (correct indentation) f) 1 tab + 1 space + '---help---' g) 1 tab + 2 spaces + '---help---' In order to convert all of them to 1 tab + 'help', I ran the following commend: $ find . -name 'Kconfig*' | xargs sed -i 's/^[[:space:]]*---help---/\thelp/' Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>5 年前
Makefile: Exclude false positive warning options for Clang hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I9SOCR --------------------------- Fix three false positive warning options for Bisheng Clang. Exclude -Walign-mismatch for: block/blk-mq.c:767:51: warning: passing 8-byte aligned argument to 32-byte aligned parameter 2 of 'smp_call_function_single_async' may result in an unaligned pointer access [-Walign-mismatch] smp_call_function_single_async(rq->mq_ctx->cpu, &rq->csd); ^ Exclude -Wbitwise-instead-of-logical for: fs/namei.c:2053:13: warning: use of bitwise '|' with boolean operands [-Wbitwise-instead-of-logical] } while (!(has_zero(a, &adata, &constants) | has_zero(b, &bdata,... ~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ || fs/namei.c:2053:13: note: cast one or both operands to int to silence this warning Exclude -Wmacro-redefined for: ./arch/arm64/include/asm/elf.h:218:9: warning: 'COMPAT_ARCH_DLINFO' macro redefined [-Wmacro-redefined] #define COMPAT_ARCH_DLINFO \ ^ arch/arm64/kernel/binfmt_elf32.c:22:9: note: previous definition is here #define COMPAT_ARCH_DLINFO Signed-off-by: Chen Zhongjin <chenzhongjin@huawei.com>2 年前
block/badblocks: fix badblocks loss when badblocks combine hulk inclusion category: bugfix bugzilla: 188569, https://gitee.com/openeuler/kernel/issues/I6ZG5B CVE: NA -------------------------------- badblocks will loss if we set it as below: # echo 1 1 > bad_blocks # echo 3 1 > bad_blocks # echo 1 5 > bad_blocks # cat bad_blocks 1 3 we will combine badblocks if there is an intersection between p[lo] and p[hi] in badblocks_set(). The end of new badblocks is p[hi]'s end now. but p[lo] may cross p[hi] and new end should be the larger of p[lo] and p[hi]. lo: |------------------------| hi: |--------| Fixes: 9e0e252a048b ("badblocks: Add core badblock management code") Signed-off-by: Li Nan <linan122@huawei.com> Reviewed-by: Hou Tao <houtao1@huawei.com> (cherry picked from commit e35a77628aa041f9d94921992ba9e8d3c6dbe8ba)2 年前
block, bfq: fix uaf for bfqq in bic_set_bfqq() stable inclusion from stable-v5.10.175 commit 7f77f3dab5066a7c9da73d72d1eee895ff84a8d5 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I8711T Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=7f77f3dab5066a7c9da73d72d1eee895ff84a8d5 -------------------------------- [ Upstream commit b600de2d7d3a16f9007fad1bdae82a3951a26af2 ] After commit 64dc8c732f5c ("block, bfq: fix possible uaf for 'bfqq->bic'"), bic->bfqq will be accessed in bic_set_bfqq(), however, in some context bic->bfqq will be freed, and bic_set_bfqq() is called with the freed bic->bfqq. Fix the problem by always freeing bfqq after bic_set_bfqq(). Fixes: 64dc8c732f5c ("block, bfq: fix possible uaf for 'bfqq->bic'") Reported-and-tested-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com> Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230130014136.591038-1-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Khazhismel Kumykov <khazhy@google.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: sanglipeng <sanglipeng1@jd.com>2 年前
blk-mq: avoid to touch q->elevator without any protection mainline inclusion from mainline-v5.19-rc3 commit 4d337cebcb1c27d9b48c48b9a98e939d4552d584 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IBY0UQ Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4d337cebcb1c27d9b48c48b9a98e939d4552d584 ------------------ q->elevator is referred in blk_mq_has_sqsched() without any protection, no .q_usage_counter is held, no queue srcu and rcu read lock is held, so potential use-after-free may be triggered. Fix the issue by adding one queue flag for checking if the elevator uses single queue style dispatch. Meantime the elevator feature flag of ELEVATOR_F_MQ_AWARE isn't needed any more. Cc: Jan Kara <jack@suse.cz> Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220616014401.817001-3-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Conflicts: block/blk-mq.c [Due to not merging commit 4f481208749a ("blk-mq: prepare for implementing hctx table via xarray").] block/mq-deadline.c [Due to not merging commit 3e9a99eba058 ("block/mq-deadline: Rename dd_init_queue() and dd_exit_queue()").] include/linux/blkdev.h [Context conflicts.] include/linux/elevator.h [Due to not merging commit 2e9bc3465ac5 ("block: move elevator.h to block/").] Signed-off-by: Zheng Qixing <zhengqixing@huawei.com>1 年前
block, bfq: switch 'bfqg->ref' to use atomic refcount apis mainline inclusion from mainline-v6.2-rc5 commit 216f764716f34fe68cedc7296ae2043a7727e640 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6DYRK CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=216f764716f34fe68cedc7296ae2043a7727e640 -------------------------------- The updating of 'bfqg->ref' should be protected by 'bfqd->lock', however, during code review, we found that bfq_pd_free() update 'bfqg->ref' without holding the lock, which is problematic: 1) bfq_pd_free() triggered by removing cgroup is called asynchronously; 2) bfqq will grab bfqg reference, and exit bfqq will drop the reference, which can concurrent with 1). Unfortunately, 'bfqd->lock' can't be held here because 'bfqd' might already be freed in bfq_pd_free(). Fix the problem by using atomic refcount apis. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230103084755.1256479-1-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com> (cherry picked from commit a5c41e6f98e5fddc50efd68e0725e5b534c2f8d6)3 年前
Revert "block, bfq: move bfqq to root_group if parent group is offlined" hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I60IHY CVE: NA -------------------------------- This reverts commit eeabdc14ef8231fea94074b744d9648805a4015b. Prepare to backport solution from mainline. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Jason Yan <yanaijie@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>3 年前
block: use bi_max_vecs to find the bvec pool mainline inclusion from mainline-v5.12-rc1 commit 7a800a20ae6329e803c5c646b20811a6ae9ca136 bugzilla: https://gitee.com/openeuler/kernel/issues/IB7FJU Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7a800a20ae6329e803c5c646b20811a6ae9ca136 -------------------------------- Instead of encoding of the bvec pool using magic bio flags, just use a helper to find the pool based on the max_vecs value. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> Conflicts: include/linux/bio.h block/bio.c block/blk.h [Commit 3175199ab0ac ("block: split bio_kmalloc from bio_alloc_bioset") modifies the method of determining whether bio_vec needs to be allocated; commit 309dca309fc3 ("block: store a block_device pointer in struct bio") change the comment of __bio_clone_fast; commit c42bca92be92 ("bio: don't copy bvec for direct IO") add WARN_ON_ONCE when "BVEC_POOL_IDX(bio) != 0"; commit 49d1ec8573f7 ("block: manage bio slab cache by xarray") remove bio_slabs/bio_slab_nr/bio_slab_max; commit eec716a1c18c ("block: move three bvec helpers declaration into private helper") move declarations of three bvec helpers after blk_freeze_queue.] Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com> Signed-off-by: Li Nan <linan122@huawei.com>1 年前
block: fix kabi broken hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IB7FJU -------------------------------- Commit 0f2e6ab851ae ("block: turn the nr_iovecs argument to bio_alloc* into an unsigned short") change the type of parameter of bio_alloc_bioset. Change it back to fix kabi broken since it does't matter. Fixes: 0f2e6ab851ae ("block: turn the nr_iovecs argument to bio_alloc* into an unsigned short") Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com> Signed-off-by: Li Nan <linan122@huawei.com>1 年前
blk-cgroup: Fix the recursive blkg rwstat stable inclusion from stable-5.10.27 commit 71b996c9b883313be4320954c902e84031399fd9 bugzilla: 51493 -------------------------------- [ Upstream commit 4f44657d74873735e93a50eb25014721a66aac19 ] The current blkio.throttle.io_service_bytes_recursive doesn't work correctly. As an example, for the following blkcg hierarchy: (Made 1GB READ in test1, 512MB READ in test2) test / \ test1 test2 $ head -n 1 test/test1/blkio.throttle.io_service_bytes_recursive 8:0 Read 1073684480 $ head -n 1 test/test2/blkio.throttle.io_service_bytes_recursive 8:0 Read 537448448 $ head -n 1 test/blkio.throttle.io_service_bytes_recursive 8:0 Read 537448448 Clearly, above data of "test" reflects "test2" not "test1"+"test2". Do the correct summary in blkg_rwstat_recursive_sum(). Signed-off-by: Xunlei Pang <xlpang@linux.alibaba.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Chen Jun <chenjun102@huawei.com> Acked-by:  Weilong Chen <chenweilong@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>5 年前
blk-cgroup: separate out blkg_rwstat under CONFIG_BLK_CGROUP_RWSTAT blkg_rwstat is now only used by bfq-iosched and blk-throtl when on cgroup1. Let's move it into its own files and gate it behind a config option. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>6 年前
PSI: add more memory fine grained stall tracking in pressure.stat hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I8BCV4 ------------------------------- Introcude more memory fine grianed stall tracking in pressure.stat, such as global memory relcaim, memory compact, memory async cgroup reclaim and swap. Signed-off-by: Lu Jialin <lujialin4@huawei.com>2 年前
block: remove QUEUE_FLAG_DEAD mainline inclusion from mainline-v6.0-rc1 commit 1f90307e5f0d7bc9a336ead528f616a5df8e5944 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IBY0UQ Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1f90307e5f0d7bc9a336ead528f616a5df8e5944 ------------------ Disallow setting the blk-mq state on any queue that is already dying as setting the state even then is a bad idea, and remove the now unused QUEUE_FLAG_DEAD flag. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20220619060552.1850436-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk> Conflicts: block/blk-core.c [Context conflicts.] block/blk-mq-debugfs.c [Context conflicts.] include/linux/blkdev.h [Context conflicts.] drivers/block/mtip32xx/mtip32xx.c [Due to not merging commit e8b58ef09e84 ("mtip32xx: fix device removal").] Signed-off-by: Zheng Qixing <zhengqixing@huawei.com>1 年前
block: rename generic_make_request to submit_bio_noacct generic_make_request has always been very confusingly misnamed, so rename it to submit_bio_noacct to make it clear that it is submit_bio minus accounting and a few checks. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>5 年前
blk-mq: release crypto keyslot before reporting I/O complete stable inclusion from stable-v5.10.180 commit 874bdf43b4a7dc5463c31508f62b3e42eb237b08 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I8DDFN Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=874bdf43b4a7dc5463c31508f62b3e42eb237b08 -------------------------------- commit 9cd1e566676bbcb8a126acd921e4e194e6339603 upstream. Once all I/O using a blk_crypto_key has completed, filesystems can call blk_crypto_evict_key(). However, the block layer currently doesn't call blk_crypto_put_keyslot() until the request is being freed, which happens after upper layers have been told (via bio_endio()) the I/O has completed. This causes a race condition where blk_crypto_evict_key() can see 'slot_refs != 0' without there being an actual bug. This makes __blk_crypto_evict_key() hit the 'WARN_ON_ONCE(atomic_read(&slot->slot_refs) != 0)' and return without doing anything, eventually causing a use-after-free in blk_crypto_reprogram_all_keys(). (This is a very rare bug and has only been seen when per-file keys are being used with fscrypt.) There are two options to fix this: either release the keyslot before bio_endio() is called on the request's last bio, or make __blk_crypto_evict_key() ignore slot_refs. Let's go with the first solution, since it preserves the ability to report bugs (via WARN_ON_ONCE) where a key is evicted while still in-use. Fixes: a892c8d52c02 ("block: Inline encryption support for blk-mq") Cc: stable@vger.kernel.org Reviewed-by: Nathan Huckleberry <nhuck@google.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20230315183907.53675-2-ebiggers@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: sanglipeng <sanglipeng1@jd.com>2 年前
blk-crypto: make blk_crypto_evict_key() more robust stable inclusion from stable-v5.10.180 commit 701a8220762ff90615dc91d3543f789391b63298 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I8DDFN Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=701a8220762ff90615dc91d3543f789391b63298 -------------------------------- commit 5c7cb94452901a93e90c2230632e2c12a681bc92 upstream. If blk_crypto_evict_key() sees that the key is still in-use (due to a bug) or that ->keyslot_evict failed, it currently just returns while leaving the key linked into the keyslot management structures. However, blk_crypto_evict_key() is only called in contexts such as inode eviction where failure is not an option. So actually the caller proceeds with freeing the blk_crypto_key regardless of the return value of blk_crypto_evict_key(). These two assumptions don't match, and the result is that there can be a use-after-free in blk_crypto_reprogram_all_keys() after one of these errors occurs. (Note, these errors *shouldn't* happen; we're just talking about what happens if they do anyway.) Fix this by making blk_crypto_evict_key() unlink the key from the keyslot management structures even on failure. Also improve some comments. Fixes: 1b2628397058 ("block: Keyslot Manager for Inline Encryption") Cc: stable@vger.kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230315183907.53675-2-ebiggers@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: sanglipeng <sanglipeng1@jd.com>2 年前
block: return errors from blk_execute_rq() mainline inclusion from mainline-v5.14-rc1 commit fb9b16e15cd70e21d8af7f03d700deb9509c2ce8 category: bugfix bugzilla: 185778 https://gitee.com/openeuler/kernel/issues/I4LM14 CVE: NA ----------------------------------------- The synchronous blk_execute_rq() had not provided a way for its callers to know if its request was successful or not. Return the blk_status_t result of the request. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Link: https://lore.kernel.org/r/20210610214437.641245-4-kbusch@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk> conflict: 1. in blkdev.h and blk-exec, blk_execute_rq return value change; 2. input parameter in blk_execute_rq is not the same as mainline; Signed-off-by: zhangwensheng <zhangwensheng5@huawei.com> Reviewed-by: qiulaibin <qiulaibin@huawei.com> Reviewed-by: Jason Yan <yanaijie@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>4 年前
block: fix crash on cmpxchg for request_wrapper hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I65K8D CVE: NA -------------------------------- Now that address of request_wrapper is caculated by address of request plus cmd_size, if cmd_size is not aligned to 8 bytes, request_wrapper will end up not aligned to 8 bytes as well, which will crash in arm64 because assembly instruction casal requires that operand address is aligned to 8 bytes: Internal error: Oops: 96000021 [#1] SMP pc : blk_account_io_latency+0x54/0x134 Call trace: blk_account_io_latency+0x54/0x134 blk_account_io_done+0x3c/0x4c __blk_mq_end_request+0x78/0x134 scsi_end_request+0xcc/0x1f0 scsi_io_completion+0x88/0x240 scsi_finish_command+0x104/0x140 scsi_softirq_done+0x90/0x180 blk_mq_complete_request+0x5c/0x70 scsi_mq_done+0x4c/0x100 Fix the problem by declaring request_wrapper as aligned to cachline, and placing it before request. Fixes: 82327165da5c ("blk-mq: don't access request_wrapper if request is not allocated from block layer") Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>3 年前
block: return errors from blk_integrity_add mainline inclusion from mainline-v5.15-rc1 commit 614310c9c8ca15359f4e71a5bbd9165897b4d54e category: bugfix bugzilla: 188733 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=614310c9c8ca15359f4e71a5bbd9165897b4d54e ---------------------------------------- Prepare for proper error handling in add_disk. Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> [hch: split from a larger patch] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20210818144542.19305-8-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Zhong Jinghua <zhongjinghua@huawei.com> Signed-off-by: Li Nan <linan122@huawei.com>2 年前
block: fix UAF from race of ioc_release_fn() and __ioc_clear_queue() hulk inclusion category: bugfix bugzilla: 182666 https://gitee.com/openeuler/kernel/issues/I4DDEL -------------------------- KASAN reports a use-after-free report when doing block test: [293762.535116] ================================================================== [293762.535129] BUG: KASAN: use-after-free in queued_spin_lock_slowpath+0x78/0x4c8 [293762.535133] Write of size 2 at addr ffff8000d5f12bc8 by task sh/9148 [293762.535135] [293762.535145] CPU: 1 PID: 9148 Comm: sh Kdump: loaded Tainted: G W 4.19.90-vhulk2108.6.0.h824.kasan.eulerosv2r10.aarch64 #1 [293762.535148] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015 [293762.535150] Call trace: [293762.535154] dump_backtrace+0x0/0x310 [293762.535158] show_stack+0x28/0x38 [293762.535165] dump_stack+0xec/0x15c [293762.535172] print_address_description+0x68/0x2d0 [293762.535177] kasan_report+0x130/0x2f0 [293762.535182] __asan_store2+0x80/0xa8 [293762.535189] queued_spin_lock_slowpath+0x78/0x4c8 [293762.535194] __ioc_clear_queue+0x158/0x160 [293762.535198] ioc_clear_queue+0x194/0x258 [293762.535202] elevator_switch_mq+0x64/0x170 [293762.535206] elevator_switch+0x140/0x270 [293762.535211] elv_iosched_store+0x1a4/0x2a0 [293762.535215] queue_attr_store+0x90/0xe0 [293762.535219] sysfs_kf_write+0xa8/0xe8 [293762.535222] kernfs_fop_write+0x1f8/0x378 [293762.535227] __vfs_write+0xe0/0x360 [293762.535233] vfs_write+0xf0/0x270 [293762.535237] ksys_write+0xdc/0x1b8 [293762.535241] __arm64_sys_write+0x50/0x60 [293762.535245] el0_svc_common+0xc8/0x320 [293762.535250] el0_svc_handler+0xf8/0x160 [293762.535253] el0_svc+0x10/0x218 [293762.535254] [293762.535258] Allocated by task 3466763: [293762.535264] kasan_kmalloc+0xe0/0x190 [293762.535269] kasan_slab_alloc+0x14/0x20 [293762.535276] kmem_cache_alloc_node+0x1b4/0x420 [293762.535280] create_task_io_context+0x40/0x210 [293762.535284] generic_make_request_checks+0xc78/0xe38 [293762.535288] generic_make_request+0xf8/0x640 [293762.535394] generic_file_direct_write+0x100/0x268 [293762.535401] __generic_file_write_iter+0x128/0x370 [293762.535467] vfs_iter_write+0x64/0x90 [293762.535489] ovl_write_iter+0x2f8/0x458 [overlay] [293762.535493] __vfs_write+0x264/0x360 [293762.535497] vfs_write+0xf0/0x270 [293762.535501] ksys_write+0xdc/0x1b8 [293762.535505] __arm64_sys_write+0x50/0x60 [293762.535509] el0_svc_common+0xc8/0x320 [293762.535387] ext4_direct_IO+0x3c8/0xe80 [ext4] [293762.535394] generic_file_direct_write+0x100/0x268 [293762.535401] __generic_file_write_iter+0x128/0x370 [293762.535452] ext4_file_write_iter+0x610/0xa80 [ext4] [293762.535457] do_iter_readv_writev+0x28c/0x390 [293762.535463] do_iter_write+0xfc/0x360 [293762.535467] vfs_iter_write+0x64/0x90 [293762.535489] ovl_write_iter+0x2f8/0x458 [overlay] [293762.535493] __vfs_write+0x264/0x360 [293762.535497] vfs_write+0xf0/0x270 [293762.535501] ksys_write+0xdc/0x1b8 [293762.535505] __arm64_sys_write+0x50/0x60 [293762.535509] el0_svc_common+0xc8/0x320 [293762.535513] el0_svc_handler+0xf8/0x160 [293762.535517] el0_svc+0x10/0x218 [293762.535521] [293762.535523] Freed by task 3466763: [293762.535528] __kasan_slab_free+0x120/0x228 [293762.535532] kasan_slab_free+0x10/0x18 [293762.535536] kmem_cache_free+0x68/0x248 [293762.535540] put_io_context+0x104/0x190 [293762.535545] put_io_context_active+0x204/0x2c8 [293762.535549] exit_io_context+0x74/0xa0 [293762.535553] do_exit+0x658/0xae0 [293762.535557] do_group_exit+0x74/0x1a8 [293762.535561] get_signal+0x21c/0xf38 [293762.535564] do_signal+0x10c/0x450 [293762.535568] do_notify_resume+0x224/0x4b0 [293762.535573] work_pending+0x8/0x10 [293762.535574] [293762.535578] The buggy address belongs to the object at ffff8000d5f12bb8 which belongs to the cache blkdev_ioc of size 136 [293762.535582] The buggy address is located 16 bytes inside of 136-byte region [ffff8000d5f12bb8, ffff8000d5f12c40) [293762.535583] The buggy address belongs to the page: [293762.535588] page:ffff7e000357c480 count:1 mapcount:0 mapping:ffff8000d8563c00 index:0x0 [293762.536201] flags: 0x7ffff0000000100(slab) [293762.536540] raw: 07ffff0000000100 ffff7e0003118588 ffff8000d8adb530 ffff8000d8563c00 [293762.536546] raw: 0000000000000000 0000000000140014 00000001ffffffff 0000000000000000 [293762.536551] page dumped because: kasan: bad access detected [293762.536552] [293762.536554] Memory state around the buggy address: [293762.536558] ffff8000d5f12a80: 00 00 00 00 00 00 fc fc fc fc fc fc fc fc fb fb [293762.536562] ffff8000d5f12b00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fc [293762.536566] >ffff8000d5f12b80: fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb fb [293762.536568] ^ [293762.536572] ffff8000d5f12c00: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc [293762.536576] ffff8000d5f12c80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [293762.536577] ================================================================== ioc_release_fn() will destroy icq from ioc->icq_list and __ioc_clear_queue() will destroy icq from request_queue->icq_list. However, the ioc_release_fn() will hold ioc_lock firstly, and free ioc finally. Then __ioc_clear_queue() will get ioc from icq and hold ioc_lock, but ioc has been released, which will result in a use-after-free. CPU0 CPU1 put_io_context elevator_switch_mq queue_work &ioc->release_work ioc_clear_queue ^^^ splice q->icq_list __ioc_clear_queue ^^^get icq from icq_list get ioc from icq->ioc ioc_release_fn spin_lock(ioc->lock) ioc_destroy_icq(icq) spin_unlock(ioc->lock) free(ioc) spin_lock(ioc->lock) <= UAF Fix by grabbing the request_queue->queue_lock in ioc_clear_queue() to avoid this race scene. Signed-off-by: Laibin Qiu <qiulaibin@huawei.com> Link: https://lore.kernel.org/lkml/1c9ad9f2-c487-c793-1ffc-5c3ec0fcc0ae@kernel.dk/ Reviewed-by: Jason Yan <yanaijie@huawei.com> Signed-off-by: Chen Jun <chenjun102@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>4 年前
blk_iocost: fix more out of bound shifts stable inclusion from stable-v5.10.227 commit 1f61d509257d6a05763d05bf37943b35306522b1 category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/IAYRCW CVE: CVE-2024-49933 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=1f61d509257d6a05763d05bf37943b35306522b1 -------------------------------- [ Upstream commit 9bce8005ec0dcb23a58300e8522fe4a31da606fa ] Recently running UBSAN caught few out of bound shifts in the ioc_forgive_debts() function: UBSAN: shift-out-of-bounds in block/blk-iocost.c:2142:38 shift exponent 80 is too large for 64-bit type 'u64' (aka 'unsigned long long') ... UBSAN: shift-out-of-bounds in block/blk-iocost.c:2144:30 shift exponent 80 is too large for 64-bit type 'u64' (aka 'unsigned long long') ... Call Trace: <IRQ> dump_stack_lvl+0xca/0x130 __ubsan_handle_shift_out_of_bounds+0x22c/0x280 ? __lock_acquire+0x6441/0x7c10 ioc_timer_fn+0x6cec/0x7750 ? blk_iocost_init+0x720/0x720 ? call_timer_fn+0x5d/0x470 call_timer_fn+0xfa/0x470 ? blk_iocost_init+0x720/0x720 __run_timer_base+0x519/0x700 ... Actual impact of this issue was not identified but I propose to fix the undefined behaviour. The proposed fix to prevent those out of bound shifts consist of precalculating exponent before using it the shift operations by taking min value from the actual exponent and maximum possible number of bits. Reported-by: Breno Leitao <leitao@debian.org> Signed-off-by: Konstantin Ovsepian <ovs@ovs.to> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20240822154137.2627818-1-ovs@ovs.to Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Zheng Qixing <zhengqixing@huawei.com>1 年前
block: fix rq-qos breakage from skipping rq_qos_done_bio() mainline inclusion from mainline-v5.18-rc1 commit aa1b46dcdc7baaf5fec0be25782ef24b26aa209e bugzilla: https://gitee.com/openeuler/kernel/issues/IB7FJU Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=aa1b46dcdc7baaf5fec0be25782ef24b26aa209e -------------------------------- a647a524a467 ("block: don't call rq_qos_ops->done_bio if the bio isn't tracked") made bio_endio() skip rq_qos_done_bio() if BIO_TRACKED is not set. While this fixed a potential oops, it also broke blk-iocost by skipping the done_bio callback for merged bios. Before, whether a bio goes through rq_qos_throttle() or rq_qos_merge(), rq_qos_done_bio() would be called on the bio on completion with BIO_TRACKED distinguishing the former from the latter. rq_qos_done_bio() is not called for bios which wenth through rq_qos_merge(). This royally confuses blk-iocost as the merged bios never finish and are considered perpetually in-flight. One reliably reproducible failure mode is an intermediate cgroup geting stuck active preventing its children from being activated due to the leaf-only rule, leading to loss of control. The following is from resctl-bench protection scenario which emulates isolating a web server like workload from a memory bomb run on an iocost configuration which should yield a reasonable level of protection. # cat /sys/block/nvme2n1/device/model Samsung SSD 970 PRO 512GB # cat /sys/fs/cgroup/io.cost.model 259:0 ctrl=user model=linear rbps=834913556 rseqiops=93622 rrandiops=102913 wbps=618985353 wseqiops=72325 wrandiops=71025 # cat /sys/fs/cgroup/io.cost.qos 259:0 enable=1 ctrl=user rpct=95.00 rlat=18776 wpct=95.00 wlat=8897 min=60.00 max=100.00 # resctl-bench -m 29.6G -r out.json run protection::scenario=mem-hog,loops=1 ... Memory Hog Summary ================== IO Latency: R p50=242u:336u/2.5m p90=794u:1.4m/7.5m p99=2.7m:8.0m/62.5m max=8.0m:36.4m/350m W p50=221u:323u/1.5m p90=709u:1.2m/5.5m p99=1.5m:2.5m/9.5m max=6.9m:35.9m/350m Isolation and Request Latency Impact Distributions: min p01 p05 p10 p25 p50 p75 p90 p95 p99 max mean stdev isol% 15.90 15.90 15.90 40.05 57.24 59.07 60.01 74.63 74.63 90.35 90.35 58.12 15.82 lat-imp% 0 0 0 0 0 4.55 14.68 15.54 233.5 548.1 548.1 53.88 143.6 Result: isol=58.12:15.82% lat_imp=53.88%:143.6 work_csv=100.0% missing=3.96% The isolation result of 58.12% is close to what this device would show without any IO control. Fix it by introducing a new flag BIO_QOS_MERGED to mark merged bios and calling rq_qos_done_bio() on them too. For consistency and clarity, rename BIO_TRACKED to BIO_QOS_THROTTLED. The flag checks are moved into rq_qos_done_bio() so that it's next to the code paths that set the flags. With the patch applied, the above same benchmark shows: # resctl-bench -m 29.6G -r out.json run protection::scenario=mem-hog,loops=1 ... Memory Hog Summary ================== IO Latency: R p50=123u:84.4u/985u p90=322u:256u/2.5m p99=1.6m:1.4m/9.5m max=11.1m:36.0m/350m W p50=429u:274u/995u p90=1.7m:1.3m/4.5m p99=3.4m:2.7m/11.5m max=7.9m:5.9m/26.5m Isolation and Request Latency Impact Distributions: min p01 p05 p10 p25 p50 p75 p90 p95 p99 max mean stdev isol% 84.91 84.91 89.51 90.73 92.31 94.49 96.36 98.04 98.71 100.0 100.0 94.42 2.81 lat-imp% 0 0 0 0 0 2.81 5.73 11.11 13.92 17.53 22.61 4.10 4.68 Result: isol=94.42:2.81% lat_imp=4.10%:4.68 work_csv=58.34% missing=0% Signed-off-by: Tejun Heo <tj@kernel.org> Fixes: a647a524a467 ("block: don't call rq_qos_ops->done_bio if the bio isn't tracked") Cc: stable@vger.kernel.org # v5.15+ Cc: Ming Lei <ming.lei@redhat.com> Cc: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/Yi7rdrzQEHjJLGKB@slm.duckdns.org Signed-off-by: Jens Axboe <axboe@kernel.dk> Conflicts: block/bio.c block/blk-rq-qos.h include/linux/blk_types.h [Commit 309dca309fc3 ("block: store a block_device pointer in struct bio") replace bi_disk with bi_bdev in struct bio; commit 7eeb22198dcc ("[Huawei] blk-io-hierarchy: support hierarchy stats for bio lifetime") add BIO_HAS_DATA/BIO_HIERARCHY_ACCT.] Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com> Signed-off-by: Li Nan <linan122@huawei.com>1 年前
block: check io size before submit discard hulk inclusion category: bugfix bugzilla: 189755, https://gitee.com/openeuler/kernel/issues/I9K0H3 -------------------------------- In __blkdev_issue_discard(), q->limits.discard_granularity is read multi times. WARN_ON_ONCE(!q->limits.discard_granularity) [1] ... q->limits.discard_granularity >> SECTOR_SHIFT [2] ... bio_aligned_discard_max_sectors [3] It can be changed to 0 after check [1], such as submitting ioctl 'LOOP_SET_STATUS'. This is undesirable. If 'discard_granularity' is set to 0, BUG_ON might be triggered and the loop of 'while(nr_sects)' will never exit(see fixes commit). Fix it by checking 'req_sects' before submit io, return error code directly if it is 0. Fixes: b35fd7422c2f ("block: check queue's limits.discard_granularity in __blkdev_issue_discard()") Signed-off-by: Li Nan <linan122@huawei.com>2 年前
block: fix pin count management when merging same-page segments mainline inclusion from mainline-v6.6-rc1 commit 5905afc2c7bb713d52c7c7585565feecbb686b44 bugzilla: https://gitee.com/src-openeuler/kernel/issues/IANSAC Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5905afc2c7bb713d52c7c7585565feecbb686b44 -------------------------------- There is no need to unpin the added page when adding it to the bio fails as that is done by the loop below. Instead we want to unpin it when adding a single page to the bio more than once as bio_release_pages will only unpin it once. Fixes: d1916c86ccdc ("block: move same page handling from __bio_add_pc_page to the callers") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20230905124731.328255-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk> Conflicts: block/blk-map.c [Context differences] Signed-off-by: Yifan Qiao <qiaoyifan4@huawei.com>1 年前
block: fix kabi in struct queue_limits hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I9VTE3 CVE: NA -------------------------------- Fix kabi breakage in struct queue_limits. Signed-off-by: Long Li <leo.lilong@huawei.com>1 年前
blk-mq: remove the calling of local_memory_node() We don't need to check whether the node is memoryless numa node before calling allocator interface. SLUB(and SLAB,SLOB) relies on the page allocator to pick a node. Page allocator should deal with memoryless nodes just fine. It has zonelists constructed for each possible nodes. And it will automatically fall back into a node which is closest to the requested node. As long as __GFP_THISNODE is not enforced of course. The code comments of kmem_cache_alloc_node() of SLAB also showed this: * Fallback to other node is possible if __GFP_THISNODE is not set. blk-mq code doesn't set __GFP_THISNODE, so we can remove the calling of local_memory_node(). Signed-off-by: Xianting Tian <tian.xianting@h3c.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>5 年前
block: Cleanup license notice Remove the imprecise and sloppy: "This files is licensed under the GPL." license notice in the top level comment. 1) The file already contains a SPDX license identifier which clearly states that the license of the file is GPL V2 only 2) The notice resolves to GPL v1 or later for scanners which is just contrary to the intent of SPDX identifiers to provide clear and non ambiguous license information. Aside of that the value add of this notice is below zero, Cc: Damien Le Moal <damien.lemoal@wdc.com> Cc: Matias Bjorling <mb@lightnvm.io> Cc: Christoph Hellwig <hch@lst.de> Cc: Jens Axboe <axboe@kernel.dk> Cc: linux-block@vger.kernel.org Fixes: 6a5ac9846508 ("block: Make struct request_queue smaller for CONFIG_BLK_DEV_ZONED=n") Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>7 年前
block: Decode all flag names in the debugfs output mainline inclusion from mainline-v6.5-rc1 commit d5fb8726f1dea70543a93ab1d7332857f157b7f3 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IBY0UQ Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d5fb8726f1dea70543a93ab1d7332857f157b7f3 ------------------ See also: * Commit 4d337cebcb1c ("blk-mq: avoid to touch q->elevator without any protection"). * Commit 414dd48e882c ("blk-mq: add tagset quiesce interface"). Cc: Christoph Hellwig <hch@lst.de> Cc: Damien Le Moal <dlemoal@kernel.org> Cc: Ming Lei <ming.lei@redhat.com> Cc: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20230518222708.1190867-1-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk> Conflicts: block/blk-mq-debugfs.c [Due to not merging commit 414dd48e882c ("blk-mq: add tagset quiesce interface") and commit 3222d8c2a7f8 ("block: remove ->rw_page").] Signed-off-by: Zheng Qixing <zhengqixing@huawei.com>1 年前
blk-mq: no need to check return value of debugfs_create functions When calling debugfs functions, there is no need to ever check the return value. The function can work or not, but the code logic should never do something different based on this. When all of these checks are cleaned up, lots of the functions used in the blk-mq-debugfs code can now return void, as no need to check the return value of them either. Overall, this ends up cleaning up the code and making it smaller, always a nice win. Cc: Jens Axboe <axboe@kernel.dk> Cc: linux-block@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>6 年前
block: Fix blk_mq_*_map_queues() kernel-doc headers This patch avoids that the kernel-doc script complains about these function headers when building with W=1. Cc: Hannes Reinecke <hare@suse.com> Cc: Keith Busch <keith.busch@intel.com> Fixes: ed76e329d74a ("blk-mq: abstract out queue map") # v5.0. Fixes: e42b3867de4b ("blk-mq-rdma: pass in queue map to blk_mq_rdma_map_queues") # v5.0. Reviewed-by: Chaitanya Kulkarni <chiatanya.kulkarni@wdc.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>6 年前
block: Fix blk_mq_*_map_queues() kernel-doc headers This patch avoids that the kernel-doc script complains about these function headers when building with W=1. Cc: Hannes Reinecke <hare@suse.com> Cc: Keith Busch <keith.busch@intel.com> Fixes: ed76e329d74a ("blk-mq: abstract out queue map") # v5.0. Fixes: e42b3867de4b ("blk-mq-rdma: pass in queue map to blk_mq_rdma_map_queues") # v5.0. Reviewed-by: Chaitanya Kulkarni <chiatanya.kulkarni@wdc.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>6 年前
blk-mq: avoid to touch q->elevator without any protection mainline inclusion from mainline-v5.19-rc3 commit 4d337cebcb1c27d9b48c48b9a98e939d4552d584 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IBY0UQ Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4d337cebcb1c27d9b48c48b9a98e939d4552d584 ------------------ q->elevator is referred in blk_mq_has_sqsched() without any protection, no .q_usage_counter is held, no queue srcu and rcu read lock is held, so potential use-after-free may be triggered. Fix the issue by adding one queue flag for checking if the elevator uses single queue style dispatch. Meantime the elevator feature flag of ELEVATOR_F_MQ_AWARE isn't needed any more. Cc: Jan Kara <jack@suse.cz> Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220616014401.817001-3-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Conflicts: block/blk-mq.c [Due to not merging commit 4f481208749a ("blk-mq: prepare for implementing hctx table via xarray").] block/mq-deadline.c [Due to not merging commit 3e9a99eba058 ("block/mq-deadline: Rename dd_init_queue() and dd_exit_queue()").] include/linux/blkdev.h [Context conflicts.] include/linux/elevator.h [Due to not merging commit 2e9bc3465ac5 ("block: move elevator.h to block/").] Signed-off-by: Zheng Qixing <zhengqixing@huawei.com>1 年前
blk-mq-sched: Rename blk_mq_sched_free_{requests -> rqs}() mainline inclusion from mainline-v5.16-rc1 commit 1820f4f0a5e75633ff384167027bf45ed1b10234 category: performance bugzilla: 186917, https://gitee.com/openeuler/kernel/issues/I5N1S5 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1820f4f0a5e75633ff384167027bf45ed1b10234 -------------------------------- To be more concise and consistent in naming, rename blk_mq_sched_free_requests() -> blk_mq_sched_free_rqs(). Signed-off-by: John Garry <john.garry@huawei.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/1633429419-228500-7-git-send-email-john.garry@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Jason Yan <yanaijie@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>3 年前
blk-mq: fix possible memleak when register 'hctx' failed stable inclusion from stable-v5.10.163 commit e8022da1fa2fdf2fa204b445dd3354e7a66d085a category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I7PJ9N Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=e8022da1fa2fdf2fa204b445dd3354e7a66d085a ---------------------------------------------------- [ Upstream commit 4b7a21c57b14fbcd0e1729150189e5933f5088e9 ] There's issue as follows when do fault injection test: unreferenced object 0xffff888132a9f400 (size 512): comm "insmod", pid 308021, jiffies 4324277909 (age 509.733s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 08 f4 a9 32 81 88 ff ff ...........2.... 08 f4 a9 32 81 88 ff ff 00 00 00 00 00 00 00 00 ...2............ backtrace: [<00000000e8952bb4>] kmalloc_node_trace+0x22/0xa0 [<00000000f9980e0f>] blk_mq_alloc_and_init_hctx+0x3f1/0x7e0 [<000000002e719efa>] blk_mq_realloc_hw_ctxs+0x1e6/0x230 [<000000004f1fda40>] blk_mq_init_allocated_queue+0x27e/0x910 [<00000000287123ec>] __blk_mq_alloc_disk+0x67/0xf0 [<00000000a2a34657>] 0xffffffffa2ad310f [<00000000b173f718>] 0xffffffffa2af824a [<0000000095a1dabb>] do_one_initcall+0x87/0x2a0 [<00000000f32fdf93>] do_init_module+0xdf/0x320 [<00000000cbe8541e>] load_module+0x3006/0x3390 [<0000000069ed1bdb>] __do_sys_finit_module+0x113/0x1b0 [<00000000a1a29ae8>] do_syscall_64+0x35/0x80 [<000000009cd878b0>] entry_SYSCALL_64_after_hwframe+0x46/0xb0 Fault injection context as follows: kobject_add blk_mq_register_hctx blk_mq_sysfs_register blk_register_queue device_add_disk null_add_dev.part.0 [null_blk] As 'blk_mq_register_hctx' may already add some objects when failed halfway, but there isn't do fallback, caller don't know which objects add failed. To solve above issue just do fallback when add objects failed halfway in 'blk_mq_register_hctx'. Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20221117022940.873959-1-yebin@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: zhaoxiaoqiang11 <zhaoxiaoqiang11@jd.com>2 年前
blk-mq: fix lockdep hardirq warning in __blk_mq_tag_idle() hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IAVONP CVE: NA -------------------------------- commit c6f9c0e2d53d ("blk-mq: allow hardware queue to get more tag while sharing a tag set") started to call blk_mq_tag_idle() from hardirq context. And later commit 1b6433f06bda ("blk-mq: fix potential io hang by wrong 'wake_batch'") add spin_unlock_irq() in blk_mq_tag_idle(), which may enable interrupt in hardirq context, and may trigger AA deadlock. Fixes: 1b6433f06bda ("blk-mq: fix potential io hang by wrong 'wake_batch'") Signed-off-by: Yu Kuai <yukuai3@huawei.com>1 年前
blk-mq: fix kabi broken in blk_mq_tags hulk inclusion category: bugfix bugzilla: 186917, https://gitee.com/openeuler/kernel/issues/I5N1S5 CVE: NA -------------------------------- Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Jason Yan <yanaijie@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>3 年前
blk-mq: Fix typo in comment Fix typo in words: 'vector' and 'query'. Signed-off-by: Gabriela Bittencourt <gabrielabittencourt00@gmail.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Acked-by: Julia Lawall <julia.lawall@lip6.fr> Signed-off-by: Jiri Kosina <jkosina@suse.cz>6 年前
blk-mq: fix NULL dereference on q->elevator in blk_mq_elv_switch_none mainline inclusion from mainline-v6.5-rc1 commit 245165658e1c9f95c0fecfe02b9b1ebd30a1198a category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/ICY9NG CVE: CVE-2023-53292 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=245165658e1c9f95c0fecfe02b9b1ebd30a1198a ------------------ After grabbing q->sysfs_lock, q->elevator may become NULL because of elevator switch. Fix the NULL dereference on q->elevator by checking it with lock. Reported-by: Guangwu Zhang <guazhang@redhat.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230616132354.415109-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Conflicts: block/blk-mq.c [Due to not merging commit dd6f7f17bf58 ("block: add proper helpers for elevator_type module refcount management").] Signed-off-by: Zheng Qixing <zhengqixing@huawei.com>8 个月前
scsi: blk-mq: Return budget token from .get_budget callback mainline inclusion from mainline-v5.13-rc1 commit 2a5a24aa83382a88c43d18a901fab66e6ffe1199 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IB4C27 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2a5a24aa83382a88c43d18a901fab66e6ffe1199 ------------------------------------------- SCSI uses a global atomic variable to track queue depth for each LUN/request queue. This doesn't scale well when there are lots of CPU cores and the disk is very fast. It has been observed that IOPS is affected a lot by tracking queue depth via sdev->device_busy in the I/O path. Return budget token from .get_budget callback. The budget token can be passed to driver so that we can replace the atomic variable with sbitmap_queue and alleviate the scaling problems that way. Link: https://lore.kernel.org/r/20210122023317.687987-9-ming.lei@redhat.com Cc: Omar Sandoval <osandov@fb.com> Cc: Kashyap Desai <kashyap.desai@broadcom.com> Cc: Sumanesh Samanta <sumanesh.samanta@broadcom.com> Cc: Ewan D. Milne <emilne@redhat.com> Tested-by: Sumanesh Samanta <sumanesh.samanta@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Conflicts: block/blk-mq.c [The conflict here is due to inconsistencies in the context caused by 65267b13c8b22 ("blk-mq: Fix potential io hung for shared sbitmap per tagset").] block/blk-mq.h [The conflict here is due to inconsistencies in the context caused by badcb143e9e6b ("block: update nsecs[] in part_") and 2ed9d17cc07fc ("block : don't use cmpxchg64() on").] Signed-off-by: Zheng Qixing <zhengqixing@huawei.com>1 年前
scsi: block: pm: Always set request queue runtime active in blk_post_runtime_resume() mainline inclusion from mainline-v5.17-rc1 commit 6e1fcab00a23f7fe9f4fe9704905a790efa1eeab category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I4ZHSV CVE: NA https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/scsi/hisi_sas?id=6e1fcab00a23f7fe9f4fe9704905a790efa1eeab -------------------------------- John Garry reported a deadlock that occurs when trying to access a runtime-suspended SATA device. For obscure reasons, the rescan procedure causes the link to be hard-reset, which disconnects the device. The rescan tries to carry out a runtime resume when accessing the device. scsi_rescan_device() holds the SCSI device lock and won't release it until it can put commands onto the device's block queue. This can't happen until the queue is successfully runtime-resumed or the device is unregistered. But the runtime resume fails because the device is disconnected, and __scsi_remove_device() can't do the unregistration because it can't get the device lock. The best way to resolve this deadlock appears to be to allow the block queue to start running again even after an unsuccessful runtime resume. The idea is that the driver or the SCSI error handler will need to be able to use the queue to resolve the runtime resume failure. This patch removes the err argument to blk_post_runtime_resume() and makes the routine act as though the resume was successful always. This fixes the deadlock. Link: https://lore.kernel.org/r/1639999298-244569-4-git-send-email-chenxiang66@hisilicon.com Fixes: e27829dc92e5 ("scsi: serialize ->rescan against ->remove") Reported-and-tested-by: John Garry <john.garry@huawei.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Fujai Ni<nifuijia1@hisilicon.com> Reviewed-by: Jason Yan <yanaijie@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>4 年前
scsi: block: Do not accept any requests while suspended stable inclusion from stable-5.10.7 commit d55d15a332ec651ccb49c42a8a10c03447fdf418 bugzilla: 47429 -------------------------------- [ Upstream commit 52abca64fd9410ea6c9a3a74eab25663b403d7da ] blk_queue_enter() accepts BLK_MQ_REQ_PM requests independent of the runtime power management state. Now that SCSI domain validation no longer depends on this behavior, modify the behavior of blk_queue_enter() as follows: - Do not accept any requests while suspended. - Only process power management requests while suspending or resuming. Submitting BLK_MQ_REQ_PM requests to a device that is runtime suspended causes runtime-suspended devices not to resume as they should. The request which should cause a runtime resume instead gets issued directly, without resuming the device first. Of course the device can't handle it properly, the I/O fails, and the device remains suspended. The problem is fixed by checking that the queue's runtime-PM status isn't RPM_SUSPENDED before allowing a request to be issued, and queuing a runtime-resume request if it is. In particular, the inline blk_pm_request_resume() routine is renamed blk_pm_resume_queue() and the code is unified by merging the surrounding checks into the routine. If the queue isn't set up for runtime PM, or there currently is no restriction on allowed requests, the request is allowed. Likewise if the BLK_MQ_REQ_PM flag is set and the status isn't RPM_SUSPENDED. Otherwise a runtime resume is queued and the request is blocked until conditions are more suitable. [ bvanassche: modified commit message and removed Cc: stable because without the previous patches from this series this patch would break parallel SCSI domain validation + introduced queue_rpm_status() ] Link: https://lore.kernel.org/r/20201209052951.16136-9-bvanassche@acm.org Cc: Jens Axboe <axboe@kernel.dk> Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.de> Cc: Can Guo <cang@codeaurora.org> Cc: Stanley Chu <stanley.chu@mediatek.com> Cc: Ming Lei <ming.lei@redhat.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reported-and-tested-by: Martin Kepplinger <martin.kepplinger@puri.sm> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Can Guo <cang@codeaurora.org> Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Chen Jun <chenjun102@huawei.com> Acked-by: Xie XiuQi <xiexiuqi@huawei.com>5 年前
blk-rq-qos: fix crash on rq_qos_wait vs. rq_qos_wake_function race stable inclusion from stable-v5.10.228 commit 3bc6d0f8b70a9101456cf02ab99acb75254e1852 category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/IB0ENF CVE: CVE-2024-50082 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=3bc6d0f8b70a9101456cf02ab99acb75254e1852 --------------------------- commit e972b08b91ef48488bae9789f03cfedb148667fb upstream. We're seeing crashes from rq_qos_wake_function that look like this: BUG: unable to handle page fault for address: ffffafe180a40084 #PF: supervisor write access in kernel mode #PF: error_code(0x0002) - not-present page PGD 100000067 P4D 100000067 PUD 10027c067 PMD 10115d067 PTE 0 Oops: Oops: 0002 [#1] PREEMPT SMP PTI CPU: 17 UID: 0 PID: 0 Comm: swapper/17 Not tainted 6.12.0-rc3-00013-geca631b8fe80 #11 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 RIP: 0010:_raw_spin_lock_irqsave+0x1d/0x40 Code: 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 41 54 9c 41 5c fa 65 ff 05 62 97 30 4c 31 c0 ba 01 00 00 00 <f0> 0f b1 17 75 0a 4c 89 e0 41 5c c3 cc cc cc cc 89 c6 e8 2c 0b 00 RSP: 0018:ffffafe180580ca0 EFLAGS: 00010046 RAX: 0000000000000000 RBX: ffffafe180a3f7a8 RCX: 0000000000000011 RDX: 0000000000000001 RSI: 0000000000000003 RDI: ffffafe180a40084 RBP: 0000000000000000 R08: 00000000001e7240 R09: 0000000000000011 R10: 0000000000000028 R11: 0000000000000888 R12: 0000000000000002 R13: ffffafe180a40084 R14: 0000000000000000 R15: 0000000000000003 FS: 0000000000000000(0000) GS:ffff9aaf1f280000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffafe180a40084 CR3: 000000010e428002 CR4: 0000000000770ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <IRQ> try_to_wake_up+0x5a/0x6a0 rq_qos_wake_function+0x71/0x80 __wake_up_common+0x75/0xa0 __wake_up+0x36/0x60 scale_up.part.0+0x50/0x110 wb_timer_fn+0x227/0x450 ... So rq_qos_wake_function() calls wake_up_process(data->task), which calls try_to_wake_up(), which faults in raw_spin_lock_irqsave(&p->pi_lock). p comes from data->task, and data comes from the waitqueue entry, which is stored on the waiter's stack in rq_qos_wait(). Analyzing the core dump with drgn, I found that the waiter had already woken up and moved on to a completely unrelated code path, clobbering what was previously data->task. Meanwhile, the waker was passing the clobbered garbage in data->task to wake_up_process(), leading to the crash. What's happening is that in between rq_qos_wake_function() deleting the waitqueue entry and calling wake_up_process(), rq_qos_wait() is finding that it already got a token and returning. The race looks like this: rq_qos_wait() rq_qos_wake_function() ============================================================== prepare_to_wait_exclusive() data->got_token = true; list_del_init(&curr->entry); if (data.got_token) break; finish_wait(&rqw->wait, &data.wq); ^- returns immediately because list_empty_careful(&wq_entry->entry) is true ... return, go do something else ... wake_up_process(data->task) (NO LONGER VALID!)-^ Normally, finish_wait() is supposed to synchronize against the waker. But, as noted above, it is returning immediately because the waitqueue entry has already been removed from the waitqueue. The bug is that rq_qos_wake_function() is accessing the waitqueue entry AFTER deleting it. Note that autoremove_wake_function() wakes the waiter and THEN deletes the waitqueue entry, which is the proper order. Fix it by swapping the order. We also need to use list_del_init_careful() to match the list_empty_careful() in finish_wait(). Fixes: 38cfb5a45ee0 ("blk-wbt: improve waking of tasks") Cc: stable@vger.kernel.org Signed-off-by: Omar Sandoval <osandov@fb.com> Acked-by: Tejun Heo <tj@kernel.org> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/d3bee2463a67b1ee597211823bf7ad3721c26e41.1729014591.git.osandov@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Zheng Qixing <zhengqixing@huawei.com>1 年前
block: fix rq-qos breakage from skipping rq_qos_done_bio() mainline inclusion from mainline-v5.18-rc1 commit aa1b46dcdc7baaf5fec0be25782ef24b26aa209e bugzilla: https://gitee.com/openeuler/kernel/issues/IB7FJU Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=aa1b46dcdc7baaf5fec0be25782ef24b26aa209e -------------------------------- a647a524a467 ("block: don't call rq_qos_ops->done_bio if the bio isn't tracked") made bio_endio() skip rq_qos_done_bio() if BIO_TRACKED is not set. While this fixed a potential oops, it also broke blk-iocost by skipping the done_bio callback for merged bios. Before, whether a bio goes through rq_qos_throttle() or rq_qos_merge(), rq_qos_done_bio() would be called on the bio on completion with BIO_TRACKED distinguishing the former from the latter. rq_qos_done_bio() is not called for bios which wenth through rq_qos_merge(). This royally confuses blk-iocost as the merged bios never finish and are considered perpetually in-flight. One reliably reproducible failure mode is an intermediate cgroup geting stuck active preventing its children from being activated due to the leaf-only rule, leading to loss of control. The following is from resctl-bench protection scenario which emulates isolating a web server like workload from a memory bomb run on an iocost configuration which should yield a reasonable level of protection. # cat /sys/block/nvme2n1/device/model Samsung SSD 970 PRO 512GB # cat /sys/fs/cgroup/io.cost.model 259:0 ctrl=user model=linear rbps=834913556 rseqiops=93622 rrandiops=102913 wbps=618985353 wseqiops=72325 wrandiops=71025 # cat /sys/fs/cgroup/io.cost.qos 259:0 enable=1 ctrl=user rpct=95.00 rlat=18776 wpct=95.00 wlat=8897 min=60.00 max=100.00 # resctl-bench -m 29.6G -r out.json run protection::scenario=mem-hog,loops=1 ... Memory Hog Summary ================== IO Latency: R p50=242u:336u/2.5m p90=794u:1.4m/7.5m p99=2.7m:8.0m/62.5m max=8.0m:36.4m/350m W p50=221u:323u/1.5m p90=709u:1.2m/5.5m p99=1.5m:2.5m/9.5m max=6.9m:35.9m/350m Isolation and Request Latency Impact Distributions: min p01 p05 p10 p25 p50 p75 p90 p95 p99 max mean stdev isol% 15.90 15.90 15.90 40.05 57.24 59.07 60.01 74.63 74.63 90.35 90.35 58.12 15.82 lat-imp% 0 0 0 0 0 4.55 14.68 15.54 233.5 548.1 548.1 53.88 143.6 Result: isol=58.12:15.82% lat_imp=53.88%:143.6 work_csv=100.0% missing=3.96% The isolation result of 58.12% is close to what this device would show without any IO control. Fix it by introducing a new flag BIO_QOS_MERGED to mark merged bios and calling rq_qos_done_bio() on them too. For consistency and clarity, rename BIO_TRACKED to BIO_QOS_THROTTLED. The flag checks are moved into rq_qos_done_bio() so that it's next to the code paths that set the flags. With the patch applied, the above same benchmark shows: # resctl-bench -m 29.6G -r out.json run protection::scenario=mem-hog,loops=1 ... Memory Hog Summary ================== IO Latency: R p50=123u:84.4u/985u p90=322u:256u/2.5m p99=1.6m:1.4m/9.5m max=11.1m:36.0m/350m W p50=429u:274u/995u p90=1.7m:1.3m/4.5m p99=3.4m:2.7m/11.5m max=7.9m:5.9m/26.5m Isolation and Request Latency Impact Distributions: min p01 p05 p10 p25 p50 p75 p90 p95 p99 max mean stdev isol% 84.91 84.91 89.51 90.73 92.31 94.49 96.36 98.04 98.71 100.0 100.0 94.42 2.81 lat-imp% 0 0 0 0 0 2.81 5.73 11.11 13.92 17.53 22.61 4.10 4.68 Result: isol=94.42:2.81% lat_imp=4.10%:4.68 work_csv=58.34% missing=0% Signed-off-by: Tejun Heo <tj@kernel.org> Fixes: a647a524a467 ("block: don't call rq_qos_ops->done_bio if the bio isn't tracked") Cc: stable@vger.kernel.org # v5.15+ Cc: Ming Lei <ming.lei@redhat.com> Cc: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/Yi7rdrzQEHjJLGKB@slm.duckdns.org Signed-off-by: Jens Axboe <axboe@kernel.dk> Conflicts: block/bio.c block/blk-rq-qos.h include/linux/blk_types.h [Commit 309dca309fc3 ("block: store a block_device pointer in struct bio") replace bi_disk with bi_bdev in struct bio; commit 7eeb22198dcc ("[Huawei] blk-io-hierarchy: support hierarchy stats for bio lifetime") add BIO_HAS_DATA/BIO_HIERARCHY_ACCT.] Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com> Signed-off-by: Li Nan <linan122@huawei.com>1 年前
block: avoid possible overflow for chunk_sectors check in blk_stack_limits() stable inclusion from stable-v5.10.241 commit 418751910044649baa2b424ea31cce3fc4dcc253 category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/ICXSWT CVE: CVE-2025-39795 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=418751910044649baa2b424ea31cce3fc4dcc253 ------------------ [ Upstream commit 448dfecc7ff807822ecd47a5c052acedca7d09e8 ] In blk_stack_limits(), we check that the t->chunk_sectors value is a multiple of the t->physical_block_size value. However, by finding the chunk_sectors value in bytes, we may overflow the unsigned int which holds chunk_sectors, so change the check to be based on sectors. Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: John Garry <john.g.garry@oracle.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20250729091448.1691334-2-john.g.garry@oracle.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Zheng Qixing <zhengqixing@huawei.com>8 个月前
block: prevent division by zero in blk_rq_stat_sum() stable inclusion from stable-v5.10.215 commit b0cb5564c3e8e0ee0a2d28c86fa7f02e82d64c3c category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I9QGM9 CVE: CVE-2024-35925 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=b0cb5564c3e8e0ee0a2d28c86fa7f02e82d64c3c -------------------------------- The expression dst->nr_samples + src->nr_samples may have zero value on overflow. It is necessary to add a check to avoid division by zero. Found by Linux Verification Center (linuxtesting.org) with Svace. Signed-off-by: Roman Smirnov <r.smirnov@omp.ru> Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru> Link: https://lore.kernel.org/r/20240305134509.23108-1-r.smirnov@omp.ru Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Li Nan <linan122@huawei.com>2 年前
block: deactivate blk_stat timer in wbt_disable_default() rwb_enabled() can't be changed when there is any inflight IO. wbt_disable_default() may set rwb->wb_normal as zero, however the blk_stat timer may still be pending, and the timer function will update wrb->wb_normal again. This patch introduces blk_stat_deactivate() and applies it in wbt_disable_default(), then the following IO hang triggered when running parted & switching io scheduler can be fixed: [ 369.937806] INFO: task parted:3645 blocked for more than 120 seconds. [ 369.938941] Not tainted 4.20.0-rc6-00284-g906c801e5248 #498 [ 369.939797] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 369.940768] parted D 0 3645 3239 0x00000000 [ 369.941500] Call Trace: [ 369.941874] ? __schedule+0x6d9/0x74c [ 369.942392] ? wbt_done+0x5e/0x5e [ 369.942864] ? wbt_cleanup_cb+0x16/0x16 [ 369.943404] ? wbt_done+0x5e/0x5e [ 369.943874] schedule+0x67/0x78 [ 369.944298] io_schedule+0x12/0x33 [ 369.944771] rq_qos_wait+0xb5/0x119 [ 369.945193] ? karma_partition+0x1c2/0x1c2 [ 369.945691] ? wbt_cleanup_cb+0x16/0x16 [ 369.946151] wbt_wait+0x85/0xb6 [ 369.946540] __rq_qos_throttle+0x23/0x2f [ 369.947014] blk_mq_make_request+0xe6/0x40a [ 369.947518] generic_make_request+0x192/0x2fe [ 369.948042] ? submit_bio+0x103/0x11f [ 369.948486] ? __radix_tree_lookup+0x35/0xb5 [ 369.949011] submit_bio+0x103/0x11f [ 369.949436] ? blkg_lookup_slowpath+0x25/0x44 [ 369.949962] submit_bio_wait+0x53/0x7f [ 369.950469] blkdev_issue_flush+0x8a/0xae [ 369.951032] blkdev_fsync+0x2f/0x3a [ 369.951502] do_fsync+0x2e/0x47 [ 369.951887] __x64_sys_fsync+0x10/0x13 [ 369.952374] do_syscall_64+0x89/0x149 [ 369.952819] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 369.953492] RIP: 0033:0x7f95a1e729d4 [ 369.953996] Code: Bad RIP value. [ 369.954456] RSP: 002b:00007ffdb570dd48 EFLAGS: 00000246 ORIG_RAX: 000000000000004a [ 369.955506] RAX: ffffffffffffffda RBX: 000055c2139c6be0 RCX: 00007f95a1e729d4 [ 369.956389] RDX: 0000000000000001 RSI: 0000000000001261 RDI: 0000000000000004 [ 369.957325] RBP: 0000000000000002 R08: 0000000000000000 R09: 000055c2139c6ce0 [ 369.958199] R10: 0000000000000000 R11: 0000000000000246 R12: 000055c2139c0380 [ 369.959143] R13: 0000000000000004 R14: 0000000000000100 R15: 0000000000000008 Cc: stable@vger.kernel.org Cc: Paolo Valente <paolo.valente@linaro.org> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>7 年前
block: fix lock ordering in blk_register_queue() error path hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IC0KIA -------------------------------- The error path in blk_register_queue() when elv_register_queue() fails was releasing q->sysfs_dir_lock too early, before calling blk_mq_unregister_dev() to release dev resources. This incorrect order caused lockdep warnings. Move the mutex_unlock(&q->sysfs_dir_lock) to the end of the error handling sequence to ensure the lock remains held during all cleanup operations, preventing potential race conditions. Fixes: 59ed4a531cc4 ("block: fix resource leak in elevator registration error path") Signed-off-by: Zheng Qixing <zhengqixing@huawei.com>1 年前
blk-throttle: check for overflow in calculate_bytes_allowed mainline inclusion from mainline-v6.7-rc1 commit 2dd710d476f2f1f6eaca884f625f69ef4389ed40 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IA8QFN Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2dd710d476f2f1f6eaca884f625f69ef4389ed40 -------------------------------- Inexact, we may reject some not-overflowing values incorrectly, but they'll be on the order of exabytes allowed anyways. This fixes divide error crash on x86 if bps_limit is not configured or is set too high in the rare case that jiffy_elapsed is greater than HZ. Fixes: e8368b57c006 ("blk-throttle: use calculate_io/bytes_allowed() for throtl_trim_slice()") Fixes: 8d6bbaada2e0 ("blk-throttle: prevent overflow while calculating wait time") Signed-off-by: Khazhismel Kumykov <khazhy@google.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20231020223617.2739774-1-khazhy@google.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>1 年前
block: blk-timeout: delete duplicated word Drop the repeated word "request". Change to the correct kernel-doc notation for function name separtor. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: linux-block@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>5 年前
blk-wbt: don't show valid wbt_lat_usec in sysfs while wbt is disabled mainline inclusion from mainline-v6.2-rc1 commit 3642ef4d95699193c4a461862382e643ae3720f0 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6Z1UG CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.3-rc7&id=3642ef4d95699193c4a461862382e643ae3720f0 ---------------------------------------- Currently, if wbt is initialized and then disabled by wbt_disable_default(), sysfs will still show valid wbt_lat_usec, which will confuse users that wbt is still enabled. This patch shows wbt_lat_usec as zero if it's disabled. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reported-and-tested-by: Holger Hoffstätte <holger@applied-asynchrony.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20221019121518.3865235-5-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>2 年前
blk-wbt: don't show valid wbt_lat_usec in sysfs while wbt is disabled mainline inclusion from mainline-v6.2-rc1 commit 3642ef4d95699193c4a461862382e643ae3720f0 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6Z1UG CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.3-rc7&id=3642ef4d95699193c4a461862382e643ae3720f0 ---------------------------------------- Currently, if wbt is initialized and then disabled by wbt_disable_default(), sysfs will still show valid wbt_lat_usec, which will confuse users that wbt is still enabled. This patch shows wbt_lat_usec as zero if it's disabled. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reported-and-tested-by: Holger Hoffstätte <holger@applied-asynchrony.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20221019121518.3865235-5-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>2 年前
blk-zoned: allow BLKREPORTZONE without CAP_SYS_ADMIN stable inclusion from stable-5.10.67 commit d15554f98597d89bcae43464a2b18b665ca6acfa bugzilla: 182619 https://gitee.com/openeuler/kernel/issues/I4EWO7 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=d15554f98597d89bcae43464a2b18b665ca6acfa -------------------------------- commit 4d643b66089591b4769bcdb6fd1bfeff2fe301b8 upstream. A user space process should not need the CAP_SYS_ADMIN capability set in order to perform a BLKREPORTZONE ioctl. Getting the zone report is required in order to get the write pointer. Neither read() nor write() requires CAP_SYS_ADMIN, so it is reasonable that a user space process that can read/write from/to the device, also can get the write pointer. (Since e.g. writes have to be at the write pointer.) Fixes: 3ed05a987e0f ("blk-zoned: implement ioctls") Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Aravind Ramesh <aravind.ramesh@wdc.com> Reviewed-by: Adam Manzanares <a.manzanares@samsung.com> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Cc: stable@vger.kernel.org # v4.10+ Link: https://lore.kernel.org/r/20210811110505.29649-3-Niklas.Cassel@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Chen Jun <chenjun102@huawei.com> Acked-by: Weilong Chen <chenweilong@huawei.com> Signed-off-by: Chen Jun <chenjun102@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>4 年前
block: reuse BIO_INLINE_VECS for integrity bvecs mainline inclusion from mainline-v5.12-rc1 commit dc0b8a57ad7b05036fcb19a5bf0319467597e67a bugzilla: https://gitee.com/openeuler/kernel/issues/IB7FJU Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=dc0b8a57ad7b05036fcb19a5bf0319467597e67a -------------------------------- bvec_alloc always uses biovec_slabs, and thus always needs to use the same number of inline vecs. Share a single definition for the data and integrity bvecs. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> Conflicts: block/bio.c block/blk.h [Commit 82b10af05f7a ("[Huawei] blk-io-hierarchy: factor out a new struct bio_hierarchy_data from bio") add inclusion of "blk-io-hierarchy/stats.h"; commit eec716a1c18c ("block: move three bvec helpers declaration into private helper") move declarations of three bvec helpers after blk_freeze_queue.] Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com> Signed-off-by: Li Nan <linan122@huawei.com>1 年前
block: make bio_crypt_clone() able to fail bio_crypt_clone() assumes its gfp_mask argument always includes __GFP_DIRECT_RECLAIM, so that the mempool_alloc() will always succeed. However, bio_crypt_clone() might be called with GFP_ATOMIC via setup_clone() in drivers/md/dm-rq.c, or with GFP_NOWAIT via kcryptd_io_read() in drivers/md/dm-crypt.c. Neither case is currently reachable with a bio that actually has an encryption context. However, it's fragile to rely on this. Just make bio_crypt_clone() able to fail, analogous to bio_integrity_clone(). Reported-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Satya Tangirala <satyat@google.com> Cc: Satya Tangirala <satyat@google.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>5 年前
block: drop double zeroing sg_init_table zeroes its first argument, so the allocation of that argument doesn't have to. the semantic patch that makes this change is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ expression x; @@ x = - kzalloc + kmalloc (...) ... sg_init_table(x,...) // </smpl> Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Signed-off-by: Jens Axboe <axboe@kernel.dk>5 年前
scsi: bsg: Remove support for SCSI_IOCTL_SEND_COMMAND stable inclusion from stable-5.10.67 commit f1eccc40816852e8a74a70cace71bdff336e14cb bugzilla: 182619 https://gitee.com/openeuler/kernel/issues/I4EWO7 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=f1eccc40816852e8a74a70cace71bdff336e14cb -------------------------------- [ Upstream commit beec64d0c9749afedf51c3c10cf52de1d9a89cc0 ] SCSI_IOCTL_SEND_COMMAND has been deprecated longer than bsg exists and has been warning for just as long. More importantly it harcodes SCSI CDBs and thus will do the wrong thing on non-SCSI bsg nodes. Link: https://lore.kernel.org/r/20210724072033.1284840-2-hch@lst.de Fixes: aa387cc89567 ("block: add bsg helper library") Reviewed-by: Bart Van Assche <bvanassche@acm.org> Acked-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Chen Jun <chenjun102@huawei.com> Acked-by: Weilong Chen <chenweilong@huawei.com> Signed-off-by: Chen Jun <chenjun102@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>4 年前
License cleanup: add SPDX GPL-2.0 license identifier to files with no license Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license. By default all files without license information are under the default license of the kernel, which is GPL version 2. Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. How this work was done: Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information it it. - file was a */uapi/* one with no licensing information in it, - file was a */uapi/* one with existing licensing information, Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords. The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files. The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation. Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files. - Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines). All documentation files were explicitly excluded. The following heuristics were used to determine which SPDX license identifiers to apply. - when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied. For non */uapi/* files that summary was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 11139 and resulted in the first patch in this series. If that file was a */uapi/* path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 WITH Linux-syscall-note 930 and resulted in the second patch in this series. - if a file had some form of licensing information in it, and was one of the */uapi/* ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary: SPDX license identifier # files ---------------------------------------------------|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1 and that resulted in the third patch in this series. - when the two scanners agreed on the detected license(s), that became the concluded license(s). - when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred. - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics). - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation. - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time. In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation. Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there was new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related. Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files. In initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier. Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified. These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches. Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>8 年前
blk-mq: use quiesced elevator switch when reinitializing queues mainline inclusion from mainline-v6.1-rc1 commit 8237c01f1696bc53c470493bf1fe092a107648a6 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6GI6F CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit?id=8237c01f1696bc53c470493bf1fe092a107648a6 -------------------------------- The hctx's run_work may be racing with the elevator switch when reinitializing hardware queues. The queue is merely frozen in this context, but that only prevents requests from allocating and doesn't stop the hctx work from running. The work may get an elevator pointer that's being torn down, and can result in use-after-free errors and kernel panics (example below). Use the quiesced elevator switch instead, and make the previous one static since it is now only used locally. nvme nvme0: resetting controller nvme nvme0: 32/0/0 default/read/poll queues BUG: kernel NULL pointer dereference, address: 0000000000000008 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 80000020c8861067 P4D 80000020c8861067 PUD 250f8c8067 PMD 0 Oops: 0000 [#1] SMP PTI Workqueue: kblockd blk_mq_run_work_fn RIP: 0010:kyber_has_work+0x29/0x70 ... Call Trace: __blk_mq_do_dispatch_sched+0x83/0x2b0 __blk_mq_sched_dispatch_requests+0x12e/0x170 blk_mq_sched_dispatch_requests+0x30/0x60 __blk_mq_run_hw_queue+0x2b/0x50 process_one_work+0x1ef/0x380 worker_thread+0x2d/0x3e0 Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220927155652.3260724-1-kbusch@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Conflicts: block/blk-mq.c block/blk.h Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>3 年前
block: add a flag to make put_disk on partially initalized disks safer mainline inclusion from mainline-v5.14-rc1 commit 958229a7c55f219b1cff99f939dabbc1b6ba7161 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I81XCK Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=958229a7c55f219b1cff99f939dabbc1b6ba7161 ---------------------------------------- Add a flag to indicate that __device_add_disk did grab a queue reference so that disk_release only drops it if we actually had it. This sort out one of the major pitfals with partially initialized gendisk that a lot of drivers did get wrong or still do. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Link: https://lore.kernel.org/r/20210521055116.1053587-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk> conflicts: block/genhd.c include/linux/genhd.h Signed-off-by: Zhong Jinghua <zhongjinghua@huawei.com> Signed-off-by: Li Nan <linan122@huawei.com>2 年前
block: fix overflow in blk_ioctl_discard() mainline inclusion from mainline-v6.9-rc3 commit 22d24a544b0d49bbcbd61c8c0eaf77d3c9297155 category: bugfix bugzilla: 189755, https://gitee.com/openeuler/kernel/issues/I9K0H3 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=22d24a544b0d49bbcbd61c8c0eaf77d3c9297155 -------------------------------- There is no check for overflow of 'start + len' in blk_ioctl_discard(). Hung task occurs if submit an discard ioctl with the following param: start = 0x80000000000ff000, len = 0x8000000000fff000; Add the overflow validation now. Signed-off-by: Li Nan <linan122@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20240329012319.2034550-1-linan666@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Conflict: block/ioctl.c Signed-off-by: Li Nan <linan122@huawei.com>2 年前
block: fix ioprio_get(IOPRIO_WHO_PGRP) vs setuid(2) stable inclusion from stable-v5.10.85 commit 5dfe61147442cb9eedeb282630abf304144bc71c bugzilla: 186032 https://gitee.com/openeuler/kernel/issues/I4QVI4 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=5dfe61147442cb9eedeb282630abf304144bc71c -------------------------------- commit e6a59aac8a8713f335a37d762db0dbe80e7f6d38 upstream. do_each_pid_thread(PIDTYPE_PGID) can race with a concurrent change_pid(PIDTYPE_PGID) that can move the task from one hlist to another while iterating. Serialize ioprio_get to take the tasklist_lock in this case, just like it's set counterpart. Fixes: d69b78ba1de (ioprio: grab rcu_read_lock in sys_ioprio_{set,get}()) Acked-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Link: https://lore.kernel.org/r/20211210182058.43417-1-dave@stgolabs.net Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Chen Jun <chenjun102@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>4 年前
blk-crypto: make blk_crypto_evict_key() more robust stable inclusion from stable-v5.10.180 commit 701a8220762ff90615dc91d3543f789391b63298 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I8DDFN Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=701a8220762ff90615dc91d3543f789391b63298 -------------------------------- commit 5c7cb94452901a93e90c2230632e2c12a681bc92 upstream. If blk_crypto_evict_key() sees that the key is still in-use (due to a bug) or that ->keyslot_evict failed, it currently just returns while leaving the key linked into the keyslot management structures. However, blk_crypto_evict_key() is only called in contexts such as inode eviction where failure is not an option. So actually the caller proceeds with freeing the blk_crypto_key regardless of the return value of blk_crypto_evict_key(). These two assumptions don't match, and the result is that there can be a use-after-free in blk_crypto_reprogram_all_keys() after one of these errors occurs. (Note, these errors *shouldn't* happen; we're just talking about what happens if they do anyway.) Fix this by making blk_crypto_evict_key() unlink the key from the keyslot management structures even on failure. Also improve some comments. Fixes: 1b2628397058 ("block: Keyslot Manager for Inline Encryption") Cc: stable@vger.kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230315183907.53675-2-ebiggers@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: sanglipeng <sanglipeng1@jd.com>2 年前
blk-mq: avoid to touch q->elevator without any protection mainline inclusion from mainline-v5.19-rc3 commit 4d337cebcb1c27d9b48c48b9a98e939d4552d584 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IBY0UQ Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4d337cebcb1c27d9b48c48b9a98e939d4552d584 ------------------ q->elevator is referred in blk_mq_has_sqsched() without any protection, no .q_usage_counter is held, no queue srcu and rcu read lock is held, so potential use-after-free may be triggered. Fix the issue by adding one queue flag for checking if the elevator uses single queue style dispatch. Meantime the elevator feature flag of ELEVATOR_F_MQ_AWARE isn't needed any more. Cc: Jan Kara <jack@suse.cz> Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220616014401.817001-3-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Conflicts: block/blk-mq.c [Due to not merging commit 4f481208749a ("blk-mq: prepare for implementing hctx table via xarray").] block/mq-deadline.c [Due to not merging commit 3e9a99eba058 ("block/mq-deadline: Rename dd_init_queue() and dd_exit_queue()").] include/linux/blkdev.h [Context conflicts.] include/linux/elevator.h [Due to not merging commit 2e9bc3465ac5 ("block: move elevator.h to block/").] Signed-off-by: Zheng Qixing <zhengqixing@huawei.com>1 年前
blk-mq: avoid to touch q->elevator without any protection mainline inclusion from mainline-v5.19-rc3 commit 4d337cebcb1c27d9b48c48b9a98e939d4552d584 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IBY0UQ Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4d337cebcb1c27d9b48c48b9a98e939d4552d584 ------------------ q->elevator is referred in blk_mq_has_sqsched() without any protection, no .q_usage_counter is held, no queue srcu and rcu read lock is held, so potential use-after-free may be triggered. Fix the issue by adding one queue flag for checking if the elevator uses single queue style dispatch. Meantime the elevator feature flag of ELEVATOR_F_MQ_AWARE isn't needed any more. Cc: Jan Kara <jack@suse.cz> Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220616014401.817001-3-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Conflicts: block/blk-mq.c [Due to not merging commit 4f481208749a ("blk-mq: prepare for implementing hctx table via xarray").] block/mq-deadline.c [Due to not merging commit 3e9a99eba058 ("block/mq-deadline: Rename dd_init_queue() and dd_exit_queue()").] include/linux/blkdev.h [Context conflicts.] include/linux/elevator.h [Due to not merging commit 2e9bc3465ac5 ("block: move elevator.h to block/").] Signed-off-by: Zheng Qixing <zhengqixing@huawei.com>1 年前
block: sed-opal: Change the check condition for regular session validity This patch changes the check condition for the validity/authentication of the session. 1. The Host Session Number(HSN) in the response should match the HSN for the session. 2. The TPER Session Number(TSN) can never be less than 4096 for a regular session. Reference: Section 3.2.2.1 of https://trustedcomputinggroup.org/wp-content/uploads/TCG_Storage_Opal_SSC_Application_Note_1-00_1-00-Final.pdf Section 3.3.7.1.1 of https://trustedcomputinggroup.org/wp-content/uploads/TCG_Storage_Architecture_Core_Spec_v2.01_r1.00.pdf Co-developed-by: Andrzej Jakowski <andrzej.jakowski@linux.intel.com> Signed-off-by: Andrzej Jakowski <andrzej.jakowski@linux.intel.com> Signed-off-by: Revanth Rajashekar <revanth.rajashekar@intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>6 年前
Merge tag 'drivers-5.10-2020-10-12' of git://git.kernel.dk/linux-block Pull block driver updates from Jens Axboe: "Here are the driver updates for 5.10. A few SCSI updates in here too, in coordination with Martin as they depend on core block changes for the shared tag bitmap. This contains: - NVMe pull requests via Christoph: - fix keep alive timer modification (Amit Engel) - order the PCI ID list more sensibly (Andy Shevchenko) - cleanup the open by controller helper (Chaitanya Kulkarni) - use an xarray for the CSE log lookup (Chaitanya Kulkarni) - support ZNS in nvmet passthrough mode (Chaitanya Kulkarni) - fix nvme_ns_report_zones (Christoph Hellwig) - add a sanity check to nvmet-fc (James Smart) - fix interrupt allocation when too many polled queues are specified (Jeffle Xu) - small nvmet-tcp optimization (Mark Wunderlich) - fix a controller refcount leak on init failure (Chaitanya Kulkarni) - misc cleanups (Chaitanya Kulkarni) - major refactoring of the scanning code (Christoph Hellwig) - MD updates via Song: - Bug fixes in bitmap code, from Zhao Heming - Fix a work queue check, from Guoqing Jiang - Fix raid5 oops with reshape, from Song Liu - Clean up unused code, from Jason Yan - Discard improvements, from Xiao Ni - raid5/6 page offset support, from Yufen Yu - Shared tag bitmap for SCSI/hisi_sas/null_blk (John, Kashyap, Hannes) - null_blk open/active zone limit support (Niklas) - Set of bcache updates (Coly, Dongsheng, Qinglang)" * tag 'drivers-5.10-2020-10-12' of git://git.kernel.dk/linux-block: (78 commits) md/raid5: fix oops during stripe resizing md/bitmap: fix memory leak of temporary bitmap md: fix the checking of wrong work queue md/bitmap: md_bitmap_get_counter returns wrong blocks md/bitmap: md_bitmap_read_sb uses wrong bitmap blocks md/raid0: remove unused function is_io_in_chunk_boundary() nvme-core: remove extra condition for vwc nvme-core: remove extra variable nvme: remove nvme_identify_ns_list nvme: refactor nvme_validate_ns nvme: move nvme_validate_ns nvme: query namespace identifiers before adding the namespace nvme: revalidate zone bitmaps in nvme_update_ns_info nvme: remove nvme_update_formats nvme: update the known admin effects nvme: set the queue limits in nvme_update_ns_info nvme: remove the 0 lba_shift check in nvme_update_ns_info nvme: clean up the check for too large logic block sizes nvme: freeze the queue over ->lba_shift updates nvme: factor out a nvme_configure_metadata helper ...5 年前
block: sed-opal: kmalloc the cmd/resp buffers stable inclusion from stable-v5.10.156 commit 58636b5ff3f654fd348ad217f9ac7a18201d4164 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I7MCG1 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=58636b5ff3f654fd348ad217f9ac7a18201d4164 -------------------------------- [ Upstream commit f829230dd51974c1f4478900ed30bb77ba530b40 ] In accordance with [1] the DMA-able memory buffers must be cacheline-aligned otherwise the cache writing-back and invalidation performed during the mapping may cause the adjacent data being lost. It's specifically required for the DMA-noncoherent platforms [2]. Seeing the opal_dev.{cmd,resp} buffers are implicitly used for DMAs in the NVME and SCSI/SD drivers in framework of the nvme_sec_submit() and sd_sec_submit() methods respectively they must be cacheline-aligned to prevent the denoted problem. One of the option to guarantee that is to kmalloc the buffers [2]. Let's explicitly allocate them then instead of embedding into the opal_dev structure instance. Note this fix was inspired by the commit c94b7f9bab22 ("nvme-hwmon: kmalloc the NVME SMART log buffer"). [1] Documentation/core-api/dma-api.rst [2] Documentation/core-api/dma-api-howto.rst Fixes: 455a7b238cd6 ("block: Add Sed-opal library") Signed-off-by: Serge Semin <Sergey.Semin@baikalelectronics.ru> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20221107203944.31686-1-Sergey.Semin@baikalelectronics.ru Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: sanglipeng <sanglipeng1@jd.com>2 年前
block: Allow t10-pi to be modular Currently t10-pi can only be built into the block layer which via crc-t10dif pulls in a whole chunk of the Crypto API. In fact all users of t10-pi work as modules and there is no reason for it to always be built-in. This patch adds a new hidden option for t10-pi that is selected automatically based on BLK_DEV_INTEGRITY and whether the users of t10-pi are built-in or not. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Jens Axboe <axboe@kernel.dk>6 年前