| 文件 | 最后提交记录 | 最后更新时间 |
|---|---|---|
bpf: Fix regsafe() for pointers to packet mainline inclusion from mainline-v7.0-rc7 commit a8502a79e832b861e99218cbd2d8f4312d62e225 category: bugfix bugzilla: https://atomgit.com/openeuler/kernel/issues/8900 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a8502a79e832 -------------------------------- In case rold->reg->range == BEYOND_PKT_END && rcur->reg->range == N regsafe() may return true which may lead to current state with valid packet range not being explored. Fix the bug. Fixes: 6d94e741a8ff ("bpf: Support for pointers beyond pkt_end.") Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Amery Hung <ameryhung@gmail.com> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/bpf/20260331204228.26726-1-alexei.starovoitov@gmail.com Signed-off-by: Pu Lehui <pulehui@huawei.com> | 1 个月前 | |
cgroup/cpuset.c: Comment out early when prefer_cpus cpumask equals hulk inclusion category: bugfix bugzilla: https://atomgit.com/openeuler/kernel/issues/8397 CVE: NA ---------------------------------------- The current logic skips updating task's prefer_cpus in a cgroup when the cgroup's prefer_cpus is unchanged. However, task's prefer_cpus may mismatch the cgroup config in some cases, causing inconsistent CPU affinity. Comment out the early return logic to ensure task's prefer_cpus are always updated on cgroup.prefer_cpus writes, aligning task and cgroup settings. Fixes: 243865da2684 ("cpuset: Introduce new interface for scheduler dynamic affinity") Signed-off-by: Chen Jinghuang <chenjinghuang2@huawei.com> | 27 天前 | |
Revert "compiler: remove CONFIG_OPTIMIZE_INLINING entirely" hulk inclusion category: bugfix bugzilla: 182617 https://gitee.com/openeuler/kernel/issues/I4DDEL -------------------------------- This reverts commit 889b3c1245de48ed0cacf7aebb25c489d3e4a3e9. Signed-off-by: Guo Xuenan <guoxuenan@huawei.com> Reviewed-by: Jason Yan <yanaijie@huawei.com> Signed-off-by: Chen Jun <chenjun102@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com> | 4 年前 | |
kdb: Fix buffer overflow during tab-complete stable inclusion from stable-v5.10.219 commit cfdc2fa4db57503bc6d3817240547c8ddc55fa96 category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/IAB05N CVE: CVE-2024-39480 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=cfdc2fa4db57503bc6d3817240547c8ddc55fa96 -------------------------------- commit e9730744bf3af04cda23799029342aa3cddbc454 upstream. Currently, when the user attempts symbol completion with the Tab key, kdb will use strncpy() to insert the completed symbol into the command buffer. Unfortunately it passes the size of the source buffer rather than the destination to strncpy() with predictably horrible results. Most obviously if the command buffer is already full but cp, the cursor position, is in the middle of the buffer, then we will write past the end of the supplied buffer. Fix this by replacing the dubious strncpy() calls with memmove()/memcpy() calls plus explicit boundary checks to make sure we have enough space before we start moving characters around. Reported-by: Justin Stitt <justinstitt@google.com> Closes: https://lore.kernel.org/all/CAFhGd8qESuuifuHsNjFPR-Va3P80bxrw+LqvC8deA8GziUJLpw@mail.gmail.com/ Cc: stable@vger.kernel.org Reviewed-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: Justin Stitt <justinstitt@google.com> Tested-by: Justin Stitt <justinstitt@google.com> Link: https://lore.kernel.org/r/20240424-kgdb_read_refactor-v3-1-f236dbe9828d@linaro.org Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Fixes: 5d5314d6795f ("kdb: core for kgdb back end (1 of 2)") Signed-off-by: Tengda Wu <wutengda2@huawei.com> | 1 年前 | |
x86,swiotlb: Adjust SWIOTLB bounce buffer size for SEV guests mainline inclusion from mainline-v5.11 commit e998879d4fb7991856916972168cf27c0d86ed12 category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8295 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e998879d4fb7991856916972168cf27c0d86ed12 --------------------------- For SEV, all DMA to and from guest has to use shared (un-encrypted) pages. SEV uses SWIOTLB to make this happen without requiring changes to device drivers. However, depending on the workload being run, the default 64MB of it might not be enough and it may run out of buffers to use for DMA, resulting in I/O errors and/or performance degradation for high I/O workloads. Adjust the default size of SWIOTLB for SEV guests using a percentage of the total memory available to guest for the SWIOTLB buffers. Adds a new sev_setup_arch() function which is invoked from setup_arch() and it calls into a new swiotlb generic code function swiotlb_adjust_size() to do the SWIOTLB buffer adjustment. v5 fixed build errors and warnings as Reported-by: kbuild test robot <lkp@intel.com> Hygon-SIG: commit e998879d4fb7 upstream x86,swiotlb: Adjust SWIOTLB bounce buffer size for SEV guests Signed-off-by: Ashish Kalra <ashish.kalra@amd.com> Co-developed-by: Borislav Petkov <bp@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Zhiguang Ni <nizhiguang@hygon.cn> | 5 个月前 | |
entry/rcu: Check TIF_RESCHED _after_ delayed RCU wake-up mainline inclusion from mainline-v6.3-rc4 commit b416514054810cf2d2cc348ae477cea619b64da7 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I9NZ3E Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit?id=b416514054810cf2d2cc348ae477cea619b64da7 -------------------------------- RCU sometimes needs to perform a delayed wake up for specific kthreads handling offloaded callbacks (RCU_NOCB). These wakeups are performed by timers and upon entry to idle (also to guest and to user on nohz_full). However the delayed wake-up on kernel exit is actually performed after the thread flags are fetched towards the fast path check for work to do on exit to user. As a result, and if there is no other pending work to do upon that kernel exit, the current task will resume to userspace with TIF_RESCHED set and the pending wake up ignored. Fix this with fetching the thread flags _after_ the delayed RCU-nocb kthread wake-up. Fixes: 47b8ff194c1f ("entry: Explicitly flush pending rcuog wakeup before last rescheduling point") Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20230315194349.10798-3-joel@joelfernandes.org Signed-off-by: Wei Li <liwei391@huawei.com> | 1 年前 | |
perf: Fix Wdeclaration-after-statement hulk inclusion category: bugfix bugzilla: https://atomgit.com/src-openeuler/kernel/issues/13907 CVE: CVE-2026-23271 -------------------------------- ISO C90 forbids mixed declarations and code. So fix it. Fixes: 95a89b18638f ("perf: Fix __perf_event_overflow() vs perf_remove_from_context() race") Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> | 2 个月前 | |
futex: Clear stale exiting pointer in futex_lock_pi() retry path stable inclusion from stable-v5.10.253 commit 33095ae3bdde5e5c264d7e88a2f3e7703a26c7aa category: bugfix bugzilla: https://atomgit.com/src-openeuler/kernel/issues/14301 CVE: CVE-2026-31555 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=33095ae3bdde5e5c264d7e88a2f3e7703a26c7aa -------------------------------- commit 210d36d892de5195e6766c45519dfb1e65f3eb83 upstream. Fuzzying/stressing futexes triggered: WARNING: kernel/futex/core.c:825 at wait_for_owner_exiting+0x7a/0x80, CPU#11: futex_lock_pi_s/524 When futex_lock_pi_atomic() sees the owner is exiting, it returns -EBUSY and stores a refcounted task pointer in 'exiting'. After wait_for_owner_exiting() consumes that reference, the local pointer is never reset to nil. Upon a retry, if futex_lock_pi_atomic() returns a different error, the bogus pointer is passed to wait_for_owner_exiting(). CPU0 CPU1 CPU2 futex_lock_pi(uaddr) // acquires the PI futex exit() futex_cleanup_begin() futex_state = EXITING; futex_lock_pi(uaddr) futex_lock_pi_atomic() attach_to_pi_owner() // observes EXITING *exiting = owner; // takes ref return -EBUSY wait_for_owner_exiting(-EBUSY, owner) put_task_struct(); // drops ref // exiting still points to owner goto retry; futex_lock_pi_atomic() lock_pi_update_atomic() cmpxchg(uaddr) *uaddr ^= WAITERS // whatever // value changed return -EAGAIN; wait_for_owner_exiting(-EAGAIN, exiting) // stale WARN_ON_ONCE(exiting) Fix this by resetting upon retry, essentially aligning it with requeue_pi. Fixes: 3ef240eaff36 ("futex: Prevent exit livelock") Signed-off-by: Davidlohr Bueso <dave@stgolabs.net> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20260326001759.4129680-1-dave@stgolabs.net Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Jiacheng Yu <yujiacheng3@huawei.com> | 1 个月前 | |
gcov: add support for checksum field stable inclusion from stable-v5.10.163 commit f24474d12e68dfb3fa051ece57ff6bb11361d7f8 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I7PJ9N Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=f24474d12e68dfb3fa051ece57ff6bb11361d7f8 ---------------------------------------------------- commit e96b95c2b7a63a454b6498e2df67aac14d046d13 upstream. In GCC version 12.1 a checksum field was added. This patch fixes a kernel crash occurring during boot when using gcov-kernel with GCC version 12.2. The crash occurred on a system running on i.MX6SX. Link: https://lkml.kernel.org/r/20221220102318.3418501-1-rickaran@axis.com Fixes: 977ef30a7d88 ("gcov: support GCC 12.1 and newer compilers") Signed-off-by: Rickard x Andersson <rickaran@axis.com> Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com> Tested-by: Peter Oberparleiter <oberpar@linux.ibm.com> Reviewed-by: Martin Liska <mliska@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: zhaoxiaoqiang11 <zhaoxiaoqiang11@jd.com> | 2 年前 | |
!16386 v5 Fix x2apic enablement and allow more CPUs, clean up I/OAPIC and MSI bitfields Merge Pull Request from: @ci-robot PR sync from: Liu Chao <liuchao173@huawei.com> https://mailweb.openeuler.org/archives/list/kernel@openeuler.org/message/TFF5EY37GFM2XD4YQQLDG4V24H3BT2TL/ mainline inclusion from mainline-v5.11-rc1 commit 2e730cb56b2cd1626fecaf23ef1537fb24721ef2 category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/IC9GHN Reference: https://lore.kernel.org/all/20201024213535.443185-1-dwmw2@infradead.org/ -------------------------------- Fix the conditions for enabling x2apic on guests without interrupt remapping, and support 15-bit Extended Destination ID to allow 32768 CPUs without IR on hypervisors that support it. Make the I/OAPIC code generate its RTE directly from the MSI message created by the parent irqchip, and fix up a bunch of magic mask/shift macros to use bitfields for MSI messages and I/OAPIC RTEs while we're at it. v5: add new commit d981059e13ff ("x86/hyperv: Enable 15-bit APIC ID if the hypervisor supports it") v4: add new bugfix commit 4b8ef157ca83. v3: add new bugfix commits. v2: adapt ioapic_sync_hardware_data. David Woodhouse (17): x86/hpet: Move MSI support into hpet.c x86/ioapic: Generate RTE directly from parent irqchip's MSI message genirq/irqdomain: Implement get_name() method on irqchip fwnodes x86/apic: Add select() method on vector irqdomain iommu/amd: Implement select() method on remapping irqdomain iommu/vt-d: Implement select() method on remapping irqdomain iommu/hyper-v: Implement select() method on remapping irqdomain x86/hpet: Use irq_find_matching_fwspec() to find remapping irqdomain x86/ioapic: Use irq_find_matching_fwspec() to find remapping irqdomain x86: Kill all traces of irq_remapping_get_irq_domain() iommu/vt-d: Simplify intel_irq_remapping_select() x86/ioapic: Handle Extended Destination ID field in RTE x86/apic: Support 15 bits of APIC ID in MSI where available iommu/hyper-v: Disable IRQ pseudo-remapping if 15 bit APIC IDs are available x86/kvm: Enable 15-bit extension when KVM_FEATURE_MSI_EXT_DEST_ID detected x86/ioapic: Use I/O-APIC ID for finding irqdomain, not index iommu/amd: Stop irq_remapping_select() matching when remapping is disabled Dexuan Cui (2): iommu/hyper-v: Remove I/O-APIC ID check from hyperv_irq_remapping_select() x86/hyperv: Enable 15-bit APIC ID if the hypervisor supports it Joerg Roedel (1): iommu/amd: Keep track of amd_iommu_irq_remap state Thomas Gleixner (8): x86/devicetree: Fix the ioapic interrupt type table PCI: vmd: Use msi_msg shadow structs x86/kvm: Use msi_msg shadow structs x86/pci/xen: Use msi_msg shadow structs x86/msi: Remove msidef.h x86/io_apic: Cleanup trigger/polarity helpers x86/ioapic: Cleanup IO/APIC route entry structs x86/ioapic: Correct the PCI/ISA trigger type selection https://gitee.com/src-openeuler/kernel/issues/IC9GHN Link:https://gitee.com/openeuler/kernel/pulls/16386 Reviewed-by: Kevin Zhu <zhukeqian1@huawei.com> Reviewed-by: Weilong Chen <chenweilong@huawei.com> Reviewed-by: Wei Li <liwei391@huawei.com> Reviewed-by: Jason Zeng <jason.zeng@intel.com> Reviewed-by: Wenkuan Wang <wenkuan.wang@amd.com> Signed-off-by: Tengda Wu <wutengda2@huawei.com> | 10 个月前 | |
kcsan: Turn report_filterlist_lock into a raw_spinlock mainline inclusion from mainline-v6.13-rc1 commit 59458fa4ddb47e7891c61b4a928d13d5f5b00aa0 category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/IBEAOU CVE: CVE-2024-56610 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=59458fa4ddb47e7891c61b4a928d13d5f5b00aa0 -------------------------------- Ran Xiaokai reports that with a KCSAN-enabled PREEMPT_RT kernel, we can see splats like: | BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48 | in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 0, name: swapper/1 | preempt_count: 10002, expected: 0 | RCU nest depth: 0, expected: 0 | no locks held by swapper/1/0. | irq event stamp: 156674 | hardirqs last enabled at (156673): [<ffffffff81130bd9>] do_idle+0x1f9/0x240 | hardirqs last disabled at (156674): [<ffffffff82254f84>] sysvec_apic_timer_interrupt+0x14/0xc0 | softirqs last enabled at (0): [<ffffffff81099f47>] copy_process+0xfc7/0x4b60 | softirqs last disabled at (0): [<0000000000000000>] 0x0 | Preemption disabled at: | [<ffffffff814a3e2a>] paint_ptr+0x2a/0x90 | CPU: 1 UID: 0 PID: 0 Comm: swapper/1 Not tainted 6.11.0+ #3 | Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014 | Call Trace: | <IRQ> | dump_stack_lvl+0x7e/0xc0 | dump_stack+0x1d/0x30 | __might_resched+0x1a2/0x270 | rt_spin_lock+0x68/0x170 | kcsan_skip_report_debugfs+0x43/0xe0 | print_report+0xb5/0x590 | kcsan_report_known_origin+0x1b1/0x1d0 | kcsan_setup_watchpoint+0x348/0x650 | __tsan_unaligned_write1+0x16d/0x1d0 | hrtimer_interrupt+0x3d6/0x430 | __sysvec_apic_timer_interrupt+0xe8/0x3a0 | sysvec_apic_timer_interrupt+0x97/0xc0 | </IRQ> On a detected data race, KCSAN's reporting logic checks if it should filter the report. That list is protected by the report_filterlist_lock *non-raw* spinlock which may sleep on RT kernels. Since KCSAN may report data races in any context, convert it to a raw_spinlock. This requires being careful about when to allocate memory for the filter list itself which can be done via KCSAN's debugfs interface. Concurrent modification of the filter list via debugfs should be rare: the chosen strategy is to optimistically pre-allocate memory before the critical section and discard if unused. Link: https://lore.kernel.org/all/20240925143154.2322926-1-ranxiaokai627@163.com/ Reported-by: Ran Xiaokai <ran.xiaokai@zte.com.cn> Tested-by: Ran Xiaokai <ran.xiaokai@zte.com.cn> Signed-off-by: Marco Elver <elver@google.com> Signed-off-by: Kaixiong Yu <yukaixiong@huawei.com> | 1 年前 | |
livepatch: Fix find wrong ftrace entry hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I9UQ7I -------------------------------- There is case that 'kernel_kexec' has multiple fentrys, and when CONFIG_LIVEPATCH_ISOLATE_KPROBE enable, livepatch want to find the fentry at function start, however it actually find a wrong fentry due to binary search in ftrace_location_range() through the whole function range. # grep kernel_kexec /sys/kernel/tracing/available_filter_functions kernel_kexec <-- kernel_kexec+0x4/0xf8 kernel_kexec <-- kernel_kexec+0xdc/0xf8 kernel_kexec <-- kernel_kexec+0xdc/0xf8 To solve the issue, shrink the search range in first several instructions. Fixes: fe25ad14ec35 ("livepatch: Avoid patching conflicts with kprobes") Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com> | 2 年前 | |
rtmutex: Use waiter::task instead of current in remove_waiter() mainline inclusion from mainline-v7.1-rc1 commit 3bfdc63936dd4773109b7b8c280c0f3b5ae7d349 category: bugfix bugzilla: https://atomgit.com/src-openeuler/kernel/issues/15071 CVE: CVE-2026-43499 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3bfdc63936dd4773109b7b8c280c0f3b5ae7d349 -------------------------------- remove_waiter() is used by the slowlock paths, but it is also used for proxy-lock rollback in rt_mutex_start_proxy_lock() when invoked from futex_requeue(). In the latter case waiter::task is not current, but remove_waiter() operates on current for the dequeue operation. That results in several problems: 1) the rbtree dequeue happens without waiter::task::pi_lock being held 2) the waiter task's pi_blocked_on state is not cleared, which leaves a dangling pointer primed for UAF around. 3) rt_mutex_adjust_prio_chain() operates on the wrong top priority waiter task Use waiter::task instead of current in all related operations in remove_waiter() to cure those problems. [ tglx: Fixup rt_mutex_adjust_prio_chain(), add a comment and amend the changelog ] Fixes: 8161239a8bcc ("rtmutex: Simplify PI algorithm and make highest prio task get lock") Reported-by: Yuan Tan <yuantan098@gmail.com> Reported-by: Yifan Wu <yifanwucs@gmail.com> Reported-by: Juefei Pu <tomapufckgml@gmail.com> Reported-by: Xin Liu <bird@lzu.edu.cn> Signed-off-by: Keenan Dong <keenanat2000@gmail.com> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Cc: stable@vger.kernel.org Conflicts: kernel/locking/rtmutex.c [ Context conflict ] Signed-off-by: Yao Kai <yaokai34@huawei.com> | 29 天前 | |
PM: EM: Change the order of arguments in the .active_power() callback mainline inclusion from mainline-v5.19-rc1 commit 75a3a99a5a9886af13be44e640cb415ebda80db2 category: cleanup bugzilla: https://atomgit.com/openeuler/kernel/issues/8319 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=75a3a99a5a9886af13be44e640cb415ebda80db2 ---------------------------------------------------------------------- The .active_power() callback passes the device pointer when it's called. Aligned with a convetion present in other subsystems and pass the 'dev' as a first argument. It looks more cleaner. Adjust all affected drivers which implement that API callback. Suggested-by: Ionela Voinescu <ionela.voinescu@arm.com> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Reviewed-by: Ionela Voinescu <ionela.voinescu@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: huwentao <huwentao19@h-partners.com> | 4 个月前 | |
sw64: improve sw64_rrk Sunway inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IC94VY -------------------------------- Implement an improved sw64_rrk which stores the timestamp and loglevel. Signed-off-by: Mao Minkai <maominkai@wxiat.com> Reviewed-by: He Sheng <hesheng@wxiat.com> Signed-off-by: Gu Zitao <guzitao@wxiat.com> | 1 年前 | |
rcu: Fix racy re-initialization of irq_work causing hangs mainline inclusion from mainline-v6.17-rc2 commit 61399e0c5410567ef60cb1cda34cca42903842e3 category: bugfix bugzilla: https://atomgit.com/src-openeuler/kernel/issues/8988 CVE: CVE-2025-39744 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=61399e0c5410567ef60cb1cda34cca42903842e3 -------------------------------- RCU re-initializes the deferred QS irq work everytime before attempting to queue it. However there are situations where the irq work is attempted to be queued even though it is already queued. In that case re-initializing messes-up with the irq work queue that is about to be handled. The chances for that to happen are higher when the architecture doesn't support self-IPIs and irq work are then all lazy, such as with the following sequence: 1) rcu_read_unlock() is called when IRQs are disabled and there is a grace period involving blocked tasks on the node. The irq work is then initialized and queued. 2) The related tasks are unblocked and the CPU quiescent state is reported. rdp->defer_qs_iw_pending is reset to DEFER_QS_IDLE, allowing the irq work to be requeued in the future (note the previous one hasn't fired yet). 3) A new grace period starts and the node has blocked tasks. 4) rcu_read_unlock() is called when IRQs are disabled again. The irq work is re-initialized (but it's queued! and its node is cleared) and requeued. Which means it's requeued to itself. 5) The irq work finally fires with the tick. But since it was requeued to itself, it loops and hangs. Fix this with initializing the irq work only once before the CPU boots. Fixes: b41642c87716 ("rcu: Fix rcu_read_unlock() deadloop due to IRQ work") Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202508071303.c1134cce-lkp@intel.com Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Reviewed-by: Joel Fernandes <joelagnelf@nvidia.com> Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.upadhyay@kernel.org> Conflicts: kernel/rcu/tree.c [Context conflicts] Signed-off-by: Jiacheng Yu <yujiacheng3@huawei.com> | 4 个月前 | |
sched/cputime: Fix time overflow in cputime_adjust() hulk inclusion category: bugfix bugzilla: https://atomgit.com/openeuler/kernel/issues/8331 ------------------ When a process has hundreds of threads and runs for an extended period, the aggregated stime or utime of its thread group can become extremely large. Accessing /proc/xx/stat then triggers a divide error like the following: divide error: 0000 [#1] SMP NOPTI CPU: 273 PID: 4619 Comm: ... Tainted: 5.10.0 x86_64 RIP: 0010:cputime_adjust+0x55/0xb0 RSP: 0018:ffffae408e07bbc8 EFLAGS: 00010807 ... Call Trace: thread_group_cputime_adjusted+0x4b/0x70 do_task_stat+0x2d8/0xdc0 This is due to overflow in stime + utime, which causes mul_u64_u64_div_u64() to trigger a divide error. Add overflow detection logic and, upon overflow, right-shift the values before computation. This fixes the issue while preserving the original calculation semantics. Fixes: 3dc167ba5729 ("sched/cputime: Improve cputime_adjust()") Signed-off-by: Xia Fukun <xiafukun@huawei.com> | 24 天前 | |
time/jiffies: Mark jiffies_64_to_clock_t() notrace mainline inclusion from mainline-v7.0-rc4 commit 755a648e78f12574482d4698d877375793867fa1 category: bugfix bugzilla: https://atomgit.com/openeuler/kernel/issues/8837 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=755a648e78f12574482d4698d877375793867fa1 -------------------------------- The trace_clock_jiffies() function that handles the "uptime" clock for tracing calls jiffies_64_to_clock_t(). This causes the function tracer to constantly recurse when the tracing clock is set to "uptime". Mark it notrace to prevent unnecessary recursion when using the "uptime" clock. Fixes: 58d4e21e50ff3 ("tracing: Fix wraparound problems in "uptime" trace clock") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Link: https://patch.msgid.link/20260306212403.72270bb2@robin Conflicts: kernel/time/time.c [Context conflict] Signed-off-by: Liu Kai <liukai284@huawei.com> | 2 个月前 | |
| 3 天前 | ||
kbuild: update config_data.gz only when the content of .config is changed stable inclusion from stable-5.10.36 commit a27aad32175179083e6a05b7c58d047639a1b1c0 bugzilla: 51867 CVE: NA -------------------------------- commit 46b41d5dd8019b264717978c39c43313a524d033 upstream. If the timestamp of the .config file is updated, config_data.gz is regenerated, then vmlinux is re-linked. This occurs even if the content of the .config has not changed at all. This issue was mitigated by commit 67424f61f813 ("kconfig: do not write .config if the content is the same"); Kconfig does not update the .config when it ends up with the identical configuration. The issue is remaining when the .config is created by *_defconfig with some config fragment(s) applied on top. This is typical for powerpc and mips, where several *_defconfig targets are constructed by using merge_config.sh. One workaround is to have the copy of the .config. The filechk rule updates the copy, kernel/config_data, by checking the content instead of the timestamp. With this commit, the second run with the same configuration avoids the needless rebuilds. $ make ARCH=mips defconfig all [ snip ] $ make ARCH=mips defconfig all *** Default configuration is based on target '32r2el_defconfig' Using ./arch/mips/configs/generic_defconfig as base Merging arch/mips/configs/generic/32r2.config Merging arch/mips/configs/generic/el.config Merging ./arch/mips/configs/generic/board-boston.config Merging ./arch/mips/configs/generic/board-ni169445.config Merging ./arch/mips/configs/generic/board-ocelot.config Merging ./arch/mips/configs/generic/board-ranchu.config Merging ./arch/mips/configs/generic/board-sead-3.config Merging ./arch/mips/configs/generic/board-xilfpga.config # # configuration written to .config # SYNC include/config/auto.conf CALL scripts/checksyscalls.sh CALL scripts/atomic/check-atomics.sh CHK include/generated/compile.h CHK include/generated/autoksyms.h Reported-by: Elliot Berman <eberman@codeaurora.org> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Chen Jun <chenjun102@huawei.com> Acked-by: Weilong Chen <chenweilong@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com> | 4 年前 | |
treewide: Add SPDX license identifier - Makefile/Kconfig Add SPDX license identifiers to all Make/Kconfig files which: - Have no license information of any form These files fall under the project license, GPL v2 only. The resulting SPDX license identifier is: GPL-2.0-only Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> | 6 年前 | |
treewide: Add SPDX license identifier - Makefile/Kconfig Add SPDX license identifiers to all Make/Kconfig files which: - Have no license information of any form These files fall under the project license, GPL v2 only. The resulting SPDX license identifier is: GPL-2.0-only Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> | 6 年前 | |
sched/rt, locking: Use CONFIG_PREEMPTION CONFIG_PREEMPTION is selected by CONFIG_PREEMPT and by CONFIG_PREEMPT_RT. Both PREEMPT and PREEMPT_RT require the same functionality which today depends on CONFIG_PREEMPT. Switch the Kconfig dependency to use CONFIG_PREEMPTION. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20191015191821.11479-32-bigeasy@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org> | 6 年前 | |
sched/core: Change depends of SCHED_CORE hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5OOWG CVE: NA -------------------------------------------------------------------------- Due to the conflict between Core Scheduling and SMT expeller, so SCHED_CORE and QOS_SCHED_SMT_EXPELLER are mutually exclusive. Signed-off-by: Lin Shengwang <linshengwang1@huawei.com> Reviewed-by: lihua <hucool.lihua@huawei.com> Reviewed-by: Chen Hui <judy.chenhui@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com> | 3 年前 | |
static_call: Don't make __static_call_return0 static mainline inclusion from mainline-v5.18-rc2 commit 8fd4ddda2f49a66bf5dd3d0c01966c4b1971308b category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I9JNPZ CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8fd4ddda2f49a66bf5dd3d0c01966c4b1971308b ------------------------------------------------------ System.map shows that vmlinux contains several instances of __static_call_return0(): c0004fc0 t __static_call_return0 c0011518 t __static_call_return0 c00d8160 t __static_call_return0 arch_static_call_transform() uses the middle one to check whether we are setting a call to __static_call_return0 or not: c0011520 <arch_static_call_transform>: c0011520: 3d 20 c0 01 lis r9,-16383 <== r9 = 0xc001 << 16 c0011524: 39 29 15 18 addi r9,r9,5400 <== r9 += 0x1518 c0011528: 7c 05 48 00 cmpw r5,r9 <== r9 has value 0xc0011518 here So if static_call_update() is called with one of the other instances of __static_call_return0(), arch_static_call_transform() won't recognise it. In order to work properly, global single instance of __static_call_return0() is required. Fixes: 3f2a8fc4b15d ("static_call/x86: Add __static_call_return0()") Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Link: https://lkml.kernel.org/r/30821468a0e7d28251954b578e5051dc09300d04.1647258493.git.christophe.leroy@csgroup.eu Conflicts: kernel/Makefile kernel/static_call.c kernel/static_call_inline.c Signed-off-by: liwei <liwei728@huawei.com> | 2 年前 | |
acct: fix potential integer overflow in encode_comp_t() stable inclusion from stable-v5.10.163 commit 6edd0cdee5780fd5f43356b72b29a2a6d48ef6da category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I7PJ9N Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=6edd0cdee5780fd5f43356b72b29a2a6d48ef6da ---------------------------------------------------- [ Upstream commit c5f31c655bcc01b6da53b836ac951c1556245305 ] The integer overflow is descripted with following codes: > 317 static comp_t encode_comp_t(u64 value) > 318 { > 319 int exp, rnd; ...... > 341 exp <<= MANTSIZE; > 342 exp += value; > 343 return exp; > 344 } Currently comp_t is defined as type of '__u16', but the variable 'exp' is type of 'int', so overflow would happen when variable 'exp' in line 343 is greater than 65535. Link: https://lkml.kernel.org/r/20210515140631.369106-3-zhengyejian1@huawei.com Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com> Cc: Hanjun Guo <guohanjun@huawei.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Zhang Jinhao <zhangjinhao2@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: zhaoxiaoqiang11 <zhaoxiaoqiang11@jd.com> | 2 年前 | |
async: Introduce async_schedule_dev_nocall() stable inclusion from stable-v5.10.210 commit ac4dcccbe9106a5cec483d2ffa7f628b95340a07 category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I97NIA CVE: CVE-2023-52498 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=ac4dcccbe9106a5cec483d2ffa7f628b95340a07 -------------------------------- commit 7d4b5d7a37bdd63a5a3371b988744b060d5bb86f upstream. In preparation for subsequent changes, introduce a specialized variant of async_schedule_dev() that will not invoke the argument function synchronously when it cannot be scheduled for asynchronous execution. The new function, async_schedule_dev_nocall(), will be used for fixing possible deadlocks in the system-wide power management core code. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com> for the series. Tested-by: Youngmin Nam <youngmin.nam@samsung.com> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> | 2 年前 | |
audit: Send netlink ACK before setting connection in auditd_set stable inclusion from stable-v5.10.210 commit c74b2af2ccbc0f19219b825f79160dae85960eed category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IAE52H Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=c74b2af2ccbc0f19219b825f79160dae85960eed -------------------------------- [ Upstream commit 022732e3d846e197539712e51ecada90ded0572a ] When auditd_set sets the auditd_conn pointer, audit messages can immediately be put on the socket by other kernel threads. If the backlog is large or the rate is high, this can immediately fill the socket buffer. If the audit daemon requested an ACK for this operation, a full socket buffer causes the ACK to get dropped, also setting ENOBUFS on the socket. To avoid this race and ensure ACKs get through, fast-track the ACK in this specific case to ensure it is sent before auditd_conn is set. Signed-off-by: Chris Riches <chris.riches@nutanix.com> [PM: fix some tab vs space damage] Signed-off-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: sanglipeng1 <sanglipeng1@jd.com> | 1 年前 | |
fix kabi breakage due to exec is_check hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IAZ996 CVE: NA ---------------------------------------------------------------------- Since struct audit_context and linux_binprm contains hole at the end, we can directly add one list without breaking KABI. Signed-off-by: Gu Bowen <gubowen5@huawei.com> | 1 年前 | |
audit: fix potential double free on error path from fsnotify_add_inode_mark stable inclusion from stable-v5.10.140 commit e10bb2f2e99b01ab7f9ec965735dcb4592b5490a category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=e10bb2f2e99b01ab7f9ec965735dcb4592b5490a -------------------------------- commit ad982c3be4e60c7d39c03f782733503cbd88fd2a upstream. Audit_alloc_mark() assign pathname to audit_mark->path, on error path from fsnotify_add_inode_mark(), fsnotify_put_mark will free memory of audit_mark->path, but the caller of audit_alloc_mark will free the pathname again, so there will be double free problem. Fix this by resetting audit_mark->path to NULL pointer on error path from fsnotify_add_inode_mark(). Cc: stable@vger.kernel.org Fixes: 7b1293234084d ("fsnotify: Add group pointer in fsnotify_init_mark()") Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com> Reviewed-by: Wei Li <liwei391@huawei.com> | 3 年前 | |
audit: move put_tree() to avoid trim_trees refcount underflow and UAF mainline inclusion from mainline-5.10.62 commit 38c1915d3e9f457006c4b2ef5eaab038c34a95a2 bugzilla: 182217 https://gitee.com/openeuler/kernel/issues/I4EFOS Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=38c1915d3e9f457006c4b2ef5eaab038c34a95a2 -------------------------------- commit 67d69e9d1a6c889d98951c1d74b19332ce0565af upstream. AUDIT_TRIM is expected to be idempotent, but multiple executions resulted in a refcount underflow and use-after-free. git bisect fingered commit fb041bb7c0a9 ("locking/refcount: Consolidate implementations of refcount_t") but this patch with its more thorough checking that wasn't in the x86 assembly code merely exposed a previously existing tree refcount imbalance in the case of tree trimming code that was refactored with prune_one() to remove a tree introduced in commit 8432c7006297 ("audit: Simplify locking around untag_chunk()") Move the put_tree() to cover only the prune_one() case. Passes audit-testsuite and 3 passes of "auditctl -t" with at least one directory watch. Cc: Jan Kara <jack@suse.cz> Cc: Will Deacon <will@kernel.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Seiji Nishikawa <snishika@redhat.com> Cc: stable@vger.kernel.org Fixes: 8432c7006297 ("audit: Simplify locking around untag_chunk()") Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> [PM: reformatted/cleaned-up the commit description] Signed-off-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Chen Jun <chenjun102@huawei.com> Acked-by: Weilong Chen <chenweilong@huawei.com> Signed-off-by: Chen Jun <chenjun102@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com> | 4 年前 | |
audit: don't WARN_ON_ONCE(!current->mm) in audit_exe_compare() stable inclusion from stable-v5.10.202 commit fc557bcfd7ff8080d1376f353476e7df8f303a3f category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I9DZOS Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=fc557bcfd7ff8080d1376f353476e7df8f303a3f -------------------------------- commit 969d90ec212bae4b45bf9d21d7daa30aa6cf055e upstream. eBPF can end up calling into the audit code from some odd places, and some of these places don't have @current set properly so we end up tripping the WARN_ON_ONCE(!current->mm) near the top of audit_exe_compare(). While the basic !current->mm check is good, the WARN_ON_ONCE() results in some scary console messages so let's drop that and just do the regular !current->mm check to avoid problems. Cc: <stable@vger.kernel.org> Fixes: 47846d51348d ("audit: don't take task_lock() in audit_exe_compare() code path") Reported-by: Artem Savkov <asavkov@redhat.com> Signed-off-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: sanglipeng <sanglipeng1@jd.com> | 2 年前 | |
ima: Avoid blocking in RCU read-side critical section stable inclusion from stable-v5.10.222 commit a6176a802c4bfb83bf7524591aa75f44a639a853 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I9HXKB Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=a6176a802c4bfb83bf7524591aa75f44a639a853 -------------------------------- commit 9a95c5bfbf02a0a7f5983280fe284a0ff0836c34 upstream. A panic happens in ima_match_policy: BUG: unable to handle kernel NULL pointer dereference at 0000000000000010 PGD 42f873067 P4D 0 Oops: 0000 [#1] SMP NOPTI CPU: 5 PID: 1286325 Comm: kubeletmonit.sh Kdump: loaded Tainted: P Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015 RIP: 0010:ima_match_policy+0x84/0x450 Code: 49 89 fc 41 89 cf 31 ed 89 44 24 14 eb 1c 44 39 7b 18 74 26 41 83 ff 05 74 20 48 8b 1b 48 3b 1d f2 b9 f4 00 0f 84 9c 01 00 00 <44> 85 73 10 74 ea 44 8b 6b 14 41 f6 c5 01 75 d4 41 f6 c5 02 74 0f RSP: 0018:ff71570009e07a80 EFLAGS: 00010207 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000200 RDX: ffffffffad8dc7c0 RSI: 0000000024924925 RDI: ff3e27850dea2000 RBP: 0000000000000000 R08: 0000000000000000 R09: ffffffffabfce739 R10: ff3e27810cc42400 R11: 0000000000000000 R12: ff3e2781825ef970 R13: 00000000ff3e2785 R14: 000000000000000c R15: 0000000000000001 FS: 00007f5195b51740(0000) GS:ff3e278b12d40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000010 CR3: 0000000626d24002 CR4: 0000000000361ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: ima_get_action+0x22/0x30 process_measurement+0xb0/0x830 ? page_add_file_rmap+0x15/0x170 ? alloc_set_pte+0x269/0x4c0 ? prep_new_page+0x81/0x140 ? simple_xattr_get+0x75/0xa0 ? selinux_file_open+0x9d/0xf0 ima_file_check+0x64/0x90 path_openat+0x571/0x1720 do_filp_open+0x9b/0x110 ? page_counter_try_charge+0x57/0xc0 ? files_cgroup_alloc_fd+0x38/0x60 ? __alloc_fd+0xd4/0x250 ? do_sys_open+0x1bd/0x250 do_sys_open+0x1bd/0x250 do_syscall_64+0x5d/0x1d0 entry_SYSCALL_64_after_hwframe+0x65/0xca Commit c7423dbdbc9e ("ima: Handle -ESTALE returned by ima_filter_rule_match()") introduced call to ima_lsm_copy_rule within a RCU read-side critical section which contains kmalloc with GFP_KERNEL. This implies a possible sleep and violates limitations of RCU read-side critical sections on non-PREEMPT systems. Sleeping within RCU read-side critical section might cause synchronize_rcu() returning early and break RCU protection, allowing a UAF to happen. The root cause of this issue could be described as follows: | Thread A | Thread B | | |ima_match_policy | | | rcu_read_lock | |ima_lsm_update_rule | | | synchronize_rcu | | | | kmalloc(GFP_KERNEL)| | | sleep | ==> synchronize_rcu returns early | kfree(entry) | | | | entry = entry->next| ==> UAF happens and entry now becomes NULL (or could be anything). | | entry->action | ==> Accessing entry might cause panic. To fix this issue, we are converting all kmalloc that is called within RCU read-side critical section to use GFP_ATOMIC. Fixes: c7423dbdbc9e ("ima: Handle -ESTALE returned by ima_filter_rule_match()") Cc: stable@vger.kernel.org Signed-off-by: GUO Zihua <guozihua@huawei.com> Acked-by: John Johansen <john.johansen@canonical.com> Reviewed-by: Mimi Zohar <zohar@linux.ibm.com> Reviewed-by: Casey Schaufler <casey@schaufler-ca.com> [PM: fixed missing comment, long lines, !CONFIG_IMA_LSM_RULES case] Signed-off-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Gu Bowen <gubowen5@huawei.com> | 1 年前 | |
exec: Add a new AT_CHECK flag to execveat(2) maillist inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/IAZ996 CVE: NA Reference: https://lore.kernel.org/all/20241011184422.977903-2-mic@digikod.net/ -------------------------------- Add a new AT_CHECK flag to execveat(2) to check if a file would be allowed for execution. The main use case is for script interpreters and dynamic linkers to check execution permission according to the kernel's security policy. Another use case is to add context to access logs e.g., which script (instead of interpreter) accessed a file. As any executable code, scripts could also use this check [1]. This is different from faccessat(2) + X_OK which only checks a subset of access rights (i.e. inode permission and mount options for regular files), but not the full context (e.g. all LSM access checks). The main use case for access(2) is for SUID processes to (partially) check access on behalf of their caller. The main use case for execveat(2) + AT_CHECK is to check if a script execution would be allowed, according to all the different restrictions in place. Because the use of AT_CHECK follows the exact kernel semantic as for a real execution, user space gets the same error codes. An interesting point of using execveat(2) instead of openat2(2) is that it decouples the check from the enforcement. Indeed, the security check can be logged (e.g. with audit) without blocking an execution environment not yet ready to enforce a strict security policy. LSMs can control or log execution requests with security_bprm_creds_for_exec(). However, to enforce a consistent and complete access control (e.g. on binary's dependencies) LSMs should restrict file executability, or mesure executed files, with security_file_open() by checking file->f_flags & __FMODE_EXEC. Because AT_CHECK is dedicated to user space interpreters, it doesn't make sense for the kernel to parse the checked files, look for interpreters known to the kernel (e.g. ELF, shebang), and return ENOEXEC if the format is unknown. Because of that, security_bprm_check() is never called when AT_CHECK is used. It should be noted that script interpreters cannot directly use execveat(2) (without this new AT_CHECK flag) because this could lead to unexpected behaviors e.g., python script.sh could lead to Bash being executed to interpret the script. Unlike the kernel, script interpreters may just interpret the shebang as a simple comment, which should not change for backward compatibility reasons. Because scripts or libraries files might not currently have the executable permission set, or because we might want specific users to be allowed to run arbitrary scripts, the following patch provides a dynamic configuration mechanism with the SECBIT_EXEC_RESTRICT_FILE and SECBIT_EXEC_DENY_INTERACTIVE securebits. This is a redesign of the CLIP OS 4's O_MAYEXEC: https://github.com/clipos-archive/src_platform_clip-patches/blob/f5cb330d6b684752e403b4e41b39f7004d88e561/1901_open_mayexec.patch This patch has been used for more than a decade with customized script interpreters. Some examples can be found here: https://github.com/clipos-archive/clipos4_portage-overlay/search?q=O_MAYEXEC Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Paul Moore <paul@paul-moore.com> Cc: Serge Hallyn <serge@hallyn.com> Link: https://docs.python.org/3/library/io.html#io.open_code [1] Signed-off-by: Mickaël Salaün <mic@digikod.net> Link: https://lore.kernel.org/r/20241011184422.977903-2-mic@digikod.net Conflicts: include/linux/binfmts.h include/uapi/linux/fcntl.h kernel/audit.h security/security.c [Context conflicts] Signed-off-by: Gu Bowen <gubowen5@huawei.com> | 1 年前 | |
treewide: Replace DECLARE_TASKLET() with DECLARE_TASKLET_OLD() This converts all the existing DECLARE_TASKLET() (and ...DISABLED) macros with DECLARE_TASKLET_OLD() in preparation for refactoring the tasklet callback type. All existing DECLARE_TASKLET() users had a "0" data argument, it has been removed here as well. Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Kees Cook <keescook@chromium.org> | 5 年前 | |
kbuild: fix kernel/bounds.c 'W=1' warning Building any configuration with 'make W=1' produces a warning: kernel/bounds.c:16:6: warning: no previous prototype for 'foo' [-Wmissing-prototypes] When also passing -Werror, this prevents us from building any other files. Nobody ever calls the function, but we can't make it 'static' either since we want the compiler output. Calling it 'main' instead however avoids the warning, because gcc does not insist on having a declaration for main. Link: http://lkml.kernel.org/r/20181005083313.2088252-1-arnd@arndb.de Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reported-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com> Reviewed-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com> Cc: David Laight <David.Laight@ACULAB.COM> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> | 7 年前 | |
xfs: don't generate selinux audit messages for capability testing mainline inclusion from mainline-v5.16-rc3 commit eba0549bc7d100691c13384b774346b8aa9cf9a9 category: bugfix bugzilla: 187526,https://gitee.com/openeuler/kernel/issues/I4KIAO Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=eba0549bc7d100691c13384b774346b8aa9cf9a9 -------------------------------- There are a few places where we test the current process' capability set to decide if we're going to be more or less generous with resource acquisition for a system call. If the process doesn't have the capability, we can continue the call, albeit in a degraded mode. These are /not/ the actual security decisions, so it's not proper to use capable(), which (in certain selinux setups) causes audit messages to get logged. Switch them to has_capability_noaudit. Fixes: 7317a03df703f ("xfs: refactor inode ownership change transaction/inode/quota allocation idiom") Fixes: ea9a46e1c4925 ("xfs: only return detailed fsmap info if the caller has CAP_SYS_ADMIN") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Cc: Dave Chinner <david@fromorbit.com> Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com> Acked-by: Serge Hallyn <serge@hallyn.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Guo Xuenan <guoxuenan@huawei.com> Conflicts: fs/xfs/xfs_fsmap.c Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com> | 3 年前 | |
sched_getaffinity: don't assume 'cpumask_size()' is fully initialized stable inclusion from stable-v5.10.177 commit 1f2a94baee431c73b7e2a3230b1adafef6128000 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I88YNP Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=1f2a94baee431c73b7e2a3230b1adafef6128000 -------------------------------- [ Upstream commit 6015b1aca1a233379625385feb01dd014aca60b5 ] The getaffinity() system call uses 'cpumask_size()' to decide how big the CPU mask is - so far so good. It is indeed the allocation size of a cpumask. But the code also assumes that the whole allocation is initialized without actually doing so itself. That's wrong, because we might have fixed-size allocations (making copying and clearing more efficient), but not all of it is then necessarily used if 'nr_cpu_ids' is smaller. Having checked other users of 'cpumask_size()', they all seem to be ok, either using it purely for the allocation size, or explicitly zeroing the cpumask before using the size in bytes to copy it. See for example the ublk_ctrl_get_queue_affinity() function that uses the proper 'zalloc_cpumask_var()' to make sure that the whole mask is cleared, whether the storage is on the stack or if it was an external allocation. Fix this by just zeroing the allocation before using it. Do the same for the compat version of sched_getaffinity(), which had the same logic. Also, for consistency, make sched_getaffinity() use 'cpumask_bits()' to access the bits. For a cpumask_var_t, it ends up being a pointer to the same data either way, but it's just a good idea to treat it like you would a 'cpumask_t'. The compat case already did that. Reported-by: Ryan Roberts <ryan.roberts@arm.com> Link: https://lore.kernel.org/lkml/7d026744-6bd6-6827-0471-b5e8eae0be3f@arm.com/ Cc: Yury Norov <yury.norov@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: sanglipeng <sanglipeng1@jd.com> | 2 年前 | |
proc: convert everything to "struct proc_ops" The most notable change is DEFINE_SHOW_ATTRIBUTE macro split in seq_file.h. Conversion rule is: llseek => proc_lseek unlocked_ioctl => proc_ioctl xxx => proc_xxx delete ".owner = THIS_MODULE" line [akpm@linux-foundation.org: fix drivers/isdn/capi/kcapi_proc.c] [sfr@canb.auug.org.au: fix kernel/sched/psi.c] Link: http://lkml.kernel.org/r/20200122180545.36222f50@canb.auug.org.au Link: http://lkml.kernel.org/r/20191225172546.GB13378@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> | 6 年前 | |
context_tracking: Ensure that the critical path cannot be instrumented context tracking lacks a few protection mechanisms against instrumentation: - While the core functions are marked NOKPROBE they lack protection against function tracing which is required as the function entry/exit points can be utilized by BPF. - static functions invoked from the protected functions need to be marked as well as they can be instrumented otherwise. - using plain inline allows the compiler to emit traceable and probable functions. Fix this by marking the functions noinstr and converting the plain inlines to __always_inline. The NOKPROBE_SYMBOL() annotations are removed as the .noinstr.text section is already excluded from being probed. Cures the following objtool warnings: vmlinux.o: warning: objtool: enter_from_user_mode()+0x34: call to __context_tracking_exit() leaves .noinstr.text section vmlinux.o: warning: objtool: prepare_exit_to_usermode()+0x29: call to __context_tracking_enter() leaves .noinstr.text section vmlinux.o: warning: objtool: syscall_return_slowpath()+0x29: call to __context_tracking_enter() leaves .noinstr.text section vmlinux.o: warning: objtool: do_syscall_64()+0x7f: call to __context_tracking_enter() leaves .noinstr.text section vmlinux.o: warning: objtool: do_int80_syscall_32()+0x3d: call to __context_tracking_enter() leaves .noinstr.text section vmlinux.o: warning: objtool: do_fast_syscall_32()+0x9c: call to __context_tracking_enter() leaves .noinstr.text section and generates new ones... Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200505134340.811520478@linutronix.de | 5 年前 | |
cpu/SMT: Enable SMT only if a core is online mainline inclusion from mainline-v6.11-rc4 commit 6c17ea1f3eaa330d445ac14a9428402ce4e3055e category: bugfix bugzilla: https://atomgit.com/openeuler/kernel/issues/8317 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6c17ea1f3eaa330d445ac14a9428402ce4e3055e ---------------------------------------------------------------------- If a core is offline then enabling SMT should not online CPUs of this core. By enabling SMT, what is intended is either changing the SMT value from "off" to "on" or setting the SMT level (threads per core) from a lower to higher value. On PowerPC the ppc64_cpu utility can be used, among other things, to perform the following functions: ppc64_cpu --cores-on # Get the number of online cores ppc64_cpu --cores-on=X # Put exactly X cores online ppc64_cpu --offline-cores=X[,Y,...] # Put specified cores offline ppc64_cpu --smt={on|off|value} # Enable, disable or change SMT level If the user has decided to offline certain cores, enabling SMT should not online CPUs in those cores. This patch fixes the issue and changes the behaviour as described, by introducing an arch specific function topology_is_core_online(). It is currently implemented only for PowerPC. Fixes: 73c58e7e1412 ("powerpc: Add HOTPLUG_SMT support") Reported-by: Tyrel Datwyler <tyreld@linux.ibm.com> Closes: https://groups.google.com/g/powerpc-utils-devel/c/wrwVzAAnRlI/m/5KJSoqP4BAAJ Signed-off-by: Nysal Jan K.A <nysal@linux.ibm.com> Reviewed-by: Shrikanth Hegde <sshegde@linux.ibm.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240731030126.956210-2-nysal@linux.ibm.com Signed-off-by: huwentao <huwentao19@h-partners.com> | 5 个月前 | |
PM: cpu: Make notifier chain use a raw_spinlock_t stable inclusion from stable-5.10.65 commit 4b7874a32ec23cc4892e7c9ffac1dd8160ea3697 bugzilla: 182361 https://gitee.com/openeuler/kernel/issues/I4EH3U Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=4b7874a32ec23cc4892e7c9ffac1dd8160ea3697 -------------------------------- [ Upstream commit b2f6662ac08d0e7c25574ce53623c71bdae9dd78 ] Invoking atomic_notifier_chain_notify() requires acquiring a spinlock_t, which can block under CONFIG_PREEMPT_RT. Notifications for members of the cpu_pm notification chain will be issued by the idle task, which can never block. Making *all* atomic_notifiers use a raw_spinlock is too big of a hammer, as only notifications issued by the idle task are problematic. Special-case cpu_pm_notifier_chain by kludging a raw_notifier and raw_spinlock_t together, matching the atomic_notifier behavior with a raw_spinlock_t. Fixes: 70d932985757 ("notifier: Fix broken error handling pattern") Signed-off-by: Valentin Schneider <valentin.schneider@arm.com> Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Chen Jun <chenjun102@huawei.com> Acked-by: Weilong Chen <chenweilong@huawei.com> Signed-off-by: Chen Jun <chenjun102@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com> | 4 年前 | |
arm64: kdump: fix kmemleak unknown object warning when crash base is set hulk inclusion category: bugfix bugzilla: https://atomgit.com/src-openeuler/kernel/issues/13312 -------------------------------- When crash_base is set via kernel command line, reserve_crashkernel_high() does not allocate new memory and thus no kmemleak object is created. However, kmemleak_ignore_phys() was called unconditionally, leading to unnecessary kmemleak warnings. Move kmemleak_ignore_phys() inside the allocation branch where it's actually needed, ensuring it's only called when new memory is allocated and tracked by kmemleak. Fixes: 951001072323 ("arm64: kdump: Skip kmemleak scan reserved memory for kdump") Signed-off-by: ZhangPeng <zhangpeng362@huawei.com> Signed-off-by: Qi Xi <xiqi2@huawei.com> | 5 个月前 | |
crash_dump: Remove no longer used saved_max_pfn saved_max_pfn was originally introduced in commit 92aa63a5a1bf ("[PATCH] kdump: Retrieve saved max pfn") It used to make sure that the user does not try to read the physical memory beyond saved_max_pfn. But since commit 921d58c0e699 ("vmcore: remove saved_max_pfn check") it's no longer used for the check. This variable doesn't have any users anymore so just remove it. [ bp: Drop the Calgary IOMMU reference from the commit message. ] Signed-off-by: Kairui Song <kasong@redhat.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Link: https://lkml.kernel.org/r/20200330181544.1595733-1-kasong@redhat.com | 6 年前 | |
Revert "Add a reference to ucounts for each cred" stable inclusion from stable-5.10.63 commit ae16b7c668378ea00eb60ab9d29e0d46b0e7aa15 bugzilla: 182231 https://gitee.com/openeuler/kernel/issues/I4EFS1 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=ae16b7c668378ea00eb60ab9d29e0d46b0e7aa15 -------------------------------- This reverts commit b2c4d9a33cc2dec7466f97eba2c4dd571ad798a5 which is commit 905ae01c4ae2ae3df05bb141801b1db4b7d83c61 upstream. This commit should not have been applied to the 5.10.y stable tree, so revert it. Reported-by: "Eric W. Biederman" <ebiederm@xmission.com> Link: https://lore.kernel.org/r/87v93k4bl6.fsf@disp2133 Cc: Alexey Gladkov <legion@kernel.org> Cc: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Chen Jun <chenjun102@huawei.com> Acked-by: Weilong Chen <chenweilong@huawei.com> Signed-off-by: Chen Jun <chenjun102@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com> | 4 年前 | |
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 25 Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license or at your option any later version this program is distributed in the hope that it would be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details extracted by the scancode license scanner the SPDX license identifier GPL-2.0-or-later has been chosen to replace the boilerplate/reference in 6 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Steve Winslow <swinslow@gmail.com> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Jilayne Lovejoy <opensource@jilayne.com> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190519154043.007767574@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> | 6 年前 | |
proc: introduce proc_create_single{,_data} Variants of proc_create{,_data} that directly take a seq_file show callback and drastically reduces the boilerplate code in the callers. All trivial callers converted over. Signed-off-by: Christoph Hellwig <hch@lst.de> | 7 年前 | |
proc: introduce proc_create_single{,_data} Variants of proc_create{,_data} that directly take a seq_file show callback and drastically reduces the boilerplate code in the callers. All trivial callers converted over. Signed-off-by: Christoph Hellwig <hch@lst.de> | 7 年前 | |
ptrace: slightly saner 'get_dumpable()' logic stable inclusion from stable-v5.10.256 commit 93d4ba49d18e3d7fb41a9927c2d0cca5e9dfefd6 category: bugfix bugzilla: https://atomgit.com/src-openeuler/kernel/issues/15055 CVE: CVE-2026-46333 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=93d4ba49d18e -------------------------------- commit 31e62c2ebbfdc3fe3dbdf5e02c92a9dc67087a3a upstream. The 'dumpability' of a task is fundamentally about the memory image of the task - the concept comes from whether it can core dump or not - and makes no sense when you don't have an associated mm. And almost all users do in fact use it only for the case where the task has a mm pointer. But we have one odd special case: ptrace_may_access() uses 'dumpable' to check various other things entirely independently of the MM (typically explicitly using flags like PTRACE_MODE_READ_FSCREDS). Including for threads that no longer have a VM (and maybe never did, like most kernel threads). It's not what this flag was designed for, but it is what it is. The ptrace code does check that the uid/gid matches, so you do have to be uid-0 to see kernel thread details, but this means that the traditional "drop capabilities" model doesn't make any difference for this all. Make it all make a *bit* more sense by saying that if you don't have a MM pointer, we'll use a cached "last dumpability" flag if the thread ever had a MM (it will be zero for kernel threads since it is never set), and require a proper CAP_SYS_PTRACE capability to override. Reported-by: Qualys Security Advisory <qsa@qualys.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Kees Cook <kees@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Pu Lehui <pulehui@huawei.com> | 1 个月前 | |
arm64: extable: add new extable type "__mc_ex_table" hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5GB28 CVE: NA ------------------------------- A new type of extable is added, which is specially used fixup for machine check safe. In order to keep kabi consistency, we cannot add type and data members to struct exception_table_entry(d6e2cc564775 arm64: extable: add type and data fields), so we put the fixup entry to new extable separately. Signed-off-by: Tong Tiangen <tongtiangen@huawei.com> | 3 年前 | |
kernel/fail_function: fix memory leak with using debugfs_lookup() stable inclusion from stable-v5.10.173 commit dd9981a11d74ff2eb253bb5c459876f8bd3c6c36 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I7X0QU Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=dd9981a11d74ff2eb253bb5c459876f8bd3c6c36 -------------------------------- [ Upstream commit 2bb3669f576559db273efe49e0e69f82450efbca ] When calling debugfs_lookup() the result must have dput() called on it, otherwise the memory will leak over time. To make things simpler, just call debugfs_lookup_and_remove() instead which handles all of the logic at once. Cc: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Yang Yingliang <yangyingliang@huawei.com> Link: https://lore.kernel.org/r/20230202151633.2310897-1-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: sanglipeng <sanglipeng1@jd.com> | 2 年前 | |
arm64: Reserve a kabi in task_struct exclusively for xcall hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/release-management/issues/IC9Q31 -------------------------------- Refactor the xcall_enable logic to ensure only one reserved kabi in task_struct is used by xcall, which facilitates future xcall expansion. Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> | 1 年前 | |
Revert "kernel: freezer should treat PF_IO_WORKER like PF_KTHREAD for freezing" stable inclusion from stable-5.10.28 commit 12b5f9dae41027f895b144ea01741d687bb31051 bugzilla: 51779 -------------------------------- commit d3dc04cd81e0eaf50b2d09ab051a13300e587439 upstream. This reverts commit 15b2219facadec583c24523eed40fa45865f859f. Before IO threads accepted signals, the freezer using take signals to wake up an IO thread would cause them to loop without any way to clear the pending signal. That is no longer the case, so stop special casing PF_IO_WORKER in the freezer. Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Chen Jun <chenjun102@huawei.com> Acked-by: Weilong Chen <chenweilong@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com> | 5 年前 | |
kbuild: add variables for compression tools Allow user to use alternative implementations of compression tools, such as pigz, pbzip2, pxz. For example, multi-threaded tools to speed up the build: $ make GZIP=pigz BZIP2=pbzip2 Variables _GZIP, _BZIP2, _LZOP are used internally because original env vars are reserved by the tools. The use of GZIP in gzip tool is obsolete since 2015. However, alternative implementations (e.g., pigz) still rely on it. BZIP2, BZIP, LZOP vars are not obsolescent. The credit goes to @grsecurity. As a sidenote, for multi-threaded lzma, xz compression one can use: $ export XZ_OPT="--threads=0" Signed-off-by: Denis Efremov <efremov@linux.com> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> | 5 年前 | |
LSM: Signal to SafeSetID when setting group IDs For SafeSetID to properly gate set*gid() calls, it needs to know whether ns_capable() is being called from within a sys_set*gid() function or is being called from elsewhere in the kernel. This allows SafeSetID to deny CAP_SETGID to restricted groups when they are attempting to use the capability for code paths other than updating GIDs (e.g. setting up userns GID mappings). This is the identical approach to what is currently done for CAP_SETUID. NOTE: We also add signaling to SafeSetID from the setgroups() syscall, as we have future plans to restrict a process' ability to set supplementary groups in addition to what is added in this series for restricting setting of the primary group. Signed-off-by: Thomas Cedeno <thomascedeno@google.com> Signed-off-by: Micah Morton <mortonm@chromium.org> | 5 年前 | |
kernel/hung_task.c: make type annotations consistent Commit 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler") removed various __user annotations from function signatures as part of its refactoring. It also removed the __user annotation for proc_dohung_task_timeout_secs() at its declaration in sched/sysctl.h, but not at its definition in kernel/hung_task.c. Hence, sparse complains: kernel/hung_task.c:271:5: error: symbol 'proc_dohung_task_timeout_secs' redeclared with different type (incompatible argument 3 (different address spaces)) Adjust the annotation at the definition fitting to that refactoring to make sparse happy again, which also resolves this warning from sparse: kernel/hung_task.c:277:52: warning: incorrect type in argument 3 (different address spaces) kernel/hung_task.c:277:52: expected void * kernel/hung_task.c:277:52: got void [noderef] __user *buffer No functional change. No change in object code. Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrey Ignatov <rdna@fb.com> Link: https://lkml.kernel.org/r/20201028130541.20320-1-lukas.bulwahn@gmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> | 5 年前 | |
mm/nvdimm: add is_ioremap_addr and use that to check ioremap address Architectures like powerpc use different address range to map ioremap and vmalloc range. The memunmap() check used by the nvdimm layer was wrongly using is_vmalloc_addr() to check for ioremap range which fails for ppc64. This result in ppc64 not freeing the ioremap mapping. The side effect of this is an unbind failure during module unload with papr_scm nvdimm driver Link: http://lkml.kernel.org/r/20190701134038.14165-1-aneesh.kumar@linux.ibm.com Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Fixes: b5beae5e224f ("powerpc/pseries: Add driver for PAPR SCM regions") Cc: Dan Williams <dan.j.williams@intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> | 6 年前 | |
irq_work: Cleanup mainline inclusion from mainline-v5.11-rc1 commit 7a9f50a05843fee8366bd3a65addbebaa7cf7f07 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I4U05V CVE: NA ------------------------------------------------------------------------- Get rid of the __call_single_node union and clean up the API a little to avoid external code relying on the structure layout as much. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Reviewed-by: Cheng Jian <cj.chengjian@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com> | 4 年前 | |
jump_label: Fix usage in module __init mainline inclusion from mainline-v5.11-rc1 commit 55d2eba8e7cd439c11cdb204898c2d384227629b category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I61CQ3 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=55d2eba8e7cd439c11cdb204898c2d384227629b ------------------------------------------------- When the static_key is part of the module, and the module calls static_key_inc/enable() from it's __init section *AND* has a static_branch_*() user in that very same __init section, things go wobbly. If the static_key lives outside the module, jump_label_add_module() would append this module's sites to the key and jump_label_update() would take the static_key_linked() branch and all would be fine. If all the sites are outside of __init, then everything will be fine too. However, when all is aligned just as described above, jump_label_update() calls __jump_label_update(.init = false) and we'll not update sites in __init text. Fixes: 19483677684b ("jump_label: Annotate entries that operate on __init code earlier") Reported-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Tested-by: Jessica Yu <jeyu@kernel.org> Link: https://lkml.kernel.org/r/20201216135435.GV3092@hirez.programming.kicks-ass.net Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com> Reviewed-by: Xie XiuQi <xiexiuqi@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com> | 3 年前 | |
livepatch: Avoid CPU hogging with cond_resched mainline inclusion from mainline-v5.17-rc1 commit f5bdb34bf0c9314548f2d8e2360b703ff3610303 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I60MYE CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f5bdb34bf0c9314548f2d8e2360b703ff3610303 -------------------------------- When initializing a 'struct klp_object' in klp_init_object_loaded(), and performing relocations in klp_resolve_symbols(), klp_find_object_symbol() is invoked to look up the address of a symbol in an already-loaded module (or vmlinux). This, in turn, calls kallsyms_on_each_symbol() or module_kallsyms_on_each_symbol() to find the address of the symbol that is being patched. It turns out that symbol lookups often take up the most CPU time when enabling and disabling a patch, and may hog the CPU and cause other tasks on that CPU's runqueue to starve -- even in paths where interrupts are enabled. For example, under certain workloads, enabling a KLP patch with many objects or functions may cause ksoftirqd to be starved, and thus for interrupts to be backlogged and delayed. This may end up causing TCP retransmits on the host where the KLP patch is being applied, and in general, may cause any interrupts serviced by softirqd to be delayed while the patch is being applied. So as to ensure that kallsyms_on_each_symbol() does not end up hogging the CPU, this patch adds a call to cond_resched() in kallsyms_on_each_symbol() and module_kallsyms_on_each_symbol(), which are invoked when doing a symbol lookup in vmlinux and a module respectively. Without this patch, if a live-patch is applied on a 36-core Intel host with heavy TCP traffic, a ~10x spike is observed in TCP retransmits while the patch is being applied. Additionally, collecting sched events with perf indicates that ksoftirqd is awakened ~1.3 seconds before it's eventually scheduled. With the patch, no increase in TCP retransmit events is observed, and ksoftirqd is scheduled shortly after it's awakened. Signed-off-by: David Vernet <void@manifault.com> Acked-by: Miroslav Benes <mbenes@suse.cz> Acked-by: Song Liu <song@kernel.org> Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20211229215646.830451-1-void@manifault.com Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com> Reviewed-by: Kuohai Xu <xukuohai@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com> | 3 年前 | |
exec: Transform exec_update_mutex into a rw_semaphore stable inclusion from stable-5.10.6 commit ab7709b551de24e7bebf44946120e6740b1e28db bugzilla: 47418 -------------------------------- [ Upstream commit f7cfd871ae0c5008d94b6f66834e7845caa93c15 ] Recently syzbot reported[0] that there is a deadlock amongst the users of exec_update_mutex. The problematic lock ordering found by lockdep was: perf_event_open (exec_update_mutex -> ovl_i_mutex) chown (ovl_i_mutex -> sb_writes) sendfile (sb_writes -> p->lock) by reading from a proc file and writing to overlayfs proc_pid_syscall (p->lock -> exec_update_mutex) While looking at possible solutions it occured to me that all of the users and possible users involved only wanted to state of the given process to remain the same. They are all readers. The only writer is exec. There is no reason for readers to block on each other. So fix this deadlock by transforming exec_update_mutex into a rw_semaphore named exec_update_lock that only exec takes for writing. Cc: Jann Horn <jannh@google.com> Cc: Vasiliy Kulikov <segoon@openwall.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Bernd Edlinger <bernd.edlinger@hotmail.de> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Christopher Yeoh <cyeoh@au1.ibm.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Sargun Dhillon <sargun@sargun.me> Cc: Christian Brauner <christian.brauner@ubuntu.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Fixes: eea9673250db ("exec: Add exec_update_mutex to replace cred_guard_mutex") [0] https://lkml.kernel.org/r/00000000000063640c05ade8e3de@google.com Reported-by: syzbot+db9cdf3dd1f64252c6ef@syzkaller.appspotmail.com Link: https://lkml.kernel.org/r/87ft4mbqen.fsf@x220.int.ebiederm.org Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Chen Jun <chenjun102@huawei.com> Acked-by: Xie XiuQi <xiexiuqi@huawei.com> | 5 年前 | |
kcov: make some symbols static Fix sparse build warnings: kernel/kcov.c:99:1: warning: symbol '__pcpu_scope_kcov_percpu_data' was not declared. Should it be static? kernel/kcov.c:778:6: warning: symbol 'kcov_remote_softirq_start' was not declared. Should it be static? kernel/kcov.c:795:6: warning: symbol 'kcov_remote_softirq_stop' was not declared. Should it be static? Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Andrey Konovalov <andreyknvl@google.com> Link: http://lkml.kernel.org/r/20200702115501.73077-1-weiyongjun1@huawei.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> | 5 年前 | |
panic, kexec: make __crash_kexec() NMI safe stable inclusion from stable-v5.10.178 commit 56314b90fd43bd2444942bc14a7c5c768ce8ec57 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I8ALH3 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=56314b90fd43bd2444942bc14a7c5c768ce8ec57 -------------------------------- commit 05c6257433b7212f07a7e53479a8ab038fc1666a upstream. Attempting to get a crash dump out of a debug PREEMPT_RT kernel via an NMI panic() doesn't work. The cause of that lies in the PREEMPT_RT definition of mutex_trylock(): if (IS_ENABLED(CONFIG_DEBUG_RT_MUTEXES) && WARN_ON_ONCE(!in_task())) return 0; This prevents an nmi_panic() from executing the main body of __crash_kexec() which does the actual kexec into the kdump kernel. The warning and return are explained by: 6ce47fd961fa ("rtmutex: Warn if trylock is called from hard/softirq context") [...] The reasons for this are: 1) There is a potential deadlock in the slowpath 2) Another cpu which blocks on the rtmutex will boost the task which allegedly locked the rtmutex, but that cannot work because the hard/softirq context borrows the task context. Furthermore, grabbing the lock isn't NMI safe, so do away with kexec_mutex and replace it with an atomic variable. This is somewhat overzealous as *some* callsites could keep using a mutex (e.g. the sysfs-facing ones like crash_shrink_memory()), but this has the benefit of involving a single unified lock and preventing any future NMI-related surprises. Tested by triggering NMI panics via: $ echo 1 > /proc/sys/kernel/panic_on_unrecovered_nmi $ echo 1 > /proc/sys/kernel/unknown_nmi_panic $ echo 1 > /proc/sys/kernel/panic $ ipmitool power diag Link: https://lkml.kernel.org/r/20220630223258.4144112-3-vschneid@redhat.com Fixes: 6ce47fd961fa ("rtmutex: Warn if trylock is called from hard/softirq context") Signed-off-by: Valentin Schneider <vschneid@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Baoquan He <bhe@redhat.com> Cc: "Eric W . Biederman" <ebiederm@xmission.com> Cc: Juri Lelli <jlelli@redhat.com> Cc: Luis Claudio R. Goncalves <lgoncalv@redhat.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Wen Yang <wenyang.linux@foxmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: sanglipeng <sanglipeng1@jd.com> | 2 年前 | |
kexec: fix a memory leak in crash_shrink_memory() stable inclusion from stable-v5.10.188 commit 6cb477e7226bbceb9ca3f7fc306cdadf3eb92e7a category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I8KYFP Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=6cb477e7226bbceb9ca3f7fc306cdadf3eb92e7a -------------------------------- [ Upstream commit 1cba6c4309f03de570202c46f03df3f73a0d4c82 ] Patch series "kexec: enable kexec_crash_size to support two crash kernel regions". When crashkernel=X fails to reserve region under 4G, it will fall back to reserve region above 4G and a region of the default size will also be reserved under 4G. Unfortunately, /sys/kernel/kexec_crash_size only supports one crash kernel region now, the user cannot sense the low memory reserved by reading /sys/kernel/kexec_crash_size. Also, low memory cannot be freed by writing this file. For example: resource_size(crashk_res) = 512M resource_size(crashk_low_res) = 256M The result of 'cat /sys/kernel/kexec_crash_size' is 512M, but it should be 768M. When we execute 'echo 0 > /sys/kernel/kexec_crash_size', the size of crashk_res becomes 0 and resource_size(crashk_low_res) is still 256 MB, which is incorrect. Since crashk_res manages the memory with high address and crashk_low_res manages the memory with low address, crashk_low_res is shrunken only when all crashk_res is shrunken. And because when there is only one crash kernel region, crashk_res is always used. Therefore, if all crashk_res is shrunken and crashk_low_res still exists, swap them. This patch (of 6): If the value of parameter 'new_size' is in the semi-open and semi-closed interval (crashk_res.end - KEXEC_CRASH_MEM_ALIGN + 1, crashk_res.end], the calculation result of ram_res is: ram_res->start = crashk_res.end + 1 ram_res->end = crashk_res.end The operation of insert_resource() fails, and ram_res is not added to iomem_resource. As a result, the memory of the control block ram_res is leaked. In fact, on all architectures, the start address and size of crashk_res are already aligned by KEXEC_CRASH_MEM_ALIGN. Therefore, we do not need to round up crashk_res.start again. Instead, we should round up 'new_size' in advance. Link: https://lkml.kernel.org/r/20230527123439.772-1-thunder.leizhen@huawei.com Link: https://lkml.kernel.org/r/20230527123439.772-2-thunder.leizhen@huawei.com Fixes: 6480e5a09237 ("kdump: add missing RAM resource in crash_shrink_memory()") Fixes: 06a7f711246b ("kexec: premit reduction of the reserved memory size") Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Acked-by: Baoquan He <bhe@redhat.com> Cc: Cong Wang <amwang@redhat.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: sanglipeng <sanglipeng1@jd.com> | 2 年前 | |
kexec_elf: support 32 bit ELF files The powerpc version only supported 64 bit. Add some code to switch decoding of fields during runtime so we can kexec a 32 bit kernel from a 64 bit kernel and vice versa. Signed-off-by: Sven Schnelle <svens@stackframe.org> Reviewed-by: Thiago Jung Bauermann <bauerman@linux.ibm.com> Signed-off-by: Helge Deller <deller@gmx.de> | 6 年前 | |
kexec: derive purgatory entry from symbol stable inclusion from stable-v5.10.252 commit 027797595a108726f4a0a45d225f603b0ffbd22b category: bugfix bugzilla: https://atomgit.com/openeuler/kernel/issues/8682 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=027797595a108726f4a0a45d225f603b0ffbd22b -------------------------------- [ Upstream commit 480e1d5c64bb14441f79f2eb9421d5e26f91ea3d ] kexec_load_purgatory() derives image->start by locating e_entry inside an SHF_EXECINSTR section. If the purgatory object contains multiple executable sections with overlapping sh_addr, the entrypoint check can match more than once and trigger a WARN. Derive the entry section from the purgatory_start symbol when present and compute image->start from its final placement. Keep the existing e_entry fallback for purgatories that do not expose the symbol. WARNING: kernel/kexec_file.c:1009 at kexec_load_purgatory+0x395/0x3c0, CPU#10: kexec/1784 Call Trace: <TASK> bzImage64_load+0x133/0xa00 __do_sys_kexec_file_load+0x2b3/0x5c0 do_syscall_64+0x81/0x610 entry_SYSCALL_64_after_hwframe+0x76/0x7e [me@linux.beauty: move helper to avoid forward declaration, per Baoquan] Link: https://lkml.kernel.org/r/20260128043511.316860-1-me@linux.beauty Link: https://lkml.kernel.org/r/20260120124005.148381-1-me@linux.beauty Fixes: 8652d44f466a ("kexec: support purgatories with .text.hot sections") Signed-off-by: Li Chen <me@linux.beauty> Acked-by: Baoquan He <bhe@redhat.com> Cc: Alexander Graf <graf@amazon.com> Cc: Eric Biggers <ebiggers@kernel.org> Cc: Li Chen <me@linux.beauty> Cc: Philipp Rudo <prudo@redhat.com> Cc: Ricardo Ribalda Delgado <ribalda@chromium.org> Cc: Ross Zwisler <zwisler@google.com> Cc: Sourabh Jain <sourabhjain@linux.ibm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Tengda Wu <wutengda2@huawei.com> | 3 个月前 | |
panic, kexec: make __crash_kexec() NMI safe stable inclusion from stable-v5.10.178 commit 56314b90fd43bd2444942bc14a7c5c768ce8ec57 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I8ALH3 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=56314b90fd43bd2444942bc14a7c5c768ce8ec57 -------------------------------- commit 05c6257433b7212f07a7e53479a8ab038fc1666a upstream. Attempting to get a crash dump out of a debug PREEMPT_RT kernel via an NMI panic() doesn't work. The cause of that lies in the PREEMPT_RT definition of mutex_trylock(): if (IS_ENABLED(CONFIG_DEBUG_RT_MUTEXES) && WARN_ON_ONCE(!in_task())) return 0; This prevents an nmi_panic() from executing the main body of __crash_kexec() which does the actual kexec into the kdump kernel. The warning and return are explained by: 6ce47fd961fa ("rtmutex: Warn if trylock is called from hard/softirq context") [...] The reasons for this are: 1) There is a potential deadlock in the slowpath 2) Another cpu which blocks on the rtmutex will boost the task which allegedly locked the rtmutex, but that cannot work because the hard/softirq context borrows the task context. Furthermore, grabbing the lock isn't NMI safe, so do away with kexec_mutex and replace it with an atomic variable. This is somewhat overzealous as *some* callsites could keep using a mutex (e.g. the sysfs-facing ones like crash_shrink_memory()), but this has the benefit of involving a single unified lock and preventing any future NMI-related surprises. Tested by triggering NMI panics via: $ echo 1 > /proc/sys/kernel/panic_on_unrecovered_nmi $ echo 1 > /proc/sys/kernel/unknown_nmi_panic $ echo 1 > /proc/sys/kernel/panic $ ipmitool power diag Link: https://lkml.kernel.org/r/20220630223258.4144112-3-vschneid@redhat.com Fixes: 6ce47fd961fa ("rtmutex: Warn if trylock is called from hard/softirq context") Signed-off-by: Valentin Schneider <vschneid@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Baoquan He <bhe@redhat.com> Cc: "Eric W . Biederman" <ebiederm@xmission.com> Cc: Juri Lelli <jlelli@redhat.com> Cc: Luis Claudio R. Goncalves <lgoncalv@redhat.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Wen Yang <wenyang.linux@foxmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: sanglipeng <sanglipeng1@jd.com> | 2 年前 | |
kheaders: Use array declaration instead of char stable inclusion from stable-v5.10.180 commit fcd2da2e6bf2640a31a2a5b118b50dc3635c707b category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I8DDFN Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=fcd2da2e6bf2640a31a2a5b118b50dc3635c707b -------------------------------- commit b69edab47f1da8edd8e7bfdf8c70f51a2a5d89fb upstream. Under CONFIG_FORTIFY_SOURCE, memcpy() will check the size of destination and source buffers. Defining kernel_headers_data as "char" would trip this check. Since these addresses are treated as byte arrays, define them as arrays (as done everywhere else). This was seen with: $ cat /sys/kernel/kheaders.tar.xz >> /dev/null detected buffer overflow in memcpy kernel BUG at lib/string_helpers.c:1027! ... RIP: 0010:fortify_panic+0xf/0x20 [...] Call Trace: <TASK> ikheaders_read+0x45/0x50 [kheaders] kernfs_fop_read_iter+0x1a4/0x2f0 ... Reported-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/bpf/20230302112130.6e402a98@kernel.org/ Acked-by: Joel Fernandes (Google) <joel@joelfernandes.org> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Tested-by: Jakub Kicinski <kuba@kernel.org> Fixes: 43d8ce9d65a5 ("Provide in-kernel headers to make extending kernel easier") Cc: stable@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20230302224946.never.243-kees@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: sanglipeng <sanglipeng1@jd.com> | 2 年前 | |
kmod: remove redundant "be an" in the comment There exists redundant "be an" in the comment, remove it. Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Luis Chamberlain <mcgrof@kernel.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <christian.brauner@ubuntu.com> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: David Howells <dhowells@redhat.com> Cc: David S. Miller <davem@davemloft.net> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jakub Kicinski <kuba@kernel.org> Cc: James Morris <jmorris@namei.org> Cc: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Cc: J. Bruce Fields <bfields@fieldses.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Kees Cook <keescook@chromium.org> Cc: Lars Ellenberg <lars.ellenberg@linbit.com> Cc: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Cc: Philipp Reisner <philipp.reisner@linbit.com> Cc: Roopa Prabhu <roopa@cumulusnetworks.com> Cc: "Serge E. Hallyn" <serge@hallyn.com> Cc: Sergei Trofimovich <slyfox@gentoo.org> Cc: Sergey Kvachonok <ravenexp@gmail.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Tony Vroon <chainsaw@gentoo.org> Cc: Christoph Hellwig <hch@infradead.org> Link: http://lkml.kernel.org/r/20200610154923.27510-3-mcgrof@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> | 5 年前 | |
kprobes: Fix deadlock issue with kmemleak hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I9K8D1 ----------------------------------------- There is following deadlock issue: CPU0 CPU1 ==== ==== kretprobe_trampoline kmem_cache_alloc trampoline_handler kmemleak_alloc __kretprobe_trampoline_handler create_object kretprobe_hash_lock <-- hold kmemleak lock <-- hold kretprobe table lock __link_object recycle_rp_inst stack_trace_save kfree_rcu kvfree_call_rcu ... kmemleak_ignore unwind_next_frame kretprobe_find_ret_addr <-- wait for kmemleak lock kretprobe_hash_lock <-- wait for kretprobe table lock One task on CPU0 hold kretprobe_hash_lock and wait for kmemleak_lock, however, kmemleak_lock was held by other task on CPU1 and that task is waiting for kretprobe_hash_lock, then deadlock happended. To fix it, move kfree_rcu() out of kretprobe table lock area. Fixes: 88fef946364b ("kprobes: Add kretprobe_find_ret_addr() for searching return address") Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com> | 1 年前 | |
kexec: turn all kexec_mutex acquisitions into trylocks stable inclusion from stable-v5.10.178 commit d425f348211ffb387587dc65113852767240d023 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I8ALH3 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=d425f348211ffb387587dc65113852767240d023 -------------------------------- commit 7bb5da0d490b2d836c5218f5186ee588d2145310 upstream. Patch series "kexec, panic: Making crash_kexec() NMI safe", v4. This patch (of 2): Most acquistions of kexec_mutex are done via mutex_trylock() - those were a direct "translation" from: 8c5a1cf0ad3a ("kexec: use a mutex for locking rather than xchg()") there have however been two additions since then that use mutex_lock(): crash_get_memory_size() and crash_shrink_memory(). A later commit will replace said mutex with an atomic variable, and locking operations will become atomic_cmpxchg(). Rather than having those mutex_lock() become while (atomic_cmpxchg(&lock, 0, 1)), turn them into trylocks that can return -EBUSY on acquisition failure. This does halve the printable size of the crash kernel, but that's still neighbouring 2G for 32bit kernels which should be ample enough. Link: https://lkml.kernel.org/r/20220630223258.4144112-1-vschneid@redhat.com Link: https://lkml.kernel.org/r/20220630223258.4144112-2-vschneid@redhat.com Signed-off-by: Valentin Schneider <vschneid@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: "Eric W . Biederman" <ebiederm@xmission.com> Cc: Juri Lelli <jlelli@redhat.com> Cc: Luis Claudio R. Goncalves <lgoncalv@redhat.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Baoquan He <bhe@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Wen Yang <wenyang.linux@foxmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: sanglipeng <sanglipeng1@jd.com> | 2 年前 | |
kthread: Fix PF_KTHREAD vs to_kthread() race mainline inclusion from mainline-v5.13-rc1 commit 3a7956e25e1d7b3c148569e78895e1f3178122a9 bugzilla: 52510 https://gitee.com/openeuler/kernel/issues/I4DDEL Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3a7956e25e1d7b3c148569e78895e1f3178122a9 -------------------------------- The kthread_is_per_cpu() construct relies on only being called on PF_KTHREAD tasks (per the WARN in to_kthread). This gives rise to the following usage pattern: if ((p->flags & PF_KTHREAD) && kthread_is_per_cpu(p)) However, as reported by syzcaller, this is broken. The scenario is: CPU0 CPU1 (running p) (p->flags & PF_KTHREAD) // true begin_new_exec() me->flags &= ~(PF_KTHREAD|...); kthread_is_per_cpu(p) to_kthread(p) WARN(!(p->flags & PF_KTHREAD) <-- *SPLAT* Introduce __to_kthread() that omits the WARN and is sure to check both values. Use this to remove the problematic pattern for kthread_is_per_cpu() and fix a number of other kthread_*() functions that have similar issues but are currently not used in ways that would expose the problem. Notably kthread_func() is only ever called on 'current', while kthread_probe_data() is only used for PF_WQ_WORKER, which implies the task is from kthread_create*(). Fixes: ac687e6e8c26 ("kthread: Extract KTHREAD_IS_PER_CPU") Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <Valentin.Schneider@arm.com> Link: https://lkml.kernel.org/r/YH6WJc825C4P0FCK@hirez.programming.kicks-ass.net Signed-off-by: Zheng Zucheng <zhengzucheng@huawei.com> Conflicts: kernel/sched/core.c Reviewed-by: Chen Hui <judy.chenhui@huawei.com> Signed-off-by: Chen Jun <chenjun102@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com> | 4 年前 | |
sysctl: pass kernel pointers to ->proc_handler Instead of having all the sysctl handlers deal with user pointers, which is rather hairy in terms of the BPF interaction, copy the input to and from userspace in common code. This also means that the strings are always NUL-terminated by the common code, making the API a little bit safer. As most handler just pass through the data to one of the common handlers a lot of the changes are mechnical. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> | 6 年前 | |
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 36 Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public licence as published by the free software foundation either version 2 of the licence or at your option any later version extracted by the scancode license scanner the SPDX license identifier GPL-2.0-or-later has been chosen to replace the boilerplate/reference in 114 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190520170857.552531963@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> | 6 年前 | |
module: Fix kernel panic when a symbol st_shndx is out of bounds mainline inclusion from mainline-v7.0-rc3 commit f9d69d5e7bde2295eb7488a56f094ac8f5383b92 category: bugfix bugzilla: https://atomgit.com/src-openeuler/kernel/issues/14254 CVE: CVE-2026-31521 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f9d69d5e7bde2295eb7488a56f094ac8f5383b92 -------------------------------- The module loader doesn't check for bounds of the ELF section index in simplify_symbols(): for (i = 1; i < symsec->sh_size / sizeof(Elf_Sym); i++) { const char *name = info->strtab + sym[i].st_name; switch (sym[i].st_shndx) { case SHN_COMMON: [...] default: /* Divert to percpu allocation if a percpu var. */ if (sym[i].st_shndx == info->index.pcpu) secbase = (unsigned long)mod_percpu(mod); else /** HERE --> **/ secbase = info->sechdrs[sym[i].st_shndx].sh_addr; sym[i].st_value += secbase; break; } } A symbol with an out-of-bounds st_shndx value, for example 0xffff (known as SHN_XINDEX or SHN_HIRESERVE), may cause a kernel panic: BUG: unable to handle page fault for address: ... RIP: 0010:simplify_symbols+0x2b2/0x480 ... Kernel panic - not syncing: Fatal exception This can happen when module ELF is legitimately using SHN_XINDEX or when it is corrupted. Add a bounds check in simplify_symbols() to validate that st_shndx is within the valid range before using it. This issue was discovered due to a bug in llvm-objcopy, see relevant discussion for details [1]. [1] https://lore.kernel.org/linux-modules/20251224005752.201911-1-ihor.solodrai@linux.dev/ Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev> Reviewed-by: Daniel Gomez <da.gomez@samsung.com> Reviewed-by: Petr Pavlu <petr.pavlu@suse.com> Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Conflicts: kernel/module/main.c kernel/module.c [commit cfc1d277891eb merged] Signed-off-by: Zhang Yuwei <zhangyuwei20@huawei.com> | 1 个月前 | |
module: harden ELF info handling stable inclusion from stable-5.10.26 commit d802672c7f00963613f289579073ac519f0d306c bugzilla: 51363 -------------------------------- [ Upstream commit ec2a29593c83ed71a7f16e3243941ebfcf75fdf6 ] 5fdc7db644 ("module: setup load info before module_sig_check()") moved the ELF setup, so that it was done before the signature check. This made the module name available to signature error messages. However, the checks for ELF correctness in setup_load_info are not sufficient to prevent bad memory references due to corrupted offset fields, indices, etc. So, there's a regression in behavior here: a corrupt and unsigned (or badly signed) module, which might previously have been rejected immediately, can now cause an oops/crash. Harden ELF handling for module loading by doing the following: - Move the signature check back up so that it comes before ELF initialization. It's best to do the signature check to see if we can trust the module, before using the ELF structures inside it. This also makes checks against info->len more accurate again, as this field will be reduced by the length of the signature in mod_check_sig(). The module name is now once again not available for error messages during the signature check, but that seems like a fair tradeoff. - Check if sections have offset / size fields that at least don't exceed the length of the module. - Check if sections have section name offsets that don't fall outside the section name table. - Add a few other sanity checks against invalid section indices, etc. This is not an exhaustive consistency check, but the idea is to at least get through the signature and blacklist checks without crashing because of corrupted ELF info, and to error out gracefully for most issues that would have caused problems later on. Fixes: 5fdc7db6448a ("module: setup load info before module_sig_check()") Signed-off-by: Frank van der Linden <fllinden@amazon.com> Signed-off-by: Jessica Yu <jeyu@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Chen Jun <chenjun102@huawei.com> Acked-by: Weilong Chen <chenweilong@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com> | 5 年前 | |
module: harden ELF info handling stable inclusion from stable-5.10.26 commit d802672c7f00963613f289579073ac519f0d306c bugzilla: 51363 -------------------------------- [ Upstream commit ec2a29593c83ed71a7f16e3243941ebfcf75fdf6 ] 5fdc7db644 ("module: setup load info before module_sig_check()") moved the ELF setup, so that it was done before the signature check. This made the module name available to signature error messages. However, the checks for ELF correctness in setup_load_info are not sufficient to prevent bad memory references due to corrupted offset fields, indices, etc. So, there's a regression in behavior here: a corrupt and unsigned (or badly signed) module, which might previously have been rejected immediately, can now cause an oops/crash. Harden ELF handling for module loading by doing the following: - Move the signature check back up so that it comes before ELF initialization. It's best to do the signature check to see if we can trust the module, before using the ELF structures inside it. This also makes checks against info->len more accurate again, as this field will be reduced by the length of the signature in mod_check_sig(). The module name is now once again not available for error messages during the signature check, but that seems like a fair tradeoff. - Check if sections have offset / size fields that at least don't exceed the length of the module. - Check if sections have section name offsets that don't fall outside the section name table. - Add a few other sanity checks against invalid section indices, etc. This is not an exhaustive consistency check, but the idea is to at least get through the signature and blacklist checks without crashing because of corrupted ELF info, and to error out gracefully for most issues that would have caused problems later on. Fixes: 5fdc7db6448a ("module: setup load info before module_sig_check()") Signed-off-by: Frank van der Linden <fllinden@amazon.com> Signed-off-by: Jessica Yu <jeyu@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Chen Jun <chenjun102@huawei.com> Acked-by: Weilong Chen <chenweilong@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com> | 5 年前 | |
notifier: Fix broken error handling pattern The current notifiers have the following error handling pattern all over the place: int err, nr; err = __foo_notifier_call_chain(&chain, val_up, v, -1, &nr); if (err & NOTIFIER_STOP_MASK) __foo_notifier_call_chain(&chain, val_down, v, nr-1, NULL) And aside from the endless repetition thereof, it is broken. Consider blocking notifiers; both calls take and drop the rwsem, this means that the notifier list can change in between the two calls, making @nr meaningless. Fix this by replacing all the __foo_notifier_call_chain() functions with foo_notifier_call_chain_robust() that embeds the above pattern, but ensures it is inside a single lock region. Note: I switched atomic_notifier_call_chain_robust() to use the spinlock, since RCU cannot provide the guarantee required for the recovery. Note: software_resume() error handling was broken afaict. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://lore.kernel.org/r/20200818135804.325626653@infradead.org | 5 年前 | |
Revert "ima: Introduce ima namespace" hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I4O25G CVE: NA -------------------------------- This reverts commit a8352473e0aa18693cd0a911beab237cb2d396b9. Signed-off-by: Zhang Tianxing <zhangtianxing3@huawei.com> Acked-by: Xie XiuQi <xiexiuqi@huawei.com> Acked-by: Xiu Jianfeng<xiujianfeng@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com> | 4 年前 | |
padata: Fix pd UAF once and for all mainline inclusion from mainline-v6.17-rc1 commit 71203f68c7749609d7fc8ae6ad054bdedeb24f91 category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/ICU75T CVE: CVE-2025-38584 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=71203f68c7749609d7fc8ae6ad054bdedeb24f91 -------------------------------- There is a race condition/UAF in padata_reorder that goes back to the initial commit. A reference count is taken at the start of the process in padata_do_parallel, and released at the end in padata_serial_worker. This reference count is (and only is) required for padata_replace to function correctly. If padata_replace is never called then there is no issue. In the function padata_reorder which serves as the core of padata, as soon as padata is added to queue->serial.list, and the associated spin lock released, that padata may be processed and the reference count on pd would go away. Fix this by getting the next padata before the squeue->serial lock is released. In order to make this possible, simplify padata_reorder by only calling it once the next padata arrives. Fixes: 16295bec6398 ("padata: Generic parallelization/serialization interface") Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Conflicts: kernel/padata.c [modify the parameters of cpumask_next_wrap() because it's re-implemented in mainline] Signed-off-by: Liu Kai <liukai284@huawei.com> | 6 个月前 | |
panic: Flush kernel log buffer at the end stable inclusion from stable-v5.10.215 commit 5b71a921dbe7c804048922da794f7d5dab5b2c2f category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IAJJ2D Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=5b71a921dbe7c804048922da794f7d5dab5b2c2f -------------------------------- [ Upstream commit d988d9a9b9d180bfd5c1d353b3b176cb90d6861b ] If the kernel crashes in a context where printk() calls always defer printing (such as in NMI or inside a printk_safe section) then the final panic messages will be deferred to irq_work. But if irq_work is not available, the messages will not get printed unless explicitly flushed. The result is that the final "end Kernel panic" banner does not get printed. Add one final flush after the last printk() call to make sure the final panic messages make it out as well. Signed-off-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20240207134103.1357162-14-john.ogness@linutronix.de Signed-off-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: sanglipeng1 <sanglipeng1@jd.com> | 1 年前 | |
module: ensure that kobject_put() is safe for module type kobjects stable inclusion from stable-v5.15.183 commit f1c71b4bd721a4ea21da408806964b10468623f2 category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/ICBIZQ CVE: CVE-2025-37995 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=f1c71b4bd721a4ea21da408806964b10468623f2 -------------------------------- commit a6aeb739974ec73e5217c75a7c008a688d3d5cf1 upstream. In 'lookup_or_create_module_kobject()', an internal kobject is created using 'module_ktype'. So call to 'kobject_put()' on error handling path causes an attempt to use an uninitialized completion pointer in 'module_kobject_release()'. In this scenario, we just want to release kobject without an extra synchronization required for a regular module unloading process, so adding an extra check whether 'complete()' is actually required makes 'kobject_put()' safe. Reported-by: syzbot+7fb8a372e1f6add936dd@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=7fb8a372e1f6add936dd Fixes: 942e443127e9 ("module: Fix mod->mkobj.kobj potentially freed too early") Cc: stable@vger.kernel.org Suggested-by: Petr Pavlu <petr.pavlu@suse.com> Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru> Link: https://lore.kernel.org/r/20250507065044.86529-1-dmantipov@yandex.ru Signed-off-by: Petr Pavlu <petr.pavlu@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Pan Taixi <pantaixi1@huawei.com> | 1 年前 | |
pid: Add a judgment for ns null in pid_nr_ns stable inclusion from stable-v5.10.246 commit c2d09d724856b6f82ab688f65fc1ce833bb56333 category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/ID6BVM CVE: CVE-2025-40178 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=c2d09d724856b6f82ab688f65fc1ce833bb56333 -------------------------------- [ Upstream commit 006568ab4c5ca2309ceb36fa553e390b4aa9c0c7 ] __task_pid_nr_ns ns = task_active_pid_ns(current); pid_nr_ns(rcu_dereference(*task_pid_ptr(task, type)), ns); if (pid && ns->level <= pid->level) { Sometimes null is returned for task_active_pid_ns. Then it will trigger kernel panic in pid_nr_ns. For example: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000058 Mem abort info: ESR = 0x0000000096000007 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 FSC = 0x07: level 3 translation fault Data abort info: ISV = 0, ISS = 0x00000007, ISS2 = 0x00000000 CM = 0, WnR = 0, TnD = 0, TagAccess = 0 GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 user pgtable: 4k pages, 39-bit VAs, pgdp=00000002175aa000 [0000000000000058] pgd=08000002175ab003, p4d=08000002175ab003, pud=08000002175ab003, pmd=08000002175be003, pte=0000000000000000 pstate: 834000c5 (Nzcv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--) pc : __task_pid_nr_ns+0x74/0xd0 lr : __task_pid_nr_ns+0x24/0xd0 sp : ffffffc08001bd10 x29: ffffffc08001bd10 x28: ffffffd4422b2000 x27: 0000000000000001 x26: ffffffd442821168 x25: ffffffd442821000 x24: 00000f89492eab31 x23: 00000000000000c0 x22: ffffff806f5693c0 x21: ffffff806f5693c0 x20: 0000000000000001 x19: 0000000000000000 x18: 0000000000000000 x17: 00000000529c6ef0 x16: 00000000529c6ef0 x15: 00000000023a1adc x14: 0000000000000003 x13: 00000000007ef6d8 x12: 001167c391c78800 x11: 00ffffffffffffff x10: 0000000000000000 x9 : 0000000000000001 x8 : ffffff80816fa3c0 x7 : 0000000000000000 x6 : 49534d702d535449 x5 : ffffffc080c4c2c0 x4 : ffffffd43ee128c8 x3 : ffffffd43ee124dc x2 : 0000000000000000 x1 : 0000000000000001 x0 : ffffff806f5693c0 Call trace: __task_pid_nr_ns+0x74/0xd0 ... __handle_irq_event_percpu+0xd4/0x284 handle_irq_event+0x48/0xb0 handle_fasteoi_irq+0x160/0x2d8 generic_handle_domain_irq+0x44/0x60 gic_handle_irq+0x4c/0x114 call_on_irq_stack+0x3c/0x74 do_interrupt_handler+0x4c/0x84 el1_interrupt+0x34/0x58 el1h_64_irq_handler+0x18/0x24 el1h_64_irq+0x68/0x6c account_kernel_stack+0x60/0x144 exit_task_stack_account+0x1c/0x80 do_exit+0x7e4/0xaf8 ... get_signal+0x7bc/0x8d8 do_notify_resume+0x128/0x828 el0_svc+0x6c/0x70 el0t_64_sync_handler+0x68/0xbc el0t_64_sync+0x1a8/0x1ac Code: 35fffe54 911a02a8 f9400108 b4000128 (b9405a69) ---[ end trace 0000000000000000 ]--- Kernel panic - not syncing: Oops: Fatal exception in interrupt Signed-off-by: gaoxiang17 <gaoxiang17@xiaomi.com> Link: https://lore.kernel.org/20250802022123.3536934-1-gxxa03070307@gmail.com Reviewed-by: Baoquan He <bhe@redhat.com> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Liu Kai <liukai284@huawei.com> | 6 个月前 | |
zap_pid_ns_processes: clear TIF_NOTIFY_SIGNAL along with TIF_SIGPENDING stable inclusion from stable-v5.10.221 commit 82c7acf9a12c9b4e4b0aba531de82b4e39a345c8 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IAOPW2 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=82c7acf9a12c9b4e4b0aba531de82b4e39a345c8 -------------------------------- [ Upstream commit 7fea700e04bd3f424c2d836e98425782f97b494e ] kernel_wait4() doesn't sleep and returns -EINTR if there is no eligible child and signal_pending() is true. That is why zap_pid_ns_processes() clears TIF_SIGPENDING but this is not enough, it should also clear TIF_NOTIFY_SIGNAL to make signal_pending() return false and avoid a busy-wait loop. Link: https://lkml.kernel.org/r/20240608120616.GB7947@redhat.com Fixes: 12db8b690010 ("entry: Add support for TIF_NOTIFY_SIGNAL") Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reported-by: Rachel Menge <rachelmenge@linux.microsoft.com> Closes: https://lore.kernel.org/all/1386cd49-36d0-4a5c-85e9-bc42056a5a38@linux.microsoft.com/ Reviewed-by: Boqun Feng <boqun.feng@gmail.com> Tested-by: Wei Fu <fuweid89@gmail.com> Reviewed-by: Jens Axboe <axboe@kernel.dk> Cc: Allen Pais <apais@linux.microsoft.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Frederic Weisbecker <frederic@kernel.org> Cc: Joel Fernandes (Google) <joel@joelfernandes.org> Cc: Joel Granados <j.granados@samsung.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Mateusz Guzik <mjguzik@gmail.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Mike Christie <michael.christie@oracle.com> Cc: Neeraj Upadhyay <neeraj.upadhyay@kernel.org> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Cc: Zqiang <qiang.zhang1211@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Wang Hai <wanghai38@huawei.com> Signed-off-by: Chen Ridong <chenridong@huawei.com> | 1 年前 | |
x86: profiling: Check prof_buffer in profile_tick() hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I9T4EM -------------------------------- KASAN reported a UAF problem in profile_tick(): BUG: KASAN: use-after-free in profile_tick+0x5c/0x80 Read of size 8 at addr ffff888100928aa0 by task bash/1108 CPU: 2 PID: 1108 Comm: bash Not tainted 5.10.0+ #72 Call Trace: <IRQ> dump_stack+0x93/0xc5 print_address_description.constprop.0+0x1c/0x3c0 kasan_report.cold+0x37/0x74 check_memory_region+0x161/0x1c0 profile_tick+0x5c/0x80 tick_sched_timer+0xcd/0x100 __hrtimer_run_queues+0x23e/0x480 hrtimer_interrupt+0x1c2/0x440 asm_call_irq_on_stack+0xf/0x20 </IRQ> ... It is beacause in profiling_store(), profile_init() is possible to fail and free prof_cpu_mask. However prof_cpu_mask is not set to NULL and cpumask_available(prof_cpu_mask) will return true in profile_tick(). Then cpumask_test_cpu() will dereference prof_cpu_mask and trigger the KASAN warning. There is no interface to disable profile_tick() even though profile_init has been already failed. And when CONFIG_CPUMASK_OFFSTACK=n, cpumask_var_t is an array pointer which can not be set to NULL. So prof_cpu_mask can be a freed pointner or uncleaned array here, and cpumask_available() can't promise it is safe to use it. To fix this, add check for prof_buffer in profile_tick() before check cpumask_available(prof_cpu_mask). Because prof_cpu_mask is available only after prof_buffer is allocated successfully. Fixes: c309b917cab5 ("cpumask: convert kernel/profile.c") Signed-off-by: Chen Zhongjin <chenzhongjin@huawei.com> | 2 年前 | |
ptrace: slightly saner 'get_dumpable()' logic stable inclusion from stable-v5.10.256 commit 93d4ba49d18e3d7fb41a9927c2d0cca5e9dfefd6 category: bugfix bugzilla: https://atomgit.com/src-openeuler/kernel/issues/15055 CVE: CVE-2026-46333 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=93d4ba49d18e -------------------------------- commit 31e62c2ebbfdc3fe3dbdf5e02c92a9dc67087a3a upstream. The 'dumpability' of a task is fundamentally about the memory image of the task - the concept comes from whether it can core dump or not - and makes no sense when you don't have an associated mm. And almost all users do in fact use it only for the case where the task has a mm pointer. But we have one odd special case: ptrace_may_access() uses 'dumpable' to check various other things entirely independently of the MM (typically explicitly using flags like PTRACE_MODE_READ_FSCREDS). Including for threads that no longer have a VM (and maybe never did, like most kernel threads). It's not what this flag was designed for, but it is what it is. The ptrace code does check that the uid/gid matches, so you do have to be uid-0 to see kernel thread details, but this means that the traditional "drop capabilities" model doesn't make any difference for this all. Make it all make a *bit* more sense by saying that if you don't have a MM pointer, we'll use a cached "last dumpability" flag if the thread ever had a MM (it will be zero for kernel threads since it is never set), and require a proper CAP_SYS_PTRACE capability to override. Reported-by: Qualys Security Advisory <qsa@qualys.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Kees Cook <kees@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Pu Lehui <pulehui@huawei.com> | 1 个月前 | |
kernel.h: split out min()/max() et al. helpers kernel.h is being used as a dump for all kinds of stuff for a long time. Here is the attempt to start cleaning it up by splitting out min()/max() et al. helpers. At the same time convert users in header and lib folder to use new header. Though for time being include new header back to kernel.h to avoid twisted indirected includes for other existing users. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Joe Perches <joe@perches.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lkml.kernel.org/r/20200910164152.GA1891694@smile.fi.intel.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> | 5 年前 | |
kernel/reboot: emergency_restart: Set correct system_state stable inclusion from stable-v5.10.202 commit 8f8fc95b3a7f39d7ca3aff514f21f5a02c7c2ac7 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I9DZOS Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=8f8fc95b3a7f39d7ca3aff514f21f5a02c7c2ac7 -------------------------------- commit 60466c067927abbcaff299845abd4b7069963139 upstream. As the emergency restart does not call kernel_restart_prepare(), the system_state stays in SYSTEM_RUNNING. Since bae1d3a05a8b, this hinders i2c_in_atomic_xfer_mode() from becoming active, and therefore might lead to avoidable warnings in the restart handlers, e.g.: [ 12.667612] WARNING: CPU: 1 PID: 1 at kernel/rcu/tree_plugin.h:318 rcu_note_context_switch+0x33c/0x6b0 [ 12.676926] Voluntary context switch within RCU read-side critical section! ... [ 12.742376] schedule_timeout from wait_for_completion_timeout+0x90/0x114 [ 12.749179] wait_for_completion_timeout from tegra_i2c_wait_completion+0x40/0x70 ... [ 12.994527] atomic_notifier_call_chain from machine_restart+0x34/0x58 [ 13.001050] machine_restart from panic+0x2a8/0x32c Avoid these by setting the correct system_state. Fixes: bae1d3a05a8b ("i2c: core: remove use of in_atomic()") Cc: stable@vger.kernel.org # v5.2+ Reviewed-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Tested-by: Nishanth Menon <nm@ti.com> Signed-off-by: Benjamin Bara <benjamin.bara@skidata.com> Link: https://lore.kernel.org/r/20230327-tegra-pmic-reboot-v7-1-18699d5dcd76@skidata.com Signed-off-by: Lee Jones <lee@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: sanglipeng <sanglipeng1@jd.com> | 2 年前 | |
regset: kill ->get() no instances left Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> | 5 年前 | |
relay: fix type mismatch when allocating memory in relay_create_buf() stable inclusion from stable-v5.10.163 commit 93cdd1263691df25b7ec710f3b9ba2264724f1a3 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I7PJ9N Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=93cdd1263691df25b7ec710f3b9ba2264724f1a3 ---------------------------------------------------- [ Upstream commit 4d8586e04602fe42f0a782d2005956f8b6302678 ] The 'padding' field of the 'rchan_buf' structure is an array of 'size_t' elements, but the memory is allocated for an array of 'size_t *' elements. Found by Linux Verification Center (linuxtesting.org) with SVACE. Link: https://lkml.kernel.org/r/20221129092002.3538384-1-Ilia.Gavrilov@infotecs.ru Fixes: b86ff981a825 ("[PATCH] relay: migrate from relayfs to a generic relay API") Signed-off-by: Ilia.Gavrilov <Ilia.Gavrilov@infotecs.ru> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: wuchi <wuchi.zero@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: zhaoxiaoqiang11 <zhaoxiaoqiang11@jd.com> | 2 年前 | |
kernel/resource: fix kfree() of bootmem memory again mainline inclusion from mainline-v5.18-rc1 commit 0cbcc92917c5de80f15c24d033566539ad696892 category: bugfix bugzilla: https://atomgit.com/src-openeuler/kernel/issues/486 CVE: CVE-2022-49190 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0cbcc92917c5de80f15c24d033566539ad696892 -------------------------------- Since commit ebff7d8f270d ("mem hotunplug: fix kfree() of bootmem memory"), we could get a resource allocated during boot via alloc_resource(). And it's required to release the resource using free_resource(). Howerver, many people use kfree directly which will result in kernel BUG. In order to fix this without fixing every call site, just leak a couple of bytes in such corner case. Link: https://lkml.kernel.org/r/20220217083619.19305-1-linmiaohe@huawei.com Fixes: ebff7d8f270d ("mem hotunplug: fix kfree() of bootmem memory") Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Suggested-by: David Hildenbrand <david@redhat.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Alistair Popple <apopple@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Conflicts: kernel/resource.c [Just contexts conflicts.] Signed-off-by: Wang Wensheng <wangwensheng4@huawei.com> | 4 个月前 | |
rseq: Optimise rseq_get_rseq_cs() and clear_rseq_cs() stable inclusion from stable-v5.10.110 commit f0250e05e5744899539a514b722ccc34430ede64 bugzilla: https://gitee.com/openeuler/kernel/issues/I574AL Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=f0250e05e5744899539a514b722ccc34430ede64 -------------------------------- [ Upstream commit 5e0ccd4a3b01c5a71732a13186ca110a138516ea ] Commit ec9c82e03a74 ("rseq: uapi: Declare rseq_cs field as union, update includes") added regressions for our servers. Using copy_from_user() and clear_user() for 64bit values is suboptimal. We can use faster put_user() and get_user() on 64bit arches. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://lkml.kernel.org/r/20210413203352.71350-4-eric.dumazet@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Yu Liao <liaoyu15@huawei.com> Reviewed-by: Wei Li <liwei391@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com> | 3 年前 | |
scftorture: Forgive memory-allocation failure if KASAN stable inclusion from stable-v5.10.197 commit 3f72fdb20f6d1103ded73b1594ad9c26a6026a24 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I96Q8P Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=3f72fdb20f6d1103ded73b1594ad9c26a6026a24 -------------------------------- [ Upstream commit 013608cd0812bdb21fc26d39ed8fdd2fc76e8b9b ] Kernels built with CONFIG_KASAN=y quarantine newly freed memory in order to better detect use-after-free errors. However, this can exhaust memory more quickly in allocator-heavy tests, which can result in spurious scftorture failure. This commit therefore forgives memory-allocation failure in kernels built with CONFIG_KASAN=y, but continues counting the errors for use in detailed test-result analyses. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: sanglipeng <sanglipeng1@jd.com> | 2 年前 | |
mm: memcontrol: account kernel stack per node Currently the kernel stack is being accounted per-zone. There is no need to do that. In addition due to being per-zone, memcg has to keep a separate MEMCG_KERNEL_STACK_KB. Make the stat per-node and deprecate MEMCG_KERNEL_STACK_KB as memcg_stat_item is an extension of node_stat_item. In addition localize the kernel stack stats updates to account_kernel_stack(). Signed-off-by: Shakeel Butt <shakeelb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Roman Gushchin <guro@fb.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Link: http://lkml.kernel.org/r/20200630161539.1759185-1-shakeelb@google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> | 5 年前 | |
seccomp: Invalidate seccomp mode to catch death failures stable inclusion from stable-v5.10.211 commit 1dd3dc389211c8a62cfe1d8f72723e619fdb5532 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IAF2J4 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=1dd3dc389211c8a62cfe1d8f72723e619fdb5532 -------------------------------- [ Upstream commit 495ac3069a6235bfdf516812a2a9b256671bbdf9 ] If seccomp tries to kill a process, it should never see that process again. To enforce this proactively, switch the mode to something impossible. If encountered: WARN, reject all syscalls, and attempt to kill the process again even harder. Cc: Andy Lutomirski <luto@amacapital.net> Cc: Will Drewry <wad@chromium.org> Fixes: 8112c4f140fa ("seccomp: remove 2-phase API") Cc: stable@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: sanglipeng1 <sanglipeng1@jd.com> | 1 年前 | |
task_work: unconditionally run task_work from get_signal() stable inclusion from stable-v5.10.162 commit 6ef2b4728a00c98cc1163212443e39d79a4c2c94 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I7P7OH Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=6ef2b4728a00c98cc1163212443e39d79a4c2c94 -------------------------------- [ Upstream commit 35d0b389f3b23439ad15b610d6e43fc72fc75779 ] Song reported a boot regression in a kvm image with 5.11-rc, and bisected it down to the below patch. Debugging this issue, turns out that the boot stalled when a task is waiting on a pipe being released. As we no longer run task_work from get_signal() unless it's queued with TWA_SIGNAL, the task goes idle without running the task_work. This prevents ->release() from being called on the pipe, which another boot task is waiting on. For now, re-instate the unconditional task_work run from get_signal(). For 5.12, we'll collapse TWA_RESUME and TWA_SIGNAL, as it no longer makes sense to have a distinction between the two. This will turn task_work notification into a simple boolean, whether to notify or not. Fixes: 98b89b649fce ("signal: kill JOBCTL_TASK_WORK") Reported-by: Song Liu <songliubraving@fb.com> Tested-by: John Stultz <john.stultz@linaro.org> Tested-by: Douglas Anderson <dianders@chromium.org> Tested-by: Sedat Dilek <sedat.dilek@gmail.com> # LLVM/Clang version 11.0.1 Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: sanglipeng <sanglipeng1@jd.com> | 2 年前 | |
isolation: Check whether there exists a housekeeping CPU online hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I9NR7Q CVE: NA -------------------------------- We need to make sure that the housekeeper, one of the housekeeping CPUs, is online, as describe in the below Link. We add this check after secondary CPUs online is finished. Link: https://lore.kernel.org/all/20190504002733.GB19076@lenoir/ Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com> | 2 年前 | |
sched/core: Initialize the idle task with preemption disabled stable inclusion from stable-5.10.50 commit 3c51d82d0b7862d7d246016c74b4390fb1fa1f11 bugzilla: 174522 https://gitee.com/openeuler/kernel/issues/I4DNFY Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=3c51d82d0b7862d7d246016c74b4390fb1fa1f11 -------------------------------- [ Upstream commit f1a0a376ca0c4ef1fc3d24e3e502acbb5b795674 ] As pointed out by commit de9b8f5dcbd9 ("sched: Fix crash trying to dequeue/enqueue the idle thread") init_idle() can and will be invoked more than once on the same idle task. At boot time, it is invoked for the boot CPU thread by sched_init(). Then smp_init() creates the threads for all the secondary CPUs and invokes init_idle() on them. As the hotplug machinery brings the secondaries to life, it will issue calls to idle_thread_get(), which itself invokes init_idle() yet again. In this case it's invoked twice more per secondary: at _cpu_up(), and at bringup_cpu(). Given smp_init() already initializes the idle tasks for all *possible* CPUs, no further initialization should be required. Now, removing init_idle() from idle_thread_get() exposes some interesting expectations with regards to the idle task's preempt_count: the secondary startup always issues a preempt_disable(), requiring some reset of the preempt count to 0 between hot-unplug and hotplug, which is currently served by idle_thread_get() -> idle_init(). Given the idle task is supposed to have preemption disabled once and never see it re-enabled, it seems that what we actually want is to initialize its preempt_count to PREEMPT_DISABLED and leave it there. Do that, and remove init_idle() from idle_thread_get(). Secondary startups were patched via coccinelle: @begone@ @@ -preempt_disable(); ... cpu_startup_entry(CPUHP_AP_ONLINE_IDLE); Signed-off-by: Valentin Schneider <valentin.schneider@arm.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20210512094636.2958515-1-valentin.schneider@arm.com Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Chen Jun <chenjun102@huawei.com> Acked-by: Weilong Chen <chenweilong@huawei.com> Signed-off-by: Chen Jun <chenjun102@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com> | 4 年前 | |
License cleanup: add SPDX GPL-2.0 license identifier to files with no license Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license. By default all files without license information are under the default license of the kernel, which is GPL version 2. Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. How this work was done: Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information it it. - file was a */uapi/* one with no licensing information in it, - file was a */uapi/* one with existing licensing information, Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords. The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files. The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation. Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files. - Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines). All documentation files were explicitly excluded. The following heuristics were used to determine which SPDX license identifiers to apply. - when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied. For non */uapi/* files that summary was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 11139 and resulted in the first patch in this series. If that file was a */uapi/* path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 WITH Linux-syscall-note 930 and resulted in the second patch in this series. - if a file had some form of licensing information in it, and was one of the */uapi/* ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary: SPDX license identifier # files ---------------------------------------------------|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1 and that resulted in the third patch in this series. - when the two scanners agreed on the detected license(s), that became the concluded license(s). - when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred. - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics). - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation. - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time. In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation. Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there was new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related. Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files. In initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier. Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified. These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches. Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> | 8 年前 | |
arm64: Introduce Xint software solution hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/release-management/issues/IB6JLE -------------------------------- Introduce xint software solution for kernel, it provides a lightweight interrupt processing framework for latency-sensitive interrupts, and enabled dynamically for each irq by /proc/irq/<irq>/xint interface. The main implementation schemes are as follows: 1. For a small number of latency-sensitive interrupts, it could be configured as xint state, and process irq by xint framework instead of the kernel general interrupt framework, so improve performance by remove unnecessary processes. It is not recommended to configure too many interrupts as xint in the system, as this will affect system stability to some extent. 2. For each SGI/PPI/SPI interrupts whoes irq numbers are consecutive and limited, use a bitmap to check whether a hwirq is xint. Signed-off-by: Zhang Jianhua <chris.zjh@huawei.com> Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Signed-off-by: Liao Chen <liaochen4@huawei.com> | 1 年前 | |
gcc-plugins/stackleak: Use noinstr in favor of notrace stable inclusion from stable-v5.10.102 commit 143aaf79bafa9839cabebd49aa10b36f8aaef3ce bugzilla: https://gitee.com/openeuler/kernel/issues/I567K6 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=143aaf79bafa9839cabebd49aa10b36f8aaef3ce -------------------------------- [ Upstream commit dcb85f85fa6f142aae1fe86f399d4503d49f2b60 ] While the stackleak plugin was already using notrace, objtool is now a bit more picky. Update the notrace uses to noinstr. Silences the following objtool warnings when building with: CONFIG_DEBUG_ENTRY=y CONFIG_STACK_VALIDATION=y CONFIG_VMLINUX_VALIDATION=y CONFIG_GCC_PLUGIN_STACKLEAK=y vmlinux.o: warning: objtool: do_syscall_64()+0x9: call to stackleak_track_stack() leaves .noinstr.text section vmlinux.o: warning: objtool: do_int80_syscall_32()+0x9: call to stackleak_track_stack() leaves .noinstr.text section vmlinux.o: warning: objtool: exc_general_protection()+0x22: call to stackleak_track_stack() leaves .noinstr.text section vmlinux.o: warning: objtool: fixup_bad_iret()+0x20: call to stackleak_track_stack() leaves .noinstr.text section vmlinux.o: warning: objtool: do_machine_check()+0x27: call to stackleak_track_stack() leaves .noinstr.text section vmlinux.o: warning: objtool: .text+0x5346e: call to stackleak_erase() leaves .noinstr.text section vmlinux.o: warning: objtool: .entry.text+0x143: call to stackleak_erase() leaves .noinstr.text section vmlinux.o: warning: objtool: .entry.text+0x10eb: call to stackleak_erase() leaves .noinstr.text section vmlinux.o: warning: objtool: .entry.text+0x17f9: call to stackleak_erase() leaves .noinstr.text section Note that the plugin's addition of calls to stackleak_track_stack() from noinstr functions is expected to be safe, as it isn't runtime instrumentation and is self-contained. Cc: Alexander Popov <alex.popov@linux.com> Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Yu Liao <liaoyu15@huawei.com> Reviewed-by: Wei Li <liwei391@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com> | 4 年前 | |
stacktrace: move filter_irq_stacks() to kernel/stacktrace.c mainline inclusion from mainline-v5.16-rc1 commit f39f21b3ddc7fc0f87eb6dc75ddc81b5bbfb7672 category: feature bugzilla: 185780 https://gitee.com/openeuler/kernel/issues/I4EUY7 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f39f21b3ddc7fc0f87eb6dc75ddc81b5bbfb7672 ----------------------------------------------- filter_irq_stacks() has little to do with the stackdepot implementation, except that it is usually used by users (such as KASAN) of stackdepot to reduce the stack trace. However, filter_irq_stacks() itself is not useful without a stack trace as obtained by stack_trace_save() and friends. Therefore, move filter_irq_stacks() to kernel/stacktrace.c, so that new users of filter_irq_stacks() do not have to start depending on STACKDEPOT only for filter_irq_stacks(). Link: https://lkml.kernel.org/r/20210923104803.2620285-1-elver@google.com Signed-off-by: Marco Elver <elver@google.com> Acked-by: Dmitry Vyukov <dvyukov@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Jann Horn <jannh@google.com> Cc: Aleksandr Nogikh <nogikh@google.com> Cc: Taras Madan <tarasmadan@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Peng Liu <liupeng256@huawei.com> Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: Chen Jun <chenjun102@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com> | 4 年前 | |
static_call: Don't make __static_call_return0 static mainline inclusion from mainline-v5.18-rc2 commit 8fd4ddda2f49a66bf5dd3d0c01966c4b1971308b category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I9JNPZ CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8fd4ddda2f49a66bf5dd3d0c01966c4b1971308b ------------------------------------------------------ System.map shows that vmlinux contains several instances of __static_call_return0(): c0004fc0 t __static_call_return0 c0011518 t __static_call_return0 c00d8160 t __static_call_return0 arch_static_call_transform() uses the middle one to check whether we are setting a call to __static_call_return0 or not: c0011520 <arch_static_call_transform>: c0011520: 3d 20 c0 01 lis r9,-16383 <== r9 = 0xc001 << 16 c0011524: 39 29 15 18 addi r9,r9,5400 <== r9 += 0x1518 c0011528: 7c 05 48 00 cmpw r5,r9 <== r9 has value 0xc0011518 here So if static_call_update() is called with one of the other instances of __static_call_return0(), arch_static_call_transform() won't recognise it. In order to work properly, global single instance of __static_call_return0() is required. Fixes: 3f2a8fc4b15d ("static_call/x86: Add __static_call_return0()") Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Link: https://lkml.kernel.org/r/30821468a0e7d28251954b578e5051dc09300d04.1647258493.git.christophe.leroy@csgroup.eu Conflicts: kernel/Makefile kernel/static_call.c kernel/static_call_inline.c Signed-off-by: liwei <liwei728@huawei.com> | 2 年前 | |
static_call: Handle module init failure correctly in static_call_del_module() stable inclusion from stable-v6.1.113 commit b566c7d8a2de403ccc9d8a06195e19bbb386d0e4 category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/IAYR8E CVE: CVE-2024-50002 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=b566c7d8a2de403ccc9d8a06195e19bbb386d0e4 -------------------------------- [ Upstream commit 4b30051c4864234ec57290c3d142db7c88f10d8a ] Module insertion invokes static_call_add_module() to initialize the static calls in a module. static_call_add_module() invokes __static_call_init(), which allocates a struct static_call_mod to either encapsulate the built-in static call sites of the associated key into it so further modules can be added or to append the module to the module chain. If that allocation fails the function returns with an error code and the module core invokes static_call_del_module() to clean up eventually added static_call_mod entries. This works correctly, when all keys used by the module were converted over to a module chain before the failure. If not then static_call_del_module() causes a #GP as it blindly assumes that key::mods points to a valid struct static_call_mod. The problem is that key::mods is not a individual struct member of struct static_call_key, it's part of a union to save space: union { /* bit 0: 0 = mods, 1 = sites */ unsigned long type; struct static_call_mod *mods; struct static_call_site *sites; }; key::sites is a pointer to the list of built-in usage sites of the static call. The type of the pointer is differentiated by bit 0. A mods pointer has the bit clear, the sites pointer has the bit set. As static_call_del_module() blidly assumes that the pointer is a valid static_call_mod type, it fails to check for this failure case and dereferences the pointer to the list of built-in call sites, which is obviously bogus. Cure it by checking whether the key has a sites or a mods pointer. If it's a sites pointer then the key is not to be touched. As the sites are walked in the same order as in __static_call_init() the site walk can be terminated because all subsequent sites have not been touched by the init code due to the error exit. If it was converted before the allocation fail, then the inner loop which searches for a module match will find nothing. A fail in the second allocation in __static_call_init() is harmless and does not require special treatment. The first allocation succeeded and converted the key to a module chain. That first entry has mod::mod == NULL and mod::next == NULL, so the inner loop of static_call_del_module() will neither find a module match nor a module chain. The next site in the walk was either already converted, but can't match the module, or it will exit the outer loop because it has a static_call_site pointer and not a static_call_mod pointer. Fixes: 9183c3f9ed71 ("static_call: Add inline static call infrastructure") Closes: https://lore.kernel.org/all/20230915082126.4187913-1-ruanjinjie@huawei.com Reported-by: Jinjie Ruan <ruanjinjie@huawei.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Jinjie Ruan <ruanjinjie@huawei.com> Link: https://lore.kernel.org/r/87zfon6b0s.ffs@tglx Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Gu Bowen <gubowen5@huawei.com> | 1 年前 | |
stop_machine: Add stop_core_cpuslocked() for per-core operations mainline inclusion from mainline-v5.19-rc1 commit 2760f5a415c3b86c6394738c6cff740c8b4ce664 category: feature feature: Intel In Filed Scan(IFS) bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I651S7 CVE: N/A Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/ commit/?id=2760f5a415c3b86c6394738c6cff740c8b4ce664 Intel-SIG: commit 2760f5a415c3 ("stop_machine: Add stop_core_cpuslocked() for per-core operations") ------------------------------------- stop_machine: Add stop_core_cpuslocked() for per-core operations Hardware core level testing features require near simultaneous execution of WRMSR instructions on all threads of a core to initiate a test. Provide a customized cut down version of stop_machine_cpuslocked() that just operates on the threads of a single core. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20220506225410.1652287-4-tony.luck@intel.com Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Aichun Shi <aichun.shi@intel.com> | 3 年前 | |
getrusage: use sig->stats_lock rather than lock_task_sighand() stable inclusion from stable-v5.10.213 commit f610023e67ecb12a314ca60c431bdba2ea203a39 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IAI4UL Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=f610023e67ecb12a314ca60c431bdba2ea203a39 -------------------------------- [ Upstream commit f7ec1cd5cc7ef3ad964b677ba82b8b77f1c93009 ] lock_task_sighand() can trigger a hard lockup. If NR_CPUS threads call getrusage() at the same time and the process has NR_THREADS, spin_lock_irq will spin with irqs disabled O(NR_CPUS * NR_THREADS) time. Change getrusage() to use sig->stats_lock, it was specifically designed for this type of use. This way it runs lockless in the likely case. TODO: - Change do_task_stat() to use sig->stats_lock too, then we can remove spin_lock_irq(siglock) in wait_task_zombie(). - Turn sig->stats_lock into seqcount_rwlock_t, this way the readers in the slow mode won't exclude each other. See https://lore.kernel.org/all/20230913154907.GA26210@redhat.com/ - stats_lock has to disable irqs because ->siglock can be taken in irq context, it would be very nice to change __exit_signal() to avoid the siglock->stats_lock dependency. Link: https://lkml.kernel.org/r/20240122155053.GA26214@redhat.com Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reported-by: Dylan Hatch <dylanbhatch@google.com> Tested-by: Dylan Hatch <dylanbhatch@google.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: sanglipeng1 <sanglipeng1@jd.com> | 1 年前 | |
mm/mempolicy: wire up syscall set_mempolicy_home_node mainline inclusion from mainline-v5.17-rc1 commit 21b084fdf2a49ca1634e8e360e9ab6f9ff0dee11 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I6I1Z2 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=21b084fdf2a49ca1634e8e360e9ab6f9ff0dee11 -------------------------------- Link: https://lkml.kernel.org/r/20211202123810.267175-4-aneesh.kumar@linux.ibm.com Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Ben Widawsky <ben.widawsky@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Feng Tang <feng.tang@intel.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Huang Ying <ying.huang@intel.com> Cc: <linux-api@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ma Wupeng <mawupeng1@huawei.com> | 3 年前 | |
kunit: allow kunit tests to be loaded as a module As tests are added to kunit, it will become less feasible to execute all built tests together. By supporting modular tests we provide a simple way to do selective execution on a running system; specifying CONFIG_KUNIT=y CONFIG_KUNIT_EXAMPLE_TEST=m ...means we can simply "insmod example-test.ko" to run the tests. To achieve this we need to do the following: o export the required symbols in kunit o string-stream tests utilize non-exported symbols so for now we skip building them when CONFIG_KUNIT_TEST=m. o drivers/base/power/qos-test.c contains a few unexported interface references, namely freq_qos_read_value() and freq_constraints_init(). Both of these could be potentially defined as static inline functions in include/linux/pm_qos.h, but for now we simply avoid supporting module build for that test suite. o support a new way of declaring test suites. Because a module cannot do multiple late_initcall()s, we provide a kunit_test_suites() macro to declare multiple suites within the same module at once. o some test module names would have been too general ("test-test" and "example-test" for kunit tests, "inode-test" for ext4 tests); rename these as appropriate ("kunit-test", "kunit-example-test" and "ext4-inode-test" respectively). Also define kunit_test_suite() via kunit_test_suites() as callers in other trees may need the old definition. Co-developed-by: Knut Omang <knut.omang@oracle.com> Signed-off-by: Knut Omang <knut.omang@oracle.com> Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Reviewed-by: Brendan Higgins <brendanhiggins@google.com> Acked-by: Theodore Ts'o <tytso@mit.edu> # for ext4 bits Acked-by: David Gow <davidgow@google.com> # For list-test Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org> | 6 年前 | |
Revert "mm/memory-failure: support disabling soft offline for HugeTLB pages" hulk inclusion category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8489 -------------------------------- This reverts commit e14f9d96acb78e7810ccd2e11d70c9915752e5b8. Commit e14f9d96acb7 ("mm/memory-failure: support disabling soft offline for HugeTLB pages") introduced a new bit of sysctl_enable_soft_offline to support disabling soft offline for HugeTLB pages only. It is no longer needed. Let's revert it. Signed-off-by: Qi Xi <xiqi2@huawei.com> | 3 个月前 | |
task_work: remove legacy TWA_SIGNAL path stable inclusion from stable-v5.10.162 commit 61bdeb142e8f7550826373295cb35b3571d70f62 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I7P7OH Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=61bdeb142e8f7550826373295cb35b3571d70f62 -------------------------------- [ Upstream commit 03941ccfda161c2680147fa5ab92aead2a79cac1 ] All archs now support TIF_NOTIFY_SIGNAL. Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: sanglipeng <sanglipeng1@jd.com> | 2 年前 | |
taskstats: move specifying netlink policy back to ops commit 3b0f31f2b8c9 ("genetlink: make policy common to family") had to work around removal of policy from ops by parsing in the pre_doit callback. Now that policy is back in full ops we can switch again. Set maxattr to actual size of the policies - both commands set GENL_DONT_VALIDATE_STRICT so out of range attributes will be silently ignored, anyway. v2: - remove stale comment Suggested-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: David S. Miller <davem@davemloft.net> | 5 年前 | |
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 25 Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license or at your option any later version this program is distributed in the hope that it would be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details extracted by the scancode license scanner the SPDX license identifier GPL-2.0-or-later has been chosen to replace the boilerplate/reference in 6 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Steve Winslow <swinslow@gmail.com> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Jilayne Lovejoy <opensource@jilayne.com> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190519154043.007767574@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> | 6 年前 | |
torture: Fix hang during kthread shutdown phase stable inclusion from stable-v5.10.193 commit c7e91047d345e50a7f9651b409a51c88b6de0037 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I9399M Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=c7e91047d345e50a7f9651b409a51c88b6de0037 -------------------------------- commit d52d3a2bf408ff86f3a79560b5cce80efb340239 upstream. During rcutorture shutdown, the rcu_torture_cleanup() function calls torture_cleanup_begin(), which sets the fullstop global variable to FULLSTOP_RMMOD. This causes the rcutorture threads for readers and fakewriters to exit all of their "while" loops and start shutting down. They then call torture_kthread_stopping(), which in turn waits for kthread_stop() to be called. However, rcu_torture_cleanup() has not yet called kthread_stop() on those threads, and before it gets a chance to do so, multiple instances of torture_kthread_stopping() invoke schedule_timeout_interruptible(1) in a tight loop. Tracing confirms that TIMER_SOFTIRQ can then continuously execute timer callbacks. If that TIMER_SOFTIRQ preempts the task executing rcu_torture_cleanup(), that task might never invoke kthread_stop(). This commit improves this situation by increasing the timeout passed to schedule_timeout_interruptible() from one jiffy to 1/20th of a second. This change prevents TIMER_SOFTIRQ from monopolizing its CPU, thus allowing rcu_torture_cleanup() to carry out the needed kthread_stop() invocations. Testing has shown 100 runs of TREE07 passing reliably, as oppose to the tens-of-percent failure rates seen beforehand. Cc: Paul McKenney <paulmck@kernel.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Zhouyi Zhou <zhouzhouyi@gmail.com> Cc: <stable@vger.kernel.org> # 6.0.x Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Tested-by: Zhouyi Zhou <zhouzhouyi@gmail.com> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: sanglipeng <sanglipeng1@jd.com> | 2 年前 | |
tracepoint: Use rcu get state and cond sync for static call updates mainline inclusion from mainline-5.10.62 commit 9a4f1dc8a17c8606ab0506cce63e554253677a53 bugzilla: 182217 https://gitee.com/openeuler/kernel/issues/I4EFOS Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9a4f1dc8a17c8606ab0506cce63e554253677a53 -------------------------------- commit 7b40066c97ec66a44e388f82fcf694987451768f upstream. State transitions from 1->0->1 and N->2->1 callbacks require RCU synchronization. Rather than performing the RCU synchronization every time the state change occurs, which is quite slow when many tracepoints are registered in batch, instead keep a snapshot of the RCU state on the most recent transitions which belong to a chain, and conditionally wait for a grace period on the last transition of the chain if one g.p. has not elapsed since the last snapshot. This applies to both RCU and SRCU. This brings the performance regression caused by commit 231264d6927f ("Fix: tracepoint: static call function vs data state mismatch") back to what it was originally. Before this commit: # trace-cmd start -e all # time trace-cmd start -p nop real 0m10.593s user 0m0.017s sys 0m0.259s After this commit: # trace-cmd start -e all # time trace-cmd start -p nop real 0m0.878s user 0m0.000s sys 0m0.103s Link: https://lkml.kernel.org/r/20210805192954.30688-1-mathieu.desnoyers@efficios.com Link: https://lore.kernel.org/io-uring/4ebea8f0-58c9-e571-fd30-0ce4f6f09c70@samba.org/ Cc: stable@vger.kernel.org Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Stefan Metzmacher <metze@samba.org> Fixes: 231264d6927f ("Fix: tracepoint: static call function vs data state mismatch") Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Reviewed-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Chen Jun <chenjun102@huawei.com> Acked-by: Weilong Chen <chenweilong@huawei.com> Signed-off-by: Chen Jun <chenjun102@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com> | 4 年前 | |
taskstats: Cleanup the use of task->exit_code stable inclusion from stable-v5.10.94 commit 69e7e979ed668656551ca141dc235a756da32eb0 bugzilla: https://gitee.com/openeuler/kernel/issues/I531X9 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=69e7e979ed668656551ca141dc235a756da32eb0 -------------------------------- commit 1b5a42d9c85f0e731f01c8d1129001fd8531a8a0 upstream. In the function bacct_add_task the code reading task->exit_code was introduced in commit f3cef7a99469 ("[PATCH] csa: basic accounting over taskstats"), and it is not entirely clear what the taskstats interface is trying to return as only returning the exit_code of the first task in a process doesn't make a lot of sense. As best as I can figure the intent is to return task->exit_code after a task exits. The field is returned with per task fields, so the exit_code of the entire process is not wanted. Only the value of the first task is returned so this is not a useful way to get the per task ptrace stop code. The ordinary case of returning this value is returning after a task exits, which also precludes use for getting a ptrace value. It is common to for the first task of a process to also be the last task of a process so this field may have done something reasonable by accident in testing. Make ac_exitcode a reliable per task value by always returning it for every exited task. Setting ac_exitcode in a sensible mannter makes it possible to continue to provide this value going forward. Cc: Balbir Singh <bsingharora@gmail.com> Fixes: f3cef7a99469 ("[PATCH] csa: basic accounting over taskstats") Link: https://lkml.kernel.org/r/20220103213312.9144-5-ebiederm@xmission.com Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com> Acked-by: Xie XiuQi <xiexiuqi@huawei.com> | 4 年前 | |
kabi: use CONFIG_KABI_RESERVE to control reservation in ucount_type and user_table hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I4MYYH CVE: NA ---------------------------------------- Commit 328e64817efe ("kabi: reserve space for cred and user_namespace"), besides reserving space for struct cred and struct user_namespace, adds new members to enum ucount_type and struct ctl_table user_table[] as well. Use CONFIG_KABI_RESERVE to control them. Signed-off-by: GONG, Ruiqi <gongruiqi1@huawei.com> | 1 年前 | |
fs: add do_fchownat(), ksys_fchown() helpers and ksys_{,l}chown() wrappers Using the fs-interal do_fchownat() wrapper allows us to get rid of fs-internal calls to the sys_fchownat() syscall. Introducing the ksys_fchown() helper and the ksys_{,}chown() wrappers allows us to avoid the in-kernel calls to the sys_{,l,f}chown() syscalls. The ksys_ prefix denotes that these functions are meant as a drop-in replacement for the syscalls. In particular, they use the same calling convention as sys_{,l,f}chown(). This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> | 8 年前 | |
kernel: provide ksys_*() wrappers for syscalls called by kernel/uid16.c Using these helpers allows us to avoid the in-kernel calls to these syscalls: sys_setregid(), sys_setgid(), sys_setreuid(), sys_setuid(), sys_setresuid(), sys_setresgid(), sys_setfsuid(), and sys_setfsgid(). The ksys_ prefix denotes that these function are meant as a drop-in replacement for the syscall. In particular, they use the same calling convention. This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> | 8 年前 | |
usermodehelper: reset umask to default before executing user process Kernel threads intentionally do CLONE_FS in order to follow any changes that 'init' does to set up the root directory (or cwd). It is admittedly a bit odd, but it avoids the situation where 'init' does some extensive setup to initialize the system environment, and then we execute a usermode helper program, and it uses the original FS setup from boot time that may be very limited and incomplete. [ Both Al Viro and Eric Biederman point out that 'pivot_root()' will follow the root regardless, since it fixes up other users of root (see chroot_fs_refs() for details), but overmounting root and doing a chroot() would not. ] However, Vegard Nossum noticed that the CLONE_FS not only means that we follow the root and current working directories, it also means we share umask with whatever init changed it to. That wasn't intentional. Just reset umask to the original default (0022) before actually starting the usermode helper program. Reported-by: Vegard Nossum <vegard.nossum@oracle.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> | 5 年前 | |
smp: Inline on_each_cpu_cond() and on_each_cpu() mainline inclusion from mainline-v5.13-rc1 commit a5aa5ce300597224ec76dacc8e63ba3ad7a18bbd category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I3ZV2C CVE: NA ------------------------------------------------- Simplify the code and avoid having an additional function on the stack by inlining on_each_cpu_cond() and on_each_cpu(). Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Nadav Amit <namit@vmware.com> [ Minor edits. ] Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210220231712.2475218-10-namit@vmware.com conflict: include/linux/smp.h Signed-off-by: Tong Tiangen <tongtiangen@huawei.com> Reviewed-by: Chen Wandun <chenwandun@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com> | 4 年前 | |
treewide: Add SPDX license identifier for missed files Add SPDX license identifiers to all files which: - Have no license information of any form - Have EXPORT_.*_SYMBOL_GPL inside which was used in the initial scan/conversion to ignore the file These files fall under the project license, GPL v2 only. The resulting SPDX license identifier is: GPL-2.0-only Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> | 6 年前 | |
user.c: make uidhash_table static Fix the following sparse warning: kernel/user.c:85:19: warning: symbol 'uidhash_table' was not declared. Should it be static? Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Jason Yan <yanaijie@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: David Howells <dhowells@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20200413082146.22737-1-yanaijie@huawei.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> | 5 年前 | |
Revert "user namespace: Add function that checks if the UID map is defined" hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I4O25G CVE: NA -------------------------------- This reverts commit 44a41d572da65f3975c5e36fe3270a7fdaabe821. Signed-off-by: Zhang Tianxing <zhangtianxing3@huawei.com> Acked-by: Xie XiuQi <xiexiuqi@huawei.com> Acked-by: Xiu Jianfeng<xiujianfeng@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com> | 4 年前 | |
switch file_open_root() to struct path mainline inclusion from mainline-5.14-rc1 commit ffb37ca3bd16ce6ea2df2f87fde9a31e94ebb54b category: bugfix bugzilla: 181657 https://gitee.com/openeuler/kernel/issues/I4DDEL Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ffb37ca3bd16ce6ea2df2f87fde9a31e94ebb54b --------------------------- ... and provide file_open_root_mnt(), using the root of given mount. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Conflicts: Documentation/filesystems/porting.rst [ Non-bugfix 14e43bf4356126("vfs: don't unnecessarily clone write access for writable fd") is not applied. ] Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Signed-off-by: Chen Jun <chenjun102@huawei.com> [Roberto Sassu: Adjust file_open_root() called by load_digest_list()] Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com> | 4 年前 | |
nsproxy: add struct nsset Add a simple struct nsset. It holds all necessary pieces to switch to a new set of namespaces without leaving a task in a half-switched state which we will make use of in the next patch. This patch switches the existing setns logic over without causing a change in setns() behavior. This brings setns() closer to how unshare() works(). The prepare_ns() function is responsible to prepare all necessary information. This has two reasons. First it minimizes dependencies between individual namespaces, i.e. all install handler can expect that all fields are properly initialized independent in what order they are called in. Second, this makes the code easier to maintain and easier to follow if it needs to be changed. The prepare_ns() helper will only be switched over to use a flags argument in the next patch. Here it will still use nstype as a simple integer argument which was argued would be clearer. I'm not particularly opinionated about this if it really helps or not. The struct nsset itself already contains the flags field since its name already indicates that it can contain information required by different namespaces. None of this should have functional consequences. Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Reviewed-by: Serge Hallyn <serge@hallyn.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Serge Hallyn <serge@hallyn.com> Cc: Jann Horn <jannh@google.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Aleksa Sarai <cyphar@cyphar.com> Link: https://lore.kernel.org/r/20200505140432.181565-2-christian.brauner@ubuntu.com | 6 年前 | |
sysctl: pass kernel pointers to ->proc_handler Instead of having all the sysctl handlers deal with user pointers, which is rather hairy in terms of the BPF interaction, copy the input to and from userspace in common code. This also means that the strings are always NUL-terminated by the common code, making the API a little bit safer. As most handler just pass through the data to one of the common handlers a lot of the changes are mechnical. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> | 6 年前 | |
watch_queue: fix pipe accounting mismatch stable inclusion from stable-v5.10.236 commit 8658c75343ed00e5e154ebbe24335f51ba8db547 category: bugfix bugzilla: https://atomgit.com/src-openeuler/kernel/issues/10766 CVE: CVE-2025-23138 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=8658c75343ed00e5e154ebbe24335f51ba8db547 -------------------------------- [ Upstream commit f13abc1e8e1a3b7455511c4e122750127f6bc9b0 ] Currently, watch_queue_set_size() modifies the pipe buffers charged to user->pipe_bufs without updating the pipe->nr_accounted on the pipe itself, due to the if (!pipe_has_watch_queue()) test in pipe_resize_ring(). This means that when the pipe is ultimately freed, we decrement user->pipe_bufs by something other than what than we had charged to it, potentially leading to an underflow. This in turn can cause subsequent too_many_pipe_buffers_soft() tests to fail with -EPERM. To remedy this, explicitly account for the pipe usage in watch_queue_set_size() to match the number set via account_pipe_buffers() (It's unclear why watch_queue_set_size() does not update nr_accounted; it may be due to intentional overprovisioning in watch_queue_set_size()?) Fixes: e95aada4cb93d ("pipe: wakeup wr_wait after setting max_usage") Signed-off-by: Eric Sandeen <sandeen@redhat.com> Link: https://lore.kernel.org/r/206682a8-0604-49e5-8224-fdbe0c12b460@redhat.com Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Yin Tirui <yintirui@huawei.com> | 4 个月前 | |
watchdog: Fix call trace when failed to initialize sdei kunpeng inclusion category: bugfix bugzilla: https://atomgit.com/openeuler/kernel/issues/1433 CVE: NA ---------------------------------------------------------------------- 1509d06c9c41 ("init: only move down lockup_detector_init() when sdei_watchdog is enabled") In the above commit, sdei_watchdog needs to move down lockup_detector_init (), while nmi_watchdog does not. So when sdei_watchdog fails to be initialized, nmi_watchdog should not be initialized. [ 0.706631][ T1] SDEI NMI watchdog: Disable SDEI NMI Watchdog in VM [ 0.707405][ T1] ------------[ cut here ]------------ [ 0.708020][ T1] WARNING: CPU: 0 PID: 1 at kernel/watchdog_perf.c:117 hardlockup_detector_event_create+0x24/0x108 [ 0.709230][ T1] Modules linked in: [ 0.709665][ T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.6.0 #1 [ 0.710700][ T1] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015 [ 0.711625][ T1] pstate: 00400005 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 0.712547][ T1] pc : hardlockup_detector_event_create+0x24/0x108 [ 0.713316][ T1] lr : watchdog_hardlockup_probe+0x28/0xa8 [ 0.714010][ T1] sp : ffff8000831cbdc0 [ 0.714501][ T1] pmr_save: 000000e0 [ 0.714957][ T1] x29: ffff8000831cbdc0 x28: 0000000000000000 x27: 0000000000000000 [ 0.715899][ T1] x26: 0000000000000000 x25: 0000000000000000 x24: 0000000000000000 [ 0.716839][ T1] x23: 0000000000000000 x22: 0000000000000000 x21: ffff80008218fab0 [ 0.717775][ T1] x20: ffff8000821af000 x19: ffff0000c0261900 x18: 0000000000000020 [ 0.718713][ T1] x17: 00000000cb551c45 x16: ffff800082625e48 x15: ffffffffffffffff [ 0.719663][ T1] x14: 0000000000000000 x13: 205d315420202020 x12: 5b5d313336363037 [ 0.720607][ T1] x11: 00000000ffff7fff x10: 00000000ffff7fff x9 : ffff800081b5f630 [ 0.721590][ T1] x8 : 00000000000bffe8 x7 : c0000000ffff7fff x6 : 000000000005fff4 [ 0.722528][ T1] x5 : 00000000002bffa8 x4 : 0000000000000000 x3 : 0000000000000000 [ 0.723482][ T1] x2 : 0000000000000000 x1 : 0000000000000140 x0 : ffff0000c02c0000 [ 0.724426][ T1] Call trace: [ 0.724808][ T1] hardlockup_detector_event_create+0x24/0x108 [ 0.725535][ T1] watchdog_hardlockup_probe+0x28/0xa8 [ 0.726174][ T1] lockup_detector_init+0x110/0x158 [ 0.726776][ T1] kernel_init_freeable+0x208/0x288 [ 0.727387][ T1] kernel_init+0x2c/0x200 [ 0.727902][ T1] ret_from_fork+0x10/0x20 [ 0.728420][ T1] ---[ end trace 0000000000000000 ]--- Fixes: f61b11535a0b ("watchdog: Support watchdog_sdei coexist with existing watchdogs") Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Signed-off-by: Jie Liu <liujie375@h-partners.com> Signed-off-by: Shi Yang <shiyang50@huawei.com> | 4 个月前 | |
watchdog/perf: Provide function for adjusting the event period driver inclusion category: bugfix bugzilla: https://atomgit.com/openeuler/kernel/issues/6755 ---------------------------------------------------------------------- Architecture's using perf events for hard lockup detection needs to convert the watchdog_thresh to the event's period, some architecture for example arm64 perform this conversion using the CPU's maximum frequency which will be acquired by cpufreq. However by the time the lockup detector's initialized the cpufreq driver may not be initialized, thus launch a watchdog with inaccurate period. Provide a function hardlockup_detector_perf_adjust_period() to allowing adjust the event period. Then architecture can update with more accurate period if cpufreq is initialized. Fixes: 94946f9eaac1 ("arm64: add hw_nmi_get_sample_period for preparation of lockup detector") Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Signed-off-by: Hongye Lin <linhongye@h-partners.com> Signed-off-by: Shi Yang <shiyang50@huawei.com> | 4 个月前 | |
Revert "workqueue: remove unused cancel_work()" stable inclusion from stable-v5.10.203 commit 8a909c119827b452fc62d366d3ffd19d5d0acb71 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I9GXII Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=8a909c119827b452fc62d366d3ffd19d5d0acb71 -------------------------------- [ Upstream commit 73b4b53276a1d6290cd4f47dbbc885b6e6e59ac6 ] This reverts commit 6417250d3f894e66a68ba1cd93676143f2376a6f. amdpgu need this function in order to prematurly stop pending reset works when another reset work already in progress. Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Lai Jiangshan<jiangshanlai@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Stable-dep-of: 91d3d149978b ("r8169: prevent potential deadlock in rtl8169_close") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: sanglipeng <sanglipeng1@jd.com> | 2 年前 | |
workqueue: Assign a color to barrier work items mainline inclusion from mainline-v5.15-rc1 commit d812796eb3906424cd3c0eea530983961e2f88bd category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I7LRJF Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d812796eb3906424cd3c0eea530983961e2f88bd --------------------------- There was no strong reason to or not to flush barrier work items in flush_workqueue(). And we have to make barrier work items not participate in nr_active so we had been using WORK_NO_COLOR for them which also makes them can't be flushed by flush_workqueue(). And the users of flush_workqueue() often do not intend to wait barrier work items issued by flush_work(). That made the choice sound perfect. But barrier work items have reference to internal structure (pool_workqueue) and the worker thread[s] is/are still busy for the workqueue user when the barrrier work items are not done. So it is reasonable to make flush_workqueue() also watch for flush_work() to make it more robust. And a problem[1] reported by Li Zhe shows that we need such robustness. The warning logs are listed below: WARNING: CPU: 0 PID: 19336 at kernel/workqueue.c:4430 destroy_workqueue+0x11a/0x2f0 ***** destroy_workqueue: test_workqueue9 has the following busy pwq pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=0/1 refcnt=2 in-flight: 5658:wq_barrier_func Showing busy workqueues and worker pools: ***** It shows that even after drain_workqueue() returns, the barrier work item is still in flight and the pwq (and a worker) is still busy on it. The problem is caused by flush_workqueue() not watching flush_work(): Thread A Worker /* normal work item with linked */ process_scheduled_works() destroy_workqueue() process_one_work() drain_workqueue() /* run normal work item */ /-- pwq_dec_nr_in_flight() flush_workqueue() <---/ /* the last normal work item is done */ sanity_check process_one_work() /-- raw_spin_unlock_irq(&pool->lock) raw_spin_lock_irq(&pool->lock) <-/ /* maybe preempt */ *WARNING* wq_barrier_func() /* maybe preempt by cond_resched() */ Thread A can get the pool lock after the Worker unlocks the pool lock before running wq_barrier_func(). And if there is any preemption happen around wq_barrier_func(), destroy_workqueue()'s sanity check is more likely to get the lock and catch it. (Note: preemption is not necessary to cause the bug, the unlocking is enough to possibly trigger the WARNING.) A simple solution might be just executing all linked barrier work items once without releasing pool lock after the head work item's pwq_dec_nr_in_flight(). But this solution has two problems: 1) the head work item might also be barrier work item when the user-queued work item is cancelled. For example: thread 1: thread 2: queue_work(wq, &my_work) flush_work(&my_work) cancel_work_sync(&my_work); /* Neiter my_work nor the barrier work is scheduled. */ destroy_workqueue(wq); /* This is an easier way to catch the WARNING. */ 2) there might be too much linked barrier work items and running them all once without releasing pool lock just causes trouble. The only solution is to make flush_workqueue() aslo watch barrier work items. So we have to assign a color to these barrier work items which is the color of the head (user-queued) work item. Assigning a color doesn't cause any problem in ative management, because the prvious patch made barrier work items not participate in nr_active via WORK_STRUCT_INACTIVE rather than reliance on the (old) WORK_NO_COLOR. [1]: https://lore.kernel.org/lkml/20210812083814.32453-1-lizhe.67@bytedance.com/ Reported-by: Li Zhe <lizhe.67@bytedance.com> Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Zeng Heng <zengheng4@huawei.com> | 2 年前 |
| 文件 | 最后提交记录 | 最后更新时间 |
|---|---|---|
| 1 个月前 | ||
| 27 天前 | ||
| 4 年前 | ||
| 1 年前 | ||
| 5 个月前 | ||
| 1 年前 | ||
| 2 个月前 | ||
| 1 个月前 | ||
| 2 年前 | ||
| 10 个月前 | ||
| 1 年前 | ||
| 2 年前 | ||
| 29 天前 | ||
| 4 个月前 | ||
| 1 年前 | ||
| 4 个月前 | ||
| 24 天前 | ||
| 2 个月前 | ||
| 3 天前 | ||
| 4 年前 | ||
| 6 年前 | ||
| 6 年前 | ||
| 6 年前 | ||
| 3 年前 | ||
| 2 年前 | ||
| 2 年前 | ||
| 2 年前 | ||
| 1 年前 | ||
| 1 年前 | ||
| 3 年前 | ||
| 4 年前 | ||
| 2 年前 | ||
| 1 年前 | ||
| 1 年前 | ||
| 5 年前 | ||
| 7 年前 | ||
| 3 年前 | ||
| 2 年前 | ||
| 6 年前 | ||
| 5 年前 | ||
| 5 个月前 | ||
| 4 年前 | ||
| 5 个月前 | ||
| 6 年前 | ||
| 4 年前 | ||
| 6 年前 | ||
| 7 年前 | ||
| 7 年前 | ||
| 1 个月前 | ||
| 3 年前 | ||
| 2 年前 | ||
| 1 年前 | ||
| 5 年前 | ||
| 5 年前 | ||
| 5 年前 | ||
| 5 年前 | ||
| 6 年前 | ||
| 4 年前 | ||
| 3 年前 | ||
| 3 年前 | ||
| 5 年前 | ||
| 5 年前 | ||
| 2 年前 | ||
| 2 年前 | ||
| 6 年前 | ||
| 3 个月前 | ||
| 2 年前 | ||
| 2 年前 | ||
| 5 年前 | ||
| 1 年前 | ||
| 2 年前 | ||
| 4 年前 | ||
| 6 年前 | ||
| 6 年前 | ||
| 1 个月前 | ||
| 5 年前 | ||
| 5 年前 | ||
| 5 年前 | ||
| 4 年前 | ||
| 6 个月前 | ||
| 1 年前 | ||
| 1 年前 | ||
| 6 个月前 | ||
| 1 年前 | ||
| 2 年前 | ||
| 1 个月前 | ||
| 5 年前 | ||
| 2 年前 | ||
| 5 年前 | ||
| 2 年前 | ||
| 4 个月前 | ||
| 3 年前 | ||
| 2 年前 | ||
| 5 年前 | ||
| 1 年前 | ||
| 2 年前 | ||
| 2 年前 | ||
| 4 年前 | ||
| 8 年前 | ||
| 1 年前 | ||
| 4 年前 | ||
| 4 年前 | ||
| 2 年前 | ||
| 1 年前 | ||
| 3 年前 | ||
| 1 年前 | ||
| 3 年前 | ||
| 6 年前 | ||
| 3 个月前 | ||
| 2 年前 | ||
| 5 年前 | ||
| 6 年前 | ||
| 2 年前 | ||
| 4 年前 | ||
| 4 年前 | ||
| 1 年前 | ||
| 8 年前 | ||
| 8 年前 | ||
| 5 年前 | ||
| 4 年前 | ||
| 6 年前 | ||
| 5 年前 | ||
| 4 年前 | ||
| 4 年前 | ||
| 6 年前 | ||
| 6 年前 | ||
| 4 个月前 | ||
| 4 个月前 | ||
| 4 个月前 | ||
| 2 年前 | ||
| 2 年前 |