| File | Last commit | Last updated |
|---|---|---|
| | sched: smart_grid: Prevent double-free in sched_grid_qos_free hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IB3N2A ---------------------------------------- KASAN detected a double-free bug in the smart grid. This issue arises from the uninitialized use of p->grid_qos within the task {fork,free} processes. The sequence of events leading to the double-free is as follows: CPU0 CPU1 fork (in some error process) goto bad_fork_free call_rcu(__delayed_free_task) __delayed_free_task sched_grid_qos_free release_task delayed_put_task_struct __put_task_struct sched_grid_qos_free(double free) When copy_process returns with an error, grid_qos is double-freed. To address this, grid_qos is initialized to NULL in dup_task_struct, and a NULL check is added for p->grid_qos before freeing. Bug report details: ================================================================== BUG: KASAN: double-free or invalid-free in sched_grid_qos_free+0x3c/0x90 CPU: 343 PID: 0 Comm: swapper/343 Kdump: loaded Tainted: G B E Call trace: dump_backtrace+0x0/0x3e0 show_stack+0x1c/0x28 dump_stack+0x13c/0x190 print_address_description.constprop.0+0x28/0x1f0 kasan_report_invalid_free+0x44/0x6c __kasan_slab_free+0x158/0x180 kasan_slab_free+0x10/0x20 kfree+0xe0/0x6e0 sched_grid_qos_free+0x3c/0x90 free_task+0xc4/0x164 __put_task_struct+0x264/0x31c delayed_put_task_struct+0x94/0x180 rcu_do_batch+0x2ec/0x9f0 rcu_core+0x34c/0x530 rcu_core_si+0x14/0x30 __do_softirq+0x284/0x900 irq_exit+0x2d4/0x35c __handle_domain_irq+0x108/0x1f0 gic_handle_irq+0x74/0x620 el1_irq+0xbc/0x140 arch_cpu_idle+0x14/0x3c default_idle_call+0x80/0x320 cpuidle_idle_call+0x244/0x2b0 do_idle+0x138/0x260 cpu_startup_entry+0x2c/0x70 secondary_start_kernel+0x35c/0x4e4 Allocated by task 44027: kasan_save_stack+0x24/0x50 __kasan_kmalloc.constprop.0+0xa0/0xcc kasan_kmalloc+0xc/0x14 kmem_cache_alloc_trace+0xdc/0x5d0 sched_grid_qos_fork+0x50/0x20c copy_process+0x8fc/0x3f60 kernel_clone+0x12c/0x660 __se_sys_clone+0xc0/0x110 __arm64_sys_clone+0xa8/0x110 invoke_syscall+0x70/0x274 el0_svc_common.constprop.0+0x1fc/0x2dc do_el0_svc+0xe8/0x140 el0_svc+0x1c/0x2c el0_sync_handler+0xb0/0xb4 el0_sync+0x168/0x180 Freed by task 1748: kasan_save_stack+0x24/0x50 kasan_set_track+0x24/0x34 kasan_set_free_info+0x24/0x4c __kasan_slab_free+0xf8/0x180 kasan_slab_free+0x10/0x20 kfree+0xe0/0x6e0 sched_grid_qos_free+0x3c/0x90 free_task+0xc4/0x164 __delayed_free_task+0x18/0x3c rcu_do_batch+0x2ec/0x9f0 rcu_core+0x34c/0x530 rcu_core_si+0x14/0x30 __do_softirq+0x284/0x900 Fixes: 700bfc4068cf ("sched: smart grid: init sched_grid_qos structure on QOS purpose") Signed-off-by: Yipeng Zou <zouyipeng@huawei.com> | 1 year ago |
| | sched: topology: Build soft domain for LLC hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/release-management/issues/ICB7K1 -------------------------------- On Kunpeng servers, each LLC domain contains multiple clusters. When multiple services are deployed within the same LLC domain, their tasks become distributed across all clusters. This results in: 1. High cache synchronization overhead between different tasks of the same service. 2. Severe cache contention among tasks from different services. The Soft Domain architecture partitions resources by clusters. Under low-load conditions, each service operates exclusively within its dedicated domain to prevent cross-service interference, thereby both enhancing CPU isolation and improving cache locality. Signed-off-by: Zhang Qiao <zhangqiao22@huawei.com> | 11 months ago |
| | sched: disable sched_autogroup by default hulk inclusion category: performance bugzilla: 32059, https://gitee.com/openeuler/kernel/issues/I65DOZ CVE: NA -------------------------------- This option optimizes the scheduler for common desktop workloads by automatically creating and populating task groups. This separation of workloads isolates aggressive CPU burners (like build jobs) from desktop applications. Task group autogeneration is currently based upon task session. We do not need this for mostly-server workloads, so disable it by default. If you really need this feature, enable it via sysctl: sysctl -w kernel.sched_autogroup_enabled=1 Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com> Reviewed-by: Xie XiuQi <xiexiuqi@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com> | 3 years ago |
| | sched/headers: Simplify and clean up header usage in the scheduler Do the following cleanups and simplifications: - sched/sched.h already includes <asm/paravirt.h>, so no need to include it in sched/core.c again. - order the <linux/sched/\*.h> headers alphabetically - add all <linux/sched/\*.h> headers to kernel/sched/sched.h - remove all unnecessary includes from the .c files that are already included in kernel/sched/sched.h. Finally, make all scheduler .c files use a single common header: #include "sched.h" ... which now contains a union of the relied upon headers. This makes the various .c files easier to read and easier to handle. Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> | 8 years ago |
| | sched/ebpf: Add helper to set prefer cpumask for the task hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/ICFKYV -------------------------------- Add the libbpf_sched_set_task_prefer_cpumask helper, which allows setting a preferred cpumask for the task. Signed-off-by: Cheng Yu <serein.chengyu@huawei.com> | 1 year ago |
| | sched: programmable: Fix build error for nr_cpus_ids hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I645C7 CVE: NA -------------------------------- When !CONFIG_SMP && CONFIG_BPF_SCHED, the build fails as follows: ./include/linux/cpumask.h:37:33: error: expected identifier or ‘(’ before numeric constant 37 \| #define nr_cpu_ids 1U \| ^~ ./include/linux/bpf_topology.h:39:22: note: in expansion of macro ‘nr_cpu_ids’ 39 \| unsigned int nr_cpu_ids; \| ^~~~~~~~~~ kernel/sched/bpf_topology.c: In function ‘____bpf_get_cpumask_info’: ./include/linux/cpumask.h:37:33: error: expected identifier before numeric constant 37 \| #define nr_cpu_ids 1U \| ^~ kernel/sched/bpf_topology.c:75:15: note: in expansion of macro ‘nr_cpu_ids’ 75 \| cpus->nr_cpu_ids = nr_cpu_ids; Fixes: f333bd6882e7 ("sched: programmable: Add helper function for...") Signed-off-by: Hui Tang <tanghui20@huawei.com> | 3 years ago |
| | sched/clock: Use static_branch_likely() with sched_clock_running sched_clock_running is enabled early at bootup stage and never disabled. So hint that to the compiler by using static_branch_likely() rather than static_branch_unlikely(). The branch probability mis-annotation was introduced in the original commit that converted the plain sched_clock_running flag to a static key: 46457ea464f5 ("sched/clock: Use static key for sched_clock_running") Steve further notes: \| Looks like the confusion was the moving of the "!": \| \| - if (unlikely(!sched_clock_running)) \| + if (!static_branch_unlikely(&sched_clock_running)) \| \| Where, it was unlikely that !sched_clock_running would be true, but \| because the "!" was moved outside the "unlikely()" it makes the test \| "likely()". That is, if we added an intermediate step, it would have \| been: \| \| if (!likely(sched_clock_running)) \| \| which would have prevented the mistake that this patch fixes. [ mingo: Edited the changelog. ] Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: bsegall@google.com Cc: dietmar.eggemann@arm.com Cc: juri.lelli@redhat.com Cc: mgorman@suse.de Cc: vincent.guittot@linaro.org Link: https://lkml.kernel.org/r/1574843848-26825-1-git-send-email-zhenzhong.duan@oracle.com Signed-off-by: Ingo Molnar <mingo@kernel.org> | 6 years ago |
| | completion: Use lockdep_assert_RT_in_threaded_ctx() in complete_all() The warning was intended to spot complete_all() users from hardirq context on PREEMPT_RT. The warning as-is will also trigger in interrupt handlers, which are threaded on PREEMPT_RT, which was not intended. Use lockdep_assert_RT_in_threaded_ctx() which triggers in non-preemptive context on PREEMPT_RT. Fixes: a5c6234e1028 ("completion: Use simple wait queues") Reported-by: kernel test robot <rong.a.chen@intel.com> Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200323152019.4qjwluldohuh3by5@linutronix.de | 6 years ago |
| | cpufreq: CPPC: Add support for frequency invariance mainline inclusion from mainline-v5.14-rc1 commit 1eb5dde674f57b1a1918dab33f09e35cdd64eb07 category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8319 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1eb5dde674f57b1a1918dab33f09e35cdd64eb07 ---------------------------------------------------------------------- The Frequency Invariance Engine (FIE) is providing a frequency scaling correction factor that helps achieve more accurate load-tracking. Normally, this scaling factor can be obtained directly with the help of the cpufreq drivers as they know the exact frequency the hardware is running at. But that isn't the case for CPPC cpufreq driver. Another way of obtaining that is using the arch specific counter support, which is already present in kernel, but that hardware is optional for platforms. This patch updates the CPPC driver to register itself with the topology core to provide its own implementation (cppc_scale_freq_tick()) of topology_scale_freq_tick() which gets called by the scheduler on every tick. Note that the arch specific counters have higher priority than CPPC counters, if available, though the CPPC driver doesn't need to have any special handling for that. On an invocation of cppc_scale_freq_tick(), we schedule an irq work (since we reach here from hard-irq context), which then schedules a normal work item and cppc_scale_freq_workfn() updates the per_cpu arch_freq_scale variable based on the counter updates since the last tick. To allow platforms to disable this CPPC counter-based frequency invariance support, this is all done under CONFIG_ACPI_CPPC_CPUFREQ_FIE, which is enabled by default. This also exports sched_setattr_nocheck() as the CPPC driver can be built as a module. Cc: linux-acpi@vger.kernel.org Tested-by: Vincent Guittot <vincent.guittot@linaro.org> Reviewed-by: Ionela Voinescu <ionela.voinescu@arm.com> Tested-by: Qian Cai <quic_qiancai@quicinc.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: huwentao <huwentao19@h-partners.com> | 4 months ago |
| | sched/core: Fix the bug that task won't enqueue into core tree when update cookie mainline inclusion from mainline-v6.0-rc1 commit 91caa5ae242465c3ab9fd473e50170faa7e944f4 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5OOWG CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=91caa5ae242465c3ab9fd473e50170faa7e944f4 -------------------------------------------------------------------------- In function sched_core_update_cookie(), a task will enqueue into the core tree only when it enqueued before, that is, if an uncookied task is cookied, it will not enqueue into the core tree until it is enqueued again, which will result in unnecessary force idle. Here follows the scenario: CPU x and CPU y are a pair of SMT siblings. 1. Start task a running on CPU x without sleeping, and task b and task c running on CPU y without sleeping. 2. We create a cookie and share it with task a and task b, and then we create another cookie and share it with task c. 3. Sample core_forceidle_sum of task a and b from /proc/PID/sched And we will find out that core_forceidle_sum of task a takes 30% time of the sampling period, which shouldn't happen as task a and b have the same cookie. Then we migrate task a to CPU x', migrate task b and c to CPU y', where CPU x' and CPU y' are a pair of SMT siblings, and sample again, we will find out that core_forceidle_sum of task a and b are almost zero. To solve this problem, we enqueue the task into the core tree if it's on rq. Fixes: 6e33cad0af49("sched: Trivial core scheduling cookie management") Signed-off-by: Cruz Zhao <CruzZhao@linux.alibaba.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/1656403045-100840-2-git-send-email-CruzZhao@linux.alibaba.com Conflicts: kernel/sched/core_sched.c [Feature 4feee7d12603d("sched/core: Forced idle accounting") is not applied.] Signed-off-by: Lin Shengwang <linshengwang1@huawei.com> Reviewed-by: lihua <hucool.lihua@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com> | 3 years ago |
| | mm: add config isolation for psi under cgroup v1 hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I8BCV4 ------------------------------- Add CONFIG_PSI_CGROUP_V1 to separate the psi-under-cgroup-v1 feature from the baseline. Signed-off-by: Chen Wandun <chenwandun@huawei.com> Signed-off-by: Lu Jialin <lujialin4@huawei.com> | 2 years ago |
| | sched/deadline: Implement fallback mechanism for !fit case When a task has a runtime that cannot be served within the scheduling deadline by any of the idle CPU (later_mask) the task is doomed to miss its deadline. This can happen since the SCHED_DEADLINE admission control guarantees only bounded tardiness and not the hard respect of all deadlines. In this case try to select the idle CPU with the largest CPU capacity to minimize tardiness. Favor task_cpu(p) if it has max capacity of !fitting CPUs so that find_later_rq() can potentially still return it (most likely cache-hot) early. Signed-off-by: Luca Abeni <luca.abeni@santannapisa.it> Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Juri Lelli <juri.lelli@redhat.com> Link: https://lkml.kernel.org/r/20200520134243.19352-6-dietmar.eggemann@arm.com | 5 years ago |
| | sched/headers: Simplify and clean up header usage in the scheduler Do the following cleanups and simplifications: - sched/sched.h already includes <asm/paravirt.h>, so no need to include it in sched/core.c again. - order the <linux/sched/\*.h> headers alphabetically - add all <linux/sched/\*.h> headers to kernel/sched/sched.h - remove all unnecessary includes from the .c files that are already included in kernel/sched/sched.h. Finally, make all scheduler .c files use a single common header: #include "sched.h" ... which now contains a union of the relied upon headers. This makes the various .c files easier to read and easier to handle. Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> | 8 years ago |
| | cpufreq: Avoid leaving stale IRQ work items during CPU offline The scheduler code calling cpufreq_update_util() may run during CPU offline on the target CPU after the IRQ work lists have been flushed for it, so the target CPU should be prevented from running code that may queue up an IRQ work item on it at that point. Unfortunately, that may not be the case if dvfs_possible_from_any_cpu is set for at least one cpufreq policy in the system, because that allows the CPU going offline to run the utilization update callback of the cpufreq governor on behalf of another (online) CPU in some cases. If that happens, the cpufreq governor callback may queue up an IRQ work on the CPU running it, which is going offline, and the IRQ work may not be flushed after that point. Moreover, that IRQ work cannot be flushed until the "offlining" CPU goes back online, so if any other CPU calls irq_work_sync() to wait for the completion of that IRQ work, it will have to wait until the "offlining" CPU is back online and that may never happen. In particular, a system-wide deadlock may occur during CPU online as a result of that. The failing scenario is as follows. CPU0 is the boot CPU, so it creates a cpufreq policy and becomes the "leader" of it (policy->cpu). It cannot go offline, because it is the boot CPU. Next, other CPUs join the cpufreq policy as they go online and they leave it when they go offline. The last CPU to go offline, say CPU3, may queue up an IRQ work while running the governor callback on behalf of CPU0 after leaving the cpufreq policy because of the dvfs_possible_from_any_cpu effect described above. Then, CPU0 is the only online CPU in the system and the stale IRQ work is still queued on CPU3. When, say, CPU1 goes back online, it will run irq_work_sync() to wait for that IRQ work to complete and so it will wait for CPU3 to go back online (which may never happen even in principle), but (worse yet) CPU0 is waiting for CPU1 at that point too and a system-wide deadlock occurs. To address this problem notice that CPUs which cannot run cpufreq utilization update code for themselves (for example, because they have left the cpufreq policies that they belonged to), should also be prevented from running that code on behalf of the other CPUs that belong to a cpufreq policy with dvfs_possible_from_any_cpu set and so in that case the cpufreq_update_util_data pointer of the CPU running the code must not be NULL as well as for the CPU which is the target of the cpufreq utilization update in progress. Accordingly, change cpufreq_this_cpu_can_update() into a regular function in kernel/sched/cpufreq.c (instead of a static inline in a header file) and make it check the cpufreq_update_util_data pointer of the local CPU if dvfs_possible_from_any_cpu is set for the target cpufreq policy. Also update the schedutil governor to do the cpufreq_this_cpu_can_update() check in the non-fast-switch case too to avoid the stale IRQ work issues. Fixes: 99d14d0e16fa ("cpufreq: Process remote callbacks from any CPU if the platform permits") Link: https://lore.kernel.org/linux-pm/20191121093557.bycvdo4xyinbc5cb@vireshk-i7/ Reported-by: Anson Huang <anson.huang@nxp.com> Tested-by: Anson Huang <anson.huang@nxp.com> Cc: 4.14+ <stable@vger.kernel.org> # 4.14+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Tested-by: Peng Fan <peng.fan@nxp.com> (i.MX8QXP-MEK) Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> | 6 years ago |
| | cpufreq/schedutil: Use a fixed reference frequency mainline inclusion from mainline-v6.8-rc6 commit b3edde44e5d4504c23a176819865cd603fd16d6c category: bugfix bugzilla: https://atomgit.com/openeuler/kernel/issues/8319 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b3edde44e5d4504c23a176819865cd603fd16d6c ---------------------------------------------------------------------- cpuinfo.max_freq can change at runtime because of boost as an example. This implies that the value could be different than the one that has been used when computing the capacity of a CPU. The new arch_scale_freq_ref() returns a fixed and coherent reference frequency that can be used when computing a frequency based on utilization. Use this arch_scale_freq_ref() when available and fallback to policy otherwise. Fixes: 1eb5dde674f5 ("cpufreq: CPPC: Add support for frequency invariance") Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Lukasz Luba <lukasz.luba@arm.com> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Acked-by: Rafael J. Wysocki <rafael@kernel.org> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Link: https://lore.kernel.org/r/20231211104855.558096-4-vincent.guittot@linaro.org Signed-off-by: Lifeng Zheng <zhenglifeng1@huawei.com> Signed-off-by: Hongye Lin <linhongye@h-partners.com> Signed-off-by: huwentao <huwentao19@h-partners.com> | 4 months ago |
| | sched/rt: cpupri_find: Trigger a full search as fallback If we failed to find a fitting CPU, in cpupri_find(), we only fall back to the level we found a hit at. But Steve suggested to fall back to a second full scan instead as this could be a better effort. https://lore.kernel.org/lkml/20200304135404.146c56eb@gandalf.local.home/ We trigger the 2nd search unconditionally since the argument about triggering a full search is that the recorded fall back level might have become empty by then. Which means storing any data about what happened would be meaningless and stale. I had a humble try at timing it and it seemed okay for the small 6 CPUs system I was running on https://lore.kernel.org/lkml/20200305124324.42x6ehjxbnjkklnh@e107158-lin.cambridge.arm.com/ On large system this second full scan could be expensive. But there are no users outside capacity awareness for this fitness function at the moment. Heterogeneous systems tend to be small with 8 cores in total. Suggested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Qais Yousef <qais.yousef@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Link: https://lkml.kernel.org/r/20200310142219.syxzn5ljpdxqtbgx@e107158-lin.cambridge.arm.com | 6 years ago |
| | sched/rt: Optimize cpupri_find() on non-heterogenous systems By introducing a new cpupri_find_fitness() function that takes the fitness_fn as an argument and only called when asym_system static key is enabled. cpupri_find() is now a wrapper function that calls cpupri_find_fitness() passing NULL as a fitness_fn, hence disabling the logic that handles fitness by default. LINK: https://lore.kernel.org/lkml/c0772fca-0a4b-c88d-fdf2-5715fcf8447b@arm.com/ Reported-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Qais Yousef <qais.yousef@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Fixes: 804d402fb6f6 ("sched/rt: Make RT capacity-aware") Link: https://lkml.kernel.org/r/20200302132721.8353-4-qais.yousef@arm.com | 6 years ago |
| | sched/cputime: Fix time overflow in cputime_adjust() hulk inclusion category: bugfix bugzilla: https://atomgit.com/openeuler/kernel/issues/8331 ------------------ When a process has hundreds of threads and runs for an extended period, the aggregated stime or utime of its thread group can become extremely large. Accessing /proc/xx/stat then triggers a divide error like the following: divide error: 0000 [#1] SMP NOPTI CPU: 273 PID: 4619 Comm: ... Tainted: 5.10.0 x86_64 RIP: 0010:cputime_adjust+0x55/0xb0 RSP: 0018:ffffae408e07bbc8 EFLAGS: 00010807 ... Call Trace: thread_group_cputime_adjusted+0x4b/0x70 do_task_stat+0x2d8/0xdc0 This is due to overflow in stime + utime, which causes mul_u64_u64_div_u64() to trigger a divide error. Add overflow detection logic and, upon overflow, right-shift the values before computation. This fixes the issue while preserving the original calculation semantics. Fixes: 3dc167ba5729 ("sched/cputime: Improve cputime_adjust()") Signed-off-by: Xia Fukun <xiafukun@huawei.com> | 26 days ago |
| | sched/deadline: Fix warning in migrate_enable for boosted tasks stable inclusion from stable-v6.6.66 commit b600d30402854415aa57548a6b53dc6478f65517 category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/IBEAOS CVE: CVE-2024-56583 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=b600d30402854415aa57548a6b53dc6478f65517 -------------------------------- [ Upstream commit 0664e2c311b9fa43b33e3e81429cd0c2d7f9c638 ] When running the following command: while true; do stress-ng --cyclic 30 --timeout 30s --minimize --quiet done a warning is eventually triggered: WARNING: CPU: 43 PID: 2848 at kernel/sched/deadline.c:794 setup_new_dl_entity+0x13e/0x180 ... Call Trace: `<TASK>` ? show_trace_log_lvl+0x1c4/0x2df ? enqueue_dl_entity+0x631/0x6e0 ? setup_new_dl_entity+0x13e/0x180 ? __warn+0x7e/0xd0 ? report_bug+0x11a/0x1a0 ? handle_bug+0x3c/0x70 ? exc_invalid_op+0x14/0x70 ? asm_exc_invalid_op+0x16/0x20 enqueue_dl_entity+0x631/0x6e0 enqueue_task_dl+0x7d/0x120 __do_set_cpus_allowed+0xe3/0x280 __set_cpus_allowed_ptr_locked+0x140/0x1d0 __set_cpus_allowed_ptr+0x54/0xa0 migrate_enable+0x7e/0x150 rt_spin_unlock+0x1c/0x90 group_send_sig_info+0xf7/0x1a0 ? kill_pid_info+0x1f/0x1d0 kill_pid_info+0x78/0x1d0 kill_proc_info+0x5b/0x110 __x64_sys_kill+0x93/0xc0 do_syscall_64+0x5c/0xf0 entry_SYSCALL_64_after_hwframe+0x6e/0x76 RIP: 0033:0x7f0dab31f92b This warning occurs because set_cpus_allowed dequeues and enqueues tasks with the ENQUEUE_RESTORE flag set. If the task is boosted, the warning is triggered. A boosted task already had its parameters set by rt_mutex_setprio, and a new call to setup_new_dl_entity is unnecessary, hence the WARN_ON call. Check if we are requeueing a boosted task and avoid calling setup_new_dl_entity if that's the case. Fixes: 295d6d5e3736 ("sched/deadline: Fix switching to -deadline") Signed-off-by: Wander Lairson Costa <wander@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Juri Lelli <juri.lelli@redhat.com> Link: https://lore.kernel.org/r/20240724142253.27145-2-wander@redhat.com Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Zicheng Qu <quzicheng@huawei.com> | 1 year ago |
| | sched: Support NUMA parallel scheduling for multiple processes hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/ICBBNL -------------------------------- For architectures with multiple NUMA node levels and large distances between nodes, a better approach is to support processes running in parallel on each NUMA node. The usage is restricted to the following scenarios: 1. No CPU binding for user-space processes; 2. It is applicable to distributed applications, such as business architectures with one master and multiple slaves running in parallel; 3. The existing "qos dynamic affinity" and "qos smart grid" features must not be used simultaneously. Signed-off-by: Cheng Yu <serein.chengyu@huawei.com> | 1 year ago |
| | sched/fair: remove qos_reweight logic under non-SMP scenario kylin inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/ID21SZ -------------------------------- In multi-level QoS scheduling, qos_reweight() was introduced to map tasks of different qos levels to weights. However, in update_cfs_group(), qos_reweight() is also called in non-SMP scenarios. Since CONFIG_QOS_SCHED_MULTILEVEL depends on CONFIG_QOS_SCHED, and CONFIG_QOS_SCHED itself depends on CONFIG_SMP, this logic is unnecessary in non-SMP scenario. Remove it. Fixes: c51ad9198a2f ("sched/fair: Introduce multiple qos level") Signed-off-by: Zhao Mengmeng <zhaomengmeng@kylinos.cn> | 8 months ago |
| | sched: fair: Select idle cpu in soft domain hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/release-management/issues/ICB7K1 -------------------------------- Signed-off-by: Zhang Qiao <zhangqiao22@huawei.com> | 11 months ago |
| | sched: idle: Make skipping governor callbacks more consistent stable inclusion from stable-v5.10.253 commit 3e84116d45a2f35c6de5aef09e8f305d258fef08 category: bugfix bugzilla: https://atomgit.com/src-openeuler/kernel/issues/15208 CVE: CVE-2026-45968 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=3e84116d45a2f35c6de5aef09e8f305d258fef08 -------------------------------- [ Upstream commit d557640e4ce589a24dca5ca7ce3b9680f471325f ] If the cpuidle governor .select() callback is skipped because there is only one idle state in the cpuidle driver, the .reflect() callback should be skipped as well, at least for consistency (if not for correctness), so do it. Fixes: e5c9ffc6ae1b ("cpuidle: Skip governor when only one idle state is available") Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Christian Loehle <christian.loehle@arm.com> Reviewed-by: Aboorva Devarajan <aboorvad@linux.ibm.com> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Link: https://patch.msgid.link/12857700.O9o76ZdvQC@rafael.j.wysocki Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> | 30 days ago |
| | isolation: Check whether there exists a housekeeping CPU online hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I9NR7Q CVE: NA -------------------------------- We need to make sure that the housekeeper, one of the housekeeping CPUs, is online, as described in the Link below. We add this check after bringing the secondary CPUs online has finished. Link: https://lore.kernel.org/all/20190504002733.GB19076@lenoir/ Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com> | 2 years ago |
| | sched: nohz: stop passing around unused "ticks" parameter. The "ticks" parameter was added in commit 0f004f5a696a ("sched: Cure more NO_HZ load average woes") since calc_global_nohz() was called and needed the "ticks" argument. But in commit c308b56b5398 ("sched: Fix nohz load accounting -- again!") it became unused as the function calc_global_nohz() dropped using "ticks". Fixes: c308b56b5398 ("sched: Fix nohz load accounting -- again!") Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/1593628458-32290-1-git-send-email-paul.gortmaker@windriver.com | 5 years ago |
| | sched/membarrier: reduce the ability to hammer on sys_membarrier stable inclusion from stable-v5.10.210 commit db896bbe4a9c67cee377e5f6a743350d3ae4acf6 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IAE52H Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=db896bbe4a9c67cee377e5f6a743350d3ae4acf6 -------------------------------- commit 944d5fe50f3f03daacfea16300e656a1691c4a23 upstream. On some systems, sys_membarrier can be very expensive, causing overall slowdowns for everything. So put a lock on the path in order to serialize the accesses to prevent the ability for this to be called at too high of a frequency and saturate the machine. Reviewed-and-tested-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: Borislav Petkov <bp@alien8.de> Fixes: 22e4ebb97582 ("membarrier: Provide expedited private command") Fixes: c5f58bd58f43 ("membarrier: Provide GLOBAL_EXPEDITED command") Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [ converted to explicit mutex_*() calls - cleanup.h is not in this stable branch - gregkh ] Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: sanglipeng1 <sanglipeng1@jd.com> | 1 year ago |
| | sched: Introduce CONFIG_QOS_SCHED_NUMA_ICON hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I9GZAQ CVE: NA -------------------------------- Introduce NUMA isolation and consolidation. If enabled, scheduler will identify relationship between tasks, and track NUMA resource usage. With 'numa_icon=enable/disable' to control the feature. Signed-off-by: Hui Tang <tanghui20@huawei.com> | 2 years ago |
| | sched: Introduce CONFIG_QOS_SCHED_NUMA_ICON hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I9GZAQ CVE: NA -------------------------------- Introduce NUMA isolation and consolidation. If enabled, scheduler will identify relationship between tasks, and track NUMA resource usage. With 'numa_icon=enable/disable' to control the feature. Signed-off-by: Hui Tang <tanghui20@huawei.com> | 2 years ago |
| | sched: Add a tracepoint to track rq->nr_running Add a bare tracepoint trace_sched_update_nr_running_tp which tracks ->nr_running CPU's rq. This is used to accurately trace this data and provide a visualization of scheduler imbalances in, for example, the form of a heat map. The tracepoint is accessed by loading an external kernel module. An example module (forked from Qais' module and including the pelt related tracepoints) can be found at: https://github.com/auldp/tracepoints-helpers.git A script to turn the trace-cmd report output into a heatmap plot can be found at: https://github.com/jirvoz/plot-nr-running The tracepoints are added to add_nr_running() and sub_nr_running() which are in kernel/sched/sched.h. In order to avoid CREATE_TRACE_POINTS in the header a wrapper call is used and the trace/events/sched.h include is moved before sched.h in kernel/sched/core. Signed-off-by: Phil Auld <pauld@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200629192303.GC120228@lorien.usersys.redhat.com | 5 years ago |
| | sched: Wrap rq::lock access mainline inclusion from mainline-v5.14-rc1 commit 5cb9eaa3d274f75539077a28cf01e3563195fa53 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5OOWG CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5cb9eaa3d274f75539077a28cf01e3563195fa53 -------------------------------------------------------------------------- In preparation of playing games with rq->lock, abstract the thing using an accessor. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Don Hiatt <dhiatt@digitalocean.com> Tested-by: Hongyu Ning <hongyu.ning@linux.intel.com> Tested-by: Vincent Guittot <vincent.guittot@linaro.org> Link: https://lkml.kernel.org/r/20210422123308.136465446@infradead.org Conflicts: kernel/sched/core.c [Bugfix a7c81556ec4d3("sched: Fix migrate_disable() vs rt/dl balancing") is not applied. Bugfix 565790d28b1e3("sched: Fix balance_callback()") is not applied. Bugfix ae7927023243d("sched: Optimize finish_lock_switch()") is not applied. Bugfix 36c6e17bf1692("sched/core: Print out straggler tasks in sched_cpu_dying()") is not applied. Feature 2558aacff8586("sched/hotplug: Ensure only per-cpu kthreads run during hotplug") is not applied. Feature f2469a1fb43f8("sched/core: Wait for tasks being pushed away on hotplug") is not applied.] kernel/sched/deadline.c [Bugfix a7c81556ec4d3("sched: Fix migrate_disable() vs rt/dl balancing") is not applied.] kernel/sched/fair.c [Feature acf66d7048e08("sched/fair: Provide can_migrate_task_llc") Feature 0826530de3cbd("sched/fair: Remove update of blocked load from newidle_balance") is not applied. Feature 6864cf0161bad("sched/fair: Steal work from an overloaded CPU when CPU goes idle")] kernel/sched/rt.c [Bugfix a7c81556ec4d3("sched: Fix migrate_disable() vs rt/dl balancing") is not applied.] kernel/sched/sched.h [Bugfix a7c81556ec4d3("sched: Fix migrate_disable() vs rt/dl balancing") is not applied.] Signed-off-by: Lin Shengwang <linshengwang1@huawei.com> Reviewed-by: lihua <hucool.lihua@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com> | 3 years ago |
| | sched: psi: fix bogus pressure spikes from aggregation race stable inclusion from stable-v6.6.55 commit 1f997b1d13e0b9819468e577622e747b546516fe category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IB6YDK Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=1f997b1d13e0b9819468e577622e747b546516fe -------------------------------- [ Upstream commit 3840cbe24cf060ea05a585ca497814609f5d47d1 ] Brandon reports sporadic, non-sensical spikes in cumulative pressure time (total=) when reading cpu.pressure at a high rate. This is due to a race condition between reader aggregation and tasks changing states. While it affects all states and all resources captured by PSI, in practice it most likely triggers with CPU pressure, since scheduling events are so frequent compared to other resource events. The race context is the live snooping of ongoing stalls during a pressure read. The read aggregates per-cpu records for stalls that have concluded, but will also incorporate ad-hoc the duration of any active state that hasn't been recorded yet. This is important to get timely measurements of ongoing stalls. Those ad-hoc samples are calculated on-the-fly up to the current time on that CPU; since the stall hasn't concluded, it's expected that this is the minimum amount of stall time that will enter the per-cpu records once it does. The problem is that the path that concludes the state uses a CPU clock read that is not synchronized against aggregators; the clock is read outside of the seqlock protection. This allows aggregators to race and snoop a stall with a longer duration than will actually be recorded. With the recorded stall time being less than the last snapshot remembered by the aggregator, a subsequent sample will underflow and observe a bogus delta value, resulting in an erratic jump in pressure. Fix this by moving the clock read of the state change into the seqlock protection. This ensures no aggregation can snoop live stalls past the time that's recorded when the state concludes. Reported-by: Brandon Duffany <brandon@buildbuddy.io> Link: https://bugzilla.kernel.org/show_bug.cgi?id=219194 Link: https://lore.kernel.org/lkml/20240827121851.GB438928@cmpxchg.org/ Fixes: d67515514ca5 ("psi: Reduce calls to sched_clock() in psi") Cc: stable@vger.kernel.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Chengming Zhou <chengming.zhou@linux.dev> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Conflicts: kernel/sched/psi.c [ Context conflicts and psi_cgroup_restart(re-enable) is not merged. Move the update_psi_stat_delta into the psi_group_change functions ] Signed-off-by: Chen Ridong <chenridong@huawei.com> | 1 year ago |
| | sched: fix a deadlock in task_net_group() hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IATU6E CVE: NA -------------------------------- If req->tx_pid == req->tx_pid when sched_net_relationship_submit() is called, an AA deadlock on rship->net_lock occurs in task_net_group(). Fixes: 2ac826b258e9 ("sched: Introduce task relationship by net and memory") Signed-off-by: Hui Tang <tanghui20@huawei.com> | 1 year ago |
| | sched: Add ioctl to get relationship hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I9GZAQ CVE: NA -------------------------------- Introduce ioctl interfaces to get the relationships of a task, which facilitates acquiring and using relationships in user mode. Signed-off-by: Hui Tang <tanghui20@huawei.com> | 2 years ago |
| | sched/rt: Skip currently executing CPU in rto_next_cpu() mainline inclusion from mainline-v7.0-rc1 commit 94894c9c477e53bcea052e075c53f89df3d2a33e category: bugfix bugzilla: https://atomgit.com/openeuler/kernel/issues/8689 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=94894c9c477e53bcea052e075c53f89df3d2a33e -------------------------------- CPU0 becomes overloaded when hosting a CPU-bound RT task, a non-CPU-bound RT task, and a CFS task stuck in kernel space. When other CPUs switch from RT to non-RT tasks, RT load balancing (LB) is triggered; with HAVE_RT_PUSH_IPI enabled, they send IPIs to CPU0 to drive the execution of rto_push_irq_work_func. During push_rt_task on CPU0, if next_task->prio < rq->donor->prio, resched_curr() sets NEED_RESCHED and after the push operation completes, CPU0 calls rto_next_cpu(). Since only CPU0 is overloaded in this scenario, rto_next_cpu() should ideally return -1 (no further IPI needed). However, multiple CPUs invoking tell_cpu_to_push() during LB increment rd->rto_loop_next. Even when rd->rto_cpu is set to -1, the mismatch between rd->rto_loop and rd->rto_loop_next forces rto_next_cpu() to restart its search from -1. With CPU0 remaining overloaded (satisfying rt_nr_migratory && rt_nr_total > 1), it gets reselected, causing CPU0 to queue irq_work to itself and send self-IPIs repeatedly. As long as CPU0 stays overloaded and other CPUs run pull_rt_tasks(), it falls into an infinite self-IPI loop, which triggers a CPU hardlockup due to continuous self-interrupts. The triggering scenario is as follows: cpu0 cpu1 cpu2 pull_rt_task tell_cpu_to_push <------------irq_work_queue_on rto_push_irq_work_func push_rt_task resched_curr(rq) pull_rt_task rto_next_cpu tell_cpu_to_push <-------------------------- atomic_inc(rto_loop_next) rd->rto_loop != next rto_next_cpu irq_work_queue_on rto_push_irq_work_func Fix redundant self-IPI by filtering the initiating CPU in rto_next_cpu(). This solution has been verified to effectively eliminate spurious self-IPIs and prevent CPU hardlockup scenarios. Fixes: 4bdced5c9a29 ("sched/rt: Simplify the IPI based RT balancing logic") Suggested-by: Steven Rostedt (Google) <rostedt@goodmis.org> Suggested-by: K Prateek Nayak <kprateek.nayak@amd.com> Signed-off-by: Chen Jinghuang <chenjinghuang2@huawei.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Reviewed-by: Valentin Schneider <vschneid@redhat.com> Link: https://patch.msgid.link/20260122012533.673768-1-chenjinghuang2@huawei.com Signed-off-by: Chen Jinghuang <chenjinghuang2@huawei.com> | 3 months ago |
| | sched/fair: Fix "runnable_avg_yN_inv" not used warnings runnable_avg_yN_inv[] is only used in kernel/sched/pelt.c but was included in several other places because they need other macros that all come from kernel/sched/sched-pelt.h, which was generated by Documentation/scheduler/sched-pelt. As a result, compilation produces a lot of warnings: kernel/sched/sched-pelt.h:4:18: warning: 'runnable_avg_yN_inv' defined but not used [-Wunused-const-variable=] kernel/sched/sched-pelt.h:4:18: warning: 'runnable_avg_yN_inv' defined but not used [-Wunused-const-variable=] kernel/sched/sched-pelt.h:4:18: warning: 'runnable_avg_yN_inv' defined but not used [-Wunused-const-variable=] ... Silence it by appending the __maybe_unused attribute for it, so all generated variables and macros can still be kept in the same file. Signed-off-by: Qian Cai <cai@lca.pw> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/1559596304-31581-1-git-send-email-cai@lca.pw Signed-off-by: Ingo Molnar <mingo@kernel.org> | 6 years ago |
| | sched: Fix might sleep in atomic section issue hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/ICB7K1 -------------------------------- destroy_auto_affinity() is called from an atomic section, and smart_grid_usage_dec() might sleep, which causes a sleep-in-atomic-context issue. Fixes: abd2d73ab235 ("sched: introduce smart grid qos zone") Signed-off-by: Zhang Qiao <zhangqiao22@huawei.com> | 11 months ago |
| | sched/headers: Split out open-coded prototypes into kernel/sched/smp.h Move the prototypes for sched_ttwu_pending() and send_call_function_single_ipi() into the newly created kernel/sched/smp.h header, to make sure they are all the same, and to make architectures that use -Wmissing-prototypes happy. Signed-off-by: Ingo Molnar <mingo@kernel.org> | 5 years ago |
| | sched: Fix incorrect cluster mask hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/ID89XG -------------------------------- The cpu_clustergroup_mask() does not represent the actual CPU topology. Fixes: a042432f1f90 ("sched: topology: Build soft domain for LLC") Signed-off-by: Zhang Qiao <zhangqiao22@huawei.com> | 6 months ago |
| | sched: Provide sparsemask, a reduced contention bitmap hulk inclusion category: feature bugzilla: 38261, https://gitee.com/openeuler/kernel/issues/I49XPZ CVE: NA --------------------------- Provide struct sparsemask and functions to manipulate it. A sparsemask is a sparse bitmap. It reduces cache contention vs the usual bitmap when many threads concurrently set, clear, and visit elements, by reducing the number of significant bits per cacheline. For each cacheline chunk of the mask, only the first K bits of the first word are used, and the remaining bits are ignored, where K is a creation time parameter. Thus a sparsemask that can represent a set of N elements is approximately (N/K * CACHELINE) bytes in size. This type is simpler and more efficient than the struct sbitmap used by block drivers. Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Signed-off-by: Cheng Jian <cj.chengjian@huawei.com> Reviewed-by: Hanjun Guo <guohanjun@huawei.com> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Reviewed-by: Xie XiuQi <xiexiuqi@huawei.com> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Reviewed-by: Chen Hui <judy.chenhui@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com> | 4 years ago |
| | sched/fair: introduce SCHED_STEAL hulk inclusion category: feature bugzilla: 38261, https://gitee.com/openeuler/kernel/issues/I49XPZ CVE: NA --------------------------- Introduce CONFIG_SCHED_STEAL to limit the impact of steal task. 1) If CONFIG_SCHED_STEAL is turned off, none of the changes exist, because empty stub functions are used; this relies on compiler optimization. 2) If CONFIG_SCHED_STEAL is enabled but STEAL and schedstats are disabled, the schedstat check introduces some overhead, but this has little effect on performance. This will be our default choice. Signed-off-by: Cheng Jian <cj.chengjian@huawei.com> Reviewed-by: Hanjun Guo <guohanjun@huawei.com> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Reviewed-by: Xie XiuQi <xiexiuqi@huawei.com> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Reviewed-by: Chen Hui <judy.chenhui@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com> | 4 years ago |
| | add cpu fine grained stall tracking in pressure.stat hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I8BCV4 ------------------------------- Introduce CPU fine-grained stall tracking (CPU CFS bandwidth or CPU QoS) in pressure.stat. For CPU fine-grained stall tracking, only "full" information is reported in pressure.stat. For example: /test # cat /tmp/cpuacct/test/pressure.stat cgroup_memory_reclaim some avg10=0.00 avg60=0.00 avg300=0.00 total=0 full avg10=0.00 avg60=0.00 avg300=0.00 total=0 global_memory_reclaim some avg10=0.00 avg60=0.00 avg300=0.00 total=0 full avg10=0.00 avg60=0.00 avg300=0.00 total=0 compact some avg10=0.00 avg60=0.00 avg300=0.00 total=0 full avg10=0.00 avg60=0.00 avg300=0.00 total=0 cgroup_async_memory_reclaim some avg10=0.00 avg60=0.00 avg300=0.00 total=0 full avg10=0.00 avg60=0.00 avg300=0.00 total=0 swap some avg10=0.00 avg60=0.00 avg300=0.00 total=0 full avg10=0.00 avg60=0.00 avg300=0.00 total=0 cpu_cfs_bandwidth full avg10=21.76 avg60=4.58 avg300=0.98 total=3893827 cpu_qos full avg10=0.00 avg60=0.00 avg300=0.00 total=0 Signed-off-by: Lu Jialin <lujialin4@huawei.com> | 2 years ago |
| | sched: Introduce sched_class::pick_task() mainline inclusion from mainline-v5.14-rc1 commit 21f56ffe4482e501b9e83737612493eeaac21f5a category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5OOWG CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=21f56ffe4482e501b9e83737612493eeaac21f5a -------------------------------------------------------------------------- Because sched_class::pick_next_task() also implies sched_class::set_next_task() (and possibly put_prev_task() and newidle_balance) it is not state invariant. This makes it unsuitable for remote task selection. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> [Vineeth: folded fixes] Signed-off-by: Vineeth Remanan Pillai <viremana@linux.microsoft.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Don Hiatt <dhiatt@digitalocean.com> Tested-by: Hongyu Ning <hongyu.ning@linux.intel.com> Tested-by: Vincent Guittot <vincent.guittot@linaro.org> Link: https://lkml.kernel.org/r/20210422123308.437092775@infradead.org Signed-off-by: Lin Shengwang <linshengwang1@huawei.com> Reviewed-by: lihua <hucool.lihua@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com> | 3 years ago |
| | sched/swait: Prepare usage in completions As a preparation to use simple wait queues for completions: - Provide swake_up_all_locked() to support complete_all() - Make __prepare_to_swait() publicly available This is done to enable the usage of complete() within truly atomic contexts on a PREEMPT_RT enabled kernel. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200321113242.228481202@linutronix.de | 6 years ago |
| | sched/topology,schedutil: Wrap sched domains rebuild mainline inclusion from mainline-v5.11-rc1 commit 31f6a8c0a471be7d7d05c93eac50fcb729e79b9d category: cleanup bugzilla: https://atomgit.com/openeuler/kernel/issues/8319 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=31f6a8c0a471be7d7d05c93eac50fcb729e79b9d ---------------------------------------------------------------------- Add the rebuild_sched_domains_energy() function to wrap the functionality that rebuilds the scheduling domains if any of the Energy Aware Scheduling (EAS) initialisation conditions change. This functionality is used when schedutil is added or removed or when EAS is enabled or disabled through the sched_energy_aware sysctl. Therefore, create a single function that is used in both these cases and that can be later reused. Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Quentin Perret <qperret@google.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://lkml.kernel.org/r/20201027180713.7642-2-ionela.voinescu@arm.com Signed-off-by: huwentao <huwentao19@h-partners.com> | 4 months ago |
| | sched: remove wait bookmarks mainline inclusion from mainline-v6.7-rc1 commit 37acade0ce8938f00d6979bd02b8043b5b7089ae category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I93R1D CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=37acade0ce8938f00d6979bd02b8043b5b7089ae -------------------------------- There are no users of wait bookmarks left, so simplify the wait code by removing them. Link: https://lkml.kernel.org/r/20231010035829.544242-2-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: Benjamin Segall <bsegall@google.com> Cc: Bin Lai <sclaibin@gmail.com> Cc: Daniel Bristot de Oliveira <bristot@redhat.com> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Cc: Valentin Schneider <vschneid@redhat.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Zizhi Wo <wozizhi@huawei.com> Conflicts: include/linux/wait.h kernel/sched/wait.c | 2 years ago |
| | wait_on_bit: Add wait_on_bit_acquire() to provide memory barrier hulk inclusion category: other bugzilla: https://gitee.com/openeuler/kernel/issues/IB5UKT -------------------------------- For the previously merged mainline patch 8238b4579866 ("wait_on_bit: add an acquire memory barrier"), revert test_bit_acquire() in the wait_on_bit_xx functions to the original test_bit() to avoid additional side effects. In addition, define wait_on_bit_acquire(), which uses test_bit_acquire() to provide the memory barrier, to narrow down the scope. Signed-off-by: Zizhi Wo <wozizhi@huawei.com> Signed-off-by: Baokun Li <libaokun1@huawei.com> | 1 year ago |
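
The rto_next_cpu() fix listed above ("sched/rt: Skip currently executing CPU in rto_next_cpu()") can be illustrated with a small user-space model. This is a hypothetical sketch, not the kernel implementation: `struct model_rd`, `model_mask_next()`, `model_rto_next_cpu()` and the 64-CPU limit are assumptions made for illustration, and locking, irq_work queuing and memory ordering are omitted. The `skip_self` flag switches between the behavior described before and after the fix.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_NR_CPUS 64 /* assumption: at most 64 CPUs, one bit each */

/* Hypothetical stand-in for the root-domain fields rto_next_cpu() uses. */
struct model_rd {
	uint64_t rto_mask; /* bit n set: CPU n is RT-overloaded */
	int rto_cpu;       /* last CPU handed out; -1 starts a new pass */
	int rto_loop;      /* pass number the IPI chain is working on */
	int rto_loop_next; /* bumped by each tell_cpu_to_push() caller */
};

/* First set bit after 'prev' (prev == -1 searches from bit 0), or
 * MODEL_NR_CPUS if there is none, similar to cpumask_next(). */
static int model_mask_next(uint64_t mask, int prev)
{
	for (int cpu = prev + 1; cpu < MODEL_NR_CPUS; cpu++)
		if (mask & (UINT64_C(1) << cpu))
			return cpu;
	return MODEL_NR_CPUS;
}

/* Pick the next overloaded CPU to push from, or -1 when the pass is done.
 * skip_self != 0 models the fix: never select the CPU running this code. */
static int model_rto_next_cpu(struct model_rd *rd, int this_cpu, int skip_self)
{
	assert(this_cpu >= 0 && this_cpu < MODEL_NR_CPUS);

	for (;;) {
		int cpu = model_mask_next(rd->rto_mask, rd->rto_cpu);

		rd->rto_cpu = cpu;

		/* The fix: do not hand back our own CPU, which would
		 * lead to queuing irq_work to ourselves (a self-IPI). */
		if (skip_self && cpu == this_cpu)
			continue;

		if (cpu < MODEL_NR_CPUS)
			return cpu;

		/* End of the mask: start another pass only if more push
		 * requests arrived since the current pass began. */
		rd->rto_cpu = -1;
		if (rd->rto_loop == rd->rto_loop_next)
			break;
		rd->rto_loop = rd->rto_loop_next;
	}
	return -1;
}
```

With only CPU0 overloaded and `rto_loop_next` ahead of `rto_loop`, the model without `skip_self` returns 0 to CPU0 itself (the self-IPI case the commit describes), while the model with `skip_self` restarts the pass once and then returns -1. Other overloaded CPUs are still handed out normally.
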
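| File | Last commit | Last updated |
|---|---|---|
| | | 1 year ago |
| | | 11 months ago |
| | | 3 years ago |
| | | 8 years ago |
| | | 1 year ago |
| | | 3 years ago |
| | | 6 years ago |
| | | 6 years ago |
| | | 4 months ago |
| | | 3 years ago |
| | | 2 years ago |
| | | 5 years ago |
| | | 8 years ago |
| | | 6 years ago |
| | | 4 months ago |
| | | 6 years ago |
| | | 6 years ago |
| | | 26 days ago |
| | | 1 year ago |
| | | 1 year ago |
| | | 8 months ago |
| | | 11 months ago |
| | | 30 days ago |
| | | 2 years ago |
| | | 5 years ago |
| | | 1 year ago |
| | | 2 years ago |
| | | 2 years ago |
| | | 5 years ago |
| | | 3 years ago |
| | | 1 year ago |
| | | 1 year ago |
| | | 2 years ago |
| | | 3 months ago |
| | | 6 years ago |
| | | 11 months ago |
| | | 5 years ago |
| | | 6 months ago |
| | | 4 years ago |
| | | 4 years ago |
| | | 2 years ago |
| | | 3 years ago |
| | | 6 years ago |
| | | 4 months ago |
| | | 2 years ago |
| | | 1 year ago |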