Scheduling Management API
openvela provides POSIX-compliant task scheduling interfaces, supporting multiple scheduling policies and task management capabilities.
Header file: #include <sched.h>
openvela Implementation Notes
- Scheduling policies: Supports SCHED_FIFO (first-in, first-out), SCHED_RR (round-robin, requires CONFIG_RR_INTERVAL > 0), SCHED_SPORADIC (requires CONFIG_SCHED_SPORADIC), and SCHED_OTHER (mapped to SCHED_FIFO).
- Priority range: Typically 1–255; larger numeric values indicate higher priority. Priority 0 is reserved for the idle task.
- Return value style: The task_* family returns negative error codes (such as -EINVAL), while the sched_* family follows POSIX conventions by returning -1 and setting errno.
- SMP support: CPU affinity interfaces require CONFIG_SMP to be enabled.
- task vs pthread: task_create() creates a native openvela task, while pthread_create() creates a POSIX thread. They share the underlying scheduler, but tasks do not support pthread-specific features (such as TSD or cleanup handlers).
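The two return-value styles described above are easy to mix up in code that calls both families. As a minimal sketch (the helper names `status_from_task_call` and `status_from_posix_call` are hypothetical and not part of openvela), both conventions can be folded into a single "negative errno on failure" form:

```c
#include <errno.h>

/* task_* style: the return value already carries the error as a negative
 * errno code (or a non-negative result), so it is passed through as-is. */
int status_from_task_call(int ret)
{
    return ret;
}

/* sched_* style: -1 means "the reason is in errno"; any other value is a
 * valid result. Call this immediately after the sched_* call, before
 * anything else can overwrite errno. */
int status_from_posix_call(int ret)
{
    return (ret == -1) ? -errno : ret;
}
```

With helpers like these, a caller can test `status < 0` in one place regardless of which family produced the value.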
Task Management
task_create
int task_create(const char *name, int priority, int stack_size,
main_t entry, char * const argv[]);
Creates a new task and makes it ready. The new task starts executing from the entry function and may receive the argument array argv. After creation, the task is immediately in the ready state, and runs according to its priority and scheduling policy.
Unlike pthread_create(), task_create() is an openvela-specific lightweight task creation interface. The created task is not a POSIX thread but a native openvela task. The task stack is automatically allocated and managed by the system.
Parameters:
- name: Task name (string), used for debugging and identification. Maximum length is determined by CONFIG_TASK_NAME_SIZE. The name may be NULL, but a meaningful name is recommended for debugging.
- priority: Task priority (integer). The valid range depends on the scheduling policy:
  - Real-time priorities: typically 1–255 (query via sched_get_priority_min/max())
  - Larger values mean higher priority
  - Priority 0 is typically reserved for the idle task
  - Use SCHED_PRIORITY_DEFAULT or set according to system needs
- stack_size: Task stack size (bytes). Must be large enough to accommodate local variables, function calls, and interrupt handling. At least 2048 bytes is recommended; complex tasks may need larger stacks. The stack size is automatically aligned to system requirements.
- entry: Task entry function of type main_t with signature int (*)(int argc, char *argv[]). The task terminates when this function returns; the return value serves as the task exit status.
- argv: Argument array (pointer-to-string array) passed to the task. Must be NULL-terminated. Similar to main()'s argv. The argument strings are copied, so the original strings can be freed after the call. Pass NULL or an empty array {NULL} if no arguments are needed.
Returns:
- Success: Returns the new task's PID (process ID, positive integer). Can be used for subsequent task control operations (such as task_delete(), sched_setparam()).
- Failure: Returns a negative error code:
  - -EINVAL: Invalid argument (e.g., priority out of range, stack size is 0)
  - -ENOMEM: Insufficient memory; failed to allocate the task control block or stack
  - -EAGAIN: Insufficient system resources; task count limit reached
Notes:
- Task vs thread: Tasks created by task_create() are native openvela tasks, not POSIX threads. Tasks are more lightweight than threads but do not support some pthread features (such as thread-local storage or thread cleanup handlers). Use pthread_create() for POSIX compatibility.
- Stack allocation: The stack is automatically allocated by the system (typically from the heap) and released when the task terminates. Use task_create_with_stack() to use a pre-allocated stack.
- Argument passing: The argv array and its strings are copied into the task's context, so the caller may free or modify the originals after the function returns. Only the strings themselves are copied; any object a string merely refers to (for example, an address passed as text) is not duplicated and must remain valid for as long as the task uses it.
- Task scheduling: After creation, the task is immediately ready. If the new task's priority is higher than the current task's, it preempts the current task (preemptive scheduling).
- Task termination: The task terminates automatically when the entry function returns. It can also be terminated by calling exit() or task_delete(), or by receiving a signal (such as SIGKILL).
- Resource management: After termination, the system automatically reclaims core resources (stack, task control block), but other task-allocated resources (open files, allocated memory, etc.) must be cleaned up by the task itself.
- Initial scheduling policy: The new task's scheduling policy defaults to SCHED_FIFO (or the system default). Modify with sched_setscheduler() after creation.
- Typical usage:
    char *argv[] = {"arg1", "arg2", NULL};
    int pid = task_create("my_task", 100, 4096, task_main, argv);
    if (pid < 0) {
        printf("Failed to create task: %d\n", pid);
    }
- Difference from fork(): Unlike fork(), task_create() does not duplicate the parent's address space; the new task starts at the specified entry function and does not share resources other than code with the parent.
POSIX Compatibility: openvela extension interface (non-POSIX standard).
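Because the valid priority range is "typically" 1–255 rather than guaranteed, it can help to query the range at run time before passing a priority to task_create(). The following is a minimal sketch using only the POSIX range queries mentioned above; the helper name `clamp_priority` is hypothetical:

```c
#include <sched.h>

/* Hypothetical helper: clamp `requested` into the valid priority range of
 * `policy`. Returns -1 if the range cannot be queried (for example, when
 * the policy is not supported on this system). */
int clamp_priority(int policy, int requested)
{
    int lo = sched_get_priority_min(policy);
    int hi = sched_get_priority_max(policy);

    if (lo == -1 || hi == -1 || lo > hi) {
        return -1;
    }
    if (requested < lo) {
        return lo;
    }
    if (requested > hi) {
        return hi;
    }
    return requested;
}
```

A caller might then write `task_create("my_task", clamp_priority(SCHED_FIFO, 100), ...)`, after checking that the helper did not return -1. Silently clamping is a design choice; rejecting an out-of-range request may be preferable where the exact priority matters.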
task_create_with_stack
int task_create_with_stack(const char *name, int priority,
void *stack, int stack_size,
main_t entry, char * const argv[]);
Creates a new task using a pre-allocated stack. Similar to task_create(), but allows the caller to provide the stack memory instead of having the system allocate it automatically. Useful when precise control over memory layout is required, when using special memory regions (shared memory, DMA-accessible memory), or when optimizing startup performance.
A pre-allocated stack gives programmers more control but also more responsibility (stack size validation, memory alignment, lifecycle management).
Parameters:
- name: Task name, used for debugging and identification. The string is copied, so it may be a temporary buffer. Maximum length is typically defined by CONFIG_TASK_NAME_SIZE (e.g., 31 characters plus the terminating NUL). If NULL, the task will have an auto-generated name.
- priority: Task priority; larger values are higher priority. Valid range is typically 1 to 255, and may be queried via sched_get_priority_min()/sched_get_priority_max(). Priority determines the scheduling order.
- stack: Pointer to the pre-allocated stack memory. Must be:
  - Non-NULL: Cannot be NULL, otherwise an error is returned
  - Large enough: At least stack_size bytes
  - Properly aligned: Usually must be aligned to the architecture's required boundary (e.g., 8 or 16 bytes)
  - Writable: The stack memory must be readable and writable
  - Lifecycle managed: The caller is responsible for releasing the stack memory after task termination (if dynamically allocated)
- stack_size: Stack size (bytes). Must satisfy:
  - Minimum requirement: At least PTHREAD_STACK_MIN (typically several hundred bytes)
  - Task needs: Large enough for local variables, function call depth, interrupt/exception handling
  - Alignment: Some architectures may require the size to be aligned (e.g., a multiple of 8 bytes)
- entry: Task entry function with signature int main(int argc, char *argv[]). Cannot be NULL, otherwise an error is returned.
- argv: Argument array passed to the task (similar to main()'s argv). The array must be NULL-terminated. Can be NULL, indicating no arguments (equivalent to an empty array).
Returns:
- Success: Returns the new task's PID (positive integer)
- Failure: Returns a negative error code:
  - -EINVAL: Invalid argument (e.g., stack is NULL, entry is NULL, priority out of range)
  - -ENOMEM: Insufficient memory (the stack is provided, but the task control block and other structures still need allocation)
  - -EAGAIN: System task count limit reached (CONFIG_MAX_TASKS)
Notes:
- Difference from task_create:
- task_create: System automatically allocates and frees the stack
- task_create_with_stack: Caller provides the stack and is responsible for freeing it
- Stack lifecycle management:
- The stack memory must remain valid for the entire lifetime of the task
- After termination, the caller is responsible for freeing the stack memory (if dynamically allocated)
- If the stack is a static array or global variable, no explicit free is required
- Typical usage (dynamically allocated stack):
    void *stack = malloc(8192);
    if (stack == NULL) {
        perror("malloc");
        return -1;
    }
    int pid = task_create_with_stack("worker", 100, stack, 8192, worker_func, NULL);
    if (pid < 0) {
        // task_* calls return a negative error code and do not set errno
        fprintf(stderr, "task_create_with_stack: %s\n", strerror(-pid));
        free(stack);
        return -1;
    }
    // ... wait for the task to finish ...
    waitpid(pid, NULL, 0);
    free(stack); // release the stack
- Static stack example:
    static uint8_t worker_stack[4096] __attribute__((aligned(16)));
    int pid = task_create_with_stack("worker", 100, worker_stack, sizeof(worker_stack), worker_func, NULL);
- Stack direction: Some architectures grow the stack downward, others upward. openvela handles stack direction automatically; the caller only provides the start address and size.
- Stack alignment: Ensure the stack address is properly aligned (typically 8 or 16 bytes), otherwise it may cause undefined behavior or performance degradation:
    void *stack = aligned_alloc(16, 8192); // 16-byte aligned
- Stack overflow protection: A pre-allocated stack does not automatically provide overflow protection (guard page). If needed, and if the platform has memory protection and supports mprotect(), allocate an additional page-aligned guard page at the overflow end of the stack (the lowest addresses when the stack grows downward) and mark it inaccessible. The example assumes a 4096-byte page:
    void *stack_with_guard = aligned_alloc(4096, 4096 + 8192); // page-aligned, one extra guard page
    mprotect(stack_with_guard, 4096, PROT_NONE); // inaccessible
    void *usable_stack = (char *)stack_with_guard + 4096;
    task_create_with_stack("worker", 100, usable_stack, 8192, worker_func, NULL);
- Shared-memory stack: Shared memory can be used as a stack for cross-process stack sharing (advanced usage; requires careful synchronization):
    int shm_fd = shm_open("/worker_stack", O_CREAT | O_RDWR, 0666);
    ftruncate(shm_fd, 8192);
    void *stack = mmap(NULL, 8192, PROT_READ | PROT_WRITE, MAP_SHARED, shm_fd, 0);
    task_create_with_stack("worker", 100, stack, 8192, worker_func, NULL);
- Performance considerations: Using a pre-allocated stack reduces the memory allocation overhead during task creation and improves startup performance. In scenarios with frequent task creation/destruction (such as task pools), you can maintain a stack cache pool.
- Debugging suggestion: During debugging, write a magic number (such as 0xDEADBEEF) at the overflow boundary of the stack and check it periodically to detect stack overflow early. On a downward-growing stack the boundary is the lowest address of the buffer; on an upward-growing stack it is the last word of the buffer:
    uint32_t *stack_limit = (uint32_t *)stack; // lowest word (downward-growing stack)
    *stack_limit = 0xDEADBEEF;
    // ... task runs ...
    if (*stack_limit != 0xDEADBEEF) {
        printf("Stack overflow detected!\n");
    }
- Real-time system optimization: In real-time systems, using a pre-allocated stack avoids the nondeterminism of dynamic allocation, improving the determinism and speed of task creation.
- Pitfalls:
- Too-small stack leads to stack overflow, which is hard to debug (usually manifests as random crashes or data corruption)
- Forgetting to free a dynamically allocated stack causes memory leaks
- Unaligned stacks may cause undefined behavior (crash on some architectures, just performance degradation on others)
- Freeing the stack while the task is still running causes serious errors
POSIX Compatibility: openvela extension interface (non-POSIX standard; similar to some RTOS interfaces).
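The magic-number check from the debugging suggestion can be wrapped in two small helpers. This is a sketch under stated assumptions: the stack grows downward (so the guard sits at the lowest addresses of the buffer), the buffer is at least 4-byte aligned, and the names `stack_guard_arm`, `stack_guard_intact`, and `STACK_GUARD_WORD` are hypothetical. It also assumes the system does not rewrite the buffer contents when the task is created.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_GUARD_WORD 0xDEADBEEFu

/* Fill the first `words` 32-bit words of the stack buffer with the guard
 * pattern. On a downward-growing stack these are the last words the task
 * would reach before overflowing. */
void stack_guard_arm(void *stack, size_t words)
{
    uint32_t *p = (uint32_t *)stack;

    for (size_t i = 0; i < words; i++) {
        p[i] = STACK_GUARD_WORD;
    }
}

/* Return true if every guard word still holds the pattern. */
bool stack_guard_intact(const void *stack, size_t words)
{
    const uint32_t *p = (const uint32_t *)stack;

    for (size_t i = 0; i < words; i++) {
        if (p[i] != STACK_GUARD_WORD) {
            return false;
        }
    }
    return true;
}
```

Using several guard words rather than one makes it less likely that an overflow skips past the sentinel unnoticed. A monitoring task could call `stack_guard_intact()` periodically and log or restart the worker when it returns false.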
task_delete
int task_delete(pid_t pid);
Deletes (terminates) the specified task, releasing its system resources. Similar to pthread_cancel() or kill(pid, SIGKILL), but this is the openvela-native interface, more direct and efficient.
Task deletion is forced and does not go through normal cleanup (such as atexit handlers) and should be used with care. Cooperative termination mechanisms (such as setting an exit flag so the task can exit itself) are generally preferred.
Parameters:
- pid: PID of the task to delete. Special value 0 means delete the calling task (equivalent to exit()). Must be a valid task PID.
Returns:
- Success: Returns 0 (OK)
- Failure: Returns a negative error code:
  - -EINVAL: pid is invalid (e.g., negative)
  - -ESRCH: The specified task does not exist (invalid PID or already terminated)
  - -EPERM: The caller has no permission to delete the target task (depends on system configuration)
Notes:
- Forced termination: The task is terminated immediately; cleanup code (such as handlers registered via pthread_cleanup_push or atexit callbacks) is not executed. This may lead to resource leaks (unfreed memory, unclosed files, unlocked mutexes).
- Resource cleanup: The kernel automatically reclaims core resources (stack memory, task control block), but application-level resources (heap memory, open files) may not be cleaned up automatically.
- Deleting self: If pid is 0, the task deletes itself and never returns (similar to calling exit(0)):
    task_delete(0);
    // This line is never executed
- Difference from pthread_cancel:
  - task_delete() is an immediate forced termination without cancellation point semantics
  - pthread_cancel() takes effect at the next cancellation point, allowing cleanup
  - task_delete() is an openvela extension; pthread_cancel() is the POSIX standard
- Typical usage:
    int pid = task_create("worker", 100, 2048, worker_func, NULL);
    // ... task running ...
    // Terminate the task
    int ret = task_delete(pid);
    if (ret == 0) {
        printf("Task %d deleted\n", pid);
    } else {
        // task_* calls return a negative error code and do not set errno
        printf("task_delete failed: %d\n", ret);
    }
- Mutex pitfall: If the deleted task was holding a mutex, other tasks waiting for the lock will block forever (deadlock). Ensure all locks are released before deletion.
- Child task cleanup: Deleting a parent task does not automatically delete its children. To clean up an entire task tree, delete all children explicitly.
- Alternatives:
  - Cooperative exit: Set an exit flag so the task checks it and exits
  - pthread_cancel: Use the POSIX cancellation mechanism, which allows cleanup handlers to run
  - Signals: Send SIGTERM so the task catches it and exits gracefully
- Real-time note: In real-time systems, forced deletion may affect determinism and predictability; use with care.
- Debugging suggestion: During debugging, add logs at task entry and exit to trace the task lifecycle.
- Bulk deletion: When deleting multiple tasks, delete them in dependency order (children first, parent later) to avoid dangling references.
POSIX Compatibility: openvela extension interface (non-POSIX standard).
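The cooperative-exit alternative mentioned in the notes can be sketched as follows. This is an illustration only: `struct worker_ctl`, `worker_request_stop`, `worker_run`, and `demo_step` are hypothetical names, and a C11 atomic flag is assumed to be an acceptable way to share the stop request between tasks.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct worker_ctl {
    atomic_bool stop_requested;  /* set by the controlling task */
    unsigned long completed;     /* units of work finished so far */
};

/* Called by the controlling task instead of task_delete(). */
void worker_request_stop(struct worker_ctl *ctl)
{
    atomic_store(&ctl->stop_requested, true);
}

/* Worker loop: runs `step` until a stop has been requested, then falls
 * through to a single, well-defined cleanup point. Returns the number of
 * completed steps. */
unsigned long worker_run(struct worker_ctl *ctl,
                         void (*step)(struct worker_ctl *))
{
    while (!atomic_load(&ctl->stop_requested)) {
        step(ctl);
        ctl->completed++;
    }
    /* Safe point: release locks, close files, free buffers here. */
    return ctl->completed;
}

/* Demo step: stands in for another task requesting a stop while the
 * third unit of work is in progress. */
void demo_step(struct worker_ctl *ctl)
{
    if (ctl->completed == 2) {
        worker_request_stop(ctl);
    }
}
```

The unit of work that is in progress when the stop arrives still completes, which is the property forced deletion cannot provide. The cost is latency: the worker reacts only between steps.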
task_restart
int task_restart(pid_t pid);
Restarts the specified task; the task re-executes from its original entry point with the original arguments, priority, and stack size. Equivalent to terminating the task and recreating it with the same arguments.
Task restart is a special recovery mechanism for handling task exceptions or scenarios requiring a reset of the task state. Compared with delete-and-recreate, restart preserves the original task configuration.
Parameters:
- pid: PID of the task to restart. Must be a valid task PID (cannot be 0, since a task cannot restart itself).
Returns:
- Success: Returns 0 (OK)
- Failure: Returns a negative error code:
  - -EINVAL: pid is invalid (e.g., 0 or negative)
  - -ESRCH: The specified task does not exist (invalid PID or already terminated)
  - -EPERM: The caller has no permission to restart the target task (depends on system configuration)
  - -ENOMEM: Insufficient memory; unable to restart the task
Notes:
- Cannot restart self: A task cannot restart itself via task_restart(0) or task_restart(getpid()), because restart destroys the current execution context. Attempting to do so typically returns -EINVAL.
- Task state reset: After restart, all task state (local variables, stack content, registers) is reset, as if just created. The task starts execution from the entry function.
- Configuration preserved: The restarted task preserves its original configuration:
- Task name (name)
- Priority (priority)
- Stack size (stack_size)
- Entry function (entry)
- Entry arguments (argv)
- PID remains unchanged: After restart, the task's PID remains unchanged, so references from other tasks remain valid.
- Typical usage:
    int pid = task_create("monitor", 150, 2048, monitor_task, NULL);
    // ... task runs for a while, then an anomaly is detected ...
    // Restart the task
    int ret = task_restart(pid);
    if (ret == 0) {
        printf("Task %d restarted\n", pid);
    } else {
        // task_* calls return a negative error code and do not set errno
        printf("task_restart failed: %d\n", ret);
    }
- Difference from task_delete + task_create:
- task_restart: PID unchanged, original configuration preserved, one-step
- task_delete + task_create: PID changes, all arguments must be specified again, two-step
- Resource cleanup: Before restart, resources held by the task (open files, allocated memory, held locks) may not be released automatically, which may cause leaks. Consider exception recovery when designing tasks.
- Mutex pitfall: If the task was holding a mutex, restart causes the lock to never be released (deadlock). Ensure all locks are released before restart, or use robust mutexes.
- Watchdog scenario: Commonly used in watchdog systems. When a task is detected to be unresponsive or misbehaving, it is automatically restarted to recover service:
    int watchdog_task(int argc, char *argv[])
    {
        while (1) {
            if (check_task_health(worker_pid) == FAILED) {
                printf("Worker unhealthy, restarting...\n");
                task_restart(worker_pid);
            }
            sleep(5);
        }
        return 0;
    }
- Restart count: In production systems, the number of restarts should be limited to avoid infinite restart loops. The counter must persist across checks:
    static int restart_count = 0;
    const int MAX_RESTARTS = 3;
    if (restart_count >= MAX_RESTARTS) {
        printf("Max restarts reached, giving up\n");
        task_delete(pid);
    } else if (task_restart(pid) == 0) {
        restart_count++;
    }
- Asynchronous operation: Restart is asynchronous; the task may still be initializing when the function returns. If waiting for initialization is required, use an additional synchronization mechanism (such as a semaphore).
- Debugging suggestion: Add logs in the task entry function to record each start time and reason, which helps analyze restart history.
- Real-time system impact: Frequent restarts may affect system real-time performance and predictability; improve task robustness to reduce the need for restarts.
POSIX Compatibility: openvela extension interface (non-POSIX standard).
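The restart limit described in the notes can be kept separate from the watchdog loop as a small piece of bookkeeping. A minimal sketch, with the hypothetical names `struct restart_budget` and `restart_budget_take`:

```c
#include <stdbool.h>

struct restart_budget {
    int used;  /* restarts performed so far */
    int max;   /* restarts allowed in total */
};

/* Returns true and consumes one slot if another restart is still allowed;
 * returns false once the budget is exhausted. */
bool restart_budget_take(struct restart_budget *b)
{
    if (b->used >= b->max) {
        return false;
    }
    b->used++;
    return true;
}
```

A watchdog would call `task_restart(pid)` only when `restart_budget_take()` returns true, and otherwise escalate (for example, delete the task or raise an alarm). Whether the budget should also reset after a period of healthy operation is a policy decision this sketch leaves open.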
Task Cancellation
task_setcancelstate
int task_setcancelstate(int state, int *oldstate);
Sets the calling task's cancellation state, controlling whether the task can be canceled. Similar to pthread_setcancelstate() but applies to all tasks (not only pthreads).
The cancellation state is part of the task cancellation mechanism and allows a task to temporarily disable cancellation to protect critical sections from interruption.
Parameters:
- state: The new cancellation state. Valid values:
  - TASK_CANCEL_ENABLE (0): Cancellation allowed (default); the task can respond to cancellation requests
  - TASK_CANCEL_DISABLE (1): Cancellation disabled; the task ignores cancellation requests (the request is deferred)
- oldstate: If non-NULL, receives the previous cancellation state. Pass NULL if the old value is not needed.
Returns:
- Success: Returns 0 (OK)
- Failure: Returns a negative error code:
  - -EINVAL: state is invalid (not TASK_CANCEL_ENABLE or TASK_CANCEL_DISABLE)
Notes:
- Protecting critical sections: When performing uninterruptible critical operations (such as updating shared data structures or holding mutexes), cancellation should be disabled:
    task_setcancelstate(TASK_CANCEL_DISABLE, NULL);
    // Critical operation, not interruptible by cancellation
    update_critical_data();
    task_setcancelstate(TASK_CANCEL_ENABLE, NULL);
- Deferred cancellation: When the cancel state is TASK_CANCEL_DISABLE, cancellation requests are not lost but deferred. When the state is set back to TASK_CANCEL_ENABLE, any pending cancellation request will cause the task to be canceled at the next cancellation point.
- Interaction with cancel type: The cancel state and cancel type (task_setcanceltype()) together determine cancellation behavior:
  - state=ENABLE, type=DEFERRED: Cancel only at cancellation points (default, safest)
  - state=ENABLE, type=ASYNCHRONOUS: Can cancel at any time (dangerous)
  - state=DISABLE: Never cancel, regardless of type
- Typical usage (save old state):
    int oldstate;
    task_setcancelstate(TASK_CANCEL_DISABLE, &oldstate);
    // Critical operation
    critical_section();
    task_setcancelstate(oldstate, NULL); // Restore original state
- With mutex:
    pthread_mutex_lock(&mutex);
    task_setcancelstate(TASK_CANCEL_DISABLE, NULL);
    // Protected operation
    modify_shared_data();
    task_setcancelstate(TASK_CANCEL_ENABLE, NULL);
    pthread_mutex_unlock(&mutex);
- Default state: Newly created tasks default to TASK_CANCEL_ENABLE, allowing cancellation.
- Does not affect signals: The cancel state affects only cancellation via task_delete() or similar mechanisms, not signal handling (such as SIGTERM).
- Compatibility with pthread: In pthread threads, task_setcancelstate() and pthread_setcancelstate() are usually equivalent and operate on the same underlying state.
- Nested disable: task_setcancelstate(TASK_CANCEL_DISABLE, ...) can be called multiple times, but each call should be matched by a corresponding TASK_CANCEL_ENABLE call (or by restoring the saved old state); otherwise cancellation may remain permanently disabled:
    int old1, old2;
    task_setcancelstate(TASK_CANCEL_DISABLE, &old1); // First disable
    task_setcancelstate(TASK_CANCEL_DISABLE, &old2); // Second disable (no effect)
    task_setcancelstate(old2, NULL); // Restore to old2 (still disabled)
    task_setcancelstate(old1, NULL); // Restore to old1 (possibly enabled)
  A simpler approach is to manipulate the cancel state only at the outermost scope.
- Cleanup handlers: Even when cancellation is disabled, cleanup handlers (pthread_cleanup_push) still run when the task exits normally.
- Performance: Setting the cancel state is lightweight, but avoid frequent switches in tight loops to prevent performance impact.
POSIX Compatibility: Similar to pthread_setcancelstate(), but applies to all task types.
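The save-and-restore pattern, including the nested case, can be packaged so that callers cannot forget the restore step. The sketch below uses pthread_setcancelstate() so that it also builds on a generic POSIX host; on openvela the task_setcancelstate() call with the TASK_CANCEL_* constants would presumably be substituted, though note that it reports errors as negative codes rather than positive error numbers. The names `run_cancel_disabled`, `critical_fn`, and `record_state` are hypothetical.

```c
#include <pthread.h>

typedef void (*critical_fn)(void *arg);

/* Runs fn(arg) with cancellation disabled, then restores whatever state
 * the caller had before (enabled or disabled). Returns 0 on success or
 * the error number reported by pthread_setcancelstate(). */
int run_cancel_disabled(critical_fn fn, void *arg)
{
    int saved;
    int rc = pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &saved);

    if (rc != 0) {
        return rc;
    }
    fn(arg);
    return pthread_setcancelstate(saved, NULL);
}

/* Demo callback: records the cancel state observed inside the critical
 * section. Setting DISABLE again leaves the state unchanged and reports
 * the state that was in effect. */
void record_state(void *arg)
{
    int seen;

    pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &seen);
    *(int *)arg = seen;
}
```

Because the helper restores the saved state instead of unconditionally enabling cancellation, it is safe to call from code that already runs with cancellation disabled.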
task_setcanceltype
int task_setcanceltype(int type, int *oldtype);
Sets the calling task's cancellation type, controlling when cancellation requests take effect. Similar to pthread_setcanceltype() but applies to all tasks (not only pthreads).
The cancellation type determines when a task responds to cancellation requests, affecting cancellation safety and responsiveness.
Parameters:
- type: The new cancellation type. Valid values:
  - TASK_CANCEL_DEFERRED (0): Deferred cancellation (default); responds to requests only at cancellation points, such as pthread_testcancel(), sleep(), read(), and other blocking calls
  - TASK_CANCEL_ASYNCHRONOUS (1): Asynchronous cancellation; the task can be canceled at any time (dangerous; may cause resource leaks or data inconsistency)
- oldtype: If non-NULL, receives the previous cancellation type. Pass NULL if not needed.
Returns:
- Success: Returns 0 (OK)
- Failure: Returns a negative error code:
  - -EINVAL: type is invalid (not TASK_CANCEL_DEFERRED or TASK_CANCEL_ASYNCHRONOUS)
Notes:
- Default type (DEFERRED) is safest: Deferred cancellation is the default and recommended type; it only takes effect at well-defined cancellation points, allowing the task to complete the current operation and clean up resources before being canceled.
- Danger of asynchronous cancellation: TASK_CANCEL_ASYNCHRONOUS is extremely dangerous because the task may be canceled at any moment:
  - May be canceled while holding a mutex, causing deadlock
  - May be canceled mid-update of a data structure, causing data inconsistency
  - May be canceled after allocating memory but before saving the pointer, causing a memory leak
  - Only very specific code (such as pure computation tasks that do not access shared resources) should use asynchronous cancellation
- Cancellation points: Common cancellation points include:
  - task_testcancel()/pthread_testcancel(): Explicit cancellation points
  - Blocking system calls: sleep(), usleep(), read(), write(), recv(), send()
  - Synchronization primitives: pthread_cond_wait(), sem_wait()
  - Some library functions: printf() (possibly)
- Typical usage (temporarily enable asynchronous cancel):
    int oldtype;
    task_setcanceltype(TASK_CANCEL_ASYNCHRONOUS, &oldtype);
    // Pure computation task, no shared resources
    perform_long_computation();
    task_setcanceltype(oldtype, NULL); // Restore original type
  But it is usually better to call task_testcancel() periodically inside a loop.
- Recommended approach (add cancellation points in loops):
    while (processing) {
        process_chunk();
        task_testcancel(); // Periodically check for cancellation requests
    }
  This is safer than asynchronous cancellation and still responds promptly.
- Interaction with cancel state: The cancel type and state together determine cancellation behavior:
- state=ENABLE, type=DEFERRED: Cancel only at cancellation points (default, safest)
- state=ENABLE, type=ASYNCHRONOUS: Can cancel at any time (dangerous)
- state=DISABLE: Never cancel, regardless of type
- Cleanup handlers: Regardless of cancellation type, cleanup handlers (pthread_cleanup_push) run when the task is canceled. However, asynchronous cancellation may trigger cleanup in an inconsistent state, causing issues.
- Real-time considerations: In real-time systems, asynchronous cancellation may impact determinism; avoid using it. Prefer deferred cancellation or cooperative exit mechanisms.
- Compatibility with pthread: In pthread threads, task_setcanceltype() and pthread_setcanceltype() are usually equivalent.
- Default value: Newly created tasks default to TASK_CANCEL_DEFERRED.
- Avoid mixing: Do not mix deferred and asynchronous cancel types in the same task; it makes the code hard to understand and maintain. Pick one and stick with it.
- Async-safe code: If you must use asynchronous cancellation, ensure the task code is async-cancel-safe, similar to signal handler requirements:
  - Do not call non-async-safe functions (such as malloc(), printf())
  - Do not access shared data (or use atomic operations)
  - Do not hold any locks
POSIX Compatibility: Similar to pthread_setcanceltype(), but applies to all task types.
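The state/type interaction listed in the notes amounts to a small decision table. The sketch below encodes it with local stand-in constants (the `DEMO_*` names and `cancel_window_for` are hypothetical; they mirror the 0/1 values given in the parameter descriptions, and real code would use the TASK_CANCEL_* constants from the system headers):

```c
/* Local stand-ins for illustration only. */
enum { DEMO_CANCEL_ENABLE = 0, DEMO_CANCEL_DISABLE = 1 };
enum { DEMO_CANCEL_DEFERRED = 0, DEMO_CANCEL_ASYNCHRONOUS = 1 };

enum cancel_window {
    CANCEL_NEVER,      /* requests are held pending */
    CANCEL_AT_POINTS,  /* acted on only at cancellation points */
    CANCEL_ANYTIME     /* may be acted on at any instruction */
};

/* Maps a (state, type) pair to the moment a pending request takes effect. */
enum cancel_window cancel_window_for(int state, int type)
{
    if (state == DEMO_CANCEL_DISABLE) {
        return CANCEL_NEVER;  /* type is irrelevant while disabled */
    }
    return (type == DEMO_CANCEL_ASYNCHRONOUS) ? CANCEL_ANYTIME
                                              : CANCEL_AT_POINTS;
}
```

The state check comes first because a disabled state overrides the type, which is exactly the third row of the table.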
task_testcancel
void task_testcancel(void);
Creates an explicit cancellation point. If there is a pending cancellation request and the task's cancel state is TASK_CANCEL_ENABLE, the task is canceled and terminates here.
This is the core of the deferred cancellation mechanism, allowing the task to respond to cancellation requests at safe locations, ensuring correct resource release and state consistency.
Parameters:
None.
Returns:
- No return value (if canceled, the function never returns)
- If there is no pending cancellation request, or if cancellation is disabled, the function returns normally
Notes:
- Explicit cancellation point: This function is a programmer-invoked cancellation point, distinct from implicit cancellation points (such as sleep(), read()). Explicit points provide precise control, allowing the task to check for cancellation at safe locations.
- Typical usage (long-running loop):
    while (processing) {
        process_data_chunk();
        task_testcancel(); // Periodically check for cancellation requests
    }
  This ensures timely response to cancellation while still completing each loop iteration.
- Cancellation conditions: The task is canceled at this call only when all of the following are met:
  - There is a pending cancellation request (via task_delete() or a similar mechanism)
  - The cancel state is TASK_CANCEL_ENABLE (default)
  - The cancel type does not matter here: with the state enabled, a pending request is acted on at task_testcancel() under either TASK_CANCEL_DEFERRED (default) or TASK_CANCEL_ASYNCHRONOUS
- Cleanup handlers: If the task is canceled, cleanup handlers (registered via pthread_cleanup_push) run before termination. Register cleanup handlers for critical resources (such as mutexes):
    pthread_mutex_lock(&mutex);
    pthread_cleanup_push((void(*)(void*))pthread_mutex_unlock, &mutex);
    // Possibly-canceled operation
    while (condition) {
        process_data();
        task_testcancel();
    }
    pthread_cleanup_pop(1); // Unlock on normal exit too
- Does not affect asynchronous cancellation: If the cancel type is TASK_CANCEL_ASYNCHRONOUS, the task may be canceled at any moment, not only at cancellation points.
- Compatibility with pthread_testcancel: task_testcancel() and pthread_testcancel() are usually equivalent and interchangeable.
- Performance: Checking for a cancellation request is lightweight, but avoid calling it in extremely tight loops (e.g., microsecond-level iterations). Call once every N items processed:
    for (int i = 0; i < items; i++) {
        process_item(i);
        if (i % 100 == 0)
            task_testcancel(); // Check every 100 items
    }
- Cancel-safe locations: Call task_testcancel() at locations where:
  - No mutex is held
  - All data structures are consistent
  - No unreleased temporary resources exist
- Real-time system impact: In real-time systems, cancellation points add (small) uncertainty to response time. For the highest determinism, avoid the cancellation mechanism and use cooperative exit.
- Debugging suggestion: When debugging cancellation issues, add logs before and after task_testcancel() to trace cancellation point execution:
    printf("Before testcancel\n");
    task_testcancel();
    printf("After testcancel (not cancelled)\n");
- Comparison with exit flag:
  - task_testcancel(): Kernel mechanism, lighter-weight, integrates with cleanup handlers
  - Exit flag: User-space mechanism, more flexible, but cleanup must be managed manually
    // Exit flag approach
    volatile bool should_exit = false;
    while (!should_exit) {
        process_data();
    }
- No-op return: If there is no cancellation request, the function returns immediately with almost no overhead (just a flag check).
POSIX Compatibility: Similar to pthread_testcancel(), but applies to all task types.
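The "check once every N items" advice can be factored into a reusable loop. In this sketch the cancellation check is passed in as a callback so the code stays host-portable; on openvela the callback would presumably be task_testcancel itself. The names `process_with_checks` and `check_fn` are hypothetical.

```c
#include <stddef.h>

typedef void (*check_fn)(void);

/* Processes `count` items and reaches a check point after every
 * `interval` completed items (never, if interval is 0). `process` and
 * `check` may be NULL, in which case they are skipped. Returns the
 * number of check points reached. */
size_t process_with_checks(size_t count, size_t interval,
                           void (*process)(size_t index), check_fn check)
{
    size_t checks = 0;

    for (size_t i = 0; i < count; i++) {
        if (process != NULL) {
            process(i);
        }
        if (interval != 0 && (i + 1) % interval == 0) {
            if (check != NULL) {
                check();  /* e.g. task_testcancel on target */
            }
            checks++;
        }
    }
    return checks;
}
```

Testing `(i + 1) % interval` rather than `i % interval` places each check after a full batch has completed, so no check happens before the first item is processed. Choosing `interval` is a trade-off between cancellation latency and per-item overhead.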
Scheduling Policies and Parameters
sched_setscheduler
int sched_setscheduler(pid_t pid, int policy, const struct sched_param *param);
Sets the scheduling policy and parameters (such as priority) of the specified task. This is the primary interface to control task scheduling behavior, allowing dynamic adjustment of task scheduling characteristics at runtime.
The scheduling policy determines how the task competes for CPU time. Different policies suit different types of tasks (real-time, batch, interactive, etc.). Modifying the scheduling policy typically requires appropriate privileges.
Parameters:
- pid: PID of the target task. Special value 0 means the calling task. Must be a valid task PID.
- policy: The new scheduling policy. Valid values include:
  - SCHED_FIFO (0): First-in, first-out real-time scheduling, no time slice, suitable for hard real-time tasks
  - SCHED_RR (1): Round-robin real-time scheduling, with time slice, suitable for real-time tasks needing fairness
  - SCHED_SPORADIC (2): Sporadic scheduling, suitable for periodic real-time tasks (requires CONFIG_SCHED_SPORADIC)
  - SCHED_OTHER (3): Standard time-sharing scheduling, mapped to SCHED_FIFO or SCHED_RR
  - SCHED_NORMAL (3): Alias for SCHED_OTHER
  - SCHED_BATCH (4): Batch scheduling (if supported)
  - SCHED_IDLE (5): Idle scheduling, lowest priority (if supported)
- param: Pointer to a struct sched_param containing scheduling parameters. At minimum, set the sched_priority field (basic priority). For SCHED_SPORADIC, also set the sporadic server parameters (low priority, replenish period, initial budget, maximum replenishments).
Returns:
- Success: Returns the task's previous scheduling policy (SCHED_FIFO, SCHED_RR, etc.)
- Failure: Returns -1 and sets errno:
  - EINVAL: policy is invalid, or the priority in param is out of range for the policy
  - ESRCH: The specified task does not exist (invalid PID or already terminated)
  - EPERM: The caller lacks permission to modify the target task's scheduling policy (typically requires superuser or same user)
  - EFAULT: param points to invalid memory
Notes:
- Priority range: Different scheduling policies have different valid priority ranges; query via sched_get_priority_min(policy) and sched_get_priority_max(policy). Setting an out-of-range priority fails with EINVAL.
- Policy switch effects:
- Switching to a higher-priority policy may immediately preempt the current task
- Switching to a lower priority may cause the task to be preempted by others
- Policy changes do not affect task ready state; blocked tasks remain blocked
- SCHED_SPORADIC parameters: When using this policy, set in
param:sched_ss_low_priority: Low priority after budget is exhaustedsched_ss_repl_period: Replenish periodsched_ss_init_budget: Initial budgetsched_ss_max_repl: Maximum pending replenishments (<=SS_REPL_MAX)
- Real-time policies: SCHED_FIFO and SCHED_RR are real-time policies and typically require elevated privileges. Real-time tasks can affect system responsiveness; use with care.
- Policy inheritance: Child tasks (created via
fork()ortask_create()) typically inherit the parent's policy and priority unless specified at creation or modified later. - Typical usage:
struct sched_param param; param.sched_priority = 150; // Set high priority int old_policy = sched_setscheduler(0, SCHED_FIFO, ¶m); if (old_policy < 0) { perror("sched_setscheduler"); } else { printf("Changed from policy %d to SCHED_FIFO\n", old_policy); } - Querying current policy: Use
sched_getscheduler(pid)to query the task's current policy. - Modify priority only: To modify only the priority without changing the policy, use
sched_setparam(); it is more efficient and its semantics are clearer. - Atomicity: Policy and parameter modifications are atomic; no intermediate state is observable.
- Effect on running task: Modifying the currently running task (pid=0) causes the scheduler to reevaluate priorities immediately, which may preempt the task.
POSIX Compatibility: Compatible with the POSIX interface of the same name.
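The errno values documented above can be turned into short log hints after a failed call. The following is a minimal sketch; sched_error_hint() is a hypothetical helper introduced here for illustration and is not part of openvela.

```c
#include <errno.h>

/* Hypothetical helper (not an openvela API): map the errno values
 * documented for sched_setscheduler() to a short hint for logging. */
const char *sched_error_hint(int err)
{
    switch (err) {
    case EINVAL: return "invalid policy or priority out of range";
    case ESRCH:  return "no such task";
    case EPERM:  return "insufficient permission";
    case EFAULT: return "param points to invalid memory";
    default:     return "unexpected error";
    }
}
```

After a failed call, something like fprintf(stderr, "sched_setscheduler: %s\n", sched_error_hint(errno)) gives a more specific message than perror() alone.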
sched_getscheduler
int sched_getscheduler(pid_t pid);
Queries the current scheduling policy of the specified task. This lightweight query returns the task's scheduling policy type (such as SCHED_FIFO or SCHED_RR).
The scheduling policy determines the task's scheduling behavior. Understanding the current policy helps debugging and monitoring task real-time characteristics.
Parameters:
- pid: PID of the target task. The special value 0 queries the calling task. Must be a valid task PID.
Returns:
- Success: Returns the task's current scheduling policy (non-negative integer):
  - SCHED_FIFO (0): First-in, first-out real-time scheduling
  - SCHED_RR (1): Round-robin real-time scheduling
  - SCHED_SPORADIC (2): Sporadic scheduling (if supported)
  - SCHED_OTHER (3): Standard time-sharing scheduling
  - SCHED_BATCH (4): Batch scheduling (if supported)
  - SCHED_IDLE (5): Idle scheduling (if supported)
- Failure: Returns -1 and sets errno:
  - ESRCH: The specified task does not exist (invalid PID or already terminated)
  - EINVAL: pid is negative
Notes:
- Read-only query: Does not modify task state; a pure query with very low overhead.
- Use with others: Typically used with sched_getparam() to obtain complete scheduling info (policy + parameters).
- Typical usage:
  int policy = sched_getscheduler(0);  // Query own policy
  const char *policy_names[] = {"SCHED_FIFO", "SCHED_RR", "SCHED_SPORADIC", "SCHED_OTHER"};
  int known = (int)(sizeof(policy_names) / sizeof(policy_names[0]));
  if (policy < 0) {
      perror("sched_getscheduler");
  } else if (policy < known) {
      printf("Current policy: %s\n", policy_names[policy]);
  } else {
      printf("Current policy: %d\n", policy);  // e.g. SCHED_BATCH or SCHED_IDLE; avoids indexing past the array
  }
- Task diagnostics: When debugging real-time systems, use this function to verify that a task runs under the expected scheduling policy.
- Monitoring tools: System monitoring tools often use this function to display task scheduling policy to aid in analyzing scheduling behavior.
- Modifying policy: To modify the scheduling policy, use sched_setscheduler().
- Policy name mapping: Use a switch or array to map returned integer values to policy names for readability.
POSIX Compatibility: Compatible with the POSIX interface of the same name.
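The policy name mapping suggested in the notes can be written as a switch over the policy constants, which avoids depending on their numeric values. This is a sketch only; policy_name() is a hypothetical helper, and the #ifdef assumes SCHED_SPORADIC is a preprocessor macro where it exists.

```c
#include <sched.h>

/* Hypothetical helper (not an openvela API): translate a policy value
 * returned by sched_getscheduler() into a printable name. */
const char *policy_name(int policy)
{
    switch (policy) {
    case SCHED_FIFO:
        return "SCHED_FIFO";
    case SCHED_RR:
        return "SCHED_RR";
#ifdef SCHED_SPORADIC            /* only present on some systems */
    case SCHED_SPORADIC:
        return "SCHED_SPORADIC";
#endif
    case SCHED_OTHER:
        return "SCHED_OTHER";
    default:
        return "unknown";        /* also covers the -1 error return */
    }
}
```

A caller can then print policy_name(sched_getscheduler(0)) without a separate bounds check.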
sched_setparam
int sched_setparam(pid_t pid, const struct sched_param *param);
Modifies the scheduling parameters (primarily priority) of the specified task without changing the scheduling policy. This is the standard interface for adjusting task priority; it is lighter than sched_setscheduler() and states the intent more clearly.
Priority is the most important parameter in the scheduler; it determines execution order among tasks under the same policy. Dynamically adjusting priority is common in real-time systems (for example, implementing priority inheritance or priority ceiling protocols).
Parameters:
- pid: PID of the target task. The special value 0 modifies the calling task. Must be a valid task PID.
- param: Pointer to a struct sched_param containing the new scheduling parameters. Main fields:
  - sched_priority: New priority value (required); must be within the current policy's valid range
  - For SCHED_SPORADIC, includes sched_ss_low_priority, sched_ss_repl_period, sched_ss_init_budget, sched_ss_max_repl, and other sporadic server parameters
Returns:
- Success: Returns 0
- Failure: Returns -1 and sets errno:
  - EINVAL: Priority in param is out of range for the current policy, or the SCHED_SPORADIC parameters are invalid
  - ESRCH: The specified task does not exist (invalid PID or already terminated)
  - EPERM: Caller lacks permission to modify the target task's scheduling parameters (typically requires superuser or same user)
  - EFAULT: param points to invalid memory
Notes:
- Policy unchanged: Modifies scheduling parameters only; does not change the policy. To change both, use sched_setscheduler().
- Priority range: Each policy has its valid priority range; query via sched_get_priority_min() and sched_get_priority_max(). Out-of-range values return EINVAL.
- Takes effect immediately: Priority modifications take effect immediately; the scheduler reevaluates priorities:
  - If the new priority is higher, the task may preempt the current task immediately
  - If lower, the task may be preempted by higher-priority tasks
  - For blocked tasks, the new priority takes effect upon resumption
- Priority inheritance: When implementing priority inheritance for mutexes, this function is often used to temporarily raise a low-priority task's priority to avoid priority inversion.
- Typical usage:
  struct sched_param param;
  param.sched_priority = 100;  // Set new priority
  if (sched_setparam(0, &param) == 0) {
      printf("Priority changed to %d\n", param.sched_priority);
  } else {
      perror("sched_setparam");
  }
- Query first: Use sched_getparam() to get current parameters, then modify only the fields you need.
- Real-time tuning: In real-time systems, dynamically adjusting priorities based on actual execution can improve responsiveness and throughput.
- Permission requirement: Modifying another task's priority typically requires privilege. In embedded systems, where all tasks often run at the same privilege level, this restriction may be looser.
- Effect on running task: Modifying the currently running task's priority (pid=0) may trigger immediate rescheduling if a higher-priority task is ready.
- SCHED_SPORADIC parameters: For sporadic tasks, other fields in param (such as sched_ss_low_priority) can also be modified via this function, allowing dynamic adjustment of sporadic server behavior.
POSIX Compatibility: Compatible with the POSIX interface of the same name.
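The "query first, then modify" advice can be combined with a range check so that a relative priority change never leaves the valid range. The sketch below uses only the sched_* calls documented in this section; clamp_priority() and adjust_priority() are hypothetical names chosen for illustration.

```c
#include <sched.h>
#include <sys/types.h>

/* Clamp prio into the inclusive range [lo, hi]. */
int clamp_priority(int prio, int lo, int hi)
{
    if (prio < lo) {
        return lo;
    }
    if (prio > hi) {
        return hi;
    }
    return prio;
}

/* Hypothetical helper (not an openvela API): change a task's priority
 * by delta, keeping its policy and all other sched_param fields.
 * Returns 0 on success, -1 on failure with errno set by the failing call. */
int adjust_priority(pid_t pid, int delta)
{
    struct sched_param param;
    int policy = sched_getscheduler(pid);

    if (policy < 0 || sched_getparam(pid, &param) < 0) {
        return -1;
    }

    /* Read-modify-write: only sched_priority is touched. */
    param.sched_priority = clamp_priority(param.sched_priority + delta,
                                          sched_get_priority_min(policy),
                                          sched_get_priority_max(policy));
    return sched_setparam(pid, &param);
}
```

Clamping silently limits the request; a caller that prefers an explicit error could compare the clamped value with the requested one before applying it.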
sched_getparam
int sched_getparam(pid_t pid, struct sched_param *param);
Queries the current scheduling parameters (primarily priority) of the specified task. The standard interface to obtain task priority and other scheduling parameters, commonly used for monitoring, debugging, and dynamic scheduling decisions.
Scheduling parameters include the basic priority and policy-specific parameters (such as the replenish period for sporadic scheduling). Understanding these parameters helps explain task scheduling behavior.
Parameters:
- pid: PID of the target task. The special value 0 queries the calling task. Must be a valid task PID.
- param: Pointer to a struct sched_param that receives the query results. The function fills this structure:
  - sched_priority: Basic priority (always set)
  - For SCHED_SPORADIC, also includes sched_ss_low_priority, sched_ss_repl_period, sched_ss_init_budget, sched_ss_max_repl, and other sporadic server parameters
  - For other policies (SCHED_FIFO, SCHED_RR, SCHED_OTHER), typically only sched_priority is set
Returns:
- Success: Returns 0 and fills param with the task's scheduling parameters
- Failure: Returns -1 and sets errno:
  - ESRCH: The specified task does not exist (invalid PID or already terminated)
  - EINVAL: pid is negative
  - EFAULT: param points to invalid memory (NULL or unwritable)
Notes:
- Read-only query: Does not modify task state; lightweight.
- Full scheduling info: Typically used with sched_getscheduler() for complete scheduling info (policy + parameters):
  int policy = sched_getscheduler(0);
  struct sched_param param;
  sched_getparam(0, &param);
  printf("Policy: %d, Priority: %d\n", policy, param.sched_priority);
- Typical usage:
  struct sched_param param;
  if (sched_getparam(0, &param) == 0) {
      printf("Current priority: %d\n", param.sched_priority);
  } else {
      perror("sched_getparam");
  }
- Dynamic adjustment reference: Before dynamically adjusting priority, use this function to read current parameters, then modify specific fields, then apply with sched_setparam().
- Monitoring tools: System monitoring tools often use this function to show task priorities, aiding scheduling analysis and diagnosis of issues such as priority inversion.
- Sporadic parameters: For SCHED_SPORADIC tasks, the function returns the full sporadic server configuration (budget, replenish period, etc.).
- Parameter initialization: No need to initialize param before calling; the function fully overwrites its contents. Just ensure param points to valid memory.
- Task diagnostics: When debugging real-time systems, periodically query the priorities of key tasks to verify mechanisms like priority inheritance.
- Atomicity: The query is atomic; the returned parameters form a consistent snapshot without partial updates.
POSIX Compatibility: Compatible with the POSIX interface of the same name.
Scheduling Control
sched_yield
int sched_yield(void);
Voluntarily yields the CPU, putting the calling task back on the ready queue so the scheduler can select other tasks of equal or higher priority to run. This is a cooperative scheduling mechanism for fairness and responsiveness between tasks.
For SCHED_FIFO, the task is moved to the end of its priority queue; for SCHED_RR, the effect is similar to time slice expiration. This allows tasks of the same priority to get a chance to run.
Parameters:
None.
Returns:
- Success: Returns 0 (OK)
- Failure: Returns -1 and sets errno (in practice, the call does not fail)
Notes:
- Cooperative scheduling: Key to cooperative multitasking; allows tasks to yield the CPU voluntarily, improving overall system responsiveness.
- Does not lower priority: sched_yield() does not change the task's priority; it only temporarily yields execution. The task remains in the ready queue at the same priority.
- Policy-specific behavior:
  - SCHED_FIFO: Task moves to the end of its priority queue; if other tasks at the same priority are ready, they run first
  - SCHED_RR: Similar to SCHED_FIFO, and the time slice counter is reset
  - Single-task case: If no other equal- or higher-priority tasks are ready, the calling task continues immediately (yield has no effect)
- Typical usage:
  while (processing) {
      // Process a batch of data
      process_data_chunk();
      // Voluntarily yield CPU to other tasks
      sched_yield();
  }
- Improving responsiveness: Calling sched_yield() periodically in long-running loops gives other ready tasks of the same priority a chance to run. It does not help lower-priority tasks, which run only once the caller blocks or sleeps.
- Polling: Using sched_yield() in polling loops gives other equal-priority tasks more chances to run, although the caller still consumes CPU whenever it is scheduled:
  while (!flag_is_set()) {
      sched_yield();  // Let other ready tasks run between checks
  }
  A blocking wait (semaphore, condition variable) is usually the better solution.
- Difference from sleep:
  - sched_yield() does not guarantee the task is blocked; if no other ready task exists, it continues immediately
  - sleep()/usleep() puts the task to sleep for at least the given time, not consuming CPU during that period
- Real-time note: In real-time systems, overuse of sched_yield() can cause unpredictability; use sparingly. Prefer explicit synchronization (mutexes, condition variables, semaphores).
- Highest-priority task: If the calling task is the highest-priority task in the system, sched_yield() typically has no practical effect (returns immediately and continues).
- Time slice reset: For SCHED_RR, sched_yield() resets the time slice counter, effectively giving up the current slice.
- Scheduler check: Even if sched_yield() returns immediately, a context-switch check occurs, and the scheduler re-evaluates decisions.
POSIX Compatibility: Compatible with the POSIX interface of the same name.
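When polling with sched_yield() cannot be avoided, bounding the number of attempts keeps the loop from spinning indefinitely. The following is a sketch under stated assumptions: poll_with_yield() is a hypothetical helper, and it assumes a C11 toolchain with <stdatomic.h> so the flag can be set safely from another task.

```c
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical helper (not an openvela API): check flag up to
 * max_yields times, yielding the CPU between checks.
 * Returns 0 once the flag is non-zero, or -1 if the limit is reached,
 * so the caller can fall back to a blocking wait. */
int poll_with_yield(atomic_int *flag, int max_yields)
{
    for (int i = 0; i < max_yields; i++) {
        if (atomic_load(flag) != 0) {
            return 0;
        }
        sched_yield();  /* let equal-priority ready tasks run */
    }

    /* One final check after the last yield. */
    return atomic_load(flag) != 0 ? 0 : -1;
}
```

Returning -1 instead of looping forever leaves the choice of fallback (semaphore wait, sleep, error) to the caller.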
sched_get_priority_max
int sched_get_priority_max(int policy);
Queries the maximum (highest) priority allowed for the specified scheduling policy. Different policies have different priority ranges; this function returns the valid upper bound to ensure configured priorities fall within the legal range.
In openvela, larger numeric values indicate higher priority (same as Linux, but opposite some RTOSes). Knowing the priority range is essential for correctly configuring real-time tasks.
Parameters:
- policy: Scheduling policy. Valid values include:
  - SCHED_FIFO (0)
  - SCHED_RR (1)
  - SCHED_SPORADIC (2) (if supported)
  - SCHED_OTHER (3)
  - Other system-supported policies
Returns:
- Success: Returns the maximum priority value for the specified policy (positive integer, typically 255)
- Failure: Returns -1 and sets errno:
  - EINVAL: policy is invalid or unsupported
Notes:
- Use together with the min version: Typically used with sched_get_priority_min() to get the full priority range:
  int min_prio = sched_get_priority_min(SCHED_FIFO);
  int max_prio = sched_get_priority_max(SCHED_FIFO);
  printf("SCHED_FIFO priority range: %d to %d\n", min_prio, max_prio);
- openvela default range: In openvela, most policies have a priority range of 1 to 255:
  - 1: Lowest priority available to tasks (priority 0 is reserved for the idle task)
  - 255: Highest priority (urgent real-time tasks)
  - 100–200: Typical application task priority range
- Policy independence: In openvela, all real-time policies (SCHED_FIFO, SCHED_RR, SCHED_SPORADIC) usually share the same priority space, so the max value is the same.
- Parameter validation: Before calling sched_setparam() or sched_setscheduler(), use this function to validate the priority and avoid EINVAL:
  int new_priority = 200;
  if (new_priority <= sched_get_priority_max(SCHED_FIFO)) {
      struct sched_param param = {.sched_priority = new_priority};
      sched_setscheduler(0, SCHED_FIFO, &param);
  }
- Portability: Priority ranges differ between operating systems; using this function produces portable code without hardcoded priority values.
- Typical usage:
  int max = sched_get_priority_max(SCHED_FIFO);
  if (max < 0) {
      perror("sched_get_priority_max");
  } else {
      printf("Max priority for SCHED_FIFO: %d\n", max);
  }
- Real-time configuration: Key real-time tasks are usually configured near the max priority so they preempt other tasks.
- Priority layering: In complex systems, priorities can be partitioned into layers (system, driver, application), each using a distinct subrange.
POSIX Compatibility: Compatible with the POSIX interface of the same name.
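One way to avoid hardcoded priority values, and to carve the range into layers, is to express a priority as a percentage of the range reported for the policy. This is a sketch, not a recommended allocation scheme; scale_priority() and priority_at_percent() are hypothetical helper names.

```c
#include <sched.h>

/* Map percent (0..100, clamped) linearly onto the inclusive range [lo, hi]. */
int scale_priority(int lo, int hi, int percent)
{
    if (percent < 0) {
        percent = 0;
    }
    if (percent > 100) {
        percent = 100;
    }
    return lo + (hi - lo) * percent / 100;
}

/* Hypothetical helper (not an openvela API): pick a priority for policy
 * at the given percentage of its valid range.
 * Returns -1 if the range cannot be queried (for example, invalid policy). */
int priority_at_percent(int policy, int percent)
{
    int lo = sched_get_priority_min(policy);
    int hi = sched_get_priority_max(policy);

    if (lo < 0 || hi < 0) {
        return -1;
    }
    return scale_priority(lo, hi, percent);
}
```

With the 1 to 255 range described above, 50 percent maps to priority 128; on a system with a different range the same call still yields a valid value.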
sched_get_priority_min
int sched_get_priority_min(int policy);
Queries the minimum (lowest) priority allowed for the specified scheduling policy. Different policies have different priority ranges; this function returns the valid lower bound to ensure configured priorities fall within the legal range.
In openvela, smaller numeric values indicate lower priority. The minimum priority is typically used for background tasks; the idle task runs below it, at the reserved priority 0.
Parameters:
- policy: Scheduling policy (same values as sched_get_priority_max).
Returns:
- Success: Returns the minimum priority for the specified policy (positive integer, typically 1)
- Failure: Returns -1 and sets errno:
  - EINVAL: policy is invalid or unsupported
Notes:
- Use with the max version:
  int min_prio = sched_get_priority_min(SCHED_RR);
  int max_prio = sched_get_priority_max(SCHED_RR);
  printf("SCHED_RR priority range: [%d, %d]\n", min_prio, max_prio);
- openvela default value: In openvela, the minimum priority is typically 1 (priority 0 is reserved for the idle task).
- IDLE task: The system idle task runs at priority 0, below the minimum returned by this function, and only runs when no other tasks are ready.
- Background tasks: Low-priority background tasks (logging, statistics) typically use a near-minimum priority to avoid impacting foreground tasks.
- Parameter validation: Before setting priority, validate with this function:
  int new_priority = 5;
  if (new_priority >= sched_get_priority_min(SCHED_FIFO) &&
      new_priority <= sched_get_priority_max(SCHED_FIFO)) {
      struct sched_param param = {.sched_priority = new_priority};
      sched_setparam(0, &param);
  }
- Portability: Priority ranges differ across operating systems; use this function for portable code.
- Typical usage:
  int min = sched_get_priority_min(SCHED_FIFO);
  if (min < 0) {
      perror("sched_get_priority_min");
  } else {
      printf("Min priority for SCHED_FIFO: %d\n", min);
  }
- Priority assignment strategy: Avoid using the minimum priority (except for truly background tasks) so that the task is not starved of CPU time.
- Policy independence: In openvela, all real-time policies typically share the same priority space, so the minimum is the same.
- Debugging tool: These functions can power generic priority-checking tools to validate system configuration.
POSIX Compatibility: Compatible with the POSIX interface of the same name.
sched_rr_get_interval
int sched_rr_get_interval(pid_t pid, struct timespec *interval);
Queries the time slice length for a task using the SCHED_RR scheduling policy. The time slice is the maximum continuous runtime before the task is forced to yield under SCHED_RR.
This function helps understand the system's scheduling granularity and tune real-time application performance and responsiveness.
Parameters:
- pid: PID of the target task. The special value 0 queries the calling task. Must be a valid task PID.
- interval: Pointer to a struct timespec that receives the time slice length. The function fills this structure:
  - tv_sec: Seconds part (typically 0, since time slices are usually less than 1 second)
  - tv_nsec: Nanoseconds part (e.g., 10,000,000 ns = 10 ms)
Returns:
- Success: Returns 0 (OK) and fills interval
- Failure: Returns -1 and sets errno:
  - ESRCH: The specified task does not exist
  - EINVAL: pid is negative
  - EFAULT: interval points to invalid memory (NULL or unwritable)
  - ENOSYS: System does not support this feature (SCHED_RR not enabled)
Notes:
- SCHED_RR only: The time slice concept applies only to SCHED_RR. Under SCHED_FIFO, tasks run until they block or are preempted by a higher-priority task; there is no time slice.
- System-level configuration: Time slice length is typically a system-wide configuration (compile-time or startup). Querying different tasks typically returns the same value.
- Typical value: In openvela, the default time slice is usually 10 ms (10,000,000 ns), adjustable via configuration (such as CONFIG_RR_INTERVAL).
- Typical usage:
  struct timespec ts;
  if (sched_rr_get_interval(0, &ts) == 0) {
      long ms = ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
      printf("Time slice: %ld ms\n", ms);
  } else {
      perror("sched_rr_get_interval");
  }
- Performance tuning: Knowing the time slice helps tune task design:
  - If a task's critical operations approach the time slice, consider optimizing the algorithm or using SCHED_FIFO
  - If multiple same-priority SCHED_RR tasks need to share CPU fairly, ensure work units are smaller than the time slice
- Real-time analysis: The time slice is an important parameter in response-time analysis and affects worst-case response time.
- Slice exhaustion: When a SCHED_RR task's time slice is exhausted, the task moves to the end of its priority queue and the time slice is reset.
- sched_yield(): For SCHED_RR tasks, calling sched_yield() resets the time slice counter.
- Non-SCHED_RR tasks: Even if the task is not currently using SCHED_RR, this function typically returns the system default time slice, but the value is not meaningful for non-SCHED_RR tasks.
POSIX Compatibility: Compatible with the POSIX interface of the same name.
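The tuning advice above (keep work units smaller than the time slice) can be checked in code once the interval is converted to a single unit. A minimal sketch follows; timespec_to_us() and fits_in_slice() are hypothetical helpers, and the work-unit duration is assumed to be measured by the application.

```c
#include <stdint.h>
#include <time.h>

/* Convert a struct timespec to whole microseconds (fraction truncated). */
int64_t timespec_to_us(const struct timespec *ts)
{
    return (int64_t)ts->tv_sec * 1000000 + ts->tv_nsec / 1000;
}

/* Hypothetical helper (not an openvela API): return 1 if a work unit of
 * work_us microseconds fits within the time slice, 0 otherwise. */
int fits_in_slice(int64_t work_us, const struct timespec *slice)
{
    return work_us <= timespec_to_us(slice);
}
```

In use, fill a struct timespec with sched_rr_get_interval(0, &ts) and pass it to fits_in_slice() together with the measured duration of one work unit.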
sched_lock
void sched_lock(void);
Disables task scheduling (preemption). After calling, the current task cannot be preempted by other tasks of equal or higher priority until sched_unlock() is called to restore scheduling. Supports nested calls; each sched_lock() must be paired with a corresponding sched_unlock().
Parameters:
None.
Returns:
No return value.
Notes:
- Nested support: sched_lock() maintains a lock counter that increments on each call; sched_unlock() decrements it. Scheduling only truly resumes when the counter reaches zero.
- Does not affect interrupts: sched_lock() only disables task-level preemption, not interrupts. Interrupt handlers may still execute.
- Difference from disabling interrupts:
  - sched_lock(): Disables task switching; interrupts still respond
  - enter_critical_section(): Disables interrupts; stronger protection but longer latency
- Typical usage:
  sched_lock();
  // Critical section: not preempted by other tasks
  update_shared_data();
  sched_lock();    // Nested call
  do_more_work();
  sched_unlock();  // Counter decrements; still locked
  sched_unlock();  // Counter reaches 0; scheduling resumes
- SMP note: On multi-core systems, sched_lock() only protects scheduling on the current CPU; tasks on other CPUs still run. For cross-core protection, use spinlocks or other SMP synchronization.
- Avoid long holds: Disabling scheduling for long periods impacts real-time performance; keep critical sections short.
POSIX Compatibility: openvela/NuttX extension interface (non-POSIX standard).
sched_unlock
void sched_unlock(void);
Resumes task scheduling (preemption). Decrements the scheduling lock counter; when it reaches zero, normal scheduling resumes. Must be paired with sched_lock().
Parameters:
None.
Returns:
No return value.
Notes:
- Each sched_unlock() must correspond to exactly one earlier sched_lock(); do not call it more often than sched_lock().
- When the counter reaches zero, if a higher-priority task is ready, a context switch occurs immediately.
POSIX Compatibility: openvela/NuttX extension interface (non-POSIX standard).
sched_lockcount
int sched_lockcount(void);
Queries the current task's scheduling lock nesting count. The return value indicates how many sched_lock() calls have not yet been matched by a corresponding sched_unlock().
Parameters:
None.
Returns:
Returns the current task's scheduling lock count (non-negative integer). 0 means scheduling is not locked.
Notes:
- Mainly used for debugging to verify that sched_lock()/sched_unlock() are correctly paired.
- Typical usage:
  int count = sched_lockcount();
  if (count > 0) {
      printf("Scheduler locked, count=%d\n", count);
  }
POSIX Compatibility: openvela/NuttX extension interface (non-POSIX standard).
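The nesting rule shared by sched_lock(), sched_unlock(), and sched_lockcount() can be pictured with a plain counter. The code below is a toy model for illustration only: the model_* names are hypothetical, nothing here disables preemption, and it is not how openvela implements the scheduler lock. Ignoring an unmatched unlock is a choice made for the model, not documented behavior.

```c
/* Toy model only: a counter that mimics the documented nesting rules.
 * It does NOT disable preemption and is not openvela code. */
static int g_model_lockcount;

/* Models sched_lock(): each call adds one nesting level. */
void model_sched_lock(void)
{
    g_model_lockcount++;
}

/* Models sched_unlock(): removes one nesting level. An unmatched
 * unlock is ignored here (an assumption of this model). */
void model_sched_unlock(void)
{
    if (g_model_lockcount > 0) {
        g_model_lockcount--;
    }
}

/* Models sched_lockcount(): 0 means scheduling is not locked. */
int model_sched_lockcount(void)
{
    return g_model_lockcount;
}
```

Two lock calls followed by one unlock leave the count at 1, which corresponds to the "still locked" step in the nested usage example.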
CPU Affinity
sched_getcpu
int sched_getcpu(void);
Gets the CPU core number on which the calling task is currently running. Only meaningful in SMP (symmetric multiprocessing) systems; used to query task-to-CPU binding.
On multi-core systems, knowing which CPU a task runs on helps with performance analysis, debugging, and optimizing CPU affinity strategies.
Parameters:
None.
Returns:
- Success: Returns the current CPU number (non-negative integer, starting from 0)
  - 0: First CPU core
  - 1: Second CPU core
  - ...and so on
- Failure: Returns -1 and sets errno (typically does not fail):
  - ENOSYS: System does not support SMP (CONFIG_SMP not enabled)
Notes:
- SMP only: Valid only in multi-core systems with CONFIG_SMP. On single-core systems, typically returns 0.
- Instantaneous value: The returned CPU number is the value at query time. In preemptive multitasking, the task may migrate to another CPU immediately after the function returns.
- Typical usage:
  int cpu = sched_getcpu();
  if (cpu >= 0) {
      printf("Running on CPU %d\n", cpu);
  } else {
      perror("sched_getcpu");
  }
- Performance analysis: Recording which CPU a task runs on helps analyze CPU load distribution and migration frequency.
- Debugging: When debugging CPU affinity issues, periodically query the CPU number to verify a task runs on the expected CPU.
- Used with affinity: Typically combined with sched_setaffinity() and sched_getaffinity():
  cpu_set_t set;
  CPU_ZERO(&set);
  CPU_SET(2, &set);  // Bind to CPU 2
  sched_setaffinity(0, sizeof(set), &set);
  int cpu = sched_getcpu();
  assert(cpu == 2);  // Verify binding succeeded
- Not reliable for synchronization: Do not use this for synchronization; the return value may become stale before use.
- NUMA systems: On NUMA architectures, knowing which CPU a task is on helps optimize memory access patterns and reduce cross-node memory access.
- Hotspot analysis: If multiple tasks frequently run on the same CPU, this may indicate load imbalance, needing affinity or priority tuning.
- Interrupt context: This function can also be called in interrupt handlers to query on which CPU the interrupt is handled.
- Thread binding: In multi-threaded apps, different threads can be bound to different CPUs, then verified using this function.
POSIX Compatibility: Compatible with the Linux extension interface (non-POSIX standard, but widely supported).
sched_setaffinity
int sched_setaffinity(pid_t pid, size_t cpusetsize, const cpu_set_t *mask);
Sets the CPU affinity mask of a task, specifying the set of CPU cores on which the task may run. Valid only in SMP systems; requires CONFIG_SMP.
CPU affinity allows binding a task to specific CPU cores, useful for optimizing cache locality, reducing context-switch overhead, and isolating critical tasks.
Parameters:
- pid: PID of the target task. The special value 0 sets the calling task's affinity. Must be a valid task PID.
- cpusetsize: Size (in bytes) of the CPU set that mask points to. Typically sizeof(cpu_set_t). Allows future expansion to support more CPUs.
- mask: Pointer to the CPU affinity mask (cpu_set_t). Each bit corresponds to a CPU:
  - Bit value 1: Task allowed to run on that CPU
  - Bit value 0: Not allowed
  Manipulate the mask with CPU_ZERO(), CPU_SET(), CPU_CLR(), etc.
Returns:
- Success: Returns 0 (OK)
- Failure: Returns -1 and sets errno:
  - ESRCH: The specified task does not exist
  - EINVAL: The CPU set in mask is invalid (for example, all zeros or contains non-existent CPUs)
  - EFAULT: mask points to invalid memory (NULL or unreadable)
  - EPERM: Caller lacks permission to modify the target task's affinity (typically requires superuser or same user)
  - ENOSYS: System does not support SMP (CONFIG_SMP not enabled)
Notes:
- SMP only: Valid only in multi-core systems with CONFIG_SMP. Single-core systems typically return ENOSYS.
- Takes effect immediately: Affinity changes take effect immediately. If the task currently runs on a CPU not in the new mask, the scheduler migrates it to an allowed CPU.
- Typical usage:
  cpu_set_t set;
  CPU_ZERO(&set);    // Clear mask
  CPU_SET(0, &set);  // Allow CPU 0
  CPU_SET(1, &set);  // Allow CPU 1
  if (sched_setaffinity(0, sizeof(set), &set) == 0) {
      printf("Affinity set to CPU 0 and 1\n");
  } else {
      perror("sched_setaffinity");
  }
- Bind to a single CPU:
  cpu_set_t set;
  CPU_ZERO(&set);
  CPU_SET(2, &set);  // Only allow CPU 2
  sched_setaffinity(0, sizeof(set), &set);
- Performance optimization:
  - Cache locality: Binding to a specific CPU improves cache hit rate and reduces cache invalidation
  - Less migration overhead: Avoids frequent task migration between CPUs and reduces context-switch cost
  - Load isolation: Binding critical real-time tasks to dedicated CPUs avoids interference from other tasks
- NUMA systems: On NUMA architectures, bind tasks to CPUs near their accessed memory to reduce cross-node latency.
- Permission: Modifying another task's affinity typically requires privileges. In embedded systems, this restriction may be looser.
- Child inheritance: Children (created via fork() or task_create()) typically inherit the parent's CPU affinity.
- Used with getaffinity: Typically combined with sched_getaffinity() to read, modify, then set back:
  cpu_set_t set;
  sched_getaffinity(0, sizeof(set), &set);
  CPU_CLR(3, &set);  // Remove CPU 3
  sched_setaffinity(0, sizeof(set), &set);
- Verify the setting: After setting, use sched_getcpu() to verify the task runs on the expected CPU.
- Pitfalls:
  - If mask is all zeros (no CPU allowed), returns EINVAL
  - If the mask specifies a CPU outside system range (e.g., CPU 5 on a 4-core system), returns EINVAL
  - Over-binding may cause load imbalance with some CPUs overloaded and others idle
- Dynamic adjustment: Adjusting affinity at runtime based on load enables flexible load balancing.
POSIX Compatibility: Compatible with the Linux extension interface (non-POSIX standard, but widely supported).
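The two EINVAL pitfalls (empty mask, non-existent CPU) can be caught before calling sched_setaffinity() by building the mask through a checking helper. This is a sketch under assumptions: build_cpu_mask() is a hypothetical name, the CPU count is supplied by the caller (for example from the board's SMP configuration), and the _GNU_SOURCE definition is only needed when compiling against Linux/glibc.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  /* assumption: needed on Linux/glibc for cpu_set_t macros */
#endif
#include <sched.h>
#include <stddef.h>

/* Hypothetical helper (not an openvela API): build an affinity mask
 * from a list of CPU indices.
 * Returns 0 on success, or -1 if the list is empty or any index is
 * outside [0, ncpus). */
int build_cpu_mask(const int *cpus, size_t count, int ncpus, cpu_set_t *out)
{
    if (count == 0) {
        return -1;  /* an all-zero mask would be rejected with EINVAL */
    }

    CPU_ZERO(out);
    for (size_t i = 0; i < count; i++) {
        if (cpus[i] < 0 || cpus[i] >= ncpus) {
            return -1;  /* CPU does not exist on this system */
        }
        CPU_SET(cpus[i], out);
    }
    return 0;
}
```

On success the mask can be passed straight to sched_setaffinity(0, sizeof(cpu_set_t), &mask); on failure the caller can report the bad index without making the system call.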
sched_getaffinity
int sched_getaffinity(pid_t pid, size_t cpusetsize, cpu_set_t *mask);
Queries the CPU affinity mask of a task. Valid only in SMP systems; requires CONFIG_SMP.
Querying CPU affinity helps understand the task's CPU binding strategy for monitoring, debugging, and dynamically adjusting affinity.
Parameters:
- pid: PID of the target task. The special value 0 queries the calling task. Must be a valid task PID.
- cpusetsize: Size (in bytes) of the CPU set that mask points to. Typically sizeof(cpu_set_t).
- mask: Pointer to the CPU affinity mask that receives the query results. The function fills this cpu_set_t, where:
  - Bit value 1: Task allowed to run on that CPU
  - Bit value 0: Not allowed
Returns:
- Success: Returns 0 (OK) and fills mask
- Failure: Returns -1 and sets errno:
  - ESRCH: The specified task does not exist
  - EINVAL: cpusetsize is too small to hold the system's CPU count
  - EFAULT: mask points to invalid memory (NULL or unwritable)
  - ENOSYS: System does not support SMP
Notes:
- SMP only: Valid only in multi-core systems with CONFIG_SMP. Single-core systems typically return ENOSYS.
- Read-only query: Does not modify task state; lightweight.
- Typical usage:
  cpu_set_t set;
  if (sched_getaffinity(0, sizeof(set), &set) == 0) {
      printf("Task can run on CPUs: ");
      for (int i = 0; i < CPU_SETSIZE; i++) {
          if (CPU_ISSET(i, &set)) {
              printf("%d ", i);
          }
      }
      printf("\n");
  } else {
      perror("sched_getaffinity");
  }
- Check a specific CPU:
  cpu_set_t set;
  sched_getaffinity(0, sizeof(set), &set);
  if (CPU_ISSET(2, &set)) {
      printf("Task can run on CPU 2\n");
  }
- Query before modify: When modifying affinity, typically query first and then modify relative to the current value:
  cpu_set_t set;
  sched_getaffinity(0, sizeof(set), &set);
  CPU_CLR(1, &set);  // Remove CPU 1
  sched_setaffinity(0, sizeof(set), &set);
- Monitoring tools: System monitoring tools often use this function to show task CPU binding and help analyze load distribution.
- Debugging affinity issues: If a task's performance is anomalous, query affinity to check whether it is bound to an overloaded CPU.
- Count CPUs:
  cpu_set_t set;
  sched_getaffinity(0, sizeof(set), &set);
  int count = CPU_COUNT(&set);
  printf("Task can run on %d CPUs\n", count);
- Default affinity: Newly created tasks default to all CPUs allowed (unless the parent has restrictions). All CPU bits will be 1.
- CPU_SETSIZE constant: CPU_SETSIZE defines the maximum CPU count supported by cpu_set_t (typically 1024), but the actual number of system CPUs may be smaller.
- CPU set macros:
  - CPU_ZERO(&set): Clear the set
  - CPU_SET(cpu, &set): Add a CPU
  - CPU_CLR(cpu, &set): Remove a CPU
  - CPU_ISSET(cpu, &set): Check whether a CPU is in the set
  - CPU_COUNT(&set): Count CPUs in the set
  - CPU_EQUAL(&set1, &set2): Compare two sets for equality
- Portability: cpu_set_t sizes differ across systems; always use sizeof(cpu_set_t) rather than hardcoding.
- Used with getcpu:
  int cpu = sched_getcpu();
  cpu_set_t set;
  sched_getaffinity(0, sizeof(set), &set);
  assert(CPU_ISSET(cpu, &set));  // Current CPU should be in the affinity set
POSIX Compatibility: Compatible with the Linux extension interface (non-POSIX standard, but widely supported).
sched_cpucount
int sched_cpucount(const cpu_set_t *set);
Counts the number of CPUs in a CPU set. Equivalent to Linux's CPU_COUNT() macro.
Parameters:
- set: Pointer to a CPU set.
Returns:
Returns the number of CPUs contained in the set.
Notes:
- On non-SMP systems, sched_cpucount() is defined as a macro that always returns 1.
- Typical usage:
  cpu_set_t set;
  sched_getaffinity(0, sizeof(set), &set);
  printf("Can run on %d CPUs\n", sched_cpucount(&set));
POSIX Compatibility: Compatible with the Linux extension interface (non-POSIX standard).
Debugging and Diagnostics
sched_backtrace
int sched_backtrace(pid_t tid, void **buffer, int size, int skip);
Gets the call stack backtrace of the specified task. Stack frame addresses are stored into the buffer array for debugging and crash analysis.
Parameters:
- tid: PID of the target task. 0 means the current task.
- buffer: Pointer to an array of pointers that stores stack frame addresses.
- size: Maximum capacity of buffer (number of elements).
- skip: Number of stack frames to skip (from the top), used to filter the debug framework's own frames.
Returns:
Returns the number of frames actually obtained (non-negative). If the returned value equals size, more frames may exist.
Notes:
- Requires CONFIG_SCHED_BACKTRACE to be enabled. When disabled, sched_backtrace() is a macro that expands to 0.
- When getting another task's call stack, the target task should be blocked; otherwise results may be inaccurate.
POSIX Compatibility: openvela/NuttX extension interface (non-POSIX standard).
sched_dumpstack
void sched_dumpstack(pid_t tid);
Prints the call stack of the specified task to the system log. Internally calls sched_backtrace() to get frames and then formats the output.
Parameters:
- tid: PID of the target task. 0 means the current task.
Returns:
No return value.
Notes:
- Mainly used for debugging and crash analysis; output goes to the system log (syslog).
- Requires CONFIG_SCHED_BACKTRACE to be enabled.
POSIX Compatibility: openvela/NuttX extension interface (non-POSIX standard).