#include <sys/prctl.h>

int prctl(int option, unsigned long arg2, unsigned long arg3,
          unsigned long arg4, unsigned long arg5);
If the capability specified in arg2 is not valid, then the call fails with the error EINVAL.
The call fails with the error: EPERM if the calling thread does not have the CAP_SETPCAP capability; EINVAL if arg2 does not represent a valid capability; or EINVAL if file capabilities are not enabled in the kernel, in which case bounding sets are not supported.
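The error cases above can be observed with a small sketch. It assumes that the two preceding paragraphs describe the PR_CAPBSET_READ and PR_CAPBSET_DROP operations; the helper names cap_in_bounding_set() and drop_from_bounding_set() are hypothetical and exist only for this illustration.

```c
#include <errno.h>
#include <linux/capability.h>   /* CAP_* constants */
#include <sys/prctl.h>

/* Hypothetical helper: returns 1 if `cap` is in the calling thread's
 * capability bounding set, 0 if it is not, or -1 with errno set
 * (EINVAL for an invalid capability number). */
int cap_in_bounding_set(unsigned long cap)
{
    return prctl(PR_CAPBSET_READ, cap, 0UL, 0UL, 0UL);
}

/* Hypothetical helper: tries to drop `cap` from the bounding set.
 * Returns 0 on success or a negated errno value, for example -EPERM
 * when the caller lacks CAP_SETPCAP or -EINVAL when `cap` is invalid. */
int drop_from_bounding_set(unsigned long cap)
{
    if (prctl(PR_CAPBSET_DROP, cap, 0UL, 0UL, 0UL) != 0)
        return -errno;
    return 0;
}
```

A caller would typically test cap_in_bounding_set(CAP_SETPCAP) first and treat a negative result from drop_from_bounding_set() as a reason to stop rather than continue with more privilege than intended.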
For more information, see the kernel source file Documentation/prctl/no_new_privs.txt.
For further information, see the kernel source file Documentation/security/Yama.txt.
The seccomp mode is selected via arg2. (The seccomp constants are defined in <linux/seccomp.h>.)
With arg2 set to SECCOMP_MODE_STRICT, the only system calls that the thread is permitted to make are read(2), write(2), _exit(2), and sigreturn(2). Other system calls result in the delivery of a SIGKILL signal. Strict secure computing mode is useful for number-crunching applications that may need to execute untrusted byte code, perhaps obtained by reading from a pipe or socket. This operation is available only if the kernel is configured with CONFIG_SECCOMP enabled.
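The following is a minimal sketch of strict mode, not a definitive recipe. It enters strict mode in a forked child so that the caller is unaffected. The function name run_in_strict_mode() and its return convention are hypothetical. The child terminates with the raw exit system call via syscall(2), on the assumption that the C library's _exit() wrapper may invoke exit_group(2), which strict mode does not permit.

```c
#include <signal.h>             /* SIGKILL, for callers checking the result */
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper. Returns 0 if the child ran to completion, the
 * signal number if the child was killed by a signal, -2 if strict mode
 * could not be enabled, and -1 on any other failure. */
int run_in_strict_mode(int make_forbidden_call)
{
    static const char msg[] = "still running in strict mode\n";
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (prctl(PR_SET_SECCOMP, (unsigned long)SECCOMP_MODE_STRICT,
                  0UL, 0UL, 0UL) != 0)
            _exit(2);                   /* strict mode is unavailable */
        if (make_forbidden_call)
            syscall(SYS_getpid);        /* not permitted: SIGKILL expected */
        ssize_t n = write(STDOUT_FILENO, msg, sizeof msg - 1);  /* permitted */
        (void)n;
        syscall(SYS_exit, 0);           /* raw exit, see the note above */
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (WIFSIGNALED(status))
        return WTERMSIG(status);
    if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
        return 0;
    if (WIFEXITED(status) && WEXITSTATUS(status) == 2)
        return -2;
    return -1;
}
```

On a kernel with CONFIG_SECCOMP, run_in_strict_mode(0) would be expected to return 0 and run_in_strict_mode(1) to return SIGKILL.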
With arg2 set to SECCOMP_MODE_FILTER (since Linux 3.5), the system calls allowed are defined by a pointer to a Berkeley Packet Filter passed in arg3. This argument is a pointer to struct sock_fprog; it can be designed to filter arbitrary system calls and system call arguments. This mode is available only if the kernel is configured with CONFIG_SECCOMP_FILTER enabled.
If SECCOMP_MODE_FILTER filters permit fork(2), then the seccomp mode is inherited by children created by fork(2); if execve(2) is permitted, then the seccomp mode is preserved across execve(2). If the filters permit prctl() calls, then additional filters can be added; they are run in order until the first non-allow result is seen.
For further information, see the kernel source file Documentation/prctl/seccomp_filter.txt.
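As an illustration of filter mode, the sketch below installs a small BPF program in a forked child. The program makes getppid(2) fail with EPERM and allows every other system call. The names install_filter() and filter_demo() are hypothetical. The sketch sets PR_SET_NO_NEW_PRIVS first, on the assumption that the caller is unprivileged, and it omits the check of the arch field of struct seccomp_data that a real filter should perform before trusting the system call number.

```c
#include <errno.h>
#include <stddef.h>
#include <linux/filter.h>       /* struct sock_filter, struct sock_fprog */
#include <linux/seccomp.h>      /* struct seccomp_data, SECCOMP_RET_* */
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: install the filter in the calling thread.
 * Returns 0 on success, -1 with errno set on failure. */
static int install_filter(void)
{
    struct sock_filter insns[] = {
        /* Load the system call number into the accumulator. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        /* getppid: fall through to the ERRNO return; otherwise skip it. */
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SYS_getppid, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA)),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof insns / sizeof insns[0]),
        .filter = insns,
    };

    if (prctl(PR_SET_NO_NEW_PRIVS, 1UL, 0UL, 0UL, 0UL) != 0)
        return -1;
    return prctl(PR_SET_SECCOMP, (unsigned long)SECCOMP_MODE_FILTER,
                 &prog, 0UL, 0UL);
}

/* Hypothetical helper. Returns 0 if the filtered call failed with EPERM,
 * 1 if the filter was installed but did not block the call, 2 if the
 * filter could not be installed, and -1 on fork or wait failure. */
int filter_demo(void)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (install_filter() != 0)
            _exit(2);
        errno = 0;
        long r = syscall(SYS_getppid);
        _exit(r == -1 && errno == EPERM ? 0 : 1);
    }
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Because this filter allows prctl(), the child could add further filters after the first one; running the demonstration in a child keeps the restriction out of the calling process.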
Since Linux 3.8, the Seccomp field of the /proc/[pid]/status file provides a method of obtaining the same information, without the risk that the process is killed; see proc(5).
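Reading that field might look like the following sketch. The function name seccomp_mode_from_proc() is hypothetical, and the code assumes that the field appears on a line of the form "Seccomp:" followed by whitespace and a decimal number.

```c
#include <stdio.h>

/* Hypothetical helper: returns the value of the Seccomp field for the
 * calling process, or -1 if the file or the field is not available. */
int seccomp_mode_from_proc(void)
{
    char line[256];
    int mode = -1;
    FILE *f = fopen("/proc/self/status", "r");

    if (f == NULL)
        return -1;
    while (fgets(line, sizeof line, f) != NULL) {
        /* Whitespace in the format matches the tab after the colon. */
        if (sscanf(line, "Seccomp: %d", &mode) == 1)
            break;
    }
    fclose(f);
    return mode;
}
```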
The timer expirations affected by timer slack are those set by select(2), pselect(2), poll(2), ppoll(2), epoll_wait(2), epoll_pwait(2), clock_nanosleep(2), nanosleep(2), and futex(2) (and thus the library functions implemented via futexes, including pthread_cond_timedwait(3), pthread_mutex_timedlock(3), pthread_rwlock_timedrdlock(3), pthread_rwlock_timedwrlock(3), and sem_timedwait(3)).
Timer slack is not applied to threads that are scheduled under a real-time scheduling policy (see sched_setscheduler(2)).
Each thread has two associated timer slack values: a "default" value, and a "current" value. The current value is the one that governs grouping of timer expirations. When a new thread is created, the two timer slack values are made the same as the current value of the creating thread. Thereafter, a thread can adjust its current timer slack value via PR_SET_TIMERSLACK (the default value can’t be changed). The timer slack values of init (PID 1), the ancestor of all processes, are 50,000 nanoseconds (50 microseconds). The timer slack values are preserved across execve(2).
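A short sketch of adjusting the current value follows. The wrapper names are hypothetical. The code assumes that PR_GET_TIMERSLACK returns the current value as the function result, and that passing 0 to PR_SET_TIMERSLACK resets the current value to the thread's default.

```c
#include <sys/prctl.h>

/* Hypothetical wrapper: current timer slack of the calling thread in
 * nanoseconds, or -1 with errno set on failure. */
long get_timer_slack_ns(void)
{
    return prctl(PR_GET_TIMERSLACK, 0UL, 0UL, 0UL, 0UL);
}

/* Hypothetical wrapper: set the current timer slack in nanoseconds.
 * A value of 0 is assumed to restore the thread's default value.
 * Returns 0 on success, -1 with errno set on failure. */
int set_timer_slack_ns(unsigned long ns)
{
    return prctl(PR_SET_TIMERSLACK, ns, 0UL, 0UL, 0UL);
}
```

For example, a thread that wants its timeouts grouped more coarsely could call set_timer_slack_ns(100000UL) and later call set_timer_slack_ns(0UL) to return to its default.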
The following options are available since Linux 3.5.
MPX is a hardware-assisted mechanism for performing bounds checking on pointers. It consists of a set of registers storing bounds information and a set of special instruction prefixes that tell the CPU on which instructions it should do bounds enforcement. There is a limited number of these registers and when there are more pointers than registers, their contents must be "spilled" into a set of tables. These tables are called "bounds tables" and the MPX prctl() operations control whether the kernel manages their allocation and freeing.
When management is enabled, the kernel will take over allocation and freeing of the bounds tables. It does this by trapping the #BR exceptions that result at first use of missing bounds tables and instead of delivering the exception to user space, it allocates the table and populates the bounds directory with the location of the new table. For freeing, the kernel checks to see if bounds tables are present for memory which is not allocated, and frees them if so.
Before enabling MPX management using PR_MPX_ENABLE_MANAGEMENT, the application must first have allocated a user-space buffer for the bounds directory and placed the location of that directory in the bndcfgu register.
These calls will fail if the CPU or kernel does not support MPX. Kernel support for MPX is enabled via the CONFIG_X86_INTEL_MPX configuration option. You can check whether the CPU supports MPX by looking for the 'mpx' CPUID bit, for example with the following command:
    cat /proc/cpuinfo | grep ' mpx '
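The same check can be sketched in C for programs that want to test the flag before calling prctl(). The function names line_has_flag() and cpu_has_flag() are hypothetical, and the code assumes that the flags appear as space-separated words on a line beginning with "flags". Unlike the grep pattern above, it also matches a flag that is the last word on the line.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 if `flag` occurs in `line` as a whole
 * whitespace-delimited word, 0 otherwise. */
int line_has_flag(const char *line, const char *flag)
{
    size_t n = strlen(flag);
    const char *p = line;

    if (n == 0)
        return 0;
    while ((p = strstr(p, flag)) != NULL) {
        int start_ok = (p == line) || p[-1] == ' ' || p[-1] == '\t';
        int end_ok = p[n] == '\0' || p[n] == ' ' || p[n] == '\n';
        if (start_ok && end_ok)
            return 1;
        p++;                            /* keep searching after this hit */
    }
    return 0;
}

/* Hypothetical helper: returns 1 if any "flags" line in /proc/cpuinfo
 * lists `flag`, 0 if none does, -1 if the file cannot be opened. */
int cpu_has_flag(const char *flag)
{
    char line[8192];
    int found = 0;
    FILE *f = fopen("/proc/cpuinfo", "r");

    if (f == NULL)
        return -1;
    while (!found && fgets(line, sizeof line, f) != NULL) {
        if (strncmp(line, "flags", 5) == 0)
            found = line_has_flag(line, flag);
    }
    fclose(f);
    return found;
}
```

A program would call cpu_has_flag("mpx") and skip the MPX operations when the result is not 1.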
A thread may not switch in or out of long (64-bit) mode while MPX is enabled.
All threads in a process are affected by these calls.
The child of a fork(2) inherits the state of MPX management. During execve(2), MPX management is reset to a state as if PR_MPX_DISABLE_MANAGEMENT had been called.
For further information on Intel MPX, see the kernel source file Documentation/x86/intel_mpx.txt.
ptrdiff_t prctl(int option, int arg2, int arg3);
and options to get the maximum number of processes per user, get the maximum number of processors the calling process can use, find out whether a specified process is currently blocked, get or set the maximum stack size, and so on.