IPC · Section 2
futex(2)
Fast userspace mutex — the kernel primitive behind every pthread mutex, condvar, and modern lock on Linux.
Signature
#include <linux/futex.h>
#include <sys/syscall.h>
long futex(uint32_t * uaddr, int futex_op, uint32_t val, const struct timespec * timeout, uint32_t * uaddr2, uint32_t val3);
- uaddr
- Pointer to the futex word (uint32_t) in the calling process's userspace memory.
- futex_op
- Operation code (see flags table). The bottom 7 bits select the operation; high bits carry FUTEX_PRIVATE_FLAG / FUTEX_CLOCK_REALTIME.
- val
- Operation-specific value. For FUTEX_WAIT it is the value the futex word at uaddr must still hold for the wait to take effect; for FUTEX_WAKE it is the maximum number of waiters to wake.
- timeout
- For WAIT-family operations, a relative or absolute timeout. NULL means wait forever.
- uaddr2
- For requeue operations (FUTEX_REQUEUE, FUTEX_CMP_REQUEUE), the secondary futex address. Otherwise unused.
- val3
- Operation-specific. For FUTEX_WAIT_BITSET / FUTEX_WAKE_BITSET it carries the bit mask for selective wakeup.
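The split of futex_op into an operation number and flag bits can be sketched as follows. This is only an illustration: the struct and function names are hypothetical helpers, while FUTEX_CMD_MASK, FUTEX_PRIVATE_FLAG and FUTEX_CLOCK_REALTIME come from <linux/futex.h>.

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <stdbool.h>

/* Hypothetical helper type, not part of any kernel or libc API. */
struct futex_op_parts {
    int  cmd;            /* operation number, e.g. FUTEX_WAIT (0) */
    bool is_private;     /* FUTEX_PRIVATE_FLAG (128) was set */
    bool clock_realtime; /* FUTEX_CLOCK_REALTIME (256) was set */
};

/* Split a futex_op argument into its operation and flag bits. */
static struct futex_op_parts decode_futex_op(int futex_op)
{
    struct futex_op_parts p;

    /* FUTEX_CMD_MASK clears the two flag bits and keeps the operation. */
    p.cmd = futex_op & FUTEX_CMD_MASK;
    p.is_private = (futex_op & FUTEX_PRIVATE_FLAG) != 0;
    p.clock_realtime = (futex_op & FUTEX_CLOCK_REALTIME) != 0;
    return p;
}
```

Decoding the op seen in strace output as FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME this way yields operation 9 with both flags set.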
Description
futex() is the building block for every userspace synchronisation primitive on Linux. The idea: the lock state (a uint32_t at uaddr) is read and modified by the application from userspace using atomic instructions; the kernel is only invoked for the slow path — when a thread must sleep waiting for the lock, or wake another thread. FUTEX_WAIT atomically checks that *uaddr still equals val and, if so, blocks the calling thread; FUTEX_WAKE wakes up to val threads waiting on uaddr. FUTEX_PRIVATE_FLAG marks a futex as process-private (the kernel can skip the inter-process address-translation overhead), which all modern pthread implementations use by default. Many operations exist (REQUEUE for condvar broadcast, LOCK_PI / UNLOCK_PI for priority-inheritance mutexes, WAIT_BITSET for selective wakeups, FUTEX_WAKE_OP for atomic compound operations); see futex(2) for the full set.
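The fast-path / slow-path split described above can be sketched with a small three-state lock. This is a minimal illustration under stated assumptions, not how glibc implements pthread_mutex_lock(): the demo_mutex type, its functions and the futex() wrapper are hypothetical names, and the wrapper goes through syscall(2) because glibc provides no futex() function.

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Thin wrapper for the two operations used here (timeout, uaddr2, val3 unused). */
static long futex(_Atomic uint32_t *uaddr, int op, uint32_t val)
{
    return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
}

/* Hypothetical lock: 0 = unlocked, 1 = locked, 2 = locked with possible waiters. */
typedef struct { _Atomic uint32_t state; } demo_mutex;

static void demo_mutex_lock(demo_mutex *m)
{
    uint32_t c = 0;

    /* Fast path: uncontended acquire with one atomic instruction, no syscall. */
    if (atomic_compare_exchange_strong(&m->state, &c, 1))
        return;

    /* Slow path: mark the lock contended, then sleep while it stays at 2. */
    if (c != 2)
        c = atomic_exchange(&m->state, 2);
    while (c != 0) {
        /* The kernel only blocks if the word still holds 2. */
        futex(&m->state, FUTEX_WAIT_PRIVATE, 2);
        c = atomic_exchange(&m->state, 2);
    }
}

static void demo_mutex_unlock(demo_mutex *m)
{
    /* Enter the kernel only if some thread may be sleeping on the word. */
    if (atomic_exchange(&m->state, 0) == 2)
        futex(&m->state, FUTEX_WAKE_PRIVATE, 1);
}
```

A lock/unlock pair on an uncontended demo_mutex never reaches syscall(); only a thread that finds the lock held calls FUTEX_WAIT, and only an unlock that sees state 2 calls FUTEX_WAKE.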
Architecture mapping
| Architecture | Number | ABI | Entry point |
|---|---|---|---|
| x86 (i386) | 240 | i386 | sys_futex_time32 |
| x64 (x86_64) | 202 | common | sys_futex |
| ARM64 (aarch64) | 98 | — | — |
Kernel history
Introduced in Linux 2.5.7.
2.5.7
futex() was added in development kernel 2.5.7 and first reached a stable release in 2.6, offering a lighter-weight alternative to the older semaphore system calls. It underpins most locking in modern Linux userspace, because an uncontended lock or unlock needs no system call at all.
2.6.17
Robust futex support added: set_robust_list / get_robust_list let a thread register a per-thread list of mutexes it holds. If the thread dies (or its process is killed) mid-section, the kernel walks the list and marks each as owner-died, letting other waiters detect and recover.
2.6.18
PI mutex operations (FUTEX_LOCK_PI / FUTEX_UNLOCK_PI / FUTEX_TRYLOCK_PI) were added to support real-time threads: the kernel boosts a lock holder's scheduling priority when a higher-priority waiter blocks, avoiding unbounded priority inversion.
5.16
futex_waitv() landed: wait on multiple futexes simultaneously and wake on the first ready one — used by emulators and game engines for efficient multi-event waits without the FUTEX_REQUEUE acrobatics.
seccomp & containers
Docker default profile
Allowed
Podman default profile
Allowed
futex() is on every default profile and cannot practically be blocked, because almost any pthread call that contends or sleeps uses it. The newer futex_waitv() (Linux 5.16) is less widely used, but the emulators and game engines that rely on it need it allowed as well. Argument-level filtering is not useful: the op codes and addresses are application-internal.
libseccomp
seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(futex), 0);
seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(futex_waitv), 0);
strace example
$ strace -e futex -p $(pidof firefox) 2>&1 | head -5
futex(0x7f3a14002910, FUTEX_WAIT_PRIVATE, 2, NULL) = 0
futex(0x7f3a14002914, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x55d4e8c1d0a0, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, {tv_sec=…}, FUTEX_BITSET_MATCH_ANY) = -1 ETIMEDOUT
futex() lines dominate any strace of a multithreaded program; use -c summary mode to see whether contention dominates time. A 'stuck' program often shows a thread in FUTEX_WAIT with no matching FUTEX_WAKE. Comparing the uaddr against /proc/<pid>/maps and the holding thread's stack (via gdb) helps identify the deadlock. Filter with -e trace=futex if you specifically want to see the contention pattern.
Security & observability
futex() is not generally a security primitive: it is so universal that its absence is more interesting than its presence. That said, kernel CVEs in the futex implementation have been impactful historically (CVE-2014-3153, a flaw in the futex requeue path, was used to gain root on many Android devices), so kernel-version posture matters. The eBPF tracepoint sys_enter_futex is too noisy for general monitoring, since busy multithreaded workloads can emit thousands of calls per second. For pthread debugging, a userspace approach (glibc's mutex debug mode, or rr) is usually more productive than syscall tracing.
Errors
- EACCES
- —
- EAGAIN
- FUTEX_WAIT: *uaddr was not equal to val when the kernel checked, typically because another thread changed the futex word first. Userspace should re-read the word and retry its fast path.
- EFAULT
- uaddr is not in the calling process's address space.
- EINTR
- Wait was interrupted by a signal. Userspace retries.
- EINVAL
- Bad operation, misaligned uaddr (must be 4-byte aligned), or invalid combination of flags.
- ENOSYS
- —
- ETIMEDOUT
- The timeout of a FUTEX_WAIT elapsed before the thread was woken.
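A wait loop that handles the errors above might look like the following sketch. The helper name wait_while_equal is hypothetical, and the code assumes a 64-bit Linux target where struct timespec matches what SYS_futex expects. Note that retrying after EINTR restarts the full relative timeout, which a production implementation would likely want to correct for.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: block while *uaddr == expected.
 * Returns 0 if woken or if the word already held a different value,
 * and -1 with errno set (ETIMEDOUT, EFAULT, EINVAL, ...) otherwise.
 * A return of 0 is only a hint: the caller must re-check the word.
 */
static int wait_while_equal(uint32_t *uaddr, uint32_t expected,
                            const struct timespec *rel_timeout)
{
    for (;;) {
        long rc = syscall(SYS_futex, uaddr, FUTEX_WAIT_PRIVATE,
                          expected, rel_timeout, NULL, 0);
        if (rc == 0)
            return 0;       /* woken, possibly spuriously */
        if (errno == EAGAIN)
            return 0;       /* word changed before the kernel could sleep */
        if (errno == EINTR)
            continue;       /* interrupted by a signal: wait again */
        return -1;          /* ETIMEDOUT or a genuine error */
    }
}
```

Calling it with a NULL timeout waits indefinitely; passing a short relative timeout on a word nobody wakes returns -1 with errno set to ETIMEDOUT.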
Flags
- FUTEX_WAIT
- 0
- Atomically check *uaddr == val and, if so, sleep until woken (or until timeout). The lock-contention path of pthread_mutex_lock().
- FUTEX_WAKE
- 1
- Wake up to val threads currently waiting on uaddr. The wake-up path of pthread_mutex_unlock().
- FUTEX_REQUEUE
- 3
- Wake up to val waiters, then move up to val2 (passed in the timeout argument) of the remaining waiters from uaddr to uaddr2 without waking them. The classic technique for implementing pthread_cond_broadcast().
- FUTEX_CMP_REQUEUE
- 4
- —
- FUTEX_WAKE_OP
- 5
- —
- FUTEX_LOCK_PI
- 6
- Priority-inheritance variant — when a high-priority thread blocks on a futex held by a low-priority thread, the kernel temporarily boosts the holder's priority to avoid priority inversion.
- FUTEX_UNLOCK_PI
- 7
- —
- FUTEX_TRYLOCK_PI
- 8
- —
- FUTEX_WAIT_BITSET
- 9
- Like FUTEX_WAIT, but val3 carries a bit mask that is stored with the waiter for selective wakeup.
- FUTEX_WAKE_BITSET
- 10
- Like FUTEX_WAKE, but only wakes waiters whose bit mask shares at least one bit with val3.
- FUTEX_PRIVATE_FLAG
- 128
- Process-private optimisation: the kernel can key the futex on the virtual address alone instead of resolving it to the backing shared mapping. glibc uses it for every primitive that is not process-shared.
- FUTEX_CLOCK_REALTIME
- 256
- —
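As an illustration of how the operation codes and flags above combine, the sketch below issues a FUTEX_WAIT_BITSET with an absolute deadline and a matching FUTEX_WAKE_BITSET. The helper names are hypothetical. It assumes a 64-bit Linux target and relies on FUTEX_WAIT_BITSET interpreting its timeout as an absolute time on CLOCK_MONOTONIC when FUTEX_CLOCK_REALTIME is not set.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: absolute CLOCK_MONOTONIC time `ms` milliseconds from now. */
static struct timespec deadline_in_ms(long ms)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    ts.tv_sec += ms / 1000;
    ts.tv_nsec += (ms % 1000) * 1000000L;
    if (ts.tv_nsec >= 1000000000L) {    /* carry nanoseconds into seconds */
        ts.tv_sec += 1;
        ts.tv_nsec -= 1000000000L;
    }
    return ts;
}

/* Hypothetical helper: sleep while *uaddr == expected, until the deadline. */
static long wait_until(uint32_t *uaddr, uint32_t expected,
                       const struct timespec *deadline)
{
    return syscall(SYS_futex, uaddr,
                   FUTEX_WAIT_BITSET | FUTEX_PRIVATE_FLAG,
                   expected, deadline, NULL, FUTEX_BITSET_MATCH_ANY);
}

/* Hypothetical helper: wake up to max_waiters whose mask intersects `mask`. */
static long wake_bitset(uint32_t *uaddr, uint32_t max_waiters, uint32_t mask)
{
    return syscall(SYS_futex, uaddr,
                   FUTEX_WAKE_BITSET | FUTEX_PRIVATE_FLAG,
                   max_waiters, NULL, NULL, mask);
}
```

With no other thread touching the word, wait_until() on a deadline a few milliseconds away fails with ETIMEDOUT, and wake_bitset() on a word with no sleepers returns 0, the number of waiters woken.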