IPC · Section 2

futex(2)

Fast userspace mutex — the kernel primitive behind every pthread mutex, condvar, and modern lock on Linux.

Signature

#include <linux/futex.h>
#include <sys/syscall.h>

long futex(uint32_t * uaddr, int futex_op, uint32_t val, const struct timespec * timeout, uint32_t * uaddr2, uint32_t val3);
uaddr
Pointer to the futex word (uint32_t) in the calling process's userspace memory.
futex_op
Operation code (see flags table). The bottom 7 bits select the operation; high bits carry FUTEX_PRIVATE_FLAG / FUTEX_CLOCK_REALTIME.
val
Operation-specific value. For FUTEX_WAIT it is the value uaddr must equal for the wait to actually take effect; for FUTEX_WAKE it is the maximum number of waiters to wake.
timeout
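For wait-family operations, the timeout: relative for FUTEX_WAIT, absolute for FUTEX_WAIT_BITSET. NULL means wait forever. Requeue operations reinterpret this slot as an integer (val2).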
uaddr2
For operations that take a second futex word (FUTEX_REQUEUE, FUTEX_CMP_REQUEUE, FUTEX_WAKE_OP), the secondary futex address. Otherwise ignored.
val3
Operation-specific. For FUTEX_WAIT_BITSET / FUTEX_WAKE_BITSET it carries the bit mask for selective wakeup.
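glibc does not export a futex() function, so the signature above is normally reached through syscall(2). A minimal sketch of such a wrapper follows; the name sys_futex is a hypothetical helper chosen for illustration, not part of any library.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>        /* failures are reported as -1 with errno set */
#include <linux/futex.h>  /* FUTEX_WAIT, FUTEX_WAKE, FUTEX_PRIVATE_FLAG, ... */
#include <stdint.h>
#include <sys/syscall.h>  /* SYS_futex */
#include <time.h>         /* struct timespec */
#include <unistd.h>       /* syscall() */

/* Hypothetical wrapper: forwards all six arguments to the raw syscall. */
static long sys_futex(uint32_t *uaddr, int futex_op, uint32_t val,
                      const struct timespec *timeout, uint32_t *uaddr2,
                      uint32_t val3)
{
    /* Debug aid only: the kernel itself rejects a misaligned word with EINVAL. */
    assert(((uintptr_t)uaddr & 3u) == 0);
    return syscall(SYS_futex, uaddr, futex_op, val, timeout, uaddr2, val3);
}
```

Called as sys_futex(&word, FUTEX_WAKE | FUTEX_PRIVATE_FLAG, 1, NULL, NULL, 0), it wakes at most one waiter on word and returns how many were woken.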

Description

futex() is the building block for every userspace synchronisation primitive on Linux. The idea: the lock state (a uint32_t at uaddr) is read and modified by the application from userspace using atomic instructions; the kernel is invoked only for the slow path, when a thread must sleep waiting for the lock or wake another thread.

FUTEX_WAIT atomically checks that *uaddr still equals val and, if so, blocks the calling thread; FUTEX_WAKE wakes up to val threads waiting on uaddr. FUTEX_PRIVATE_FLAG marks a futex as process-private (the kernel can skip the inter-process address-translation overhead), which all modern pthread implementations use by default.

Many other operations exist (FUTEX_REQUEUE for condvar broadcast, FUTEX_LOCK_PI / FUTEX_UNLOCK_PI for priority-inheritance mutexes, FUTEX_WAIT_BITSET for selective wakeups, FUTEX_WAKE_OP for atomic compound operations); see futex(2) for the full set.
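The fast-path / slow-path split can be sketched as a small three-state mutex. This is an illustrative sketch only, not glibc's implementation: the futex_mutex type, the futex_op3 helper, and the state encoding (0 unlocked, 1 locked, 2 locked with possible waiters) are assumptions made for the example, and it assumes an _Atomic uint32_t has the same representation as a plain uint32_t, which holds on mainstream Linux targets.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Illustrative type: 0 = unlocked, 1 = locked, 2 = locked, maybe contended. */
typedef struct { _Atomic uint32_t state; } futex_mutex;

/* Hypothetical helper for the two operations used here. */
static long futex_op3(_Atomic uint32_t *uaddr, int op, uint32_t val)
{
    return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
}

static void futex_mutex_lock(futex_mutex *m)
{
    uint32_t c = 0;
    /* Fast path: 0 -> 1 with one atomic instruction, no syscall. */
    if (atomic_compare_exchange_strong(&m->state, &c, 1))
        return;
    /* Slow path: advertise contention, then sleep while the word is 2. */
    if (c != 2)
        c = atomic_exchange(&m->state, 2);
    while (c != 0) {
        /* Returns at once with EAGAIN if the word is no longer 2. */
        futex_op3(&m->state, FUTEX_WAIT_PRIVATE, 2);
        c = atomic_exchange(&m->state, 2);
    }
}

static void futex_mutex_unlock(futex_mutex *m)
{
    uint32_t prev = atomic_fetch_sub(&m->state, 1);
    assert(prev != 0 && "unlock of an unlocked mutex");
    if (prev != 1) {
        /* State was 2: someone may be asleep, so reset and wake one waiter. */
        atomic_store(&m->state, 0);
        futex_op3(&m->state, FUTEX_WAKE_PRIVATE, 1);
    }
}
```

An uncontended lock/unlock pair never enters the kernel; only the state-2 paths issue FUTEX_WAIT_PRIVATE or FUTEX_WAKE_PRIVATE.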

Architecture mapping

Architecture      Number  ABI     Entry point
x86 (i386)        240     i386    sys_futex_time32
x64 (x86_64)      202     common  sys_futex
ARM64 (aarch64)   98

Kernel history

Introduced in Linux 2.5.7.

  1. 2.5.7

    futex() was added in 2.5.7, during the development series that was released as 2.6, as a lighter alternative to the older semaphore syscalls. It is arguably the most consequential locking primitive in modern Linux: an uncontended userspace mutex needs no syscall at all.

  2. 2.6.17

    Robust futex support added: set_robust_list / get_robust_list let a thread register a per-thread list of mutexes it holds. If the thread dies (or its process is killed) while holding them, the kernel walks the list and marks each futex word as owner-died (FUTEX_OWNER_DIED), letting other waiters detect and recover.

  3. 2.6.18

    PI mutex operations (FUTEX_LOCK_PI / FUTEX_UNLOCK_PI / FUTEX_TRYLOCK_PI) were added to support real-time threads: the kernel boosts a lock holder's scheduling priority when a higher-priority waiter blocks, avoiding unbounded priority inversion.

  4. 5.16

    futex_waitv() landed: wait on multiple futexes simultaneously and wake on the first ready one — used by emulators and game engines for efficient multi-event waits without the FUTEX_REQUEUE acrobatics.

seccomp & containers

Docker default profile

Allowed

Podman default profile

Allowed

futex() is allowed by both default profiles and cannot practically be blocked: any multithreaded program depends on it through pthreads. The newer futex_waitv() (Linux 5.16) should be allowed alongside it. Argument-level filtering is not useful, because the op codes and addresses are application-internal.

libseccomp

seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(futex), 0);
seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(futex_waitv), 0);

strace example

$ strace -e futex -p $(pidof firefox) 2>&1 | head -5
futex(0x7f3a14002910, FUTEX_WAIT_PRIVATE, 2, NULL) = 0
futex(0x7f3a14002914, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x55d4e8c1d0a0, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, {tv_sec=…}, FUTEX_BITSET_MATCH_ANY) = -1 ETIMEDOUT

futex() lines dominate any strace of a multithreaded program — use -c summary mode to see whether contention dominates time. A 'stuck' program often shows a thread in FUTEX_WAIT with no matching FUTEX_WAKE; comparing the uaddr against /proc/<pid>/maps and the holding thread's stack (via gdb) identifies the deadlock. Filter with -e trace=futex if you specifically want to see the contention pattern.

Security & observability

futex() is not generally a security concern in itself: it is so universal that its absence is more interesting than its presence. That said, kernel bugs in the futex implementation have been impactful historically (CVE-2014-3153, a flaw in the PI requeue path, was the basis of the Towelroot exploit that gave root on many Android devices), so kernel-version posture matters. The eBPF tracepoint sys_enter_futex is too noisy for general monitoring, since most workloads emit thousands of calls per second. For pthread debugging, the userspace approach (glibc's mutex debug mode, or rr) is more productive than syscall tracing.

Errors

EACCES
No read access to the memory of the futex word.
EAGAIN
FUTEX_WAIT: *uaddr was not equal to val at the moment the kernel checked (another thread changed the word first; userspace should re-read it and retry).
EFAULT
uaddr is not in the calling process's address space.
EINTR
Wait was interrupted by a signal. Userspace retries.
EINVAL
Bad operation, misaligned uaddr (must be 4-byte aligned), or invalid combination of flags.
ENOSYS
futex_op names an operation the running kernel does not recognise or support.
ETIMEDOUT
FUTEX_WAIT with timeout elapsed before being woken.
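The retry rules above can be folded into one wait loop. The sketch below is an assumption-laden illustration: futex_wait_while_equal is a hypothetical helper name, it returns 0 on success or the errno value on failure, and it restarts the full relative timeout after each retry, which a production implementation would likely avoid by using an absolute deadline.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: block until *uaddr != expected.
 * Returns 0 on success, or an errno value such as ETIMEDOUT. */
static int futex_wait_while_equal(_Atomic uint32_t *uaddr, uint32_t expected,
                                  const struct timespec *rel_timeout)
{
    assert(uaddr != NULL);
    while (atomic_load(uaddr) == expected) {
        long r = syscall(SYS_futex, uaddr, FUTEX_WAIT | FUTEX_PRIVATE_FLAG,
                         expected, rel_timeout, NULL, 0);
        if (r == 0)
            continue;       /* woken, possibly spuriously: re-check the word */
        switch (errno) {
        case EAGAIN:        /* word changed before the kernel could sleep */
        case EINTR:         /* interrupted by a signal */
            continue;       /* both cases: loop and re-read the word */
        default:
            return errno;   /* ETIMEDOUT, EFAULT, EINVAL, ... */
        }
    }
    return 0;
}
```

With a 10 ms timeout and a word that never changes, the call is expected to return ETIMEDOUT; if the word already differs from expected, it returns 0 without entering the kernel.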

Flags

FUTEX_WAIT
0
Atomically check *uaddr == val and, if so, sleep until woken (or until timeout). The lock-contention path of pthread_mutex_lock().
FUTEX_WAKE
1
Wake up to val threads currently waiting on uaddr. The wake-up path of pthread_mutex_unlock().
FUTEX_REQUEUE
3
Wake up to val waiters, then move up to val2 (in timeout's slot) more waiters from uaddr to uaddr2. The broadcast path of pthread_cond_broadcast().
FUTEX_CMP_REQUEUE
4
FUTEX_WAKE_OP
5
FUTEX_LOCK_PI
6
Priority-inheritance variant — when a high-priority thread blocks on a futex held by a low-priority thread, the kernel temporarily boosts the holder's priority to avoid priority inversion.
FUTEX_UNLOCK_PI
7
FUTEX_TRYLOCK_PI
8
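FUTEX_WAIT_BITSET
9
Like FUTEX_WAIT, but val3 supplies a 32-bit mask that is stored with the waiter, and timeout is absolute rather than relative.
FUTEX_WAKE_BITSET
10
Like FUTEX_WAKE, but wakes only waiters whose stored mask shares at least one bit with val3.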
FUTEX_PRIVATE_FLAG
128
Process-private optimisation: the kernel can identify the futex by its virtual address alone instead of resolving the backing page or file, as it must for futexes shared between processes. Used by every modern glibc.
FUTEX_CLOCK_REALTIME
256
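Because the two flag values sit above the operation codes, a futex_op value such as the FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME seen in strace output can be split with simple masks. The sketch below uses FUTEX_CMD_MASK from <linux/futex.h>; the struct and function names are hypothetical, chosen for illustration.

```c
#include <assert.h>       /* static_assert (C11) */
#include <linux/futex.h>  /* FUTEX_* constants and FUTEX_CMD_MASK */
#include <stdbool.h>

/* Compile-time check that the header agrees with the table above. */
static_assert(FUTEX_PRIVATE_FLAG == 128, "unexpected FUTEX_PRIVATE_FLAG");
static_assert(FUTEX_CLOCK_REALTIME == 256, "unexpected FUTEX_CLOCK_REALTIME");

/* Illustrative type holding the decoded pieces of a futex_op argument. */
struct futex_op_parts {
    int  cmd;             /* e.g. FUTEX_WAIT (0), FUTEX_WAKE (1) */
    bool is_private;      /* FUTEX_PRIVATE_FLAG was set */
    bool clock_realtime;  /* FUTEX_CLOCK_REALTIME was set */
};

/* Hypothetical helper: split futex_op into its command and flag bits. */
static struct futex_op_parts decode_futex_op(int futex_op)
{
    struct futex_op_parts p = {
        .cmd            = futex_op & FUTEX_CMD_MASK,
        .is_private     = (futex_op & FUTEX_PRIVATE_FLAG) != 0,
        .clock_realtime = (futex_op & FUTEX_CLOCK_REALTIME) != 0,
    };
    return p;
}
```

For example, decode_futex_op(129) yields cmd == FUTEX_WAKE with is_private set, which is what strace prints as FUTEX_WAKE_PRIVATE.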

Related syscalls