Process & Thread · Section 2
clone(2)
Create a new process that can selectively share memory, file descriptors, namespaces, and other resources with its parent.
Signature
```c
#define _GNU_SOURCE
#include <sched.h>

int clone(int (*fn)(void *), void *stack, int flags, void *arg, ...
          /* pid_t *parent_tid, void *tls, pid_t *child_tid */ );
```
- fn
- Entry function called in the child (glibc wrapper). The raw syscall does not take a function pointer.
- stack
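- Pointer to the top (highest address) of the memory region allocated for the child's stack, since stacks grow downward on x86 and ARM64. The glibc wrapper rejects a NULL stack with EINVAL. With the raw syscall and no CLONE_VM, NULL means the child keeps running on its copy-on-write copy of the parent's stack, which is how glibc's fork() calls it.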
- flags
- Bitmask of CLONE_* sharing flags, ORed with the termination signal sent to the parent (typically SIGCHLD).
- arg
- Argument passed to fn in the child (glibc wrapper only).
- parent_tid
- If CLONE_PARENT_SETTID, the kernel writes the child's TID here, visible to the parent.
- tls
- Thread-local-storage descriptor used with CLONE_SETTLS; architecture-specific.
- child_tid
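- With CLONE_CHILD_SETTID, the kernel writes the child's TID here in the child's memory; with CLONE_CHILD_CLEARTID, it zeroes the word and does a futex wake on it when the child exits, which is how pthread_join() learns that a thread has finished.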
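A minimal sketch of how stack, parent_tid and child_tid fit together through the glibc wrapper, assuming glibc on x86-64 or aarch64 (where stacks grow down). It starts a bare thread with the pthread-style flags minus CLONE_SETTLS, then waits for it the way the child_tid description says pthread_join() does: CLONE_PARENT_SETTID fills the TID word before clone() returns, and the kernel zeroes it and futex-wakes the waiter when the thread exits. run_raw_thread, thread_fn and THREAD_STACK_SIZE are hypothetical names. Because no TLS is set up, the new thread shares the caller's and must not call into libc; real code should use pthread_create().

```c
#define _GNU_SOURCE
#include <linux/futex.h>   /* FUTEX_WAIT */
#include <sched.h>         /* clone(), CLONE_* */
#include <sys/mman.h>      /* mmap(), munmap() */
#include <sys/syscall.h>   /* SYS_futex */
#include <sys/types.h>     /* pid_t */
#include <unistd.h>        /* syscall() */

#define THREAD_STACK_SIZE (256 * 1024)

/* Runs on the new thread. It shares the caller's TLS (no CLONE_SETTLS), so it only
   stores a value through the shared address space and never calls libc. */
static int thread_fn(void *arg) {
    *(volatile int *)arg = 42;
    return 0;   /* glibc's wrapper then calls exit(2) for this thread only */
}

/* Hypothetical helper: start a raw thread and wait on its CLONE_CHILD_CLEARTID word.
   Returns the value the thread wrote, or -1 on error. */
int run_raw_thread(void) {
    volatile int result = 0;
    pid_t tid = 0;   /* set by CLONE_PARENT_SETTID, cleared by CLONE_CHILD_CLEARTID */
    char *stack = mmap(NULL, THREAD_STACK_SIZE, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
    if (stack == MAP_FAILED)
        return -1;

    int flags = CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND | CLONE_THREAD |
                CLONE_SYSVSEM | CLONE_PARENT_SETTID | CLONE_CHILD_CLEARTID;
    /* Pass the top of the region; no exit signal is requested for a thread. */
    if (clone(thread_fn, stack + THREAD_STACK_SIZE, flags, (void *)&result,
              &tid, NULL, &tid) == -1) {
        munmap(stack, THREAD_STACK_SIZE);
        return -1;
    }

    /* Sleep until the kernel zeroes tid and futex-wakes us at thread exit. */
    pid_t seen;
    while ((seen = __atomic_load_n(&tid, __ATOMIC_ACQUIRE)) != 0)
        syscall(SYS_futex, &tid, FUTEX_WAIT, seen, NULL, NULL, 0);

    munmap(stack, THREAD_STACK_SIZE);   /* safe: the thread no longer uses its stack */
    return result;
}
```

Calling run_raw_thread() should return 42, the value the thread stored through the shared address space.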
Description
clone() is the Linux primitive behind fork(), pthread_create(), and every container runtime. Unlike POSIX fork(), the new task can selectively share the parent's address space (CLONE_VM), file descriptor table (CLONE_FILES), filesystem context (CLONE_FS), and signal handlers (CLONE_SIGHAND), and it can be placed in new namespaces (CLONE_NEWPID, CLONE_NEWNET, CLONE_NEWUSER, etc.). With CLONE_THREAD|CLONE_SIGHAND|CLONE_VM it produces a thread in the caller's thread group; without CLONE_THREAD it produces a new process. On success the child's TID is returned in the parent; on failure -1 is returned with errno set and no child is created. The glibc wrapper runs fn(arg) in the child on the supplied stack, and the child exits with fn's return value as its status. The raw system call takes no fn or arg: like fork(), it returns in both tasks, yielding 0 in the child, which resumes at the instruction after the syscall. Its argument order also varies by architecture (x86-64 uses flags, stack, parent_tid, child_tid, tls, while i386 and arm64 swap the last two); see "C library/kernel differences" in the man page.
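The return-value convention and the wrapper/kernel difference can be seen by invoking the raw system call through syscall(2) with nothing but SIGCHLD in flags, which asks the kernel for a plain fork()-style child. This is a sketch assuming x86-64 or aarch64: every argument after flags is NULL or 0, so the differing argument orders do not matter there (s390, which passes stack first, would need a different call). raw_clone_like_fork is a hypothetical name, and because it bypasses glibc's fork() bookkeeping (pthread_atfork handlers, for instance), it is for illustration only.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>        /* SIGCHLD */
#include <sys/syscall.h>   /* SYS_clone */
#include <sys/types.h>
#include <sys/wait.h>      /* waitpid() */
#include <unistd.h>        /* syscall(), _exit() */

/* Hypothetical helper: fork() via the raw clone syscall (x86-64 / aarch64 assumed).
   Returns the child's exit status, or -errno on failure. */
int raw_clone_like_fork(void) {
    /* flags = SIGCHLD only; stack, parent_tid, child_tid and tls are NULL/0. */
    long ret = syscall(SYS_clone, (unsigned long)SIGCHLD, NULL, NULL, NULL, 0UL);
    if (ret == -1)
        return -errno;
    if (ret == 0)
        _exit(7);   /* child: the raw syscall returned 0, on a COW copy of the stack */

    int status;     /* parent: ret is the child's TID */
    if (waitpid((pid_t)ret, &status, 0) == -1)
        return -errno;
    return WIFEXITED(status) ? WEXITSTATUS(status) : 128 + WTERMSIG(status);
}
```

raw_clone_like_fork() should return 7: the child saw 0 from the syscall and exited with that status, while the parent received the TID and reaped it.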
Architecture mapping
| Architecture | Number | ABI | Entry point |
|---|---|---|---|
| x86 (i386) | 120 | i386 | sys_clone |
| x64 (x86_64) | 56 | common | sys_clone |
| ARM64 (aarch64) | 220 | — | sys_clone |
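A small compile-time check that the table matches the headers of the architecture being built for, assuming a glibc or musl toolchain whose <sys/syscall.h> exposes SYS_clone; clone_syscall_number is a hypothetical helper. x32 is excluded because it also defines __x86_64__ but numbers its syscalls differently.

```c
#include <sys/syscall.h>   /* SYS_clone */

/* Fail the build if the target's headers disagree with the table. */
#if defined(__x86_64__) && !defined(__ILP32__)
_Static_assert(SYS_clone == 56, "x86_64 clone number");
#elif defined(__i386__)
_Static_assert(SYS_clone == 120, "i386 clone number");
#elif defined(__aarch64__)
_Static_assert(SYS_clone == 220, "aarch64 clone number");
#endif

/* Hypothetical helper: the number a raw syscall(2) call would use on this build. */
long clone_syscall_number(void) {
    return SYS_clone;
}
```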
Kernel history
Introduced in Linux 2.0.
2.0
clone() was introduced in 2.0 as the Linux-specific generalisation of fork(), enabling threads with shared address spaces — the building block on which NPTL was later built.
2.4.19
CLONE_NEWNS introduced mount namespaces, the first of the Linux namespace primitives later extended into the full container toolkit.
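2.6.24
CLONE_NEWPID and CLONE_NEWNET landed, adding PID and network namespaces, two core building blocks of modern containers.
3.8
User namespaces (CLONE_NEWUSER), first wired into clone() in 2.6.23, became usable by unprivileged processes; this is what made rootless containers possible.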
5.3
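clone3() was added, taking a struct clone_args by pointer together with its size. Its flags field is 64 bits wide (clone()'s flags word had run out of free bits, with its low byte already holding the exit signal), and new fields can be appended without adding further syscalls.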
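A minimal sketch of clone3() used like fork(), assuming kernel headers from 5.3 or later so that <linux/sched.h> provides struct clone_args. glibc offers no public clone3() wrapper, so the call goes through syscall(2) with the structure's size as the second argument. Note that exit_signal is its own field rather than the low byte of flags. clone3_like_fork is a hypothetical name; some container seccomp profiles make clone3() fail (typically with ENOSYS), so failures are reported as -errno.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/sched.h>   /* struct clone_args (kernel headers >= 5.3 assumed) */
#include <signal.h>        /* SIGCHLD */
#include <string.h>        /* memset() */
#include <sys/syscall.h>   /* SYS_clone3 */
#include <sys/types.h>
#include <sys/wait.h>      /* waitpid() */
#include <unistd.h>        /* syscall(), _exit() */

/* Hypothetical helper: fork() via clone3().
   Returns the child's exit status, or -errno if clone3() or waitpid() failed. */
int clone3_like_fork(void) {
#ifdef SYS_clone3
    struct clone_args args;
    memset(&args, 0, sizeof args);   /* no flags, no stack (COW copy), no TID pointers */
    args.exit_signal = SIGCHLD;      /* separate field, not the low byte of flags */

    long ret = syscall(SYS_clone3, &args, sizeof args);
    if (ret == -1)
        return -errno;
    if (ret == 0)
        _exit(5);                    /* child: clone3() returned 0, as fork() would */

    int status;
    if (waitpid((pid_t)ret, &status, 0) == -1)
        return -errno;
    return WIFEXITED(status) ? WEXITSTATUS(status) : 128 + WTERMSIG(status);
#else
    return -ENOSYS;                  /* headers do not know the syscall */
#endif
}
```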
seccomp & containers
| Profile | clone() |
|---|---|
| Docker default | Allowed |
| Podman default | Allowed |
clone() is on the Docker and Podman default allow-lists, but the default profiles use argument filtering to reject calls that set namespace-creation flags (CLONE_NEWUSER, CLONE_NEWNET, CLONE_NEWPID, CLONE_NEWNS, etc.). Allowing these flags inside a container is what makes nested containers possible, and it is also what lets a compromised container create a new user namespace, hold a full capability set inside it, and use those capabilities to reach kernel code normally gated behind CAP_SYS_ADMIN. For workloads with no legitimate need to spawn containers, deny the namespace-creation flag set explicitly, and remember that unshare(2) and clone3() can create the same namespaces, so they need matching rules.
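libseccomp
```c
// ctx = seccomp_init(SCMP_ACT_ALLOW). Fail any clone() that sets a namespace-creation flag;
// comparisons inside one rule are ANDed, so OR semantics needs one rule per flag.
static const scmp_datum_t ns_flags[] = { CLONE_NEWUSER, CLONE_NEWNET, CLONE_NEWPID, CLONE_NEWNS };
for (size_t i = 0; i < sizeof ns_flags / sizeof ns_flags[0]; i++)
    seccomp_rule_add(ctx, SCMP_ACT_ERRNO(EPERM), SCMP_SYS(clone), 1,
                     SCMP_A0(SCMP_CMP_MASKED_EQ, ns_flags[i], ns_flags[i]));
// clone3() passes flags by pointer, which seccomp cannot inspect: ENOSYS makes glibc fall back to clone().
seccomp_rule_add(ctx, SCMP_ACT_ERRNO(ENOSYS), SCMP_SYS(clone3), 0);
```
strace example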
```
$ strace -f -e clone /bin/sh -c '/bin/true; true'
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fb2a1c7da10) = 14982
[pid 14982] +++ exited with 0 +++
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=14982, si_uid=1000, si_status=0, …} ---
```
With -f, strace follows the children that clone() creates, and it decodes the flags bitmask into symbolic CLONE_* names plus the exit signal held in the low byte. With older glibc, pthread_create shows up as clone(child_stack=…, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, …), a useful fingerprint for distinguishing thread creation from process creation. Newer glibc versions create threads with clone3() where the kernel supports it, so trace with -e trace=clone,clone3 to see both.
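The flag decoding that strace performs, and the thread-versus-process fingerprint described above, can be sketched in a few lines. This assumes glibc or musl headers with _GNU_SOURCE, covers only a subset of the CLONE_* names, and uses the hypothetical helpers decode_clone_flags and is_thread_creation; strace's own tables are more complete.

```c
#define _GNU_SOURCE
#include <sched.h>    /* CLONE_* */
#include <signal.h>   /* SIGCHLD */
#include <stdio.h>    /* snprintf() */
#include <string.h>   /* strlen(), strncat() */

#ifndef CSIGNAL
#define CSIGNAL 0x000000ff   /* low byte of flags: signal sent to the parent on exit */
#endif

struct clone_flag_name { unsigned long bit; const char *name; };

/* A subset of CLONE_* flags, in ascending numeric order. */
static const struct clone_flag_name clone_flag_names[] = {
    { CLONE_VM, "CLONE_VM" },           { CLONE_FS, "CLONE_FS" },
    { CLONE_FILES, "CLONE_FILES" },     { CLONE_SIGHAND, "CLONE_SIGHAND" },
    { CLONE_VFORK, "CLONE_VFORK" },     { CLONE_THREAD, "CLONE_THREAD" },
    { CLONE_NEWNS, "CLONE_NEWNS" },     { CLONE_SYSVSEM, "CLONE_SYSVSEM" },
    { CLONE_SETTLS, "CLONE_SETTLS" },   { CLONE_PARENT_SETTID, "CLONE_PARENT_SETTID" },
    { CLONE_CHILD_CLEARTID, "CLONE_CHILD_CLEARTID" },
    { CLONE_CHILD_SETTID, "CLONE_CHILD_SETTID" },
    { CLONE_NEWUSER, "CLONE_NEWUSER" }, { CLONE_NEWPID, "CLONE_NEWPID" },
    { CLONE_NEWNET, "CLONE_NEWNET" },
};

/* Append "|part" (or just "part" if buf is empty) without overflowing buf. */
static void append_part(char *buf, size_t len, const char *part) {
    if (buf[0] != '\0')
        strncat(buf, "|", len - strlen(buf) - 1);
    strncat(buf, part, len - strlen(buf) - 1);
}

/* Hypothetical helper: strace-like rendering of flags into buf (len must be > 0). */
void decode_clone_flags(unsigned long flags, char *buf, size_t len) {
    buf[0] = '\0';
    for (size_t i = 0; i < sizeof clone_flag_names / sizeof clone_flag_names[0]; i++)
        if (flags & clone_flag_names[i].bit)
            append_part(buf, len, clone_flag_names[i].name);

    unsigned long sig = flags & CSIGNAL;
    if (sig == SIGCHLD) {
        append_part(buf, len, "SIGCHLD");
    } else if (sig != 0) {
        char tmp[24];
        snprintf(tmp, sizeof tmp, "signal %lu", sig);
        append_part(buf, len, tmp);
    }
}

/* Hypothetical helper: CLONE_THREAD marks thread creation rather than a new process. */
int is_thread_creation(unsigned long flags) {
    return (flags & CLONE_THREAD) != 0;
}
```

Decoding the fork() flags from the example output yields "CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD", and is_thread_creation() returns 0 for them but 1 for the pthread_create flag set.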
Security & observability
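clone() with CLONE_NEWUSER is the foundation of user-namespace privilege-escalation chains: an unprivileged process that can pass CLONE_NEWUSER becomes root, with a full capability set, inside the new namespace. That exposes kernel code normally reserved for CAP_SYS_ADMIN, and bugs in that code have been used to escalate privilege on the host and to break out of containers. The syscalls:sys_enter_clone tracepoint, whose clone_flags field carries the flags argument, is the natural eBPF observation point; rule on the namespace-creation bitmask, and watch sys_enter_clone3 and sys_enter_unshare as well, since both can also create namespaces. Container-runtime detections (Falco, Tracee) lean on this. Stealthy malware can also use clone() to run extra work as threads (CLONE_THREAD) of an innocuous-looking process: sharing the fd table (CLONE_FILES) hides nothing, but a thread gets no top-level entry in a /proc directory listing (it appears only under /proc/<pid>/task/) and reports the host process's executable. For host-level hardening, setting /proc/sys/kernel/unprivileged_userns_clone to 0 prevents unprivileged user-namespace creation on Debian-based distributions whose kernels ship that knob; on upstream kernels, setting /proc/sys/user/max_user_namespaces to 0 disables user-namespace creation for every user, root included.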
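The namespace-creation rule described above boils down to a mask test on the flags argument (the tracepoint's clone_flags field). The sketch below, with hypothetical helper names, shows the predicate a tracepoint consumer could apply in user space or port into an eBPF program; it assumes glibc or musl <sched.h> definitions. The resulting mask is 0x7E020000, which, as far as we know, is also the value Docker's default seccomp profile uses for its clone() argument filter.

```c
#define _GNU_SOURCE
#include <sched.h>   /* CLONE_NEW* */

#ifndef CLONE_NEWCGROUP
#define CLONE_NEWCGROUP 0x02000000   /* older libc headers may lack it */
#endif

/* Every namespace-creation flag that clone() accepts (0x7E020000 in total). */
#define NS_CREATE_MASK (CLONE_NEWNS | CLONE_NEWCGROUP | CLONE_NEWUTS | CLONE_NEWIPC | \
                        CLONE_NEWUSER | CLONE_NEWPID | CLONE_NEWNET)

/* Hypothetical detection rule: flag any clone() that creates a namespace. */
int requests_new_namespace(unsigned long clone_flags) {
    return (clone_flags & NS_CREATE_MASK) != 0;
}

/* Higher-severity variant: a new user namespace grants a full capability set inside it. */
int requests_new_userns(unsigned long clone_flags) {
    return (clone_flags & CLONE_NEWUSER) != 0;
}
```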
Errors
- EAGAIN
- RLIMIT_NPROC reached or temporary kernel resource exhaustion.
- EINVAL
- An invalid flag combination (e.g. CLONE_THREAD without CLONE_VM and CLONE_SIGHAND).
- ENOMEM
- Insufficient kernel memory to allocate task structures.
- ENOSPC
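- A namespace limit would be exceeded: the PID-namespace nesting depth with CLONE_NEWPID (since Linux 3.7), the user-namespace nesting depth with CLONE_NEWUSER (since Linux 4.9; EUSERS before that), or a /proc/sys/user/max_*_namespaces limit (since Linux 4.9).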
- EPERM
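- A CLONE_NEW* flag other than CLONE_NEWUSER was given by a caller without CAP_SYS_ADMIN (combining it with CLONE_NEWUSER in the same call supplies that capability inside the new namespace); or CLONE_NEWUSER was refused, for example because the caller's effective UID or GID has no mapping in the parent user namespace, the caller is in a chroot, or a distribution knob such as unprivileged_userns_clone disables unprivileged use.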
- EUSERS
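- CLONE_NEWUSER would exceed the limit on nested user namespaces (Linux 3.11 to 4.8; ENOSPC since 4.9).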
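The EINVAL entry can be observed directly. This sketch, assuming x86-64 or aarch64 and a hypothetical helper clone_rejection_errno, issues raw clone() calls with flag sets that the kernel rejects before creating any task: CLONE_THREAD without CLONE_SIGHAND, and CLONE_SIGHAND without CLONE_VM. It refuses flag sets containing CLONE_VM so that an unexpected success cannot leave a child running on the caller's stack.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>         /* CLONE_* */
#include <sys/syscall.h>   /* SYS_clone */
#include <unistd.h>        /* syscall(), _exit() */

/* Hypothetical helper: raw clone() with a flag set expected to be rejected.
   Returns the errno, 0 if a child was unexpectedly created, or -1 if flags include CLONE_VM. */
int clone_rejection_errno(unsigned long flags) {
    if (flags & CLONE_VM)
        return -1;   /* never let a child share our memory on the same stack */
    long ret = syscall(SYS_clone, flags, NULL, NULL, NULL, 0UL);
    if (ret == -1)
        return errno;
    if (ret == 0)
        _exit(0);    /* child of an unexpected success: private copy, leave at once */
    return 0;
}
```

Both clone_rejection_errno(CLONE_THREAD) and clone_rejection_errno(CLONE_SIGHAND) should return EINVAL.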
Flags
- CLONE_VM
- 0x00000100
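- Share the address space with the parent. Threads need it, but it does not make a task a thread on its own: vfork()-style children share memory too, and CLONE_THREAD is what places the task in the caller's thread group.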
- CLONE_FS
- 0x00000200
- Share filesystem information with the parent: root directory, current working directory, and umask.
- CLONE_FILES
- 0x00000400
- Share the file descriptor table. Closing an fd in one task closes it in all.
- CLONE_SIGHAND
- 0x00000800
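- Share the table of signal handlers (the signal mask and pending signals stay per-task); requires CLONE_VM.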
- CLONE_PIDFD
- 0x00001000
- Allocate a pidfd referring to the child and store it in parent_tid (modern alternative to PID-based signalling).
- CLONE_PTRACE
- 0x00002000
- If the caller is being traced, trace the child too.
- CLONE_VFORK
- 0x00004000
- Suspend the caller until the child calls execve(2) or _exit(2), as vfork(2) does.
- CLONE_PARENT
- 0x00008000
- Give the child the caller's parent, instead of the caller, as its parent.
- CLONE_THREAD
- 0x00010000
- Place the new task in the same thread group (same TGID/PID); requires CLONE_VM and CLONE_SIGHAND.
- CLONE_NEWNS
- 0x00020000
- Create a new mount namespace.
- CLONE_SYSVSEM
- 0x00040000
- Share a single list of System V semaphore adjustment (semadj) values.
- CLONE_SETTLS
- 0x00080000
- Set the child's thread-local storage descriptor from the tls argument.
- CLONE_PARENT_SETTID
- 0x00100000
- Store the child's TID at parent_tid in the parent's memory.
- CLONE_CHILD_CLEARTID
- 0x00200000
- Clear the TID at child_tid when the child exits and do a futex wake on that address.
- CLONE_CHILD_SETTID
- 0x01000000
- Store the child's TID at child_tid in the child's memory.
- CLONE_NEWCGROUP
- 0x02000000
- Create a new cgroup namespace (since Linux 4.6).
- CLONE_NEWUTS
- 0x04000000
- Create a new UTS namespace, with its own hostname and NIS domain name.
- CLONE_NEWIPC
- 0x08000000
- Create a new IPC namespace, isolating System V IPC objects and POSIX message queues.
- CLONE_NEWUSER
- 0x10000000
- Create a new user namespace — the basis for rootless containers.
- CLONE_NEWPID
- 0x20000000
- Create a new PID namespace; the child becomes PID 1 in it.
- CLONE_NEWNET
- 0x40000000
- Create a new network namespace with its own interfaces and routing table.
- CLONE_IO
- 0x80000000
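- Share the I/O context with the caller, so the I/O scheduler treats their I/O as one (since Linux 2.6.25).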
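The CLONE_NEWUSER entry (the child gains root inside its new user namespace) can be checked with a sketch like the following, assuming glibc on x86-64 or aarch64 and a kernel that permits unprivileged user namespaces. The parent maps the child's UID 0 to its own effective UID by writing a one-line /proc/<pid>/uid_map, which an unprivileged process may do for its own UID, and only then lets the child check geteuid(). try_root_in_new_userns, userns_child and write_text are hypothetical names. Where user namespaces are restricted (a seccomp profile, unprivileged_userns_clone = 0, a chroot), clone() fails instead and its errno is returned, which is itself a useful probe of a host's configuration.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>       /* open() */
#include <sched.h>       /* clone(), CLONE_NEWUSER */
#include <signal.h>      /* SIGCHLD */
#include <stdio.h>       /* snprintf() */
#include <string.h>      /* strlen() */
#include <sys/mman.h>    /* mmap(), munmap() */
#include <sys/types.h>
#include <sys/wait.h>    /* waitpid() */
#include <unistd.h>

#define USERNS_STACK_SIZE (1024 * 1024)

static int sync_pipe[2];   /* parent -> child: "uid_map has been written" */

static int userns_child(void *arg) {
    (void)arg;
    char c;
    close(sync_pipe[1]);                  /* so the child sees EOF if the parent gives up */
    if (read(sync_pipe[0], &c, 1) != 1)
        return 2;
    return geteuid() == 0 ? 0 : 1;        /* 0: root inside the new user namespace */
}

static int write_text(const char *path, const char *text) {
    int fd = open(path, O_WRONLY);
    if (fd == -1)
        return -1;
    ssize_t n = write(fd, text, strlen(text));
    close(fd);
    return n == (ssize_t)strlen(text) ? 0 : -1;
}

/* Hypothetical helper. Returns 0 if the child saw euid 0 in its new user namespace,
   a positive errno if clone() refused CLONE_NEWUSER, or -1 on any other failure. */
int try_root_in_new_userns(void) {
    if (pipe(sync_pipe) == -1)
        return -1;
    char *stack = mmap(NULL, USERNS_STACK_SIZE, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
    if (stack == MAP_FAILED) {
        close(sync_pipe[0]);
        close(sync_pipe[1]);
        return -1;
    }

    unsigned outer_uid = (unsigned)geteuid();
    pid_t pid = clone(userns_child, stack + USERNS_STACK_SIZE, CLONE_NEWUSER | SIGCHLD, NULL);
    int clone_errno = errno;
    int failed = 0;
    if (pid != -1) {
        char path[64], map[64];
        snprintf(path, sizeof path, "/proc/%d/uid_map", (int)pid);
        snprintf(map, sizeof map, "0 %u 1\n", outer_uid);   /* inner root -> our own UID */
        failed = write_text(path, map) != 0;
        if (write(sync_pipe[1], "x", 1) != 1)               /* release the child */
            failed = 1;
    }
    close(sync_pipe[0]);
    close(sync_pipe[1]);
    if (pid == -1) {
        munmap(stack, USERNS_STACK_SIZE);
        return clone_errno;
    }

    int status = 0;
    pid_t waited = waitpid(pid, &status, 0);
    munmap(stack, USERNS_STACK_SIZE);
    if (failed || waited == -1 || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    return 0;
}
```

On a host that allows unprivileged user namespaces this returns 0; inside a default Docker container it typically returns EPERM from the seccomp filter.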