
Process & Thread · Section 2

clone(2)

Create a new process that can selectively share memory, file descriptors, namespaces, and other resources with its parent.

Signature

#define _GNU_SOURCE
#include <sched.h>

int clone(int (*fn)(void *), void *stack, int flags, void *arg, ...
          /* pid_t *parent_tid, void *tls, pid_t *child_tid */ );
fn
Entry function called in the child (glibc wrapper). The raw syscall does not take a function pointer.
stack
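Pointer to the topmost address of the memory set aside for the child's stack (stacks grow downward on x86 and ARM64). The glibc wrapper requires a valid stack. Only the raw syscall accepts NULL, and only without CLONE_VM; the child then runs on a copy-on-write duplicate of the parent's stack.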
flags
Bitmask of CLONE_* sharing flags, ORed with the termination signal sent to the parent (typically SIGCHLD).
arg
Argument passed to fn in the child (glibc wrapper only).
parent_tid
If CLONE_PARENT_SETTID, the kernel writes the child's TID here, visible to the parent.
tls
Thread-local-storage descriptor used with CLONE_SETTLS; architecture-specific.
child_tid
If CLONE_CHILD_SETTID, the kernel writes the child's TID here in the child's memory. If CLONE_CHILD_CLEARTID, the kernel zeroes this location and performs a futex wake on it when the child exits (this is how pthread_join() is woken).

Description

clone() is the Linux primitive behind fork(), pthread_create(), and container runtimes. Unlike POSIX fork(), the new task can selectively share the parent's address space (CLONE_VM), file descriptor table (CLONE_FILES), filesystem context (CLONE_FS), and signal handlers (CLONE_SIGHAND), or be placed in new namespaces (CLONE_NEWPID, CLONE_NEWNET, CLONE_NEWUSER, etc.). With CLONE_THREAD plus CLONE_VM|CLONE_SIGHAND it produces a thread; without those flags it produces a process.

On success, the caller receives the child's TID; on failure, clone() returns -1 with errno set and no child is created. With the glibc wrapper, the child never returns from clone(): it starts executing fn(arg), and fn's return value becomes its exit status. The raw syscall takes no fn argument, returns 0 in the child like fork(), and uses a different argument order (which also varies between architectures); see "C library/kernel differences" in the clone(2) man page.

Architecture mapping

Architecture      Number  ABI     Entry point
x86 (i386)        120     i386    sys_clone
x64 (x86_64)      56      common  sys_clone
ARM64 (aarch64)   220     n/a     sys_clone

Kernel history

Introduced in Linux 2.0.

  1. 2.0

    clone() was introduced in 2.0 as the Linux-specific generalisation of fork(), enabling threads with shared address spaces — the building block on which NPTL was later built.

  2. 2.4.19

    CLONE_NEWNS introduced mount namespaces, the first of the Linux namespace primitives later extended into the full container toolkit.

  3. 2.6.24

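    CLONE_NEWPID and CLONE_NEWNET landed, adding PID and network namespaces to the mount, UTS, and IPC namespaces already available. CLONE_NEWUSER had been accepted since 2.6.23, but user namespaces only became fully usable, including by unprivileged callers, in Linux 3.8; that is what made rootless containers possible.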

  4. 5.3

    clone3() was added with a clone_args struct passed by reference, expanding flags beyond the 32-bit register limit and enabling future extension fields without further syscalls.

seccomp & containers

Docker default profile: Allowed

Podman default profile: Allowed

clone() is on the Docker and Podman default allow-lists, but the default profiles use argument filtering to reject calls that set namespace-creation flags (CLONE_NEWUSER, CLONE_NEWNET, CLONE_NEWPID, CLONE_NEWNS, etc.). Allowing those flags inside a container is what enables nested containers, and also what lets a compromised container create a new user namespace in which it is root, with a full capability set scoped to that namespace and a much larger kernel attack surface. For workloads that have no legitimate need to spawn containers, deny the namespace-creation flag set explicitly.

libseccomp

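// Allow-list filter (default action denies): permit clone() only when none of
// the namespace-creation flags are set. SCMP_CMP_MASKED_EQ matches when
// (flags & mask) == 0, so the rule must ALLOW; pairing it with SCMP_ACT_ERRNO
// would block ordinary fork()/pthread_create() calls instead.
seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(clone),
    1, SCMP_A0(SCMP_CMP_MASKED_EQ,
               CLONE_NEWUSER|CLONE_NEWNET|CLONE_NEWPID|CLONE_NEWNS, 0));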

strace example

$ strace -f -e clone /bin/sh -c '/bin/true; true'
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fb2a1c7da10) = 14982
[pid 14982] +++ exited with 0 +++
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=14982, si_uid=1000, si_status=0, …} ---

With -f, strace follows children created by clone() (and by fork() and vfork()), and it decodes the flags bitmask into symbolic CLONE_* names. pthread_create() shows up as clone(child_stack=…, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, …), a useful fingerprint for distinguishing thread creation from process creation. Recent glibc versions try clone3() first when creating threads, so trace clone3 as well.

Security & observability

clone() with CLONE_NEWUSER is the foundation of many user-namespace privilege-escalation techniques: a process that is allowed to create a user namespace becomes root, with a full capability set, inside it, and on a misconfigured host or a kernel with a bug reachable from inside the namespace it can leverage that to break out. For eBPF, the syscalls:sys_enter_clone tracepoint is the natural observation point: its clone_flags field carries the flags argument, so rule on the namespace-creation bitmask there, and cover clone3() and unshare() too, since both can create the same namespaces. Container-runtime detection tools such as Falco and Tracee build on this signal. Rootkits also use clone() to spawn background processes that share the parent's fd table (CLONE_FILES) and never call execve(), so /proc/<pid>/exe points at the parent's binary and no new executable appears. For host-level hardening, setting /proc/sys/kernel/unprivileged_userns_clone to 0 prevents unprivileged user-namespace creation entirely on Debian-based distros that ship that knob.

Errors

EAGAIN
RLIMIT_NPROC reached or temporary kernel resource exhaustion.
EINVAL
An invalid flag combination (e.g. CLONE_THREAD without CLONE_VM and CLONE_SIGHAND).
ENOMEM
Insufficient kernel memory to allocate task structures.
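ENOSPC
Creating the requested namespace would exceed the PID-namespace or user-namespace nesting limit, or a namespace limit configured under /proc/sys/user/.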
EPERM
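A CLONE_NEW* flag other than CLONE_NEWUSER was specified by a caller without CAP_SYS_ADMIN (combined with CLONE_NEWUSER, the check is made against the new user namespace, so unprivileged callers can succeed); or CLONE_NEWUSER was specified while unprivileged user-namespace creation is disabled (e.g. via kernel.unprivileged_userns_clone on Debian-based kernels), or the caller's effective UID or GID has no mapping in the parent user namespace.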
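EUSERS
Linux 3.11 to 4.8 only: CLONE_NEWUSER would exceed the limit on nested user namespaces. Since Linux 4.9 this is reported as ENOSPC.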

Flags

CLONE_VM
0x00000100
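Share the address space with the parent: memory writes and mappings made by either task are visible to the other. Threads require it, but CLONE_THREAD is what actually places the new task in the caller's thread group (vfork-style spawning also uses CLONE_VM).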
CLONE_FS
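0x00000200
Share filesystem information (root directory, current working directory, and umask), so chroot(), chdir(), or umask() in one task affects the other.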
CLONE_FILES
0x00000400
Share the file descriptor table. Closing an fd in one task closes it in all.
CLONE_SIGHAND
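0x00000800
Share the table of signal handlers with the parent; requires CLONE_VM. Signal masks and pending signals remain per task.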
CLONE_PIDFD
0x00001000
Allocate a pidfd referring to the child and store it in parent_tid (modern alternative to PID-based signalling).
CLONE_PTRACE
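0x00002000
If the calling process is being traced, trace the child as well.
CLONE_VFORK
0x00004000
Suspend the caller until the child calls execve() or exits, as vfork() does.
CLONE_PARENT
0x00008000
Give the child the same parent as the caller; that parent, not the caller, receives the child's termination signal.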
CLONE_THREAD
0x00010000
Place the new task in the same thread group (same TGID/PID); requires CLONE_VM and CLONE_SIGHAND.
CLONE_NEWNS
0x00020000
Create a new mount namespace.
CLONE_SYSVSEM
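0x00040000
Share a single list of System V semaphore adjustment (semadj) values with the parent.
CLONE_SETTLS
0x00080000
Set the child's thread-local-storage descriptor to tls.
CLONE_PARENT_SETTID
0x00100000
Store the child's TID at parent_tid in the parent's memory.
CLONE_CHILD_CLEARTID
0x00200000
Zero the TID at child_tid and perform a futex wake on that address when the child exits.
CLONE_CHILD_SETTID
0x01000000
Store the child's TID at child_tid in the child's memory.
CLONE_NEWCGROUP
0x02000000
Create a new cgroup namespace.
CLONE_NEWUTS
0x04000000
Create a new UTS namespace with its own hostname and NIS domain name.
CLONE_NEWIPC
0x08000000
Create a new IPC namespace, isolating System V IPC objects and POSIX message queues.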
CLONE_NEWUSER
0x10000000
Create a new user namespace — the basis for rootless containers.
CLONE_NEWPID
0x20000000
Create a new PID namespace; the child becomes PID 1 in it.
CLONE_NEWNET
0x40000000
Create a new network namespace with its own interfaces and routing table.
CLONE_IO
0x80000000
Share the I/O context with the parent, so the I/O scheduler treats both tasks as one.

Related syscalls