Memory · Section 2

mprotect(2)

Change the access protection of a memory region.

Signature

#include <sys/mman.h>

int mprotect(void * addr, size_t len, int prot);

addr: Page-aligned start address.
len: Length in bytes. Rounded up to a page-size multiple.
prot: Desired protection: bitwise OR of PROT_READ, PROT_WRITE, PROT_EXEC, or PROT_NONE (alone).
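
Because addr must be page-aligned, a caller that holds an arbitrary pointer has to widen the range to whole pages first. The following is a minimal sketch of that arithmetic; page_span() is a hypothetical helper name, not a library function, and the page size is taken from sysconf(_SC_PAGESIZE).

```c
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical helper: widens [ptr, ptr+len) to whole pages.
 * Stores the page-aligned start in *start and the covering length
 * (a multiple of the page size) in *span. Assumes len > 0. */
void page_span(const void *ptr, size_t len, uintptr_t *start, size_t *span)
{
    uintptr_t page = (uintptr_t)sysconf(_SC_PAGESIZE);
    uintptr_t lo = (uintptr_t)ptr & ~(page - 1);                     /* round down */
    uintptr_t hi = ((uintptr_t)ptr + len + page - 1) & ~(page - 1);  /* round up */

    *start = lo;
    *span = (size_t)(hi - lo);
}
```

Passing (void *)start and span to mprotect() then changes every page the object touches, including any unrelated data that happens to share those pages.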

Description

mprotect() changes the access protection of the pages overlapping the range [addr, addr+len) to prot. addr must be page-aligned; len is rounded up to a page-size multiple. POSIX defines the call only for memory obtained from mmap(); Linux accepts any mapped range in the process address space, including the stack and heap. Returns 0 on success; on failure returns -1 and sets errno.

mprotect() is the runtime knob behind several common mechanisms: RELRO (the dynamic loader applies relocations to the GOT, then mprotect()s it read-only), JIT compilers (write the code, then flip the pages to PROT_READ|PROT_EXEC), and guard pages (a PROT_NONE page placed next to a stack or buffer so that an overrun faults immediately). It is also the canonical step where a compromised process flips a writable shellcode buffer to executable.
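
The write-then-seal pattern described above (the same shape as RELRO) can be sketched as follows. This is an illustration under stated assumptions, not the loader's actual code: write_then_seal() is a hypothetical name, and the "table" is a single anonymous page.

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS under strict -std= modes */
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: fills a fresh anonymous page with `value`,
 * seals it read-only, and returns the first byte read back.
 * Returns -1 on failure. The page is unmapped before returning. */
int write_then_seal(unsigned char value)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    unsigned char *p = mmap(NULL, page, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    memset(p, value, page);                   /* writable phase */

    if (mprotect(p, page, PROT_READ) != 0) {  /* seal: later writes would fault */
        munmap(p, page);
        return -1;
    }

    int first = p[0];                         /* reads still succeed */
    munmap(p, page);
    return first;
}
```

A JIT follows the same sequence but passes PROT_READ|PROT_EXEC in the sealing step.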

Architecture mapping

Architecture	Number	ABI	Entry point
x86 (i386)	125	i386	sys_mprotect
x64 (x86_64)	10	common	sys_mprotect
ARM64 (aarch64)	226	—	sys_mprotect

Kernel history

Introduced in Linux 1.0.

1.0
mprotect() has been part of Linux since 1.0 with POSIX semantics.
4.9
pkey_mprotect() was introduced (Linux 4.9, Intel MPK on supported hardware) to attach a memory protection key to a region, letting a thread enable or disable access to a group of pages with a single register write rather than one mprotect() call per region. The basis for libmpk.
5.13
mprotect()'s behaviour at the boundary of stack VMAs (PROT_GROWSDOWN) was tightened to prevent ambiguous extension that allowed older kernels to be tricked into expanding the stack into adjacent allocations.

seccomp & containers

Docker default profile: Allowed
Podman default profile: Allowed

mprotect() is allowed by both default profiles and in practice cannot be blocked outright: every dynamically linked program calls it through the dynamic loader at startup. The high-value hardening is argument filtering on PROT_EXEC. Denying mprotect(..., PROT_EXEC) at the seccomp layer stops the canonical shellcode-loading step (mmap RW → write → mprotect RX). For fuller W^X coverage, combine it with a matching mmap() filter (for example, on requests that set both PROT_WRITE and PROT_EXEC) and the same rule on pkey_mprotect(). Workloads that legitimately JIT (Node.js, Java HotSpot, V8) need an exemption; pure data handlers (a database, an Nginx without modules) do not.

libseccomp

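// Deny mprotect() calls whose prot argument has PROT_EXEC set.
// Assumes ctx was created with seccomp_init(SCMP_ACT_ALLOW).
int rc = seccomp_rule_add(ctx, SCMP_ACT_ERRNO(EPERM), SCMP_SYS(mprotect),
    1, SCMP_A2(SCMP_CMP_MASKED_EQ, PROT_EXEC, PROT_EXEC));
if (rc < 0) { /* rule was not added; rc is a negative errno value */ }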

strace example

$ strace -e mprotect /bin/true 2>&1 | head -5
mprotect(0x7f8c2a1d0000, 16384, PROT_READ) = 0
mprotect(0x7f8c2a212000, 8192, PROT_READ) = 0
mprotect(0x55d4e8e4a000, 4096, PROT_READ) = 0

strace decodes prot symbolically (for example PROT_READ|PROT_WRITE|PROT_EXEC). At startup, ld.so issues mprotect() calls mainly for RELRO: typically ~10–30 for a binary that loads many shared libraries, fewer for a small one such as /bin/true above. Anything after startup is application activity. To isolate JIT activity, attach to the already running process with strace -p <pid> -e trace=mprotect once startup is over.

Security & observability

mprotect() is the post-exploitation primitive that turns a writable buffer into an executable one: most shellcode loaders on Linux end with mprotect(..., PROT_READ|PROT_EXEC). The syscalls:sys_enter_mprotect tracepoint, filtered in an eBPF program on prot & PROT_EXEC, is a high-signal feed; pair it with the process binary fingerprint to allowlist known JITs. RELRO (the linker turning the GOT read-only at startup) generates mprotect() calls during process init; these are benign and predictable. /proc/<pid>/maps shows current permissions, so comparing snapshots before and after suspicious activity reveals which regions changed.
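
The /proc/<pid>/maps comparison can be illustrated on the current process. This is a minimal sketch, assuming the usual "start-end perms ..." line layout of /proc/self/maps; perms_for_addr() and perms_after_mprotect() are hypothetical helper names.

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS under strict -std= modes */
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: copies the permission field (e.g. "rw-p") of the
 * mapping that contains addr into out. Returns 0 if found, -1 otherwise. */
int perms_for_addr(const void *addr, char out[5])
{
    FILE *f = fopen("/proc/self/maps", "r");
    if (!f)
        return -1;

    char line[512];
    int found = -1;
    while (fgets(line, sizeof line, f)) {
        unsigned long lo, hi;
        char perms[5];
        if (sscanf(line, "%lx-%lx %4s", &lo, &hi, perms) == 3 &&
            (uintptr_t)addr >= lo && (uintptr_t)addr < hi) {
            memcpy(out, perms, sizeof perms);
            found = 0;
            break;
        }
    }
    fclose(f);
    return found;
}

/* Hypothetical helper: maps one RW page, applies prot with mprotect(),
 * and reports the permissions the kernel then lists for that page. */
int perms_after_mprotect(int prot, char out[5])
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    void *p = mmap(NULL, page, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    int rc = mprotect(p, page, prot);
    if (rc == 0)
        rc = perms_for_addr(p, out);
    munmap(p, page);
    return rc;
}
```

A monitoring tool would read /proc/<pid>/maps of the target instead of /proc/self/maps and diff the permission column between snapshots.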

Errors

EACCES: The requested protection is incompatible with the underlying mapping (e.g. PROT_WRITE on a mapping of a file opened O_RDONLY with MAP_SHARED).
EINVAL: addr is not page-aligned, prot contains invalid flags, or both PROT_GROWSDOWN and PROT_GROWSUP are set.
ENOMEM: Part of the range [addr, addr+len) is not mapped, or internal kernel structures could not be allocated (for example, the call would split a VMA and push the process past vm.max_map_count).
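
Each of these failures can be provoked deliberately, which is a quick way to check error handling. The sketch below is illustrative only: the function names are hypothetical, and /proc/self/exe is used merely as a convenient file that can be opened read-only.

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS under strict -std= modes */
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns the errno set by a failing mprotect(), or 0 if it succeeded. */
static int try_mprotect(void *addr, size_t len, int prot)
{
    return mprotect(addr, len, prot) == 0 ? 0 : errno;
}

/* addr one byte past a page boundary: expected EINVAL. */
int errno_unaligned(void)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    char *p = mmap(NULL, page, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    int e = try_mprotect(p + 1, page - 1, PROT_READ);
    munmap(p, page);
    return e;
}

/* Range that was just unmapped: expected ENOMEM. */
int errno_unmapped(void)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    void *p = mmap(NULL, page, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    munmap(p, page);
    return try_mprotect(p, page, PROT_READ);
}

/* PROT_WRITE on a MAP_SHARED mapping of an O_RDONLY file: expected EACCES. */
int errno_readonly_shared(void)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    int fd = open("/proc/self/exe", O_RDONLY);
    if (fd < 0)
        return -1;
    void *p = mmap(NULL, page, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);   /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return -1;
    int e = try_mprotect(p, page, PROT_READ | PROT_WRITE);
    munmap(p, page);
    return e;
}
```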

Flags

PROT_READ: 0x1; Pages may be read.
PROT_WRITE: 0x2; Pages may be written.
PROT_EXEC: 0x4; Pages may be executed. The single most security-relevant flag — see W^X.
PROT_NONE: 0x0; No access. Generates SIGSEGV on any reference. Useful for guard pages.
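PROT_GROWSDOWN: 0x01000000; Apply the change down to the beginning of a mapping that grows downward (a stack segment or a MAP_GROWSDOWN mapping).
PROT_GROWSUP: 0x02000000; Apply the change up to the end of a mapping that grows upward.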
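
The guard-page use of PROT_NONE can be demonstrated by catching the resulting SIGSEGV. This is a sketch for illustration, with hypothetical helper names; recovering from SIGSEGV with siglongjmp() is acceptable in a probe like this but is not a general error-handling strategy.

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS under strict -std= modes */
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf probe_env;

static void on_segv(int sig)
{
    (void)sig;
    siglongjmp(probe_env, 1);   /* unwind back to the probe */
}

/* Hypothetical helper: returns 1 if a one-byte write to addr raises
 * SIGSEGV, 0 if the write succeeds. */
int write_faults(volatile char *addr)
{
    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_segv;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old);

    volatile int faulted = 1;
    if (sigsetjmp(probe_env, 1) == 0) {   /* 1: save and restore the signal mask */
        *addr = 1;
        faulted = 0;                      /* reached only if the write worked */
    }

    sigaction(SIGSEGV, &old, NULL);       /* restore the previous handler */
    return faulted;
}

/* Hypothetical helper: maps two pages, turns the second into a PROT_NONE
 * guard, and checks that only the guard faults on write.
 * Returns 1 if so, 0 if not, -1 on setup failure. */
int guard_page_works(void)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    char *p = mmap(NULL, 2 * page, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    if (mprotect(p + page, page, PROT_NONE) != 0) {
        munmap(p, 2 * page);
        return -1;
    }

    int ok = !write_faults(p) && write_faults(p + page);
    munmap(p, 2 * page);
    return ok;
}
```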