Process Lifecycle & States

Mar 16, 2026

Linux Mastery Road to Cloud — Domain 4

Before We Start

Every time you run a command, the kernel creates something called a process — a running instance of a program. That process gets born, does its job, and eventually dies. But between birth and death, a lot can happen. It can sleep, get stuck, turn into a zombie, or get orphaned. Understanding this lifecycle is not optional if you’re serious about Linux. These concepts show up in troubleshooting, in interviews, and in real production incidents.

Let’s go through it question by question.

1. What are the possible states a process can be in?

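At any moment, every process on your system is in exactly one state. The kernel tracks this in the process table, and you can see it in the STAT column when you run ps aux. The first letter of that column is the state; any characters after it (s, +, <, l and so on) are modifiers such as "session leader" or "foreground process group".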

Here are the states you need to know:

R — Running or Runnable The process is either actively executing on a CPU right now, or it’s sitting in the run queue waiting for its turn. Both cases get the same R label. The scheduler decides who gets CPU time and when.

S — Interruptible Sleep The process is waiting for something to happen — a keypress, a network packet, a timer to fire. It’s not doing anything. But importantly, it can be woken up by a signal. This is the most common state you’ll see for healthy, idle processes. Your nginx worker processes sitting there waiting for HTTP requests? They’re in S state.

D — Uninterruptible Sleep The process is waiting on a kernel-level operation — almost always disk I/O or a network filesystem call — and it cannot be interrupted. Not even by SIGKILL. This state is supposed to be very short-lived. If you see a process stuck in D state for a long time, something is wrong at the hardware or kernel level. More on this in section 6.

Z — Zombie The process has finished running. It’s dead. But its parent hasn’t collected its exit status yet, so the kernel keeps a tiny record of it in the process table. It’s not using CPU or memory, just a table entry. This is covered in detail in section 2.

T — Stopped The process has been paused. This happens when you press Ctrl+Z in the terminal (which sends SIGTSTP), or when a debugger like gdb pauses a process to inspect it. The process is frozen but still exists.

ps aux
# Look at the STAT column — you'll see these letters in action
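If you want to watch a process move between states rather than just read about it, here is a minimal sketch. It assumes Linux with /proc mounted; state_of is a helper name made up for this example, and the sleep 30 process is only a disposable target.

```shell
# Read the one-letter state of a PID from /proc (Linux only).
state_of() {
    awk '/^State:/ { print $2 }' "/proc/$1/status"
}

sleep 30 &              # a throwaway background process to watch
pid=$!
sleep 1                 # give it time to exec and settle into its timer wait
idle_state=$(state_of "$pid")

kill -STOP "$pid"       # pause it (like Ctrl+Z, but cannot be caught)
sleep 1
stopped_state=$(state_of "$pid")

kill -CONT "$pid"       # resume it
sleep 1
resumed_state=$(state_of "$pid")

echo "idle=$idle_state stopped=$stopped_state resumed=$resumed_state"

kill "$pid"             # clean up the throwaway process
wait "$pid" 2>/dev/null || true
```

On a typical system this should print idle=S stopped=T resumed=S. The one-second pauses are there because signals are delivered asynchronously, so the state does not change the instant kill returns.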

2. What is a zombie process? How is it created, and how is it removed?

This one trips a lot of people up because the word “zombie” sounds scarier than the reality.

How a process actually exits

When a process finishes, it either calls exit() itself or is terminated by a signal (a crash is usually the latter). Either way, the kernel marks it as done and stores its exit status (was it successful? did it error out? which signal killed it?). But the kernel doesn’t immediately erase it from the process table. It waits for the parent process to come and collect that exit status by calling wait() or waitpid().

This handoff makes sense. The parent might need to know: “Did my child process succeed?” If the kernel just threw away that information, the parent would have no way to find out.

The zombie state

Between the child calling exit() and the parent calling wait(), the child is in Z state — zombie. It’s dead. It’s not using CPU time, it’s not running any code. It’s just a row in the process table holding the exit code until the parent picks it up.

Once the parent calls wait(), the kernel removes the entry completely. That’s the process properly cleaned up.

When does a zombie become a problem?

A few zombies at any point in time is completely normal. The issue is when a parent process has a bug: it creates many child processes but never calls wait() to collect their exit codes. Each zombie still holds a PID, so as they pile up you can eventually run out of PIDs or hit the per-user process limit. The system then can’t create new processes, which is a serious problem.
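To get a rough sense of how much headroom a system has before that happens, you can compare the kernel's PID ceiling with the number of processes that currently exist. This is a sketch that assumes Linux with /proc mounted; the variable names are made up for the example.

```shell
# Upper bound on the PIDs the kernel will hand out (Linux only).
pid_max=$(cat /proc/sys/kernel/pid_max)

# Every numeric directory in /proc is one existing process, zombies included.
# Threads also consume IDs from the same pool but are not counted here.
in_use=$(ls -d /proc/[0-9]* | wc -l)

echo "processes: $in_use, pid_max: $pid_max"
```

On a healthy machine the first number is tiny compared to the second. A process count that keeps climbing while the workload stays flat is a hint to go looking for unreaped children.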

How do you remove a zombie?

Here’s the important part: you cannot kill a zombie directly. It’s already dead. Sending it SIGKILL does nothing because it has no code running to receive signals.

The only way to clean up a zombie is to get its parent to call wait(). In practice this means:

If the parent is a well-written program, it will call wait() eventually on its own
If the parent has a bug and refuses to, you kill the parent process — then init (PID 1) adopts the orphaned zombies and immediately reaps them

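# Find zombie processes (STAT starts with Z; a plain grep 'Z' also matches unrelated lines)
ps aux | awk '$8 ~ /^Z/'

# Find the parent of a zombie (replace 1234 with the zombie's PID)
ps -o ppid= -p 1234

# If the parent is the problem, terminating it lets init clean up the zombies.
# Try a plain kill (SIGTERM) first so the parent can shut down cleanly;
# fall back to kill -9 only if it ignores that.
kill <parent_PID>
kill -9 <parent_PID>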
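The whole zombie story can be reproduced safely in a few lines. This is a sketch, assuming Linux with /proc mounted: the "buggy parent" is simply a shell that starts a child and then replaces itself with sleep, which never calls wait(). The helper state_of and the variable names are made up for the example.

```shell
state_of() {
    awk '/^State:/ { print $2 }' "/proc/$1/status"
}

pidfile=$(mktemp)

# The inner shell starts a short-lived child, records the child's PID, then
# execs `sleep`. `sleep` never calls wait(), so it acts as a neglectful parent.
sh -c 'sleep 1 & echo "$!" > "$0"; exec sleep 30' "$pidfile" >/dev/null 2>&1 &
parent=$!

sleep 2                                  # the child has exited by now
zombie=$(cat "$pidfile")
rm -f "$pidfile"
state_before=$(state_of "$zombie")       # Z: exited but not yet reaped

kill -9 "$zombie"                        # accepted, but changes nothing
sleep 1
state_after_kill=$(state_of "$zombie")
echo "zombie $zombie: $state_before, after SIGKILL: $state_after_kill"

# Terminate the parent; the zombie is handed to PID 1 (or a subreaper).
kill "$parent"
wait "$parent" 2>/dev/null || true
sleep 1
if [ -d "/proc/$zombie" ]; then
    echo "zombie $zombie is still waiting for its new parent to reap it"
else
    echo "zombie $zombie has been reaped"
fi
```

Both state reads should show Z, which is the point: SIGKILL has nothing left to kill. Whether the last line reports "reaped" depends on what runs as PID 1. A real init reaps promptly, while a container whose PID 1 is an ordinary application may leave the zombie in place.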

3. What is an orphan process? What happens after its parent dies?

An orphan process is a process whose parent has died before it did. The parent is gone, but the child is still running.

Think of it this way: you open a terminal, you start a long-running script, and then the terminal crashes. Your script was a child of that terminal’s shell. Now the shell is gone — the script is an orphan.

What happens to it?

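The kernel doesn’t leave orphans homeless. The moment a parent process dies, all of its still-running children are immediately re-parented to PID 1, which is init (or systemd on modern systems). Init becomes their new parent. One nuance: if a closer ancestor has registered itself as a "subreaper" (a systemd user session does this, and so do some container runtimes), the orphans go to that process instead, and it takes on the same reaping duty.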

Init is specifically designed to handle this. It runs a wait() loop in the background, so when those orphaned processes eventually finish, init collects their exit codes properly. No zombie pileup.

Orphan vs. Zombie — know the difference

These two are commonly confused. Keep them straight:

Zombie: the child is dead, the parent is alive but hasn’t called wait() yet
Orphan: the child is alive, the parent is dead — the child gets adopted by init

A zombie is a cleanup problem. An orphan is usually not a problem at all — init handles it automatically.
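Re-parenting is easy to observe. The sketch below, which assumes Linux with /proc mounted, starts a child from a short-lived shell and then checks who the child's parent is once that shell has exited. The variable names are made up for the example.

```shell
pidfile=$(mktemp)

# The inner shell starts a long-running child, records its own PID and the
# child's PID, and exits right away. The child is now an orphan.
sh -c 'sleep 30 >/dev/null 2>&1 & echo "$$ $!" > "$0"' "$pidfile"
read -r old_parent orphan < "$pidfile"
rm -f "$pidfile"

# PPid in /proc/<pid>/status is the process's current parent (Linux only).
new_parent=$(awk '/^PPid:/ { print $2 }' "/proc/$orphan/status")
echo "orphan $orphan: parent was $old_parent, is now $new_parent"

kill "$orphan"          # clean up the orphan
```

The new parent is often 1, but it can also be a subreaper such as a per-user systemd instance, so the only safe expectation is that it differs from the original parent.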

4. What is the difference between a process and a thread?

A process is an independent program in execution. It has its own isolated memory space — its own code, its own heap, its own stack, its own file descriptors. If two processes need to talk to each other, they have to use specific inter-process communication mechanisms like pipes, sockets, or shared memory. They are isolated from each other by design.

A thread is a unit of execution inside a process. A single process can have multiple threads running concurrently. The key thing: all threads within a process share the same memory space. They share the heap, the same file descriptors, the same global variables. What each thread keeps to itself is its stack, its CPU registers, and any thread-local storage.

Why does this matter in practice?

Because of that shared memory, threads are fast and cheap to communicate with each other — they can just read and write shared variables. But this also makes them dangerous. If two threads write to the same memory at the same time without coordination, you get race conditions and data corruption. This is why multi-threaded programming is notoriously tricky.

Processes are safer because they’re isolated, but communicating between them has overhead.
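The shell cannot create threads, but it can show the process half of this comparison. In the sketch below, the parentheses run the commands in a forked child process, and the variable name counter is made up for the example.

```shell
counter=1

# ( ... ) runs in a forked child process with its own copy of the variables.
( counter=99; echo "child sees counter=$counter" )

# The child's write never reached the parent's memory.
echo "parent still sees counter=$counter"
```

The child prints 99 and the parent still prints 1. Two threads in one process would behave the opposite way: a write by one is visible to the other, which is exactly why they need locks.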

On Linux specifically, threads are created with a syscall called clone(). Internally the kernel has no strict separation between “process” and “thread”: both are tasks, and a thread is simply a task created by clone() with flags that make it share its creator’s memory space, file descriptors, and signal handlers. What most people call threads, Linux tools often label LWPs (Light Weight Processes).

When you run ps aux, you typically see one line per process. To also see threads, use:

ps aux -L
# or
ps -eLf
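The same information is available straight from /proc, which helps on minimal systems where ps is missing. This sketch assumes Linux; thread_count is a helper name made up for the example.

```shell
# Each entry under /proc/<pid>/task is one thread (LWP) of that process.
thread_count() {
    ls "/proc/$1/task" | wc -l
}

echo "this shell (PID $$) has $(thread_count "$$") thread(s)"
```

A shell is normally single-threaded, so expect 1 here. Point the function at the PID of a JVM or a database server to see a much larger number.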

5. When you run a command in the shell, what happens under the hood?

This is one of the most important things to understand about Linux. Every time you type a command and press Enter, a very specific sequence happens involving two syscalls: fork() and exec().

Step 1 — fork()

The shell calls fork(). This creates an almost exact copy of the shell process. Both processes are now running the same code, with identical memory contents (the kernel shares the pages copy-on-write, so nothing is physically duplicated until one side writes), the same open file descriptors, same everything, with one exception: fork() returns a different value to each of them.

In the parent (the shell): fork() returns the child’s PID
In the child (the new copy): fork() returns 0

This return value is how the two processes know which one they are, and they immediately take different paths in the code.

Step 2 — exec()

The child process, now knowing it’s the child, calls exec() (specifically execve()). This replaces the child’s entire program with the new command you typed. The memory, the code, everything is overwritten with the new program. A few things survive the exec: the PID, the working directory, and any open file descriptors not marked close-on-exec (which is how stdin and stdout stay connected to your terminal).

The parent (your shell) sits and waits using wait() for the child to finish. When it does, the shell collects the exit code, prints a new prompt, and waits for your next command.

This fork-then-exec pattern is called the fork-exec model, and it’s fundamental to how Unix/Linux works.
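Two parts of this model can be checked from the shell itself: exec swaps the program but keeps the process, and wait hands the child's exit status back to the parent. The sketch below uses made-up variable names and an arbitrary exit status of 7.

```shell
# exec replaces the program but keeps the process: both lines show one PID.
pids=$(sh -c 'echo "$$"; exec sh -c "echo \$\$"')
pid_before_exec=$(printf '%s\n' "$pids" | sed -n 1p)
pid_after_exec=$(printf '%s\n' "$pids" | sed -n 2p)
echo "before exec: $pid_before_exec, after exec: $pid_after_exec"

# Fork and exec a child in the background, then wait() for its exit status.
sh -c 'exit 7' &
child=$!
status=0
wait "$child" || status=$?
echo "child $child exited with status $status"
```

The two PIDs match because the second sh is the same process running a new program image. The final line reports status 7, the value the child passed to exit.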

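# You can observe this yourself with strace. Tracing ls alone only shows ls,
# so trace a shell that launches it; -f follows the forked child.
strace -f -e trace=clone,execve bash -c 'ls; true'

# After the initial execve() of bash itself, you'll see bash call clone()
# (how fork is typically implemented on Linux; some systems show clone3 or fork instead)
# and then the child call execve() to become ls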

Why fork instead of just launching the new program directly?

Because forking first gives you a chance to set things up in the child before exec — like redirecting stdout to a file, setting up pipes, changing the working directory, or adjusting permissions. All of that happens between fork and exec, in the child, before the new program even starts. It’s elegant.
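Here is a small sketch of that window between fork and exec. The parentheses create the child, the cd and the redirection are the setup, and exec pwd is the new program. It assumes an external pwd binary is on the PATH, as it is with coreutils; the variable names are made up for the example.

```shell
outfile=$(mktemp)
parent_dir=$(pwd)

# Inside ( ... ) we are in the forked child: change directory and redirect
# stdout first, then exec the real program. None of this touches the parent.
( cd / && exec pwd > "$outfile" )

child_saw=$(cat "$outfile")
rm -f "$outfile"
echo "child ran in: $child_saw, parent is still in: $(pwd)"
```

This is the same mechanism the shell uses for a command like ls > out.txt: the redirection is applied in the child after the fork, and ls simply inherits the already-redirected stdout.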

6. What is the D state, and why can’t you kill it with SIGKILL?

This is the state that confuses people the most, especially when they’re staring at a frozen server and SIGKILL isn’t working.

What D state means

D stands for uninterruptible sleep. A process enters this state when it’s waiting on a kernel-level I/O operation that the kernel has decided cannot be safely interrupted. The most common examples:

Waiting for a local disk read/write to complete
Waiting for a response from an NFS (Network File System) mount
Waiting on certain hardware drivers

The key word is uninterruptible. The process has handed control to the kernel, and the kernel is in the middle of something that it cannot safely stop midway through.

Why SIGKILL doesn’t work

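Here’s the thing about signals: sending one only marks it as pending on the target process. The process acts on it when it is about to return from kernel mode to user mode. A process in ordinary S sleep gets woken up early by a pending signal so that this can happen, which is why SIGKILL normally appears to work instantly.

But a process in D state is deep inside kernel code, in a system call that hasn’t returned yet, and it has asked not to be woken for signals. The kernel is doing something on its behalf, and the kernel cannot just abandon that mid-operation without potentially corrupting data structures. SIGKILL is recorded as pending, but nothing acts on it until that kernel operation completes (or fails). Some kernel code paths, NFS among them, use a "killable" variant of this sleep that does respond to fatal signals, so SIGKILL occasionally works on a D-state process, but you can’t count on it.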
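D state is hard to produce on demand, so the sketch below uses a stopped process as a stand-in to show a signal sitting in the pending set rather than being acted on. This is an analogy, not D state itself: a stopped process still dies on SIGKILL, which is why the example sends SIGTERM. It assumes Linux with /proc mounted, and the variable names are made up for the example.

```shell
sleep 30 &
pid=$!
sleep 1

kill -STOP "$pid"       # freeze the process
sleep 1                 # make sure it has actually stopped
kill -TERM "$pid"       # queued: a stopped process will not act on it yet
sleep 1

# ShdPnd is the hex mask of signals pending for the whole process.
# SIGTERM is signal 15, which is bit 14, i.e. 0x4000.
pending=$(awk '/^ShdPnd:/ { print $2 }' "/proc/$pid/status")
state=$(awk '/^State:/ { print $2 }' "/proc/$pid/status")
echo "state=$state pending=$pending"

kill -CONT "$pid"       # once it can run again, the queued SIGTERM lands
status=0
wait "$pid" 2>/dev/null || status=$?
echo "exit status: $status"
```

The process stays alive in T state with the 0x4000 bit set, then dies with status 143 (128 + 15) as soon as it is allowed to run. A stuck D-state process is in the same position with SIGKILL, except that nothing you can send from user space plays the role of SIGCONT.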

In normal conditions, D state is very brief — a disk read finishes in milliseconds, the process wakes up, moves back to R or S. You’d never notice it.

When it becomes a problem is when the underlying operation never completes:

A disk is failing and retrying indefinitely
An NFS server went down and the client is stuck waiting for a response that will never come
A kernel bug causes a wait to never be satisfied

In those cases, the process stays in D state and there is genuinely nothing you can do short of rebooting the machine — or in the case of NFS, fixing/disconnecting the problematic mount.

# Spot D state processes (STAT starts with D, often followed by modifiers like D+ or Ds)
ps aux | awk '$8 ~ /^D/'

# Or in top, look for D in the S column
top

# If you see multiple D state processes, check your disk and mount health
dmesg | tail -50          # kernel messages — look for I/O errors
df -h                      # is a filesystem hung? (df itself can block on a dead NFS mount)
cat /proc/mounts           # what's mounted?
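On a box that is already struggling, it can help to have a check that reads /proc directly instead of depending on ps. The sketch below assumes Linux; list_state is a helper name made up for this example.

```shell
# Print "PID STATE NAME" for every process whose state letter is $1.
# Processes that exit mid-scan are silently skipped.
list_state() {
    for status_file in /proc/[0-9]*/status; do
        awk -v want="$1" '
            /^Name:/  { name = $2 }
            /^Pid:/   { pid = $2 }
            /^State:/ { state = $2 }
            END       { if (state == want) print pid, state, name }
        ' "$status_file" 2>/dev/null || true
    done
}

list_state D      # usually prints nothing on a healthy system
```

The same function works for any state letter, for example list_state Z for zombies. For a PID it reports, /proc/<PID>/wchan names the kernel function the process is sleeping in when the kernel exposes that information, which is often the fastest clue to whether storage or NFS is at fault.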

The practical lesson: if a production server has processes stuck in D state and the system is sluggish or unresponsive, don’t waste time trying to kill them. Go investigate the storage or network filesystem. That’s where the real problem is.

TL;DR — The Whole Thing in One Place

Process states: R (running/queued), S (sleeping, interruptible), D (sleeping, uninterruptible), Z (zombie), T (stopped)
Zombie: child is dead, parent hasn’t called wait() yet — kill the parent to force cleanup, you can’t kill the zombie directly
Orphan: parent died while child is still running — kernel re-parents it to init, which cleans it up automatically
Process vs Thread: processes have isolated memory; threads share memory within the same process — threads are faster to communicate but harder to get right
fork + exec: shell forks itself, child execs the new program — this is how every command you run gets launched
D state: process is inside a kernel syscall waiting on I/O, so signals (SIGKILL included) stay pending until that call returns. A process stuck there usually means a storage or NFS problem

Part of the Linux Mastery Road to Cloud series — limonlab.online
