Domain 6 — Systemd & Services: What I Actually Learned in Round 2

Mar 18, 2026

The Three States Nobody Explains Clearly

Every service has three state fields, not one. When you run systemctl status nginx you see something like:

Loaded: loaded (/lib/systemd/system/nginx.service; enabled)
Active: inactive (dead)

These are independent of each other. Loaded means systemd found and parsed the unit file. Enabled means it will start on boot. Active means it is running right now.

A service can be enabled but inactive: you enabled it but never started it, or it exited and nothing restarted it (a crash usually shows up as failed rather than inactive). A service can be active but disabled: someone started it manually this session but it will not survive a reboot. These combinations confuse a lot of people because they expect one single “is it working” answer. Systemd gives you three separate answers on purpose.
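The enabled and active answers can also be queried separately, which is handy in scripts. A minimal sketch, assuming a hypothetical helper named describe_unit_state; systemctl is-enabled and systemctl is-active are the real commands, while the message wording is my own and it deliberately lumps static, masked, and similar states together with disabled.

```shell
# describe_unit_state ENABLED ACTIVE
# Turns the two independent answers into one human-readable line.
# Simplification: anything other than "enabled" is treated as "not on boot".
describe_unit_state() {
    case "$1/$2" in
        enabled/active) echo "starts on boot, running now" ;;
        enabled/*)      echo "starts on boot, not running now" ;;
        */active)       echo "running now, will not start on boot" ;;
        *)              echo "not running, will not start on boot" ;;
    esac
}

# Query a real unit only when systemctl is available.
if command -v systemctl >/dev/null 2>&1; then
    unit=nginx.service
    enabled_state=$(systemctl is-enabled "$unit" 2>/dev/null || true)
    active_state=$(systemctl is-active "$unit" 2>/dev/null || true)
    describe_unit_state "$enabled_state" "$active_state"
fi
```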

Writing Your Own Unit File

This is where most tutorials skip the important details.

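ExecStart= should always be an absolute path. Do not write ExecStart=python3 app.py. Write ExecStart=/usr/bin/python3 /opt/myapp/app.py. Older systemd versions reject a bare command name outright, and newer ones resolve it against a fixed built-in search path rather than your shell's $PATH, so the absolute path is the only form that behaves the same everywhere.

You cannot chain commands with && or ;. Systemd does not run your command through a shell. If you write ExecStart=/usr/bin/python3 app.py && echo done, systemd passes &&, echo, and done to python3 as literal arguments and nothing is chained. A standalone ; is not shell sequencing either: systemd reads it as a separator between command lines, and more than one command line is only accepted for Type=oneshot services. If you need shell features use ExecStart=/bin/bash -c 'command && echo done'.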

If your script uses relative paths internally, add WorkingDirectory=. Without it, systemd runs your process with / as the current directory and relative paths break. This is a common silent failure — the service starts, systemd marks it active, but the script cannot find its files.

Always run services as a dedicated non-root user. Two directives in [Service]:

User=appuser
Group=appuser

Never run application services as root. This is not optional in production.
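Putting the rules from this section together, a complete unit might look like the sketch below. Everything specific is an assumption: the myapp name, the /opt/myapp path, and the appuser account are placeholders, and the file is written to a scratch directory so nothing on the system changes.

```shell
# Write an example unit into a scratch directory (not /etc/systemd/system).
unit_dir=$(mktemp -d)

cat > "$unit_dir/myapp.service" <<'EOF'
[Unit]
Description=Example Python application (hypothetical)

[Service]
# Absolute path to the interpreter and to the script.
ExecStart=/usr/bin/python3 /opt/myapp/app.py
# Relative paths inside app.py resolve against this directory.
WorkingDirectory=/opt/myapp
# Dedicated unprivileged account; it must already exist.
User=appuser
Group=appuser

[Install]
WantedBy=multi-user.target
EOF

echo "wrote $unit_dir/myapp.service"
```

To install it for real, copy the file to /etc/systemd/system/ and run systemctl daemon-reload. Running systemd-analyze verify on the file first can catch syntax problems.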

Dependencies and Ordering Are Not the Same Thing

This tripped me up. After=network-online.target tells systemd the order to start things — your service starts after the network is online. But if the network target was never started, After= alone does not pull it in. Your service just starts whenever it gets to it.

Requires=network-online.target is what actually says “I need this to exist.” Combine both:

After=network-online.target
Requires=network-online.target

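Wants= is the relaxed version of Requires=. If the dependency fails, your service still starts. Use Wants= when the dependency is helpful but not critical. For network-online.target in particular, the systemd documentation recommends pairing After= with Wants= rather than Requires=.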
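To check what a unit actually declares, the ordering and requirement lists can be printed side by side. A small sketch, assuming a hypothetical wrapper named show_deps; systemctl show with -p is the real command. The output also includes dependencies systemd adds implicitly, so expect more entries than the unit file contains.

```shell
# show_deps UNIT
# Prints the ordering list (After=) and the requirement lists
# (Wants=, Requires=) as separate properties.
show_deps() {
    systemctl show -p After -p Wants -p Requires "$1"
}

# Run against a real unit only when systemctl is available.
if command -v systemctl >/dev/null 2>&1; then
    show_deps nginx.service || true
fi
```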

Restart Policies and Failure Handling

Restart=on-failure is what you almost always want for production services. It restarts the service when it exits with a non-zero code, is killed by a signal, or times out, but not when it exits cleanly.

Restart=always also restarts after a clean exit. Neither policy fights systemctl stop: a service stopped through systemctl stays stopped regardless of Restart=. The danger with Restart=always is a process that is supposed to finish and instead gets started again and again.

Systemd does not restart infinitely. StartLimitBurst= and StartLimitIntervalSec=, both set in the [Unit] section, control how many start attempts are allowed in a time window. Once the limit is hit, the service enters failed state and stops trying. To clear that state and let it try again without rebooting:

systemctl reset-failed service_name
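As a sketch of how these directives sit together in a unit file: the numbers below are illustrative assumptions, not recommendations, and myapp is a placeholder name. The file is written to a scratch directory rather than installed.

```shell
restart_dir=$(mktemp -d)

cat > "$restart_dir/myapp.service" <<'EOF'
[Unit]
Description=Example service with a bounded restart policy (hypothetical)
# Allow at most 5 starts within any 60-second window.
StartLimitIntervalSec=60
StartLimitBurst=5

[Service]
ExecStart=/usr/bin/python3 /opt/myapp/app.py
# Restart on non-zero exit, fatal signal, or timeout; not on clean exit.
Restart=on-failure
# Wait 5 seconds between a failure and the next start attempt.
RestartSec=5
EOF

echo "wrote $restart_dir/myapp.service"
```

With values like these, a service that crashes immediately on every start uses up its five attempts in well under a minute and then lands in failed state until reset-failed is run.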

Targets Are Not Runlevels — But They Map to Them

Systemd targets replaced the old SysV runlevels. The ones that matter:

multi-user.target — equivalent of runlevel 3, no GUI, what most servers use
graphical.target — equivalent of runlevel 5, with desktop environment
rescue.target — minimal environment, filesystems mounted, for recovery
emergency.target — absolute bare minimum, root filesystem read-only, when rescue fails

To see your default boot target: systemctl get-default
To change it: systemctl set-default multi-user.target
To switch right now without rebooting: systemctl isolate rescue.target

Your service connects to a target through the [Install] section:

[Install]
WantedBy=multi-user.target

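This is what systemctl enable actually does: it creates a symlink inside /etc/systemd/system/multi-user.target.wants/ pointing to your unit file. If your WantedBy= target is never reached during boot, your service will be enabled but will silently never start. The usual case is WantedBy=graphical.target on a server whose default is multi-user.target. The reverse works, because graphical.target pulls in multi-user.target. Classic trap.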
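What enable does can be reproduced by hand in a scratch directory, which makes the mechanism less magical. This sketch only mimics the layout of /etc/systemd/system/; the myapp.service name is a placeholder and nothing here touches the real systemd configuration.

```shell
# Scratch stand-in for /etc/systemd/system.
fake_etc=$(mktemp -d)

# A unit file with an [Install] section.
cat > "$fake_etc/myapp.service" <<'EOF'
[Service]
ExecStart=/usr/bin/python3 /opt/myapp/app.py

[Install]
WantedBy=multi-user.target
EOF

# Read the target named in WantedBy=.
target=$(sed -n 's/^WantedBy=//p' "$fake_etc/myapp.service")

# "Enabling" amounts to creating <target>.wants/<unit> as a symlink
# that points back at the unit file.
mkdir -p "$fake_etc/$target.wants"
ln -s "$fake_etc/myapp.service" "$fake_etc/$target.wants/myapp.service"

ls -l "$fake_etc/$target.wants/"
```

On a real system, ls -l /etc/systemd/system/multi-user.target.wants/ shows the same kind of links, and systemctl get-default tells you which target the boot is heading for.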

Drop-in Overrides — Never Edit Vendor Files

If you need to change a setting in a package-provided service like nginx or mysql, do not touch the file in /lib/systemd/system/. It will be overwritten on the next package update.

Instead:

systemctl edit nginx.service

This creates a drop-in file at /etc/systemd/system/nginx.service.d/override.conf. You only put the directives you want to change. Everything else inherits from the original.

If you want to completely replace ExecStart= in a drop-in, you must first clear it then set the new value:

[Service]
ExecStart=
ExecStart=/usr/sbin/nginx -c /etc/nginx/custom.conf

After any unit file change, always run:

systemctl daemon-reload

This does not restart the service. It only makes systemd re-read the unit files. A running service keeps its old configuration until you restart it.

Timers Instead of Cron

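Systemd timers are the modern alternative to cron. For every timer you need two files: a .service and a .timer. By default the timer activates the service with the same base name, and Unit= in the [Timer] section points it at a different one.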

backup.timer:

[Timer]
OnCalendar=*-*-* 03:00:00
Persistent=true

[Install]
WantedBy=timers.target

Persistent=true is important — if the system was off when the timer was supposed to fire, it will run immediately on next boot instead of waiting until the next scheduled time.
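The matching service file is the half that actually does the work. A minimal sketch, assuming a hypothetical script at /usr/local/bin/backup.sh; it is written to a scratch directory here rather than installed.

```shell
timer_dir=$(mktemp -d)

cat > "$timer_dir/backup.service" <<'EOF'
[Unit]
Description=Nightly backup job (hypothetical)

[Service]
# oneshot: the job runs to completion and exits; it is not a daemon.
Type=oneshot
ExecStart=/usr/local/bin/backup.sh
EOF

echo "wrote $timer_dir/backup.service"
```

The service needs no [Install] section because the timer starts it. After copying both files into /etc/systemd/system/, you enable and start the timer, not the service: systemctl enable --now backup.timer.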

To see all timers and their next run time:

systemctl list-timers --all

Journalctl Is a Diagnostic Tool, Not Just a Log Viewer

Most people use journalctl -u service_name and stop there. Round 2 means knowing how to slice it.

Follow live logs:

journalctl -f -u nginx

Current boot only:

journalctl -b -u nginx

Previous boot:

journalctl -b -1 -u nginx

Errors and above only:

journalctl -p err -u nginx

Specific time window:

journalctl -u nginx --since "09:00" --until "10:00"

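Whether journal logs survive a reboot depends on the Storage= setting in /etc/systemd/journald.conf. With the default Storage=auto, journald writes to /var/log/journal/ only if that directory already exists. Otherwise logs stay in /run/log/journal/ and are lost at shutdown. To make them persistent regardless, set Storage=persistent and restart systemd-journald. Systemd will then create /var/log/journal/ if needed and write logs there.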

To check disk usage and clean up:

journalctl --disk-usage
journalctl --vacuum-size=500M
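The filters combine freely, which is where the slicing pays off. A small sketch, assuming a hypothetical wrapper named recent_errors; every flag in it appears above except --no-pager, which stops journalctl from opening a pager when it runs inside a script.

```shell
# recent_errors UNIT [SINCE]
# Errors and worse for one unit, current boot only, since a given time.
# SINCE defaults to "1 hour ago".
recent_errors() {
    journalctl -u "$1" -b -p err --since "${2:-1 hour ago}" --no-pager
}

# Run against a real unit only when journalctl is available.
if command -v journalctl >/dev/null 2>&1; then
    recent_errors nginx "today" || true
fi
```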

Boot Analysis

Three commands, each tells you something different:

systemd-analyze              # total boot time split into kernel, initrd, userspace
systemd-analyze blame        # which services took the longest to start
systemd-analyze critical-chain  # which chain of dependencies is blocking boot

If your server is booting slowly, start with blame to find the offender, then critical-chain to understand why it is in the critical path.

The Diagnostic Workflow

Service fails. What do you do?

systemctl status service_name     # quick overview, last few log lines, exit code
journalctl -u service_name -xe    # full logs with context
systemctl cat service_name        # see the actual unit file including drop-ins

In that order. Most failures reveal themselves by step two.
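The three steps are easy to wrap in one function so the order becomes a habit. A sketch, assuming a hypothetical function name diagnose; the commands are the ones listed above with --no-pager added so the output scrolls past in one go.

```shell
# diagnose UNIT
# Runs the three steps in order and keeps going even if one of them
# returns non-zero (systemctl status does so for units that are not running).
diagnose() {
    echo "== status =="
    systemctl status "$1" --no-pager || true
    echo "== journal =="
    journalctl -u "$1" -xe --no-pager || true
    echo "== unit file and drop-ins =="
    systemctl cat "$1" --no-pager || true
}
```

Called as diagnose nginx.service, it prints all three views back to back.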
