Real Systemd Troubleshooting: 4 Lessons That Actually Stuck

I have been studying Linux since January 2026. I finished the theoretical side of all 12 domains I planned. But theory is one thing — actually diagnosing broken services on a real terminal is a completely different skill.

So I changed my approach. Instead of reading more, I started doing ticket-style troubleshooting. Someone gives me a broken scenario, I diagnose it with real commands, paste real output, and fix it. No multiple choice. No fabricated logs.

In this post I want to share four specific lessons from my Domain 6 (Systemd) session that genuinely changed how I think about Linux services.

Lesson 1: 203/EXEC With No Journal Output Means the Binary Itself Is Missing

I created a test service unit file pointing to /usr/bin/node as the binary and /opt/webapp/server.js as the script. When I ran systemctl start, it failed immediately.

Process: 12895 ExecStart=/usr/bin/node /opt/webapp/server.js (code=exited, status=203/EXEC)

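Earlier in the same session I had debugged another failed start, which I also noted down as 203/EXEC. That time the journal showed a clear Python error message:

python3[9231]: /usr/bin/python3: can't open file '/opt/testapp.py': [Errno 2] No such file or directory

This time the journal was silent: no error message from the program at all.

The difference taught me something important. systemd sets status=203/EXEC when it cannot execute the ExecStart binary at all: the file is missing, is not executable, or has a broken interpreter line. Nothing runs, so the program has nothing to log; at most you see systemd's own "Failed at step EXEC" line. When the journal does contain the program's own error, as with the Python message above, the binary started and the problem is further along, here a missing script. In that case the status line normally carries the program's own exit code (python3 exits with 2 when it cannot open its script) rather than 203/EXEC, so that earlier failure is worth re-checking against its status line.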

This is a useful diagnostic pattern. If you see 203/EXEC and journalctl -u servicename shows no output from the program itself, check the binary first:

which node
ls -l /usr/bin/node
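
The same check can be scripted against the unit file itself. The following is a minimal sketch, not a systemd tool: check_exec_binary is a hypothetical helper name, and it assumes a simple ExecStart= line with an absolute path and no quoting. It only looks at the first ExecStart= line.

```shell
# Sketch: report why systemd might fail to exec a unit's ExecStart binary.
# check_exec_binary is a hypothetical helper, not part of systemd.
check_exec_binary() {
    local unit_file="$1" line cmd
    # Take the first ExecStart= line from the unit file
    line=$(grep -m1 '^ExecStart=' "$unit_file") || { echo "no ExecStart line"; return 2; }
    cmd=${line#ExecStart=}
    # Drop optional systemd prefix characters such as "-" or "@"
    cmd=$(printf '%s' "$cmd" | sed 's/^[-@:+!]*//')
    # Keep only the first word: the binary, without its arguments
    cmd=${cmd%%[[:space:]]*}
    if [ ! -e "$cmd" ]; then
        echo "missing: $cmd"
        return 1
    elif [ ! -x "$cmd" ]; then
        echo "not executable: $cmd"
        return 1
    fi
    echo "ok: $cmd"
}
```

For the unit in this lesson, something like check_exec_binary /etc/systemd/system/webapp.service would be the call, assuming that is where the unit file lives (the path is a guess for illustration).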

Lesson 2: A Timer Showing “Active” Tells You Nothing About Whether the Job Will Succeed

I set up a systemd timer called dbbackup.timer that was supposed to trigger dbbackup.service every day. When I checked the timer status:

Active: active (waiting) since Thu 2026-03-26 20:04:23 EET
Trigger: Fri 2026-03-27 00:00:00 EET; 3h 55min left

Everything looked fine. But when I manually triggered the service to test it before the scheduled midnight run:

sudo systemctl start dbbackup.service

It failed immediately. The script at /opt/scripts/dbbackup.sh did not exist.

The timer had no idea. A timer unit only schedules the activation of its service. It never checks whether the script the service is supposed to run actually exists, so the error only shows up at execution time.

This is why you should always test your timer-triggered services manually before relying on the schedule:

# Trigger the service directly — bypass the timer completely
sudo systemctl start dbbackup.service
systemctl status dbbackup.service
journalctl -u dbbackup.service -n 20

Also worth knowing: output from the script (like echo or print) does not appear in your terminal. It goes to the journal. So after running the service manually, check the journal to confirm it actually did what it was supposed to do.
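
The manual test can be wrapped into one step that starts the service, reads back the result, and prints the journal only on failure. This is a sketch under assumptions: run_and_check is a hypothetical helper name, the unit is a run-to-completion job like a backup script (for a long-running service, Result can read "success" while it is still running), and systemctl is recent enough to support --value. It needs root or sudo to start the unit.

```shell
# Sketch: start a run-to-completion unit and report whether the run succeeded.
# run_and_check is a hypothetical helper name.
run_and_check() {
    local unit="$1" result
    # start returns non-zero when the job fails; continue so we can report why
    systemctl start "$unit" || true
    # Result is "success" after a clean run, otherwise e.g. "exit-code"
    result=$(systemctl show --property=Result --value "$unit")
    if [ "$result" = "success" ]; then
        echo "$unit: last run succeeded"
        return 0
    fi
    echo "$unit: last run failed (Result=$result)"
    # The script's own output went to the journal, not the terminal
    journalctl -u "$unit" -n 20 --no-pager
    return 1
}
```

Calling run_and_check dbbackup.service before midnight would have surfaced the missing script hours before the timer fired.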

Lesson 3: A Silent Misconfiguration Can Sit Undetected for Weeks

This one I discovered by accident during my own experiment — and it turned out to be the most valuable lesson of the session.

I was checking the nginx and php-fpm socket configuration on my live server stadiumbuzz.live. I found this:

# nginx config
grep fastcgi_pass /etc/nginx/sites-available/stadiumbuzz.live
fastcgi_pass unix:/run/php/php8.3-fpm-stadiumbuzz.sock;

# php-fpm pool config
grep listen /etc/php/8.3/fpm/pool.d/stadiumbuzz.conf
listen = /run/php/php8.3-fpm-stadiumbuz.sock   ← one letter missing

There was a typo — stadiumbuz instead of stadiumbuzz. The socket paths did not match.

But my website was working perfectly fine. No errors. No 502. Nothing.

I wanted to understand why. So I ran:

ls /run/php/

The old correct socket file was still there from the previous php-fpm start. The running php-fpm process was still listening on the correct path. The config file had the typo but nobody had restarted php-fpm yet — so the broken config was never actually loaded.
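
This kind of drift between the file on disk and the running process can be detected without restarting anything. A minimal sketch, assuming the pool listens on a Unix socket rather than a TCP address; configured_socket_is_live is a hypothetical helper name:

```shell
# Sketch: does the socket path written in the pool config exist right now?
# configured_socket_is_live is a hypothetical helper name.
configured_socket_is_live() {
    local pool_conf="$1" sock
    # Match "listen = ..." only, not listen.owner, listen.group, and so on
    sock=$(sed -n 's/^listen[[:space:]]*=[[:space:]]*\([^[:space:];]*\).*/\1/p' "$pool_conf" | head -n 1)
    [ -n "$sock" ] || { echo "no listen directive in $pool_conf"; return 2; }
    if [ -S "$sock" ]; then
        echo "live: $sock"
        return 0
    fi
    # The config names a socket that the running php-fpm never created
    echo "config drift: $sock is configured but is not a live socket"
    return 1
}
```

A "config drift" result means the running php-fpm was started from an older version of the file, which is exactly the situation described above.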

I then fixed the typo and restarted php-fpm safely. Site stayed up.

But here is the dangerous version of this scenario. If I had left the typo in place and php-fpm had restarted for any reason (a server reboot, a kernel update, an automatic security patch), php-fpm would have created its socket at the misspelled path and the old, correct socket would have disappeared. nginx would no longer find its upstream, and the site would go down at 3am with a 502 error.

The lesson: a config file with a typo is not a harmless typo. It is a time bomb waiting for the next service restart.

This is why nginx -t and php-fpm8.3 -t are important before restarting — but they have limits. Neither tool can catch a socket path mismatch because each tool only validates its own config. They do not talk to each other.

The real safety net is always testing after a restart:

sudo systemctl restart php8.3-fpm
curl -I https://yourdomain.com

If the site returns 200, you are fine. If it returns 502, you caught it before anyone else did.
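
To make that check harder to skip, the status code can be turned into an exit code. The sketch below is one way to do it; check_http_status is a hypothetical helper name, and treating any 2xx or 3xx response as healthy is an assumption (a site that redirects will answer 301 or 302 rather than 200).

```shell
# Sketch: map the HTTP status of a URL to a pass/fail exit code.
# check_http_status is a hypothetical helper name.
check_http_status() {
    local url="$1" code
    # -s: silent, -o /dev/null: discard the body, -w: print only the status code
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$url") || true
    case "$code" in
        2??|3??)
            echo "OK ($code)"
            ;;
        502)
            echo "502 Bad Gateway: nginx cannot reach its upstream"
            return 1
            ;;
        *)
            # Includes 000, which curl prints when it could not connect at all
            echo "unexpected status: $code"
            return 1
            ;;
    esac
}
```

Chained after the restart, for example sudo systemctl restart php8.3-fpm && check_http_status https://yourdomain.com, a failure shows up in the same terminal seconds after the change.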

Lesson 4: A 502 Error With All Services Running Points to a Communication Problem

If nginx, MySQL, and php-fpm are all showing as running in systemctl status — but the site is down — the problem is likely how these services are talking to each other, not the services themselves.

For a WordPress site running nginx and php-fpm, the communication happens through a Unix socket file. Nginx passes PHP requests to php-fpm through a socket path defined in the nginx config. php-fpm listens on a socket path defined in its pool config. If these paths do not match, nginx cannot reach php-fpm and returns a 502 to the browser.

To check this:

# What socket is nginx expecting to pass requests to?
grep fastcgi_pass /etc/nginx/sites-available/yourdomain.conf

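# What socket is php-fpm configured to listen on? Use the pool file that serves this site.
# Matching "listen =" skips related settings such as listen.owner and listen.group.
grep -E "^listen[[:space:]]*=" /etc/php/8.3/fpm/pool.d/www.conf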

Both outputs must show the exact same socket path. If they do not match, fix one of them, restart the affected service, and test with curl immediately.
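
Comparing two grep outputs by eye is how a one-letter typo slips through, so the comparison itself can be scripted. This is a minimal sketch: socket_paths_match is a hypothetical helper name, and it assumes one fastcgi_pass unix: line in the nginx site file and one listen line in the pool file.

```shell
# Sketch: compare the socket path nginx passes to with the one php-fpm listens on.
# socket_paths_match is a hypothetical helper name.
socket_paths_match() {
    local nginx_conf="$1" pool_conf="$2" nginx_sock fpm_sock
    # fastcgi_pass unix:/path/to.sock;  ->  /path/to.sock
    nginx_sock=$(sed -n 's/^[[:space:]]*fastcgi_pass[[:space:]]*unix:\([^;[:space:]]*\).*/\1/p' "$nginx_conf" | head -n 1)
    # listen = /path/to.sock  ->  /path/to.sock
    fpm_sock=$(sed -n 's/^listen[[:space:]]*=[[:space:]]*\([^[:space:];]*\).*/\1/p' "$pool_conf" | head -n 1)
    echo "nginx:   $nginx_sock"
    echo "php-fpm: $fpm_sock"
    # Succeed only when a path was found and both sides agree exactly
    [ -n "$nginx_sock" ] && [ "$nginx_sock" = "$fpm_sock" ]
}
```

Because the function returns non-zero on a mismatch, it can run before every restart, which covers the gap that nginx -t and the php-fpm config test leave open.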

What Changed in My Thinking

Before these sessions I thought troubleshooting meant memorizing commands. Now I understand it is about reading real output and following the clues.

The journal is almost always the first place to look. journalctl -u servicename -n 50 tells you what actually happened — not what you assume happened.

And the most dangerous problems are not the ones that crash loudly. They are the ones that sit quietly in a config file, waiting for the next restart to cause real damage.
