You can usually find the cause of a pod eviction in five minutes. Requests not set. JVM heap leak. Log volume past ephemeral-storage. The kubelet’s own log line tells you which.

This post is about the other kind. Every pod has correct requests and limits. Dashboards look fine. free -m says there’s plenty of memory. Pods still die in waves at 14:23 on a Tuesday, and there is no Friday-night-pager glory to make up for it.

2025-08-04

15 min

3144 words

The database had been running on three KVM domains for nine years. It was provisioned by FAI off a debian-installer preseed that nobody on the current team had written, configured by a Puppet module last meaningfully edited in 2019, and patched only when the SRE on call had the energy. PostgreSQL 9.6, on Debian 8 (jessie), past the end of LTS, past the end of ELTS, past the end of any reasonable explanation. The boxes were fine. They were always fine. They had been fine for so long that nobody touched them on principle.

Then somebody from finance asked why we had three idle Xeons in a rack in Frankfurt, and we got eight months to move it to RDS.

I am writing this in March 2025 about a migration that started in early 2023 and ended, eventually, with the same database back on three new KVM domains in late 2024. We did the round trip. Both directions hurt. This is the long version of what broke.

2025-03-12

27 min

5736 words

Somewhere to dump notes. Production postmortems, opinions about tooling that get repeated in Slack often enough to deserve a permalink, the occasional bit of postgres or kubernetes pathology worth being able to find again in two years when it bites someone else. So: a blog. About time.

2024-09-15

4 min

759 words