The Great SSL Certificate Mystery: A Tale of Dokku, Domains, and DevOps Drama
Origin and Evolution
It all started in the dead of night when our production server p12 ran out of disk space. The culprit? A misconfigured Prometheus instance happily hoarding metrics data, blissfully ignoring its retention parameters. This seemingly simple storage issue would spiral into a cascade of problems that would take hours to fully unravel.
To make matters worse, the server became completely inaccessible during our recovery attempts. We suspect Prometheus rebuilding its WAL (Write-Ahead Log) was to blame, but for hours we were stuck repeating the same troubleshooting steps, unable to maintain a stable connection to the server.
The breakthrough came when we decided to fall back to our infrastructure-as-code approach and run the Ansible playbook. Like magic, it not only restored server connectivity but also brought a surprise - four abandoned applications that had been silent for months suddenly sprang back to life!
While restoring monitoring after cleaning up the disk space, we tried accessing prometheus.kaido.team/targets. But instead of the monitoring dashboard, we were greeted with a Let's Encrypt certificate for... Anytracker? This kicked off an hours-long investigation that would reveal some interesting quirks in our Dokku-based infrastructure.
The setup seemed simple enough: a Dokku installation managing multiple applications, each with its own domain and SSL certificate. But as we would discover, the devil was in the details of domain configuration and certificate management.
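The symptom described above, a valid certificate for the wrong application, can be checked from any machine without a browser. The following is a minimal sketch, not the procedure we actually followed: it fetches the certificate a host serves and reports whether its DNS names cover the hostname we asked for. The helper names (fetch_cert, cert_covers_host) and the anytracker.example domain in the usage note are hypothetical, and the wildcard handling is deliberately simplified to the common "*.domain" form.

```python
import socket
import ssl


def fetch_cert(hostname, port=443, timeout=5.0):
    """Return the parsed certificate served for `hostname` (hypothetical helper).

    Hostname checking is disabled on purpose so that a certificate issued for
    a different domain can still be inspected. The chain itself is still
    verified, which is what makes getpeercert() return a populated dict.
    """
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    with socket.create_connection((hostname, port), timeout=timeout) as sock:
        # server_hostname sends SNI, so the server picks the vhost we asked for.
        with ctx.wrap_socket(sock, server_hostname=hostname) as tls:
            return tls.getpeercert()


def dns_names(cert):
    """List the DNS entries in the certificate's subjectAltName field."""
    return [value for kind, value in cert.get("subjectAltName", ()) if kind == "DNS"]


def name_matches(pattern, hostname):
    """Simplified match: exact name, or '*.domain' covering one leading label."""
    pattern, hostname = pattern.lower(), hostname.lower()
    if pattern.startswith("*."):
        head, sep, tail = hostname.partition(".")
        return bool(head) and sep == "." and tail == pattern[2:]
    return pattern == hostname


def cert_covers_host(cert, hostname):
    """True if any DNS name in the certificate covers `hostname`."""
    return any(name_matches(name, hostname) for name in dns_names(cert))
```

Under these assumptions, cert_covers_host(fetch_cert("prometheus.kaido.team"), "prometheus.kaido.team") would return False in a situation like ours, and dns_names(...) would list the other application's domains (something like anytracker.example) instead.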
