
10 posts tagged with "devops"


OpenLens: The Kubernetes IDE That Changed My Life

· 2 min read
Max Kaido
Architect

As a DevOps engineer who spends countless hours wrestling with Kubernetes clusters, I've tried every tool under the sun. But nothing quite compares to OpenLens - the open-source Kubernetes IDE that transformed how I interact with clusters. Let me take you on a journey through my experience.

The "Aha!" Moment

Remember the first time you used a proper IDE after coding in Notepad? That's exactly how it feels to switch from raw kubectl commands to OpenLens. No more juggling terminal windows or struggling with YAML indentation. Just pure, visual Kubernetes management.

Mastering Ansible Tags: A Practical Guide to Infrastructure Management

· 2 min read
Max Kaido
Architect

When managing complex infrastructure with Ansible, the ability to selectively run specific parts of your playbooks becomes crucial. This is where Ansible tags come in - they allow you to organize and execute tasks with surgical precision. Let's explore how we implemented a comprehensive tagging system in our infrastructure.

The Problem

Our infrastructure includes various components:

  • AI services (Ollama, OpenWebUI)
  • Monitoring systems
  • Development environments
  • Basic system configurations

Running the entire playbook for small changes was:

  • Time-consuming
  • Potentially disruptive
  • Resource-intensive
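
To make the idea concrete, here is a minimal sketch of how a tagged run could be assembled from a script. The `--tags`, `--skip-tags`, and `--check` options are standard `ansible-playbook` flags; the playbook name `site.yml` and the tag names `ai` and `monitoring` are assumptions made for illustration, not the names used in our repository.

```python
import shlex


def build_ansible_command(playbook, tags=(), skip_tags=(), check=False):
    """Build an ansible-playbook command line limited to selected tags."""
    cmd = ["ansible-playbook", playbook]
    if tags:
        # Only tasks carrying at least one of these tags will run.
        cmd += ["--tags", ",".join(tags)]
    if skip_tags:
        # Tasks carrying any of these tags are left out.
        cmd += ["--skip-tags", ",".join(skip_tags)]
    if check:
        # Dry run: report what would change without changing it.
        cmd.append("--check")
    return cmd


# Hypothetical playbook and tag names, for illustration only.
command = build_ansible_command("site.yml", tags=["monitoring"], check=True)
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run` as-is, so a small change to one component no longer requires running everything else.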

The Hidden Complexity of Dokku Networking: A Tale of Redis and DNS

· 2 min read
Max Kaido
Architect

Today, we faced a classic example of the hidden complexity in Dokku's networking model. What seemed like a simple task - connecting a NestJS application to Redis - turned into an adventure through container networking, DNS resolution, and service discovery.

The Challenge

Our Mercury Bot service needed to communicate with both Redis and ChromaDB on the same network. Simple, right? Not quite. The initial setup led to DNS resolution errors:

Error: getaddrinfo EAI_AGAIN dokku-redis-mercury

This cryptic error message was just the beginning of our journey into Dokku's networking internals.
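
`EAI_AGAIN` is the resolver's way of reporting a temporary failure in name resolution, so a quick first check is whether the name resolves at all from inside the container, and whether it starts resolving after a short wait. The sketch below is a diagnostic helper written in Python rather than the NestJS code of the service itself, and it is not the fix. The hostname comes from the error above; port 6379 is the Redis default, and the retry count and delay are arbitrary assumptions.

```python
import socket
import time


def resolve_with_retry(host, port, attempts=3, delay=0.5,
                       resolver=socket.getaddrinfo):
    """Resolve host, retrying only on temporary DNS failures (EAI_AGAIN)."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            infos = resolver(host, port, proto=socket.IPPROTO_TCP)
            # Each entry ends with a sockaddr tuple whose first item is the IP.
            return sorted({info[4][0] for info in infos})
        except socket.gaierror as exc:
            if exc.errno != socket.EAI_AGAIN:
                raise  # Permanent failures (e.g. unknown name) are not retried.
            last_error = exc
            if attempt < attempts:
                time.sleep(delay)
    raise last_error
```

Running `resolve_with_retry("dokku-redis-mercury", 6379)` from inside the application container shows whether the service name is visible on the network the container is attached to. If every attempt fails, the problem is likely in the network setup rather than in the application.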

The Great SSL Certificate Mystery: A Tale of Dokku, Domains, and DevOps Drama

· 4 min read
Max Kaido
Architect

Origin and Evolution

It all started in the dead of night when our production server p12 ran out of disk space. The culprit? A misconfigured Prometheus instance happily hoarding metrics data, blissfully ignoring its retention parameters. This seemingly simple storage issue would spiral into a cascade of problems that would take hours to fully unravel.

To make matters worse, the server became completely inaccessible during our recovery attempts. We suspect that Prometheus rebuilding its WAL (Write-Ahead Log) was the cause, but for hours we kept repeating the same troubleshooting steps, unable to maintain a stable connection to the server.

The breakthrough came when we decided to fall back to our infrastructure-as-code approach and run the Ansible playbook. Like magic, it not only restored server connectivity but also delivered a surprise: four abandoned applications that had been silent for months suddenly sprang back to life!

During our attempts to restore monitoring after cleaning up the disk space, we tried accessing prometheus.kaido.team/targets. But instead of the expected monitoring dashboard, we were greeted with something else entirely: a Let's Encrypt certificate for... Anytracker? This kicked off an hours-long investigation that would reveal some interesting quirks in our Dokku-based infrastructure.

The setup seemed simple enough: a Dokku installation managing multiple applications, each with its own domain and SSL certificate. But as we would discover, the devil was in the details of domain configuration and certificate management.
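
A mismatch like this can be confirmed without a browser by asking the server which names its certificate actually covers. The following is a rough sketch using only Python's standard `ssl` and `socket` modules. The helper names are ours, the wildcard handling is deliberately simplified to a single leading label, and hostname verification is switched off on purpose so that a certificate issued for another domain can still be inspected.

```python
import socket
import ssl


def fetch_peer_cert(host, port=443, timeout=5.0):
    """Return the certificate the server presents when asked for `host`."""
    context = ssl.create_default_context()
    # The chain is still validated, but a name mismatch no longer aborts
    # the handshake, which is exactly the case we want to look at.
    context.check_hostname = False
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()


def dns_names(cert):
    """List the DNS entries in the certificate's subjectAltName field."""
    return [value for kind, value in cert.get("subjectAltName", ())
            if kind == "DNS"]


def covers(cert, host):
    """Simplified check: exact match, or a wildcard for one leading label."""
    host = host.lower()
    for name in dns_names(cert):
        name = name.lower()
        if name == host:
            return True
        if (name.startswith("*.") and "." in host
                and host.split(".", 1)[1] == name[2:]):
            return True
    return False
```

With these helpers, `covers(fetch_peer_cert("prometheus.kaido.team"), "prometheus.kaido.team")` returning `False` would mean the server answered with some other application's certificate, and `dns_names` shows which one.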