Thursday Jan 08, 2026

HolmesGPT: AI Root Cause Analysis for Kubernetes

A deep dive into HolmesGPT, the CNCF Sandbox AI agent for root cause analysis and cloud-native troubleshooting. This episode covers what it is, its 40+ integrations, the project roadmap, and how to set it up today.

News Segment:

  • Air France-KLM's secure automation platform with Terraform, Vault, and Ansible
  • AWS ECS tmpfs mounts on Fargate for secure secrets handling
  • Qwen 30B running on Raspberry Pi - democratizing edge AI
  • AWS European Sovereign Cloud with independent EU governance

Main Topic - HolmesGPT:

  • CNCF Sandbox project (accepted October 2025) with 1,600+ GitHub stars
  • Agentic architecture: creates investigation task lists, queries systems, synthesizes findings
  • 40+ built-in toolsets: Prometheus, Grafana Loki/Tempo, Kubernetes, Argo CD, Datadog, and more
  • Privacy-first: bring your own LLM keys, read-only access, respects RBAC
  • End-to-end automation with Alertmanager, PagerDuty, and Opsgenie integration
  • Installation options: pip, Homebrew, Helm, Web UI, K9s plugin
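
To make the agentic flow above concrete, here is a minimal Python sketch of the plan, query, synthesize loop. It is an illustration only, not HolmesGPT's actual code or API: the names (Task, plan, investigate, synthesize, fake_kubernetes, fake_prometheus) are hypothetical, the planning step is hard-coded where a real agent would ask an LLM, and the toolsets return canned text instead of querying live systems.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical read-only "toolsets": each takes a query and returns text evidence.
# These are canned stand-ins; a real toolset would query a live system.
Toolset = Callable[[str], str]


def fake_kubernetes(query: str) -> str:
    return "pod checkout-7d9f is in CrashLoopBackOff, last exit code 137 (OOMKilled)"


def fake_prometheus(query: str) -> str:
    return "memory working set reached the 512Mi limit before each restart"


@dataclass
class Task:
    toolset: str  # which toolset should answer this task
    query: str    # what to ask it


@dataclass
class Investigation:
    alert: str
    tasks: List[Task] = field(default_factory=list)
    findings: List[str] = field(default_factory=list)


def plan(alert: str) -> List[Task]:
    """Step 1: create the investigation task list (hard-coded here, LLM-driven in a real agent)."""
    return [
        Task("kubernetes", f"describe pods related to: {alert}"),
        Task("prometheus", f"memory usage for workloads in: {alert}"),
    ]


def investigate(alert: str, toolsets: Dict[str, Toolset]) -> Investigation:
    """Step 2: run each task against its toolset and record the evidence."""
    inv = Investigation(alert=alert, tasks=plan(alert))
    for task in inv.tasks:
        tool = toolsets.get(task.toolset)
        if tool is None:
            # Missing integrations are reported, not treated as fatal.
            inv.findings.append(f"[{task.toolset}] skipped: toolset not configured")
            continue
        inv.findings.append(f"[{task.toolset}] {tool(task.query)}")
    return inv


def synthesize(inv: Investigation) -> str:
    """Step 3: fold the findings into one report (a real agent would have the LLM summarize)."""
    lines = [f"Alert: {inv.alert}", "Evidence:"]
    lines += [f"  - {finding}" for finding in inv.findings]
    return "\n".join(lines)


if __name__ == "__main__":
    toolsets = {"kubernetes": fake_kubernetes, "prometheus": fake_prometheus}
    print(synthesize(investigate("KubePodCrashLooping in namespace shop", toolsets)))
```

The toolsets in this sketch only read and return evidence, which mirrors the read-only, RBAC-respecting posture described above. Adding an integration means registering one more entry in the toolsets dictionary.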

Resources:

Episode Type: full
Episode Number: 83
Season: 1
Tags: HolmesGPT, CNCF, Kubernetes, root cause analysis, AI ops, troubleshooting, observability, SRE, platform engineering, Robusta, agentic AI
