Platform Engineering Playbook Podcast
The Platform Engineering Playbook Podcast is where AI meets open-source infrastructure knowledge—and you're part of the editorial process. Every episode is researched, scripted, and produced with AI, then reviewed by the community and published on GitHub for anyone to improve. Facing tool sprawl across 130+ platforms? Justifying PaaS costs to your CFO? Navigating the Shadow AI crisis hitting 85% of organizations? We tackle the messy realities of platform engineering that most content avoids, delivering data-backed insights and decision frameworks you can use Monday morning. Built for senior engineers, SREs, and DevOps practitioners with 5+ years in production, we dissect cloud economics, AI governance, infrastructure trade-offs, and career strategy—with the receipts to back it up. Think we got something wrong? Have better data? Open a pull request at platformengineeringplaybook.com. This is infrastructure podcasting as a living document, where the community keeps us honest and the content gets better with every contribution.
Read the playbook at https://platformengineeringplaybook.com
Episodes

Saturday Jan 03, 2026
OpenTelemetry has won the instrumentation wars with 95% adoption predicted for 2026. But winning data collection doesn't solve observability's real problems: spiraling costs, declining signal-to-noise ratios, and too much distance between seeing a problem and fixing it.
In this episode, we break down:
• Netflix's evolution to high-cardinality analytics processing 1M+ spans per episode
• The cost-control chokepoint that OTel enables for telemetry optimization
• Why 40% of organizations are targeting autonomous remediation by end of 2026
• How SLOs are becoming business conversations, not just engineering metrics
Plus news on the GitHub Actions 39% price reduction and the Jaeger v2.14.0 legacy removal.
Key takeaways:
→ OTel adoption is near-universal, but 43% haven't seen cost savings
→ Netflix treats observability as data engineering with Flink pipelines
→ AI agents becoming first-class consumers of observability data
→ Platform engineers becoming translators between telemetry and business impact
Full transcript and resources: https://platformengineering.playbook.com/docs/podcasts/00078-observability-opentelemetry-2026

Friday Jan 02, 2026
Why did an API key management platform abandon edge serverless for stateful containers? Unkey hit 30ms p99 cache latency when they needed sub-10ms—so they rebuilt everything on AWS Fargate. This episode covers the technical decision-making framework for choosing between serverless and containers, plus a deep dive into Kubernetes 1.35's new structured z-pages for debugging.
In This Episode:
- The serverless constraint: stateless = network request for every cache read
- Unkey's complexity tax: Workers, Durable Objects, Queues, custom proxies
- The container solution: Fargate + Global Accelerator = 6x performance
- Decision framework: latency targets, data hotness, complexity budget
- K8s 1.35 z-pages: JSON structured responses for compliance automation
Key Statistics:
- 30ms p99 cache latency before migration (target: <10ms)
- 6x performance improvement after moving to containers
- Self-hosting unlocked as unexpected bonus
New episodes drop weekly. Subscribe to stay current on platform engineering.
Links:
Full show notes: https://platformengineeringplaybook.com/podcasts/00077-unkey-serverless-containers-migration
Contribute: Open a PR on GitHub

Thursday Jan 01, 2026
Why do 73% of organizations experience outages from alerts they ignored? This episode breaks down the technical shift from reactive thresholds to SLO-driven observability. Learn multi-window burn-rate alerting patterns, AIOps implementations that actually work, and an 8-week migration path to cut alert noise by 80%.
In This Episode:
- The alert fatigue paradox: 2000+ weekly alerts with only 3% actionable
- Technical causes: static thresholds, compound rule blind spots, alert storms
- SLO-driven observability: error budgets and multi-window burn-rate alerting
- AIOps patterns that work: anomaly detection, event correlation, RCA acceleration
- Practical 8-week migration path from threshold alerts to signal-driven ops
Key Statistics:
- 73% of organizations experience outages from ignored alerts (Splunk 2025)
- Teams receive 2000+ alerts weekly, only 3% need immediate action
- 27% of alerts in mid-size companies are simply ignored
- 80% reduction in alert noise achievable with proper SLO-based design
- $5,600/minute cost of unplanned downtime
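For readers new to multi-window burn-rate alerting, here is a minimal Python sketch of the idea. It is not taken from the episode: the `Window` type, the function names, and the 14.4 threshold (a commonly cited value that corresponds to spending 2% of a 30-day error budget in one hour) are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Window:
    """Hypothetical container for one lookback window's measurements."""
    name: str
    error_ratio: float  # fraction of failed requests seen in this window


def burn_rate(error_ratio: float, slo_target: float) -> float:
    """Return how many times faster than 'exactly on budget' errors are accruing."""
    error_budget = 1.0 - slo_target  # e.g. 0.001 for a 99.9% SLO
    return error_ratio / error_budget


def should_page(long_w: Window, short_w: Window,
                slo_target: float, threshold: float) -> bool:
    """Page only when BOTH windows exceed the burn-rate threshold.

    The long window shows the burn is significant; the short window
    shows it is still happening, which suppresses stale alerts.
    """
    return (burn_rate(long_w.error_ratio, slo_target) >= threshold
            and burn_rate(short_w.error_ratio, slo_target) >= threshold)


if __name__ == "__main__":
    slo = 0.999        # assumed 99.9% availability target
    threshold = 14.4   # assumed paging threshold, see note above
    ongoing = should_page(Window("1h", 0.02), Window("5m", 0.02), slo, threshold)
    recovered = should_page(Window("1h", 0.02), Window("5m", 0.005), slo, threshold)
    print(f"ongoing incident pages: {ongoing}")
    print(f"already recovering pages: {recovered}")
```

Requiring both windows is what separates this from a static threshold: a brief spike that has already subsided fails the short-window check and never pages anyone.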
New episodes drop weekly. Subscribe to stay current on platform engineering.
Links:
Full show notes: https://platformengineeringplaybook.com/podcasts/00076-alert-fatigue-signal-driven-observability
Contribute: Open a PR on GitHub

Wednesday Dec 31, 2025
Platform engineers who understand security operations—secrets management, vulnerability scanning, and compliance automation—are commanding premium salaries in 2026. This episode breaks down the security ops specialty: what it includes, why organizations are desperate for it, and how to build these skills alongside your existing platform engineering expertise.
In this episode:
• Security ops specialty encompasses secrets management, vulnerability scanning, policy-as-code, and compliance automation
• Organizations are struggling to find platform engineers with security depth, creating a supply-demand gap
• The 2025 State of Secrets report shows 70% of organizations experienced a secrets-related incident
• Key tools include HashiCorp Vault, Trivy, OPA/Gatekeeper, Falco, and SOPS
• Building security skills alongside platform engineering creates a rare and valuable combination
Perfect for senior platform engineers, SREs, and DevOps engineers looking to level up their platform engineering skills.
New episodes every week. Subscribe wherever you listen to stay current on platform engineering.
Episode URL: https://platformengineering.org/podcasts/00075-security-ops-specialty-platform-engineers
Duration: 19:05
Host: Alex and Jordan
Category: Technology
Subcategory: Software How-To
Keywords: security ops, platform engineering, secrets management, HashiCorp Vault, vulnerability scanning, Trivy, OPA, Gatekeeper, Falco, compliance automation, DevSecOps, shift-left security, policy-as-code, SOPS, supply chain security

Tuesday Dec 30, 2025
The Linux Foundation announced the Agentic AI Foundation (AAIF) on December 9, 2025, bringing together AWS, Anthropic, Google, Microsoft, OpenAI, Block, Cloudflare, and Bloomberg. This episode breaks down MCP (Model Context Protocol) - the "HTTP for AI" with 97M+ monthly downloads.
📰 NEWS: Docker hardened images now free, MongoBleed CVE patch alert, Cloudflare "Fail Small" resilience plan, DORA metrics with Process Behavior Charts
🎯 Key Topics:
• What AAIF and MCP mean for platform teams
• MCP architecture: Hosts, Clients, and Servers
• The N×M to N+M integration simplification
• Security: OAuth flows, permission scopes, audit logging
• Practical next steps for platform engineers
📊 Key Stats:
• 97M+ monthly MCP SDK downloads
• 10,000+ public MCP servers
• 8 platinum members including all major AI/cloud players
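The N×M to N+M point in the topics above is simple arithmetic, sketched here in Python. The function names and the example counts (5 AI applications, 20 tools) are hypothetical and not figures from the episode.

```python
def point_to_point_integrations(n_apps: int, m_tools: int) -> int:
    """Without a shared protocol, every app needs a custom connector per tool."""
    return n_apps * m_tools


def shared_protocol_integrations(n_apps: int, m_tools: int) -> int:
    """With a shared protocol, each app implements one client
    and each tool implements one server."""
    return n_apps + m_tools


if __name__ == "__main__":
    n_apps, m_tools = 5, 20  # assumed example sizes
    print("custom connectors:", point_to_point_integrations(n_apps, m_tools))
    print("protocol implementations:", shared_protocol_integrations(n_apps, m_tools))
```

With these example numbers the integration count drops from 100 to 25, and the gap widens as either side grows.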
🔗 Show notes: https://platformengineering.org/podcasts/00074-agentic-ai-foundation-mcp-platform-engineers
#PlatformEngineering #MCP #AgenticAI #AAIF #DevOps #AI #LinuxFoundation

Monday Dec 29, 2025
FinOps is becoming an essential skill for platform engineers in 2026. This episode provides a complete guide to the skills, certifications, and tools you need to add cloud cost management to your platform engineering toolkit.
📰 News Segment:
• GPG.fail documents 14 critical GnuPG vulnerabilities - check your signing tools
• MongoBleed CVE-2025-14847: Critical MongoDB exploit - patch immediately
• The Dangers of SSL Certificates: Catastrophic failure modes in automation
• Google Multi-Cluster Orchestrator: Cross-region K8s management (KubeCon 2025)
• GPG cleartext signature parsing vulnerabilities found
💡 Key Takeaways:
• Platform teams own 70%+ of cloud spending decisions
• FinOps + Platform Engineering = $175K+ compound skill premium
• Senior FinOps Engineers average $150K, top earners reach $250K
• 76% of organizations are increasing FinOps investment
• New certifications: FinOps for AI (March 2026), FinOps for Containers
🎯 Skills Covered:
• Tier 1: Cloud billing data, K8s cost allocation, unit economics
• Tier 2: FOCUS specification v1.3, OpenCost/Kubecost, showback/chargeback
• Tier 3: Automated rightsizing, committed use discounts, AI workload optimization
🔗 Resources:
• FinOps Foundation: finops.org
• OpenCost (CNCF): opencost.io
• FOCUS Specification: focus.finops.org
• Episode page: platformengineering.org/podcasts/00073-finops-2026-platform-engineers-guide
#FinOps #PlatformEngineering #CloudCost #Kubernetes #DevOps #CNCF #OpenCost

Sunday Dec 28, 2025
Platform engineers are commanding $172K-$207K in 2026, a 13-27% premium over DevOps roles. This episode breaks down salary benchmarks from Dice, Motion Recruitment, and Levels.fyi, revealing which skills are S-tier ($200K+) and which are table stakes.
We cover:
- Platform Engineer vs DevOps salary gap (13-27% premium)
- S-tier skills: LLM/GenAI ($195K-$312K), Platform Engineering, DevSecOps, MLOps
- A-tier skills: Kubernetes + CKA, Go/Golang, FinOps, OpenTelemetry
- Entry-level hiring crisis (-25% to -50% at major tech)
- Geographic salary shifts: Atlanta +13.9%, Silicon Valley -7.3%
- Top certification ROI: CKA, CNPE, FinOps Practitioner
Listen for actionable recommendations on which skills to prioritize in 2026 based on your current career level.
Episode page: https://platformengineering.org/podcasts/00072-platform-engineering-salary-skills-2026

Saturday Dec 27, 2025
The series finale of our five-part Platform Engineering 2026 Look Forward Series. We synthesize everything from agentic AI operations, mainstream adoption, developer experience metrics, and boring Kubernetes into ten concrete predictions for 2026. Learn what to invest in versus ignore, and discover our 2026 platform engineering thesis.
In this episode:
- High confidence predictions: IDP market consolidates into 3 tiers, AI-assisted operations becomes table stakes, policy-as-code becomes table stakes
- Medium confidence predictions: Talent gap peaks H1 2026 then stabilizes, "Platform team of one" becomes technically viable
- INVEST IN: Developer experience measurement, self-service capabilities, golden paths, AI-assisted incident response
- 2026 thesis: Invisible infrastructure, measurable experience, AI-augmented (not AI-replaced), product thinking
📰 News Segment:
• KEDA v2.18.3 & v2.17.3 releases
• Google Agent Development Kit for TypeScript
• NIST Atomic Clock Failure at Boulder CO
Perfect for platform engineers, engineering leaders, and DevOps practitioners looking to level up their platform engineering skills.
Episode URL: https://platformengineeringplaybook.com/podcasts/00071-platform-engineering-predictions-2026
Duration: 17 minutes
Host: Jordan and Alex
Category: Technology
Subcategory: Software How-To
Keywords: platform engineering, 2026 predictions, IDP, AI operations, GitOps, policy-as-code

Friday Dec 26, 2025
The best thing happening to Kubernetes in 2026 is that it's becoming boring. After a decade of explosive innovation, Kubernetes is entering its "mature infrastructure" phase - stable, predictable, and increasingly invisible. Like Linux and PostgreSQL before it, boring Kubernetes enables platform teams to build abstractions without worrying about breaking changes. Part of the Platform Engineering 2026 Look Forward Series.
In this episode:
- Boring infrastructure is mature infrastructure - Linux and PostgreSQL became boring, then conquered the world
- K8s 1.32-1.35 pattern: incremental stability, small refinements, no paradigm shifts
- Innovation is moving up the stack: kro, Crossplane, and composition tools building on stable K8s foundation
- The "just use managed Kubernetes" consensus has won - EKS/GKE/AKS handle 90% of operational concerns
Perfect for platform engineers, engineering leaders, and DevOps practitioners looking to level up their platform engineering skills.
Episode URL: https://platformengineeringplaybook.com/podcasts/00070-kubernetes-boring-era-2026
Duration: 15 minutes
Host: Jordan and Alex
Category: Technology
Subcategory: Software How-To
Keywords: Kubernetes, boring infrastructure, kro, Crossplane, platform engineering, EKS, GKE, AKS

Wednesday Dec 24, 2025
DORA metrics revolutionized how we measure DevOps performance, but they have a critical blind spot: they tell you how your delivery pipeline is performing, but not how your people are doing. This episode explores the SPACE framework, DX Core 4, cognitive load measurement, and the HEART framework for platform teams. Part of the Platform Engineering 2026 Look Forward Series.
In this episode:
- DORA tells you the what, but not the how or at what cost - teams can hit every DORA metric while engineers burn out
- SPACE framework: Satisfaction, Performance, Activity, Communication, and Efficiency - five dimensions of developer productivity
- DX Core 4: Speed (diffs per engineer), Effectiveness (DXI survey), Quality (change failure rate), Impact (% time on new features)
- Five-metric starter pack for 2026: Deployment Frequency, Lead Time, DXI Score, Time to First Deployment, % Time on New Features
Perfect for platform engineers, engineering leaders, and DevOps practitioners looking to level up their platform engineering skills.
Episode URL: Developer Experience Metrics Beyond DORA
Duration: 14 minutes
Host: Jordan and Alex
Category: Technology
Subcategory: Software How-To
Keywords: developer experience, DORA, metrics, SPACE framework, DX Core 4, cognitive load, platform engineering




