Platform Engineering Playbook Podcast
The Platform Engineering Playbook Podcast is where AI meets open-source infrastructure knowledge—and you're part of the editorial process. Every episode is researched, scripted, and produced with AI, then reviewed by the community and published on GitHub for anyone to improve. Facing tool sprawl across 130+ platforms? Justifying PaaS costs to your CFO? Navigating the Shadow AI crisis hitting 85% of organizations? We tackle the messy realities of platform engineering that most content avoids, delivering data-backed insights and decision frameworks you can use Monday morning. Built for senior engineers, SREs, and DevOps practitioners with 5+ years in production, we dissect cloud economics, AI governance, infrastructure trade-offs, and career strategy—with the receipts to back it up. Think we got something wrong? Have better data? Open a pull request at platformengineeringplaybook.com. This is infrastructure podcasting as a living document, where the community keeps us honest and the content gets better with every contribution.
Read the playbook at https://platformengineeringplaybook.com
Episodes

Tuesday Dec 23, 2025
Episode 2 of our 5-part "Platform Engineering 2026 Look Forward Series" examines the macro trend: platform engineering crossing the chasm to mainstream adoption.
Gartner predicts 80% of software engineering organizations will have platform teams by 2026. The CNPE certification launched at KubeCon 2025. But there's a 56% talent gap and nearly half of initiatives run on under $1M annually.
We address the "DevOps rebranding" debate with a 5-question litmus test:
1. Do you have internal customers (developers)?
2. Do you measure developer satisfaction?
3. Do you have a product roadmap?
4. Can developers self-serve without tickets?
5. Do you deprecate platform features?
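If you want to apply the litmus test the same way across several teams, it helps to record the answers and list the gaps. The sketch below is a minimal Python version. The function name `failed_questions` is hypothetical. The episode does not give a passing threshold, so the sketch only reports which questions were answered "no".

```python
# The five litmus-test questions from the episode notes.
LITMUS_QUESTIONS = (
    "Do you have internal customers (developers)?",
    "Do you measure developer satisfaction?",
    "Do you have a product roadmap?",
    "Can developers self-serve without tickets?",
    "Do you deprecate platform features?",
)


def failed_questions(answers: list[bool]) -> list[str]:
    """Return the questions answered 'no'; an empty list means every check passed."""
    if len(answers) != len(LITMUS_QUESTIONS):
        raise ValueError(f"expected {len(LITMUS_QUESTIONS)} answers, got {len(answers)}")
    return [question for question, ok in zip(LITMUS_QUESTIONS, answers) if not ok]


if __name__ == "__main__":
    # Example: a team with customers, satisfaction surveys and deprecations,
    # but no roadmap and a ticket-driven workflow.
    for gap in failed_questions([True, True, False, False, True]):
        print("Gap:", gap)
```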
Key statistics:
- 55% adoption in 2025 (Google), projected 80% by 2026 (Gartner)
- Average PE salary: $172k (range $143k-$201k)
- 55% of platform teams are less than 2 years old
- Team sizing benchmark: 3.5% to 19% of engineering headcount
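A little arithmetic turns the sizing benchmark into a concrete headcount range. The sketch below assumes you round up, because you cannot staff a fraction of an engineer. `platform_team_range` is a hypothetical helper. It uses integer arithmetic (parts per thousand) to avoid float rounding, since 0.035 has no exact binary representation.

```python
# Benchmark bounds from the episode notes, in parts per thousand of engineering headcount.
LOW_PER_MILLE = 35    # 3.5%
HIGH_PER_MILLE = 190  # 19%


def _ceil_div(numerator: int, denominator: int) -> int:
    """Integer ceiling division without going through floats."""
    return -(-numerator // denominator)


def platform_team_range(engineering_headcount: int) -> tuple[int, int]:
    """Return (low, high) platform-team headcount for the 3.5%-19% benchmark, rounded up."""
    if engineering_headcount <= 0:
        raise ValueError("headcount must be positive")
    low = _ceil_div(engineering_headcount * LOW_PER_MILLE, 1000)
    high = _ceil_div(engineering_headcount * HIGH_PER_MILLE, 1000)
    return low, high
```

For an engineering organization of 200, this gives 7 to 38 platform engineers. The band is wide, so treat it as a sanity check rather than a target.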
Platform engineering isn't just trendy - it's becoming table stakes. The question isn't IF you'll adopt it, but HOW WELL.
#PlatformEngineering #DevOps #SRE #CloudNative #CNPE #InternalDeveloperPlatform #2026Predictions #Gartner

Monday Dec 22, 2025
Episode 1 of our 5-part "Platform Engineering 2026 Look Forward Series" tackles the hottest debate in platform engineering: will AI agents replace us or amplify us?
AWS Frontier Agents can reason across 30+ steps. The MLOps market hits $129 billion by 2028. Netflix AI triage cuts MTTR by 40%. But where are the hard limits?
We introduce the 60/30/10 Framework:
- 60% Delegate: Log analysis, runbook execution, cost optimization
- 30% Augment: Incident response, capacity planning (AI suggests, human confirms)
- 10% Guard: Architecture decisions, security posture, novel failures
The key insight: the 20% AI can't do is 80% of the value.
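You can encode the Delegate/Augment/Guard split as a routing table that decides whether a task needs a human in the loop. The sketch below is hypothetical and is seeded with the example tasks from the episode notes. Defaulting unknown tasks to Guard is a design assumption. It means nothing new is handed to an agent without review.

```python
from enum import Enum


class Mode(Enum):
    DELEGATE = "delegate"  # AI acts on its own (~60% of work in the framework)
    AUGMENT = "augment"    # AI suggests, a human confirms (~30%)
    GUARD = "guard"        # humans own the decision (~10%)


# Hypothetical task catalogue seeded with the examples from the episode notes.
TASK_MODES = {
    "log analysis": Mode.DELEGATE,
    "runbook execution": Mode.DELEGATE,
    "cost optimization": Mode.DELEGATE,
    "incident response": Mode.AUGMENT,
    "capacity planning": Mode.AUGMENT,
    "architecture decisions": Mode.GUARD,
    "security posture": Mode.GUARD,
    "novel failures": Mode.GUARD,
}


def mode_for(task: str) -> Mode:
    """Look up a task's mode; unknown tasks default to GUARD so nothing unreviewed is delegated."""
    return TASK_MODES.get(task.strip().lower(), Mode.GUARD)


def requires_human(task: str) -> bool:
    """Augment and Guard tasks both need a human decision before anything changes."""
    return mode_for(task) is not Mode.DELEGATE
```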
Five action items for 2026:
1. Audit your runbooks for automation candidates
2. Pilot AI agents on low-risk, high-volume tasks
3. Build the guardrail muscle
4. Invest in AI orchestration skills
5. Track the last mile gap
Platform engineering isn't becoming obsolete - it's evolving. The engineers who embrace AI agents will pull ahead of those who resist.
https://platformengineering.org/podcasts/00067-agentic-ai-platform-operations-2026
#PlatformEngineering #AgenticAI #MLOps #DevOps #SRE #AWSFrontierAgents #CloudNative #2026Predictions

Sunday Dec 21, 2025
The CNPE (Certified Cloud Native Platform Engineer) exam launched November 11, 2025 at KubeCon Atlanta, becoming the first hands-on platform engineering certification in five years. This deep dive covers exam format, all five domains, and a complete study guide.
Key Points:
• CNPE is hands-on: 17 tasks in 2 hours, 64% pass score
• Five domains: GitOps/CD (25%), Platform APIs (25%), Observability (20%), Architecture (15%), Security (15%)
• BACK stack: Backstage, Argo CD, Crossplane, Kyverno
• Golden Kubestronaut requires CNPE after March 2026
• Career impact: Platform engineer salaries $160K-$220K
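Once you know the domain weights, one simple way to budget study time is to split it in proportion to them. Treat this sketch as a starting point only. The proportional allocation is our assumption, not an official Linux Foundation recommendation, and you would shift hours toward your weakest domains. `allocate_study_hours` is a hypothetical helper.

```python
# Domain weights (percent of the exam) as listed in the episode notes.
DOMAIN_WEIGHTS = {
    "GitOps/CD": 25,
    "Platform APIs": 25,
    "Observability": 20,
    "Architecture": 15,
    "Security": 15,
}


def allocate_study_hours(total_hours: int) -> dict[str, float]:
    """Split a study budget across domains in proportion to their exam weight."""
    total_weight = sum(DOMAIN_WEIGHTS.values())
    if total_weight != 100:
        raise ValueError(f"weights should sum to 100, got {total_weight}")
    return {domain: total_hours * weight / total_weight for domain, weight in DOMAIN_WEIGHTS.items()}
```

A 40-hour plan comes out to 10 hours each for GitOps/CD and Platform APIs, 8 for Observability, and 6 each for Architecture and Security.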
Resources:
• Episode page: https://platformengineering.org/podcasts/00066-cnpe-certification-study-guide
• CNPE Exam: https://training.linuxfoundation.org/certification/certified-cloud-native-platform-engineer/
• CNCF Platforms White Paper: https://tag-app-delivery.cncf.io/whitepapers/platforms/
#CNPE #PlatformEngineering #Kubernetes #CNCF #Certification #DevOps #CloudNative #ArgoCD #Crossplane #Backstage #Kyverno

Saturday Dec 20, 2025
Kubernetes 1.35 "Timbernetes" dropped on December 17, 2025, fundamentally changing how we operate clusters. This deep dive covers the 60 enhancements, 3 breaking changes that will bite you if unprepared, and in-place pod resize graduating to GA after six years of development.
What You'll Learn:
• Breaking Changes: cgroup v1 REMOVED (not deprecated), containerd 1.x EOL, IPVS deprecated
• In-Place Pod Resize GA: Resize CPU/memory without pod restart - 6 years from KEP to stable
• Pod Certificates Beta: Native kubelet-managed mTLS for zero-trust pod-to-pod auth
• Gang Scheduling Alpha: Native all-or-nothing scheduling for AI/ML distributed training
• Alpha Features: Node Declared Features, Partitionable Devices, Extended Toleration Operators
• Practical Upgrade Checklist: What to audit and test before upgrading
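The cgroup v1 removal is the change most likely to break an upgrade, so an early checklist item is confirming that every node runs cgroup v2. The Kubernetes documentation suggests `stat -fc %T /sys/fs/cgroup/`, which prints `cgroup2fs` on v2 hosts. The sketch below does an equivalent check in Python. It looks for the `cgroup.controllers` file, which only the unified v2 hierarchy exposes at its root. `uses_cgroup_v2` is a hypothetical helper that you would run on each node, for example through your configuration-management tool.

```python
from pathlib import Path


def uses_cgroup_v2(cgroup_root: str = "/sys/fs/cgroup") -> bool:
    """Return True if the unified (v2) cgroup hierarchy is mounted at cgroup_root.

    The root of a cgroup v2 hierarchy contains a `cgroup.controllers` file.
    A pure cgroup v1 host, or a hybrid host where v2 is mounted under
    /sys/fs/cgroup/unified, has no such file at the root.
    """
    return (Path(cgroup_root) / "cgroup.controllers").is_file()


if __name__ == "__main__":
    if uses_cgroup_v2():
        print("cgroup v2: OK for Kubernetes 1.35")
    else:
        print("cgroup v1 or hybrid: migrate this node before upgrading to 1.35")
```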
Resources:
• Episode page: https://platformengineering.org/podcasts/00065-kubernetes-1-35-timbernetes-deep-dive
• Kubernetes 1.35 Release Blog: https://kubernetes.io/blog/2025/12/17/kubernetes-v1-35-release/
• KEP-1287 In-Place Resize: https://github.com/kubernetes/enhancements/issues/1287
• KEP-4317 Pod Certificates: https://github.com/kubernetes/enhancements/issues/4317
#Kubernetes #K8s #PlatformEngineering #DevOps #CloudNative #Timbernetes #ContainerOrchestration #InPlaceResize #GangScheduling #AI #ML

Saturday Dec 20, 2025
No more copy-paste configs. No more manual state management. Terraform just went component-based.
HashiCorp released native monorepo support and Terraform Stacks to GA on September 25, 2025. This is the biggest architectural shift since Terraform modules. Instead of directory-per-environment with duplicate configurations, you define components once and deploy multiple times with isolated state.
We explain components (lifecycle-aware resource groups in .tfstack.hcl files), deployments (isolated instances with separate state), orchestration rules (context-aware automated approvals), linked stacks (declarative cross-stack dependencies), migration paths from Terragrunt, and when platform teams should adopt.
NEWS SEGMENT:
• Terraform Stacks + Monorepo (GA Sept 2025): Component-based architecture, orchestration rules, basic functionality in free tier https://www.hashicorp.com/blog/terraform-adds-native-monorepo-support-stack-component-configurations-and-more
• Pulumi IaC Including Terraform/HCL (Private Beta, GA Q1 2026): Direct Terraform state file support, native HCL, credits for HashiCorp costs https://www.pulumi.com/blog/all-iac-including-terraform-and-hcl/
• vLLM v0.13.0: 442 commits from 207 contributors, NVIDIA Blackwell Ultra support, DeepSeek optimizations (5.3% throughput gains) https://github.com/vllm-project/vllm/releases/tag/v0.13.0
• Amazon EC2 AZ ID API Support: Consistent Availability Zone IDs across all AWS accounts, eliminates manual zone mapping https://aws.amazon.com/about-aws/whats-new/2025/12/amazon-ec2-availability-zone-id-api-support/
• GPT-5.2-Codex (Dec 18, 2025): 56.4% SWE-Bench Pro, 64% Terminal-Bench 2.0, invite-only cybersecurity capabilities https://openai.com/index/gpt-5-2-codex/
LINKS:
• Platform Engineering Playbook: https://platformengineeringplaybook.com
• Episode Page: https://platformengineeringplaybook.com/podcasts/00064-terraform-stacks-native-monorepo
• Full Script: https://github.com/platformengineeringorg/platform-engineering-playbook/blob/main/docs/podcasts/scripts/00064-terraform-stacks-native-monorepo.txt
• Terraform Stacks Explained: https://www.hashicorp.com/blog/terraform-stacks-explained
#terraform #terraformstacks #hashicorp #iac #infrastructureascode #platformengineering #devops #terragrunt #pulumi

Friday Dec 19, 2025
Supply chain attacks cost $60 billion in 2025. Docker just made the solution free.
On December 17, Docker released 1,000+ hardened container images under Apache 2.0 (previously a paid offering). Independent penetration testing by SRLabs confirmed a 95% CVE reduction and found NO root escapes or container breakouts. These images use a distroless runtime: no shell, no package manager, and a minimal attack surface.
We break down how distroless actually works (why removing /bin/sh matters), SLSA Level 3 cryptographic provenance, SBOM/VEX for killing alert fatigue, multi-stage build migration patterns, debugging without a shell (kubectl debug), and how Docker compares to Chainguard Wolfi, Google distroless, and Red Hat UBI.
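VEX statements are what turn a raw scanner report into a short list of issues that actually matter. The sketch below shows that filtering step. It assumes findings and VEX statements have already been parsed into simple Python objects. The `Finding` type and the `actionable` function are hypothetical. The status names follow OpenVEX (`not_affected`, `affected`, `fixed`, `under_investigation`).

```python
from dataclasses import dataclass

# OpenVEX statuses that mean "no action needed for this product".
SUPPRESSING_STATUSES = {"not_affected", "fixed"}


@dataclass(frozen=True)
class Finding:
    """One scanner result: a CVE reported against a package (hypothetical shape)."""
    cve: str
    package: str


def actionable(findings: list[Finding], vex: dict[tuple[str, str], str]) -> list[Finding]:
    """Keep findings unless a VEX statement marks that CVE/package pair not_affected or fixed.

    `vex` maps (cve, package) to an OpenVEX status string. Findings with no
    statement, or with status `affected` or `under_investigation`, are kept.
    """
    return [f for f in findings if vex.get((f.cve, f.package)) not in SUPPRESSING_STATUSES]
```

Findings with no VEX statement stay in the queue, so the filter errs on the side of showing too much rather than hiding a real issue.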
NEWS SEGMENT:
• First Linux Kernel Rust CVE (CVE-2025-68260): Race condition in Android Binder's unsafe block. DoS only, no RCE. Greg Kroah-Hartman: "totally expected and normal." https://www.phoronix.com/news/First-Linux-Rust-CVE
• GitHub Actions 39% Price Cut: Self-hosted billing postponed indefinitely after backlash. 96% of customers unaffected. https://resources.github.com/actions/2026-pricing-changes-for-github-actions/
LINKS:
• Platform Engineering Playbook: https://platformengineeringplaybook.com
• Episode Page: https://platformengineeringplaybook.com/podcasts/00063-docker-hardened-images-free-security
• Full Script: https://github.com/platformengineeringorg/platform-engineering-playbook/blob/main/docs/podcasts/scripts/00063-docker-hardened-images-free-security.txt
• Docker Blog: https://www.docker.com/blog/docker-hardened-images-for-every-developer/
#docker #containers #security #kubernetes #platformengineering #devops #supplychainsecurity #distroless #sbom #slsa

Thursday Dec 18, 2025
Kubernetes 1.35 is here, and it changes everything about pod lifecycle management. In this episode, we break down the release that finally lets you resize a pod's CPU and memory without restarting it.
In This Episode:
- In-Place Pod Vertical Scaling goes GA - adjust CPU/memory without pod restarts
- Breaking changes: cgroup v1 removed, containerd 1.x EOL, IPVS deprecated
- Pod Certificates (beta) for native workload identity without cert-manager
- 60 enhancements: what matters for platform teams
- Practical upgrade checklist and timing guidance
News Segment:
- Docker makes 1,000+ hardened container images free (95% CVE reduction)
- GitHub Actions pricing changes (up to 39% reduction) coming January 2026
- First Linux Kernel Rust CVE announced (CVE-2025-68260)
- KubeVirt completes OSTIF security audit (15 findings, strong architecture)
Resources:
- K8s 1.35 Release: https://kubernetes.io/blog/2025/12/17/kubernetes-v1-35-release/
- Full show notes: https://platformengineering.org/podcasts/00062-kubernetes-1-35-timbernetes
Duration: ~15 minutes
Speakers: Jordan & Alex

Wednesday Dec 17, 2025
Netflix reduced their deployment failures by 40,000x using Temporal. In this episode, we break down how they achieved this remarkable improvement and what it means for your platform engineering practice.
In This Episode:
- Netflix's deployment reliability problem: 4% failure rate from transient cloud operations
- What is durable execution? Write code as if failures don't exist
- Temporal vs AWS Step Functions vs Apache Airflow vs Cadence comparison
- Netflix's Spinnaker/Clouddriver implementation with 2-hour fix-forward window
- When Temporal is (and isn't) the right choice for your organization
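Durable execution is easier to grasp with a toy model. The sketch below is not Temporal's SDK. It journals each completed step's result to a local JSON file, a hypothetical stand-in for Temporal's persisted event history, so that a restarted run replays finished steps instead of repeating them. Real engines add retries, timeouts, and distributed workers on top of this idea. The `Journal` class is illustrative only, and step results must be JSON-serializable.

```python
import json
from pathlib import Path
from typing import Callable


class Journal:
    """Toy durable-execution journal: completed step results persisted to a JSON file.

    Hypothetical illustration only; this is not Temporal's API.
    """

    def __init__(self, path: Path) -> None:
        self.path = path
        # Reload results recorded by a previous (possibly crashed) run.
        self.results: dict[str, object] = (
            json.loads(path.read_text()) if path.exists() else {}
        )

    def run_step(self, name: str, fn: Callable[[], object]) -> object:
        # Replay: a step that already completed returns its recorded result
        # without running again.
        if name in self.results:
            return self.results[name]
        result = fn()
        # Persist immediately, so a crash after this write does not repeat the step.
        self.results[name] = result
        self.path.write_text(json.dumps(self.results))
        return result
```

A hypothetical deployment workflow would wrap each cloud operation in `run_step`. If the workflow is rerun after a crash, every step already recorded in the journal is skipped.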
Key Stats:
- Deployment failures: 4% → 0.0001% (40,000x improvement)
- Temporal valuation: $2.5B with 183,000+ weekly active developers
- 600% growth in developer adoption over 18 months
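The 40,000x figure follows directly from the two failure rates. Exact fractions keep floating-point noise out of the check:

```python
from fractions import Fraction

before = Fraction(4, 100)        # 4% deployment failure rate
after = Fraction(1, 1_000_000)   # 0.0001% deployment failure rate

improvement = before / after
print(improvement)  # 40000
```

Put another way, for every million deployments the failure count drops from 40,000 to 1.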
Resources:
- Netflix Tech Blog: https://netflixtechblog.com/how-temporal-powers-reliable-cloud-operations-at-netflix-73c69ccb5953
- Temporal.io: https://temporal.io/
- Full show notes: https://platformengineering.org/podcasts/00061-netflix-temporal-deployment-reliability

Tuesday Dec 16, 2025
48% of Kubernetes users struggle with tool choice. That's nearly half of us paralyzed by options. So when AWS adopted kro alongside Argo CD, we had to ask: is this the Goldilocks solution we've been waiting for?
In this episode, Jordan and Alex tackle the composition tool landscape with an honest decision framework. We dive deep into CEL expressions, resource graph mechanics, and GitOps integration. We also give Viktor Farcic's criticism a fair hearing, and explain exactly when kro makes sense - and when it doesn't.
News Segment:
• Shai-Hulud npm supply chain attack postmortem - 500+ packages, 25K repos
• Ingress-nginx retirement - March 2026, 3 months away
• Netflix Maestro 100x faster through full rewrite
Main Topics:
• The Goldilocks problem: Helm (too simple?), Crossplane (too complex?), kro (just right?)
• CEL expressions deep dive: syntax, operators, and functions
• Resource graph mechanics: topological sorting and dependency inference
• GitOps integration: how kro works with Argo CD and Flux
• Viktor Farcic's criticism and our honest response
• Migration paths and real-world use cases
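The resource-graph mechanics can be approximated in a few lines: infer edges from references between resources, then sort the graph topologically. The sketch below assumes kro-style `${id.field}` references and uses a regular expression as a stand-in for real CEL parsing. `creation_order` is a hypothetical helper. Python's standard `graphlib.TopologicalSorter` raises `CycleError` if resources reference each other in a loop.

```python
import re
from graphlib import TopologicalSorter

# Matches the leading identifier of a ${id.field...} reference (simplified; not a CEL parser).
REFERENCE = re.compile(r"\$\{\s*([A-Za-z_][A-Za-z0-9_]*)\.")


def creation_order(resources: dict[str, str]) -> list[str]:
    """Infer dependencies from ${id.field} references and return a valid creation order.

    `resources` maps a resource id to its template text. References to ids that
    are not resources in the graph (for example `schema`) are ignored.
    """
    graph = {
        rid: {ref for ref in REFERENCE.findall(template) if ref in resources and ref != rid}
        for rid, template in resources.items()
    }
    # TopologicalSorter expects node -> set of predecessors (the nodes it depends on).
    return list(TopologicalSorter(graph).static_order())
```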
Resources:
• Episode page: https://platformengineering.org/podcasts/00060-kro-goldilocks-kubernetes-composition
• kro GitHub: https://github.com/kubernetes-sigs/kro
• CNCF Blog: https://www.cncf.io/blog/2025/12/15/building-platforms-using-kro-for-composition/
• AWS EKS Capabilities: https://aws.amazon.com/blogs/aws/announcing-amazon-eks-capabilities-for-workload-orchestration-and-cloud-resource-management/
• InfoQ Analysis (Viktor Farcic): https://www.infoq.com/news/2025/02/kube-resource-orchestrator/
• Spectro Cloud 2024 Survey: https://www.spectrocloud.com/news/spectro-cloud-releases-2024-state-of-production-kubernetes
News Segment Links:
• Shai-Hulud npm Attack Postmortem: https://trigger.dev/blog/the-shai-hulud-2-0-attack-postmortem
• Ingress-nginx Retirement: https://github.com/kubernetes/ingress-nginx/issues/12094
• Netflix Maestro 100x Faster: https://netflixtechblog.com/maestro-netflixs-workflow-orchestrator-ee13a06f9c78
#kubernetes #platformengineering #kro #crossplane #helm #devops

Sunday Dec 14, 2025
2025 was the year platform engineering grew up—and got a reality check. AI entered infrastructure in ways we couldn't ignore. Industry consensus finally emerged on what platforms should actually do. And Cloudflare went down six times to remind us that concentration risk isn't just theoretical.
In this special year-in-review episode, we look back at the ten stories that defined platform engineering in 2025:
✅ AI-native Kubernetes arrived (DRA GA, AI Conformance v1.0)
✅ Platform engineering reached consensus, but 70% still fail
✅ Infrastructure concentration risk became undeniable (AWS + Cloudflare)
✅ IngressNightmare exposed 43% of cloud environments
✅ Open source sustainability crisis (60% maintainers unpaid)
✅ GPU waste: 13% average utilization = $4,350/month wasted per GPU
✅ Service mesh sidecar era ended (Istio Ambient GA)
✅ IaC consolidation (IBM + HashiCorp, CDKTF deprecated)
✅ Gateway API became the standard
✅ Agentic AI entered platform engineering
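The GPU waste figure implies a cost basis that the notes do not state. We assume a fully loaded cost of $5,000 per GPU per month, chosen because it reproduces the $4,350 figure. Under that assumption, idle spend at 13% average utilization works out as follows:

```python
# Assumption: $5,000 per GPU per month, fully loaded. This figure is not in the
# episode notes; it is the monthly cost that reproduces the stated $4,350.
MONTHLY_GPU_COST = 5_000
AVERAGE_UTILIZATION = 0.13


def idle_spend(monthly_cost: float, utilization: float) -> float:
    """Dollars per month paid for GPU capacity that sits idle."""
    return monthly_cost * (1 - utilization)


print(round(idle_spend(MONTHLY_GPU_COST, AVERAGE_UTILIZATION)))  # 4350
```

Substituting your own per-GPU cost gives a quick estimate of what idle capacity costs your fleet.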
Top 5 Takeaways for 2026:
1. AI infrastructure is now standardized: architect to avoid lock-in
2. Platform engineering has a definition: use it
3. Concentration risk is real: multi-region, multi-cloud, multi-CDN
4. Open source needs funding: $2K/dev/year recommendation
5. GPU waste is the new cloud waste: DRA and time-slicing are table stakes
Show notes: https://platformengineeringplaybook.io/podcasts/00059-platform-engineering-2025-year-in-review




