Platform Engineering Playbook Podcast

The Platform Engineering Playbook Podcast is where AI meets open-source infrastructure knowledge—and you're part of the editorial process. Every episode is researched, scripted, and produced with AI, then reviewed by the community and published on GitHub for anyone to improve. Facing tool sprawl across 130+ platforms? Justifying PaaS costs to your CFO? Navigating the Shadow AI crisis hitting 85% of organizations? We tackle the messy realities of platform engineering that most content avoids, delivering data-backed insights and decision frameworks you can use Monday morning. Built for senior engineers, SREs, and DevOps practitioners with 5+ years in production, we dissect cloud economics, AI governance, infrastructure trade-offs, and career strategy—with the receipts to back it up. Think we got something wrong? Have better data? Open a pull request at platformengineeringplaybook.com. This is infrastructure podcasting as a living document, where the community keeps us honest and the content gets better with every contribution.

Read the playbook at https://platformengineeringplaybook.com

Listen on:

  • Apple Podcasts
  • YouTube
  • Podbean App
  • Spotify
  • Amazon Music

Episodes

Sunday Dec 14, 2025

In five years, Okta scaled Auth0's private cloud from 12 to 1,000+ Kubernetes clusters using ArgoCD. At KubeCon 2025, engineers Jérémy Albuixech and Kahou Lei shared their hard-won lessons. This episode breaks down the challenges, solutions, and practical wisdom for scaling GitOps to enterprise levels.
Full episode page: https://platformengineeringplaybook.com/podcasts/00058-okta-gitops-argocd-1000-clusters
In this episode, we cover:
- The 83x scaling journey: from 12 clusters in 2020 to 1,000+ in 2025
- Five major challenges at scale: controller degradation, centralized bottlenecks, application explosion, global latency, observability gaps
- Five key solutions: controller sharding, ArgoCD Agent hub-spoke model, Application Sets templating, progressive rollouts, purpose-built observability
- When to implement sharding (hint: 100+ clusters is the threshold)
- The ArgoCD UI degradation wall at 1,000 applications
- Six lessons learned including "GitOps doesn't solve organizational problems"
- Practical guidance for teams at 10-50, 100-500, and 500+ cluster scales
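The scale figures in the list above can be sketched as a few lines of arithmetic. This is only an illustration: the function names are hypothetical, and the handling of cluster counts that fall between the named bands (or exactly on 500) is an assumption of this sketch, not guidance from the talk.

```python
def growth_factor(start: int, end: int) -> float:
    """Return how many times larger `end` is than `start`."""
    if start <= 0:
        raise ValueError("start must be positive")
    return end / start


def scale_band(clusters: int) -> str:
    """Map a cluster count onto the bands named in the episode notes.

    Assumption: a count of exactly 500 is placed in "500+", and counts that
    fall outside the named bands are reported as such rather than guessed.
    """
    if clusters >= 500:
        return "500+"
    if 100 <= clusters < 500:
        return "100-500"
    if 10 <= clusters <= 50:
        return "10-50"
    return "outside the named bands"


def sharding_suggested(clusters: int) -> bool:
    """The notes cite 100+ clusters as the threshold for controller sharding."""
    return clusters >= 100


if __name__ == "__main__":
    # 12 clusters in 2020 to 1,000 in 2025 is roughly an 83x increase.
    print(f"growth: {growth_factor(12, 1000):.1f}x")
    for count in (12, 250, 1000):
        print(count, scale_band(count), sharding_suggested(count))
```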
Plus news on Helm v4.0.4/v3.19.4 releases, Zero Trust in CI/CD Pipelines guide, 1 billion row migration without downtime, Microsoft Azure HorizonDB, and the Platform Engineering State 2026 report.
Sources:
- The New Stack: How Okta Scaled From 12 to 1000 Kubernetes Clusters With Argo CD
- ITNEXT: How We Load Test Argo CD at Scale: 1,000 vClusters with GitOps
- Red Hat: Multi-cluster GitOps with the Argo CD Agent
- KubeCon + CloudNativeCon Atlanta 2025: "One Dozen To One Thousand Clusters" by Jérémy Albuixech and Kahou Lei
#DevOps #PlatformEngineering #GitOps #ArgoCD #Kubernetes #MultiCluster #CNCF #KubeCon #CloudNative #SRE

Saturday Dec 13, 2025

Ninety percent of organizations now have platform teams, but most just renamed their ops team and expected different results. This episode breaks down the team sizes, reporting structures, and interaction patterns backed by DORA 2025 data that separate successful platform teams from glorified ticket handlers.
Full episode page: https://platformengineeringplaybook.com/podcasts/00057-platform-engineering-team-structures
In this episode, we cover:
- DORA 2025 shows 90% of orgs have platforms and 76% have dedicated teams; when done right, the result is an 8% individual productivity boost and a 10% team productivity boost
- Optimal team size is 6-12 people (Spotify squads, Microsoft 5-9): small enough for ownership, large enough for complete capabilities
- Reporting structure matters: companies with 100+ engineers need a dedicated platform leader to shield the team from competing priorities
- Team Topologies interaction patterns: start in Collaboration mode while building, evolve to X-as-a-Service when mature
- Success metrics: self-service rate >90%, developer happiness tracking, DORA metrics for consuming teams
- Anti-patterns to avoid: rebranding without role change, underinvestment after launch, skill concentration trap, Field of Dreams (building without validation)
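One way to make the self-service and team-size figures above concrete is a small check like the following. It is a minimal sketch: the function names and the definition of self-service rate (requests completed without a ticket to the platform team, divided by all requests) are assumptions for illustration, not definitions taken from the episode or from DORA.

```python
def self_service_rate(self_served: int, total_requests: int) -> float:
    """Fraction of requests completed without platform-team involvement.

    Assumed definition: self_served / total_requests.
    """
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    if not 0 <= self_served <= total_requests:
        raise ValueError("self_served must be between 0 and total_requests")
    return self_served / total_requests


def meets_self_service_target(rate: float, target: float = 0.90) -> bool:
    """The notes give >90% as the success threshold, so the check is strict."""
    return rate > target


def team_size_in_range(headcount: int, low: int = 6, high: int = 12) -> bool:
    """Check a team against the 6-12 person range cited in the notes."""
    return low <= headcount <= high


if __name__ == "__main__":
    rate = self_service_rate(self_served=940, total_requests=1000)
    print(f"self-service rate: {rate:.0%}, on target: {meets_self_service_target(rate)}")
    print("team of 8 in range:", team_size_in_range(8))
```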
Plus news on Sim (Apache 2.0 n8n alternative), Docker Hub credential leaks (10K+ images exposed), Meta's BPF-LSM replacing SELinux, Litestream VFS for S3, GitHub login failures, and GPT-5.2 launch.
Sources:
- DORA 2025 Report: https://dora.dev/
- Team Topologies: https://teamtopologies.com/
- Spotify Engineering Culture: https://engineering.atspotify.com/
- Backstage: https://backstage.io/
#DevOps #PlatformEngineering #TeamTopologies #DORA #EngineeringLeadership #DevEx #InternalDeveloperPlatform #SRE #CloudNative

Friday Dec 12, 2025

HashiCorp (now IBM) has officially archived the CDK for Terraform project, ending a five-year experiment in programmatic infrastructure-as-code.
Full episode page: https://platformengineeringplaybook.com/podcasts/00056-cdktf-deprecated-iac-migration
In this episode, we break down:
- Why CDKTF failed to find product-market fit (243K downloads vs Pulumi's 1.1M)
- The four key factors behind the deprecation: Pulumi's head start, JSII complexity, HCL "good enough", IBM acquisition timing
- Community reaction and the "rug pull" sentiment
- Migration paths: HCL (cdktf synth --hcl), Pulumi, OpenTofu, or AWS CDK
- What platform engineers should learn about vendor lock-in risk
Plus news on Envoy CVE-2025-0913 (CVSS 8.6), Google's managed MCP servers, OpenTofu 1.11, pgAdmin 4 v9.11, Lima v2.0, and Amazon ECS custom stop signals.
If you're on CDKTF, start your migration analysis this week. The programmatic IaC dream isn't dead—it just won't be at HashiCorp.
Sources:
- CDKTF Repository: https://github.com/hashicorp/terraform-cdk
- Hacker News Discussion: https://news.ycombinator.com/item?id=42379268
- Pulumi: https://www.pulumi.com/
- OpenTofu: https://opentofu.org/
#DevOps #PlatformEngineering #InfrastructureAsCode #Terraform #CDKTF #Pulumi #OpenTofu #HashiCorp #IBM #CloudNative

Thursday Dec 11, 2025

🎧 AUDIODOCS: Official documentation of popular open-source projects, adapted and narrated for audio. Learn while commuting, exercising, or doing chores.
Stop juggling terminal windows to tail Kubernetes logs. stern lets you tail multiple pods and containers simultaneously with regex queries, auto-detection of new pods, and color-coded output. This episode covers everything from basic usage to advanced templates and filtering.
WHAT YOU'LL LEARN:
00:00 - Introduction & The Problem stern Solves
01:30 - Basic Usage: Regex and Resource Queries
03:00 - Multi-Container Tailing & Filtering
04:30 - Namespace, Label, and Node Filtering
06:00 - Output Formatting & Custom Templates
07:30 - Time-Based Filtering & Batch Mode
08:45 - Configuration & Color Customization
09:45 - Installation & Practical Tips
10:30 - Summary & Key Takeaways
LINKS:
Full Transcript & Episode Page: https://platformengineeringplaybook.com/audiodocs/stern/v1.33.1
stern Official GitHub: https://github.com/stern/stern
stern Releases: https://github.com/stern/stern/releases
Platform Engineering Playbook: https://platformengineeringplaybook.com
KEY TOPICS:
- Multi-pod log tailing with regex queries
- Resource queries (deployment/nginx, statefulset/db)
- Container filtering (-c, -E flags)
- Namespace and label selectors
- Output templates (default, raw, json, custom Go)
- Time filtering (--since, --timestamps)
- Batch mode with --no-follow
- Configuration file and color customization
- fzf integration and shell completion
---
📜 LICENSE & ATTRIBUTION
This AudioDocs episode is a derivative work based on the official stern documentation.
Original documentation: https://github.com/stern/stern
License: Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
© stern contributors
---
#stern #Kubernetes #DevOps #PlatformEngineering #CloudNative #SRE #AudioDocs #Logging #kubectl #Observability
TAGS:stern, kubernetes logs, kubectl logs, multi-pod tailing, container logs, kubernetes debugging, log aggregation, devops tools, sre tools, platform engineering, cloud native, kubernetes observability, log streaming, audiodocs, listen to docs

Thursday Dec 11, 2025

🎧 AUDIODOCS: Official documentation of popular open-source projects, adapted and narrated for audio. Learn while commuting, exercising, or doing chores.
Master CoreDNS, the default DNS server for Kubernetes clusters. This 72-minute episode covers the complete v1.13.1 documentation - from plugin architecture to production configuration.
Every time a pod looks up a service, every time kubectl exec needs to find a pod - CoreDNS handles that resolution. If you're debugging DNS issues or optimizing cluster performance, this comprehensive audio guide has you covered.
WHAT YOU'LL LEARN:
00:00 - Introduction & Overview
02:30 - Project Context: CNCF Graduation & Why CoreDNS Replaced kube-dns
06:00 - Architecture: The Plugin "Lego Blocks" Model
12:00 - Core Concepts: Server Blocks, Zones, Plugin Ordering
18:00 - Installation: Kubernetes, Standalone, Docker, Package Managers
24:00 - Corefile Configuration Mastery
30:00 - Common Setups: Recursive Resolver, Authoritative DNS
36:00 - kubernetes Plugin Deep Dive: Service Discovery & Pod Modes
42:00 - forward Plugin: Upstream Servers, Health Checking, Policies
48:00 - cache Plugin: TTL Handling, Prefetch, Denial Caching
54:00 - file & hosts Plugins: Zone Files, /etc/hosts Style Records
60:00 - errors, log, health & ready Plugins
64:00 - prometheus Plugin: Metrics & Grafana Dashboards
68:00 - rewrite & acl Plugins: Query Modification, Access Control
72:00 - DNS Security: DNSSEC, DNS over TLS, DNS over HTTPS
76:00 - Additional Plugins & Key Takeaways
LINKS:
Full Transcript & Episode Page: https://platformengineeringplaybook.com/audiodocs/coredns/v1.13.1
CoreDNS Official Website: https://coredns.io/
CoreDNS GitHub Repository: https://github.com/coredns/coredns
CoreDNS Documentation: https://coredns.io/manual/toc/
CoreDNS Plugin Documentation: https://coredns.io/plugins/
CNCF Project Page: https://www.cncf.io/projects/coredns/
Platform Engineering Playbook: https://platformengineeringplaybook.com
KEY TOPICS COVERED:
- Plugin architecture & execution ordering
- Corefile configuration syntax
- Kubernetes service discovery (ClusterIP, headless services, endpoint slices)
- DNS caching strategies & TTL management
- Forwarding to upstream DNS servers
- Health checks & readiness probes for Kubernetes
- Prometheus metrics integration
- Query rewriting & access control lists
- DNS security: DNSSEC validation, DNS over TLS (port 853), DNS over HTTPS
- Production best practices & testing configurations
WHO THIS IS FOR:
Platform engineers, SREs, and DevOps engineers who need to understand, configure, or troubleshoot Kubernetes DNS. Assumes familiarity with Kubernetes concepts.
---
📜 LICENSE & ATTRIBUTION
This AudioDocs episode is a derivative work based on the official CoreDNS documentation.
Original documentation: https://coredns.io/manual/toc/
License: Apache License 2.0 (https://github.com/coredns/coredns/blob/master/LICENSE)
© CoreDNS Authors
This audio transforms written documentation into educational audio format with proper attribution as required by the Apache 2.0 license.
---
#CoreDNS #Kubernetes #DNS #CNCF #DevOps #PlatformEngineering #K8s #CloudNative #SRE #AudioDocs #KubernetesDNS #ServiceDiscovery
TAGS:coredns, coredns tutorial, coredns kubernetes, kubernetes dns, k8s dns, coredns plugins, coredns configuration, corefile, coredns corefile, dns server kubernetes, kubernetes service discovery, coredns forward plugin, coredns cache, coredns prometheus, dns over tls kubernetes, dnssec kubernetes, coredns health check, cncf graduated project, cloud native dns, coredns troubleshooting, kubernetes networking, coredns v1.13.1, audiodocs, platform engineering

Thursday Dec 11, 2025

🎧 AUDIODOCS: Official documentation of popular open-source projects, adapted and narrated for audio. Learn while commuting, exercising, or doing chores.
Stop typing long kubectl config commands! kubectx and kubens are essential CLI tools that let you switch Kubernetes contexts and namespaces instantly with tab completion and fuzzy search.
This 10-minute episode covers everything you need to know about v0.9.5 - from installation to power-user workflows. If you work with multiple Kubernetes clusters, these tools will save you hours every week.
WHAT YOU'LL LEARN:
00:00 - The Problem: Why kubectl Context Switching is Painful
01:30 - kubectx Basics: Instant Context Switching
02:45 - The Dash Flag: Toggle Between Two Contexts
03:30 - Context Renaming: Human-Readable Names
04:30 - kubens: Namespace Switching Made Easy
05:30 - The Force Flag: Non-Existent Namespaces
06:00 - fzf Integration: Interactive Fuzzy Search
07:00 - Installation: Homebrew, apt, Krew, Chocolatey
08:00 - Shell Completion: bash, zsh, fish
08:30 - Customization: Colors & Environment Variables
09:00 - Workflow Tips: kube-ps1, Naming Conventions
09:45 - Summary & Key Takeaways
LINKS:
Full Transcript & Episode Page: https://platformengineeringplaybook.com/audiodocs/kubectx/v0.9.5
kubectx GitHub Repository: https://github.com/ahmetb/kubectx
fzf - Fuzzy Finder (Recommended): https://github.com/junegunn/fzf
kube-ps1 - Prompt Integration: https://github.com/jonmosco/kube-ps1
kubectl Krew Plugin Manager: https://krew.sigs.k8s.io/
Platform Engineering Playbook: https://platformengineeringplaybook.com
KEY FEATURES COVERED:
- kubectx: List and switch contexts with a single command
- kubens: Switch namespaces without verbose kubectl commands
- Dash flag (-): Toggle back to previous context/namespace
- Context renaming: kubectx prod=gke_project_region_cluster
- fzf integration: Interactive fuzzy-search menu
- Shell completion: Tab-complete context and namespace names
- Force flag: Set namespace before it exists
- NO_COLOR support: Disable colored output
- KUBECTX_IGNORE_FZF: Disable fzf when needed
INSTALLATION METHODS:
- macOS/Linux: brew install kubectx
- Debian/Ubuntu: sudo apt install kubectx
- Arch Linux: sudo pacman -S kubectx
- kubectl plugin: kubectl krew install ctx ns
- Windows: choco install kubectx-ps, scoop install kubectx
WHO THIS IS FOR:
Anyone working with multiple Kubernetes clusters or namespaces. Perfect for platform engineers, SREs, and developers who want to eliminate kubectl config friction.
---
📜 LICENSE & ATTRIBUTION
This AudioDocs episode is a derivative work based on the official kubectx documentation.
Original documentation: https://github.com/ahmetb/kubectx
License: Apache License 2.0 (https://github.com/ahmetb/kubectx/blob/master/LICENSE)
© Ahmet Alp Balkan
This audio transforms written documentation into educational audio format with proper attribution as required by the Apache 2.0 license.
---
#kubectx #kubens #Kubernetes #kubectl #DevOps #PlatformEngineering #K8s #CloudNative #SRE #CLI #AudioDocs #ProductivityTools #KubernetesTools
TAGS:kubectx, kubens, kubernetes context, kubectl context switch, kubernetes namespace switch, k8s tools, kubernetes cli tools, kubectx tutorial, kubens tutorial, kubernetes productivity, fzf kubernetes, kubectl tips, kubernetes workflow, multi-cluster kubernetes, kubernetes context management, ahmetb kubectx, krew plugins, kubectl productivity, switch kubernetes cluster, kubernetes namespace, kubectx v0.9.5, audiodocs, platform engineering

Thursday Dec 11, 2025

Part 4 of 4 in our AWS re:Invent 2025 series (finale). The data and AI services that tie everything together. S3 Tables with Apache Iceberg hits GA with Intelligent-Tiering and cross-region replication. Aurora DSQL delivers distributed SQL with GPS atomic clocks. S3 Vectors supports 2 billion vectors at 90% lower cost. Clean Rooms ML enables privacy-enhanced synthetic datasets. Plus a comprehensive wrap-up connecting 50+ announcements across all four episodes. News: Envoy CVE-2025-0913, Rust in Linux kernel permanent, Let's Encrypt 10 years.
In this episode:
- S3 Tables GA with Intelligent-Tiering (80% cost savings) and automatic cross-region replication for Iceberg tables
- Aurora DSQL uses GPS atomic clocks for global consistency, 4x faster than other distributed SQL, built 100% in Rust
- S3 Vectors supports 2B vectors per index (40x preview increase), 90% cheaper than Pinecone/Weaviate/Qdrant
- Clean Rooms ML generates privacy-enhanced synthetic datasets for collaborative ML without exposing raw data
- Database Savings Plans: up to 35% savings, flexible across engines/regions, no Reserved Instance Tetris
- Series wrap-up: 4 episodes, 50+ announcements, theme is "make infrastructure boring"
📰 News Segment Links:
• NVD CVE Details  https://nvd.nist.gov/vuln/detail/CVE-2025-0913
• Wiz Vulnerability Database  https://www.wiz.io/vulnerability-database/cve/cve-2025-0913
• Envoy Releases (v1.34.12, v1.35.8, v1.33.14)  https://github.com/envoyproxy/envoy/releases
• LWN.net Article  https://lwn.net/Articles/1049831/
• Hacker News Discussion  https://news.ycombinator.com/item?id=46213585
• Linux.org Thread  https://www.linux.org/threads/lwn-net-the-end-of-the-kernel-rust-experiment.59852/
• ByteIOTA Coverage  https://byteiota.com/linux-kernel-rust-experiment-over/
• Official Blog Post  https://letsencrypt.org/2025/12/09/10-years
• Simon Willison Commentary  https://simonwillison.net/2025/Dec/10/lets-encrypt/
• GIGAZINE Coverage  https://gigazine.net/gsc_news/en/20251210-letsencrypt-10-years/
• EFF Celebration  https://www.eff.org/deeplinks/2023/08/celebrating-ten-years-encrypting-web-lets-encrypt
• GitHub Status Page  https://www.githubstatus.com/
• GitHub Status Incident History  https://www.githubstatus.com/history.rss
• Amazon S3 Tables Replication  https://aws.amazon.com/about-aws/whats-new/2025/12/s3-tables-automatic-replication-apache-iceberg-tables/
• Amazon S3 Expands Capabilities  https://press.aboutamazon.com/2024/12/amazon-s3-expands-capabilities-with-managed-apache-iceberg-tables-for-faster-data-lake-analytics-and-automatic-metadata-generation-to-simplify-data-discovery-and-understanding
• Top AWS re:Invent 2025 Announcements  https://aws.amazon.com/blogs/aws/top-announcements-of-aws-reinvent-2025/
• AWS re:Invent Recap  https://www.nops.io/blog/aws-reinvent-recap-2025/
• TechCrunch Coverage  https://techcrunch.com/2024/12/03/aws-announces-aurora-dsql-a-new-distributed-sql-database-that-promises-virtually-unlimited-scalability/
• AWS Press Release  https://press.aboutamazon.com/2024/12/aws-announces-new-database-capabilities-including-amazon-aurora-dsql-the-fastest-distributed-sql-database
• Aurora DSQL GA Announcement  https://aws.amazon.com/blogs/aws/amazon-aurora-dsql-is-now-generally-available/
• Werner Vogels Blog  https://www.allthingsdistributed.com/2025/05/just-make-it-scale-an-aurora-dsql-story.html
• InfoQ Coverage (Preview)  https://www.infoq.com/news/2024/12/amazon-aurora-dsql/
• InfoQ Coverage (GA)  https://www.infoq.com/news/2025/06/amazon-aurora-dsql-ga/
• AWSInsider  https://awsinsider.net/articles/2025/06/09/amazon-aurora-dsql-now-generally-available-serverless-and-distributed.aspx
• AWS Glue Zero-ETL  https://aws.amazon.com/about-aws/whats-new/2025/11/glue-zero-etl-selfmanaged/
• The Register (2022 announcement)  https://www.theregister.com/2022/11/29/aws_selipsky_reinvent_keynote/
• Cloud Data Insights  https://www.clouddatainsights.com/aws-eyes-a-zero-etl-future-with-newly-announced-capabilities/
• S3 Vectors GA Blog Post  https://aws.amazon.com/blogs/aws/amazon-s3-vectors-now-generally-available-with-increased-scale-and-performance/
• AWS What's New  https://aws.amazon.com/about-aws/whats-new/2025/12/amazon-s3-vectors-generally-available/
• InfoQ Coverage  https://www.infoq.com/news/2025/07/aws-s3-vectors/
• NAND Research  https://nand-research.com/research-note-aws-s3-ai-focused-enhancements/
• Blocks and Files  https://blocksandfiles.com/2025/12/03/aws-s3/
• Computer Weekly  https://www.computerweekly.com/blog/CW-Developer-Network/AWS-recalibrates-data-economics-further-with-S3-Vectors-batch-Intelligent-Tiering
• AWS Blog Post  https://aws.amazon.com/blogs/aws/aws-clean-rooms-launches-privacy-enhancing-synthetic-dataset-generation-for-ml-model-training/
• AWS What's New  https://aws.amazon.com/about-aws/whats-new/2025/11/aws-clean-rooms-synthetic-dataset-generation-custom-ml/
• Forged Concepts Analysis  https://forgedconcepts.com/aws-clean-rooms-synthetic-data-reinvent-2025
• IDC Blog  https://blogs.idc.com/2025/12/05/how-synthetic-data-and-clean-rooms-are-redefining-secure-data-collaboration/
• GoML.io  https://www.goml.io/blog/aws-clean-rooms
• AWS Blog Post  https://aws.amazon.com/blogs/aws/introducing-database-savings-plans-for-aws-databases/
• AWS What's New  https://aws.amazon.com/about-aws/whats-new/2025/12/database-savings-plans-savings/
• GeekWire Coverage  https://www.geekwire.com/2025/the-hot-new-thing-at-aws-reinvent-a-database-pricing-update/
• CloudCostChefs  https://www.cloudcostchefs.com/blog/aws-database-savings-plans
• Top Announcements  https://aws.amazon.com/blogs/aws/top-announcements-of-aws-reinvent-2025/
• About Amazon News  https://www.aboutamazon.com/news/aws/aws-re-invent-2025-ai-news-updates
• TechCrunch Roundup  https://techcrunch.com/2025/12/04/all-the-biggest-news-from-aws-big-tech-show-reinvent-2025
• TechRadar Live Coverage  https://www.techradar.com/pro/live/aws-re-invent-2025-all-the-news-and-updates-as-it-happens
• AWS Weekly Roundup  https://aws.amazon.com/blogs/aws/aws-weekly-roundup-aws-reinvent-keynote-recap-on-demand-videos-and-more-december-8-2025/
Perfect for platform engineers, SREs, DevOps engineers, and cloud architects looking to level up their platform engineering skills.
Episode URL: https://platformengineeringplaybook.com/podcasts/00052-aws-reinvent-2025-data-ai-wrap-up
Duration: ~25 minutes
Hosts: Jordan & Alex
Category: Technology
Subcategory: Software How-To
Keywords: wrap, infrastructure, Port, AWS, episode, invent, data

Wednesday Dec 10, 2025

Part 3 of our AWS re:Invent 2025 series. AWS transforms Kubernetes into an AI infrastructure platform with massive scale and AI-native operations.
In this episode:
- EKS Ultra Scale: 100,000 nodes per cluster (vs 15K GKE, 5K AKS), enough for 1.6 million Trainium accelerators or 800K GPUs in a single cluster
- AWS replaced etcd's Raft consensus with their internal "journal" system and moved to in-memory storage for 500 pods/sec at 100K scale
- Anthropic using EKS Ultra Scale for Claude training, improving latency KPIs from 35% to 90%+
- EKS Capabilities: Fully managed Argo CD, AWS Controllers for Kubernetes (200+ CRDs for 50+ services), Kube Resource Orchestrator
- EKS MCP Server: Natural language Kubernetes management, e.g. "show me all pods not running" instead of kubectl
- EKS Provisioned Control Plane: XL/2XL/4XL tiers ($1.65-$6.90/hr), 4XL supports 40K nodes
- CloudWatch Gen AI Observability: LangChain, LangGraph, CrewAI agent tracing
- DevOps Agent (Preview): Autonomous on-call engineer; Kindle saw 80% time savings
- CloudWatch unified data store with S3 Tables, OCSF, Apache Iceberg
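The headline numbers above imply some simple per-node and per-month figures, sketched below. The function names are hypothetical, and the 730-hour month is an assumption of this sketch (8,760 hours per year divided by 12); the notes give only the low and high ends of the hourly price range, so no per-tier price is assumed.

```python
NODES_PER_CLUSTER = 100_000        # EKS Ultra Scale limit cited in the notes
TRAINIUM_ACCELERATORS = 1_600_000  # per cluster, as cited
GPUS = 800_000                     # per cluster, as cited

HOURS_PER_MONTH = 730  # assumption: 8,760 hours per year / 12 months


def per_node(total_devices: int, nodes: int = NODES_PER_CLUSTER) -> float:
    """Average number of devices per node implied by the cluster totals."""
    if nodes <= 0:
        raise ValueError("nodes must be positive")
    return total_devices / nodes


def monthly_cost(hourly_usd: float, hours: int = HOURS_PER_MONTH) -> float:
    """Approximate monthly cost of a control plane billed per hour."""
    return round(hourly_usd * hours, 2)


if __name__ == "__main__":
    print("Trainium accelerators per node:", per_node(TRAINIUM_ACCELERATORS))
    print("GPUs per node:", per_node(GPUS))
    # Only the two ends of the published $1.65-$6.90/hr range are used here.
    for hourly in (1.65, 6.90):
        print(f"${hourly:.2f}/hr is about ${monthly_cost(hourly):,.2f}/month")
```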
📰 News Segment Links:
• cert-manager v1.19.2 CVE Patches (CVE-2025-61727, CVE-2025-61729)  https://github.com/cert-manager/cert-manager/releases/tag/v1.19.2
• cert-manager v1.18.4 Backport  https://github.com/cert-manager/cert-manager/releases/tag/v1.18.4
• Canonical Extends Kubernetes Long-Term Support to 15 Years  https://thenewstack.io/canonical-extends-kubernetes-long-term-support-to-15-years/
• OpenTofu 1.11 with Ephemeral Resources  https://github.com/opentofu/opentofu/releases/tag/v1.11.0
• Cloudflare Shift-Left Enterprise IaC  https://blog.cloudflare.com/shift-left-enterprise-scale/
🔗 Main Content Sources:
• EKS Ultra Scale 100K Nodes  https://aws.amazon.com/blogs/containers/amazon-eks-enables-ultra-scale-ai-ml-workloads-with-support-for-100k-nodes-per-cluster/
• Under the Hood: EKS Ultra Scale  https://aws.amazon.com/blogs/containers/under-the-hood-amazon-eks-ultra-scale-clusters/
• EKS Capabilities Announcement  https://aws.amazon.com/blogs/aws/announcing-amazon-eks-capabilities-for-workload-orchestration-and-cloud-resource-management/
• EKS MCP Server  https://aws.amazon.com/blogs/containers/introducing-the-fully-managed-amazon-eks-mcp-server-preview/
• EKS Provisioned Control Plane  https://aws.amazon.com/blogs/containers/amazon-eks-introduces-provisioned-control-plane/
• Cloud Operations Top 10 Announcements  https://aws.amazon.com/blogs/mt/2025-top-10-announcements-for-aws-cloud-operations/
• AI-driven Operations at re:Invent  https://aws.amazon.com/blogs/mt/embracing-ai-driven-operations-and-observability-at-reinvent-2025/
Perfect for platform engineers, SREs, DevOps engineers, and cloud architects looking to level up their platform engineering skills.
Series: AWS re:Invent 2025 (Part 3 of 4)
Episode URL: https://platformengineeringplaybook.com/podcasts/00051-aws-reinvent-2025-eks-cloud-operations
Part 1: The Agentic AI Revolution - https://platformengineeringplaybook.com/podcasts/00049-aws-reinvent-2025-agentic-ai-revolution
Part 2: Infrastructure & Developer Experience - https://platformengineeringplaybook.com/podcasts/00050-aws-reinvent-2025-infrastructure-developer-experience
Category: Technology
Subcategory: Software How-To
Keywords: AWS, re:Invent 2025, EKS, Kubernetes, EKS Ultra Scale, EKS Capabilities, Argo CD, ACK, MCP Server, CloudWatch, DevOps Agent, AIOps, platform engineering

Tuesday Dec 09, 2025

AWS re:Invent 2025 Series (Part 2 of 4)
AWS announces Graviton5 with 192 cores (3x previous gen) and 40% better price-performance vs x86. Trainium 3 delivers 4.4x performance at 50% lower cost, with NeuronLink eliminating 50% network overhead. Lambda Durable Functions enable year-long workflows. Werner Vogels introduces the "Renaissance Developer" framework for the AI era. Plus: BellSoft's hardened Java images cut CVEs by 95%, GitHub Actions package management security gaps exposed, and Proxmox releases VMware escape hatch.
Links & Resources:
- Full episode page: https://platformengineeringplaybook.com/podcasts/00050-aws-reinvent-2025-infrastructure-developer-experience
- BellSoft Hardened Images: https://www.infoq.com/news/2025/12/bellsoft-hardened-images/
- GitHub Actions Security Critique: https://nesbitt.io/2025/12/06/github-actions-package-manager.html
- Proxmox DCM 1.0: https://www.theregister.com/2025/12/05/proxmox_datacenter_manager_1_stable/
Key Topics:
- Graviton5: 192 cores (3x), 40% price-performance vs x86, 250M+ ops/sec in-memory
- Trainium 3: 4.4x AI training performance, 50% cost reduction, NeuronLink
- Lambda Durable Functions: Year-long workflows with context.step/wait
- Werner Vogels' Renaissance Developer concept and verification debt

Monday Dec 08, 2025

AWS announces autonomous AI agents that can work for days without human intervention. The DevOps Agent is an always-on incident responder. The Security Agent understands your application architecture. And Kiro is already used by hundreds of thousands of developers.
This is part 1 of our 4-part AWS re:Invent 2025 coverage series.
KEY TOPICS:
• Frontier Agents: DevOps Agent, Security Agent, and Kiro
• DevOps Agent: 24/7 incident response with human-in-the-loop approval
• Security Agent: Context-aware security from design through deployment
• Kiro: GA autonomous developer agent used internally at Amazon
• Bedrock AgentCore: Policy controls, memory, and 13 evaluation frameworks
• Nova Act: 90% reliability on browser automation workflows
• Verification debt: Werner Vogels' concept for AI code generation risks
NEWS SEGMENT:
• Model Context Protocol (MCP) becomes de facto standard for AI-tool integrations
• Oxide Computer Company requires human approval for all AI-generated code
SHOW NOTES:
Full transcript and links: https://platformengineeringplaybook.com/podcasts/00049-aws-reinvent-2025-agentic-ai-revolution

Copyright 2025 All rights reserved.
