
Wednesday Dec 10, 2025
AWS re:Invent 2025 Recap Part 3/4 - EKS & Cloud Operations
Part 3 of our AWS re:Invent 2025 series. AWS transforms Kubernetes into an AI infrastructure platform with massive scale and AI-native operations.
In this episode:
- EKS Ultra Scale: 100,000 nodes per cluster (vs 15K GKE, 5K AKS)—1.6 million Trainium accelerators or 800K GPUs in a single cluster
- AWS replaced etcd's Raft consensus with its internal "journal" system and moved etcd's storage in-memory, sustaining scheduling throughput of about 500 pods/sec at 100K-node scale
- Anthropic using EKS Ultra Scale for Claude training, improving latency KPIs from 35% to 90%+
- EKS Capabilities: Fully managed Argo CD, AWS Controllers for Kubernetes (ACK, 200+ CRDs for 50+ services), Kube Resource Orchestrator (kro)
- EKS MCP Server: Natural language Kubernetes management—"show me all pods not running" instead of kubectl
- EKS Provisioned Control Plane: XL/2XL/4XL tiers ($1.65-$6.90/hr), 4XL supports 40K nodes
- CloudWatch Gen AI Observability: LangChain, LangGraph, CrewAI agent tracing
- DevOps Agent (Preview): Autonomous on-call engineer—Kindle saw 80% time savings
- CloudWatch unified data store with S3 Tables, OCSF, Apache Iceberg
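The headline numbers in the list above can be sanity-checked with a few lines of arithmetic. This is a rough sketch, not AWS's sizing or billing method. The per-node accelerator counts (16 Trainium chips or 8 GPUs per node) are assumptions inferred by dividing the quoted totals by 100,000 nodes. The 730-hour month is a common approximation, and mapping $1.65/hr to the XL tier and $6.90/hr to the 4XL tier is assumed from the order of the quoted range.

```python
# Back-of-the-envelope check of the figures quoted in the episode notes.
# Assumptions (not stated in the notes): 16 Trainium chips or 8 GPUs per node,
# a 730-hour month, and XL = $1.65/hr, 4XL = $6.90/hr.

MAX_NODES = 100_000       # EKS Ultra Scale node limit quoted in the notes
TRAINIUM_PER_NODE = 16    # assumed
GPUS_PER_NODE = 8         # assumed
HOURS_PER_MONTH = 730     # assumed approximation


def accelerators(nodes: int, per_node: int) -> int:
    """Total accelerators in a cluster with `nodes` nodes."""
    return nodes * per_node


def monthly_cost(hourly_rate: float, hours: int = HOURS_PER_MONTH) -> float:
    """Approximate monthly cost in USD for a given hourly rate."""
    return round(hourly_rate * hours, 2)


if __name__ == "__main__":
    print(accelerators(MAX_NODES, TRAINIUM_PER_NODE))  # 1600000
    print(accelerators(MAX_NODES, GPUS_PER_NODE))      # 800000
    print(monthly_cost(1.65))  # assumed XL tier
    print(monthly_cost(6.90))  # assumed 4XL tier
```

Under these assumptions the quoted totals line up: 100,000 nodes times 16 chips gives 1.6 million Trainium accelerators, and 100,000 nodes times 8 GPUs gives 800K GPUs.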
📰 News Segment Links:
• cert-manager v1.19.2 CVE Patches (CVE-2025-61727, CVE-2025-61729)
https://github.com/cert-manager/cert-manager/releases/tag/v1.19.2
• cert-manager v1.18.4 Backport
https://github.com/cert-manager/cert-manager/releases/tag/v1.18.4
• Canonical Extends Kubernetes Long-Term Support to 15 Years
https://thenewstack.io/canonical-extends-kubernetes-long-term-support-to-15-years/
• OpenTofu 1.11 with Ephemeral Resources
https://github.com/opentofu/opentofu/releases/tag/v1.11.0
• Cloudflare Shift-Left Enterprise IaC
https://blog.cloudflare.com/shift-left-enterprise-scale/
🔗 Main Content Sources:
• EKS Ultra Scale 100K Nodes
https://aws.amazon.com/blogs/containers/amazon-eks-enables-ultra-scale-ai-ml-workloads-with-support-for-100k-nodes-per-cluster/
• Under the Hood: EKS Ultra Scale
https://aws.amazon.com/blogs/containers/under-the-hood-amazon-eks-ultra-scale-clusters/
• EKS Capabilities Announcement
https://aws.amazon.com/blogs/aws/announcing-amazon-eks-capabilities-for-workload-orchestration-and-cloud-resource-management/
• EKS MCP Server
https://aws.amazon.com/blogs/containers/introducing-the-fully-managed-amazon-eks-mcp-server-preview/
• EKS Provisioned Control Plane
https://aws.amazon.com/blogs/containers/amazon-eks-introduces-provisioned-control-plane/
• Cloud Operations Top 10 Announcements
https://aws.amazon.com/blogs/mt/2025-top-10-announcements-for-aws-cloud-operations/
• AI-driven Operations at re:Invent
https://aws.amazon.com/blogs/mt/embracing-ai-driven-operations-and-observability-at-reinvent-2025/
Perfect for platform engineers, SREs, DevOps engineers, and cloud architects looking to level up their platform engineering skills.
Episode URL: https://platformengineering.org/podcasts/00051-aws-reinvent-2025-eks-cloud-operations
Series: AWS re:Invent 2025 (Part 3 of 4)
Episode URL: https://platformengineeringplaybook.com/podcasts/00051-aws-reinvent-2025-eks-cloud-operations
Part 1: The Agentic AI Revolution - https://platformengineeringplaybook.com/podcasts/00049-aws-reinvent-2025-agentic-ai-revolution
Part 2: Infrastructure & Developer Experience - https://platformengineeringplaybook.com/podcasts/00050-aws-reinvent-2025-infrastructure-developer-experience
Category: Technology
Subcategory: Software How-To
Keywords: AWS, re:Invent 2025, EKS, Kubernetes, EKS Ultra Scale, EKS Capabilities, Argo CD, ACK, MCP Server, CloudWatch, DevOps Agent, AIOps, platform engineering