How To Auto-Delete Ephemeral Cloud Resources With Kubernetes
How many times have you been surprised by your cloud bill?
Upon further investigation, you realized that a meaningful share of the cost came from resources no one uses anymore. Yet it’s not easy to implement a centralized solution that systematically deletes these unused resources without human intervention.
We usually resort to asking developer teams to inspect their cloud accounts, identify unneeded resources, and delete them to reduce costs, and then we ask again the next month. Even when many of the resources are technically in use, it’s often fine to delete them and re-create them on demand (or via custom logic) rather than run them continuously, especially in non-production environments.
Let’s use AWS as an example: users create ephemeral S3 buckets or EC2 instances (or combinations of resources), typically in sandbox/development accounts, to perform their work for the day. Whether the resources are created manually in the console or through CDK/CloudFormation (IaC), it’s easy to forget to delete them before heading home. And if the day’s work is finished, it’s even easier to forget to delete them at all.
One effective solution to this problem is to build a centralized Kubernetes platform and manage AWS resources using Crossplane. Rather than enforcing the use of IaC or resource cleanup practices at the level of individual teams, a central platform engineering team can expose APIs that let users create specific AWS resources. The platform engineering team itself uses Terraform to deploy the highly available platform resources (EKS, VPC, etc.) that enable users across the organization to self-service.
Users don’t even need to be aware of the specific cloud provider, or the names of the underlying resources. For example, the creation of an S3 bucket can be abstracted through a “Storage” API. Users simply submit the storage request (called a “claim”), and the S3 bucket (or S3 bucket + IAM role + EventBridge rule as a logical unit) is automatically created and managed by Kubernetes as a composite resource. This approach fits into a broader shift toward developer self-service and platform engineering maturity.
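To make the idea concrete, a claim might look something like the following. This is a hedged sketch, not the actual API from the linked repo: the API group `platform.example.org`, the `Storage` kind, and the `parameters` fields are hypothetical names standing in for whatever schema the platform team defines in its Crossplane CompositeResourceDefinition.

```yaml
# Hypothetical namespaced claim a developer would submit.
# The platform's Composition maps this to the real AWS resources
# (S3 bucket, IAM role, EventBridge rule, ...) behind the scenes.
apiVersion: platform.example.org/v1alpha1
kind: Storage
metadata:
  name: data-sandbox
  namespace: team-payments   # each team gets its own namespace
spec:
  parameters:
    region: us-east-1
    versioning: false
```

The developer never sees “S3” in this interface; they ask for storage, and the platform decides how that request is fulfilled.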
And now that you are in a Kubernetes environment, you can build custom controllers for just about any automation you can imagine. Going back to the cloud bill problem, you can build a custom controller to watch all claims across the cluster (each team could have its own namespace). If a given claim meets a specific condition (e.g., it was created more than 12 hours ago), the controller’s reconciliation loop ensures the corresponding cloud resources are automatically deleted. Moreover, the controller pod is managed through a Deployment (ReplicaSet), so Kubernetes ensures it’s always available to process claim events. This controller saves time, reduces cloud costs, and gives platform teams full control over ephemeral infrastructure, all without slowing developers down.
I recently built this controller as part of a Crossplane platform built with Terraform on AWS EKS. Try it out and let me know how you like it: https://github.com/CarlosLaraFP/k8s-platform
What are automation problems your team has encountered and/or solved recently? Comment below!
Senior Software Engineer, Infrastructure | Go | Kubernetes
First in a weekly platform engineering series. I would love your feedback!