CloudMap Kubernetes Controller
A lightweight Kubernetes controller that integrates Kubernetes headless services with AWS Cloud Map to enable seamless service discovery, compatible with existing ECS-based architectures.
Problem Statement
In our current infrastructure, we operate ~200 services on Amazon ECS, utilizing AWS Cloud Map for service discovery, which in turn creates Route 53 DNS records for those ECS services.
However, we faced the following operational challenges:
- Code Modification Requirement: Developers were expected to manually update service endpoint references in code. This was time-consuming and error-prone.
- Unknown Dependencies: We lacked visibility into which services consumed others via DNS. Accidental changes could break production workloads.
- Scalability Concerns: With hundreds of services, manual intervention became unmanageable and risky.
Solution Approach
We decided to migrate workloads to EKS, but we wanted to retain Cloud Map as the central discovery mechanism. This would ensure backward compatibility without requiring any code changes.
Tried: ExternalDNS
We initially evaluated external-dns, which works well for public domains. However:
- It requires a resolvable DNS domain name (e.g., abc.example.com)
- In our case, Cloud Map used namespaces like test-namespace (i.e., not tied to a public domain)
- As a result, external-dns could not find the namespace and skipped registration
What This Controller Does
This custom controller addresses the above challenges by:
- Automatically registering headless services in AWS Cloud Map
- Preserving service hostnames as-is (nginx-service in test-namespace)
- Supporting cross-namespace service discovery
- Handling replica scaling events (Pod IPs update automatically)
- Cleaning up stale IPs when pods go offline
- Performing drift detection every 60s to re-sync any Cloud Map inconsistencies
- Resolving conflicts when multiple services claim the same DNS name
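The conflict handling mentioned in the list above can be pictured as a small ownership table keyed by Cloud Map namespace and hostname. The sketch below is an illustration only: the `DomainOwnership` class and the `default/nginx` style owner strings are hypothetical names, not taken from the controller's source.

```python
from typing import Dict, Optional, Tuple

DomainKey = Tuple[str, str]  # (Cloud Map namespace, hostname)


class DomainOwnership:
    """Hypothetical helper: remembers which Kubernetes service owns each Cloud Map name."""

    def __init__(self) -> None:
        self._owners: Dict[DomainKey, str] = {}

    def claim(self, namespace: str, hostname: str, owner: str) -> bool:
        """Return True if `owner` holds the name after the call, False on a conflict."""
        current = self._owners.setdefault((namespace, hostname), owner)
        return current == owner

    def release(self, namespace: str, hostname: str, owner: str) -> None:
        """Drop the claim, but only if `owner` is the service holding it."""
        key = (namespace, hostname)
        if self._owners.get(key) == owner:
            del self._owners[key]

    def owner_of(self, namespace: str, hostname: str) -> Optional[str]:
        return self._owners.get((namespace, hostname))


registry = DomainOwnership()
first = registry.claim("test-namespace", "nginx-service", "default/nginx")
second = registry.claim("test-namespace", "nginx-service", "staging/nginx")
print(first, second)  # True False
```

The first claimant keeps the name; a second service asking for the same hostname is refused until the owner releases it, which is the point where a DomainConflict event would be emitted.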
How It Works
- You create a Kubernetes headless service (clusterIP: None) with annotations:
    annotations:
      cloudmap.controller/namespace: "test-namespace"
      cloudmap.controller/hostname: "nginx-service"
- The controller watches for:
  - Service creation/deletion events
  - Endpoints events to track pod IPs
- For each matching service, the controller:
  - Creates a Cloud Map service (if it doesn't exist)
  - Registers all pod IPs as AWS_INSTANCE_IPV4 records
  - Performs drift reconciliation periodically (re-registers deleted records)
- Deletion of a service automatically:
  - De-registers Cloud Map records
  - Releases domain ownership tracking
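The register, clean-up, and drift steps above all reduce to comparing the pod IPs Kubernetes reports with the IPs Cloud Map holds. The following is a minimal sketch of that comparison, not the controller's actual code. It assumes a client exposing the boto3 `servicediscovery` methods `list_instances`, `register_instance`, and `deregister_instance`; the `reconcile` function, the choice of the pod IP as instance ID, and the `InMemoryCloudMap` stand-in (used so the sketch runs without AWS credentials) are all hypothetical.

```python
from typing import Dict, Iterable, List


def registered_instances(client, service_id: str) -> Dict[str, str]:
    """Map each registered IPv4 address to its Cloud Map instance ID."""
    found: Dict[str, str] = {}
    token = None
    while True:
        kwargs = {"ServiceId": service_id}
        if token:
            kwargs["NextToken"] = token
        page = client.list_instances(**kwargs)
        for inst in page.get("Instances", []):
            ip = inst.get("Attributes", {}).get("AWS_INSTANCE_IPV4")
            if ip:
                found[ip] = inst["Id"]
        token = page.get("NextToken")
        if not token:
            return found


def reconcile(client, service_id: str, pod_ips: Iterable[str]) -> Dict[str, List[str]]:
    """Register missing pod IPs and deregister IPs that no longer back the service."""
    desired = set(pod_ips)
    current = registered_instances(client, service_id)
    to_add = sorted(desired - set(current))
    to_remove = sorted(set(current) - desired)
    for ip in to_add:
        # Assumption: the pod IP doubles as the instance ID.
        client.register_instance(
            ServiceId=service_id, InstanceId=ip, Attributes={"AWS_INSTANCE_IPV4": ip}
        )
    for ip in to_remove:
        client.deregister_instance(ServiceId=service_id, InstanceId=current[ip])
    return {"registered": to_add, "deregistered": to_remove}


class InMemoryCloudMap:
    """Stand-in for the AWS client, for illustration and local testing only."""

    def __init__(self, instances=None):
        self.instances = dict(instances or {})  # instance ID -> IPv4

    def list_instances(self, ServiceId, NextToken=None):
        return {
            "Instances": [
                {"Id": iid, "Attributes": {"AWS_INSTANCE_IPV4": ip}}
                for iid, ip in self.instances.items()
            ]
        }

    def register_instance(self, ServiceId, InstanceId, Attributes):
        self.instances[InstanceId] = Attributes["AWS_INSTANCE_IPV4"]

    def deregister_instance(self, ServiceId, InstanceId):
        self.instances.pop(InstanceId, None)


fake = InMemoryCloudMap({"legacy-task-1": "172.16.12.110"})
result = reconcile(fake, "srv-example", ["172.16.10.45"])
print(result)
```

Running the same function on a timer gives the periodic drift check: a record deleted out of band simply shows up in the next pass as an IP to register again. Against the real API, registration and deregistration are asynchronous operations, so a production loop would also need to tolerate requests that are still in flight.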
Features Summary
| Feature | Status |
|---|---|
| Annotation-based config | ✅ |
| Headless service support | ✅ |
| Cross-namespace compatibility | ✅ |
| Replica count changes tracked | ✅ |
| Drift detection + re-registration | ✅ |
| Selective de-registration | ✅ |
| Pod lifecycle aware | ✅ |
| Safe domain conflict detection | ✅ |
| Compatible with ECS Cloud Map names | ✅ |
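To make the annotation-based configuration and headless-service requirement concrete, here is a small sketch of how a watcher might decide whether a Service is in scope and which pod IPs back it. It works on plain dictionaries shaped like the Kubernetes API objects; the function names `target_for` and `ready_pod_ips` are illustrative assumptions, while the two annotation keys are the ones documented in this README.

```python
from typing import List, Mapping, NamedTuple, Optional

NAMESPACE_KEY = "cloudmap.controller/namespace"
HOSTNAME_KEY = "cloudmap.controller/hostname"


class CloudMapTarget(NamedTuple):
    namespace: str
    hostname: str


def target_for(service: Mapping) -> Optional[CloudMapTarget]:
    """Return the Cloud Map target for a Service, or None if it should be ignored."""
    spec = service.get("spec") or {}
    if spec.get("clusterIP") != "None":  # only headless services qualify
        return None
    annotations = (service.get("metadata") or {}).get("annotations") or {}
    namespace = annotations.get(NAMESPACE_KEY, "").strip()
    hostname = annotations.get(HOSTNAME_KEY, "").strip()
    if not namespace or not hostname:
        return None
    return CloudMapTarget(namespace, hostname)


def ready_pod_ips(endpoints: Mapping) -> List[str]:
    """Collect the IPs of ready pods from an Endpoints object (notReadyAddresses are skipped)."""
    ips = set()
    for subset in endpoints.get("subsets") or []:
        for address in subset.get("addresses") or []:
            if address.get("ip"):
                ips.add(address["ip"])
    return sorted(ips)


service = {
    "metadata": {
        "name": "nginx-service",
        "annotations": {
            NAMESPACE_KEY: "test-namespace",
            HOSTNAME_KEY: "nginx-service",
        },
    },
    "spec": {"clusterIP": "None"},
}
endpoints = {
    "subsets": [
        {
            "addresses": [{"ip": "172.16.10.46"}, {"ip": "172.16.10.45"}],
            "notReadyAddresses": [{"ip": "172.16.10.99"}],
        }
    ]
}
target = target_for(service)
print(target, ready_pod_ips(endpoints))
```

Reading only the ready addresses is one way to stay pod lifecycle aware: a pod that fails its readiness probe drops out of the list and is treated like any other stale IP.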
Getting Started
Project Structure
controller/
├── cloudmap.py       # Cloud Map integration logic (register/deregister, drift sync)
└── k8s_watcher.py    # Watches Kubernetes services/endpoints, triggers Cloud Map updates
example               # Example deployment manifest files
main.py               # Entry point to start the controller
Dockerfile            # Multi-stage build for production-ready container
How to Deploy the Controller on Kubernetes
1. Build and Push Docker Image
docker build -t <your-dockerhub>/cloudmap-controller:latest .
docker push <your-dockerhub>/cloudmap-controller:latest
2. Create Kubernetes Resources
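Namespace
The manifests below use kube-system, which already exists in every cluster, so this step is only needed if you deploy the controller to a namespace of your own.
apiVersion: v1
kind: Namespace
metadata:
  name: kube-system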
IAM Permissions Required
To run this controller on Amazon EKS, ensure the IAM role assumed by the pod (for example, through IRSA) has these permissions:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "route53:GetHostedZone",
        "route53:ListHostedZonesByName",
        "route53:CreateHostedZone",
        "route53:DeleteHostedZone",
        "route53:ChangeResourceRecordSets",
        "route53:CreateHealthCheck",
        "route53:GetHealthCheck",
        "route53:DeleteHealthCheck",
        "route53:UpdateHealthCheck",
        "ec2:DescribeVpcs",
        "ec2:DescribeRegions",
        "servicediscovery:*"
      ],
      "Effect": "Allow",
      "Resource": [
        "*"
      ]
    }
  ]
}
Service Account & RBAC
IAM Role Trust Policy (Trusted Entities)
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::<aws_account_id>:oidc-provider/oidc.eks.<region>.amazonaws.com/id/<oidc_id>"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.eks.<region>.amazonaws.com/id/<oidc_id>:sub": [
            "system:serviceaccount:kube-system:cloudmap-controller"
          ],
          "oidc.eks.<region>.amazonaws.com/id/<oidc_id>:aud": "sts.amazonaws.com"
        }
      }
    }
  ]
}
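The OIDC provider host has to be identical in the principal ARN and in both condition keys, so it can help to generate the trust policy from one set of inputs rather than editing it by hand. The helper below is a sketch only: `irsa_trust_policy` is a hypothetical name, and the account ID, region, and OIDC ID passed in the example are placeholder values.

```python
import json


def irsa_trust_policy(account_id, region, oidc_id,
                      namespace="kube-system",
                      service_account="cloudmap-controller"):
    """Build an IRSA trust policy document for one Kubernetes service account."""
    provider = f"oidc.eks.{region}.amazonaws.com/id/{oidc_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{provider}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        f"{provider}:sub": f"system:serviceaccount:{namespace}:{service_account}",
                        f"{provider}:aud": "sts.amazonaws.com",
                    }
                },
            }
        ],
    }


# Placeholder inputs; substitute your own account, region, and OIDC provider ID.
policy = irsa_trust_policy("111122223333", "ap-south-1", "EXAMPLEOIDCID")
print(json.dumps(policy, indent=2))
```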
kubectl get sa cloudmap-controller -n kube-system -o yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::<aws_account_id>:role/<IAM_role_name>
  name: cloudmap-controller
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cloudmap-controller-role
rules:
  - apiGroups: [""]
    resources: ["services", "endpoints"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["events.k8s.io"]
    resources: ["events"]
    verbs: ["create", "patch", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: cloudmap-controller-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cloudmap-controller-role
subjects:
  - kind: ServiceAccount
    name: cloudmap-controller
    namespace: kube-system
Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cloudmap-controller
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cloudmap-controller
  template:
    metadata:
      labels:
        app: cloudmap-controller
    spec:
      serviceAccountName: cloudmap-controller
      containers:
        - name: controller
          image: <your-dockerhub>/cloudmap-controller:latest
          imagePullPolicy: Always
Example Test Service Manifest
apiVersion: v1
kind: Service
metadata:
  name: nginx-ecs-service
  namespace: default
  annotations:
    cloudmap.controller/namespace: "test-namespace"
    cloudmap.controller/hostname: "nginx-ecs-service"
spec:
  clusterIP: None
  selector:
    app: nginx
  ports:
    - name: http
      port: 80
      targetPort: 80
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  namespace: default
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:latest
          ports:
            - containerPort: 80
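Once the test manifests are applied, one way to check the result is to resolve the Cloud Map name from a host inside the VPC and compare the answers with the pod IPs. The sketch below assumes test-namespace is a private DNS namespace, so that the service is reachable as nginx-ecs-service.test-namespace; the `resolve_ipv4` helper is a hypothetical name, and its `resolver` parameter exists only so the lookup can be swapped out in tests.

```python
import socket
from typing import Callable, List


def resolve_ipv4(hostname: str, resolver: Callable = socket.getaddrinfo) -> List[str]:
    """Return the sorted, de-duplicated IPv4 addresses that `hostname` resolves to."""
    try:
        results = resolver(hostname, None, socket.AF_INET, socket.SOCK_STREAM)
    except socket.gaierror:
        # The name does not resolve (yet), e.g. nothing has been registered.
        return []
    # Each result is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({sockaddr[0] for _family, _type, _proto, _canon, sockaddr in results})


# Example call from a pod or instance inside the VPC (assumed DNS name):
# print(resolve_ipv4("nginx-ecs-service.test-namespace"))
```

With three nginx replicas running, the list should contain three addresses matching the pod IPs shown by kubectl get endpoints nginx-ecs-service -n default.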
Observability
This controller emits Kubernetes Events that can be viewed via:
kubectl get events -n <namespace>
# controller logs
kubectl logs deploy/cloudmap-controller -n kube-system -f
Example:
Registered 172.16.10.45 to nginx-service
Deregistered stale IP 172.16.12.110 from nginx-service
DomainConflict: nginx-service already claimed by other service
License
Apache 2.0 License.
Contributions
PRs welcome! This is a focused solution, but it can be extended to support health checks, weighted routing, TTLs, and more.
Inspired By
- AWS Cloud Map
- Kubernetes ExternalDNS
- Service Mesh-less discovery