EKS
What is AWS EKS, and why use it for Django?
EKS is AWS’s managed Kubernetes control plane offering. You get a Kubernetes API endpoint and control plane managed by AWS; you maintain (or let AWS manage) the worker nodes. You can deploy containerized applications (like your Django app) onto Kubernetes, letting you leverage:
- Container orchestration: rolling updates, health checks, scaling, self-healing
- Standardization: Kubernetes ecosystem, tooling, portability
- Infrastructure abstraction: separate concerns (deployment, scaling, networking)
- Integration with AWS services: IAM, Load Balancers, VPC, etc
- Flexibility: you can run additional services/pods (e.g. sidecars, microservices)
Using EKS for Django makes sense when you want containerized deployment, scalability, and more complex infrastructure (microservices, sidecars, etc.). But it’s more complex than simpler options (e.g. Elastic Beanstalk, ECS) — so it’s best when your app or team needs the flexibility.
Key Architectural Considerations & Patterns
Here are the building blocks and design choices you’ll want to think through:
| Layer / Concern | What you need to decide / configure | Common patterns |
|---|---|---|
| Containerization | Build a Docker image for Django (with Gunicorn / ASGI, dependencies, static file handling) | Multi-stage builds, minimal base image, healthcheck, entrypoint scripts |
| Cluster & node setup | How many nodes, what instance types, auto-scaling, spot vs on-demand, node groups | Use managed node groups or EKS Fargate (if appropriate) |
| Networking / VPC / subnets | Placement of cluster, security groups, CIDRs, cross-AZ, routing | Use private subnets for nodes; make the control plane accessible via a private endpoint, or public with restrictions |
| Service exposure / ingress | How external traffic reaches your Django app | Use LoadBalancers (ALB, NLB) or Ingress + the ALB Ingress Controller |
| Database / state | Django typically needs a relational DB and often media storage | Use RDS (or Aurora) for the database; S3 (with django-storages) or EFS for media; persistent volumes only where needed |
| Secrets / configuration | Where to store DB credentials, Django SECRET_KEY, etc. | Use Kubernetes Secrets (or external secrets: AWS Secrets Manager, external-secrets operator) |
| Scaling & autoscaling | Horizontal Pod Autoscaler (HPA), vertical autoscaling, cluster autoscaler | Use HPA (CPU, memory, or custom metrics) and the cluster autoscaler (or Karpenter) to dynamically adjust worker nodes |
| Observability & logging | Monitoring, metrics, logs, alerting | Use Prometheus + Grafana, CloudWatch, EFK/ELK, metrics-server, dashboards |
| CI / CD | How images are built and deployed to EKS | Use GitHub Actions, GitLab CI, Jenkins + kubectl, or GitOps (Flux, ArgoCD) |
| Health checks & readiness / liveness | Kubernetes probes to detect and restart unhealthy pods | Define readinessProbe and livenessProbe in your Deployment spec |
| Rollouts / updates / rollbacks | How deployments are updated without downtime | Use Deployment strategies (rolling updates, max surge / max unavailable), Canary or Blue/Green if needed |
AWS itself publishes an EKS Best Practices Guide covering reliability, security, performance, cluster autoscaling, networking, cost optimization, and more. ([AWS Documentation][1]) Also, the “Running highly-available applications” section is helpful. ([AWS Documentation][2])
Typical EKS + Django Deployment Flow (End-to-End)
Here’s a walkthrough of a typical workflow / lifecycle for deploying Django on EKS.
Dockerize the Django app
- Create a `Dockerfile` (multi-stage)
- Run migrations at startup or via a Job
- Collect static files (or do this in CI)
- Configure a health endpoint (e.g. `/healthz`)
- Use environment variables for config
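As a concrete starting point, here is a minimal multi-stage Dockerfile sketch. It assumes a `requirements.txt`-based project named `myproject` served by Gunicorn, and that collectstatic can run at build time (if your settings require runtime-only secrets, move that step to CI or startup); all names are illustrative.

```dockerfile
# --- build stage: install dependencies into a virtualenv ---
FROM python:3.12-slim AS build
WORKDIR /app
COPY requirements.txt .
RUN python -m venv /venv && /venv/bin/pip install --no-cache-dir -r requirements.txt

# --- runtime stage: copy only the virtualenv and the app code ---
FROM python:3.12-slim
WORKDIR /app
COPY --from=build /venv /venv
COPY . .
ENV PATH="/venv/bin:$PATH" \
    DJANGO_SETTINGS_MODULE=myproject.settings.production
# collect static files at build time; they should end up on S3/CDN, not in the pod
RUN python manage.py collectstatic --noinput
# run as a non-root user
RUN useradd --create-home appuser
USER appuser
EXPOSE 8000
CMD ["gunicorn", "myproject.wsgi:application", "--bind", "0.0.0.0:8000", "--workers", "3"]
```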
Push to a container registry
- Use AWS ECR (Elastic Container Registry)
- Tag, push image from CI pipeline
- Optionally scan images for vulnerabilities
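For example, the build-tag-push sequence against ECR looks like this (region, account ID, repository name, and tag are placeholders):

```sh
aws ecr create-repository --repository-name django-app   # once
aws ecr get-login-password --region us-east-1 \
  | docker login --username AWS --password-stdin 123456789012.dkr.ecr.us-east-1.amazonaws.com
docker build -t django-app .
docker tag django-app:latest 123456789012.dkr.ecr.us-east-1.amazonaws.com/django-app:v1.0.0
docker push 123456789012.dkr.ecr.us-east-1.amazonaws.com/django-app:v1.0.0
```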
Create / manage EKS cluster
- Use `eksctl`, the AWS Console, or IaC (Terraform / CloudFormation)
- Create node groups / a Fargate profile
- Set up IAM roles, VPC, subnets, networking
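With `eksctl`, a basic cluster with one managed node group can be created in a single command (cluster name, region, and sizes are illustrative):

```sh
eksctl create cluster \
  --name django-cluster \
  --region us-east-1 \
  --nodegroup-name default \
  --node-type t3.medium \
  --nodes 2 --nodes-min 2 --nodes-max 5 \
  --managed
```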
Connect kubectl / kubeconfig
- Run `aws eks update-kubeconfig` (or equivalent)
- Confirm you can run `kubectl get nodes`, etc.
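For example, using the cluster name from the sketch above:

```sh
aws eks update-kubeconfig --region us-east-1 --name django-cluster
kubectl get nodes
```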
Define Kubernetes manifests / Helm charts
- Deployment (with replicas, resources, probes)
- Service (ClusterIP, LoadBalancer)
- Ingress (with ALB ingress controller)
- ConfigMap / Secret for environment variables / credentials
- (Optional) PersistentVolumeClaim / EFS for persistent storage
- (Optional) Job or CronJob for migrations, cleanup
Expose via Load Balancer / Ingress
- Deploy AWS ALB Ingress Controller (or NLB)
- Define Ingress rules and annotations (SSL, path-based routing, etc)
- Use TLS / certs (via AWS Certificate Manager or cert-manager)
Scaling & autoscaling
- Deploy metrics-server (for resource metrics)
- Define Horizontal Pod Autoscaler (HPA) for Django pods
- Configure cluster autoscaler or Karpenter for nodes
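metrics-server is typically installed from its upstream release manifest (verify the URL against the current metrics-server documentation):

```sh
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
kubectl top pods   # sanity check once it is running
```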
Database / storage connectivity
- Ensure Django can reach RDS / Aurora (in same VPC or via VPC peering)
- Secrets management for DB credentials
- Use S3 or EFS for media / file storage
Logging / monitoring / alerting
- Export logs from pods (stdout) to CloudWatch or ELK
- Export metrics via Prometheus
- Setup dashboards, alerts
CI/CD / Rollout
- CI builds new image on push, tags and pushes to ECR
- CD deploys new manifests or updates the Deployment (via `kubectl apply`)
- Monitor the rollout, roll back on failure
- Use GitOps (FluxCD / ArgoCD) for declarative deployment
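A minimal GitHub Actions sketch of this pipeline, assuming an IAM role set up for OIDC, an ECR repository named `django-app`, and the Deployment from the sample manifest below (all names are placeholders):

```yaml
name: deploy
on: { push: { branches: [main] } }
jobs:
  deploy:
    runs-on: ubuntu-latest
    permissions: { id-token: write, contents: read }
    steps:
      - uses: actions/checkout@v4
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/ci-deploy  # placeholder
          aws-region: us-east-1
      # log in to ECR, then build and push an immutable, SHA-tagged image
      - id: ecr
        uses: aws-actions/amazon-ecr-login@v2
      - run: |
          docker build -t ${{ steps.ecr.outputs.registry }}/django-app:${{ github.sha }} .
          docker push ${{ steps.ecr.outputs.registry }}/django-app:${{ github.sha }}
      # point the Deployment at the new image and wait for the rollout
      - run: |
          aws eks update-kubeconfig --region us-east-1 --name django-cluster
          kubectl set image deployment/django-app django=${{ steps.ecr.outputs.registry }}/django-app:${{ github.sha }}
          kubectl rollout status deployment/django-app
```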
Day-2 operations & upgrades
- Upgrade Kubernetes versions, node scaling
- Monitor resource usage, cost
- Handle cluster maintenance
- Backup DB, handle failure, scaling
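For example, control-plane and node-group upgrades with `eksctl` (names from the sketch above; upgrade one minor version at a time and let nodes drain gracefully):

```sh
eksctl upgrade cluster --name django-cluster --approve
eksctl upgrade nodegroup --cluster django-cluster --name default
```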
The repository “aws-django-eks-tutorial” shows a working example with Django + EKS + RDS + EFS. ([GitHub][3]) Another guide “Deploying a Django App to Kubernetes with Amazon ECR and EKS” walks through the simpler pipeline. ([DEV Community][4]) ByteGoblin has a more production-grade optimization guide. ([ByteGoblin][5])
Best Practices & Gotchas (with Django-specific notes)
Here are the practices, pitfalls, and things you absolutely want to watch out for:
Availability & resiliency
- Don’t run a single replica in production — always run multiple pods behind a Service or Ingress so if one pod fails, traffic continues. ([AWS Documentation][2])
- Use Pod Disruption Budgets to limit how many pods can be down during maintenance or scaling. ([AWS Documentation][2])
- Spread pods across nodes / AZs using anti-affinity or topology spread constraints so you avoid correlated failures. ([AWS Documentation][2])
- Define readiness probes so Kubernetes knows when a pod is ready to receive traffic, avoiding sending traffic to half-initialized pods.
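A sketch of a PodDisruptionBudget plus a topology spread constraint for the `app: django` pods used in the sample manifest below:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: django-pdb
spec:
  minAvailable: 2            # keep at least 2 pods up during voluntary disruptions
  selector:
    matchLabels:
      app: django
---
# fragment for the Deployment's pod template spec: spread replicas across AZs
topologySpreadConstraints:
- maxSkew: 1
  topologyKey: topology.kubernetes.io/zone
  whenUnsatisfiable: ScheduleAnyway
  labelSelector:
    matchLabels:
      app: django
```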
Scaling & resource tuning
- Set resource requests & limits on your Django pods to avoid CPU/memory interference across pods.
- Use Horizontal Pod Autoscaler (HPA) based on CPU, memory, or custom metrics (e.g. request latency). ([AWS Documentation][2])
- Use Cluster Autoscaler (or Karpenter) so when pods cannot schedule (due to insufficient capacity), new nodes are automatically added.
- Be aware of DB connection limits: if your Django pods scale up quickly, they might overwhelm the database with too many connections. Use connection pooling (pgBouncer) or limit simultaneous connections.
- Use persistent connections (`CONN_MAX_AGE`) so pods don’t repeatedly open/close DB connections unnecessarily.
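In `settings.py`, a sketch using the `dj-database-url` package (an assumption; any env-based configuration works) to read `DATABASE_URL` from the injected Secret and enable persistent connections:

```python
import dj_database_url  # assumption: the dj-database-url package is installed

DATABASES = {
    "default": dj_database_url.config(
        env="DATABASE_URL",
        conn_max_age=600,  # persistent connections: reuse each one for up to 10 minutes
    )
}
# dj_database_url.config() returns {} when the variable is missing: fail fast
if not DATABASES["default"]:
    raise RuntimeError("DATABASE_URL is not set")
```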
Secrets & config
- Don’t bake sensitive credentials into container images. Use Kubernetes Secrets, or better yet integrate with AWS Secrets Manager via external-secrets operators.
- Use ConfigMaps for non-sensitive config (feature flags, environment variables).
- If `settings.py` reads environment variables, ensure that missing values are handled gracefully (fail fast or fall back).
- Be careful with secret rotation: if credentials change underneath you, pods might fail. You may need logic to restart pods or reconfigure.
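A plain Kubernetes Secret matching the `django-secrets` name used in the sample manifest below (values are placeholders; in practice you might populate this from AWS Secrets Manager via the external-secrets operator instead):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: django-secrets
type: Opaque
stringData:  # stringData avoids manual base64 encoding
  DATABASE_URL: postgres://app:CHANGE_ME@mydb.example.us-east-1.rds.amazonaws.com:5432/app  # placeholder
  DJANGO_SECRET_KEY: CHANGE_ME
```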
Static / media files & state
- Django’s static files: generate them during build / CI, serve them via CDN or external storage (e.g. S3). Don’t store them inside the pod at runtime.
- For user uploads / media: use S3 (via `django-storages`) or EFS (a mounted volume) depending on consistency and latency needs. EFS is networked and slower, but allows file sharing across pods.
- Use read-only mounts for code, and only mount writable volumes if absolutely required (e.g. for ephemeral caches, temporary files).
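A sketch of the corresponding `django-storages` S3 settings (the bucket-name environment variable is an assumption; Django 4.2+ can express the same thing via the newer `STORAGES` setting):

```python
import os

# media uploads go to S3 via django-storages (pip install django-storages[boto3])
DEFAULT_FILE_STORAGE = "storages.backends.s3boto3.S3Boto3Storage"
AWS_STORAGE_BUCKET_NAME = os.environ["AWS_MEDIA_BUCKET"]  # hypothetical variable name
# with IRSA, boto3 picks up credentials from the pod's service-account role,
# so no access keys need to be configured here
```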
Deployment strategies
- Use rolling updates by default (Deployment with max surge / max unavailable) so that new pods are gradually brought up while old ones are taken down.
- For zero downtime, consider canary deployments or blue/green strategies.
- Always include rollback paths: if a deployment fails, have a way to revert.
- Be cautious with migrations: before applying schema changes (especially destructive ones), coordinate with the deployment (e.g. run migrations early, use non-blocking migrations).
- Consider running migrations as a Kubernetes Job separate from the pod startup, so you avoid race conditions.
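A sketch of such a migration Job, reusing the image and Secret from the sample manifest below:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: django-migrate
spec:
  backoffLimit: 2
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: migrate
        image: <your-ecr-uri>:latest
        command: ["python", "manage.py", "migrate", "--noinput"]
        env:
        - name: DJANGO_SETTINGS_MODULE
          value: myproject.settings.production
        - name: DATABASE_URL
          valueFrom:
            secretKeyRef:
              name: django-secrets
              key: DATABASE_URL
```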
Networking & ingress
- Use AWS ingress controllers (e.g. the AWS Load Balancer Controller, formerly the ALB Ingress Controller) to integrate Kubernetes with AWS ALB (Application Load Balancer), which supports path-based routing, SSL, etc.
- Use TLS: either via AWS ACM certificates or via cert-manager in cluster.
- If your cluster is in private subnets, configure ingress / NAT / routing properly so traffic flows correctly.
- Use network policies (Kubernetes NetworkPolicy) to enforce pod-to-pod communication controls (e.g. only allow Django pods to talk to DB pods or external databases).
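A NetworkPolicy sketch restricting the Django pods’ egress to a Postgres RDS endpoint (the CIDR and port are assumptions; note that enforcement requires a CNI with NetworkPolicy support, e.g. the VPC CNI with network policy enabled, or Calico):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: django-egress
spec:
  podSelector:
    matchLabels:
      app: django
  policyTypes: ["Egress"]
  egress:
  - to:
    - ipBlock:
        cidr: 10.0.0.0/16   # placeholder: the subnets hosting RDS
    ports:
    - protocol: TCP
      port: 5432            # Postgres
  - ports:                  # allow DNS lookups
    - protocol: UDP
      port: 53
```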
Monitoring, logging, and observability
- Deploy metrics-server (necessary for HPA) ([AWS Documentation][2])
- Use Prometheus + Grafana or AWS managed solutions to collect metrics (CPU, memory, request latencies, etc).
- Export container logs (stdout / stderr) to a centralized system (CloudWatch Logs, EFK/ELK)
- Create dashboards & alerting (e.g. alert on high memory usage, pod crash loops, CPU saturation, DB errors).
- Monitor cluster health (node status, network, etc), not just application.
Security & IAM
- Use IAM roles for service accounts (IRSA) so pods can assume minimal IAM permissions, e.g. to access S3, Secrets Manager, etc
- Avoid giving pods broad AWS permissions. Use fine-grained IAM policies.
- Use RBAC in Kubernetes to limit what actions pods or users can take in the cluster.
- Use Pod Security Admission (PodSecurityPolicy was removed in Kubernetes 1.25) to enforce least privilege (no root containers, non-root users, restricted volume types)
- Keep your cluster control plane, nodes, and Kubernetes version up to date (patch vulnerabilities).
- Use network segmentation (e.g. private cluster, private endpoints) where possible.
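A sketch of the `django-sa` service account referenced in the sample manifest, annotated for IRSA (the role ARN is a placeholder, and the role’s trust policy must reference the cluster’s OIDC provider):

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: django-sa
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/django-app  # placeholder
```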
Operational / day-2 concerns
- Upgrading Kubernetes cluster versions and node groups (rolling upgrades)
- Node replacement, draining, and handling disruptions
- Cost control (right-sizing nodes, scaling down idle nodes)
- Backup / restore of persistent data and database
- Disaster recovery planning (e.g. multi-region or backup clusters)
- Handling errors, crash loops, debugging pod logs
- Scaling strategies over time (e.g. moving to microservices)
- Resource quotas, limit ranges to prevent “noisy neighbor” pods
Common gotchas
- Pods not being scheduled due to insufficient node capacity (if cluster autoscaler not configured)
- DB connection explosion — if many pods connect, DB may run out of allowed connections
- Secrets misconfiguration — missing or mis-typed keys causing runtime failures
- Static or media files baked into images causing issues on code updates
- Migrations causing downtime or lockups if handled incorrectly
- Rolling update failures that cause downtime if probes or resource limits not tuned
- Not having readiness probes or using them incorrectly — sending traffic to unready pods
- Not propagating environment variables properly across containers/pods
- Version mismatches or incompatibilities between Django, dependencies, and container base images
Sample Minimal Setup (Manifest Sketch)
Here’s a minimal Kubernetes manifest sketch (not complete) for a Django app:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: django-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: django
  template:
    metadata:
      labels:
        app: django
    spec:
      serviceAccountName: django-sa  # for IRSA (IAM Roles for Service Accounts)
      containers:
      - name: django
        image: <your-ecr-uri>:latest
        ports:
        - containerPort: 8000
        env:
        - name: DJANGO_SETTINGS_MODULE
          value: myproject.settings.production
        - name: DATABASE_URL
          valueFrom:
            secretKeyRef:
              name: django-secrets
              key: DATABASE_URL
        readinessProbe:
          httpGet:
            path: /healthz/
            port: 8000
          initialDelaySeconds: 5
          periodSeconds: 10
        livenessProbe:
          httpGet:
            path: /healthz/
            port: 8000
          initialDelaySeconds: 15
          periodSeconds: 20
        resources:
          requests:
            cpu: 200m
            memory: 512Mi
          limits:
            cpu: 500m
            memory: 1Gi
---
apiVersion: v1
kind: Service
metadata:
  name: django-svc
spec:
  type: ClusterIP
  selector:
    app: django
  ports:
  - port: 80
    targetPort: 8000
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: django-ingress
  annotations:
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTP":80},{"HTTPS":443}]'
    alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:…  # for TLS
spec:
  rules:
  - host: example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: django-svc
            port:
              number: 80
  tls:
  - hosts:
    - example.com
    secretName: django-tls
```
You’ll also define a Secret for your database URL, perhaps using an external-secrets operator or IRSA to populate it from AWS Secrets Manager.
Then define an HPA:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: django-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: django-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
```
And ensure cluster autoscaler or Karpenter is configured to scale nodes when pods aren’t schedulable.
When (or When Not) to Use EKS for Django
Good use cases:
- You’re already containerizing or microservices-based
- You want portability across clouds / environments
- You need advanced orchestration, scaling, sidecars, custom logic
- You plan to host more than just Django (e.g. background workers, event processors)
- You want full control over deployments, lifecycle, scaling, updates
When it might be overkill:
- If your app is simple and doesn’t need that complexity
- If you don’t want the operational burden of managing Kubernetes concepts
- If you want something more “managed” (e.g. ECS, Fargate, Elastic Beanstalk)
- If your team is smaller or not comfortable with Kubernetes