EC2
What is AWS EC2?
- EC2 is Amazon’s virtual server (compute) service: you can spin up “instances” (virtual machines) in the AWS cloud, with your choice of OS, size, storage, networking, etc. ([AWS Documentation][1])
- It provides scalable, on-demand compute: you pay for what you use (by the second or hour) and can scale up/down as needed. ([AWS Documentation][1])
- It’s foundational: many AWS services (EKS worker nodes, ECS EC2 mode, etc.) rely on EC2 under the hood. ([AWS Documentation][1])
Core Concepts & Components
To use EC2 well, you need to understand the building blocks. Here are the main ones:
| Concept | Description / Role | Notes & tips |
|---|---|---|
| Instance | A running virtual machine (VM) in AWS | You choose an AMI, instance type, key pair, etc. |
| AMI (Amazon Machine Image) | A template (OS + software) used to launch instances | You can use AWS-provided AMIs or build your own (custom) ones. ([Wikipedia][2]) |
| Instance types | T2 / T3 / M / C / R / etc.: different sizes and resource profiles | Match the instance type to your workload (CPU, memory, I/O). ([AWS Documentation][3]) |
| Storage: EBS, instance store | Persistent block storage (EBS) or ephemeral storage (instance store) | EBS volumes can be detached, reattached, snapshotted, etc. |
| Key pairs & SSH | Key pairs are used to log into Linux instances over SSH | The public key is stored in AWS; you keep the private key. |
| Security groups / network / VPC | Virtual network boundaries, firewall rules, subnet placement | You control what traffic can reach your instances (inbound/outbound). |
| Elastic IPs | Static IP addresses you can assign to instances | Useful if you need a fixed public IP. |
| Elastic Load Balancer (ELB) | Distributes incoming traffic across multiple EC2 instances | For high availability, scale, and redundancy. |
| Auto Scaling / scaling groups | Automatically adjust the EC2 count based on metrics or a schedule | Ensures you have enough capacity without waste. |
| Pricing models | On-Demand, Reserved, Spot, Savings Plans, Dedicated Hosts | Each has tradeoffs (cost, flexibility, availability). |
| IAM / roles | Permissions, instance profiles, roles attached to instances | Let instances call AWS APIs securely without embedded credentials. |
How EC2 Works — The Lifecycle
Here’s a typical flow when working with EC2:
- Choose region & VPC / subnets
- Select or build an AMI (OS + software)
- Choose instance type (size, CPU, memory, etc.)
- Configure networking / security (VPC, subnets, routing, Internet gateway, security groups)
- Set storage (EBS volume sizes, snapshot, IOPS)
- Associate key pair / SSH access
- Launch the instance
- Connect (SSH / RDP / etc.)
- Configure / deploy software
- Monitor & scale
- Tear down or stop when no longer needed
You can do the above via the AWS Console, AWS CLI, SDKs, or IaC (Terraform, CloudFormation).
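For example, a minimal boto3 (Python SDK) sketch of the launch step might look like the following; the AMI, key pair, subnet, and security group IDs are placeholders you would replace with values from your own account and region.

```python
# Minimal sketch: launch one EC2 instance with boto3.
# All IDs/names below are placeholders for illustration only.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",          # placeholder AMI ID
    InstanceType="t3.micro",
    KeyName="my-key-pair",                     # existing key pair name
    MinCount=1,
    MaxCount=1,
    NetworkInterfaces=[{
        "DeviceIndex": 0,
        "SubnetId": "subnet-0123456789abcdef0",
        "Groups": ["sg-0123456789abcdef0"],
        "AssociatePublicIpAddress": True,
    }],
    TagSpecifications=[{
        "ResourceType": "instance",
        "Tags": [{"Key": "Name", "Value": "demo-web"}],
    }],
)

instance_id = response["Instances"][0]["InstanceId"]
print("Launched:", instance_id)

# Block until the instance reaches the 'running' state before connecting.
ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])
```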
Using EC2 in a Django Project
When you build a Django app and host it on EC2, here are the things to plan/consider:
Architecture sketch
Clients → (optional: CDN / CloudFront) → ELB / ALB → EC2 (Django App)
↳ (maybe auto-scaling group)
EC2 (Django) → RDS (or external DB)
EC2 (Django) → S3 for media / static
Optional: Cache (Redis/ElastiCache), Worker instances (Celery), etc.
Setup details & best practices
AMI / image preparation: Prepare an AMI with all dependencies (Python, libraries, OS packages) baked in so that new instances boot ready to serve. Prefer an immutable-infrastructure approach: new code means a new image and new instances, rather than SSHing into live machines.
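As a rough illustration of the "bake, then launch" flow, the following boto3 sketch creates an AMI from an already-configured instance; the instance ID and AMI name are placeholders, and in practice a tool such as Packer or EC2 Image Builder usually drives this step.

```python
# Sketch: bake a custom AMI from a "golden" instance that already has the
# Python runtime, libraries, and OS packages installed. IDs are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

image = ec2.create_image(
    InstanceId="i-0123456789abcdef0",   # the configured source instance
    Name="django-app-v42",              # AMI names must be unique per region
    Description="Django app image with dependencies baked in",
)
print("New AMI:", image["ImageId"])

# Wait until the AMI is available so launch templates / ASGs can reference it.
ec2.get_waiter("image_available").wait(ImageIds=[image["ImageId"]])
```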
Security & networking: Put internal services (such as the database) in private subnets and web-facing EC2 instances in public subnets. Configure security groups to allow only the necessary ports (e.g. 80/443 inbound to the web tier, and 22 only from a narrow set of admin IPs). Use an Application Load Balancer (ALB) to distribute traffic.
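A minimal sketch of such a web-tier security group with boto3, assuming a placeholder VPC ID and an example admin CIDR for SSH:

```python
# Sketch: web-tier security group allowing HTTP/HTTPS from anywhere and SSH
# only from one admin CIDR. The VPC ID and admin CIDR are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

sg = ec2.create_security_group(
    GroupName="web-tier-sg",
    Description="ALB/web traffic plus narrowly scoped SSH",
    VpcId="vpc-0123456789abcdef0",
)

ec2.authorize_security_group_ingress(
    GroupId=sg["GroupId"],
    IpPermissions=[
        {"IpProtocol": "tcp", "FromPort": 80, "ToPort": 80,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
        {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
        # SSH from a single known CIDR only -- never 0.0.0.0/0
        {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
         "IpRanges": [{"CidrIp": "203.0.113.0/24"}]},
    ],
)
```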
Scaling / high availability: Use Auto Scaling Groups with minimum and maximum instance counts. EC2 startup time matters: the AMI and bootstrap must be fast so scaling stays responsive. Use health checks (e.g. via the ALB) to avoid routing traffic to unhealthy instances.
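One possible shape of that Auto Scaling Group in boto3; the launch template name, subnet IDs, and target group ARN are placeholders and would need to exist already:

```python
# Sketch: an Auto Scaling Group spread across two subnets (two AZs), attached
# to an ALB target group and replacing instances the ALB reports as unhealthy.
import boto3

autoscaling = boto3.client("autoscaling", region_name="us-east-1")

autoscaling.create_auto_scaling_group(
    AutoScalingGroupName="django-web-asg",
    MinSize=2,
    MaxSize=6,
    DesiredCapacity=2,
    LaunchTemplate={
        "LaunchTemplateName": "django-web-lt",   # pre-created launch template
        "Version": "$Latest",
    },
    VPCZoneIdentifier="subnet-aaa111,subnet-bbb222",   # one subnet per AZ
    TargetGroupARNs=[
        "arn:aws:elasticloadbalancing:us-east-1:123456789012:"
        "targetgroup/django-web/0123456789abcdef"
    ],
    HealthCheckType="ELB",            # use the ALB's health check result
    HealthCheckGracePeriod=120,       # allow the AMI time to boot the app
)
```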
Data persistence / state: Don't store user uploads or media on an instance's local filesystem, since it is lost when the instance is replaced; use S3 or EFS instead. Keep the database external (RDS, Aurora) rather than on EC2, unless you have a small-scale or special requirement. Keep cache and session state external as well (Redis, ElastiCache).
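A sketch of Django settings that keep state off the instance, assuming the django-storages package for S3 media, an RDS Postgres endpoint, and Django 4+ for the built-in Redis cache backend (all hostnames come from environment variables):

```python
# Sketch of settings.py fragments that keep state off the EC2 instance.
# Assumes the django-storages package for S3 media and Django 4+ for the
# built-in Redis cache backend; hostnames come from environment variables.
import os

# User uploads go to S3, not the instance's local disk
DEFAULT_FILE_STORAGE = "storages.backends.s3boto3.S3Boto3Storage"
AWS_STORAGE_BUCKET_NAME = os.environ["MEDIA_BUCKET"]

# Database lives in RDS, not on the instance
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": os.environ["DB_NAME"],
        "USER": os.environ["DB_USER"],
        "PASSWORD": os.environ["DB_PASSWORD"],
        "HOST": os.environ["DB_HOST"],   # RDS endpoint
        "PORT": "5432",
    }
}

# Cache and sessions in ElastiCache Redis so any instance can serve any user
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.redis.RedisCache",
        "LOCATION": f"redis://{os.environ['REDIS_HOST']}:6379/0",
    }
}
SESSION_ENGINE = "django.contrib.sessions.backends.cache"
```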
Monitoring & logging: Use CloudWatch metrics (CPU and network are collected by default; memory and disk metrics require the CloudWatch agent installed on the instance). Ship application logs to a centralized system (CloudWatch Logs, an ELK stack). Set alarms on critical thresholds (CPU, memory, disk, load).
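For instance, a CPU alarm can be created with boto3 roughly like this; the instance ID and SNS topic ARN are placeholders, and memory/disk alarms additionally depend on the CloudWatch agent publishing those metrics:

```python
# Sketch: a CloudWatch alarm on sustained high CPU for one instance.
# Instance ID and SNS topic ARN are placeholders.
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

cloudwatch.put_metric_alarm(
    AlarmName="django-web-high-cpu",
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
    Statistic="Average",
    Period=300,                  # 5-minute windows
    EvaluationPeriods=3,         # i.e. sustained for 15 minutes
    Threshold=80.0,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:ops-alerts"],
)
```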
Deployment & updates: Use CI/CD pipelines to build a new AMI or deploy via rolling updates. Use blue/green or canary approaches if you need zero downtime. For code updates, either replace instances or use configuration tools (Ansible, Chef); immutable replacement is the safer option.
Security maintenance: Patch the OS regularly (rebuild AMIs and replace instances). Use IAM roles attached to instances (instance profiles) rather than embedding AWS credentials. Restrict SSH access (e.g. via a bastion host or VPN). Use TLS for web traffic and, where appropriate, EBS encryption for disks.
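With an instance profile attached, SDK calls made on the instance pick up temporary credentials automatically; a tiny sketch (the bucket name is a placeholder):

```python
# Sketch: on an instance with an IAM role (instance profile), boto3 resolves
# temporary credentials from instance metadata automatically -- no access keys
# in code, config files, or environment variables. Bucket name is a placeholder.
import boto3

s3 = boto3.client("s3")   # no explicit credentials anywhere
s3.upload_file("/tmp/report.csv", "my-app-media-bucket", "reports/report.csv")
```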
Cost optimization: Right-size instance types. Use Spot Instances where interruption is acceptable (e.g. worker nodes). Use Reserved Instances or Savings Plans for baseline usage. Terminate idle instances. Monitor data transfer costs (especially cross-AZ and cross-region).
EC2 Instance Types & Hardware Options
The instance type is one of the most important decisions. You match instance type to your workload (compute, memory, I/O, GPU, etc.). ([AWS Documentation][3])
Categories include:
- General purpose (e.g. `t3`, `m5`): balanced compute and memory
- Compute optimized (e.g. `c5`): more CPU per unit of memory
- Memory optimized (e.g. `r5`, `x1`): good for databases and in-memory workloads
- Storage / I/O optimized (e.g. `i3`): for high disk IOPS
- Accelerated / GPU / ML instances (e.g. `p3`, `g4`): for ML and GPU workloads
- Burstable / micro / low-cost (e.g. `t3.micro`): lower baseline, with bursts allowed

Also, Graviton (ARM-based) instances are available (e.g. `m6g`, `c6g`), often with better price/performance for certain workloads. ([Wikipedia][4])
You must also consider networking limits and EBS performance limits per instance type.
Pricing & Cost Models
EC2 offers multiple pricing models. Understanding these is key to optimizing cost.
Pricing Model | Description | Use cases / tradeoffs |
---|---|---|
On-Demand | Pay for compute by the hour or second, no long-term commitment | Good for unpredictable workloads, dev/test, bursty capacity |
Reserved Instances / Savings Plans | Commit to usage for 1 or 3 years in exchange for discount | Good for baseline / steady workloads |
Spot Instances | Use excess capacity at steep discount (with the risk of interruption) | Best for fault-tolerant workloads (batch, workers) |
Dedicated Hosts / Instances | Physical isolation | For licensing, compliance, isolation |
Capacity Reservations | Reserve capacity in a specific AZ | If you need guaranteed capacity at a moment in time |
Also consider EBS costs, data transfer costs (especially across AZs or regions), and charges for Elastic IPs (particularly unattached ones).
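To make the Spot model above concrete, here is a small boto3 sketch of launching a one-time Spot instance for interruption-tolerant work such as a Celery worker; all IDs are placeholders:

```python
# Sketch: request a one-time Spot instance for interruption-tolerant work
# (e.g. a Celery worker). IDs are placeholders; AWS can reclaim Spot capacity
# with a two-minute interruption notice, so the workload must tolerate that.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

ec2.run_instances(
    ImageId="ami-0123456789abcdef0",
    InstanceType="c5.large",
    MinCount=1,
    MaxCount=1,
    SubnetId="subnet-0123456789abcdef0",
    SecurityGroupIds=["sg-0123456789abcdef0"],
    InstanceMarketOptions={
        "MarketType": "spot",
        "SpotOptions": {
            "SpotInstanceType": "one-time",
            "InstanceInterruptionBehavior": "terminate",
        },
    },
)
```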
Fault Tolerance, HA & Scaling
To build resilient architecture using EC2:
- Spread instances across multiple Availability Zones (AZs)
- Use an Auto Scaling Group with a min / max / desired count
- Use Elastic Load Balancer (Application LB, Network LB) in front
- Use health checks so unhealthy instances are replaced
- Use stateless architecture (instances don’t hold critical user state)
- Use immutable deployments so updates don’t break running instances
- If necessary, use backup / snapshotting of EBS volumes
- Test failure scenarios (terminate instance, degrade AZ) to ensure system recovers
Common Pitfalls & Gotchas
- Not using IAM instance roles — embedding AWS credentials in code is insecure
- Storing state / media on instance disk — gets lost if instance is replaced
- Overprovisioning instance sizes (wasted cost)
- Long startup / bootstrap time — if your instance’s init scripts are slow, scaling will lag
- Not monitoring memory / disk — EC2’s basic metrics don’t capture memory or disk usage unless the CloudWatch agent is installed
- Ignoring spot instance interruption risk — if spot is used for critical tasks without fallback, your app may break
- Security group misconfiguration — opening too wide (e.g. SSH 0.0.0.0/0)
- Cross-AZ data transfer costs — moving lots of data between AZs costs money
- Not automating deployments — manual changes lead to drift and configuration inconsistencies
- Ignoring OS patching / updates — leaving instances unpatched is a security risk
Example: Bootstrapping a Django App on EC2 (Simplified)
Here’s a high-level walkthrough:
Create an AMI
- Start from base (Ubuntu, Amazon Linux)
- Install Python, pip, and required OS-level dependencies
- Clone the Django app and set up static-file tooling
- Bake into a custom AMI
Set up network / VPC
- Public and private subnets
- Internet Gateway, NAT, routing
- Security groups for web, SSH, DB access
Launch EC2 instances via Auto Scaling Group
- Use the AMI
- Assign IAM instance role (for S3, RDS access)
- Attach to security group
- Add a user data script (if needed) to run `migrate`, `collectstatic`, or the startup entrypoint (see the launch template sketch after this list)
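A possible launch template for that Auto Scaling Group, with a user data script that runs the startup steps on first boot; the AMI ID, security group, instance profile name, and the /srv/app path are assumptions about how the custom AMI was built:

```python
# Sketch: a launch template whose user data runs the Django startup steps on
# first boot. The AMI ID, security group, instance profile name, and the
# /srv/app path are assumptions about how the custom AMI was built.
import base64
import boto3

USER_DATA = """#!/bin/bash
set -e
cd /srv/app
python manage.py migrate --noinput
python manage.py collectstatic --noinput
systemctl restart gunicorn
"""

ec2 = boto3.client("ec2", region_name="us-east-1")

ec2.create_launch_template(
    LaunchTemplateName="django-web-lt",
    LaunchTemplateData={
        "ImageId": "ami-0123456789abcdef0",
        "InstanceType": "t3.small",
        "IamInstanceProfile": {"Name": "django-web-instance-profile"},
        "SecurityGroupIds": ["sg-0123456789abcdef0"],
        # Launch templates expect user data to be base64 encoded
        "UserData": base64.b64encode(USER_DATA.encode()).decode(),
    },
)
```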
Put an Application Load Balancer (ALB) in front
- Forward HTTP/HTTPS to EC2 instances
- Set health checks (e.g. `GET /healthz`); a minimal Django view for this path is sketched below
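The matching endpoint on the Django side can be as small as this sketch (module layout and URL wiring are up to you):

```python
# Sketch: a minimal health-check endpoint for the ALB to poll. Keep it cheap --
# every instance answers it every few seconds. The path must match the target
# group's configured health-check path (/healthz here).
from django.http import JsonResponse
from django.urls import path


def healthz(request):
    return JsonResponse({"status": "ok"})


urlpatterns = [
    path("healthz", healthz),
]
```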
Set up external services
- RDS for database
- S3 for static / media
- Redis (ElastiCache) for caching / sessions
Deploy new code / rolling update
- Build new AMI or deploy via script
- Update the ASG (rolling update or blue/green; see the instance refresh sketch below)
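One way to script that rollout, assuming the ASG references the launch template's `$Latest` version (the template/ASG names and the new AMI ID are placeholders):

```python
# Sketch: roll out a freshly baked AMI by adding a new launch template version
# and letting an ASG instance refresh replace instances gradually.
# Template/ASG names and the AMI ID are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
autoscaling = boto3.client("autoscaling", region_name="us-east-1")

# 1) New launch template version pointing at the new AMI
ec2.create_launch_template_version(
    LaunchTemplateName="django-web-lt",
    SourceVersion="$Latest",
    LaunchTemplateData={"ImageId": "ami-0fedcba9876543210"},
)

# 2) Rolling replacement, keeping at least 90% of capacity in service
autoscaling.start_instance_refresh(
    AutoScalingGroupName="django-web-asg",
    Preferences={"MinHealthyPercentage": 90, "InstanceWarmup": 120},
)
```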
Monitor & manage
- Install CloudWatch agent for memory, disk metrics
- Set alarms
- Ensure logs are shipped to central system