EC2
What is AWS EC2?
- EC2 is Amazon’s virtual server (compute) service: you can spin up “instances” (virtual machines) in the AWS cloud, with your choice of OS, size, storage, networking, etc. ([AWS Documentation][1])
- It provides scalable, on-demand compute: you pay for what you use (by the second or hour) and can scale up/down as needed. ([AWS Documentation][1])
- It’s foundational: many AWS services (EKS worker nodes, ECS EC2 mode, etc.) rely on EC2 under the hood. ([AWS Documentation][1])
Core Concepts & Components
To use EC2 well, you need to understand the building blocks. Here are the main ones:
| Concept | Description / Role | Notes & tips |
|---|---|---|
| Instance | A running virtual machine (VM) in AWS | You choose an AMI, instance type, key pair, etc. |
| AMI (Amazon Machine Image) | A template (OS + software) used to launch instances | You can use AWS-provided AMIs or build your own (custom) ones. ([Wikipedia][2]) |
| Instance types | T2 / T3 / M / C / R / etc.: different sizes and resource profiles | Match the instance type to your workload (CPU, memory, I/O). ([AWS Documentation][3]) |
| Storage: EBS, instance store | Persistent block storage (EBS) or ephemeral storage (instance store) | EBS volumes can be detached, reattached, snapshotted, etc. |
| Key pairs & SSH | Key pairs are used to log into Linux instances over SSH | The public key is stored in AWS; you keep the private key. |
| Security groups / network / VPC | Virtual network boundaries, firewall rules, subnet placement | You control what traffic can reach your instances (inbound/outbound). |
| Elastic IPs | Static IP addresses you can assign to instances | Useful if you need a fixed public IP. |
| Elastic Load Balancer (ELB) | Distributes incoming traffic across multiple EC2 instances | For high availability, scale, and redundancy. |
| Auto Scaling / scaling groups | Automatically adjust the EC2 count based on metrics or a schedule | Ensures you have enough capacity without waste. |
| Pricing models | On-Demand, Reserved, Spot, Savings Plans, Dedicated Hosts | Each has tradeoffs (cost, flexibility, availability). |
| IAM / roles | Permissions, instance profiles, roles attached to instances | Let instances call AWS APIs securely without embedded credentials. |
How EC2 Works — The Lifecycle
Here’s a typical flow when working with EC2:
- Choose region & VPC / subnets
- Select or build an AMI (OS + software)
- Choose instance type (size, CPU, memory, etc.)
- Configure networking / security (VPC, subnets, routing, Internet gateway, security groups)
- Set storage (EBS volume sizes, snapshot, IOPS)
- Associate key pair / SSH access
- Launch the instance
- Connect (SSH / RDP / etc.)
- Configure / deploy software
- Monitor & scale
- Tear down or stop when no longer needed
You can do the above via the AWS Console, AWS CLI, SDKs, or IaC (Terraform, CloudFormation).
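For example, a minimal boto3 (Python SDK) sketch of the launch step might look like the following; the AMI, key pair, subnet, and security group IDs are placeholders you would replace with values from your own account and region.

```python
# Minimal sketch: launch one EC2 instance with boto3.
# All IDs/names below are placeholders for illustration only.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",          # placeholder AMI ID
    InstanceType="t3.micro",
    KeyName="my-key-pair",                     # existing key pair name
    MinCount=1,
    MaxCount=1,
    NetworkInterfaces=[{
        "DeviceIndex": 0,
        "SubnetId": "subnet-0123456789abcdef0",
        "Groups": ["sg-0123456789abcdef0"],
        "AssociatePublicIpAddress": True,
    }],
    TagSpecifications=[{
        "ResourceType": "instance",
        "Tags": [{"Key": "Name", "Value": "demo-web"}],
    }],
)

instance_id = response["Instances"][0]["InstanceId"]
print("Launched:", instance_id)

# Block until the instance reaches the 'running' state before connecting.
ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])
```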
Using EC2 in a Django Project
When you build a Django app and host it on EC2, here are the things to plan/consider:
Architecture sketch
Clients → (optional: CDN / CloudFront) → ELB / ALB → EC2 (Django App)
↳ (maybe auto-scaling group)
EC2 (Django) → RDS (or external DB)
EC2 (Django) → S3 for media / static
Optional: Cache (Redis/ElastiCache), Worker instances (Celery), etc.
Setup details & best practices
AMI / image preparation: Prepare an AMI with all dependencies (Python, libraries, OS packages) baked in so that new instances boot ready to serve. Prefer an immutable-infrastructure approach: new code means a new image and new instances, rather than SSHing into live machines.
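As a rough illustration of the "bake, then launch" flow, the following boto3 sketch creates an AMI from an already-configured instance; the instance ID and AMI name are placeholders, and in practice a tool such as Packer or EC2 Image Builder usually drives this step.

```python
# Sketch: bake a custom AMI from a "golden" instance that already has the
# Python runtime, libraries, and OS packages installed. IDs are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

image = ec2.create_image(
    InstanceId="i-0123456789abcdef0",   # the configured source instance
    Name="django-app-v42",              # AMI names must be unique per region
    Description="Django app image with dependencies baked in",
)
print("New AMI:", image["ImageId"])

# Wait until the AMI is available so launch templates / ASGs can reference it.
ec2.get_waiter("image_available").wait(ImageIds=[image["ImageId"]])
```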
Security & networking: Put internal services (such as the database) in private subnets and web-facing EC2 instances in public subnets. Configure security groups to allow only the necessary ports (e.g. 80/443 inbound to the web tier, and 22 only from a narrow set of admin IPs). Use an Application Load Balancer (ALB) to distribute traffic.
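A minimal sketch of such a web-tier security group with boto3, assuming a placeholder VPC ID and an example admin CIDR for SSH:

```python
# Sketch: web-tier security group allowing HTTP/HTTPS from anywhere and SSH
# only from one admin CIDR. The VPC ID and admin CIDR are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

sg = ec2.create_security_group(
    GroupName="web-tier-sg",
    Description="ALB/web traffic plus narrowly scoped SSH",
    VpcId="vpc-0123456789abcdef0",
)

ec2.authorize_security_group_ingress(
    GroupId=sg["GroupId"],
    IpPermissions=[
        {"IpProtocol": "tcp", "FromPort": 80, "ToPort": 80,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
        {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
        # SSH from a single known CIDR only -- never 0.0.0.0/0
        {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
         "IpRanges": [{"CidrIp": "203.0.113.0/24"}]},
    ],
)
```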
Scaling / high availability: Use Auto Scaling Groups with minimum and maximum instance counts. EC2 startup time matters: the AMI and bootstrap must be fast so scaling stays responsive. Use health checks (e.g. via the ALB) to avoid routing traffic to unhealthy instances.
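One possible shape of that Auto Scaling Group in boto3; the launch template name, subnet IDs, and target group ARN are placeholders and would need to exist already:

```python
# Sketch: an Auto Scaling Group spread across two subnets (two AZs), attached
# to an ALB target group and replacing instances the ALB reports as unhealthy.
import boto3

autoscaling = boto3.client("autoscaling", region_name="us-east-1")

autoscaling.create_auto_scaling_group(
    AutoScalingGroupName="django-web-asg",
    MinSize=2,
    MaxSize=6,
    DesiredCapacity=2,
    LaunchTemplate={
        "LaunchTemplateName": "django-web-lt",   # pre-created launch template
        "Version": "$Latest",
    },
    VPCZoneIdentifier="subnet-aaa111,subnet-bbb222",   # one subnet per AZ
    TargetGroupARNs=[
        "arn:aws:elasticloadbalancing:us-east-1:123456789012:"
        "targetgroup/django-web/0123456789abcdef"
    ],
    HealthCheckType="ELB",            # use the ALB's health check result
    HealthCheckGracePeriod=120,       # allow the AMI time to boot the app
)
```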
Data persistence / state: Don't store user uploads or media on an instance's local filesystem, since it is lost when the instance is replaced; use S3 or EFS instead. Keep the database external (RDS, Aurora) rather than on EC2, unless you have a small-scale or special requirement. Keep cache and session state external as well (Redis, ElastiCache).
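A sketch of Django settings that keep state off the instance, assuming the django-storages package for S3 media, an RDS Postgres endpoint, and Django 4+ for the built-in Redis cache backend (all hostnames come from environment variables):

```python
# Sketch of settings.py fragments that keep state off the EC2 instance.
# Assumes the django-storages package for S3 media and Django 4+ for the
# built-in Redis cache backend; hostnames come from environment variables.
import os

# User uploads go to S3, not the instance's local disk
DEFAULT_FILE_STORAGE = "storages.backends.s3boto3.S3Boto3Storage"
AWS_STORAGE_BUCKET_NAME = os.environ["MEDIA_BUCKET"]

# Database lives in RDS, not on the instance
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": os.environ["DB_NAME"],
        "USER": os.environ["DB_USER"],
        "PASSWORD": os.environ["DB_PASSWORD"],
        "HOST": os.environ["DB_HOST"],   # RDS endpoint
        "PORT": "5432",
    }
}

# Cache and sessions in ElastiCache Redis so any instance can serve any user
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.redis.RedisCache",
        "LOCATION": f"redis://{os.environ['REDIS_HOST']}:6379/0",
    }
}
SESSION_ENGINE = "django.contrib.sessions.backends.cache"
```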
Monitoring & logging: Use CloudWatch metrics (CPU and network are collected by default; memory and disk metrics require the CloudWatch agent installed on the instance). Ship application logs to a centralized system (CloudWatch Logs, an ELK stack). Set alarms on critical thresholds (CPU, memory, disk, load).
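For instance, a CPU alarm can be created with boto3 roughly like this; the instance ID and SNS topic ARN are placeholders, and memory/disk alarms additionally depend on the CloudWatch agent publishing those metrics:

```python
# Sketch: a CloudWatch alarm on sustained high CPU for one instance.
# Instance ID and SNS topic ARN are placeholders.
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

cloudwatch.put_metric_alarm(
    AlarmName="django-web-high-cpu",
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
    Statistic="Average",
    Period=300,                  # 5-minute windows
    EvaluationPeriods=3,         # i.e. sustained for 15 minutes
    Threshold=80.0,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:ops-alerts"],
)
```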
Deployment & updates: Use CI/CD pipelines to build a new AMI or deploy via rolling updates. Use blue/green or canary approaches if you need zero downtime. For code updates, either replace instances or use configuration tools (Ansible, Chef); immutable replacement is the safer option.
Security maintenance: Patch the OS regularly (rebuild AMIs and replace instances). Use IAM roles attached to instances (instance profiles) rather than embedding AWS credentials. Restrict SSH access (e.g. via a bastion host or VPN). Use TLS for web traffic and, where appropriate, EBS encryption for disks.
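With an instance profile attached, SDK calls made on the instance pick up temporary credentials automatically; a tiny sketch (the bucket name is a placeholder):

```python
# Sketch: on an instance with an IAM role (instance profile), boto3 resolves
# temporary credentials from instance metadata automatically -- no access keys
# in code, config files, or environment variables. Bucket name is a placeholder.
import boto3

s3 = boto3.client("s3")   # no explicit credentials anywhere
s3.upload_file("/tmp/report.csv", "my-app-media-bucket", "reports/report.csv")
```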
Cost optimization: Right-size instance types. Use Spot Instances where interruption is acceptable (e.g. worker nodes). Use Reserved Instances or Savings Plans for baseline usage. Terminate idle instances. Monitor data transfer costs (especially cross-AZ and cross-region).
EC2 Instance Types & Hardware Options
The instance type is one of the most important decisions. You match instance type to your workload (compute, memory, I/O, GPU, etc.). ([AWS Documentation][3])
Categories include:
- General purpose (e.g. `t3`, `m5`): balanced compute and memory
- Compute optimized (e.g. `c5`): more CPU per unit of memory
- Memory optimized (e.g. `r5`, `x1`): good for databases and in-memory workloads
- Storage / I/O optimized (e.g. `i3`): for high disk IOPS
- Accelerated / GPU / ML instances (e.g. `p3`, `g4`): for ML and GPU workloads
- Burstable / micro / low-cost (e.g. `t3.micro`): lower baseline, with bursts allowed

Also, Graviton (ARM-based) instances are available (e.g. `m6g`, `c6g`), often with better price/performance for certain workloads. ([Wikipedia][4])
You must also consider networking limits and EBS performance limits per instance type.
Pricing & Cost Models
EC2 offers multiple pricing models. Understanding these is key to optimizing cost.
Pricing Model | Description | Use cases / tradeoffs |
---|---|---|
On-Demand | Pay for compute by the hour or second, no long-term commitment | Good for unpredictable workloads, dev/test, bursty capacity |
Reserved Instances / Savings Plans | Commit to usage for 1 or 3 years in exchange for discount | Good for baseline / steady workloads |
Spot Instances | Use excess capacity at steep discount (with the risk of interruption) | Best for fault-tolerant workloads (batch, workers) |
Dedicated Hosts / Instances | Physical isolation | For licensing, compliance, isolation |
Capacity Reservations | Reserve capacity in a specific AZ | If you need guaranteed capacity at a moment in time |
Also consider EBS costs, data transfer costs (especially across AZs or regions), and charges for Elastic IPs (particularly unattached ones).
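To make the Spot model above concrete, here is a small boto3 sketch of launching a one-time Spot instance for interruption-tolerant work such as a Celery worker; all IDs are placeholders:

```python
# Sketch: request a one-time Spot instance for interruption-tolerant work
# (e.g. a Celery worker). IDs are placeholders; AWS can reclaim Spot capacity
# with a two-minute interruption notice, so the workload must tolerate that.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

ec2.run_instances(
    ImageId="ami-0123456789abcdef0",
    InstanceType="c5.large",
    MinCount=1,
    MaxCount=1,
    SubnetId="subnet-0123456789abcdef0",
    SecurityGroupIds=["sg-0123456789abcdef0"],
    InstanceMarketOptions={
        "MarketType": "spot",
        "SpotOptions": {
            "SpotInstanceType": "one-time",
            "InstanceInterruptionBehavior": "terminate",
        },
    },
)
```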
Fault Tolerance, HA & Scaling
To build resilient architecture using EC2:
- Spread instances across multiple Availability Zones (AZs)
- Use an Auto Scaling Group with a min / max / desired count
- Use Elastic Load Balancer (Application LB, Network LB) in front
- Use health checks so unhealthy instances are replaced
- Use stateless architecture (instances don’t hold critical user state)
- Use immutable deployments so updates don’t break running instances
- If necessary, use backup / snapshotting of EBS volumes
- Test failure scenarios (terminate instance, degrade AZ) to ensure system recovers
Common Pitfalls & Gotchas
- Not using IAM instance roles — embedding AWS credentials in code is insecure
- Storing state / media on instance disk — gets lost if instance is replaced
- Overprovisioning instance sizes (wasted cost)
- Long startup / bootstrap time — if your instance’s init scripts are slow, scaling will lag
- Not monitoring memory / disk — EC2’s basic metrics don’t capture memory or disk usage unless the CloudWatch agent is installed
- Ignoring spot instance interruption risk — if spot is used for critical tasks without fallback, your app may break
- Security group misconfiguration — opening too wide (e.g. SSH 0.0.0.0/0)
- Cross-AZ data transfer costs — moving lots of data between AZs costs money
- Not automating deployments — manual changes lead to drift and configuration inconsistencies
- Ignoring OS patching / updates — leaving instances unpatched is a security risk
Example: Bootstrapping a Django App on EC2 (Simplified)
Here’s a high-level walkthrough:
Create an AMI
- Start from base (Ubuntu, Amazon Linux)
- Install Python, pip, and required OS-level dependencies
- Clone the Django app and set up static-file tooling
- Bake into a custom AMI
Set up network / VPC
- Public and private subnets
- Internet Gateway, NAT, routing
- Security groups for web, SSH, DB access
Launch EC2 instances via Auto Scaling Group
- Use the AMI
- Assign IAM instance role (for S3, RDS access)
- Attach to security group
- Add a user data script (if needed) to run `migrate`, `collectstatic`, or the startup entrypoint (see the launch template sketch after this list)
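A possible launch template for that Auto Scaling Group, with a user data script that runs the startup steps on first boot; the AMI ID, security group, instance profile name, and the /srv/app path are assumptions about how the custom AMI was built:

```python
# Sketch: a launch template whose user data runs the Django startup steps on
# first boot. The AMI ID, security group, instance profile name, and the
# /srv/app path are assumptions about how the custom AMI was built.
import base64
import boto3

USER_DATA = """#!/bin/bash
set -e
cd /srv/app
python manage.py migrate --noinput
python manage.py collectstatic --noinput
systemctl restart gunicorn
"""

ec2 = boto3.client("ec2", region_name="us-east-1")

ec2.create_launch_template(
    LaunchTemplateName="django-web-lt",
    LaunchTemplateData={
        "ImageId": "ami-0123456789abcdef0",
        "InstanceType": "t3.small",
        "IamInstanceProfile": {"Name": "django-web-instance-profile"},
        "SecurityGroupIds": ["sg-0123456789abcdef0"],
        # Launch templates expect user data to be base64 encoded
        "UserData": base64.b64encode(USER_DATA.encode()).decode(),
    },
)
```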
Put an Application Load Balancer (ALB) in front
- Forward HTTP/HTTPS to EC2 instances
- Set health checks (e.g. `GET /healthz`); a minimal Django view for this path is sketched below
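The matching endpoint on the Django side can be as small as this sketch (module layout and URL wiring are up to you):

```python
# Sketch: a minimal health-check endpoint for the ALB to poll. Keep it cheap --
# every instance answers it every few seconds. The path must match the target
# group's configured health-check path (/healthz here).
from django.http import JsonResponse
from django.urls import path


def healthz(request):
    return JsonResponse({"status": "ok"})


urlpatterns = [
    path("healthz", healthz),
]
```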
Set up external services
- RDS for database
- S3 for static / media
- Redis (ElastiCache) for caching / sessions
Deploy new code / rolling update
- Build new AMI or deploy via script
- Update the ASG (rolling update or blue/green; see the instance refresh sketch below)
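One way to script that rollout, assuming the ASG references the launch template's `$Latest` version (the template/ASG names and the new AMI ID are placeholders):

```python
# Sketch: roll out a freshly baked AMI by adding a new launch template version
# and letting an ASG instance refresh replace instances gradually.
# Template/ASG names and the AMI ID are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
autoscaling = boto3.client("autoscaling", region_name="us-east-1")

# 1) New launch template version pointing at the new AMI
ec2.create_launch_template_version(
    LaunchTemplateName="django-web-lt",
    SourceVersion="$Latest",
    LaunchTemplateData={"ImageId": "ami-0fedcba9876543210"},
)

# 2) Rolling replacement, keeping at least 90% of capacity in service
autoscaling.start_instance_refresh(
    AutoScalingGroupName="django-web-asg",
    Preferences={"MinHealthyPercentage": 90, "InstanceWarmup": 120},
)
```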
Monitor & manage
- Install CloudWatch agent for memory, disk metrics
- Set alarms
- Ensure logs are shipped to central system