EC2

Here’s a thorough guide to AWS EC2 (Elastic Compute Cloud): concepts, best practices, how you’d use it in a project (e.g. Django), tradeoffs, pitfalls, and a sample setup.
Author

Benedict Thekkel

What is AWS EC2?

  • EC2 is Amazon’s virtual server (compute) service: you can spin up “instances” (virtual machines) in the AWS cloud, with your choice of OS, size, storage, networking, etc. ([AWS Documentation][1])
  • It provides scalable, on-demand compute: you pay for what you use (by the second or hour) and can scale up/down as needed. ([AWS Documentation][1])
  • It’s foundational: many AWS services (EKS worker nodes, ECS EC2 mode, etc.) rely on EC2 under the hood. ([AWS Documentation][1])

Core Concepts & Components

To use EC2 well, you need to understand the building blocks. Here are the main ones:

| Concept | Description / Role | Notes & tips |
| --- | --- | --- |
| Instance | A running virtual machine (VM) in AWS | You choose an AMI, instance type, key pair, etc. |
| AMI (Amazon Machine Image) | A template (OS + software) used to launch instances | You can use AWS-provided AMIs or build your own (custom) ones. ([Wikipedia][2]) |
| Instance Types | T2 / T3 / M / C / R, etc.: different sizes & resource profiles | You match instance type to your workload (CPU, memory, I/O). ([AWS Documentation][3]) |
| Storage: EBS, Instance Store | Persistent block storage (EBS) or ephemeral storage (instance store) | EBS volumes can be detached/reattached, snapshotted, etc. |
| Key Pairs & SSH | You use key pairs to log into Linux instances (SSH) | The public key is stored in AWS, and you keep the private key. |
| Security Groups / Network / VPC | Virtual network boundaries, firewall rules, subnet placement | You control what traffic can reach your instances (inbound/outbound) |
| Elastic IPs | Static IP addresses you can assign to instances | Useful if you need a fixed public IP. |
| Elastic Load Balancer (ELB) | Distribute incoming traffic across multiple EC2 instances | For high availability, scale, redundancy |
| Auto Scaling / Scaling Groups | Automatically adjust EC2 count based on metrics or schedule | Ensures you have enough capacity without waste |
| Pricing Models | On-demand, Reserved, Spot, Savings Plans, Dedicated Hosts | Each has tradeoffs (cost, flexibility, availability) |
| IAM / Roles | Permissions, instance profiles, roles attached to instances | Let instances access AWS APIs securely without embedding credentials |

How EC2 Works — The Lifecycle

Here’s a typical flow when working with EC2:

  1. Choose region & VPC / subnets
  2. Select or build an AMI (OS + software)
  3. Choose instance type (size, CPU, memory, etc.)
  4. Configure networking / security (VPC, subnets, routing, Internet gateway, security groups)
  5. Set storage (EBS volume sizes, snapshot, IOPS)
  6. Associate key pair / SSH access
  7. Launch the instance
  8. Connect (SSH / RDP / etc.)
  9. Configure / deploy software
  10. Monitor & scale
  11. Tear down or stop when no longer needed

You can do the above via the AWS Console, AWS CLI, SDKs, or IaC (Terraform, CloudFormation).
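
As a rough illustration of the SDK route, the sketch below collects the choices from steps 2 to 6 into keyword arguments for the EC2 `RunInstances` API. The function names, the default tag value, the root device name, and the volume settings are assumptions made for this example; the `launch` helper additionally assumes `boto3` is installed and AWS credentials are configured.

```python
from __future__ import annotations

from typing import Any


def build_launch_params(
    ami_id: str,
    instance_type: str,
    subnet_id: str,
    security_group_ids: list[str],
    key_name: str,
    root_volume_gib: int = 20,
    name_tag: str = "django-web",  # hypothetical default tag value
) -> dict[str, Any]:
    """Assemble keyword arguments for a single-instance RunInstances call."""
    return {
        "ImageId": ami_id,                       # step 2: AMI
        "InstanceType": instance_type,           # step 3: instance type
        "MinCount": 1,
        "MaxCount": 1,
        "SubnetId": subnet_id,                   # step 4: networking
        "SecurityGroupIds": security_group_ids,  # step 4: firewall rules
        "KeyName": key_name,                     # step 6: SSH key pair
        "BlockDeviceMappings": [                 # step 5: storage
            {
                # Assumption: the root device name varies by AMI.
                "DeviceName": "/dev/xvda",
                "Ebs": {
                    "VolumeSize": root_volume_gib,
                    "VolumeType": "gp3",
                    "Encrypted": True,
                },
            }
        ],
        "TagSpecifications": [
            {
                "ResourceType": "instance",
                "Tags": [{"Key": "Name", "Value": name_tag}],
            }
        ],
    }


def launch(params: dict[str, Any], region: str) -> str:
    """Step 7: launch the instance and return its ID."""
    import boto3  # imported lazily so the builder works without boto3

    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.run_instances(**params)
    return response["Instances"][0]["InstanceId"]
```

Keeping the parameter builder separate from the API call makes the launch configuration easy to review and unit test before anything is created in an account.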


Using EC2 in a Django Project

When you build a Django app and host it on EC2, here are the things to plan/consider:

Architecture sketch

Clients → (optional: CDN / CloudFront) → ELB / ALB → EC2 (Django App)  
                    ↳ (maybe auto-scaling group)  
EC2 (Django) → RDS (or external DB)  
EC2 (Django) → S3 for media / static  
Optional: Cache (Redis/ElastiCache), Worker instances (Celery), etc.

Setup details & best practices

  • AMI / Image preparation: Prepare an AMI with all dependencies (Python, libraries, OS packages) so that new instances boot ready. Use an immutable infrastructure approach: new code means a new instance/image, rather than SSHing into live machines.

  • Security & networking: Use private subnets for internal services (like the DB) and public subnets for web-facing EC2. Set up security groups that allow only the necessary ports (e.g. 80/443 inbound to the web tier, 22 only from a narrow set of IPs). Use an Application Load Balancer (ALB) to distribute traffic.

  • Scaling / high availability: Use Auto Scaling Groups (minimum, maximum instance counts). EC2 startup time matters: your AMI/setup must be fast so scaling is responsive. Use health checks (e.g. via ALB) to avoid routing to unhealthy instances.

  • Data persistence / state: Don’t store user uploads or media on the local filesystem of EC2 (it is lost when the instance is replaced). Use S3 or EFS. The database should be external (RDS, Aurora), not on EC2 (unless small scale or special requirement). Cache or session state should be external (Redis, ElastiCache).

  • Monitoring & logging: Use CloudWatch metrics (CPU, network, status checks). Memory and disk-space metrics are not collected by default; they require the CloudWatch agent installed on the instance. Push application logs to a centralized log system (CloudWatch Logs, ELK stack). Set alarms on critical thresholds (CPU, memory, disk, load).

  • Deployment & updates: Use CI/CD pipelines to build a new AMI or deploy via rolling updates. Use blue/green or canary approaches if you need zero downtime. For code updates, either replace instances or update them in place with tools (Ansible, Chef); the immutable approach is safer.

  • Security maintenance: Patch the OS regularly (rebuild AMIs, replace instances). Use IAM roles attached to instances (instance profiles) rather than embedding AWS credentials. Restrict SSH access (e.g. via bastion jump host, VPN). Use encryption (TLS) for web traffic, and optionally disk encryption (EBS encryption).

  • Cost optimization: Right-size instance types. Use Spot instances where acceptable (e.g. worker nodes). Use Reserved Instances or Savings Plans for baseline usage. Terminate idle instances. Monitor data transfer costs (especially cross-AZ, cross-region).
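
To make the "keep state off the instance" advice concrete, here is a minimal sketch of a helper that builds Django-style settings from environment variables, so that every instance launched from the same AMI points at the same external database, cache, and media bucket. The environment variable names and the `MEDIA_BUCKET` key are hypothetical; the `ENGINE`, `BACKEND`, and `SESSION_ENGINE` values are standard Django dotted paths (the Redis cache backend assumes Django 4.0 or later).

```python
from __future__ import annotations

from typing import Any, Mapping


def build_settings(env: Mapping[str, str]) -> dict[str, Any]:
    """Return Django-style settings that keep all state off the instance.

    A missing required variable raises KeyError, so a misconfigured
    instance fails at boot instead of serving traffic half-configured.
    """
    return {
        "DATABASES": {
            "default": {
                "ENGINE": "django.db.backends.postgresql",
                "NAME": env["DB_NAME"],
                "USER": env["DB_USER"],
                "PASSWORD": env["DB_PASSWORD"],
                "HOST": env["DB_HOST"],  # e.g. an RDS endpoint
                "PORT": env.get("DB_PORT", "5432"),
            }
        },
        "CACHES": {
            "default": {
                "BACKEND": "django.core.cache.backends.redis.RedisCache",
                "LOCATION": env["REDIS_URL"],  # e.g. an ElastiCache endpoint
            }
        },
        # Sessions live in the shared cache, not on the local disk.
        "SESSION_ENGINE": "django.contrib.sessions.backends.cache",
        # Hypothetical key (not a built-in Django setting): the S3 bucket
        # handed to whichever storage backend the project uses.
        "MEDIA_BUCKET": env["MEDIA_BUCKET"],
    }
```

A settings module could call `build_settings(os.environ)` and copy the resulting keys into module-level names. In practice the secrets would more likely come from a secrets store read via the instance role than from plain environment variables.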


EC2 Instance Types & Hardware Options

The instance type is one of the most important decisions. You match instance type to your workload (compute, memory, I/O, GPU, etc.). ([AWS Documentation][3])

Categories include:

  • General purpose (e.g. t3, m5) — balanced compute/memory
  • Compute optimized (e.g. c5) — more CPU per memory
  • Memory optimized (e.g. r5, x1) — good for DB, in-memory workloads
  • Storage / I/O optimized (e.g. i3) — for high disk IOPS
  • Accelerated / GPU / ML instances (e.g. p3, g4) — for ML / GPU workloads
  • Burstable / micro / low-cost (e.g. t3.micro) — lower baseline, bursts allowed

Also, Graviton (ARM-based) instances are available (e.g. m6g, c6g), often with better price/performance for certain workloads. ([Wikipedia][4])

You must also consider networking limits and EBS performance limits per instance type.
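
Matching an instance type to a workload can be sketched as a simple filter over a catalog: keep the types that meet the CPU and memory requirements, then take the cheapest. The catalog below is a small hand-written sample. The vCPU and memory figures reflect the published specs for these types, but the hourly prices are illustrative placeholders that vary by region and change over time.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceSpec:
    name: str
    vcpus: int
    memory_gib: float
    hourly_usd: float  # illustrative placeholder, not a quoted price


# Hand-written sample; a real tool would load current data per region.
CATALOG = [
    InstanceSpec("t3.micro", 2, 1, 0.0104),
    InstanceSpec("t3.small", 2, 2, 0.0208),
    InstanceSpec("t3.medium", 2, 4, 0.0416),
    InstanceSpec("c5.large", 2, 4, 0.085),
    InstanceSpec("m5.large", 2, 8, 0.096),
    InstanceSpec("r5.large", 2, 16, 0.126),
    InstanceSpec("c5.xlarge", 4, 8, 0.17),
    InstanceSpec("m5.xlarge", 4, 16, 0.192),
]


def cheapest_fit(
    min_vcpus: int,
    min_memory_gib: float,
    catalog: list[InstanceSpec] = CATALOG,
) -> InstanceSpec | None:
    """Return the cheapest type meeting both minimums, or None if none fits."""
    candidates = [
        spec
        for spec in catalog
        if spec.vcpus >= min_vcpus and spec.memory_gib >= min_memory_gib
    ]
    return min(candidates, key=lambda spec: spec.hourly_usd, default=None)
```

This deliberately ignores the other factors mentioned above (burstable CPU baselines, network and EBS limits, architecture), so treat its answer as a starting point for load testing rather than a final choice.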


Pricing & Cost Models

EC2 offers multiple pricing models. Understanding these is key to optimizing cost.

| Pricing Model | Description | Use cases / tradeoffs |
| --- | --- | --- |
| On-Demand | Pay for compute by the hour or second, no long-term commitment | Good for unpredictable workloads, dev/test, bursty capacity |
| Reserved Instances / Savings Plans | Commit to usage for 1 or 3 years in exchange for discount | Good for baseline / steady workloads |
| Spot Instances | Use excess capacity at steep discount (with the risk of interruption) | Best for fault-tolerant workloads (batch, workers) |
| Dedicated Hosts / Instances | Physical isolation | For licensing, compliance, isolation |
| Capacity Reservations | Reserve capacity in a specific AZ | If you need guaranteed capacity at a moment in time |

Also consider EBS costs, data transfer costs (especially across AZs or regions), and public IPv4 address charges, which apply to Elastic IPs whether or not they are attached to a running instance.
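
The on-demand versus commitment tradeoff comes down to utilization: a commitment is billed for every hour whether or not the instance runs, while on-demand is billed only for hours used. The sketch below works through that arithmetic. The rates in the usage note are made-up numbers, and the model ignores upfront payment options and partial-month effects.

```python
HOURS_PER_MONTH = 730  # average number of hours in a month


def on_demand_monthly(hourly_rate: float, utilization: float) -> float:
    """Monthly on-demand cost when running `utilization` (0..1) of the time."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return hourly_rate * HOURS_PER_MONTH * utilization


def committed_monthly(effective_hourly_rate: float) -> float:
    """Monthly cost of a commitment, billed for every hour regardless of use."""
    return effective_hourly_rate * HOURS_PER_MONTH


def break_even_utilization(on_demand_rate: float, committed_rate: float) -> float:
    """Fraction of the month above which the commitment is cheaper."""
    if on_demand_rate <= 0:
        raise ValueError("on_demand_rate must be positive")
    return committed_rate / on_demand_rate
```

With a hypothetical on-demand rate of $0.10/hour and an effective committed rate of $0.06/hour, the break-even point is 60% utilization: an instance that runs less than that is cheaper on demand, and one that runs more is cheaper under the commitment.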


Fault Tolerance, HA & Scaling

To build resilient architecture using EC2:

  • Spread instances across multiple Availability Zones (AZs)
  • Use an Auto Scaling Group with a min / max / desired count
  • Use Elastic Load Balancer (Application LB, Network LB) in front
  • Use health checks so unhealthy instances are replaced
  • Use stateless architecture (instances don’t hold critical user state)
  • Use immutable deployments so updates don’t break running instances
  • If necessary, use backup / snapshotting of EBS volumes
  • Test failure scenarios (terminate instance, degrade AZ) to ensure system recovers
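
Two of the ideas above, sizing a group between its min and max and spreading it across AZs, can be illustrated with a small model. This is a simplified proportional calculation written for this guide, not the algorithm AWS Auto Scaling actually uses; the function names and the example AZ names are assumptions.

```python
from __future__ import annotations

import math


def desired_capacity(
    current: int,
    metric_value: float,
    target_value: float,
    min_size: int,
    max_size: int,
) -> int:
    """Scale the instance count in proportion to metric/target, then clamp.

    Example: 4 instances at 90% average CPU with a 60% target suggests 6.
    """
    if target_value <= 0:
        raise ValueError("target_value must be positive")
    raw = math.ceil(current * metric_value / target_value)
    return max(min_size, min(max_size, raw))


def spread_across_azs(count: int, azs: list[str]) -> dict[str, int]:
    """Distribute `count` instances as evenly as possible over the given AZs."""
    if not azs:
        raise ValueError("at least one AZ is required")
    base, extra = divmod(count, len(azs))
    # The first `extra` zones each take one additional instance.
    return {az: base + (1 if i < extra else 0) for i, az in enumerate(azs)}
```

For instance, `spread_across_azs(5, ["us-east-1a", "us-east-1b", "us-east-1c"])` places 2, 2, and 1 instances, so losing any single AZ removes at most two of the five.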

Common Pitfalls & Gotchas

  • Not using IAM instance roles: embedding AWS credentials in code is insecure
  • Storing state / media on instance disk: it is lost if the instance is replaced
  • Overprovisioning instance sizes (wasted cost)
  • Long startup / bootstrap time: if your instance’s init scripts are slow, scaling will lag
  • Not monitoring memory / disk: EC2’s basic metrics don’t capture memory usage unless the CloudWatch agent is installed
  • Ignoring spot instance interruption risk: if spot is used for critical tasks without fallback, your app may break
  • Security group misconfiguration: opening rules too wide (e.g. SSH from 0.0.0.0/0)
  • Cross-AZ data transfer costs: moving lots of data between AZs costs money
  • Not automating deployments: manual changes lead to configuration drift and inconsistencies
  • Ignoring OS patching / updates: leaving instances unpatched is a security risk
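
The security group pitfall is easy to check automatically. The sketch below inspects one security group in the dictionary shape returned by the EC2 `DescribeSecurityGroups` API and reports which sensitive ports are reachable from anywhere. The function name and the default list of sensitive ports are assumptions for this example; fetching the groups (for example with `boto3`) is left out.

```python
from __future__ import annotations

from typing import Any

OPEN_CIDRS = {"0.0.0.0/0", "::/0"}


def open_sensitive_ports(
    group: dict[str, Any],
    sensitive_ports: tuple[int, ...] = (22, 3389, 5432),
) -> list[int]:
    """Return the sensitive ports this group exposes to the whole internet."""
    exposed: set[int] = set()
    for perm in group.get("IpPermissions", []):
        cidrs = {r.get("CidrIp") for r in perm.get("IpRanges", [])}
        cidrs |= {r.get("CidrIpv6") for r in perm.get("Ipv6Ranges", [])}
        if not cidrs & OPEN_CIDRS:
            continue  # rule is restricted to specific source ranges
        protocol = perm.get("IpProtocol")
        if protocol == "-1":
            # "-1" means all protocols and all ports.
            exposed.update(sensitive_ports)
            continue
        if protocol not in ("tcp", "udp"):
            continue  # e.g. ICMP, where the port fields mean type/code
        low, high = perm.get("FromPort"), perm.get("ToPort")
        if low is None or high is None:
            continue
        exposed.update(p for p in sensitive_ports if low <= p <= high)
    return sorted(exposed)
```

Running a check like this in CI or on a schedule catches the "SSH open to the world" rule before it ships, rather than after a scan finds it.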

Example: Bootstrapping a Django App on EC2 (Simplified)

Here’s a high-level walkthrough:

  1. Create an AMI

    • Start from base (Ubuntu, Amazon Linux)
    • Install Python, dependencies, pip, etc
    • Clone Django app, static files tooling
    • Bake into a custom AMI
  2. Set up network / VPC

    • Public and private subnets
    • Internet Gateway, NAT, routing
    • Security groups for web, SSH, DB access
  3. Launch EC2 instances via Auto Scaling Group

    • Use the AMI
    • Assign IAM instance role (for S3, RDS access)
    • Attach to security group
    • Add user data script (if needed) to run migrate, collectstatic or startup entrypoint
  4. Put an Application Load Balancer (ALB) in front

    • Forward HTTP/HTTPS to EC2 instances
    • Set health checks (e.g. GET /healthz)
  5. Set up external services

    • RDS for database
    • S3 for static / media
    • Redis (ElastiCache) for caching / sessions
  6. Deploy new code / rolling update

    • Build new AMI or deploy via script
    • Update ASG (use rolling update or blue/green)
  7. Monitor & manage

    • Install CloudWatch agent for memory, disk metrics
    • Set alarms
    • Ensure logs are shipped to central system
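
Step 4 mentions a health check such as GET /healthz. As a minimal sketch of what the load balancer would call, here is a plain WSGI application (the same interface Django is served through) that answers 200 on that path. In a real Django project this would normally be a view wired into the URL configuration; the standalone app and its response body are assumptions made to keep the example dependency-free.

```python
from __future__ import annotations

import json
from typing import Callable, Iterable


def healthz_app(environ: dict, start_response: Callable) -> Iterable[bytes]:
    """Answer GET /healthz with 200 and a small JSON body, 404 otherwise."""
    is_health_check = (
        environ.get("REQUEST_METHOD") == "GET"
        and environ.get("PATH_INFO") == "/healthz"
    )
    if is_health_check:
        status = "200 OK"
        body = json.dumps({"status": "ok"}).encode("utf-8")
    else:
        status = "404 Not Found"
        body = json.dumps({"status": "not found"}).encode("utf-8")
    headers = [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ]
    start_response(status, headers)
    return [body]
```

For a local try-out, the app can be served with `wsgiref.simple_server.make_server` from the standard library. Keeping the check shallow (no database query) means a brief database outage does not cause the load balancer to mark every instance unhealthy at once; whether that tradeoff suits a given application is a design decision.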