Prompt Chaining
1. What is Prompt Chaining?
Prompt chaining decomposes a complex task into a sequence of smaller LLM calls, where each call processes the output of the previous one. Optional gate checks (programmatic validations) can be inserted between steps to catch errors early before they propagate.
   Input
     │
     ▼
┌────────┐     ┌───────┐     ┌────────┐     ┌───────┐     ┌────────┐
│  LLM₁  │────▶│ Gate? │────▶│  LLM₂  │────▶│ Gate? │────▶│  LLM₃  │────▶ Output
└────────┘     └───────┘     └────────┘     └───────┘     └────────┘
          (fail → stop/retry)          (fail → stop/retry)
The trade-off: higher latency (sequential calls) in exchange for higher accuracy (each call is a simpler, more focused task).
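The gate-and-retry flow in the diagram can be sketched as a small runner. This is a minimal illustration under stated assumptions, not a definitive implementation: `run_chain`, `GateFailed`, and the stand-in steps `extract`, `summarise`, and `has_items` are hypothetical names, and the steps are plain functions so the sketch runs without an API key. In practice each step would wrap an LLM call.

```python
from typing import Callable, Optional

Step = Callable[[str], str]   # takes the previous output, returns the next one
Gate = Callable[[str], bool]  # returns True when the output is acceptable


class GateFailed(Exception):
    """Raised when a step's output keeps failing its gate."""


def run_chain(
    text: str,
    stages: list[tuple[Step, Optional[Gate]]],
    max_retries: int = 1,
) -> str:
    """Run each step in order, checking its optional gate before moving on."""
    for index, (step, gate) in enumerate(stages, start=1):
        for _attempt in range(max_retries + 1):
            output = step(text)
            if gate is None or gate(output):
                break  # gate passed (or no gate): continue to the next step
        else:
            # Every attempt failed the gate: stop instead of propagating bad output.
            raise GateFailed(f"step {index} failed its gate")
        text = output
    return text


# Stand-in steps (assumptions for illustration); real steps would call an LLM.
def extract(transcript: str) -> str:
    return "\n".join(line for line in transcript.splitlines() if line.startswith("- "))


def summarise(items: str) -> str:
    return f"{len(items.splitlines())} action item(s) found."


def has_items(output: str) -> bool:
    return bool(output.strip())


result = run_chain(
    "Notes\n- Ana: send report\n- Ben: book room",
    [(extract, has_items), (summarise, None)],
)
print(result)  # 2 action item(s) found.
```

Retrying only pays off when a step is non-deterministic, as LLM calls usually are; for deterministic steps, `max_retries=0` turns the gate into a plain stop condition.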
2. When to Use Prompt Chaining
Good fit when:
- The task cleanly decomposes into fixed, sequential subtasks
- Each subtask is simpler than the full task
- You want to catch and correct errors early (gates)
- You need to iterate/evaluate each step independently

Not ideal when:
- Subtasks are independent (use Parallelization instead)
- The number of steps can’t be predicted upfront (use Orchestrator-Workers)
- A single well-crafted prompt with CoT already works

Examples:
- Generate marketing copy → translate into multiple languages
- Write document outline → validate outline → write full document
- Extract data → validate schema → transform to output format
- Meeting transcript → extract action items → check consistency → write summary
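As one illustration, the "extract data → validate schema → transform" example might look like the sketch below. Everything here is assumed for demonstration: the schema in `REQUIRED_FIELDS`, the function names, and `fake_llm`, which stands in for a real model call so the code runs offline. Note that only the first step needs a model; the gate and the transform are ordinary code.

```python
import csv
import io
import json
from typing import Callable

REQUIRED_FIELDS = {"name": str, "quantity": int}  # hypothetical schema


def extract(llm: Callable[[str], str], text: str) -> str:
    """Step 1: ask the model for a JSON array of records."""
    return llm(f"Return a JSON array of objects with keys name and quantity.\n\n{text}")


def validate(raw: str) -> list[dict]:
    """Gate: parse the JSON and check every record against the schema."""
    records = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) if malformed
    if not isinstance(records, list):
        raise ValueError("expected a JSON array")
    for record in records:
        if not isinstance(record, dict):
            raise ValueError(f"expected an object, got {record!r}")
        for field, expected_type in REQUIRED_FIELDS.items():
            if not isinstance(record.get(field), expected_type):
                raise ValueError(f"bad or missing field {field!r} in {record!r}")
    return records


def transform(records: list[dict]) -> str:
    """Step 2: convert the validated records to CSV with plain code."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(REQUIRED_FIELDS), lineterminator="\n")
    writer.writeheader()
    writer.writerows({key: record[key] for key in REQUIRED_FIELDS} for record in records)
    return buffer.getvalue()


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call (assumption for this sketch)."""
    return '[{"name": "widget", "quantity": 3}, {"name": "gadget", "quantity": 1}]'


csv_text = transform(validate(extract(fake_llm, "3 widgets and 1 gadget")))
print(csv_text)
```

Because `validate` raises on malformed output, a bad extraction stops the chain before the transform step ever runs.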
3. Implementation Pattern
from openai import OpenAI

# Reads the API key from the OPENAI_API_KEY environment variable.
client = OpenAI()


def llm_call(prompt: str, system: str = "") -> str:
    """Send one prompt (plus an optional system message) and return the reply text."""
    messages = []
    if system:
        messages.append({"role": "system", "content": system})
    messages.append({"role": "user", "content": prompt})
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
    )
    # message.content may be None, so fall back to an empty string.
    return response.choices[0].message.content or ""


def gate_check(text: str, criteria: str) -> bool:
    """LLM-based validation between steps; a plain Python check also works as a gate."""
    result = llm_call(
        f"Does this text meet the criteria? Respond only YES or NO.\n\nCriteria: {criteria}\n\nText: {text}"
    )
    return result.strip().upper().startswith("YES")


def document_chain(transcript: str) -> str:
    """3-step chain: extract → validate → summarise."""
    # Step 1: Extract key items
    extraction = llm_call(
        f"Extract all action items and owners from this transcript as a bulleted list:\n\n{transcript}",
        system="You are a precise meeting analyst.",
    )

    # Gate: confirm extraction found items; stop early if it did not
    if not gate_check(extraction, "Contains at least one action item with an owner"):
        return "No actionable items found in transcript."

    # Step 2: Validate consistency against the source transcript
    validated = llm_call(
        "Check each action item below against the original transcript for accuracy. "
        f"Remove any that are not explicitly mentioned.\n\nItems:\n{extraction}\n\nTranscript:\n{transcript}"
    )

    # Step 3: Write summary from the verified items only
    summary = llm_call(
        f"Write a concise executive summary (3-5 sentences) based on these verified action items:\n\n{validated}"
    )
    return summary

4. AlphaCodium Example: A Real-World Win
By switching from a single prompt to a multi-step chain, AlphaCodium increased GPT-4 accuracy on CodeContests from 19% to 44% (pass@5).
Their 6-step chain:
1. Reflect on the problem statement
2. Reason on the public test cases
3. Generate possible solutions
4. Rank solutions
5. Generate synthetic tests
6. Iterate on solutions against public + synthetic tests
Each step is a focused LLM call doing one thing well — the classic single-responsibility principle applied to prompts.
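To make the idea of checking solutions against public and synthetic tests concrete, here is a toy sketch. It is not AlphaCodium's code: `passed_count`, `pick_best`, and the two candidate functions are hypothetical, and the candidates are ordinary Python functions, whereas a real pipeline would run generated source code in a sandbox.

```python
from typing import Callable

Candidate = Callable[[int], int]
TestCase = tuple[int, int]  # (input, expected output)


def passed_count(candidate: Candidate, tests: list[TestCase]) -> int:
    """Count the tests a candidate passes; a crash counts as a failure."""
    passed = 0
    for given, expected in tests:
        try:
            if candidate(given) == expected:
                passed += 1
        except Exception:
            pass
    return passed


def pick_best(candidates: list[Candidate], tests: list[TestCase]) -> Candidate:
    """Return the candidate that passes the most tests (the first one wins ties)."""
    return max(candidates, key=lambda candidate: passed_count(candidate, tests))


# Toy task: square a number. Both candidates pass the single public test.
def double(n: int) -> int:
    return n * 2


def square(n: int) -> int:
    return n * n


public_tests = [(2, 4)]
synthetic_tests = [(3, 9), (0, 0)]  # extra tests that separate the candidates

best = pick_best([double, square], public_tests + synthetic_tests)
print(best.__name__)  # square
```

With only the public test, `double` and `square` are indistinguishable; the extra tests are what expose the wrong candidate.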
5. Best Practices
- One thing per prompt: Each step should have a single, clear objective. Avoid “God prompts” that try to do everything.
- Structured intermediate outputs: Use JSON/XML between steps to make parsing reliable and reduce errors.
- Insert gates at decision points: Fail fast rather than propagating bad intermediate results through expensive steps.
- Evaluate each step independently: Splitting the task into steps makes it easier to identify which one is failing.
- Start simple: If a single prompt falls short, try a 2-step chain first. Don’t jump straight to 8 steps.
- Watch latency: Each sequential call adds wall-clock time. If steps are independent, consider Parallelization.
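Evaluating steps independently and watching latency both become easier if the chain records what each step received, what it produced, and how long it took. The sketch below is one possible approach, with hypothetical names (`StepRecord`, `run_traced`) and stand-in steps in place of real LLM calls.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class StepRecord:
    name: str
    input: str
    output: str
    seconds: float


def run_traced(
    text: str, steps: list[tuple[str, Callable[[str], str]]]
) -> tuple[str, list[StepRecord]]:
    """Run steps in order, recording each step's input, output, and latency."""
    trace = []
    for name, step in steps:
        start = time.perf_counter()
        output = step(text)
        trace.append(StepRecord(name, text, output, time.perf_counter() - start))
        text = output
    return text, trace


# Stand-in steps (assumptions for illustration); real steps would call an LLM.
def outline(topic: str) -> str:
    return f"1. Intro to {topic}\n2. Details"


def draft(outline_text: str) -> str:
    return outline_text.upper()


final, trace = run_traced("gates", [("outline", outline), ("draft", draft)])
for record in trace:
    print(f"{record.name}: {record.seconds:.4f}s")
```

A recorded `input` can be replayed into a single step later, so a failing step can be tested and tuned without rerunning the whole chain.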