Abstract Syntax Tree (AST)
Why “Abstract”?
Unlike a concrete syntax tree (parse tree), an AST omits redundant grammatical nodes. For 1 + 2 * 3:
- Parse tree retains every grammar rule node
- AST just gives you
BinaryOp(+, Literal(1), BinaryOp(*, Literal(2), Literal(3)))### Structure of an AST node
Every node typically carries: - type — what construct it represents (FunctionDef, BinaryOp, IfStatement, etc.) - children — sub-nodes (operands, body, arguments) - metadata — line/column numbers for error reporting, sometimes type annotations
How an AST is produced
1. Lexing (tokenisation) — raw source → stream of tokens (INT:1, PLUS, INT:2, etc.)
2. Parsing — token stream → parse tree (concrete syntax tree, every grammar rule materialised)
3. AST construction — parse tree → AST (drop literals, punctuation, redundant grammar nodes)
Most compilers and interpreters expose the AST directly, skipping the intermediate CST from user perspective.
What ASTs are used for
| Use case | How the AST is consumed |
|---|---|
| Compilation / interpretation | Walk the tree, emit bytecode or machine code |
| Static analysis / linting | Traverse nodes, pattern-match anti-patterns |
| Type checking | Annotate nodes with inferred types bottom-up |
| Code transformation | Rewrite nodes (Babel, macros, codemods) |
| IDE tooling | Hover info, go-to-definition, rename refactors |
| Minification / bundling | Prune dead nodes, rename identifiers |
| Code generation / templating | Construct an AST programmatically, serialise to source |
Tree traversal patterns
The two fundamental traversals you’ll see everywhere:
Visitor pattern — define visit_<NodeType> methods; the walker dispatches on node type. Used by Python’s ast module, ESTree tools, LLVM passes.
Recursive descent — write a recursive function that switches on node.type and recurses into children. Simple, readable, fine for one-off tools.
Pre-order vs post-order matters: for code generation you usually want post-order (evaluate children before the operator). For scope resolution you often want pre-order (enter scope before visiting body).
ASTs in popular languages/tools
- Python —
import ast;ast.parse(src)gives you the full tree,ast.dump()pretty-prints it,ast.NodeVisitor/ast.NodeTransformerfor walking and rewriting - JavaScript — ESTree spec (Acorn, Espree); Babel uses its own AST with
@babel/parser+@babel/traverse+@babel/types; ts-morph wraps the TypeScript compiler API - Java —
javax.lang.model(annotation processing), or JavaParser for source-level ASTs - C/C++ — Clang’s LibAST;
clang -ast-dumpprints it - Rust —
syncrate (proc macros);rustc’s HIR/MIR are post-AST lowerings - General — Tree-sitter is a universal parser that generates ASTs for ~100 languages with incremental reparsing (used in Neovim, GitHub’s code search, and many LSPs)
Practical tips
- Use existing parsers. Writing your own full parser is rarely necessary.
ast(Python), Babel (JS), Tree-sitter (polyglot) cover most needs. - Source maps. Always preserve line/column on nodes — you’ll need them for error messages and debugger stepping.
- Immutable ASTs + transforms. It’s safer to produce a new tree than mutate in place; most modern tools (Babel, Roslyn) follow this.
- AST diffing. Tools like GumTree diff ASTs to produce semantic diffs rather than line-based diffs — useful for code review and refactoring analysis.