Mid-level 7 min · March 06, 2026

Context-Free Grammar — Fixing Ambiguous Parser Failures

Same SQL query passed in dev, failed in prod.

N
Naren · Founder
Plain-English first. Then code. Then the interview question.
About
 ● Production Incident 🔎 Debug Guide
Quick Answer
  • CFG is a set of recursive production rules defining language syntax.
  • Four components: terminals (tokens), non-terminals (categories), start symbol, productions.
  • Can describe nested structures (balanced parens) that regex cannot.
  • Performance: unambiguous LL(1) grammars parse in linear time; ambiguous grammars risk exponential blowup.
  • Production insight: a single ambiguous rule can cause different parse trees across parser versions, leading to silent bugs.
  • Always test grammar with a parser generator that reports conflicts.
Plain-English First

Imagine you're teaching a younger sibling how to build a valid LEGO sentence: 'A sentence is ONE color brick, then ONE shape brick, then ONE size brick — in that order, no exceptions.' That rulebook is a Context-Free Grammar. The 'context-free' part means the rule for replacing a brick never changes based on which bricks are around it — the rule is the same everywhere. Programming languages work the same way: a grammar is a rulebook the compiler uses to decide whether your code is legal before it even tries to run it.

Every time you hit compile and your IDE catches a missing semicolon before running a single line, a CFG is quietly doing that job. Compilers for C, Java, Python, SQL — every one of them is built on top of a formal grammar that precisely defines what 'valid code' looks like. CFGs aren't an academic curiosity; they're the load-bearing wall of language design.

Before CFGs existed, language recognizers were brittle, hand-coded state machines that couldn't handle nested structures like balanced parentheses or recursive function calls. Regular expressions — powerful as they are — can't count. They can't verify that every opening brace has a matching closing brace. CFGs solve exactly this problem by introducing recursive production rules that can describe arbitrarily deep nesting with a handful of compact rules.

By the end of this article you'll be able to write your own CFG for a mini expression language, trace through a derivation by hand, build a recursive-descent parser from those production rules in Java, spot ambiguity before it bites you in production, and answer the CFG questions that senior compiler-track interviewers love to ask.

What is Context-Free Grammar?

A CFG is a set of rules that describe how to form strings in a language. Each rule has a single non-terminal on the left and a sequence of terminals and non-terminals on the right. The key insight is recursion — a rule can refer to itself, allowing you to describe nested structures like parentheses or nested if-statements.

Let's look at a concrete example. The language of balanced parentheses can be described with just two rules: S → ( S ) S → ( ) That's it. With these two rules you can generate any string of properly balanced parentheses, no matter how deep. Try doing that with a regular expression — you can't. That's why every programming language uses a CFG: because real code has nested structures (blocks, expressions, function calls) that require counting and matching.

In production, the grammar is the contract between the developer and the compiler. If the grammar is ambiguous or incomplete, the parser will reject valid code or accept invalid code. That's why language specifications are written as formal grammars — it's the only way to be unambiguous about what is and isn't allowed.

The name 'context-free' means the rule for expanding a non-terminal doesn't depend on where it appears — no surrounding context needed. That's what makes CFGs both powerful (they can describe recursion) and tractable (efficient parsers exist). The trade-off: context-sensitive features like variable declarations must be handled in later compiler phases.

io/thecodeforge/cfg/CoreExample.javaJAVA
1
2
3
4
5
6
7
8
9
10
package io.thecodeforge.cfg;

// A simple CFG representation in Java
public class CoreExample {
    public static void main(String[] args) {
        String language = "Context-Free Grammar";
        System.out.println("Learning: " + language);
        System.out.println("Balanced parens example: S -> ( S ) | ( )");
    }
}
Mental Model: LEGO Instructions
  • Non-terminals: 'door', 'roof', 'wall' — they have no bricks yet.
  • Terminals: the actual 2x4 bricks, roof tiles, etc.
  • Productions: 'door -> frame + handle' tells you exactly which bricks go where.
  • Context-free: the instruction for 'door' is the same whether you're building a house or a castle.
Production Insight
CFGs are the backbone of every compiler's syntax analysis phase.
Ambiguity in a grammar can lead to production parsing failures that are hard to reproduce.
Always validate grammars with automated tools before deploying to production.
Key Takeaway
CFGs define language syntax via recursive production rules.
They handle nested structures that regular expressions cannot.
Test your grammar for ambiguity and left recursion early.

Formal Definition of a CFG

A context-free grammar is a 4-tuple (V, Σ, R, S) where
  • V is a finite set of non-terminal symbols (variables that represent syntactic categories)
  • Σ is a finite set of terminal symbols (tokens or alphabet)
  • R is a finite set of production rules of the form A → α, where A ∈ V and α ∈ (V ∪ Σ)*
  • S ∈ V is the start symbol

Each production rule describes how a non-terminal can be replaced by a string of terminals and non-terminals. The term 'context-free' means the rule applies regardless of the surrounding symbols — no context is needed. This is what allows nesting and recursion, but also introduces ambiguity when multiple rules can expand the same non-terminal in conflicting ways.

Let's dissect that with a real example. For a simple arithmetic grammar: V = {Exp, Term, Factor} Σ = {+, , (, ), num} R = { Exp → Exp + Term | Term, Term → Term Factor | Factor, Factor → ( Exp ) | num } S = Exp

Notice how Exp can rewrite to itself plus a Term — that's left recursion, which is fine for bottom-up parsers, but top-down parsers require elimination. Also notice the precedence hierarchy: multiplication is one level below addition, so it binds tighter.

In a production grammar, you'll often have dozens or hundreds of non-terminals. A real language like Java has around 80 non-terminals and hundreds of productions. Managing that by hand is error-prone — that's why parser generators like ANTLR and Bison exist. The trade-off is learnability: parser generators hide the complexity but you must understand conflicts and precedence to debug them.

io/thecodeforge/cfg/GrammarDefinition.javaJAVA
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
package io.thecodeforge.cfg;

import java.util.*;

public record GrammarDefinition(
    Set<String> nonTerminals,
    Set<String> terminals,
    Map<String, List<List<String>>> productions,
    String startSymbol
) {
    public GrammarDefinition {
        nonTerminals = Set.copyOf(nonTerminals);
        terminals = Set.copyOf(terminals);
        productions = Map.copyOf(productions);
    }

    public boolean isNonTerminal(String symbol) {
        return nonTerminals.contains(symbol);
    }

    public boolean isTerminal(String symbol) {
        return terminals.contains(symbol);
    }

    public List<List<String>> getProductionsFor(String nonTerminal) {
        return productions.getOrDefault(nonTerminal, List.of());
    }

    public static GrammarDefinition balancedParensGrammar() {
        Set<String> nonTerminals = Set.of("S");
        Set<String> terminals = Set.of("(", ")");
        Map<String, List<List<String>>> productions = new HashMap<>();
        productions.put("S", List.of(
            List.of("(", "S", ")"),
            List.of("(", ")")
        ));
        return new GrammarDefinition(nonTerminals, terminals, productions, "S");
    }
}
Mental Model: Jigsaw Puzzles
  • Non-terminals are placeholders that get replaced by pieces.
  • Terminals are the final pieces that cannot be broken further.
  • Productions tell you exactly which combination of pieces fill a slot.
  • The start symbol is the empty board — the full picture you're trying to build.
  • Context-free means the instruction for a slot doesn't change based on what's already on the board.
Production Insight
When defining a formal grammar for a real language, the choice of terminals directly affects tokenization.
If two tokens overlap (e.g., '>>' vs '> >'), the lexer must be unambiguous.
Production rule: always disambiguate terminal boundaries to avoid shift-reduce conflicts.
Key Takeaway
CFG = (V, Σ, R, S).
Non-terminals define categories, terminals are tokens, productions define expansions.
Context-free means no external context affects which production to apply.

Derivations and Parse Trees

A derivation is a sequence of production rule applications that starts with the start symbol and ends with a string of only terminals. Each step replaces one non-terminal with the right-hand side of a matching production. The resulting structure can be visualised as a parse tree, where the root is the start symbol, internal nodes are non-terminals, leaves are terminals, and children of a node correspond to the symbols in the production used.

Leftmost derivation always replaces the leftmost non-terminal; rightmost derivation always replaces the rightmost. Both produce the same parse tree for an unambiguous grammar. Parse trees are the foundation of syntax-directed translation: compilers attach semantic actions to tree nodes.

Here's a concrete derivation for the string "2 + 3" using our arithmetic grammar: Exp → Exp + Term (leftmost) → Term + Term → Factor + Term → 2 + Term → 2 + Factor → 2 + 3

The parse tree would have Exp at root, with children Exp, +, Term. And so on. In production, you don't usually build the full parse tree — you build an Abstract Syntax Tree (AST) that prunes away unnecessary nodes like single-child productions. That's a memory savings of 40-60% on typical code.

When you debug a parser, tracing the derivation is often the fastest way to find why certain inputs fail. Tools like ANTLR's -trace option output the derivation steps, which is invaluable.

io/thecodeforge/cfg/ParseTree.javaJAVA
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
package io.thecodeforge.cfg;

import java.util.*;

public class ParseTree {
    private final String symbol;
    private final List<ParseTree> children;

    public ParseTree(String symbol, List<ParseTree> children) {
        this.symbol = symbol;
        this.children = children;
    }

    public static ParseTree fromDerivation(String start, Map<String, List<List<String>>> prods) {
        // Simplified: builds tree from leftmost derivation steps
        // In practice, the parser outputs this structure
        return new ParseTree(start, List.of());
    }

    public void print(String indent) {
        System.out.println(indent + symbol);
        for (ParseTree child : children) {
            child.print(indent + "  ");
        }
    }

    // Add method to compute AST by removing single-child nodes
    public ParseTree toAST() {
        if (children.size() == 1 && children.get(0).children.size() > 0) {
            return children.get(0).toAST();
        }
        List<ParseTree> astChildren = new ArrayList<>();
        for (ParseTree child : children) {
            astChildren.add(child.toAST());
        }
        return new ParseTree(symbol, astChildren);
    }
}
Production Insight
In production parsers, building a full parse tree for every statement is memory-intensive.
Many compilers use abstract syntax trees (ASTs) that omit redundant non-terminal nodes.
Rule: use AST nodes directly in parser actions to reduce memory and simplify subsequent passes.
Key Takeaway
Derivation = sequence of rule applications from start symbol.
Parse tree = graphical representation of a derivation.
Leftmost derivation corresponds to top-down parsing; rightmost to bottom-up.
Which derivation style should you use?
IfYou need to implement a recursive-descent parser by hand
UseUse leftmost derivation — it maps directly to top-down parsing
IfYou are using a parser generator like Bison or Yacc
UseUse rightmost derivation — typical for bottom-up LALR(1) parsers
IfYou are building a parse tree for readability or teaching
UseEither leftmost or rightmost; both produce the same tree for an unambiguous grammar

Ambiguity in CFGs and How to Resolve It

A grammar is ambiguous if some string in its language has more than one distinct parse tree (or equivalently, more than one leftmost derivation). Ambiguity is a serious problem in compilers because it means the same source code can have multiple meanings — the compiler cannot choose which one is intended. Classic examples include the 'dangling else' problem in C-like languages and operator precedence for arithmetic expressions.

Resolving ambiguity: Restructure the grammar to enforce precedence and associativity. For example, instead of a single non-terminal for expressions, break it into multiple levels (Expr, Term, Factor). Use left recursion for left-associative operators. Parser generators like ANTLR also allow disambiguating predicates or semantic actions, but the cleanest solution is a non-ambiguous grammar.

Here's the classic ambiguous arithmetic grammar: E → E + E | E E | n For input "1 + 2 3", you get two parse trees: one treating it as (1+2)3, the other as 1+(23). That's a bug for any calculator. The fix is to introduce precedence levels as shown earlier.

In production, ambiguity can lead to security vulnerabilities. If an attacker can craft input that parses differently than intended, they might bypass validation. That's why SQL injection often exploits ambiguous grammar in SQL dialects.

io/thecodeforge/cfg/AmbiguityExample.javaJAVA
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
package io.thecodeforge.cfg;

public class AmbiguityExample {
    // Ambiguous grammar: E → E + E | E * E | n
    // String "1 + 2 * 3" has two parse trees:
    // Tree1: (1 + 2) * 3
    // Tree2: 1 + (2 * 3)
    
    // Unambiguous version with precedence:
    // E → E + T | T
    // T → T * F | F
    // F → n | ( E )
    
    public static void main(String[] args) {
        System.out.println("Ambiguous: E → E+E | E*E | n");
        System.out.println("String '1+2*3' yields two parse trees.");
        System.out.println("Use precedence levels to disambiguate.");
    }
}
Common Pitfall: Hiding Ambiguity with Semantic Actions
Do not rely on semantic actions to disambiguate – that leads to non-portable, fragile parsers. Always fix the grammar first.
Production Insight
An ambiguous grammar in a production compiler can cause security vulnerabilities (e.g., SQL injection through parse tree manipulation).
Parser generators report shift-reduce and reduce-reduce conflicts — treat those as errors, not warnings.
Rule: zero conflicts before shipping to production.
Key Takeaway
Ambiguity = multiple parse trees for same input.
Fix by restructuring grammar with precedence levels and associativity.
Never deploy a grammar with unresolved conflicts.

Writing a CFG for a Mini Expression Language

Let's design a small expression language that supports addition, multiplication, parentheses, and integer literals. We'll follow the standard precedence hierarchy (multiplication binds tighter than addition) and left associativity for both operators.

Grammar (unambiguous): Exp → Exp + Term | Term Term → Term * Factor | Factor Factor → ( Exp ) | num

This grammar ensures that '+' sees '*' as a subexpression on the right, so multiplication happens first. The left recursion ensures left associativity: '1 + 2 + 3' parses as ((1+2)+3). We'll also handle whitespace and negative numbers by making 'num' a terminal that includes optional minus sign (or separate unary minus later).

Extending this grammar is straightforward: add a new non-terminal for each new precedence level. For example, to add exponentiation (right-associative), you'd insert: Factor → Factor ^ Unary | Unary Unary → ( Exp ) | num

This pattern scales to any language. In real parsers like GCC, there are around 15 precedence levels for C expressions. Each level is a non-terminal.

One nuance: left-recursive rules are fine for bottom-up parsers (Bison) but cause infinite recursion in top-down parsers. For recursive descent, you'll transform left recursion into iteration (while loop) as shown in the next section.

io/thecodeforge/cfg/ExpressionGrammar.javaJAVA
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
package io.thecodeforge.cfg;

import java.util.*;

public class ExpressionGrammar {
    public static Map<String, List<List<String>>> buildProductions() {
        Map<String, List<List<String>>> prods = new LinkedHashMap<>();
        prods.put("Exp", List.of(
            List.of("Exp", "+", "Term"),
            List.of("Term")
        ));
        prods.put("Term", List.of(
            List.of("Term", "*", "Factor"),
            List.of("Factor")
        ));
        prods.put("Factor", List.of(
            List.of("(", "Exp", ")"),
            List.of("num")
        ));
        return prods;
    }

    public static void main(String[] args) {
        var prods = buildProductions();
        System.out.println("Production rules:");
        prods.forEach((k, v) -> {
            for (List<String> rhs : v) {
                System.out.println(k + " → " + String.join(" ", rhs));
            }
        });
    }
}
Production Insight
When extending the expression language with new operators (e.g., exponentiation), add a new layer of precedence (Factor → Factor ^ Unary | Unary).
If you try to fit all operators into two layers, you'll introduce ambiguity.
Rule: one non-terminal per precedence level.
Key Takeaway
Use one non-terminal per precedence level.
Left recursion for left-associative operators, right recursion for right-associative ones.
Test with edge cases: nested parentheses, chained operators.

Building a Recursive Descent Parser in Java

A recursive descent parser is a top-down parser that directly mirrors the grammar's production rules. Each non-terminal becomes a method. For each production of that non-terminal, the method tries to match the input tokens sequentially. If one production fails, it backtracks (or uses lookahead to choose deterministically). For our expression grammar, we'll implement a simple version that uses a lookahead token (LL(1) with a single token peek) to decide between alternatives.

The parser consumes tokens from a lexer. Our methods: parseExp(), parseTerm(), parseFactor(). Each returns an AST node or performs actions. We'll keep it simple: evaluate the expression on the fly (calculator style). The code below demonstrates the structure.

Key observation: the left recursion in the grammar becomes a while loop in the parser. That's a standard pattern — instead of recursive calls, use iteration for left associative operators. This avoids stack overflows. The same pattern works for addition, multiplication, and other left-assoc operators.

In production, you'd probably use a parser generator. But knowing how to build a recursive descent parser by hand is invaluable when you're using a tool like ANTLR and need to understand what it generates.

io/thecodeforge/parser/RecursiveDescentParser.javaJAVA
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
package io.thecodeforge.parser;

import java.util.*;

public class RecursiveDescentParser {
    private final List<String> tokens;
    private int pos = 0;

    public RecursiveDescentParser(List<String> tokens) {
        this.tokens = tokens;
    }

    // Exp → Exp + Term | Term
    public int parseExp() {
        int val = parseTerm();
        while (match("+")) {
            int right = parseTerm();
            val += right;
        }
        return val;
    }

    // Term → Term * Factor | Factor
    public int parseTerm() {
        int val = parseFactor();
        while (match("*")) {
            int right = parseFactor();
            val *= right;
        }
        return val;
    }

    // Factor → ( Exp ) | num
    public int parseFactor() {
        if (match("(")) {
            int val = parseExp();
            expect(")");
            return val;
        }
        // Assume token is a number
        String token = consume();
        return Integer.parseInt(token);
    }

    private boolean match(String expected) {
        if (pos < tokens.size() && tokens.get(pos).equals(expected)) {
            pos++;
            return true;
        }
        return false;
    }

    private String consume() {
        if (pos >= tokens.size()) throw new RuntimeException("Unexpected end of input");
        return tokens.get(pos++);
    }

    private void expect(String expected) {
        String actual = consume();
        if (!actual.equals(expected))
            throw new RuntimeException("Expected '" + expected + "' but got '" + actual + "'");
    }

    public static void main(String[] args) {
        // Input: "2 + 3 * 4" -> tokens: ["2", "+", "3", "*", "4"]
        List<String> tokens = Arrays.asList("2", "+", "3", "*", "4");
        RecursiveDescentParser parser = new RecursiveDescentParser(tokens);
        int result = parser.parseExp();
        System.out.println("Result: " + result);  // Should be 14 (not 20)
    }
}
Eliminate Left Recursion Carefully
Our parser uses while loops to handle left-associative operators, which avoids infinite recursion. For direct left recursion, rewriting into iterative form is the standard technique.
Production Insight
Recursive descent parsers are intuitive but can suffer from stack overflow on deeply nested input.
For production use, consider iterative parsing or increase JVM stack size (-Xss).
Also, left recursion must be eliminated if using pure LL(1) — our grammar uses left recursion implicitly (while loops handle left-associativity without recursion).
Key Takeaway
Recursive descent = one method per non-terminal.
Use while loops for left-associative operators (not recursion).
Always test with nested expressions to catch recursion limits.
● Production incidentPOST-MORTEMseverity: high

Ambiguous Grammar Breaks Production Parser

Symptom
Intermittent 'syntax error' on the same SQL query across different environments. Query worked in dev but failed in prod.
Assumption
The query was malformed or the database version had a bug.
Root cause
The SQL grammar had an ambiguous rule for operator precedence between AND and OR. Different parser implementations chose different parse trees, sometimes rejecting valid input.
Fix
Refactored the grammar to introduce explicit precedence levels (non-terminals for each precedence) and removed ambiguity. Used ANTLR's ambiguity detection to verify no warnings.
Key lesson
  • Ambiguity in a context-free grammar is not a theoretical edge case — it causes real production failures.
  • Always test grammar with a parser generator that reports ambiguities (ANTLR, Bison).
  • Use unambiguous grammar design: left-associativity and precedence levels are non-negotiable for expression languages.
Production debug guideCommon symptoms and immediate actions when the parser rejects valid input or produces unexpected output.5 entries
Symptom · 01
Parser reports syntax error on input that should be valid according to language spec
Fix
Check for grammar ambiguity – run ANTLR with -Xexact-output to see alternative parses. Also verify the input token sequence matches the start symbol.
Symptom · 02
Parse tree differs between environments (dev vs prod)
Fix
Confirm both environments use identical grammar files and parser generator versions (e.g., ANTLR 4.13 vs 4.9). Even minor version changes can alter conflict resolution.
Symptom · 03
Stack overflow during parsing of deeply nested input
Fix
Check for left recursion in grammar – direct (A → A b) or indirect (A → B, B → A). Eliminate left recursion by rewriting productions.
Symptom · 04
Parser silently accepts invalid input (e.g., mismatched parentheses)
Fix
Look for missing base cases in recursive productions. For balanced parens, ensure both empty and nested rules exist; test with empty string and deepest nesting.
Symptom · 05
Infinite loop during parsing with no error message
Fix
Kill the process and inspect grammar for left recursion. Use grep to find rules where non-terminal appears on leftmost side of its own right-hand side. Rewrite recursively.
★ CFG and Parser Debugging Cheat SheetQuick commands to diagnose grammar and parsing issues in production.
Grammar rule never matches
Immediate action
Run parser with debug output
Commands
antlr4 -Xexact-output Grammar.g4
javac -Xlint:all Parser.java
Fix now
Add logging in parser to see which rule fires at each token.
Parsing hangs (infinite loop)+
Immediate action
Kill process, check for left recursion
Commands
grep -E '^[A-Za-z]+\s*:' Grammar.g4 | head -20
antlr4 Grammar.g4 -o temp && ls temp/Grammar*.java
Fix now
Rewrite left-recursive rules as right-recursive or use ANTLR's left-recursion elimination.
Ambiguous parse trees produce different results in prod vs test+
Immediate action
Run ANTLR with -diagnostics to list all ambiguities
Commands
antlr4 -diagnostics Grammar.g4
grep 'ambiguity' output.log
Fix now
Restructure grammar with precedence levels. For expression languages, add a new non-terminal per operator precedence.
Parser throws NullPointerException or crashes on valid input+
Immediate action
Check token stream for unexpected EOF or null tokens
Commands
java -cp antlr.jar org.antlr.v4.Tool -depend Grammar.g4
cat tokens.txt | head -50
Fix now
Ensure lexer produces at least one token; test with simplest valid input. If crash persists, reduce start rule to a single terminal.
Grammar Type Comparison
TypePowerExample
Context-Free GrammarCan describe nested structures, recursionBalanced parentheses: S → ( S ) | ε
Regular GrammarCan only describe patterns without recursionIdentifier: letter (letter|digit)*
Context-Sensitive GrammarCan describe dependencies that require contexta^n b^n c^n (requires counting three groups)
Unrestricted GrammarTuring-complete — can simulate any computationAny recursively enumerable language

Key takeaways

1
Context-free grammars define language syntax via recursive productions.
2
CFGs can model nested structures that regular expressions cannot handle.
3
Parse trees represent derivation steps and are used for semantic analysis.
4
Ambiguous grammars cause unpredictable behaviour in compilers
eliminate them.
5
Use one non-terminal per precedence level for expression languages.
6
Recursive descent parsers are simple but require left-recursion elimination.
7
Always test grammars with parser generators that report conflicts.

Common mistakes to avoid

5 patterns
×

Memorising syntax before understanding the concept

Symptom
You can write CFG rules but can't explain derivations or parse trees. You freeze when asked to debug an ambiguous grammar.
Fix
Start by drawing parse trees for simple strings. Trace derivations leftmost and rightmost. Understand why each rule exists.
×

Skipping practice and only reading theory

Symptom
You can recite the definition of a CFG but cannot write one for a simple expression language.
Fix
Write grammars for small languages: balanced parentheses, arithmetic expressions, boolean logic. Implement a parser for each.
×

Ignoring left recursion until the stack overflows

Symptom
Parser crashes with StackOverflowError on moderately nested input. Grammar has indirect left recursion that wasn't caught.
Fix
Eliminate all left recursion (direct and indirect) before building a top-down parser. Use tools like ANTLR that handle left recursion, or rewrite.
×

Assuming a single production per non-terminal is enough

Symptom
Grammar fails to match valid strings because you didn't consider alternative expansions (e.g., no rule for empty input or base case).
Fix
Always provide a base case (e.g., epsilon) or multiple alternatives for each non-terminal. Test with edge cases.
×

Confusing leftmost derivation with left recursion

Symptom
You incorrectly think leftmost derivation requires left recursion. Your parser fails to parse simple left-associative expressions.
Fix
Leftmost derivation is a strategy for applying rules (always expand leftmost non-terminal). Left recursion is a grammar property. They are independent.
INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01JUNIOR
What is a context-free grammar? Give an example.
Q02SENIOR
How do you eliminate left recursion in a grammar?
Q03SENIOR
What is ambiguity in a CFG and why is it harmful in a compiler?
Q04SENIOR
Describe how a recursive descent parser works. What are its limitations?
Q05SENIOR
How would you design a grammar for a calculator that includes exponentia...
Q06SENIOR
What is the difference between leftmost and rightmost derivation? When w...
Q01 of 06JUNIOR

What is a context-free grammar? Give an example.

ANSWER
A context-free grammar (CFG) is a set of recursive production rules that describe all possible strings in a formal language. It consists of terminals, non-terminals, a start symbol, and productions. Example: balanced parentheses grammar: S → ( S ) | ( ) | ε
FAQ · 7 QUESTIONS

Frequently Asked Questions

01
What is Context-Free Grammar in simple terms?
02
What is the difference between a regular expression and a context-free grammar?
03
How do you test if a grammar is ambiguous?
04
Can a context-free grammar be used for natural language?
05
What is the Chomsky hierarchy?
06
What is the role of a CFG in a compiler?
07
How do parser generators like ANTLR handle left recursion?
🔥

That's Compiler Design. Mark it forged?

7 min read · try the examples if you haven't

Previous
Symbol Table in Compiler
7 / 9 · Compiler Design
Next
Finite Automata and Regular Expressions