Content
- Grammar
- Parse
- Parse Representation
- Top-down Derivation & Bottom-up Reduction
- Parse Tree (Concrete Syntax Tree)
- Parse Notation
- Backus-Naur Form (BNF)
- Extended BNF (EBNF)
- Parse Representation
- Context-Free Grammar (CFG)
- Definition
- Shift-Reduce Parsing
- Ambiguous Grammar
- Recursive Productions & Factoring
- Left or Right Recursion?
- Generalized LR
Grammar
sentence -> <subject> <verb> <object>
subject -> This | computer | I
...
Each line: rule || derivation || production
Parse

Parse Representation
Top-down Derivation & Bottom-up Reduction

Generally, top-down parsers easier to write than bottom-up.
Parse Tree (Concrete Syntax Tree)
No order of applying rules.

Parse Notation
Backus-Naur Form (BNF)
S := '-'FN | FN
FN := DL | DL '.' DL
DL := D | D DL
D :='0'|'1'|'2'|'3'|'4'|'5'|'6'|'7'|'8'|'9
Terminal: in quote
Symbol (S: start symbol)

Token: terminals (quoted literals || UPPERCASE identifiers)
Extended BNF (EBNF)
S := '-'? D+ ('.' D+)?
D := '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'
Can be written in BNF.
Context-Free Grammar (CFG)
CFG ≡ corresponding pushdown automata (PDA)
PDA = FSA + stack (memory)
RE = FSA (no memory)
Turing Machine = FSA + 2 stacks

Difficult to parse -> easy to parse
Definition
- Alphabet (V)
- Terminal alphabet (T) (input alphabet)
- Non-terminal alphabet (N) (stack alphabet)
- Grammar: 4-tuple {S, P, N, T}
S: starting symbolP: production rulesR -> ERcan be replaced byERcan deriveERcan produceEEcan reduce toREcan be derived fromR
- Sentence: a string of symbols from
Tderived fromSusingP
- Equivalence
- Language (L(G)): set of sentences derivable using
G Gis equivalent toG'ifL(G) = L(G')
- Language (L(G)): set of sentences derivable using
Shift-Reduce Parsing

Ambiguous Grammar
Operator precedence
- Tell compiler your precedence choice
%left '+' '-' %left '*' '/' - Rewrite grammar (introduce term and factor)
E -> E t_op E | T t_op -> + | - T -> T f_op T | F f_op -> * | / F -> int | (E)
- Tell compiler your precedence choice
Operator associativity
- Tell compiler
- Rewrite grammar (left recursion)
E -> E t_op T | T t_op -> + | - T -> T f_op F | F f_op -> * | / F -> int | (E)
Dangling else:
if C1 then E1 if C2 then E2 else E3Tell compiler
%nonassoc IFX %nonassoc ELSE ...stmt: ... | IF C stmt %prec IFX | IF C stmt ELSE stmt ...Rewrite grammar
IFE -> IFm | IFu IFm -> if C then IFm else IFm | E IFu -> if C then E | if C then IFm else IFu | E
Recursive Productions & Factoring
# left recursion:
X -> Xa | Xb | AB | C | DEF
# to right recursion:
X -> ABX' | CX' | DEFX'
X' -> aX' | bX' | ε
# top-down parsers can't decide which production to use:
Stmt -> if C then Stmt else Stmt | if C then Stmt | Other | ...
# factor out common prefix:
Stmt -> if C then Stmt OptElse | Other | ...
OptElse -> else S | ε

Left or Right Recursion?
- Left recursion: bad for
LL(1), good forLALR(1)- Keep reducing as we go along (keep the stack space small)
- Right recursion
- All items pushed onto stack, then start reducing
But same amount of work!
Generalized LR
S -> aSa | bSb | ε
LR parser can't tell where is the midpoint.
A GLR parser will process all possible interpretations of the given input in BFS; when there's conflict, fork 2 threads and try both.
Can be > 1 accepted path (ambiguous).