Content
- Grammar
- Parse
- Parse Representation
- Top-down Derivation & Bottom-up Reduction
- Parse Tree (Concrete Syntax Tree)
- Parse Notation
- Backus-Naur Form (BNF)
- Extended BNF (EBNF)
- Parse Representation
- Context-Free Grammar (CFG)
- Definition
- Shift-Reduce Parsing
- Ambiguous Grammar
- Recursive Productions & Factoring
- Left or Right Recursion?
- Generalized LR
Grammar
sentence -> <subject> <verb> <object>
subject -> This | computer | I
...
Each line: rule || derivation || production
Parse
Parse Representation
Top-down Derivation & Bottom-up Reduction
Generally, top-down parsers easier to write than bottom-up.
Parse Tree (Concrete Syntax Tree)
No order of applying rules.
Parse Notation
Backus-Naur Form (BNF)
S := '-'FN | FN
FN := DL | DL '.' DL
DL := D | D DL
D :='0'|'1'|'2'|'3'|'4'|'5'|'6'|'7'|'8'|'9
Terminal: in quote
Symbol (S
: start symbol)
Token: terminals (quoted literals || UPPERCASE identifiers)
Extended BNF (EBNF)
S := '-'? D+ ('.' D+)?
D := '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'
Can be written in BNF.
Context-Free Grammar (CFG)
CFG ≡ corresponding pushdown automata (PDA)
PDA = FSA + stack (memory)
RE = FSA (no memory)
Turing Machine = FSA + 2 stacks
Difficult to parse -> easy to parse
Definition
- Alphabet (V)
- Terminal alphabet (T) (input alphabet)
- Non-terminal alphabet (N) (stack alphabet)
- Grammar: 4-tuple {S, P, N, T}
S
: starting symbolP
: production rulesR -> E
R
can be replaced byE
R
can deriveE
R
can produceE
E
can reduce toR
E
can be derived fromR
- Sentence: a string of symbols from
T
derived fromS
usingP
- Equivalence
- Language (L(G)): set of sentences derivable using
G
G
is equivalent toG'
ifL(G) = L(G')
- Language (L(G)): set of sentences derivable using
Shift-Reduce Parsing
Ambiguous Grammar
Operator precedence
- Tell compiler your precedence choice
%left '+' '-' %left '*' '/'
- Rewrite grammar (introduce term and factor)
E -> E t_op E | T t_op -> + | - T -> T f_op T | F f_op -> * | / F -> int | (E)
- Tell compiler your precedence choice
Operator associativity
- Tell compiler
- Rewrite grammar (left recursion)
E -> E t_op T | T t_op -> + | - T -> T f_op F | F f_op -> * | / F -> int | (E)
Dangling else:
if C1 then E1 if C2 then E2 else E3
Tell compiler
%nonassoc IFX %nonassoc ELSE ...stmt: ... | IF C stmt %prec IFX | IF C stmt ELSE stmt ...
Rewrite grammar
IFE -> IFm | IFu IFm -> if C then IFm else IFm | E IFu -> if C then E | if C then IFm else IFu | E
Recursive Productions & Factoring
# left recursion:
X -> Xa | Xb | AB | C | DEF
# to right recursion:
X -> ABX' | CX' | DEFX'
X' -> aX' | bX' | ε
# top-down parsers can't decide which production to use:
Stmt -> if C then Stmt else Stmt | if C then Stmt | Other | ...
# factor out common prefix:
Stmt -> if C then Stmt OptElse | Other | ...
OptElse -> else S | ε
Left or Right Recursion?
- Left recursion: bad for
LL(1)
, good forLALR(1)
- Keep reducing as we go along (keep the stack space small)
- Right recursion
- All items pushed onto stack, then start reducing
But same amount of work!
Generalized LR
S -> aSa | bSb | ε
LR parser can't tell where is the midpoint.
A GLR parser will process all possible interpretations of the given input in BFS; when there's conflict, fork 2 threads and try both.
Can be > 1 accepted path (ambiguous).