Content

  1. Grammar
  2. Parse
    1. Parse Representation
      1. Top-down Derivation & Bottom-up Reduction
      2. Parse Tree (Concrete Syntax Tree)
    2. Parse Notation
      1. Backus-Naur Form (BNF)
      2. Extended BNF (EBNF)
  3. Context-Free Grammar (CFG)
    1. Definition
  4. Shift-Reduce Parsing
  5. Ambiguous Grammar
  6. Recursive Productions & Factoring
  7. Left or Right Recursion?
  8. Generalized LR

Grammar

sentence -> <subject> <verb> <object>
subject  -> This | computer | I
...

Each line: rule || derivation || production

Parse

Parse Representation

Top-down Derivation & Bottom-up Reduction

Generally, top-down parsers easier to write than bottom-up.

Parse Tree (Concrete Syntax Tree)

No order of applying rules.

Parse Notation

Backus-Naur Form (BNF)
S := '-'FN | FN
FN := DL | DL '.' DL
DL := D | D DL
D :='0'|'1'|'2'|'3'|'4'|'5'|'6'|'7'|'8'|'9

Terminal: in quote Symbol (S: start symbol)

Token: terminals (quoted literals || UPPERCASE identifiers)

Extended BNF (EBNF)
S := '-'? D+ ('.' D+)?
D := '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'

Can be written in BNF.

Context-Free Grammar (CFG)

CFG ≡ corresponding pushdown automata (PDA)
PDA = FSA + stack (memory) RE = FSA (no memory) Turing Machine = FSA + 2 stacks

Difficult to parse -> easy to parse

Definition

  • Alphabet (V)
    • Terminal alphabet (T) (input alphabet)
    • Non-terminal alphabet (N) (stack alphabet)
  • Grammar: 4-tuple {S, P, N, T}
    • S: starting symbol
    • P: production rules
      • R -> E
        • R can be replaced by E
        • R can derive E
        • R can produce E
        • E can reduce to R
        • E can be derived from R
    • Sentence: a string of symbols from T derived from S using P
  • Equivalence
    • Language (L(G)): set of sentences derivable using G
    • G is equivalent to G' if L(G) = L(G')

Shift-Reduce Parsing

Ambiguous Grammar

  1. Operator precedence

    • Tell compiler your precedence choice
        %left '+' '-'
        %left '*' '/'
      
    • Rewrite grammar (introduce term and factor)
        E -> E t_op E | T
        t_op -> + | -
        T -> T f_op T | F
        f_op -> * | /
        F -> int | (E)
      
  2. Operator associativity

    • Tell compiler
    • Rewrite grammar (left recursion)
        E -> E t_op T | T
        t_op -> + | -
        T -> T f_op F | F
        f_op -> * | /
        F -> int | (E)
      
  3. Dangling else: if C1 then E1 if C2 then E2 else E3

    • Tell compiler

        %nonassoc IFX
        %nonassoc ELSE
        ...stmt: ... | IF C stmt %prec IFX | IF C stmt ELSE stmt ...
      
    • Rewrite grammar

        IFE -> IFm | IFu
        IFm -> if C then IFm else IFm | E
        IFu -> if C then E | if C then IFm else IFu | E
      

Recursive Productions & Factoring

# left recursion:
X -> Xa | Xb | AB | C | DEF

# to right recursion:
X -> ABX' | CX' | DEFX'
X' -> aX' | bX' | ε
# top-down parsers can't decide which production to use:
Stmt -> if C then Stmt else Stmt | if C then Stmt | Other | ...

# factor out common prefix:
Stmt -> if C then Stmt OptElse | Other | ...
OptElse -> else S | ε

Left or Right Recursion?

  • Left recursion: bad for LL(1), good for LALR(1)
    • Keep reducing as we go along (keep the stack space small)
  • Right recursion
    • All items pushed onto stack, then start reducing

But same amount of work!

Generalized LR

S -> aSa | bSb | ε

LR parser can't tell where is the midpoint.

A GLR parser will process all possible interpretations of the given input in BFS; when there's conflict, fork 2 threads and try both.
Can be > 1 accepted path (ambiguous).

results matching ""

    No results matching ""