Content

  1. Overview
  2. Scanner Implementations
  3. Regular Expressions
                          1. RE Ambiguities
                          2. Finite Automata
                                                       1. RE -> NFA
                                                       2. NFA -> DFA
                                                                                    1. Error States
                                                       3. DFA -> RE
                                                                                    1. Transitive Closure Method
                                                                                    2. State Removal Method
                                                                                    3. Algebraic Method (Brzozowski)
                          4. Limitations
    

Overview

source language --> |lexical analyzer| --> token stream
                             |
                           error

Scanner Implementations

  1. Loop & switch (handwritten)
                          - Efficient
                          - Hard to change
    
  2. Regular expressions + finite automata
                          - _flex_ can generate scanner from RE
    

Regular Expressions

  1. Epsilon ε: {""}
  2. Symbol a: {"a"}
  3. Concatenation AB: {rs|r∈A, s∈B}
  4. Alternation A|B: {A∪B}
  5. Repetition A*: {ε|A|(AA)|...}

RE Ambiguities

  • Disambiguation rules
                           - __Longest match__ wins
                           - Otherwise, __rule listed first__ wins
    
1. for - keyword = for
2. letter = [a-zA-Z]
3. digit = [0-9]
4. identifier = letter(letter|digit)*
5. sign = +|-|ε
6. integer = sign(0|[1-9]digit*)
7. decimal = integer.digit*
8. real = (integer|decimal)E sign digit+

Finite Automata

  • Formal language theory

                           - `Language Spec <-> RE <-> NFA <-> ε-NFA <-> DFA <-> mDFA`
    
  • Non-deterministic Finite Automata (NFA): more than 1 next-state for a given symbol

                           - __ε-NFA__
    
  • Deterministic Finite Automata (DFA): at most 1 transition from 1 state to another for a given symbol
                           - __Minimal DFA__
                           - __Transition table__: `T[s;c]` for a state `s` and a character `c`
    

RE -> NFA

Thompson's Procedure

  • Linear transition (not the other way round)
  • NFAs & REs difficult to program

re to nfa I re to nfa II

NFA -> DFA

  • Closure: determining a subset of states in NFA forming a state in DFA
  • Any DFA state containing a final NFA state is a final state
  • Many non-deterministic paths -> more combined states
  • Every RE has a unique minimal DFA -> compare and see if REs are equivalent
                           1. __Group accepting states__ together, others together.
                           2. __Pick unmarked group__ `G` and check consistency. 
                                                        - Consistent: mark
                                                        - Inconsistent: __split__ into maximal consistent subgroups and replace `G` by these. Unmark all groups.
                           3. Do until no unmarked groups.
    
Error States

If not all possible state transitions are given, then incomplete -> can use error state

DFA -> RE

Language spec to RE not always easy, sometimes to DFA easier, but programming languages only accept REs.

Examples

Transitive Closure Method
  • Clear & simple
  • Tedious; creates long REs
  1. Q = {q1, q2, ...} be set of automation states
  2. R_ij = set of all strings transit from qi to qj
  3. RE = union of all R_sf, qs = starting state, qf = 1 of final states
R_k_ij: set of all strings that transit from qi to qj without passing though any state higher than qk

R_1_ij,R_2_ij,...,R_|Q|_ij = R_ij

R_k_ij = R_(k-1)_ij 
       + R_(k-1)_ik (R_(k-1)_kk)* R_(k-1)_kj

R_0_ij = r     if i!=j and r transitions from qi to qj
         r + ε if i==j and r transitions from qi to qj
         ∅     otherwise

State Removal Method
  • Intuitive, useful
  • Not straightforward to implement
  1. Unify all final states into a single one using ε-trans
  2. Unify all multi-transitions into a single one that contains union of inputs
  3. Remove states & change transitions accordingly, until only starting & final states
  4. Get resulting RE by direct calculation

Algebraic Method (Brzozowski)
  • Elegant
  • Creates compact REs

  • Create a system of REs with 1 unknown for each state and solve for R1

                           - For each state `qi`, `Ri` is a __union of terms__
                           - `aRj`: transition `a` from `qi` to `qj`
                           - `ε`: if `Ri` is final state 
    
  • Arden’s Lemma
                           - `E = α + Eβ = α(β)*`, where `α`, `β` are REs
    

Limitations

  • RE describes only regular languages
  • RE <-> finite automaton, which only has a finite number of states
  • Nested structures (e.g. parentheses, comments, etc.) need a stack of sufficiently large size for processing
  • Start states? -> flex/lex provides

results matching ""

    No results matching ""