Content
- Overview
1. Lexical Analysis 2. Syntax Analysis 3. Semantic Analysis 4. Intermediate Code Generation 5. Object Code Generation 6. Object Code Optimization
- Multi-phase Compilers
- Interpreter
Overview
Compiler translates a source program into a functionally equivalent target program.
|Source program|
|(source code)
Lexical analysis (scanning) <- Lex <- patterns
|(token)
Syntax analysis (parsing) <- Yacc <- grammar
|(syntax tree)
Intermediate code gen
|(generated code)
|Intermediate representation|
|
IR optimization => not machine specific
|
Object code gen
|
Object code optimization => machine specific
|
|Target program|
Lexical Analysis
- Scanning; scanner; lexer; lexical analyzer
- Source program read and characters are grouped into tokens
special symbols- Identifiers, reserved words, integers, doubles/floats, delimiters, operators,
Syntax Analysis
- Parsing; parser; syntax analyzer
- Tokens grouped together using (context-free) grammar into parse tree/derivation
Semantic Analysis
- Parse tree/derivation checked for semantic errors
- undeclared variable, a function called with improper arguments, access violations, incompatible operands, type mismatches
Intermediate Code Generation
- Goal: easy to generate & easy to translate into target program
- TAC: three-address code
Intermediate Code Optimization
- Examples
- Suppressing code generation of unreachable code segments - Getting rid of unused variables - Eliminating multiplication by 1 and addition by 0 - Loop optimization (e.g. remove statements that are not modified in the loop) - Common sub-expression elimination
- Often a compiler's option
Object Code Generation
- TAC code translated into assembly or machine code
- Selecting memory locations, registers, which instructions
Object Code Optimization
- Consider H/W and make efficient use of processors & registers
- Specialized instructions, pipelining, branch prediction, and other peephole (a window of instructions) optimizations
Multi-phase Compilers
To build compilers for n
programming languages targeting m
architectures, n*m
compilers naively; n+m
components for 2-phase compiler if 1 front-end for each language, 1 back-end for each architecture.
Interpreter
- Each line is translated as it is encountered and actions carried out immediately
- Syntax tree doesn't generate code; it is processed directly to evaluate expressions and execute statements
- May process the same piece of syntax tree repeatedly (e.g. loop) -> slow down
- Easier to write
- Easier to move to a different machine
- Compilation & interpretation can be combined to implement a language e.g. JAVA