To study the different phases of a compiler
Problem Statement : To study how an assignment statement passes through the different phases of a
compiler.
Theory : A compiler can be viewed as a single box that maps a source program into a
semantically equivalent target program. If we open up this box a little, we see that there are two
parts to this mapping: analysis and synthesis.
The analysis part breaks up the source program into constituent pieces and imposes a
grammatical structure on them. It then uses this structure to create an intermediate
representation of the source program. If the analysis part detects that the source program
is either syntactically ill-formed or semantically unsound, then it must provide
informative messages, so the user can take corrective action. The analysis part also
collects information about the source program and stores it in a data structure called a
symbol table, which is passed along with the intermediate representation to the synthesis
part.
The synthesis part constructs the desired target program from the intermediate
representation and the information in the symbol table. The analysis part is often called
the front end of the compiler; the synthesis part is the back end.
If we examine the compilation process in more detail, we see that it operates as a sequence of
phases, each of which transforms one representation of the source program to another. A typical
decomposition of a compiler into phases is shown in Fig abc. In practice, several phases may be
grouped together, and the intermediate representations between the grouped phases need not be
constructed explicitly. The symbol table, which stores information about the entire source
program, is used by all phases of the compiler.
1. Lexical Analysis
The first phase of a compiler is called lexical analysis or scanning. The lexical analyzer reads
the stream of characters making up the source program and groups the characters into
meaningful sequences called lexemes. For each lexeme, the lexical analyzer produces as
output a token of the form
(token-name, attribute-value)
that it passes on to the subsequent phase, syntax analysis. In the token, the first component
token-name is an abstract symbol that is used during syntax analysis, and the second
component attribute-value points to an entry in the symbol table for this token. Information
from the symbol-table entry is needed for semantic analysis and code generation.
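The grouping of characters into tokens can be sketched in Python as follows. This is only an illustration under stated assumptions: the toy token classes in TOKEN_SPEC, the function name tokenize, and the sample statement are hypothetical and not taken from any particular compiler. In this sketch only identifiers point into the symbol table; numbers carry their value and operators have no attribute.

```python
import re

# Hypothetical token classes for a toy language (an assumption for illustration).
TOKEN_SPEC = [
    ("number", r"\d+"),
    ("id",     r"[A-Za-z_]\w*"),
    ("op",     r"[=+*]"),
    ("skip",   r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(source):
    """Return (tokens, symbol_table) for a source string."""
    symbol_table = []   # entry i-1 holds the lexeme of the i-th distinct identifier
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        kind, lexeme = match.lastgroup, match.group()
        pos = match.end()
        if kind == "skip":
            continue                      # whitespace produces no token
        if kind == "id":
            if lexeme not in symbol_table:
                symbol_table.append(lexeme)
            # attribute-value: 1-based position of the identifier in the symbol table
            tokens.append(("id", symbol_table.index(lexeme) + 1))
        elif kind == "number":
            tokens.append(("number", int(lexeme)))
        else:
            tokens.append((lexeme, None)) # operators need no attribute
    return tokens, symbol_table

tokens, symbols = tokenize("position = initial + rate * 60")
```

Here tokens is a list of (token-name, attribute-value) pairs and symbols is the list of identifier lexemes that the id attributes refer to.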
2. Syntax Analysis
The second phase of the compiler is syntax analysis or parsing. The parser uses the first
components of the tokens produced by the lexical analyzer to create a tree-like intermediate
representation that depicts the grammatical structure of the token stream. A typical
representation is a syntax tree in which each interior node represents an operation and the
children of the node represent the arguments of the operation.
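One way to build such a syntax tree is a small recursive-descent parser. The sketch below assumes a toy grammar (an assignment whose right side uses only + and *, with * binding tighter) and takes the tokens as plain strings for brevity; parse_assignment and the nested-tuple tree shape are hypothetical choices made for this illustration.

```python
def parse_assignment(tokens):
    """Parse `id = expr` from a list of token strings into a nested-tuple syntax tree."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = peek()
        if token is None:
            raise SyntaxError("unexpected end of input")
        pos += 1
        return token

    def factor():
        token = advance()
        if token in ("=", "+", "*"):
            raise SyntaxError(f"expected operand, found {token!r}")
        return token                      # leaf: identifier or number

    def term():
        node = factor()
        while peek() == "*":
            advance()
            node = ("*", node, factor())  # interior node: operation and its arguments
        return node

    def expr():
        node = term()
        while peek() == "+":
            advance()
            node = ("+", node, term())
        return node

    target = factor()
    if advance() != "=":
        raise SyntaxError("expected '='")
    tree = ("=", target, expr())
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree

tree = parse_assignment(["position", "=", "initial", "+", "rate", "*", "60"])
```

Each interior node is a tuple whose first element is the operation and whose remaining elements are its children, so the multiplication ends up below the addition in the tree.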
3. Semantic Analysis
The semantic analyzer uses the syntax tree and the information in the symbol table to check
the source program for semantic consistency with the language definition. It also gathers type
information and saves it in either the syntax tree or the symbol table, for subsequent use
during intermediate-code generation.
An important part of semantic analysis is type checking, where the compiler checks that each
operator has matching operands. For example, many programming language definitions
require an array index to be an integer; the compiler must report an error if a floating-point
number is used to index an array.
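Type checking over a syntax tree might look like the following sketch. It assumes a toy language with only "int" and "float" types, a dictionary standing in for the symbol table, and a hypothetical "index" node for array access; check_types and the inserted inttofloat node are illustrative names. When an integer operand meets a floating-point operand, the sketch coerces the integer rather than reporting an error.

```python
def check_types(node, types):
    """Return (typed_tree, type) for a nested-tuple expression tree.

    `types` maps identifier names to "int" or "float" and stands in for the
    symbol table. For an array name it records the element type (a simplification).
    """
    if isinstance(node, int):
        return node, "int"
    if isinstance(node, float):
        return node, "float"
    if isinstance(node, str):
        if node not in types:
            raise TypeError(f"undeclared identifier {node!r}")
        return node, types[node]

    op, left, right = node
    left, left_type = check_types(left, types)
    right, right_type = check_types(right, types)

    if op == "index":
        # Array indices must be integers in this toy language.
        if right_type != "int":
            raise TypeError("array index must be an integer")
        return (op, left, right), left_type

    if left_type != right_type:
        # Mixed int/float arithmetic: convert the integer operand to float.
        if left_type == "int":
            left = ("inttofloat", left)
        else:
            right = ("inttofloat", right)
        return (op, left, right), "float"
    return (op, left, right), left_type

typed, result_type = check_types(
    ("+", "initial", ("*", "rate", 60)),
    {"initial": "float", "rate": "float"},
)
```

Calling check_types(("index", "a", 1.5), {"a": "int"}) raises TypeError, which corresponds to the array-index error described above.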
4. Intermediate Code Generation
In the process of translating a source program into target code, a compiler may construct one
or more intermediate representations, which can have a variety of forms. Syntax trees are a
form of intermediate representation; they are commonly used during syntax and semantic
analysis. After syntax and semantic analysis of the source program, many compilers generate
an explicit low-level or machine-like intermediate representation, which we can think of as a
program for an abstract machine. This intermediate representation should have two important
properties: it should be easy to produce and it should be easy to translate into the target
machine.
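A common machine-like intermediate form is three-address code, in which each instruction has at most one operator on its right side. The sketch below flattens a nested-tuple syntax tree into such instructions; generate_tac, the temporary names t1, t2, ..., and the input tree are assumptions made for illustration.

```python
from itertools import count

def generate_tac(tree):
    """Flatten a nested-tuple tree for `target = expr` into three-address code."""
    code = []
    temp_ids = count(1)

    def emit(node):
        if not isinstance(node, tuple):
            return str(node)              # leaf: identifier or constant
        op, *operands = node
        # Emit code for the operands first so temporaries are numbered in order.
        args = [emit(operand) for operand in operands]
        temp = f"t{next(temp_ids)}"
        if len(args) == 1:
            code.append(f"{temp} = {op}({args[0]})")
        else:
            code.append(f"{temp} = {args[0]} {op} {args[1]}")
        return temp

    _, target, expr = tree
    result = emit(expr)
    code.append(f"{target} = {result}")
    return code

tac = generate_tac(
    ("=", "position", ("+", "initial", ("*", "rate", ("inttofloat", 60))))
)
for line in tac:
    print(line)
```

Every subexpression receives its own temporary, which makes the form easy to produce from a tree and straightforward to translate later into machine instructions.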
5. Code Optimization
The machine-independent code-optimization phase attempts to improve the intermediate
code so that better target code will result. Usually better means faster, but other objectives
may be desired, such as shorter code, or target code that consumes less power. A simple
intermediate code generation algorithm followed by code optimization is a reasonable way to
generate good target code. For example, if the intermediate code for an assignment applies an
inttofloat operation to the integer constant 60, the optimizer can deduce that the conversion
can be done once and for all at compile time, so the inttofloat operation can
be eliminated by replacing the integer 60 by the floating-point number 60.0.
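This kind of compile-time conversion can be sketched as a single pass over the intermediate code. The tuple layout (dest, op, arg1, arg2), the function name fold_inttofloat, and the sample instructions are hypothetical; a real optimizer performs many more transformations than this one.

```python
def fold_inttofloat(code):
    """Remove `t = inttofloat(<integer constant>)` by using the float constant directly."""
    constants = {}      # temporary name -> floating-point constant computed at compile time
    optimized = []
    for dest, op, arg1, arg2 in code:
        # Substitute any temporary whose value is already known.
        arg1 = constants.get(arg1, arg1)
        arg2 = constants.get(arg2, arg2)
        if op == "inttofloat" and isinstance(arg1, int):
            constants[dest] = float(arg1)   # convert now and emit no instruction
            continue
        optimized.append((dest, op, arg1, arg2))
    return optimized

code = [
    ("t1", "inttofloat", 60, None),
    ("t2", "*", "rate", "t1"),
    ("t3", "+", "initial", "t2"),
    ("position", "=", "t3", None),
]
optimized = fold_inttofloat(code)
```

After the pass, the first instruction is gone and the multiplication uses 60.0 directly, so the conversion costs nothing at run time.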
6. Code Generation
The code generator takes as input an intermediate representation of the source program and
maps it into the target language. If the target language is machine code, registers or memory
locations are selected for each of the variables used by the program. Then, the intermediate
instructions are translated into sequences of machine instructions that perform the same task.
A crucial aspect of code generation is the judicious assignment of registers to hold variables.
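The translation step can be sketched for a hypothetical register machine as follows. The mnemonics LDF, MULF, ADDF and STF, the four registers, the instruction tuple layout, and the function generate_code are all assumptions for illustration. The register assignment is deliberately naive: registers are never released, so the sketch only handles short instruction sequences.

```python
def generate_code(code):
    """Translate (dest, op, arg1, arg2) tuples into assembly-like text."""
    opcodes = {"+": "ADDF", "*": "MULF"}
    location = {}                       # temporary name -> register holding its value
    free = ["R1", "R2", "R3", "R4"]     # naive pool: registers are never given back
    output = []

    def in_register(arg):
        """Return a register holding `arg`, loading it first if necessary."""
        if arg in location:
            return location[arg]
        reg = free.pop(0)
        source = f"#{arg}" if isinstance(arg, float) else arg
        output.append(f"LDF {reg}, {source}")
        return reg

    for dest, op, arg1, arg2 in code:
        if op == "=":
            reg = in_register(arg1)
            output.append(f"STF {dest}, {reg}")
            continue
        left = in_register(arg1)
        # A floating-point constant can be used as an immediate second operand.
        right = f"#{arg2}" if isinstance(arg2, float) else in_register(arg2)
        output.append(f"{opcodes[op]} {left}, {left}, {right}")
        location[dest] = left           # the result stays in the left operand's register

    return output

assembly = generate_code([
    ("t2", "*", "rate", 60.0),
    ("t3", "+", "initial", "t2"),
    ("position", "=", "t3", None),
])
```

Keeping each temporary in a register avoids storing intermediate results to memory and loading them back, which is the point of assigning registers carefully.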
7. Symbol-Table Management
An essential function of a compiler is to record the variable names used in the source
program and collect information about various attributes of each name. These attributes may
provide information about the storage allocated for a name, its type, its scope (where in the
program its value may be used), and in the case of procedure names, such things as the
number and types of its arguments, the method of passing each argument (for example, by
value or by reference), and the type returned.
The symbol table is a data structure containing a record for each variable name, with fields
for the attributes of the name. The data structure should be designed to allow the compiler to
find the record for each name quickly and to store or retrieve data from that record quickly.
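A minimal sketch of such a data structure, assuming a single flat scope and a hash table (a Python dict) for fast lookup by name. The class names SymbolEntry and SymbolTable and the chosen attribute fields are hypothetical; real compilers usually keep more attributes and handle nested scopes.

```python
from dataclasses import dataclass, field

@dataclass
class SymbolEntry:
    """One record per name, with fields for its attributes."""
    name: str
    type: str
    scope: str = "global"
    param_types: list = field(default_factory=list)   # used only for procedure names

class SymbolTable:
    def __init__(self):
        self._entries = {}              # name -> SymbolEntry

    def insert(self, name, type, **attributes):
        """Add a record for `name`; declaring a name twice is reported as an error."""
        if name in self._entries:
            raise KeyError(f"{name!r} is already declared")
        entry = SymbolEntry(name, type, **attributes)
        self._entries[name] = entry
        return entry

    def lookup(self, name):
        """Return the record for `name`, or None if it was never declared."""
        return self._entries.get(name)

table = SymbolTable()
table.insert("rate", "float")
table.insert("area", "float", param_types=["float", "float"])
```

Because the records are stored in a dict, finding a name and reading or updating its attributes takes constant time on average.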
Conclusion : _________________________________________________________
Frequently asked questions:
1. What are compilers?
2. What is a language processor?
3. What is an interpreter? How does it differ from a compiler?
4. What are the different phases of a compiler?