To study LEX & YACC utilities in detail.
Problem
Statement : To study LEX & YACC utilities in detail.
Theory :
A compiler or interptreter for a programminning language is often
decomposed into two parts:
- Read the source program and discover its structure.
- Process this structure, e.g. to generate the target program.
Lex and Yacc can generate program fragments that solve
the first task.
The task of
discovering the source structure again is decomposed into subtasks:
- Split the source file into tokens (Lex).
- Find the hierarchical structure of the program (Yacc).
Lex - A Lexical Analyzer Generator
Lex helps write programs whose control flow is directed by instances
of regular expressions in the input stream. It is well suited for
editor-script type transformations and for segmenting input in
preparation for a parsing routine.
Lex
source is a table of regular expressions and corresponding program
fragments. The table is translated to a program which reads an input
stream, copying it to an output stream and partitioning the input
into strings which match the given expressions. As each such string
is recognized the corresponding program fragment is executed. The
recognition of the expressions is performed by a deterministic finite
automaton generated by Lex. The program fragments written by the user
are executed in the order in which the corresponding regular
expressions occur in the input stream.
The
lexical analysis programs written with Lex accept ambiguous
specifications and choose the longest match possible at each input
point. If necessary, substantial lookahead is performed on the input,
but the input stream will be backed up to the end of the current
partition, so that the user has general freedom to manipulate it.
Introduction.
Lex is a program generator designed for lexical processing of
character input streams. It accepts a high-level, problem oriented
specification for character string matching, and produces a program
in a general purpose language which recognizes regular expressions.
The regular expressions are specified by the user in the source
specifications given to Lex. The Lex written code recognizes these
expressions in an input stream and partitions the input stream into
strings matching the expressions. At the boundaries between strings
program sections provided by the user are executed. The Lex source
file associates the regular expressions and the program fragments. As
each expression appears in the input to the program written by Lex,
the corresponding fragment is executed.
The user
supplies the additional code beyond expression matching needed to
complete his tasks, possibly including code written by other
generators. The program that recognizes the expressions is generated
in the general purpose programming language employed for the user's
program fragments. Thus, a high level expression language is provided
to write the string expressions to be matched while the user's
freedom to write actions is unimpaired. This avoids forcing the user
who wishes to use a string manipulation language for input analysis
to write processing programs in the same and often inappropriate
string handling language.
Example.
Consider copying an input file while adding 3 to every positive
number divisible by 7. Here is a suitable Lex source program
%%
int
k;
[0-9]+
{
k
= atoi(yytext);
if
(k%7 == 0)
printf("%d",
k+3);
else
printf("%d",k);
}
to do just that. The rule [0-9]+ recognizes strings of digits; atoi
converts the digits to binary and stores the result in k. The
operator % (remainder) is used to check whether k is divisible by 7;
if it is, it is incremented by 3 as it is written out. It may be
objected that this program will alter such input items as 49.63 or
X7. Furthermore, it increments the absolute value of all negative
numbers divisible by 7. To avoid this, just add a few more rules
after the active one, as here:
%%
int
k;
-?[0-9]+
{
k
= atoi(yytext);
printf("%d",
k%7
== 0 ? k+3 : k);
}
-?[0-9.]+
ECHO;
[A-Za-z][A-Za-z0-9]+
ECHO;
Numerical strings containing a ``.'' or preceded by a letter will be
picked up by one of the last two rules, and not changed. The if-else
has been replaced by a C conditional expression to save space; the
form a?b:c means ``if a then b else c''.
Yacc:
Yet Another Compiler-Compiler
Yacc provides a general
tool for describing the input to a computer program. The Yacc user
specifies the structures of his input, together with code to be
invoked as each such structure is recognized. Yacc turns such a
specification into a subroutine that handles the input process;
frequently, it is convenient and appropriate to have most of the flow
of control in the user's application handled by this subroutine.
Yacc provides a general tool for imposing structure on the input to a
computer program. The Yacc user prepares a specification of the input
process; this includes rules describing the input structure, code to
be invoked when these rules are recognized, and a low-level routine
to do the basic input. Yacc then generates a function to control the
input process. This function, called a parser, calls the
user-supplied low-level input routine (the lexical analyzer) to pick
up the basic items (called tokens) from the input stream. These
tokens are organized according to the input structure rules, called
grammar rules; when one of these rules has been recognized, then user
code supplied for this rule, an action, is invoked; actions have the
ability to return values and make use of the values of other actions.
The heart of the input
specification is a collection of grammar rules. Each rule describes
an allowable structure and gives it a name. For example, one grammar
rule might be
date :
month_name day ',' year ;
Here, date, month_name, day, and year represent structures of
interest in the input process; presumably, month_name, day, and year
are defined elsewhere. The comma ``,'' is enclosed in single quotes;
this implies that the comma is to appear literally in the input. The
colon and semicolon merely serve as punctuation in the rule, and have
no significance in controlling the input. Thus, with proper
definitions, the input
July 4,
1776
might be matched by the above rule.
Specification files
are very flexible. It is realively easy to add to the above example
the rule
date :
month '/' day '/' year ;
allowing
7 / 4 / 1776
as a synonym for
July 4, 1776
For details of LEX, refer:
http://dinosaur.compilertools.net/lex/index.html
For details of YACC , refer:
http://dinosaur.compilertools.net/yacc/index.html
Conclusion :
_________________________________________________________
Frequently
asked questions:
1.What is mean by the syntax analysis?
2. What is LEX?
3. Give any example of structure of input specified by YACC?
4. What is mean by YACC? Explain the exact function of it.
4. What is mean by YACC? Explain the exact function of it.
Comments
Post a Comment