
Lexical Analysis

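The role of the lexical analyzer

[Figure: the source program feeds the lexical analyzer, which returns a token to the parser each time the parser calls getNextToken. The parser's output goes on to semantic analysis. Both the lexical analyzer and the parser consult the symbol table.]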

Why separate lexical analysis and parsing?
1. Simplicity of design
2. Improving compiler efficiency
3. Enhancing compiler portability

Tokens, Patterns and Lexemes


A token is a pair: a token name and an optional token value.
A pattern is a description of the form that the lexemes of a token may take.
A lexeme is a sequence of characters in the source program that matches the pattern for a token.

Example

Token        Informal description                     Sample lexemes
if           Characters i, f                          if
else         Characters e, l, s, e                    else
comparison   < or > or <= or >= or == or !=           <=, !=
id           Letter followed by letters and digits    pi, score, D2
number       Any numeric constant                     3.14159, 0, 6.02e23
literal      Anything but ", surrounded by "s         "core dumped"

In the C statement printf("total = %d\n", score); both printf and score are lexemes for the token id, and "total = %d\n" is a lexeme for the token literal.
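The relationship between tokens, patterns and lexemes can be sketched in a few lines of Python. This is only an illustration: the table TOKEN_PATTERNS and the function classify are hypothetical names, and the regular expressions are one possible reading of the informal descriptions above (the literal pattern assumes double-quoted strings).

```python
import re

# Hypothetical pattern table: token name -> regular expression.
# Order matters: the keywords are tried before the more general id pattern.
TOKEN_PATTERNS = [
    ("if", r"if"),
    ("else", r"else"),
    ("comparison", r"<=|>=|==|!=|<|>"),
    ("number", r"\d+(?:\.\d+)?(?:[eE][+-]?\d+)?"),
    ("id", r"[A-Za-z][A-Za-z0-9]*"),
    ("literal", r'"[^"]*"'),
]


def classify(lexeme):
    """Return the name of the first token whose pattern matches the whole lexeme."""
    for name, pattern in TOKEN_PATTERNS:
        if re.fullmatch(pattern, lexeme):
            return name
    return None  # the lexeme matches no pattern in this sketch


if __name__ == "__main__":
    for lexeme in ["if", "score", "D2", "6.02e23", "<=", '"core dumped"']:
        print(lexeme, "->", classify(lexeme))
```

Each sample lexeme from the table is mapped back to its token name. A real lexical analyzer would also have to find where each lexeme ends in the input, which this sketch leaves out.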

Attributes for tokens


E = M * C ** 2

<id, pointer to symbol table entry for E>
<assign-op>
<id, pointer to symbol table entry for M>
<mult-op>
<id, pointer to symbol table entry for C>
<exp-op>
<number, integer value 2>
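As a rough sketch of how such a token stream with attributes could be produced, the Python below scans the statement and records each identifier in a list that stands in for the symbol table. The function name tokenize_with_attributes is hypothetical, and using a list index in place of a "pointer to symbol table entry" is an assumption made for illustration.

```python
import re

# One lexeme per match: an identifier, an integer, or one of the three operators.
# "**" is listed before "*" so that the longer operator wins.
LEXEME = re.compile(r"\s*(?:(?P<id>[A-Za-z]\w*)|(?P<number>\d+)|(?P<op>\*\*|\*|=))")
OPERATOR_NAMES = {"=": "assign-op", "*": "mult-op", "**": "exp-op"}


def tokenize_with_attributes(source):
    """Return (tokens, symbol_table) for a tiny expression language."""
    symbol_table = []  # one entry (the name) per distinct identifier
    tokens = []
    source = source.rstrip()
    pos = 0
    while pos < len(source):
        m = LEXEME.match(source, pos)
        if m is None:
            raise ValueError(f"unexpected character at position {pos}")
        pos = m.end()
        if m.group("id"):
            name = m.group("id")
            if name not in symbol_table:
                symbol_table.append(name)
            # The attribute is the index of the symbol table entry.
            tokens.append(("id", symbol_table.index(name)))
        elif m.group("number"):
            tokens.append(("number", int(m.group("number"))))
        else:
            tokens.append((OPERATOR_NAMES[m.group("op")], None))
    return tokens, symbol_table


if __name__ == "__main__":
    tokens, table = tokenize_with_attributes("E = M * C ** 2")
    print(tokens)
    print(table)
```

Operators carry no attribute here (None), matching the tokens above that consist of a name only.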

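Input buffering

Sometimes the lexical analyzer needs to look ahead some symbols to decide which token to return.
In C: after reading -, = or <, we need to look at the next character to decide which token to return.
In Fortran: DO 5 I = 1.25 is an assignment to the variable DO5I, while DO 5 I = 1,25 begins a DO loop. The two cannot be told apart until the . or the , is reached.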
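The C case can be illustrated with one character of lookahead. The sketch below is not a complete operator scanner: the function scan_operator and the token names (LE, SHL, ARROW and so on) are hypothetical, and three-character operators such as <<= are deliberately ignored.

```python
def scan_operator(text, pos):
    """Return (token_name, next_pos) for an operator starting at text[pos]."""
    ch = text[pos]
    # One character of lookahead; "" stands for end of input.
    nxt = text[pos + 1] if pos + 1 < len(text) else ""
    if ch == "<":
        if nxt == "=":
            return "LE", pos + 2
        if nxt == "<":
            return "SHL", pos + 2
        return "LT", pos + 1
    if ch == "=":
        return ("EQ", pos + 2) if nxt == "=" else ("ASSIGN", pos + 1)
    if ch == "-":
        if nxt == "-":
            return "DEC", pos + 2
        if nxt == ">":
            return "ARROW", pos + 2
        if nxt == "=":
            return "MINUS_ASSIGN", pos + 2
        return "MINUS", pos + 1
    raise ValueError(f"not an operator this sketch handles: {ch!r}")


if __name__ == "__main__":
    print(scan_operator("a<=b", 1))
    print(scan_operator("p->x", 1))
```

When the lookahead character does not extend the operator, it is left unconsumed (next_pos advances by one only), so it can start the next token.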

The lexical analyzer reads its input from an input buffer.

Scheme used to buffer input

The buffer is divided into two halves.
One pointer marks the beginning of the token.
A lookahead pointer scans ahead until the token is discovered.
Lookahead can be large:
Declare(a1, a2, a3, a4) in a PL/I program: is Declare a keyword or an array name?

[Figure: the two-half buffer, with one pointer at the token beginning and the lookahead pointer further along.]

Two buffer scheme

To handle large look-aheads safely

switch (*lookahead++)
{
    case eof:   /* sentinel stored at the end of each buffer */
        if (lookahead is at end of first buffer)
        {
            reload second buffer;
            lookahead = beginning of second buffer;
        }
        else if (lookahead is at end of second buffer)
        {
            reload first buffer;
            lookahead = beginning of first buffer;
        }
        else    /* eof inside a buffer: end of input */
            terminate lexical analysis;
        break;
    cases for the other characters;
}
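A runnable approximation of the two-buffer idea is sketched below in Python. It assumes a sentinel-based layout (each half is followed by a sentinel character), which is one common way to implement the scheme and not necessarily the one intended on the slide. The class name TwoBufferReader, the NUL sentinel and the tiny half size are all assumptions chosen to keep the example short. Tracking of the token-beginning pointer is omitted.

```python
import io

EOF = "\0"  # sentinel character; assumes the source text never contains NUL


class TwoBufferReader:
    def __init__(self, stream, half_size=4):
        self.stream = stream
        self.n = half_size
        # Layout: [half 0][sentinel][half 1][sentinel]
        self.buf = [EOF] * (2 * half_size + 2)
        self._load(0)
        self.forward = 0  # the lookahead pointer

    def _load(self, half):
        start = half * (self.n + 1)
        chunk = self.stream.read(self.n)
        for i in range(self.n):
            # A short read leaves EOF inside the half, which marks end of input.
            self.buf[start + i] = chunk[i] if i < len(chunk) else EOF

    def next_char(self):
        """Return the next character, or None at end of input."""
        while True:
            ch = self.buf[self.forward]
            if ch != EOF:
                self.forward += 1
                return ch
            if self.forward == self.n:            # sentinel ending the first half
                self._load(1)
                self.forward = self.n + 1
            elif self.forward == 2 * self.n + 1:  # sentinel ending the second half
                self._load(0)
                self.forward = 0
            else:                                 # EOF inside a half: end of input
                return None


if __name__ == "__main__":
    reader = TwoBufferReader(io.StringIO("Declare(a1,a2,a3,a4)"), half_size=4)
    print("".join(iter(reader.next_char, None)))
```

The common case (an ordinary character) costs a single comparison against the sentinel. The buffer-end tests run only when a sentinel is actually read.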

Specification of tokens

Regular expressions are used to formalize the specification of tokens.
Regular expressions are a means of specifying regular languages.
Example:
Identifier = letter (letter | digit)*
Keyword = begin | end | if | then | else
Constant = digit+
Relop = < | <= | = | <> | > | >=
Each regular expression is a pattern specifying the form of strings.
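These definitions translate almost directly into Python's re syntax. The sketch below combines them into one master pattern and scans a string left to right. The names TOKEN_SPEC, MASTER and scan are hypothetical, and treating keywords as identifiers that happen to appear in a reserved set is a design choice of this sketch, not something the definitions above require.

```python
import re

KEYWORDS = {"begin", "end", "if", "then", "else"}

TOKEN_SPEC = [
    ("constant", r"[0-9]+"),                  # Constant = digit+
    ("identifier", r"[A-Za-z][A-Za-z0-9]*"),  # Identifier = letter (letter | digit)*
    ("relop", r"<=|<>|>=|<|=|>"),             # longer operators are listed first
    ("skip", r"\s+"),                         # whitespace between tokens
]
MASTER = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC))


def scan(text):
    """Return the list of (token name, lexeme) pairs found in text."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"illegal character {text[pos]!r} at {pos}")
        pos = m.end()
        kind, lexeme = m.lastgroup, m.group()
        if kind == "skip":
            continue
        if kind == "identifier" and lexeme in KEYWORDS:
            kind = "keyword"  # reserved words match the identifier pattern too
        tokens.append((kind, lexeme))
    return tokens


if __name__ == "__main__":
    print(scan("if x1 <= 42 then y"))
```

Listing <= and <> before < matters because Python's alternation takes the first alternative that matches, not the longest one.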

Token recognition

Transition diagrams (a kind of flowchart) are used to recognize tokens.

[Figures: transition diagrams for reserved words and identifiers (keywords, identifiers), for constants, and for relational operators.]

Code for the transition diagram for identifiers

State 9:  C = Getchar();
          if Letter(C) then goto state 10
          else Fail();

Letter(): procedure which returns true iff C is a letter.
Fail(): routine which retracts the lookahead pointer and starts the next transition diagram.

State 10: C = Getchar();
          if Letter(C) or Digit(C) then goto state 10
          else if Delimiter(C) then goto state 11
          else Fail();

Delimiter(): procedure that returns true whenever C is a character that could follow an identifier.

State 11: Retract();
          return (id, Install());
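The three states can be written as an explicit state machine. The Python below is a sketch under several assumptions: the set of delimiter characters is invented for illustration, end of input is treated as a delimiter, Install() (entering the lexeme in the symbol table) is left out, and returning None plays the role of Fail().

```python
def is_delimiter(c):
    # Assumption: end of input (""), whitespace and some punctuation
    # are the characters that may follow an identifier.
    return c == "" or c.isspace() or c in "+-*/=<>(),;"


def scan_identifier(text, start):
    """Return (lexeme, next_pos) if an identifier starts at text[start], else None."""
    state = 9
    pos = start
    while True:
        c = text[pos] if pos < len(text) else ""  # Getchar(); "" means end of input
        pos += 1
        if state == 9:
            if c.isalpha():
                state = 10
            else:
                return None  # Fail(): the caller retries from `start` with another diagram
        elif state == 10:
            if c.isalnum():
                state = 10
            elif is_delimiter(c):
                state = 11
            else:
                return None  # Fail()
        if state == 11:
            pos -= 1  # Retract(): the delimiter is not part of the lexeme
            return text[start:pos], pos


if __name__ == "__main__":
    print(scan_identifier("score = 1", 0))
    print(scan_identifier("2x", 0))
```

Note that str.isalpha and str.isalnum also accept non-ASCII letters and digits. A scanner for a specific language would narrow these tests to its own character set.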

Regular expressions = specification
Finite automata = implementation

A finite automaton consists of:
An input alphabet Σ
A set of states S
A start state n
A set of accepting states F ⊆ S
A set of transitions: state --input--> state
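The five components map naturally onto a small data structure. The sketch below is a minimal deterministic version in Python. The class name FiniteAutomaton and the example automaton for digit+ are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class FiniteAutomaton:
    alphabet: set
    states: set
    start: object
    accepting: set
    transitions: dict  # (state, input symbol) -> next state

    def accepts(self, text):
        """Run the automaton on text and report whether it ends in an accepting state."""
        state = self.start
        for symbol in text:
            key = (state, symbol)
            if key not in self.transitions:
                return False  # no transition defined for this input: reject
            state = self.transitions[key]
        return state in self.accepting


# Example: an automaton for digit+ (one or more digits).
DIGITS = set("0123456789")
constant = FiniteAutomaton(
    alphabet=DIGITS,
    states={0, 1},
    start=0,
    accepting={1},
    transitions={(s, d): 1 for s in (0, 1) for d in DIGITS},
)

if __name__ == "__main__":
    print(constant.accepts("2024"), constant.accepts(""), constant.accepts("12a"))
```

The regular expression digit+ is the specification. The table of transitions is what the implementation actually executes.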

A transition diagram is a finite automaton

Nondeterministic Finite Automaton (NFA)
A set of states
A set of input symbols
A transition function, move(), that maps state-symbol pairs to sets of states
A start state S0
A set of states F as accepting (final) states

[Figure: transition diagram of the NFA, with the start arrow entering state 0 and edges labeled a and b.]

NFA recognizing the language (a | b)*abb
The set of states = {0, 1, 2, 3}
Input symbols = {a, b}
Start state is 0; accepting state is 3
The language defined by an NFA is the set of strings it accepts

Transition Function

The transition function can be implemented as a transition table.

State    Input symbol
         a        b
0        {0,1}    {0}
1        --       {2}
2        --       {3}
3        --       --
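A transition table of this kind can drive a direct simulation of the NFA by tracking the set of states it could be in after each input symbol. The Python sketch below encodes the table for (a | b)*abb as a dictionary. The names MOVE and nfa_accepts are hypothetical, and a missing dictionary entry stands for an empty table cell.

```python
# Transition table of the NFA for (a | b)*abb; a missing entry means "no move".
MOVE = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "b"): {2},
    (2, "b"): {3},
}
START, ACCEPTING = 0, {3}


def nfa_accepts(text):
    """Simulate the NFA on text by following all possible moves at once."""
    current = {START}
    for symbol in text:
        following = set()
        for state in current:
            following |= MOVE.get((state, symbol), set())
        current = following
    # Accept if at least one reachable state is an accepting state.
    return bool(current & ACCEPTING)


if __name__ == "__main__":
    for s in ["abb", "aababb", "ab", "abba"]:
        print(s, nfa_accepts(s))
```

Because there are no ε-moves in this NFA, no closure step is needed between symbols.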

Converting a RE to an Automaton

We can convert a RE to an NFA by inductive construction:
start with a simple basis and use it to build more complex parts of the NFA.

RE to NFA

Basis:
R = a
R = ε

Induction:
R = S + T
R = ST
R = S*

[Figures: the NFA fragment for each case.]

RE to ε-NFA Example

Convert R = (ab+a)* to an NFA.
We proceed in stages, starting from simple elements and working our way up:
a
b
ab
ab+a
(ab+a)*

[Figures: the NFA built at each stage.]
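The staged construction can be mirrored in code. The Python below is a sketch of one common variant of the inductive construction, in which fragments are joined with ε-moves. The slide figures are not available, so the exact shape of each fragment is an assumption, as are all names (Fragment, symbol, union, concat, star, accepts).

```python
import itertools

EPS = None                 # label used for ε-moves
_ids = itertools.count()   # source of fresh state numbers


class Fragment:
    """An NFA piece with one start state, one accepting state, and an edge list."""
    def __init__(self, start, accept, edges):
        self.start, self.accept, self.edges = start, accept, edges


def symbol(a):             # basis: R = a
    s, f = next(_ids), next(_ids)
    return Fragment(s, f, [(s, a, f)])


def epsilon():             # basis: R = ε
    s, f = next(_ids), next(_ids)
    return Fragment(s, f, [(s, EPS, f)])


def union(x, y):           # R = S + T
    s, f = next(_ids), next(_ids)
    edges = x.edges + y.edges + [(s, EPS, x.start), (s, EPS, y.start),
                                 (x.accept, EPS, f), (y.accept, EPS, f)]
    return Fragment(s, f, edges)


def concat(x, y):          # R = ST
    return Fragment(x.start, y.accept, x.edges + y.edges + [(x.accept, EPS, y.start)])


def star(x):               # R = S*
    s, f = next(_ids), next(_ids)
    edges = x.edges + [(s, EPS, x.start), (x.accept, EPS, f),
                       (x.accept, EPS, x.start), (s, EPS, f)]
    return Fragment(s, f, edges)


def closure(states, edges):
    """All states reachable from `states` using ε-moves only."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for src, label, dst in edges:
            if src == q and label is EPS and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen


def accepts(nfa, text):
    current = closure({nfa.start}, nfa.edges)
    for ch in text:
        moved = {dst for src, label, dst in nfa.edges if src in current and label == ch}
        current = closure(moved, nfa.edges)
    return nfa.accept in current


# (ab+a)* built in stages: a, b, ab, ab+a, (ab+a)*
ab = concat(symbol("a"), symbol("b"))
ab_or_a = union(ab, symbol("a"))
nfa = star(ab_or_a)

if __name__ == "__main__":
    for s in ["", "a", "ab", "aab", "abb", "ba"]:
        print(repr(s), accepts(nfa, s))
```

Each constructor adds at most two new states, so the size of the resulting NFA grows linearly with the size of the regular expression.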
