
Chittagong University of Engineering and Technology

Implementation of Symbol Table Using Flex on Unix Environment

March 21, 2016

Author:
Maliha Momtaz Islam
ID-1104081

Supervisor:
Mr. Safayet Arefin Fahim
Lecturer

Prof. Dr. M. Moshiul Hoque
Head of the Department

Contents
1 Introduction
2 Scanner
3 Symbol Table
4 Implementation of Symbol table using Lex Tool
  4.1 Lex
  4.2 Regular expressions in Lex
  4.3 Defining regular expressions in Lex
  4.4 Programming in Lex
  4.5 Global C and Lex declarations
  4.6 Lex rules for matching patterns
5 Source Code For Lex File scan.l
6 Source Code For Symbol Table C Code scan.c
7 Source Code For Defining Customized Header File Symtab.h
8 Sample Input
9 Sample Output
10 Conclusion

1 Introduction

A compiler is a translator whose source language is a high-level language and
whose object language is close to the machine language of an actual computer.
A typical compiler consists of several phases, each of which passes its output to
the next phase.
The lexical phase (scanner) groups characters into lexical units, or tokens. Its
input is a character stream and its output is a stream of tokens.
Regular expressions define the tokens recognized by a scanner (or lexical
analyzer), and the scanner itself is implemented as a finite state machine.
Lex and Flex are tools for generating scanners: programs that recognize
lexical patterns in text. Flex is a faster version of Lex.
The Unix utility lex reads a file of characters and uses regular expression matching,
typically to tokenize the contents of the file, although many other applications are
possible. In this report we implement a symbol table with the Lex tool.

2 Scanner

A scanner (lexical analyzer) is a program which recognizes patterns in text. Scanners
may be hand-written or generated automatically by a lexical analyzer generator from
descriptions of the patterns to be recognized. The descriptions are in the form of
regular expressions.

3 Symbol Table

An essential function of a compiler is to record the identifiers used in the source
program and to collect information about the attributes of each identifier. A symbol
table is a data structure containing a record for each identifier, with fields for the
identifier's attributes. The data structure lets us find the record for each
identifier quickly, and store or retrieve data from that record quickly. When
the lexical analyzer detects an identifier in the source program, the identifier is
entered into the symbol table. A common implementation technique is to use a hash
table. A compiler may use one large symbol table for all symbols, or separate,
hierarchical symbol tables for different scopes. Trees, linear lists and
self-organizing lists can also be used to implement a symbol table. The table also
simplifies the classification of literals in tabular format. The symbol table is accessed
by most phases of a compiler, from lexical analysis through optimization.
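
As an illustration of the hierarchical alternative mentioned above, the following is a minimal sketch in C, assuming each scope is a small linear list linked to its enclosing scope. It is not the implementation used in this report (which uses a single hash table); struct scope, scope_init, scope_declare and scope_lookup are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SYMS 16   /* arbitrary per-scope capacity for this sketch */

/* One scope: a small linear list of names plus a link to the enclosing scope. */
struct scope {
    const char   *names[MAX_SYMS];
    int           types[MAX_SYMS];
    int           count;
    struct scope *parent;   /* NULL for the outermost (global) scope */
};

/* Start an empty scope nested inside parent. */
void scope_init(struct scope *s, struct scope *parent)
{
    assert(s != NULL);
    s->count = 0;
    s->parent = parent;
}

/* Declare name in scope s; returns 0 on success, -1 if the scope is full. */
int scope_declare(struct scope *s, const char *name, int type)
{
    assert(s != NULL && name != NULL);
    if (s->count >= MAX_SYMS) return -1;
    s->names[s->count] = name;
    s->types[s->count] = type;
    s->count++;
    return 0;
}

/* Search s first, then each enclosing scope; returns the type or -1. */
int scope_lookup(const struct scope *s, const char *name)
{
    assert(name != NULL);
    for (; s != NULL; s = s->parent) {
        for (int i = s->count - 1; i >= 0; --i)
            if (strcmp(s->names[i], name) == 0)
                return s->types[i];
    }
    return -1;
}
```

A lookup that starts in an inner scope finds the innermost declaration of a name first, and falls back to the enclosing scopes only when the name is not declared locally.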

4 Implementation of Symbol Table Using Lex Tool

4.1 Lex

Lex is a tool for generating scanners. Scanners are programs that recognize lexical
patterns in text. These lexical patterns (or regular expressions) are defined in a
particular syntax. A matched regular expression may have an associated action,
and this action may include returning a token. When the generated scanner receives
input in the form of a file or text, it attempts to match the text against the regular
expressions. It reads input one character at a time and continues until a pattern is
matched. If a pattern is matched, the scanner performs the associated action (which
may include returning a token). If, on the other hand, no rule matches, the default
action copies the unmatched character to the output; a catch-all rule can be added to
report an error instead. Lex and C are tightly coupled. A Lex source file
(conventionally given the extension .l, or sometimes .lex) is passed
through the lex utility, which produces output files in C. These file(s) are compiled
to produce an executable version of the lexical analyzer.
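
When several patterns match at the same position, Lex prefers the longest match (and, among matches of equal length, the rule listed first). The sketch below imitates the longest-match part by hand for a few operators. It is an illustration only, not generated scanner code; match_operator and the OP_ codes are hypothetical names.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical token codes for this sketch only. */
enum op_token { OP_NONE, OP_ASSIGN, OP_EQ, OP_LT, OP_LE, OP_GT, OP_GE };

/* Match the longest operator at the start of s; store its length in *len. */
enum op_token match_operator(const char *s, int *len)
{
    static const struct { const char *text; enum op_token tok; } ops[] = {
        { "==", OP_EQ }, { "<=", OP_LE }, { ">=", OP_GE },
        { "=",  OP_ASSIGN }, { "<", OP_LT }, { ">", OP_GT },
    };
    enum op_token best = OP_NONE;
    size_t best_len = 0;
    assert(s != NULL && len != NULL);

    for (size_t i = 0; i < sizeof ops / sizeof ops[0]; ++i) {
        size_t n = strlen(ops[i].text);
        /* Keep a candidate only if it matches and is longer than the best so far. */
        if (n > best_len && strncmp(s, ops[i].text, n) == 0) {
            best = ops[i].tok;
            best_len = n;
        }
    }
    *len = (int)best_len;
    return best;
}
```

With this rule the input "<= n" is read as one two-character token rather than as "<" followed by "=", which is the behaviour the operator rules in scan.l rely on.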

4.2 Regular expressions in Lex

A regular expression is a pattern description using a meta language. An expression
is made up of symbols. Normal symbols are characters and numbers, but there are
other symbols that have special meaning in Lex. The following two tables define
some of the symbols used in Lex and give a few typical examples.

4.3 Defining regular expressions in Lex

Figure 1: Defining Regular Expression in Lex.

Figure 2: Example of Regular Expression.

4.4 Programming in Lex

Programming in Lex can be divided into three steps:

1. Specify the pattern-associated actions in a form that Lex can understand.
2. Run Lex over this file to generate C code for the scanner.
3. Compile and link the C code to produce the executable scanner.

A Lex program is divided into three sections: the first section has global C
and Lex declarations, the second section has the patterns and their actions (the
actions coded in C), and the third section has supplemental C functions. main(), for
example, would typically be found in the third section. These sections are delimited
by lines containing only %%.

4.5 Global C and Lex declarations

In this section we can add C variable declarations. We will declare an integer variable
here for our word-counting program that holds the number of words counted
by the program. We'll also perform token declarations of Lex.

4.6 Lex rules for matching patterns

Let's look at the Lex rules for describing the tokens that we want to match. (We'll
use C to define what to do when a token is matched.) Continuing with our
word-counting program, here are the rules for matching tokens.

5 Source Code For Lex File scan.l

%{
#ifndef TRUE
#define TRUE 1
#endif
#ifndef FALSE
#define FALSE 0
#endif
#include <stdio.h>
#include <stdlib.h>   /* malloc, used in the rule actions */
#include <ctype.h>
#include <string.h>
#include "symtab.h"

/* yylval (with a char *tokname member) is assumed to be declared elsewhere,
   for example by a yacc-generated header; this listing does not show it. */

int badtoken_cnt = 0;
int token_cnt = 0;
int col_cnt = 0;
int lineno = 0;
%}
comment     \/\*([^*]|\n)*\*\/
digit       [0-9]
ichar       [A-Z_a-z]
integer     {digit}+
newline     \n
strchar     ([ ~]|\\n)
identifier  {ichar}([0-9]|{ichar})*
whitespace  [ \t]+
float       ([+-]?{digit}+)?\.{digit}*(e?[+-]?{digit}+)?
chrliteral  ([!*]|\\n)
nullstring  \"\"
escquote    [^"]*\\\"[^"]*
strliteral  \"[^"]*{escquote}*\"
%%

"if"        { return IF; }
"then"      { return THEN; }
"else"      { return ELSE; }
"while"     { return WHILE; }
"return"    { return RETURN; }
"break"     { return GOTO; /* symtab.h defines no BREAK token */ }
"goto"      { return GOTO; }
"read"      { return READ; }
"write"     { return WRITE; }
"float"     { return REAL; }
"int"       { return INT; }
"void"      { return VOID; }
"char"      { return CHAR; }

"="         { return ASSIGN; }
"!="        { return NE; }
"=="        { return EQ; }
"<"         { return LT; }
"<="        { return LE; }
">"         { return GT; }
">="        { return GE; }
"&&"        { return AND; }
"||"        { return OR; }

"+"         { return PLUS; }
"-"         { return MINUS; }
"*"         { return TIMES; }
"/"         { return OVER; }
"%"         { return MOD; }

"{"         { return LBRACE; }
"}"         { return RBRACE; }
"["         { return LBRACK; }
"]"         { return RBRACK; }
"("         { return LPAREN; }
")"         { return RPAREN; }
";"         { return SEMI; }
","         { return COMMA; }

{float}     {
                /* copy the lexeme, leaving room for the terminating NUL */
                yylval.tokname = malloc(yyleng + 1);
                strncpy(yylval.tokname, yytext, yyleng);
                yylval.tokname[yyleng] = '\0';
                printf("yylval: %s\n", yylval.tokname);
                insert(yytext, yyleng, REAL_TYPE, lineno);
                printf("yytext: %s\n", yytext);
                return FLOAT;
            }

{integer}   {
                yylval.tokname = malloc(yyleng + 1);
                strncpy(yylval.tokname, yytext, yyleng);
                yylval.tokname[yyleng] = '\0';
                printf("yylval: %s\n", yylval.tokname);
                insert(yytext, yyleng, INT_TYPE, lineno);
                printf("yytext: %s\n", yytext);
                return INTEGER;
            }

{chrliteral} {
                /* copy the lexeme, leaving room for the terminating NUL */
                yylval.tokname = malloc(yyleng + 1);
                strncpy(yylval.tokname, yytext, yyleng);
                yylval.tokname[yyleng] = '\0';
                printf("yylval: %s\n", yylval.tokname);
                insert(yytext, yyleng, -1, lineno);
                printf("yytext: %s\n", yytext);
                return CHRLIT;
             }

{nullstring} {
                yylval.tokname = malloc(yyleng + 1);
                strncpy(yylval.tokname, yytext, yyleng);
                yylval.tokname[yyleng] = '\0';
                printf("yylval: %s\n", yylval.tokname);
                insert(yytext, yyleng, -1, lineno);
                printf("yytext: %s\n", yytext);
                return STRLIT;
             }

{strliteral} {
                yylval.tokname = malloc(yyleng + 1);
                strncpy(yylval.tokname, yytext, yyleng);
                yylval.tokname[yyleng] = '\0';
                printf("yylval: %s\n", yylval.tokname);
                insert(yytext, yyleng, STR_TYPE, lineno);
                printf("yytext: %s\n", yytext);
                return STRLIT;
             }

{identifier} { return IDENT; }

{newline}    { col_cnt = 1; lineno++; /* count the line, as the // handler does */ }

{whitespace} { col_cnt += yyleng; }

{comment}    { col_cnt = 0; }

"//"         {  /* handle C++ style comments: skip to the end of the line */
                int c;
                do { c = input();
                } while (c != '\n' && c != EOF && c != 0);
                lineno++;
             }

.            { return ERROR; }

%%

6 Source Code For Symbol Table C Code scan.c

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "symtab.h"

/* maximum size of hash table */


#define SIZE 200
#define MAXTOKENLEN 40
/* power of two multiplier in hash function */
#define SHIFT 4
/* the hash function */
static int hash ( char * key )
{ int temp = 0;
  int i = 0;
  while (key[i] != '\0')
  { /* cast so that bytes above 127 cannot make the index negative */
    temp = ((temp << SHIFT) + (unsigned char) key[i]) % SIZE;
    ++i;
  }
  return temp;
}
/* a linked list of references (line nos) for each variable */
typedef struct RefListRec {


int lineno;
struct RefListRec * next;
/* ADDED */
int type;
} * RefList;

/* hash entry holds variable name and its reference list */


typedef struct HashRec {
char st_name[MAXTOKENLEN];
int st_size;
RefList lines;
int st_value;
/* ADDED */
int st_type;
struct HashRec * next;
} * Node;
/* the hash table */
static Node hashTable[SIZE];
/* insert an entry with its line number - if entry
 * already exists just add its reference line no.
 */
void insert( char * name, int len, int type, int lineno )
{
  int h = hash(name);
  Node l = hashTable[h];
  while ((l != NULL) && (strcmp(name,l->st_name) != 0))
    l = l->next;
  if (l == NULL) /* variable not yet in table */
  { l = (Node) malloc(sizeof(struct HashRec));
    /* truncate over-long names and always terminate the copy */
    if (len >= MAXTOKENLEN) len = MAXTOKENLEN - 1;
    strncpy(l->st_name, name, len);
    l->st_name[len] = '\0';
    l->st_size = len;
    l->st_value = 0;   /* so that lookup() never returns an uninitialized value */
    /* ADDED */
    l->st_type = type;
    l->lines = (RefList) malloc(sizeof(struct RefListRec));
    l->lines->lineno = lineno;
    l->lines->next = NULL;
    l->next = hashTable[h];
    hashTable[h] = l; }
  else /* found in table, so just add line number */
  { RefList t = l->lines;
    while (t->next != NULL) t = t->next;
    t->next = (RefList) malloc(sizeof(struct RefListRec));
    t->next->lineno = lineno;
    t->next->next = NULL;
  }
}
/* return value (address) of symbol if found or -1 if not found */
int lookup ( char * name )
{ int h = hash(name);
Node l = hashTable[h];
while ((l != NULL) && (strcmp(name,l->st_name) != 0))
l = l->next;
if (l == NULL) return -1;
else return l->st_value;
}
/* return type value of symbol or -1 if symbol not found */
int lookupType( char * name )
{
int h = hash(name);
Node l = hashTable[h];
while ((l != NULL) && (strcmp(name,l->st_name) != 0))
l = l->next;
if (l == NULL) return -1;
else return l->st_type;
}
/* set datatype of symbol; returns 0 on success or -1 if symbol not found */
int setType( char * name, int t )
{
int h = hash(name);
Node l = hashTable[h];
while ((l != NULL) && (strcmp(name,l->st_name) != 0))
l = l->next;
if (l == NULL) return -1;
else {
l->st_type = t;
return 0;
}
}
/* print the table to the given stream (for example stdout) */
void symtab_dump(FILE * of) {
  int i;
  fprintf(of,"------------ ------ ------------\n");
  fprintf(of,"Name         Type   Line Numbers\n");
  fprintf(of,"------------ ------ ------------\n");
  for (i=0; i < SIZE; ++i)
  { if (hashTable[i] != NULL)
    { Node l = hashTable[i];
      while (l != NULL)
      { RefList t = l->lines;
        fprintf(of,"%-12s ",l->st_name);
        if (l->st_type == INT_TYPE)
          fprintf(of,"%-7s","int ");
        else if (l->st_type == REAL_TYPE)
          fprintf(of,"%-7s","real");
        else if (l->st_type == STR_TYPE)
          fprintf(of,"%-7s","string");
        else
          fprintf(of,"%-7s","");   /* untyped entry (type -1): keep columns aligned */

        while (t != NULL)
        { fprintf(of,"%4d ",t->lineno);
          t = t->next;
        }
        fprintf(of,"\n");
        l = l->next;
      }
    }
  }
}
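
To see how the hash function spreads names over the table, the following stand-alone sketch repeats the arithmetic of hash() with the same SIZE and SHIFT values. The name bucket_of is hypothetical and is used only so that the snippet can be compiled on its own; the results assume ASCII keys.

```c
#include <assert.h>
#include <stddef.h>

#define SIZE  200   /* number of buckets, as in scan.c */
#define SHIFT 4     /* shift amount, as in scan.c */

/* Same arithmetic as hash() in scan.c for ASCII keys, renamed to stand alone. */
int bucket_of(const char *key)
{
    int temp = 0;
    assert(key != NULL);
    for (int i = 0; key[i] != '\0'; ++i)
        temp = ((temp << SHIFT) + (unsigned char)key[i]) % SIZE;
    return temp;
}
```

For example, the one-letter name "i" (ASCII 105) lands in bucket 105, and "ab" lands in bucket (97 * 16 + 98) mod 200 = 50. Names that share a bucket are chained through the next pointer of HashRec.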

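7 Source Code For Defining Customized Header File Symtab.h

#include <stdio.h>   /* FILE, for symtab_dump */

#define IF 1
#define ELSE 2
#define THEN 3
#define WHILE 4
#define GOTO 5
#define READ 6
#define WRITE 7
#define RETURN 8
#define REAL 9
#define INT 10
#define VOID 11
#define CHAR 12
#define ASSIGN 13
#define NE 14
#define EQ 15
#define LT 16
#define LE 17
#define GT 18
#define GE 19   /* printed as GQ in the listing; scan.l returns GE */
#define AND 20
#define OR 21
#define PLUS 22
#define MINUS 23
#define TIMES 24
#define OVER 25
#define MOD 26
#define LBRACE 27
#define RBRACE 28
#define LBRACK 29
#define RBRACK 30
#define LPAREN 31
#define RPAREN 32
#define SEMI 33
#define COMMA 34
#define ERROR 35
#define FLOAT 36
#define INTEGER 37
#define CHRLIT 38
#define STRLIT 39
#define IDENT 40

/* The type codes below are used by scan.l and scan.c but do not appear in
 * the listing; their values here are assumptions. */
#define INT_TYPE 1
#define REAL_TYPE 2
#define STR_TYPE 3

/* Prototypes for the functions defined in scan.c. */
void insert(char *name, int len, int type, int lineno);
int lookup(char *name);
int lookupType(char *name);
int setType(char *name, int t);
void symtab_dump(FILE *of);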

8 Sample Input

#include<stdio.h>
int main()
{
    int n, i = 3, count, c; //declaring variables
    printf("Enter the number of prime numbers required\n");
    scanf("%d",&n); //taking input
    if ( n >= 1 )
    {
        printf("First %d prime numbers are :\n",n); //checking condition
        printf("2\n");
    }
    for ( count = 2 ; count <= n ; ) //starting nested loop
    {
        for ( c = 2 ; c <= i - 1 ; c++ )
        {
            if ( i%c == 0 )
                break;
        }
        if ( c == i )
        {
            printf("%d\n",i);
            count++;
        }
        i++;
    }
    return 0;
}

9 Sample Output

10 Conclusion

As a demonstration, this report presents a short program that produces a scanner
using Lex, generating C code and a symbol table that lists each variable,
constant, identifier and keyword in the program. We learned that flex is a tool for
generating scanners: programs which recognize lexical patterns in text. flex reads the
given input files, or its standard input if no file names are given, for a description
of a scanner to generate. The description is in the form of pairs of regular
expressions and C code, called rules. flex generates as output a C source file, lex.yy.c,
which defines a routine yylex(). This file is compiled and linked with the -lfl
library to produce an executable. When the executable is run, it analyzes its input
for occurrences of the regular expressions. Whenever it finds one, it executes the
corresponding C code. The symbol table generator excludes comment text. We
learned about Lex and the structure of a Lex program, and how to generate a lexical
analyzer using Lex. We installed and ran Lex on Linux using terminal commands. We
specified an input file instead of taking input from standard input, and used the C++ STL
set to group the tokens and avoid duplicates. We compiled lex.yy.c using
a C++ compiler, for example: g++ lex.yy.c. Finally, we learned about LaTeX while
preparing this report.
