and Technology
Supervisor:
Author:
Maliha Momtaz Islam
ID-1104081
Contents
1 Introduction
2 Scanner
3 Symbol Table
8 Sample Input
9 Sample Output
10 Conclusion
1 Introduction
2 Scanner
3 Symbol Table
An essential function of a compiler is to record the identifiers used in the source
program and to collect information about the attributes of each identifier. A symbol
table is a data structure containing a record for each identifier, with fields for the
attributes of the identifier. The data structure allows us to find the record for each
identifier quickly and to store or retrieve data from that record quickly. When the
lexical analyzer detects an identifier in the source program, the identifier is entered
into the symbol table. A common implementation technique is to use a hash table.
A compiler may use one large symbol table for all symbols or use separate,
hierarchical symbol tables for different scopes. Trees, linear lists and
self-organizing lists can also be used to implement a symbol table. The table also
simplifies the classification of literals in tabular format. The symbol table is accessed
by most phases of a compiler, from lexical analysis through optimization.
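The hash-table approach described above can be sketched in a few lines of C. This is a minimal illustration under stated assumptions, not the report's own symtab.c: the names Symbol, TABLE_SIZE, hash_name, sym_lookup and sym_insert are hypothetical, and only two attributes (a type code and the line of first occurrence) are stored.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define TABLE_SIZE 211          /* hypothetical bucket count */

/* One record per identifier; collisions are chained in a linked list. */
typedef struct Symbol {
    char *name;
    int type;                   /* attribute: a type code chosen by the caller */
    int first_line;             /* attribute: line of first occurrence */
    struct Symbol *next;
} Symbol;

static Symbol *table[TABLE_SIZE];

/* Simple multiplicative string hash reduced to a bucket index. */
static unsigned hash_name(const char *s)
{
    unsigned h = 0;
    while (*s != '\0')
        h = h * 31u + (unsigned char)*s++;
    return h % TABLE_SIZE;
}

/* Return the record for name, or NULL if it has not been entered. */
static Symbol *sym_lookup(const char *name)
{
    Symbol *p = table[hash_name(name)];
    while (p != NULL && strcmp(p->name, name) != 0)
        p = p->next;
    return p;
}

/* Enter name if absent; an existing record is returned unchanged. */
static Symbol *sym_insert(const char *name, int type, int line)
{
    Symbol *p = sym_lookup(name);
    if (p != NULL)
        return p;
    p = malloc(sizeof *p);
    if (p == NULL)
        return NULL;
    p->name = malloc(strlen(name) + 1);
    if (p->name == NULL) {
        free(p);
        return NULL;
    }
    strcpy(p->name, name);
    p->type = type;
    p->first_line = line;
    unsigned h = hash_name(name);
    p->next = table[h];         /* push onto the front of the bucket chain */
    table[h] = p;
    return p;
}
```

Both lookup and insertion touch only one bucket, which is why a hash table gives the quick store and retrieve behaviour the paragraph asks for. A scoped compiler could keep one such table per scope and search them from innermost to outermost.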
4
4.1
Lex is a tool for generating scanners. Scanners are programs that recognize lexical
patterns in text. These lexical patterns (or regular expressions) are defined in a
particular syntax. A matched regular expression may have an associated action,
which may include returning a token. When Lex receives input in the
form of a file or text, it attempts to match the text against the regular expressions.
It reads input one character at a time and continues until a pattern is matched. If
a pattern can be matched, Lex performs the associated action (which may
include returning a token). If, on the other hand, no regular expression can be
matched, the default action copies the unmatched character to the output; a scanner
that should report an error instead needs a catch-all rule for it. Lex and
C are tightly coupled. A .lex file (files in Lex have the extension .lex) is passed
through the lex utility, which produces output files in C. These files are compiled
to produce an executable version of the lexical analyzer.
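To make the matching loop concrete, the following hand-written C function approximates what a generated scanner does for two patterns, an identifier and an integer. It is only a sketch: Lex produces table-driven code, not this function, and the names next_token, TOK_EOF, TOK_IDENT, TOK_INTEGER and TOK_ERROR are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token codes for this sketch only. */
enum { TOK_EOF = 0, TOK_IDENT, TOK_INTEGER, TOK_ERROR };

/*
 * Scan one token starting at *cursor, copy its text into lexeme
 * (at most cap - 1 characters), advance *cursor past it, and
 * return the token code.
 */
static int next_token(const char **cursor, char *lexeme, size_t cap)
{
    const char *p = *cursor;
    size_t n = 0;
    int kind;

    while (*p == ' ' || *p == '\t' || *p == '\n')   /* skip whitespace */
        p++;

    if (*p == '\0') {
        kind = TOK_EOF;
    } else if (isalpha((unsigned char)*p) || *p == '_') {
        /* identifier: letter or '_' followed by letters, digits, '_' */
        while (isalnum((unsigned char)*p) || *p == '_') {
            if (n + 1 < cap) lexeme[n++] = *p;
            p++;
        }
        kind = TOK_IDENT;
    } else if (isdigit((unsigned char)*p)) {
        /* integer: keep consuming digits (longest match) */
        while (isdigit((unsigned char)*p)) {
            if (n + 1 < cap) lexeme[n++] = *p;
            p++;
        }
        kind = TOK_INTEGER;
    } else {
        /* no pattern matches: consume one character and report it */
        if (n + 1 < cap) lexeme[n++] = *p;
        p++;
        kind = TOK_ERROR;
    }

    if (cap > 0)
        lexeme[n] = '\0';
    *cursor = p;
    return kind;
}
```

Calling next_token repeatedly on the text "count 42 ?" yields an identifier, an integer, an error token for the question mark, and then the end-of-input code. Each branch plays the role of one rule, and the return statement plays the role of the rule's action.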
4.2
4.3
4.4
Programming in Lex
4.5
In this section we can add C variable declarations. We will declare an integer variable
here for our word-counting program that holds the number of words counted
by the program. We'll also perform the token declarations of Lex.
4.6
Let's look at the Lex rules for describing the tokens that we want to match. (We'll
use C to define what to do when a token is matched.) Continuing with our
word-counting program, here are the rules for matching tokens.
%{
#ifndef TRUE
#define TRUE 1
#endif
#ifndef FALSE
#define FALSE 0
#endif
#include <stdio.h>
#include <ctype.h>
#include <string.h>
#include "symtab.h"
int badtoken_cnt = 0;
int token_cnt = 0;
int col_cnt = 0;
int lineno = 0;
%}
comment     \/\*([^*]|\n)*\*\/
digit       [0-9]
ichar       [A-Z_a-z]
integer     {digit}+
newline     \n
strchar     ([ ~]|\\n)
identifier  {ichar}([0-9]|{ichar})*
whitespace  [ \t]+
float       ([+-]?{digit}+)?\.{digit}*(e?[+-]?{digit}+)?
chrliteral  ([!*]|\\n)
nullstring  \"\"
escquote    [^"]*\\\"[^"]*
strliteral  \"[^"]*{escquote}*\"
%%
"if"        { return IF; }
"then"      { return THEN; }
"else"      { return ELSE; }
"while"     { return WHILE; }
"return"    { return RETURN; }
"break"     { return GOTO; }
"goto"      { return GOTO; }
"read"      { return READ; }
"write"     { return WRITE; }
"float"     { return REAL; }
"int"       { return INT; }
"void"      { return VOID; }
"char"      { return CHAR; }
"="         { return ASSIGN; }
"!="        { return NE; }
"=="        { return EQ; }
"<"         { return LT; }
"<="        { return LE; }
">"         { return GT; }
">="        { return GE; }
"&&"        { return AND; }
"||"        { return OR; }
"+"         { return PLUS; }
"-"         { return MINUS; }
"*"         { return TIMES; }
"/"         { return OVER; }
"%"         { return MOD; }
"{"         { return LBRACE; }
"}"         { return RBRACE; }
"["         { return LBRACK; }
"]"         { return RBRACK; }
"("         { return LPAREN; }
")"         { return RPAREN; }
";"         { return SEMI; }
","         { return COMMA; }
{float} {
    yylval.tokname = malloc(yyleng + 1);   /* room for the terminating NUL */
    strncpy(yylval.tokname, yytext, yyleng);
    yylval.tokname[yyleng] = '\0';
    printf("yylval: %s\n", yylval.tokname);
    insert(yytext, yyleng, REAL_TYPE, lineno);
    printf("yytext: %s\n", yytext);
    return FLOAT;
}
{integer} {
    yylval.tokname = malloc(yyleng + 1);
    strncpy(yylval.tokname, yytext, yyleng);   /* copy before printing */
    yylval.tokname[yyleng] = '\0';
    printf("yylval: %s\n", yylval.tokname);
    insert(yytext, yyleng, INT_TYPE, lineno);
    printf("yytext: %s\n", yytext);
    return INTEGER;
}
{chrliteral} {
    yylval.tokname = malloc(yyleng + 1);
    strncpy(yylval.tokname, yytext, yyleng);
    yylval.tokname[yyleng] = '\0';
    printf("yylval: %s\n", yylval.tokname);
    insert(yytext, yyleng, -1, lineno);
    printf("yytext: %s\n", yytext);
    return CHRLIT;
}
{nullstring} {
    yylval.tokname = malloc(yyleng + 1);
    strncpy(yylval.tokname, yytext, yyleng);
    yylval.tokname[yyleng] = '\0';
    printf("yylval: %s\n", yylval.tokname);
    insert(yytext, yyleng, -1, lineno);
    printf("yytext: %s\n", yytext);
    return STRLIT;
}
{strliteral} {
    yylval.tokname = malloc(yyleng + 1);
    strncpy(yylval.tokname, yytext, yyleng);
    yylval.tokname[yyleng] = '\0';
    printf("yylval: %s\n", yylval.tokname);
    insert(yytext, yyleng, STR_TYPE, lineno);
    printf("yytext: %s\n", yytext);
    return STRLIT;
}
{identifier} { return IDENT; }
{newline}    { col_cnt = 1; }
{whitespace} { col_cnt += yyleng; }
{comment}    { col_cnt = 0; }
"//"         { lineno++; }
.            { return ERROR; }
%%
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "symtab.h"
l = l->next;
if (l == NULL) return -1;
else return l->st_type;
}
/* set datatype of symbol; returns 0 on success, -1 if symbol not found */
int setType( char * name, int t )
{
    int h = hash(name);
    Node l = hashTable[h];
    while ((l != NULL) && (strcmp(name,l->st_name) != 0))
        l = l->next;
    if (l == NULL) return -1;
    else {
        l->st_type = t;
        return 0;
    }
}
/* print to stdout by default */
void symtab_dump(FILE * of) {
    int i;
    fprintf(of,"------------ ------ ------------\n");
    fprintf(of,"Name         Type   Line Numbers\n");
    fprintf(of,"------------ ------ -------------\n");
    for (i=0; i < SIZE; ++i)
    { if (hashTable[i] != NULL)
      { Node l = hashTable[i];
        while (l != NULL)
        { RefList t = l->lines;
          fprintf(of,"%-12s ",l->st_name);
          if (l->st_type == INT_TYPE)
              fprintf(of,"%-7s","int ");
          if (l->st_type == REAL_TYPE)
              fprintf(of,"%-7s","real");
          if (l->st_type == STR_TYPE)
              fprintf(of,"%-7s","string");
          while (t != NULL)
          { fprintf(of,"%4d ",t->lineno);
            t = t->next;
          }
          fprintf(of,"\n");
          l = l->next;
        }
      }
    }
}
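The dump routine walks a per-symbol list of line numbers through the fields lineno and next. How such a list might be built is sketched here as an assumption, since the corresponding part of the report's code is not shown: the type name LineRef and the function add_line_ref are hypothetical, and only the two field names are taken from the listing.

```c
#include <stdlib.h>

/* Field names mirror the list walked in symtab_dump; the type name is hypothetical. */
typedef struct LineRef {
    int lineno;
    struct LineRef *next;
} LineRef;

/*
 * Append lineno to the end of the list at *head, skipping a repeat
 * of the last recorded line. Returns 0 on success, -1 if allocation fails.
 */
static int add_line_ref(LineRef **head, int lineno)
{
    LineRef **link = head;
    LineRef *last = NULL;

    while (*link != NULL) {         /* walk to the end of the list */
        last = *link;
        link = &(*link)->next;
    }
    if (last != NULL && last->lineno == lineno)
        return 0;                   /* same line already recorded */

    LineRef *node = malloc(sizeof *node);
    if (node == NULL)
        return -1;
    node->lineno = lineno;
    node->next = NULL;
    *link = node;
    return 0;
}
```

Appending at the tail keeps the line numbers in source order, so the dump prints them ascending without sorting. Skipping a repeat of the last line avoids listing the same line twice when an identifier occurs several times on it.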
#define IF      1
#define ELSE    2
#define THEN    3
#define WHILE   4
#define GOTO    5
#define READ    6
#define WRITE   7
#define RETURN  8
#define REAL    9
#define INT     10
#define VOID    11
#define CHAR    12
#define ASSIGN  13
#define NE      14
#define EQ      15
#define LT      16
#define LE      17
#define GT      18
#define GE      19   /* name matches the token returned by the ">=" rule */
#define AND     20
#define OR      21
#define PLUS    22
#define MINUS   23
#define TIMES   24
#define OVER    25
#define MOD     26
#define LBRACE  27
#define RBRACE  28
#define LBRACK  29
#define RBRACK  30
#define LPAREN  31
#define RPAREN  32
#define SEMI    33
#define COMMA   34
#define ERROR   35
#define FLOAT   36
#define INTEGER 37
#define CHRLIT  38
#define STRLIT  39
#define IDENT   40
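Numeric token codes are hard to read in a scanner trace. A minimal sketch, assuming a helper named token_name (hypothetical, not part of the report's code), maps a few of the codes back to their names. The #define values are copied from the list of token codes; the remaining codes would follow the same pattern.

```c
#include <stddef.h>

/* A subset of the token codes, with the same values as in the list. */
#define IF      1
#define ELSE    2
#define THEN    3
#define WHILE   4
#define ASSIGN  13
#define ERROR   35
#define FLOAT   36
#define INTEGER 37
#define CHRLIT  38
#define STRLIT  39
#define IDENT   40

/* Return a printable name for a token code, or "UNKNOWN" if not listed. */
static const char *token_name(int code)
{
    switch (code) {
    case IF:      return "IF";
    case ELSE:    return "ELSE";
    case THEN:    return "THEN";
    case WHILE:   return "WHILE";
    case ASSIGN:  return "ASSIGN";
    case ERROR:   return "ERROR";
    case FLOAT:   return "FLOAT";
    case INTEGER: return "INTEGER";
    case CHRLIT:  return "CHRLIT";
    case STRLIT:  return "STRLIT";
    case IDENT:   return "IDENT";
    default:      return "UNKNOWN";
    }
}
```

A driver loop could print token_name(yylex()) next to yytext for each token, which makes it easy to check the scanner against a sample input by eye.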
8 Sample Input
#include<stdio.h>
int main()
{
int n, i = 3, count, c;
//declaring variables
9 Sample Output
10 Conclusion
As a demonstration, this report presents a short program that builds a scanner
using Lex, generating C code and a symbol table that lists each variable,
constant, identifier and keyword in the program. We learned that flex is a tool for
generating scanners: programs which recognize lexical patterns in text. flex reads the
given input files, or its standard input if no file names are given, for a description
of a scanner to generate. The description is in the form of pairs of regular
expressions and C code, called rules. flex generates as output a C source file, lex.yy.c,
which defines a routine yylex(). This file is compiled and linked with the -lfl
library to produce an executable. When the executable is run, it analyzes its input
for occurrences of the regular expressions. Whenever it finds one, it executes the
corresponding C code. The Symbol Table Generator excludes comment text. We
learned about Lex and the structure of a Lex program, and how to generate a lexical
analyzer using Lex. We installed and ran Lex on Linux using terminal commands. We
specified an input file instead of taking input from standard input and used the C++ STL
set to group the tokens and avoid duplicates. We compiled lex.yy.c using
a C++ compiler, for example: g++ lex.yy.c. Finally, we learned about LaTeX while
preparing this report.