
CS 5300 Compiler Design

Fall 2007

Lex Syntax and Example

Lex is short for "lexical analysis". Lex takes an input file containing a set of lexical analysis rules, written as regular expressions. As output, Lex produces a C function which, when invoked, finds the next match in the input stream.
1. Format of lex input:
      (beginning in col. 1) declarations
      %%
      token rules
      %%
      aux procedures
2. Declarations:
   a) string sets:   name   character class      (e.g. D  [0-9])
   b) standard C:    %{
                     C declarations
                     %}
3. Token rules:   regular expression   {optional C code}

   a) If the expression includes a reference to a character class, enclose the class name in braces { }.
   b) Regular expression operators:
      * , +       closure, positive closure
      "" or \     protection of special chars
      |           or
      ^           beginning-of-line anchor
      ()          grouping
      $           end-of-line anchor
      ?           zero or one
      .           any char (except \n)
      {ref}       reference to a named character class (a definition)
      []          character class
      [^]         not character class

4. Match rules: The longest match is preferred. If two matches are of equal length, the rule listed first is preferred.
   Remember, lex partitions the input; it does not attempt to find nested matches. Once a character becomes part of a
   match, it is no longer considered for other matches.
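The match rules can be sketched in C. This is a minimal sketch, assuming each rule is modeled as a separate matcher function that reports how many characters it can match at the current position; the names `match_if`, `match_ident`, `match_digits` and `pick_rule` are hypothetical. Real lex compiles all rules into one finite automaton rather than trying them one at a time, but the selection policy shown here is the same: longest match first, then the rule listed first.

```c
#include <ctype.h>
#include <stddef.h>

/* Each hypothetical matcher returns the length of the longest prefix of s
   that its pattern matches (0 = no match). */
typedef size_t (*matcher_fn)(const char *s);

/* "if": a keyword rule, listed before the identifier rule */
static size_t match_if(const char *s) {
    return (s[0] == 'i' && s[1] == 'f') ? 2 : 0;
}

/* {A}({A}|{D})*: a letter followed by letters or digits */
static size_t match_ident(const char *s) {
    size_t n;
    if (!isalpha((unsigned char)s[0]))
        return 0;
    n = 1;
    while (isalnum((unsigned char)s[n]))
        n++;
    return n;
}

/* {D}+: one or more digits */
static size_t match_digits(const char *s) {
    size_t n = 0;
    while (isdigit((unsigned char)s[n]))
        n++;
    return n;
}

/* Try every rule at the current position and return the index of the
   winning rule (-1 if none matches); *len receives the match length.
   Only a strictly longer match replaces the current best, so on a tie
   the rule listed first keeps the win. */
int pick_rule(const matcher_fn *rules, int nrules, const char *s, size_t *len) {
    int best = -1;
    size_t best_len = 0;
    for (int i = 0; i < nrules; i++) {
        size_t n = rules[i](s);
        if (n > best_len) {
            best = i;
            best_len = n;
        }
    }
    *len = best_len;
    return best;
}
```

With the rule list `{ match_if, match_ident, match_digits }`, the input `if x` selects the keyword rule (both it and the identifier rule match 2 characters, and the keyword is listed first), while `iffy` selects the identifier rule because a 4-character match beats a 2-character one.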
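5. Built-in variables:   yytext   pointer to the matching lexeme (char *yytext;)
                         yyleng   length of the matching lexeme (yytext)
   Note: POSIX lex and flex both name the length variable yyleng (not yylen). Some lex versions declare yytext as an array (char yytext[]) instead of a pointer.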
6. Aux procedures: C functions may be defined and called from the C code of token rules or from other
   functions. Each lex file should also have a yyerror() function to be called when lex encounters an error
   condition.
7. Example header file: tokens.h
#define NUM  1     // define constants used by lexyy.c
#define ID   2     // (could be defined in the lex rule file)
#define PLUS 3
#define MULT 4
#define ASGN 5
#define SEMI 6
   Example lex file:

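D       [0-9]
A       [a-zA-Z]
%{
/* note: the D and A definition lines above begin in col. 1 */
#include <stdio.h>
#include <stdlib.h>
#include "tokens.h"
%}
%%
{D}+            return(NUM);    /* match integer numbers */
{A}({A}|{D})*   return(ID);     /* match identifiers */
"+"             return(PLUS);   /* match the plus sign (note protection) */
"*"             return(MULT);   /* match the mult sign (note protection again) */
:=              return(ASGN);   /* match the assignment string */
;               return(SEMI);   /* match the semicolon */
.|\n            ;               /* ignore any unmatched chars, including newlines */
%%

void yyerror()          /* default action in case of error in yylex() */
{
    printf("error\n");
    exit(1);            /* nonzero exit status signals failure */
}

int yywrap(void)        /* called at end of input; returning 1 means "no more input" */
{                       /* (not needed if linked with -ll/-lfl, or with flex's %option noyywrap) */
    return 1;
}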

8. Execution of lex (to generate the yylex() function file and then compile a user program):

   (MS)     c:> flex rulefile        flex produces lexyy.c
   (Linux)  $ lex rulefile           lex produces lex.yy.c

   The produced .c file contains this function:   int yylex()
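To make the contract of int yylex() concrete, here is a hand-written stand-in for what the generated scanner does with the example lex file of item 7. It is a sketch, not flex output: input comes from a string set by a hypothetical `scan_string()` helper rather than from stdin, the token codes are copied from tokens.h, and `yytext`/`yyleng` are ordinary globals defined here. Because every token class in that rule file starts with a different character, dispatching on the first character gives the same result as the longest-match rule.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Token codes copied from tokens.h */
#define NUM  1
#define ID   2
#define PLUS 3
#define MULT 4
#define ASGN 5
#define SEMI 6

static const char *cursor = "";   /* next unread input character */
static char lexeme[256];          /* storage behind yytext */
char *yytext = lexeme;            /* current lexeme, NUL-terminated */
int yyleng = 0;                   /* length of the current lexeme */

/* Hypothetical helper: scan from a string instead of stdin. */
void scan_string(const char *s) { cursor = s; }

/* Consume n characters as the current match and copy them into yytext
   (very long lexemes are truncated to fit the buffer). */
static void take(size_t n) {
    size_t copy = n < sizeof lexeme ? n : sizeof lexeme - 1;
    memcpy(lexeme, cursor, copy);
    lexeme[copy] = '\0';
    yyleng = (int)copy;
    cursor += n;                  /* always consume the whole match */
}

/* Return the next token code, or 0 at end of input. */
int yylex(void) {
    while (*cursor != '\0') {
        size_t n = 0;
        if (isdigit((unsigned char)cursor[0])) {            /* {D}+ */
            while (isdigit((unsigned char)cursor[n])) n++;
            take(n);
            return NUM;
        }
        if (isalpha((unsigned char)cursor[0])) {            /* {A}({A}|{D})* */
            while (isalnum((unsigned char)cursor[n])) n++;
            take(n);
            return ID;
        }
        if (cursor[0] == ':' && cursor[1] == '=') {         /* := (2 chars beats a 1-char match) */
            take(2);
            return ASGN;
        }
        if (cursor[0] == '+') { take(1); return PLUS; }
        if (cursor[0] == '*') { take(1); return MULT; }
        if (cursor[0] == ';') { take(1); return SEMI; }
        take(1);                  /* any other character (newlines included) is skipped */
    }
    return 0;
}
```

After `scan_string("x1 := 42 + y*3;")`, successive calls return ID, ASGN, NUM, PLUS, ID, MULT, NUM, SEMI and finally 0, with `yytext` holding each lexeme in turn.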
9. User program:

   (The above scanner file must be linked into the project.)

#include <stdio.h>
#include "tokens.h"

int yylex(void);        // scanner prototype
extern char *yytext;    // lexeme text set by yylex()

int main(void)
{
    int n;
    while ((n = yylex()) != 0)          // call scanner until it returns 0 for EOF
        printf("%d %s\n", n, yytext);   // output the token code and lexeme string
    return 0;
}
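Sample Lex Programs
(These samples define no main() or yywrap(); link them with the lex library, -ll for lex or -lfl for flex, which supplies both.)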
Prg1.l
%{
#include <stdio.h>
%}

%%
stop printf("Stop command received\n");
start printf("Start command received\n");
%%
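Prg1.l relies on lex's default rule: any input that no rule matches is copied to the output unchanged (ECHO). The C sketch below imitates that on an in-memory string; `prg1_run()` is a hypothetical name, and output goes to a caller-supplied buffer instead of stdout so the result can be checked.

```c
#include <stddef.h>
#include <string.h>

/* Run Prg1.l's rules over `in`, writing what the scanner would print into
   `out` (capacity outsz, at least 1). Stops early if `out` would overflow. */
void prg1_run(const char *in, char *out, size_t outsz) {
    size_t used = 0;
    out[0] = '\0';
    while (*in != '\0') {
        char echo[2] = { '\0', '\0' };
        const char *piece;
        if (strncmp(in, "stop", 4) == 0) {          /* rule: stop */
            piece = "Stop command received\n";
            in += 4;
        } else if (strncmp(in, "start", 5) == 0) {  /* rule: start */
            piece = "Start command received\n";
            in += 5;
        } else {                                    /* default rule: ECHO one char */
            echo[0] = *in++;
            piece = echo;
        }
        size_t plen = strlen(piece);
        if (used + plen >= outsz)
            break;
        memcpy(out + used, piece, plen + 1);        /* copies the terminator too */
        used += plen;
    }
}
```

The input `stopwatch` shows the partitioning rule from item 4: `stop` is matched and reported as a command, and the remaining `watch` is echoed one character at a time.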

Prg2.l
%{
#include <stdio.h>
%}

%%
[0123456789]+ printf("NUMBER\n");
[a-zA-Z][a-zA-Z0-9]* printf("WORD\n");
%%

Prg3.l
%{
#include <stdio.h>
%}

%%
[a-zA-Z][a-zA-Z0-9]* printf("WORD ");
[a-zA-Z0-9\/.-]+ printf("FILENAME ");
\" printf("QUOTE ");
\{ printf("OBRACE ");
\} printf("EBRACE ");
; printf("SEMICOLON ");
\n printf("\n");
[ \t]+ /* ignore whitespace */;
%%
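The first two rules of Prg3.l overlap: every WORD is also a valid FILENAME. The sketch below (helper names `word_len`, `filename_len` and `prg3_class` are hypothetical) computes both match lengths and applies the match rules of item 4, so a plain word ties and goes to WORD because that rule is listed first, while anything containing '/', '.' or '-', or starting with a digit, matches longer (or only) as FILENAME.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Length of the longest prefix of s matching [a-zA-Z][a-zA-Z0-9]* (WORD). */
static size_t word_len(const char *s) {
    size_t n;
    if (!isalpha((unsigned char)s[0]))
        return 0;
    n = 1;
    while (isalnum((unsigned char)s[n]))
        n++;
    return n;
}

/* Length of the longest prefix of s matching [a-zA-Z0-9\/.-]+ (FILENAME). */
static size_t filename_len(const char *s) {
    size_t n = 0;
    while (s[n] != '\0' && (isalnum((unsigned char)s[n]) || strchr("/.-", s[n]) != NULL))
        n++;
    return n;
}

/* Classify the token at the start of s as Prg3's first two rules would:
   the longer match wins, and on a tie WORD wins because it is listed first.
   Returns NULL (and *len = 0) when neither rule matches. */
const char *prg3_class(const char *s, size_t *len) {
    size_t w = word_len(s), f = filename_len(s);
    if (w == 0 && f == 0) {
        *len = 0;
        return NULL;
    }
    if (w >= f) {
        *len = w;
        return "WORD";
    }
    *len = f;
    return "FILENAME";
}
```

For example, `foo` is a WORD, `foo.conf` and `/etc/passwd` are FILENAMEs, and, perhaps surprisingly, a bare number such as `42` is also reported as FILENAME, since only that rule accepts a leading digit.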

Prg4.l