
LEXICAL ANALYZER AND PARSER

COMPILER
A compiler is a program that takes a program written in a source language and translates it into an equivalent program in a target language.

source program -> COMPILER -> target program
                     |
                     v
               error messages

The source program is normally written in a high-level programming language. The target program is normally the equivalent program in machine code (a relocatable object file).

PHASES OF A COMPILER
Source Program
-> Lexical Analyzer
-> Syntax Analyzer
-> Semantic Analyzer
-> Intermediate Code Generator
-> Code Optimizer
-> Code Generator
-> Target Program

Each phase transforms the source program from one representation into another.
The phases communicate with the error handlers and with the symbol table.

LEXICAL ANALYZER

INTRODUCTION
A lexical analyzer breaks an input stream of characters into tokens. A program that performs lexical analysis is called a lexical analyzer, or lexer. A lexer consists of a scanner and a tokenizer.
Writing lexical analyzers by hand can be tedious, so software tools have been developed to ease the task. Perhaps the best known such utility is Lex, a lexical analyzer generator for the UNIX operating system that targets the C programming language.

ROLE OF THE LEXICAL ANALYZER

INTRODUCING BASIC TERMINOLOGY
What are the major terms for lexical analysis?

TOKEN: A classification for a common set of strings. Examples include <Identifier>, <number>, etc.
PATTERN: The rules that characterize the set of strings for a token. Recall file and OS wildcards ([A-Z]*.*).
LEXEME: The actual sequence of characters that matches a pattern and is classified by a token. Identifiers: x, count, name, etc.
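The relationship between the three terms can be sketched in C. This is a minimal illustration, not part of the original slides: the names TokenClass, matches_identifier, matches_number, and classify are hypothetical, and the two patterns (C-style identifiers and unsigned decimal integers) are assumptions chosen for the example. Each matches_* function plays the role of a pattern, the string passed in is the lexeme, and the returned enum value is the token.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Token classes (hypothetical names chosen for this sketch). */
typedef enum { TOK_IDENTIFIER, TOK_NUMBER, TOK_UNKNOWN } TokenClass;

/* Pattern for <number>: one or more decimal digits. */
static int matches_number(const char *s) {
    if (*s == '\0') return 0;
    for (; *s != '\0'; s++)
        if (!isdigit((unsigned char)*s)) return 0;
    return 1;
}

/* Pattern for <identifier>: a letter or '_' followed by letters, digits, or '_'. */
static int matches_identifier(const char *s) {
    if (!(isalpha((unsigned char)*s) || *s == '_')) return 0;
    for (s++; *s != '\0'; s++)
        if (!(isalnum((unsigned char)*s) || *s == '_')) return 0;
    return 1;
}

/* Classify a lexeme by testing it against each pattern in turn. */
TokenClass classify(const char *lexeme) {
    assert(lexeme != NULL);
    if (matches_identifier(lexeme)) return TOK_IDENTIFIER;
    if (matches_number(lexeme)) return TOK_NUMBER;
    return TOK_UNKNOWN;
}
```

Under these assumptions, the lexemes x, count, and name are all classified as TOK_IDENTIFIER, while 10 is classified as TOK_NUMBER.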

The input program as you see it:

    main () {
        int i, sum;
        sum = 0;
        for (i=1; i<=10; i++);
        sum = sum + i;
        printf("%d\n",sum);
    }


LEXICAL ANALYZER RESPONSIBILITIES

Lexical analyzer [Scanner]:
Scan input
Remove white spaces, tabs, and newline characters
Remove comments
Manufacture tokens
Generate lexical errors
Pass tokens to the parser
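The responsibilities listed above can be sketched as a small hand-written scanner in C. This is only an illustration under stated assumptions: the names Kind, Token, skip_blanks, next_token, and count_tokens are hypothetical, keywords are reported as identifiers, and only the punctuation needed by the sample program on the earlier slide is recognized. Any other character is reported as a lexical error. An unterminated comment is simply skipped to the end of input in this sketch.

```c
#include <ctype.h>
#include <string.h>

typedef enum { T_EOF, T_IDENT, T_NUMBER, T_STRING, T_PUNCT, T_ERROR } Kind;

typedef struct {
    Kind kind;
    const char *start;  /* first character of the lexeme */
    size_t length;      /* number of characters in the lexeme */
} Token;

/* Remove white space, tabs, newlines, and C-style block comments. */
static const char *skip_blanks(const char *p) {
    for (;;) {
        while (isspace((unsigned char)*p)) p++;
        if (p[0] == '/' && p[1] == '*') {
            const char *end = strstr(p + 2, "*/");
            p = end ? end + 2 : p + strlen(p);  /* unterminated: skip to end */
        } else {
            return p;
        }
    }
}

/* Manufacture one token starting at *cursor and advance *cursor past it. */
Token next_token(const char **cursor) {
    const char *p = skip_blanks(*cursor);
    Token t = { T_EOF, p, 0 };

    if (*p == '\0') {
        /* end of input: T_EOF with an empty lexeme */
    } else if (isalpha((unsigned char)*p) || *p == '_') {
        t.kind = T_IDENT;
        while (isalnum((unsigned char)*p) || *p == '_') p++;
    } else if (isdigit((unsigned char)*p)) {
        t.kind = T_NUMBER;
        while (isdigit((unsigned char)*p)) p++;
    } else if (*p == '"') {
        t.kind = T_STRING;
        p++;
        while (*p != '\0' && *p != '"') {
            if (*p == '\\' && p[1] != '\0') p++;  /* skip the escaped character */
            p++;
        }
        if (*p == '"') p++;
        else t.kind = T_ERROR;  /* lexical error: unterminated string */
    } else if (strchr("(){};,=+<", *p) != NULL) {
        t.kind = T_PUNCT;
        /* two-character operators used by the sample program */
        if ((p[0] == '<' && p[1] == '=') || (p[0] == '+' && p[1] == '+')) p += 2;
        else p++;
    } else {
        t.kind = T_ERROR;  /* lexical error: character starts no known token */
        p++;
    }
    t.length = (size_t)(p - t.start);
    *cursor = p;
    return t;
}

/* Stand-in for the parser: pull tokens until T_EOF and count one kind. */
int count_tokens(const char *src, Kind kind) {
    int n = 0;
    for (;;) {
        Token t = next_token(&src);
        if (t.kind == T_EOF) break;
        if (t.kind == kind) n++;
    }
    return n;
}
```

A parser would call next_token repeatedly in the same way count_tokens does, receiving one token per call. Returning a pointer and a length instead of copying the lexeme is a design choice that keeps the sketch free of memory allocation.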


LEX INTRODUCTION
Lex is a compiler-writing tool used to generate a lexical analyzer (scanner) from a description of the tokens of the programming language being implemented. Lex takes a specially formatted specification file containing the details of the lexical analyzer and creates a C source file for the associated table-driven lexer.

LEX SPECIFICATION
The input to Lex is a text file containing regular expressions, along with the actions to be taken by the generated scanner when each regular expression is matched.
The output is a file of C source code defining the procedure yylex(), which implements a DFA corresponding to the regular expressions given in the input file.
The output file is usually called lex.yy.c or lexyy.c. When compiled and linked with the main program, it acts as a scanner (lexical analyzer) that recognizes the tokens specified by the regular expressions of the input file.
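To give a feel for what "table-driven" and "implements a DFA" mean, here is a hand-written C sketch of a DFA that recognizes identifiers and unsigned integers. It is not the code Lex generates, whose tables are larger and laid out differently. The state names, the three character classes, and the function longest_match are all assumptions made for this example.

```c
#include <ctype.h>
#include <stddef.h>

enum { S_START, S_IDENT, S_NUMBER, S_DEAD, NUM_STATES };
enum { C_LETTER, C_DIGIT, C_OTHER, NUM_CLASSES };

/* Transition table: next_state[current state][character class]. */
static const int next_state[NUM_STATES][NUM_CLASSES] = {
    /*               letter    digit     other  */
    /* S_START  */ { S_IDENT,  S_NUMBER, S_DEAD },
    /* S_IDENT  */ { S_IDENT,  S_IDENT,  S_DEAD },
    /* S_NUMBER */ { S_DEAD,   S_NUMBER, S_DEAD },
    /* S_DEAD   */ { S_DEAD,   S_DEAD,   S_DEAD },
};

/* 1 if a lexeme may end in this state, 0 otherwise. */
static const int accepting[NUM_STATES] = { 0, 1, 1, 0 };

/* Map an input character to its column in the transition table. */
static int char_class(char c) {
    if (isalpha((unsigned char)c) || c == '_') return C_LETTER;
    if (isdigit((unsigned char)c)) return C_DIGIT;
    return C_OTHER;
}

/* Run the DFA over input and return the length of the longest prefix that
   ends in an accepting state. The accepting state reached is stored in
   *final_state, or S_DEAD if no prefix matches. */
size_t longest_match(const char *input, int *final_state) {
    int state = S_START;
    size_t best_len = 0;
    *final_state = S_DEAD;
    for (size_t i = 0; input[i] != '\0'; i++) {
        state = next_state[state][char_class(input[i])];
        if (state == S_DEAD) break;
        if (accepting[state]) {
            best_len = i + 1;
            *final_state = state;
        }
    }
    return best_len;
}
```

The driver loop never changes when the token set changes; only the tables do. That separation is what lets a generator emit a scanner from regular expressions alone.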

LEX SPECIFICATIONS

A Lex input file consists of three parts: a collection of definitions, a collection of rules, and a collection of user subroutines. The sections are separated by double-percent directives (``%%'').
A proper Lex specification has the following format:

{definitions}
%%
{rules}
%%
{user subroutines}

The definitions and the user subroutines are often omitted. The second %% is optional, but the first is required to mark the beginning of the rules.


MAIN FEATURES
Simple implementation.
Fast lexical analysis.
Efficient resource utilization.
Portable.

APPLICATIONS AND FUTURE WORK

Text Editing
Text Processing
Pattern Matching
File Searching

PARSER

PARSING
Parsing (syntactic analysis) is the process of analyzing a sequence of tokens to determine its grammatical structure with respect to a given (more or less) formal grammar.
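As a small illustration of checking input against a grammar, the C sketch below parses arithmetic expressions with a hand-written recursive-descent parser. This is a different technique from the table-driven parsers that a generator such as Yacc produces. It is used here only because it fits in a few lines. The grammar, the Parser struct, and all function names are assumptions made for this example, and the parser reads characters directly instead of receiving tokens from a separate lexer.

```c
#include <ctype.h>

/* Grammar assumed for this sketch:
     expr   -> term   { '+' term }
     term   -> factor { '*' factor }
     factor -> NUMBER | '(' expr ')'                  */

/* Parser state: the remaining input and an error flag. */
typedef struct {
    const char *pos;
    int error;
} Parser;

static long parse_expr(Parser *ps);

static void skip_spaces(Parser *ps) {
    while (isspace((unsigned char)*ps->pos)) ps->pos++;
}

/* factor -> NUMBER | '(' expr ')' */
static long parse_factor(Parser *ps) {
    skip_spaces(ps);
    if (isdigit((unsigned char)*ps->pos)) {
        long value = 0;
        while (isdigit((unsigned char)*ps->pos))
            value = value * 10 + (*ps->pos++ - '0');
        return value;
    }
    if (*ps->pos == '(') {
        ps->pos++;
        long value = parse_expr(ps);
        skip_spaces(ps);
        if (*ps->pos == ')') ps->pos++;
        else ps->error = 1;  /* syntax error: missing ')' */
        return value;
    }
    ps->error = 1;  /* syntax error: a factor starts with a digit or '(' */
    return 0;
}

/* term -> factor { '*' factor } */
static long parse_term(Parser *ps) {
    long value = parse_factor(ps);
    for (;;) {
        skip_spaces(ps);
        if (*ps->pos != '*') return value;
        ps->pos++;
        value *= parse_factor(ps);
    }
}

/* expr -> term { '+' term } */
static long parse_expr(Parser *ps) {
    long value = parse_term(ps);
    for (;;) {
        skip_spaces(ps);
        if (*ps->pos != '+') return value;
        ps->pos++;
        value += parse_term(ps);
    }
}

/* Return 1 and store the result in *value if text is a complete,
   well-formed expression. Return 0 on any syntax error. */
int parse_expression(const char *text, long *value) {
    Parser ps = { text, 0 };
    long result = parse_expr(&ps);
    skip_spaces(&ps);
    if (ps.error || *ps.pos != '\0') return 0;
    *value = result;
    return 1;
}
```

Each grammar rule becomes one function, so the call structure mirrors the grammatical structure of the input. For instance, "2 + 3 * 4" is accepted and grouped as 2 + (3 * 4), while "2 + * 3" is rejected.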

YACC SPECIFICATION
Yacc (yet another compiler compiler) is a parser generator: a program that takes as input a specification of the syntax of a programming language and produces as output a parse procedure for that language, named yyparse().
The notation used for this specification is a context-free grammar (CFG).

The input to Yacc is a specification file, usually with a .y suffix, containing the grammar rules that specify the structure of the language to be implemented. The output is C source code for the parser, usually in a file named y.tab.c or ytab.c.

FORMAT OF SPECIFICATION FILE

{definitions}
%%
{rules}
%%
{programs}

The definitions section contains information about tokens, data types, and grammar rules. It also includes any C code that must go directly at the beginning of the output file.

CREDITS

Credits go out to

Special thanks go out to

THANK YOU!
