This document provides an overview of Lex/Flex and Yacc/Bison, which are tools used for lexical analysis and parsing. Lex/Flex is used to generate scanners (lexical analyzers) from regular expression rules. A scanner divides input into tokens, which are passed to a parser generated by Yacc/Bison. Yacc/Bison generates parsers from grammar rules. Flex uses a specification file with regular expression patterns and actions to generate a C program that scans input and identifies tokens. Bison works with a scanner generated by Flex to analyze program syntax based on a grammar.
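To make the scanner-to-parser handoff concrete, here is a hand-written sketch in C of the contract between the two generated pieces: the parser repeatedly calls yylex(), which returns an integer token code (0 at end of input) and leaves the token's value in yylval. Everything here is an assumption for illustration: the token codes NUMBER, PLUS, and ERROR, the string-based `input` pointer (a Flex scanner reads from yyin instead), and the tiny `parse_sum` function, which stands in for the yyparse() that Bison would generate and, like yyparse(), returns 0 on success.

```c
#include <ctype.h>

/* Hypothetical token codes; in a real project they come from the
   %token declarations in the Bison grammar. 0 means end of input. */
enum { END = 0, NUMBER = 258, PLUS = 259, ERROR = 260 };

static const char *input; /* hypothetical: read from a string, not yyin */
static int yylval;        /* semantic value of the most recent token */

/* Hand-written stand-in for the scanner Flex would generate. */
int yylex(void) {
    while (*input == ' ' || *input == '\t' || *input == '\n')
        input++;                          /* skip whitespace */
    if (*input == '\0')
        return END;                       /* tells the parser we are done */
    if (isdigit((unsigned char)*input)) {
        yylval = 0;
        while (isdigit((unsigned char)*input))
            yylval = yylval * 10 + (*input++ - '0');
        return NUMBER;
    }
    if (*input == '+') {
        input++;
        return PLUS;
    }
    input++;                              /* consume the unexpected char */
    return ERROR;
}

/* Hypothetical stand-in for a generated parser: accepts
   NUMBER (PLUS NUMBER)* and sums the values. Returns 0 on success
   (storing the sum in *result) and 1 on a syntax error. */
int parse_sum(const char *text, int *result) {
    input = text;
    if (yylex() != NUMBER)
        return 1;
    int sum = yylval;
    for (;;) {
        int tok = yylex();
        if (tok == END)
            break;
        if (tok != PLUS || yylex() != NUMBER)
            return 1;
        sum += yylval;
    }
    *result = sum;
    return 0;
}
```

In a real build, Flex writes yylex() from a .l file and Bison writes yyparse() from a .y grammar; the token codes are shared through the header Bison can generate with its -d option.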
Mekelle University

General Lex/Flex Information
• lex is a tool to generate lexical analyzers. It was written by Mike Lesk and Eric Schmidt (who later became CEO of Google).
• lex divides a stream of input characters into meaningful units (lexemes), identifies them (tokens), and may pass the tokens to a parser generated by yacc.
• lex specifications are written as regular expressions.
• flex (fast lexical analyzer generator)
  – Free and open-source alternative.
  – You'll be using this.

General Yacc/Bison Information
• Yacc (yet another compiler compiler)
  – Is a tool to generate parsers (syntactic analyzers).
  – Generated parsers require a lexical analyzer.
  – It has largely been superseded by Bison.
• Bison
  – Free and open-source alternative.
  – You'll be using this.

[Figure: Lex/Flex and Yacc/Bison in relation to a compiler tool chain]

FLEX IN DETAIL

How Flex Works
• Flex uses a .l spec file to generate a tokenizer/scanner.
• The tokenizer reads an input file and chunks it into a series of tokens, which are passed to the parser.
• Flex is a program that automatically creates a scanner in C, using rules for tokens written as regular expressions.

Internal Structure of Lex/Flex
Regular expressions → NFA → DFA → minimal DFA
The final (accepting) states of the DFA are associated with actions.
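A small table-driven DFA shows what associating final states with actions means in practice. This is a hand-built sketch, not Flex output: the two token patterns ([0-9]+ for integers and [a-zA-Z][a-zA-Z0-9]* for identifiers), the character classes, and the action codes A_INTEGER and A_IDENT are all hypothetical. The scanner keeps running while transitions exist and remembers the last accepting state it passed through, which is how the longest match is found.

```c
#include <ctype.h>
#include <stddef.h>

/* Input symbol classes used to index the transition table. */
enum { C_LETTER, C_DIGIT, C_OTHER, NCLASSES };

/* Hypothetical actions attached to accepting states. */
enum { A_NONE, A_INTEGER, A_IDENT };

/* States: 0 = start, 1 = inside an integer, 2 = inside an identifier.
   -1 means "no transition" (the dead state). */
static const int next_state[3][NCLASSES] = {
    /*              LETTER DIGIT OTHER */
    /* 0 start */ {  2,     1,    -1 },
    /* 1 int   */ { -1,     1,    -1 },
    /* 2 ident */ {  2,     2,    -1 },
};

/* Action for each state; A_NONE marks a non-accepting state. */
static const int accept_action[3] = { A_NONE, A_INTEGER, A_IDENT };

static int char_class(char c) {
    if (isalpha((unsigned char)c)) return C_LETTER;
    if (isdigit((unsigned char)c)) return C_DIGIT;
    return C_OTHER;
}

/* Runs the DFA from the start of s, remembering the last accepting
   state seen (longest match). Stores the match length in *len and
   returns that state's action, or A_NONE if nothing matched. */
int dfa_scan(const char *s, size_t *len) {
    int state = 0, action = A_NONE;
    *len = 0;
    for (size_t i = 0; s[i] != '\0'; i++) {
        state = next_state[state][char_class(s[i])];
        if (state < 0)
            break;                        /* dead state: stop scanning */
        if (accept_action[state] != A_NONE) {
            action = accept_action[state];
            *len = i + 1;                 /* longest accepted prefix so far */
        }
    }
    return action;
}
```

For example, on the input 42abc the DFA stops after 42 and reports A_INTEGER with length 2; scanning would then restart at abc. Flex derives comparable tables automatically from the regular expressions, so they never have to be written by hand.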
Flex/Lex Structure
• The input file has three sections, separated by lines containing only %%:
    definitions
    %%
    rules
    %%
    user code

Definitions section (1)
• There are three things that can go in the definitions section:
• C code: any code between %{ and %} (with the delimiters unindented on lines of their own), as well as any indented line, is copied verbatim to the generated C file. This is typically used for declaring file-scope variables and prototypes of routines defined in the user code section.
• Definitions: a definition is very much like a #define cpp directive. For example:
    letter    [a-zA-Z]
    digit     [0-9]
    punct     [,.:;!?]
    nonblank  [^ \t]
  These definitions can be used in the rules section; for example, a rule could start with {letter}+ {...

Definitions section (2)
• State definitions: if a rule depends on context, it's possible to introduce states and incorporate them in the rules. A state definition looks like %s STATE, and a state INITIAL is predefined.

Rules section
• The rules section has a number of pattern-action pairs.
• The patterns are regular expressions, and the actions are either a single C statement or a sequence of statements enclosed in braces.
• Example:
    RE          Action
    \n          linenum++;
    [0-9]+      printf("integer");
    [a-zA-Z]    printf("letter");

Lex/Flex Regular Expression (1)
• A regular expression contains:
  – text characters (which match the corresponding characters in the strings being compared), and
  – operator characters (which specify repetitions, choices, and other features).
• Text characters: the letters of the alphabet and the digits are always text characters.
• Operator characters: the operator characters are
    " \ [ ] ^ - ? . * + | ( ) $ / { } % < >
  If they are to be used as text characters, they should be escaped with \.
• The quotation mark operator (") indicates that whatever is contained between a pair of quotes is to be taken as text characters.

Lex/Flex Regular Expression (2)
• Character classes:
  – Classes of characters can be specified using the operator pair [].
  – The construction [abc] matches a single character, which may be a, b, or c.
  – Within square brackets, most operator meanings are ignored.
  – Only three characters are special: \, -, and ^.
  – The - character indicates ranges; for example, [a-z0-9] is the character class containing all the lowercase letters and digits.
  – To include the character - in a character class, put it first or last. Ex: [-+0-9] matches all digits and the two signs.

Lex/Flex Regular Expression (3)
• In character classes, the ^ operator must appear as the first character after the left bracket.
• It indicates that the resulting class is complemented with respect to the computer's character set.
• Examples:
  – [^abc] matches all characters except a, b, or c.
  – [^a-zA-Z] matches any character that is not a letter.
• The \ character provides the usual escapes within character class brackets.

Lex/Flex Regular Expression (4)
• Arbitrary character: to match almost any character, use the operator character . (dot), which is the class of all characters except newline.
• Optional expression: the operator ? indicates an optional element of an expression. Ex: ab?c matches either ac or abc.
• Repeated expressions: repetitions of classes are indicated by the operators * and +. Ex: a* is any number of consecutive a characters, including zero, while a+ is one or more instances of a.

Lex/Flex Regular Expression (5)
• Examples:
  – [a-z]+ is all strings of lowercase letters.
  – [A-Z][a-z]+ indicates strings with a first uppercase letter followed by one or more lowercase letters.
• Alternation and grouping: the operator | indicates alternation.
  – Ex: (ab|cd) matches either ab or cd.
  – Ex: (ab|cd+)?(ef)* matches such strings as abefef, efefef, cdef, or cddd, but not abc, abcd, or abcdef.

Lex/Flex Regular Expression (6)
• Repetition and definitions: the operators {} specify either repetitions (if they enclose numbers) or a definition expansion (if they enclose a name).
  – Ex: {digit} looks for a predefined string named digit and inserts it at that point in the expression.
    The definitions are given in the first part of the Lex input, before the rules.
  – In contrast, a{1,5} looks for 1 to 5 occurrences of a.

Example
Pattern     Meaning
c           The char "c"
"c"         The char "c" even if it is a special char in this table
\c          Same as "c", used to quote a single char
[cd]        The char c or the char d
[a-z]       Any single char in the range a through z
[^c]        Any char but c
.           Any char but newline
^x          The pattern x if it occurs at the beginning of a line
x$          The pattern x at the end of a line
x?          An optional x
x*          Zero or more occurrences of the pattern x
x+          One or more occurrences of the pattern x
xy          The pattern x concatenated with the pattern y
x|y         An x or a y
(x)         An x
x/y         An x only if followed by y
<S>x        The pattern x when lex is in start condition S
{name}      The value of a macro from the definitions section
x{m}        m occurrences of the pattern x
x{m,n}      m through n occurrences of x (takes precedence over concatenation)

Flex/Lex Predefined Variables
Name                 Function
int yylex(void)      call to invoke lexer, returns token
char *yytext         pointer to matched string
yyleng               length of matched string
yylval               value associated with token
int yywrap(void)     wrapup, return 1 if done, 0 if not done
FILE *yyout          output file
FILE *yyin           input file
INITIAL              initial start condition
BEGIN condition      switch start condition
ECHO                 write matched string

Flex/Lex Action
• When an expression is matched, Lex/Flex executes the corresponding action.
• Example:
    %%
    [a-z]+       printf("alpha\n");
    [0-9]+       printf("numeric\n");
    [a-z0-9]+    printf("alphanumeric\n");
    [ \t]+       printf("white space\n");
    .            printf("special char\n");
    \n           ;
    %%

Disambiguation Rules
• If there are several patterns that match the current input, yylex() chooses one of them according to these rules:
1. The longest match is preferred.
2. Among rules that match the same number of characters, the rule that occurs earliest in the list is preferred.

Example
• Show the output if the input to yylex() generated by the lex program above is: abc123 abc 123?x
• Solution:
    alphanumeric
    white space
    alpha
    white space
    numeric
    special char
    alpha
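The two disambiguation rules can be checked mechanically. The sketch below is a hand-written C emulation of the six-rule lex program, not code generated by Flex: each rule is tried separately at the current position, the longest match wins, and a tie goes to the rule listed first. The `struct rule` layout, the helper predicates, and `run_lexer()` are hypothetical names for illustration; only the patterns and action texts come from the example.

```c
#include <stddef.h>

/* One rule of the example program (hypothetical representation):
   a character-class test, whether the class is repeated (the +
   operator), and the text its action prints (NULL for the \n rule,
   whose action is empty). */
struct rule {
    int (*in_class)(int c);
    int repeated;
    const char *label;
};

static int is_lower(int c)    { return c >= 'a' && c <= 'z'; }
static int is_digit(int c)    { return c >= '0' && c <= '9'; }
static int is_lower_or_digit(int c) { return is_lower(c) || is_digit(c); }
static int is_blank(int c)    { return c == ' ' || c == '\t'; }
static int is_not_nl(int c)   { return c != '\n'; }
static int is_nl(int c)       { return c == '\n'; }

/* Same order as in the lex program: order decides ties. */
static const struct rule rules[] = {
    { is_lower,          1, "alpha" },         /* [a-z]+    */
    { is_digit,          1, "numeric" },       /* [0-9]+    */
    { is_lower_or_digit, 1, "alphanumeric" },  /* [a-z0-9]+ */
    { is_blank,          1, "white space" },   /* [ \t]+    */
    { is_not_nl,         0, "special char" },  /* .         */
    { is_nl,             0, NULL },            /* \n        */
};

/* Length of the longest prefix of s that rule r matches (0 = no match). */
static size_t match_len(const struct rule *r, const char *s) {
    size_t n = 0;
    while (s[n] != '\0' && r->in_class((unsigned char)s[n])) {
        n++;
        if (!r->repeated)
            break;                        /* single-character pattern */
    }
    return n;
}

/* Applies both disambiguation rules over the whole input and stores
   the label printed for each token in out[]; returns how many. */
size_t run_lexer(const char *s, const char *out[], size_t max_out) {
    const size_t nrules = sizeof rules / sizeof rules[0];
    size_t count = 0;
    while (*s != '\0') {
        size_t best_len = 0, best = 0;
        for (size_t i = 0; i < nrules; i++) {
            size_t len = match_len(&rules[i], s);
            if (len > best_len) {         /* strictly longer: earlier rule wins ties */
                best_len = len;
                best = i;
            }
        }
        if (best_len == 0)                /* unreachable here: . or \n always match */
            break;
        if (rules[best].label != NULL && count < max_out)
            out[count++] = rules[best].label;
        s += best_len;
    }
    return count;
}
```

Running `run_lexer()` on abc123 abc 123?x stores the seven labels alphanumeric, white space, alpha, white space, numeric, special char, alpha, matching the solution. The token abc ties between [a-z]+ and [a-z0-9]+ at three characters each, so the earlier rule [a-z]+ wins and prints alpha. A generated scanner folds all rules into one DFA instead of trying them one by one, but it selects the same match.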