Вы находитесь на странице: 1из 7

Crafting an interpreter Part 2 - The Calc0 Toy Language - VROSNET - CodeProject

https://workspaces.codeproject.com/vrosnet/crafting-an-interpreter-part-2-the-calc0-toy-langu[19/06/2014 4:44:44 AM]


Home VROSNET Crafting an interpreter Part 2 - The Calc0 Toy Language
Workspace home
Co
Ta
Arti
Add app

Crafting an interpreter Part 2 - The Calc0
Toy Language
by: VROSNET forked from: martinholzherr / crafting-an-interpreter-part-2-the-calc0-toy-langu
Second article on building a language interpreter describing error handling and direct evaluation during parsing.
(Back to article)

0

Share with others
Congratulations! Your download is complete
You now have a new workspace in the cloud!
Your downloaded code is ready to be worked on and shared. You can now access your
downloaded code from anywhere at any time.
What can I do with this Workspace?
Connect directly via Visual Studio
Make changes to the code without affecting the authors original
Issue a pull request to the author to alert them of fixes or updates youve made
View your workspace
This workspace will provide you a snapshot of
the download. If the author updates or
changes the code (uploads zip or push it over
git) then you can update this snapshot easily
or keep your version locked to ensure stability.
Learn More
Check out our selection of articles to get you
started!
Working with ::GitMachine
Introduction to ::Tasks
Introduction to git for Windows users
Migrating a SVN Repository to Git using
::GitMachine
Customizing ::Tasks - tips and tricks

CREATE A
WORKSPACE
EXPLORE
WORKSPACES
My favorites
Nothing to display
My workspaces
Nothing to display
Shared with me
Nothing to display
WCode WTasks
Forum

VROSNET Sign out S home workspaces premium account
Submit
Crafting an interpreter Part 2 - The Calc0 Toy Language - VROSNET - CodeProject
https://workspaces.codeproject.com/vrosnet/crafting-an-interpreter-part-2-the-calc0-toy-langu[19/06/2014 4:44:44 AM]
Download sample - 107 Kb
Download source (procedural implementation) - 9.71 Kb
Download source (template implementation) - 10.4 Kb
Introduction
In this second article on building an interpreter, we discuss the implementation of an interpreter for a very simple
language. We use a Parsing Expression Grammar [1] [2] to define the syntax and implement the parser. We show
how evaluation can be done during parsing and how to extend the grammar, so that error messages can be issued
in case of parsing errors. We show furthermore, that a Grammar sometimes must be slightly changed in order to
alleviate direct evaluation during parsing.
The Calc0 toy language used for this purpose supports expressions, assignments, variables and print statements.
Immediately after starting, the interpreter for Calc0 enters an infinite loop consisting of the following processing
steps:
It reads a line entered by the user.
It parses the line, and during parsing, it evaluates the recognized expression, assignment or print statement.
It updates an internal map of variable names and their associated values.
It outputs the results of the evaluation.
Crafting an interpreter Part 2 - The Calc0 Toy Language - VROSNET - CodeProject
https://workspaces.codeproject.com/vrosnet/crafting-an-interpreter-part-2-the-calc0-toy-langu[19/06/2014 4:44:44 AM]
The Calc0 Grammar
The Calc0 Interpreter expects one line of input. The most basic elements on such a line are spaces, identifiers and
numbers. We allow only the white space character, tabulators and carriage return as spaces. We choose C/C++
identifiers and allow floating point or integer values as numbers. Expressions support addition, subtraction,
multiplication, division and the power operator.
We need a context free grammar which is compatible with the parsing method used. Since we chose the Parsing
Expression Grammar formalism, which is an LL parsing method, we must avoid left recursive rules. This results in the
following Parsing Expression Grammar:
//first attempt to calc0_grammar
calc0_grammar: S (print_variables / assignement / expression) S '\0';
assignement: identifier S '=' S expression ;
print_variables: '!' S (identifier (S '-' S identifier)?)? ;
expression: multiplicative_expression
(S ('+' / '-') S multiplicative_expression)* ;
multiplicative_expression:
power_expression (S ('*' / '/') S power_expression)* ;
power_expression: primary_expression (S '^' S primary_expression)? ;
primary_expression: ('+' / '-')? S
(identifier / number / '(' S expression S ')') ;
number: ([0-9]+ '.'? / '.'[0-9])
[0-9]* (('e' / 'E') [0-9]+)? ;
identifier: [A-Za-z_][A-Za-z_0-9]* ;
S: [ \r\v\t]* ;
The start rule of this grammar is called calc0_grammar. The PEG operational interpretation of this rule is the
following: parse the input text by first matching against possible spaces, then try to match the remaining input
against exactly one of the rules print_variables, assignement or expression (try in this order). If
successful, continue parsing by skipping possible spaces, and finally expect the input line terminator which is a zero.
The other rules can be interpreted similarly.
Direct Evaluation
Direct evaluation during parsing means that we carry out the arithmetic operations, the variable assignments and
variable accesses at the place where we parse them. Additions and subtractions, e.g., have to be done inside the
grammar function responsible for parsing an expression.
expression: multiplicative_expression
(S ('+' / '-') S multiplicative_expression)* ;
The evaluation of the value of expression is done by adding/subtracting the result values of the components of
this rule. Intermediate results are stored in the context structure passed to each grammar rule function. This is shown
in the following table:
rule component of expression code executed effect on evaluation
expression:
multiplicative_expression
call multiplicative_expression.
Then copy the result from the context
structure to a local variable.
double result= pS->result;
evaluation value of first rule
component is in a local
variable of the parsing
function of expression
(S ('+' / '-') S
multiplicative_expression)*
for each loop iteration, we add or
subtract the value of the
multiplicative_expression
parsed in this loop to the local variable
result
result is accumulated in the
loop
The resulting code for actually carrying out the addition inside the parsing loop is shown in the following code
fragment:
//code fragment taken from the template implementation of the rule expression
double result= pS->result;
for(;;){/*(S ('+' / '-') S multiplicative_expression)*; */
Crafting an interpreter Part 2 - The Calc0 Toy Language - VROSNET - CodeProject
https://workspaces.codeproject.com/vrosnet/crafting-an-interpreter-part-2-the-calc0-toy-langu[19/06/2014 4:44:44 AM]
S::Matches(p,pS);
if( And<Char<'+'>,S,multiplicative_expression>::Matches(p,pS)){
result+= pS->result;
}else if( And<Char<'-'>,S, multiplicative_expression>::Matches(p,pS) ){
result-= pS->result;
}else ...
The code for the rule multiplicative_expression, where multiplications and divisions are carried out is
similar. One major problem in direct evaluation arises for the grammar rule identifier. Depending on the place
where an identifier is used, we must evaluate completely different things as shown in the following table.
place of usage of identifier evaluation to be carried out
identifier S '=' S
expression ;
store the name of the identifier in order to be able to store the
value of expression later under this name.
identifier / number / '('
S expression S ')'
store the value of identifier as operand for later to be carried out
additions, multiplications and assignments.
'!' S (identifier (S '-' S
identifier)?)?
store the names of the identifiers to determine the lexicographical
beginning and end of the variables to be printed out.
In the PEG parsing method, there is a one to one correspondence between grammar rule and parsing function. But
we have to do very different things inside the parsing function for identifier depending on the caller. One
solution would be to check for the caller inside the parsing function for identifier. But there is a much simpler
solution. Use different grammar rules for identifier depending on the place where it is used. A similar problem
arises for the rule expression. When used as top element (one of the alternatives for calc0_grammar) we
should print out the calculated value, but not if the expression is on the left hand side of an assignment or is
enclosed in parens ('(' expression ')'). We can solve this problem by introducing two equivalent
grammar_rules, namely expression and full_expression. In this way, we get the following revised calc0
grammar:
//Revised calc0_grammar
calc0_grammar: S (full_expression / assignement / print_variables) S '\0';
assignement: variable S '=' S expression ;
print_variables: '!' S (identFirst (S '-' S identLast)?)? ;
expression: multiplicative_expression
(S ('+' / '-') S multiplicative_expression)* ;
full_expression: expression ;
multiplicative_expression:
power_expression (S ('*' / '/') S power_expression)* ;
power_expression: primary_expression (S '^' S primary_expression)? ;
primary_expression: ('+' / '-')? S
(identifier / number / '(' S expression S ')') ;
number: ([0-9]+ '.'? / '.'[0-9])
[0-9]* (('e' / 'E') [0-9]+)? ;
identifier: [A-Za-z_][A-Za-z_0-9]* ;
variable: identifier ;
identFirst: identifier ;
identLast: identifier ;
S: [ \r\v\t]* ;
Handling of Parsing Errors in PEG's
An input for a parser is erroneous if an expected element is missing from the input and there is no alternative left.
The PEG formalism allows to integrate error recognition into the grammar, as the following table shows:
Error Situation Error Recognizing Rule
E is expected
S: A B E U ;
S: A B (E / Error<eSymbol_E_expected>) U ;
One of the alternatives B or C must be present
S: A ( B / C ) D ;
S: A (B / C / Error<eOneOf_B_C_expected>) D ;
The simplest kind of error handling in a PEG grammar is to stop the parsing process at the error position, give an
error message, and throw an exception. We will use this technique here, because the input never exceeds one line.
We enhance the grammar with indicators for error situations by appending an alternative Fatal.
Crafting an interpreter Part 2 - The Calc0 Toy Language - VROSNET - CodeProject
https://workspaces.codeproject.com/vrosnet/crafting-an-interpreter-part-2-the-calc0-toy-langu[19/06/2014 4:44:44 AM]
//calc0_grammar with error handling indicators
calc0_grammar: S
( print_variables /
assignement /
full_expression /
Fatal<eExprAssPrintExpected>;
)
S
( '\0' / Fatal<eEndOfInputExpected>) ;
assignement: variable S '=' S expression ;
print_variables: '!'
S
(
identFirst
S
( '-' S ( identLast / Fatal<eIdentExpected>) )?
)? ;
expression: multiplicative_expression
(S ('+' / '-') S multiplicative_expression)* ;
full_expression: expression ;
multiplicative_expression:
power_expression (S ('*' / '/') S power_expression)* ;
power_expression: primary_expression (S '^' S primary_expression)? ;
primary_expression: ('+' / '-')? S
( identifier /
number /
'(' S expression S ( ')' / Fatal<eRParenExpected> ) /
Fatal<eIdentNumOrExprExpected>
) ;
number: ([0-9]+ '.'? / '.' ([0-9] / Fatal<eFracDigitsExpected>) )
[0-9]*
( ('e'|'E')
( '+' / '-' )?
([0-9]+ / Fatal<eExponentDigitsExpected>)
)?;
identifier: [A-Za-z_][A-Za-z_0-9]* ;
variable: identifier ;
identFirst: identifier ;
identLast: identifier ;
S: [ \r\v\t]* ;
In the PEG implementation, there is a one to one relation between the grammar rules given above (inclusive error
handling) and the template instantiation. This is shown for the start rule for both the template and procedural
implementation:
calc0_grammar:
S (print_variables / assignement / full_expression / Fatal) S ('\0' / Fatal) ;
template implementation procedural implementation
typedef
And<
S,
Or<
print_variables,
assignement,
full_expression,
Fatal<
eExprAssPrintExpected
>
>,
S,
inline bool calc0_grammar_rule(
const uchar*& p, StateInfo* pS)
{ const uchar *p0,*p1;
return
PEG_AND(p0,p,
S (p,pS)
&& PEG_OR4(p1,p,
print_variables (p,pS),
assignement (p,pS),
full_expression (p,pS),
pS->Fatal(p0,
"expression, "
Crafting an interpreter Part 2 - The Calc0 Toy Language - VROSNET - CodeProject
https://workspaces.codeproject.com/vrosnet/crafting-an-interpreter-part-2-the-calc0-toy-langu[19/06/2014 4:44:44 AM]
Or<
Char<0>,
Fatal<eEndOfInputExpected>
>
> rule;
"v= expr or '!' expected")
)
&& S(p,pS)
&& ( Char(p,'\0')
||pS->Fatal(p,
"end of input expected")
)
);
}
Maintaining Information across Runs
This interpreter runs in an eternal loop of waiting for input, and then reading, parsing and evaluating it. Each time the
user enters an assignment, it not only calculates the new value of the variable, which is directly involved into the
assignment, but it also looks for other formulas which use the changed value and recalculates them. This information
is kept in a map with the following value type:
typedef std::set<std::string,CompareNoCase> set_istring;
struct Formula{
Formula(std::string e="",double v=0.0, const set_istring& sUsed=set_istring())
:expression(e),value(v),setUsedIdents(sUsed){}
std::string expression; //needed when recalculation necessary
double value; //current value of the formula variable
set_istring setUsedIdents;//identifiers used by the formula
};
The code responsible for calculating the new variable value and recalculating the dependent formulas is contained in
the parser part for the rule assignement: variable S '=' S expression;. The variable, which is the
target of the assignment, is stored with its value and name, the whole input line, and the variable names which
appear on the right hand side of the assignment as Formula in the variables map.
Handling of Runtime errors
Runtime error handling is one of the most underestimated subjects. The two main questions in runtime error
handling are:
Amount of runtime error checks.
Kind of error action: program abortion, raising an exception, setting a flag, or autocorrection.
The expectations regarding the amount of runtime checks are different for interpreted and compiled languages and
differs from language to language. Array bound checks, e.g., are standard in Pascal, but not available for C/C++.
Even if the language specification defines a runtime error as severe, the programming language culture eventually
may gently force the implementor to dispense with issuing a runtime error. A good example is integer overflow in
C/C++, which results in undefined behaviour but will not be caught by most C/C++ implementations.
The answer to the second question regarding runtime error handling, namely what kind of error action should be
carried out, is currently biased towards "raising an exception". C++ and Java even define a complete inheritance tree
of runtime exceptions in their standard libraries. The only notable exception, where throwing an exception may not
be the right thing, is floating point error handling. IEEE 754 (the standard for floating point arithmetic) supports both,
setting an error flag and taking a trap. Once a floating point error flag is set, it will stay until it is explicitly cleared.
The standard stream library recognizes the IEEE 754 error flags and prints them out as in the sample shown below:
3/0
>> = 1.#INF
1e300*-1e300
>> = -1.#INF
When a floating point overflow occurs, it may be meaningful to continue the computation by setting the floating
point value to the maximum floating point value. This is done by the Calc0 interpreter when assigning to a variable
in order to avoid illegal variable values in the variables map.
x= 3/0
>> x= 1.#INF
.. floating point overflow (=DBL_MAX)
>> x= 1.797693134862316e+308
Crafting an interpreter Part 2 - The Calc0 Toy Language - VROSNET - CodeProject
https://workspaces.codeproject.com/vrosnet/crafting-an-interpreter-part-2-the-calc0-toy-langu[19/06/2014 4:44:44 AM]
Advertise | Privacy | Pricing | Report problem | Forum | Twitter Copyright CodeProject, 1999-2014

All Rights Reserved. Terms of Use
layout: fixed | fluid
The Calc0 interpreter recognizes floating point errors by using the non standard _fpclass function which returns
error indicators like _FPCLASS_NINF (negative overflow), _FPCLASS_PINF (positive overflow), and so on. C99 [2]
standardizes floating point error handling with additions to the math.h header, but VC6 and VC7 do not yet support
it.
Take-Home Points
Direct evaluation during parsing can be difficult because of the limited context information available during parsing.
We can sometimes improve the situation by changing the grammar.
Parsing errors can be handled easily within the PEG framework, because the components of a PEG grammar rule are
sequentially evaluated, which makes execution order predictable.
Runtime error handling must be adapted to the expectations of the target user group. Standards for runtime error
handling are rare. An exception is floating point error handling.
History
2005 April: initial version.
References
1. Parsing Expression Grammars, Bryan Ford, MIT, January 2004 [^]
2. Parsing Expression Grammars, Wikipedia, the free encyclopedia[^]
3. C99 Library Reference for <math.h>[^]

Вам также может понравиться