1. Instructions:
You will be given a group code. This is the last letter of your section and a group number.
Design your own small original imperative programming language.
Give your programming language a name.
The more complex your language, the higher its potential score. However, a very
complex language could cause difficulties in succeeding project phases.
This language score will be used to compute your grade in this phase and succeeding phases
of the project.
2. Language Requirements:
Document your language. Identify the following features of your language:
2b. Optional Features (You'll earn more points for each optional feature. Enumerate and
describe each of these features.)
Comments: Allowing In-line and multiline comments.
Allowing constant data types: Identifiers to hold constant data.
you must allow initialization and check that their values are not changed
Allowing tokens including constants to span more than one line in source codes.
Allowing tokens and constants that can be continued after whitespaces (like FORTRAN)
Having special characters indicated by Escape-sequences like newline characters in strings.
Including noise tokens i.e. optional reserved words
Having more data types, various levels of precision or non-decimal radixes.
Negative, fixed point or floating point numbers.
short and int and long; float and double; with characteristic and mantissa;
characters
binary, octal, and hexadecimal data types
Your compiler will be (later) required to perform type-checking for each data type.
Including other data types:
records/structs,
composite types like ordered pairs, etc.
enumerations
Arrays
1 dimensional or 2 dimensional or n-dimensional;
same type or mixed-types
Allowing definitions of new types from existing types (like typedef)
Allowing both global and block/local identifiers.
Allowing identifiers with same name but of different types or of different scope
Additional kinds of statements
Switch/case,
Goto statements and statement labels
Break loop or continue loop
Additional looping constructs
Having more operators in expressions
Unary plus, unary minus, (more points if you use the same symbol as binary plus and
binary minus)
subtraction, division, exponentiation,
built-in numeric functions (trig, log, e, hyperbolic, statistical, etc.)
left-shift and right-shift operators
Logical operators: NAND, NOR, XOR, etc.
Relational operators: >=, <=, !=, etc.
String lexicographic relational comparison (< or >).
built-in string functions: concatenation, substring, search, length, etc.
Allowing mixed-mode expressions (operands of different types, upcasting, downcasting)
Handling lists of items
ex. In declarations, listing ids of the same type,
ex. In I/O, listing items to input or output,
ex. In procedure calls, listing ids in parameter lists, etc.
Having programmer-defined callable procedures
with or without parameters?
with return value?
what parameter passing methods are allowed?
allowing recursion
Object-Oriented Features (like Java or C++)
Classes, Objects
Functional Programming (like Lisp)
Logic Programming (like Prolog)
etc.
6. Grading:
Your grade in this phase will be based on the following:
complexity of your language
the completeness, clarity and accuracy of your write-up and examples,
ability to answer correctly questions about the language and the documentation.
your timeliness,
your peer evaluation.
If you do not deliver on promised features:
Points earned for those features in this phase will be removed.
You get additional small deductions.
Optional features added after language design phase could increase your language score, but
not as much as if they were declared during the language design phase.
Phase II. Write a scanner.
1. Requirements:
Write at least two methods or programs: a tester and a scanner.
The tester specifies the input file containing the source program to be scanned.
The tester creates a symbol table for identifiers, with the lexeme as key, and details about
the identifier as value. Initially, the details will include the lexeme (again) and token type
(id) but later would include its data type and its value.
The tester makes one request per token to the scanner. The tester must keep requesting
tokens from the scanner until the end of the input file is reached.
The scanner returns the next good token for each request from the tester.
If the scanner detects an identifier, it should check if the id is already in the symbol table.
if not, it adds a new entry for the identifier.
If the scanner consumes an error, it should print it. If an error is passed to the tester, the
tester should print it.
The tester should print all good tokens.
Tokens should be printed depending on their type.
Whitespaces should not be printed.
Reserved words and operators should be printed using their token names. (ex.
[COMMA],[READ],[GT],[SUBT],[END] )
Identifiers should be printed to show their lexemes. (ex. x or numStudents)
Constant values should be printed to show their actual values. (ex. Hello, World! or
TRUE)
You may write separate routines that perform one or more of the following:
returns the current position of the next character to be read from the file,
reads the source program and returns the next character, if any,
reads a sequence of characters from the file
unreads a character or a sequence of characters
You must build your own scanner from scratch. You cannot use pre-existing scanners
(like Lex and Flex) created by others.
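The tester and scanner interaction described above can be sketched as follows. This is a minimal illustration and not a reference solution: the reserved words `READ` and `END`, the operator names in `OPERATORS`, the `Scanner` class, and `run_tester` are all hypothetical, and the source program is passed in as a string rather than read from a file to keep the sketch self-contained.

```python
RESERVED = {"READ", "END"}                          # hypothetical reserved words
OPERATORS = {",": "COMMA", ">": "GT", "-": "SUBT"}  # hypothetical operator token names


class Scanner:
    """Hand-written scanner that returns one good token per request."""

    def __init__(self, text, symbols):
        self.text = text
        self.pos = 0            # position of the next character to be read
        self.line = 1           # current line number, used in error messages
        self.symbols = symbols  # symbol table created by the tester

    def peek(self):
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def read(self):
        ch = self.peek()
        if ch:
            self.pos += 1
            if ch == "\n":
                self.line += 1
        return ch

    def read_while(self, first, accept):
        # Extend the lexeme while the next character is accepted.
        lexeme = first
        while accept(self.peek()):
            lexeme += self.read()
        return lexeme

    def next_token(self):
        while True:
            ch = self.read()
            if ch == "":
                return ("EOF", None)
            if ch.isspace():
                continue  # whitespace is consumed, never returned
            if ch.isalpha():
                lexeme = self.read_while(ch, str.isalnum)
                if lexeme.upper() in RESERVED:
                    return (lexeme.upper(), lexeme)
                # New identifiers get a symbol table entry keyed by lexeme.
                self.symbols.setdefault(lexeme, {"lexeme": lexeme, "token": "id"})
                return ("ID", lexeme)
            if ch.isdigit():
                return ("CONST", int(self.read_while(ch, str.isdigit)))
            if ch in OPERATORS:
                return (OPERATORS[ch], ch)
            # The scanner consumes the error, prints it, and keeps scanning.
            print(f"line {self.line}: skipping unexpected character {ch!r}")


def run_tester(source):
    symbols = {}
    scanner = Scanner(source, symbols)
    printed = []
    while True:  # one request per token until the end of input
        kind, value = scanner.next_token()
        if kind == "EOF":
            break
        printed.append(str(value) if kind in ("ID", "CONST") else f"[{kind}]")
    print(" ".join(printed))
    return printed, symbols
```

Calling `run_tester("READ x, 42 @ END")` would print one error line for `@` and then the good tokens. In this sketch errors are consumed by the scanner; passing an error token back to the tester instead is an equally valid design.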
Optional features of your scanner (You earn more points for doing some of these):
Indicating line number of errors,
Fixing errors,
Skipping sequences of characters that are in error or do not make real tokens
Write test input source program(s) for scanning.
They must have all kinds of tokens to be identified
They must have representative character sequences that do not make good tokens.
You will use these to demonstrate what your scanner can and cannot do.
Document all changes you made in the language specifications since the documentation you
provided for the language design phase. Highlight these changes. Did you add, subtract or
change features of your language? The graded original and the revised version of your
language specifications must both be part of your phase 2 documentation.
Document your phase 2:
Identify the tokens and draw the DFA(s) actually used by the scanner.
How do these compare against your DFA(s) from language design phase?
Why did you make these changes?
If you have multiple DFAs, in what sequence are these DFAs executed?
Indicate if reserved words are checked by table or by DFA(s).
Provide a system architecture diagram showing your modules and the interfaces between
modules.
Show the inputs to each module and where these inputs come from (user file or
which module).
Show the outputs from each module and who (user or which module) consumes these
outputs.
Explain the purpose of each module.
How do you handle errors discovered by the scanner?
What errors cannot be handled by your scanner, if any?
Specify if your scanner is consuming whitespaces and/or erroneous character sequences,
or if they are being passed to the caller.
Demonstrate your scanner to the teacher with your computer.
Hand-in your printed documentation as described above.
Show your scanner routines' source codes and the input source codes for scanning.
The teacher may modify some of your input source codes to test language features you
promised to implement in your language design.
Have a back-up of all your source codes.
2. Grading:
Your grade in this phase will be based on the following:
Delivery of scanner requirements
Quality of test input programs
Correct classification and printout of tokens
Error Handling
Maintenance of symbol table for identifiers
Scanner optional features
Presentation and ability to answer questions correctly
the completeness, clarity and accuracy of documentation
your timeliness,
your peer evaluation
Phase III. Write a parser.
1. Requirements:
Write at least two new methods or programs: a tester and a parser.
The tester specifies the input file containing the source program to be parsed.
The tester calls a routine that prints all the tokens of the source program to produce an
output similar to the printout of the scanner phase.
Note: This step is unnecessary and redundant for parsing. It would however be useful
to check the sequence of tokens that will be processed during parsing.
The tester calls the parser to parse the source program, and receives back the resulting
k-way parse or syntax tree.
It is preferable that an abstract syntax tree be created instead of a concrete syntax tree
(parse tree) in preparation for phase 4.
The tester prints the k-way tree generated.
The parser parses the source program, and generates and returns the corresponding
k-way parse or syntax tree.
The parser should call the scanner repeatedly to fetch one token from the file per call.
You must manually write your own Recursive Descent, LL, LALR, or LR parser.
For top-down parsers, you have to remove left-recursions and left-factor your CFG.
For bottom-up parsers, you have to compute first and follow sets.
You must build your own parser from scratch. You cannot use pre-existing parsers (like
YACC) created by others.
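To illustrate the requirements above, here is a minimal recursive descent sketch for a small, hypothetical expression grammar. The left-recursive rule `E -> E + T | T` is rewritten as the loop `E -> T { (+|-) T }`, the parser fetches one token per call from a stand-in scanner, and the result is a k-way abstract syntax tree. The grammar, the `Node` class, and the `scan` generator are assumptions for illustration only; a real project would plug in its own phase 2 scanner.

```python
import re

TOKEN_RE = re.compile(r"\d+|[A-Za-z][A-Za-z0-9]*|\S")


def scan(text):
    """Hypothetical stand-in scanner: yields one (kind, lexeme) pair per request."""
    for lexeme in TOKEN_RE.findall(text):
        if lexeme.isdigit():
            yield ("NUM", lexeme)
        elif lexeme[0].isalpha():
            yield ("ID", lexeme)
        else:
            yield (lexeme, lexeme)
    yield ("EOF", "")


class Node:
    """k-way tree node: a label plus any number of children."""

    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)

    def show(self):
        if not self.children:
            return self.label
        return "(" + " ".join([self.label] + [c.show() for c in self.children]) + ")"


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.current = next(tokens)  # one token of lookahead

    def advance(self):
        tok = self.current
        if tok[0] != "EOF":
            self.current = next(self.tokens)  # fetch exactly one more token
        return tok

    def expect(self, kind):
        if self.current[0] != kind:
            raise SyntaxError(f"expected {kind}, found {self.current[1]!r}")
        return self.advance()

    def expr(self):  # E -> T { (+|-) T }
        node = self.term()
        while self.current[0] in ("+", "-"):
            op = self.advance()[0]
            node = Node(op, [node, self.term()])
        return node

    def term(self):  # T -> F { (*|/) F }
        node = self.factor()
        while self.current[0] in ("*", "/"):
            op = self.advance()[0]
            node = Node(op, [node, self.factor()])
        return node

    def factor(self):  # F -> NUM | ID | ( E )
        kind, lexeme = self.current
        if kind in ("NUM", "ID"):
            self.advance()
            return Node(lexeme)
        if kind == "(":
            self.advance()
            node = self.expr()
            self.expect(")")
            return node
        raise SyntaxError(f"unexpected {lexeme!r}")


def parse(text):
    parser = Parser(scan(text))
    tree = parser.expr()
    parser.expect("EOF")
    return tree
```

The tester could then print the tree with `parse("a - 2 * (b + 3) - c").show()`. Building the left operand first and wrapping it inside the loop keeps binary operators left-associative even though the grammar is no longer left-recursive.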
Optional features of your parser (You earn more points for doing some of these):
Indicating line numbers of statements that have parsing errors,
Fixing parsing errors,
Skipping sequences of tokens in error or bypassing grammar variables
Write test input source program(s) for parsing.
They must demonstrate the usage of most of your grammar rules.
Some of them must include code segments with parsing errors.
If you don't provide test programs for optional features, you can't get the additional points.
Document all changes you made in the language specifications since the documentation you
provided for the scanner phase. Highlight these changes. Did you add, subtract or change
features of your language?
Document your phase 3:
The graded original versions of phases 1 and 2 documentations should be submitted as
part of your phase 3 documentation.
Specify in a table the CFG actually used by your parser.
How does this CFG compare against your CFG from the language design phase?
Did you remove left-recursion or left-factor your CFG? What are the results?
Did you compute first and follow sets? What are the results?
Why did you make these changes?
Indicate the parsing method used: Recursive Descent, LL, LALR, or LR.
Did you use a parsing table? If so, your documentation should refer to a spreadsheet that
stores your parsing table.
Provide a system architecture diagram showing your modules and the interfaces between
modules. Include the modules in phase 2, too.
Show the inputs to each module and where these inputs come from (user file or
which module).
Show the outputs from each module and who (user or which module) consumes these
outputs.
Explain the purpose of each module.
Include a short sample source program and draw (using nodes and edges) the
corresponding parse or syntax tree generated by the parser as part of your
documentation.
How do you handle errors discovered by the parser?
What errors cannot be handled by your parser, if any?
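As an illustration of the first and follow sets asked about in this documentation list, the sketch below computes them by fixed-point iteration for a small expression grammar whose left recursion has already been removed. The grammar, the nonterminal names, the `ε` marker, and the `$` end-of-input marker are assumptions chosen for the example and are not part of any required language.

```python
EPS = "ε"  # marker for the empty string

# Hypothetical grammar: each nonterminal maps to a list of production bodies.
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}


def first_of_sequence(symbols, first):
    """FIRST of a sequence of grammar symbols, given FIRST of each nonterminal."""
    result = set()
    for sym in symbols:
        sym_first = first[sym] if sym in GRAMMAR else {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)  # every symbol can vanish (or the sequence is empty)
    return result


def compute_first():
    first = {nt: set() for nt in GRAMMAR}
    changed = True
    while changed:  # repeat until no set grows
        changed = False
        for nt, productions in GRAMMAR.items():
            for body in productions:
                new = first_of_sequence(body, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def compute_follow(first, start="E"):
    follow = {nt: set() for nt in GRAMMAR}
    follow[start].add("$")
    changed = True
    while changed:
        changed = False
        for nt, productions in GRAMMAR.items():
            for body in productions:
                for i, sym in enumerate(body):
                    if sym not in GRAMMAR:
                        continue  # only nonterminals have FOLLOW sets
                    rest = first_of_sequence(body[i + 1:], first)
                    new = rest - {EPS}
                    if EPS in rest:  # whatever follows nt can follow sym
                        new |= follow[nt]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow
```

The resulting sets can be copied directly into the documentation table, or used to fill in a parsing table if the chosen method needs one.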
Demonstrate your parser to the teacher with your computer.
Hand-in your printed documentation as described above.
Show your parser routines' source codes and the input source codes for parsing.
The teacher may modify some of your input source codes to test language features you
promised to implement in your language design.
Have a back-up of all your source codes.
2. Grading:
Your grade in this phase will be based on the following:
Delivery of parser requirements
Quality of test input programs
Actual parsing vs. expected correct parsing
Correct resolution of ambiguities
Error Handling
Maintenance of symbol table, if any
Working LR(1) and LALR(1) parsers will get significantly higher scores than working LL(1)
and recursive descent parsers applied to the same language.
Parser optional features
Presentation and ability to answer questions correctly
the completeness, clarity and accuracy of documentation
your timeliness,
your peer evaluation
Phase IV. Write an interpreter and semantic type checker.
1. Requirements:
Write at least two new methods or programs: a tester and an interpreter.
The tester specifies the input file containing the source program to be interpreted.
The tester calls the parser to generate a tree for the input source program.
The tester prints the obtained tree.
The tester calls the interpreter to type-check and execute the source program's tree.
The interpreter should check the semantic type rules of the language.
The interpreter should execute the statements in the tree.
The interpreter should update the symbol table to include information such as data types
and data values.
The interpreter should print informational and error messages.
Optional features of your interpreter (You earn more points for doing some of these):
Indicating line numbers of statements that have interpreter errors,
Fixing type errors by automatic casting, insertion of declarations, providing default values,
etc.
Providing meaningful error messages during interpretation of input source programs.
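The interpreter duties listed above (checking type rules, executing statements, updating the symbol table, printing messages) might be sketched as below. The tuple-based tree shape, the statement kinds `declare`, `assign`, and `print`, the type names `int` and `string`, and the helper names are all hypothetical; a real interpreter would walk whatever tree its own parser produces.

```python
class SemanticError(Exception):
    """Raised when a semantic or type rule is violated."""


def type_of(expr, symbols):
    kind = expr[0]
    if kind == "num":
        return "int"
    if kind == "str":
        return "string"
    if kind == "var":
        if expr[1] not in symbols:
            raise SemanticError(f"{expr[1]} used before declaration")
        return symbols[expr[1]]["type"]
    if kind in ("+", "-", "*", "/"):
        left, right = type_of(expr[1], symbols), type_of(expr[2], symbols)
        if left != "int" or right != "int":
            raise SemanticError(f"operator {kind} needs int operands, got {left} and {right}")
        return "int"
    raise SemanticError(f"unknown node {kind}")


def evaluate(expr, symbols):
    kind = expr[0]
    if kind in ("num", "str"):
        return expr[1]
    if kind == "var":
        value = symbols[expr[1]]["value"]
        if value is None:
            raise SemanticError(f"{expr[1]} has no value yet")
        return value
    left, right = evaluate(expr[1], symbols), evaluate(expr[2], symbols)
    if kind == "+":
        return left + right
    if kind == "-":
        return left - right
    if kind == "*":
        return left * right
    if right == 0:  # run-time check for "/"
        raise ZeroDivisionError("division by zero")
    return left // right  # integer division


def run(program):
    symbols, messages = {}, []
    for stmt in program:
        try:
            if stmt[0] == "declare":
                _, type_name, name = stmt
                symbols[name] = {"type": type_name, "value": None}
            elif stmt[0] == "assign":
                _, name, expr = stmt
                if name not in symbols:
                    raise SemanticError(f"{name} assigned before declaration")
                actual = type_of(expr, symbols)
                if actual != symbols[name]["type"]:
                    raise SemanticError(f"cannot assign {actual} to {name}")
                symbols[name]["value"] = evaluate(expr, symbols)
            elif stmt[0] == "print":
                type_of(stmt[1], symbols)
                messages.append(str(evaluate(stmt[1], symbols)))
        except (SemanticError, ZeroDivisionError) as err:
            messages.append(f"error: {err}")  # report and continue
    return symbols, messages
```

Here type checking and execution are interleaved one statement at a time, and an erroneous statement is reported and skipped. Checking the whole tree before executing anything is another reasonable design.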
Write test input source program(s) for interpretation.
They must demonstrate the usage of ALL of your language features.
Some of them must include code segments with interpreter / semantic checking errors.
If you don't provide test programs for your language features, you don't get additional
points. Points may also be taken off from your previous phases.
Document all changes you made in the language specifications since the documentation you
provided for the parser phase. Highlight these changes. Did you add, subtract or change
features of your language?
Document your phase 4:
The graded original versions of phases 1, 2 and 3 documentations should be submitted as
part of your phase 4 documentation.
Enumerate the semantic rules you actually checked in your interpreter. (Ex. Declaration
of an identifier and its type before use, assignment of a value whose type is incompatible
with the type of the identifier, etc.)
How do these compare against the semantic rules you declared in the language
design phase?
Enumerate run-time checking that your interpreter actually performs. (Ex. division by
zero, assignment of a value that is out of the range of the type of an identifier, etc.)
Enumerate the language constructs you are able to interpret correctly, and those that you
have difficulties with. Explain these difficulties briefly.
Does your interpreter create implicitly or explicitly an abstract syntax tree from a parse
tree? Or is the abstract syntax tree already provided by the parser?
Provide a system architecture diagram showing your modules and the interfaces between
modules. Include the modules in phases 2 and 3, too.
Show the inputs to each module and where these inputs come from (user file or
which module).
Show the outputs from each module and who (user or which module) consumes these
outputs.
Explain the purpose of each module.
Include a short sample source program, draw (using nodes and edges) the corresponding
abstract syntax tree, indicate a sample input to this source program, and the
corresponding output.
How do you handle errors discovered by the interpreter?
What errors cannot be handled by your interpreter, if any?
Demonstrate your interpreter to the teacher with your computer.
Hand-in your printed documentation as described above.
Show your interpreter routines' source codes and the input source codes for interpretation.
The teacher may modify some of your input source codes to test language features you
promised to implement in your language design.
Have a back-up of all your source codes.
2. Grading:
Your grade in this phase will be based on the following:
Delivery of interpreter requirements
Quality of test input programs
Checking for and catching type errors correctly
Correct execution of test input source programs
Error Handling
Maintenance of symbol table, if any
Interpreter / Semantic Type Checker optional features
Presentation and ability to answer questions correctly
the completeness, clarity and accuracy of documentation
your timeliness,
your peer evaluation
An Example Language:
Language Specifications :
not case sensitive
tokens and constants cannot span more than one line.
Whitespaces:
Comments:
// comment till the end of line
/* */ inline and multiline comment; ignore anything in between /* and */
Comments serve as delimiters of tokens; comments cannot be inside any single token
Blanks, Tabs, Line breaks also delimit tokens
Statements can continue for several lines
Noise word : ANG skipped anywhere it appears as a lexeme
identifiers: start with a letter; followed by letters and digits; delimited by non-letters/digits
Data Types:
boolean: TUTOT values OO and HINDI
for comparison: OO > HINDI
Statements:
all statements begin with reserved words; No statement separators used.
PROGRAM id SIMULA statements TAPOS
Declarations: URI type-name comma-separated identifiers
All identifiers must be declared before used.
All identifiers are global regardless of block structure.
Names of identifiers must be unique and different from reserved words.
Assignments: ILAGAYSA-ANG (ILAGAYSA X ANG 3) ANG is optional
Conditional: KUNG-EHDI-IBA-KUNGFI (if then else endif; if then endif)
Loops: HABANG-GAWIN statement while do loop
I/O: BASAHIN - read from keyboard a string ended by newline character
(BASAHIN ANG X) ANG is optional
I/O: ISULAT write to output display a string; no formatting except newline
(ISULAT SAGOT= ~ X)
blocks indicated by SIMULA TAPOS
no procedures, methods nor functions.
Expressions:
Parenthesization allowed
Operators:
Salita ~(concatenation), <, >, =, !=, <=, >= (lexicographic comparison)
Bilang +(unary),-(unary),+(binary),-(binary),*,/ (integer division),<, >, =, !=, <=, >=
Lutang +(unary),-(unary),+(binary),-(binary),*,/ (real number division), <, >, =, !=, <=, >=
Tutot: ATSAKA (and), OKAYA (or), DEHINS (not), <, >, =, !=, <=, >= (OO < HINDI)
Precedence Level       Operations                                   Associativity
(Highest to lowest)
1                      parenthesization                             L to R
2                      unary plus (+), unary minus (-)              R to L
3                      * and /                                      L to R
4                      binary plus (+), binary minus (-)            L to R
5                      comparison operators <, >, =, !=, <=, >=     L to R
6                      DEHINS                                       R to L
7                      ATSAKA                                       L to R
8                      OKAYA                                        L to R
9                      concatenation (~)                            L to R
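One way a parser might honor this precedence table is precedence climbing, sketched below. The binding power of each operator is taken as 10 minus its precedence level, which is an assumption of this sketch, as are the function names and the tuple-shaped tree. Tokens are assumed to be already scanned, uppercased, and separated by blanks; unary plus and minus are labeled with the example language's UPLUS and UMINUS token names.

```python
# Binding power = 10 - precedence level from the table (higher binds tighter).
BINARY = {"*": 7, "/": 7, "+": 6, "-": 6,
          "<": 5, ">": 5, "=": 5, "!=": 5, "<=": 5, ">=": 5,
          "ATSAKA": 3, "OKAYA": 2, "~": 1}
PREFIX = {"+": ("UPLUS", 8), "-": ("UMINUS", 8), "DEHINS": ("DEHINS", 4)}


def parse_expr(tokens, pos=0, min_bp=0):
    """Return (tree, next_pos) for the expression starting at tokens[pos]."""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of expression")
    tok = tokens[pos]
    if tok in PREFIX:
        # Right-to-left: the operand is parsed at the prefix operator's own level.
        name, bp = PREFIX[tok]
        operand, pos = parse_expr(tokens, pos + 1, bp)
        left = (name, operand)
    elif tok == "(":
        left, pos = parse_expr(tokens, pos + 1, 0)
        if pos >= len(tokens) or tokens[pos] != ")":
            raise SyntaxError("missing )")
        pos += 1
    elif tok in BINARY or tok == ")":
        raise SyntaxError(f"unexpected {tok!r}")
    else:
        left, pos = tok, pos + 1  # identifier or constant
    while pos < len(tokens) and tokens[pos] in BINARY and BINARY[tokens[pos]] >= min_bp:
        op = tokens[pos]
        # Left-to-right: the right operand must bind strictly tighter.
        right, pos = parse_expr(tokens, pos + 1, BINARY[op] + 1)
        left = (op, left, right)
    return left, pos


def parse(text):
    tokens = text.split()
    tree, pos = parse_expr(tokens)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected {tokens[pos]!r}")
    return tree
```

Because concatenation sits at the lowest level, `parse("x ~ a + b")` groups the addition first, and because DEHINS sits below the comparison operators, `parse("DEHINS a < b")` negates the whole comparison.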
Special characters:
\ - in String literals denote a newline
Tokens (Internal)
CONSTBILANG, CONSTLUTANG, CONSTSALITA,
RELOP (comparison operator),
MULT, DIV, ADD, SUBT, COMMA, UPLUS, UMINUS, LPAREN, RPAREN, DIKIT,
ID, EOF,
SKIP (comments, EOL, ANG),
ERR
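The internal tokens above could be produced by a scanner along the following lines. This sketch uses Python's `re` module only for brevity; a project scanner would hand-code the equivalent DFA. Several details are assumptions: string literals are taken to be enclosed in double quotes, lexemes are normalized to upper case because the language is not case sensitive, reserved words are returned under their own names, and UPLUS/UMINUS are left for the parser to decide, so the scanner only emits ADD and SUBT.

```python
import re

RESERVED = {"PROGRAM", "SIMULA", "TAPOS", "URI", "ILAGAYSA", "KUNG", "EHDI", "IBA",
            "KUNGFI", "HABANG", "GAWIN", "BASAHIN", "ISULAT", "ATSAKA", "OKAYA",
            "DEHINS", "OO", "HINDI", "BILANG", "LUTANG", "SALITA", "TUTOT"}

TOKEN_SPEC = [
    ("SKIP",        r"//[^\n]*|/\*.*?\*/|\s+"),  # comments and whitespace
    ("CONSTLUTANG", r"\d+\.\d+"),                # real constant
    ("CONSTBILANG", r"\d+"),                     # integer constant
    ("CONSTSALITA", r'"[^"\n]*"'),               # string constant (assumed quoting)
    ("WORD",        r"[A-Za-z][A-Za-z0-9]*"),    # identifier or reserved word
    ("RELOP",       r"<=|>=|!=|<|>|="),
    ("MULT",        r"\*"),
    ("DIV",         r"/"),
    ("ADD",         r"\+"),
    ("SUBT",        r"-"),
    ("COMMA",       r","),
    ("LPAREN",      r"\("),
    ("RPAREN",      r"\)"),
    ("DIKIT",       r"~"),                       # concatenation
    ("ERR",         r"."),                       # anything else
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC),
                    re.DOTALL)


def tokenize(source):
    tokens = []
    for match in MASTER.finditer(source):
        kind, lexeme = match.lastgroup, match.group()
        if kind == "SKIP":
            continue
        if kind == "WORD":
            lexeme = lexeme.upper()       # the language is not case sensitive
            if lexeme == "ANG":
                continue                  # noise word, skipped wherever it appears
            kind = lexeme if lexeme in RESERVED else "ID"
        elif kind == "CONSTSALITA":
            lexeme = lexeme[1:-1]         # drop the surrounding quotes
        tokens.append((kind, lexeme))
    tokens.append(("EOF", ""))
    return tokens
```

For instance, `tokenize('ILAGAYSA x ANG 3')` would drop the noise word ANG and return the tokens for ILAGAYSA, the identifier X, and the integer constant 3, followed by EOF.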