
MC0073 – System Programming Assignment Set –2

1. Explaining the following:

a. Data Formats:
The Data Format is a base topic for topics describing a defined way of coding information adhering to some Data Model for
storage or transfer. Data format in information technology can mean:

1) Data type, constraint placed upon the interpretation of data in a type system
2) Recording format, a format for encoding data for storage on a storage medium
3) File format, a format for encoding data for storage in a computer file
4) Content format, a format for converting data to information
5) Audio format, a format for processing audio data
6) Video format, a format for processing video data

A data type is a type of data. Of course, that is a rather circular definition, and not very helpful. Therefore, a better definition of a
data type is a data storage format that can contain a specific type or range of values. When computer programs store data in
variables, each variable must be assigned a specific data type. Some common data types include integers, floating point numbers,
characters, strings, and arrays. They may also be more specific types, such as dates, timestamps, boolean values, and varchar
(variable character) formats. Some programming languages require the programmer to define the data type of a variable before
assigning it a value. Other languages can automatically assign a variable's data type when the initial data is entered into the
variable. For example, if the variable "var1" is created with the value "1.25," the variable would be created as a floating point data
type. If the variable is set to "Hello world!," the variable would be assigned a string data type. Most programming languages allow
each variable to store a single data type. Therefore, if the variable's data type has already been set to an integer, assigning string
data to the variable may cause the data to be converted to an integer format.

Data types are also used by database applications. The fields within a database often require a specific type of data to be input. For
example, a company's record for an employee may use a string data type for the employee's first and last name. The employee's
date of hire would be stored in a date format, while his or her salary may be stored as an integer. By keeping the data types uniform
across multiple records, database applications can easily search, sort, and compare fields in different records.

b. Introduction to RISC & CISC machines:


RISC (Reduced Instruction Set Computer) is a computer architecture that reduces chip complexity by using simpler
instructions. RISC compilers have to generate software routines to perform the equivalent of the processing performed by the more
comprehensive instructions of CISC (Complex Instruction Set Computer) machines, whose instruction sets provide many specialized,
more powerful instructions.

c. Addressing Modes:
Addressing modes are an aspect of the instruction set architecture in most central processing unit (CPU) designs. The
various addressing modes that are defined in a given instruction set architecture define how machine language instructions in that
architecture identify the operand (or operands) of each instruction. An addressing mode specifies how to calculate the effective
memory address of an operand by using information held in registers and/or constants contained within a machine instruction or
elsewhere. In computer programming, addressing modes are primarily of interest to compiler writers and to those who write code
directly in assembly language. An addressing mode is one of a set of methods for specifying the operand(s) of a machine code
instruction. Processors vary greatly in the number of addressing modes they provide. The more complex modes described below can
usually be replaced with a short sequence of instructions using only simpler modes.
The most common modes are "register" - the operand is stored in a specified register; "absolute" - the operand is stored at
a specified memory address; and "immediate" - the operand is contained within the instruction. Most processors also have indirect
addressing modes, e.g. "register indirect", "memory indirect" where the specified register or memory location does not contain the
operand but contains its address, known as the "effective address". For an absolute addressing mode, the effective address is
contained within the instruction.

Indirect addressing modes often have options for pre- or post- increment or decrement, meaning that the register or
memory location containing the effective address is incremented or decremented by some amount (either fixed or also specified in
the instruction), either before or after the instruction is executed. These are very useful for stacks and for accessing blocks of data.
Other variations form the effective address by adding together one or more registers and one or more constants which may
themselves be direct or indirect. Such complex addressing modes are designed to support access to multidimensional arrays and
arrays of data structures.

The addressing mode may be "implicit" - the location of the operand is obvious from the particular instruction. This would
be the case for an instruction that modifies a particular control register in the CPU, or in a stack-based processor where operands
are always on the top of the stack.

2. Explaining the following:

a. Basic Assembler Functions:


Often the assembler cannot generate debug information automatically. This means that you cannot get a source report
unless you manually define the necessary debug information; read your assembler documentation for how you might do that. The
only debugging info needed currently by OProfile is the line-number/filename-VMA association. When profiling assembly without
debugging info you can always get a report for symbols, and optionally for VMAs, through opreport -l or opreport -d, but this works
only for symbols with the right attributes.

Basic assembler directives


START, END, BYTE, WORD, RESB, RESW

Purpose: read records from the input device (code F1) and copy them to the output device (code 05); at the end of the file, write
EOF on the output device, then RSUB to the operating system.
Data transfer (RD, WD): a buffer is used to store each record; buffering is necessary because of the different I/O rates. The end of
each record is marked with a null character (00 hex); the end of the file is indicated by a zero-length record.
Subroutines (JSUB, RSUB): RDREC and WRREC; save the link register first before a nested jump.

Assembler’s functions
• Convert mnemonic operation codes to their machine language equivalents
• Convert symbolic operands to their equivalent machine addresses
• Build the machine instructions in the proper format
• Convert the data constants to internal machine representations
• Write the object program and the assembly listing

b. Design of Multi-pass(two pass) Assemblers Implementation:


A programming language that is one step away from machine language. Each assembly language statement is translated
into one machine instruction by the assembler. Programmers must be well versed in the computer's architecture, and,
undocumented assembly language programs are difficult to maintain. It is hardware dependent; there is a different assembly
language for each CPU series.
Pass 1
• Assign addresses to all statements in the program
• Save the values assigned to all labels for use in Pass 2
• Perform some processing of assembler directives
Pass 2
• Assemble instructions
• Generate data values defined by BYTE, WORD
• Perform processing of assembler directives not done in Pass 1
• Write the object program and the assembly listing
c. Examples: MASM Assembler and SPARC Assembler:
You can assemble this by typing "tasm first [enter] tlink first [enter]" or something like "masm first [enter] link first
[enter]". You must have an assembler and the link/tlink program.

.model small
.stack
.data
message db "Hello world, I'm learning Assembly !!!", "$"
.code
main proc
mov ax,seg message
mov ds,ax
mov ah,09
lea dx,message
int 21h
mov ax,4c00h
int 21h
main endp
end main

.model small: Lines that start with a "." are used to provide the assembler with information. The word(s) behind it say what kind of
info. In this case it just tells the assembler the program is small and doesn't need a lot of memory. I'll get back on this later.
.stack: Another line with info. This one tells the assembler that the "stack" segment starts here. The stack is used to store
temporary data. It isn't used in the program, but it must be there, because we make an .EXE file and these files MUST have a stack.
.data: indicates that the data segment starts here and that the stack segment ends there.
.code : indicates that the code segment starts there and the data segment ends there.

There are very few addressing modes on the SPARC, and they may be used only in certain very restricted combinations. The
three main types of SPARC instructions are given below, along with the valid combinations of addressing modes. There are only a
few unusual instructions which do not fall into these categories.

1. Arithmetic/Logical/Shift instructions
opcode reg1,reg2,reg3 !reg1 op reg2 -> reg3
2. Load/Store Instructions
opcode [reg1+reg2],reg3

The SPARC code for this subroutine can be written several ways; two possible approaches are given below. (The 'X's in
the center line indicate the differences between the two approaches.)

.global prt_sum | .global prt_sum


prt_sum: | prt_sum:
save %sp,-96,%sp | save %sp,-96,%sp
|
clr %l0 | clr %l0
clr %l1 | clr %l1
mov %i0,%l2 X
loop: | loop:
cmp %l0,%i1 | cmp %l0,%i1
bge done | bge done
nop | nop
X sll %l0,2,%l2
ld [%l2],%o0 X ld [%i0+%l2],%o0
add %l1,%o0,%l1 | add %l1,%o0,%l1
add %l2,4,%l2 X
inc %l0 | inc %l0
ba loop | ba loop
nop | nop
done: | done:

3. Explaining the following:

a. Non-Deterministic Finite Automata:


Regular Expression -> generalized transition diagram -> finite automaton
A finite automaton can be deterministic or non-deterministic, where non-deterministic means more than one transition out
of a state may be possible on the same input symbol. A deterministic automaton can be much bigger than an equivalent non-
deterministic automaton.
A non-deterministic finite automaton (NFA) is a mathematical model that consists of
1. a set of states S
2. a set of input symbols Σ (the input symbol alphabet)
3. a transition function move that maps state-symbol pairs to sets of states
4. a state s0 that is distinguished as the start (or initial) state
5. a set of states F distinguished as accepting (or final) states
A graphical representation of an NFA is called a transition graph.

b. Generalized Non-Deterministic Finite Automata (GNFA):


GNFA: a generalized non-deterministic FA. A GNFA is simply an NFA whose transitions are labeled with regular expressions
instead of characters from the input alphabet. Although we will use GNFAs to create our regular expressions from DFAs, you are
NOT responsible for the details of the GNFA.

Overview of the Algorithm:

The algorithm works by:
1. Creating a GNFA from the original DFA: add a new start state and a new final state, with ε-moves from the new start state
to the old start state and ε-moves from all of the old final states to the new single final state.
2. Eliminating the states of the DFA one at a time from the GNFA, replacing each removed state with transitions that do the
same thing.
3. At the end of the algorithm there will be a single start state and a single final state with one transition between them. That
transition is labeled with the regular expression equivalent to the language recognized by the original DFA.

The final algorithm: Given a DFA M = (Q, Σ, δ, q0, F), we can create a regular expression that describes L(M) as follows:
1. Create a GNFA from M by:
   a. Adding a new start state, qstart, and a new final state, qfinal, to the state set. (Sipser names these states "s"
and "a".)
   b. Adding an ε transition from qstart to q0, the old start state.
   c. Adding ε transitions from all of the old final states of M to qfinal. Make qfinal the only final state of the GNFA.
2. Eliminate the old states of M from the GNFA one at a time, adjusting the labels on the transitions as we've described
after each elimination.

When nothing remains in the GNFA except qstart and qfinal, the label of the single transition from qstart to qfinal is the regular
expression we are interested in.

c. Moore Machine and Mealy Machine:

A Moore machine is a finite-state machine whose output depends only on its current state; a Mealy machine is a finite-state
machine whose output depends on both its current state and its current input. Because a Mealy machine can react to an input
within the same step, it often needs fewer states than an equivalent Moore machine, and the two models are interconvertible.

4. Explain the following:

a. YACC Compiler-Compiler:
If you have been programming for any length of time in a Unix environment, you will have encountered the mystical
programs Lex & YACC, or as they are known to GNU/Linux users worldwide, Flex & Bison, where Flex is a Lex implementation by
Vern Paxson and Bison the GNU version of YACC. We will call these programs Lex and YACC throughout - the newer versions are
upwardly compatible, so you can use Flex and Bison when trying our examples.

These programs are massively useful, but as with your C compiler, their manpage does not explain the language they
understand, nor how to use them. YACC is really amazing when used in combination with Lex; however, the Bison manpage does
not describe how to integrate Lex-generated code with your Bison program. YACC can parse input streams consisting of tokens with
certain values. This clearly describes the relation YACC has with Lex: YACC has no idea what 'input streams' are, it needs
preprocessed tokens. While you can write your own tokenizer, we will leave that entirely up to Lex.

A note on grammars and parsers. When YACC saw the light of day, the tool was used to parse input files for compilers:
programs. Programs written in a programming language for computers are typically *not* ambiguous - they have just one meaning.
As such, YACC does not cope with ambiguity and will complain about shift/reduce or reduce/reduce conflicts.

Example:
%{
#include <stdio.h>
#include <string.h>

void yyerror(const char *str)


{
fprintf(stderr,"error: %s\n",str);
}

int yywrap()
{
        return 1;
}

int main(void)
{
        yyparse();
        return 0;
}

%}

%token NUMBER TOKHEAT STATE TOKTARGET TOKTEMPERATURE

%%

/* The original listing stops after the %token line; a minimal
   rules section (a sketch using the declared tokens) would be: */
commands: /* empty */
        | commands command
        ;

command: TOKHEAT STATE
        | TOKTARGET TOKTEMPERATURE NUMBER
        ;


b. Interpreters:
A program that executes instructions written in a high-level language. There are two ways to run programs written in a
high-level language. The most common is to compile the program; the other method is to pass the program through an interpreter.
An interpreter translates high-level instructions into an intermediate form, which it then executes. In contrast, a compiler translates
high-level instructions directly into machine language.

Compiled programs generally run faster than interpreted programs. The advantage of an interpreter, however, is that it
does not need to go through the compilation stage during which machine instructions are generated. This process can be time-
consuming if the program is long. The interpreter, on the other hand, can immediately execute high-level programs. For this reason,
interpreters are sometimes used during the development of a program, when a programmer wants to add small sections at a time
and test them quickly. In addition, interpreters are often used in education because they allow students to program interactively.

Both interpreters and compilers are available for most high-level languages. However, BASIC and LISP are especially
designed to be executed by an interpreter. In addition, page description languages, such as PostScript, use an interpreter. Every
PostScript printer, for example, has a built-in interpreter that executes PostScript instructions.

c. Compiler writing tools:


A compiler is a computer program (or set of programs) that transforms source code written in a computer language (the
source language) into another computer language (the target language, often having a binary form known as object code). The most
common reason for wanting to transform source code is to create an executable program.

The name "compiler" is primarily used for programs that translate source code from a high-level programming language to
a lower level language (e.g., assembly language or machine code). A program that translates from a low level language to a higher
level one is a decompiler. A program that translates between high-level languages is usually called a language translator, source to
source translator, or language converter. A language rewriter is usually a program that translates the form of expressions without a
change of language. A compiler is likely to perform many or all of the following operations: lexical analysis, preprocessing, parsing,
semantic analysis, code generation, and code optimization.

Purdue Compiler-Construction Tool Set tool:


(PCCTS) A highly integrated lexical analyser generator and parser generator by Terence J. Parr, Will E. Cohen and Henry G.
Dietz, all of Purdue University. ANTLR (ANother Tool for Language Recognition) corresponds to YACC, and DLG (DFA-based Lexical
analyser Generator) functions like LEX. PCCTS has many additional features which make it easier to use for a wide range of
translation problems. PCCTS grammars contain specifications for lexical and syntactic analysis with selective backtracking ("infinite
lookahead"), semantic predicates, intermediate-form construction and error reporting. Rules may employ Extended BNF (EBNF)
grammar constructs and may define parameters, return values, and local variables.
Languages described in PCCTS are recognised via LL(k) parsers constructed in pure, human-readable C code. Selective backtracking
is available to handle non-LL(k) constructs. PCCTS parsers may be compiled with a C++ compiler. PCCTS also includes the
SORCERER tree parser generator. The latest version, 1.10, runs under Unix, MS-DOS, OS/2, and Macintosh and is very portable.

If you are thinking of creating your own programming language, writing a compiler or interpreter, adding a scripting facility
to your application, or even creating a documentation-parsing facility, the tools on this page are designed to (hopefully) ease your
task. These compiler construction kits, parser generators, lexical analyzer (lexer) generators, and code optimizer generators provide
the facility where you define your language and let the compiler-creation tools generate the source code for your software.
