
Systems programming (or system programming) is the activity of programming system software.

The primary distinctive characteristic of systems programming when compared to application
programming is that systems programming requires a greater degree of hardware awareness.

What is a Line Editor?

The editor responsible for all textual issues not in the purview of the copy editor. The line editor
reads the text for sense, clarity, tone, flow, logic, quality of expression, redundancy, good order,
conciseness, and consistency, covering some of the same ground as the copy editor but with a
view to the text as a whole (as opposed to the copy editor, whose focus tends to be on individual
words and sentences).

A line editor is a text editor program that is oriented around lines. Now considered
extremely old-fashioned, line editors stem from the days when a computer operator would sit in
front of a teletype (essentially a printer with a keyboard), so there was no screen and no way to
move a cursor around a document.

Assembly Language
© Copyright Brian Brown, 1988-2000. All rights reserved.

THE HISTORY OF ASSEMBLY LANGUAGE PROGRAMMING, Part 1

Early computer systems were literally programmed by hand. Front panel switches were
used to enter instructions and data. These switches represented the address, data and
control lines of the computer system. To enter data into memory, the address switches
were toggled to the correct address, the data switches were toggled next, and finally the
write switch was toggled. This wrote the binary value on the front panel data switches to
the address specified. Once all the data and instructions were entered, the run switch was
toggled to run the program.

The programmer also needed to know the instruction set of the processor. Each
instruction needed to be manually converted into bit patterns by the programmer so the
front panel switches could be set correctly. This led to errors in translation as the
programmer could easily misread 8 as the value B. It became obvious that such methods
were slow and error prone.

With the advent of better hardware which could address larger memory, and the increase
in memory size (due to better production techniques and lower cost), programs were
written to perform some of this manual entry. Small monitor programs became popular,
which allowed entry of instructions and data via hex keypads or terminals. Additional
devices such as paper tape and punched cards became popular as storage methods for
programs.

Programs were still hand-coded, in that the conversion from mnemonics to instructions
was still performed manually. The major breakthrough in programmer productivity was the
idea of writing a program to translate another. This program would be run by the
computer, and would translate the mnemonics into instructions. The benefits of such a
program would be

• reduced errors
• faster translation times
• changes could be made more easily and quickly

As programmers were writing the source code in mnemonics anyway, it seemed the
logical next step. The source file was fed as input into the program, which translated the
mnemonics into instructions, then wrote the output to the desired place (paper-tape etc).
This sequence is now accepted as commonplace.

The only advances have been the increasing use of high level languages to increase
programmer productivity.

Assembly language programming is writing machine instructions in mnemonic
form, using an assembler to convert these mnemonics into actual processor
instructions and associated data.

The disadvantages of assembly language programming are

• the programmer requires knowledge of the processor architecture and instruction
  set
• many instructions are required to achieve small tasks
• source programs tend to be large and difficult to follow
• programs are machine dependent, requiring complete rewrites if the hardware is
  changed

THE PROGRAM TRANSLATION SEQUENCE


In developing a software program to accomplish a particular task, the implementor chooses
an appropriate language, develops the algorithm (a sequence of steps which, when carried
out in the order prescribed, achieves the desired result), implements this algorithm in the
chosen language (coding), then tests and debugs the final result.

There is also likely to be an associated maintenance phase. A program written in the chosen
language will undoubtedly need to be converted into the appropriate binary bit-patterns which make
sense to the target processor (the processor on which the software will be run). This
process of conversion is called translation.

The following diagram illustrates the translation sequence necessary to generate machine
code from specific languages.

ASSEMBLY LANGUAGE PROGRAMMING

Assemblers are programs which generate machine code instructions from a source code
program written in assembly language. The features provided by an assembler are,

• the programmer can use mnemonics when writing source code programs
• variables are represented by symbolic names, not as memory locations
• symbolic code is easier to read and follow
• error checking is provided
• changes can be quickly and easily incorporated with a re-assembly
• programming aids are included for relocation and expression evaluation

In writing assembly language programs for micro-computers, it is essential that a
standardized format be followed. Most manufacturers provide assemblers, which are
programs used to generate machine code instructions for the actual processor to execute.

The assembler converts the written assembly language source program into a format
which runs on the processor. Each machine code instruction (the binary or hex value) is
represented by a mnemonic. A mnemonic is an abbreviation which represents the actual
instruction.

+----------+-----+----------+--------------------------+
| Binary   | Hex | Mnemonic | Description              |
+----------+-----+----------+--------------------------+
| 01001111 | 4F  | CLRA     | Clears the A accumulator |
+----------+-----+----------+--------------------------+
| 00110110 | 36  | PSHA     | Saves A acc on stack     |
+----------+-----+----------+--------------------------+
| 01001101 | 4D  | TSTA     | Tests A acc for 0        |
+----------+-----+----------+--------------------------+

Mnemonics are used because they

• are more meaningful than hex or binary values
• reduce the chances of making an error
• are easier to remember than bit values

Assemblers also accept certain characters as representing number bases and addressing
modes.

$ prefix or h suffix for hexadecimal     $24 or 24h
D suffix for decimal numbers             24D or 67
B suffix for binary numbers              0101111B
O or Q suffix for octal numbers          377O or 232Q
# for immediate addressing               LDAA #$34
,X for indexed addressing                LDAA 01,X
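
The number-base conventions above can be sketched in Python as a small literal parser. This is only an illustration of the prefix and suffix rules listed here; the function name `parse_literal` is hypothetical, and treating an unmarked number as decimal is an assumption drawn from the `67` example rather than a documented rule of any particular assembler.

```python
def parse_literal(text):
    """Return the integer value of an assembler-style numeric literal."""
    token = text.strip().upper()
    if token.startswith("$"):                  # $24 -> hexadecimal
        return int(token[1:], 16)
    # Suffix letter -> number base, following the table in the text
    suffix_bases = {"H": 16, "B": 2, "O": 8, "Q": 8, "D": 10}
    base = suffix_bases.get(token[-1])
    if base is not None:
        return int(token[:-1], base)
    return int(token, 10)                      # assumption: no marker means decimal


for literal in ("$24", "24h", "24D", "67", "0101111B", "377O", "232Q"):
    print(literal, "=", parse_literal(literal))
```

The `$` prefix is checked before the suffixes so that a hexadecimal value ending in B or D, such as `$1B`, is not mistaken for a binary or decimal literal.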

Assembly language statements are written one per line. An assembly language program thus
consists of a sequence of statements, where each statement contains a
mnemonic. Each line of an assembly language program is split into four fields, as shown
below

LABEL OPCODE OPERAND COMMENTS

The label field is optional. A label is an identifier (or text string symbol). Labels are used
extensively in programs to reduce reliance upon programmers remembering where data
or code is located. A label can be used to refer to

o a memory location
o the value of a piece of data
o the address of a program, sub-routine, code portion etc.

The maximum length of a label differs between assemblers. Some accept labels up to 32
characters long, others only four characters. A label, when declared, is suffixed by
a colon, and begins with a valid character (A..Z). Consider the following example.

START: LDAA #24H

Here, the label START is equal to the address of the instruction LDAA #24H.
The label is used in the program as a reference, eg,

JMP START

This would result in the processor jumping to the location (address) associated
with the label START, thus executing the instruction LDAA #24H immediately
after the JMP instruction. When a label is referenced later on in the program, it is
done so without the colon suffix.

An advantage of using labels is that inserting or re-arranging code statements does
not necessitate re-working actual machine instructions. A simple re-assembly is
all that is required. In hand-coding, such changes can take hours to perform.

Each instruction consists of an opcode and possibly one or more operands. In the
above instruction

JMP START

the opcode is JMP and the operand is the address of the label START.

The opcode field contains a mnemonic. Opcode stands for operation code, ie, a
machine code instruction. The opcode may also require additional information
(operands). This additional information is separated from the opcode by using a
space (or tab stop).

The operand field consists of additional information or data that the opcode
requires. In certain types of addressing modes, the operand is used to specify

o constants or labels
o immediate data
o data contained in another accumulator or register
o an address

Examples of operands are


TAB ; operand specified by opcode
LDAA 0100H ; two byte operand
LDAA START ; label operand
LDAA #0FH ; immediate operand

The comment field is optional, and is used by the programmer to explain how the
coded program works. Comments are preceded by a semi-colon. The assembler,
when generating instructions from the source file, ignores all comments. Consider
the following examples,

                       ; H means hexadecimal values
        ORG  0100H     ;This program starts at address 0100 hex
STATUS: DFB  23H       ;This byte is identified as STATUS, and is
                       ;initialized to a value of 23 hex
CODE:   LDAA STATUS    ;The label called CODE is identified as a
                       ;machine code instruction which loads the
                       ;A accumulator with the contents of the
                       ;memory location associated with the label
                       ;STATUS, ie, the value 23
        JMP  CODE      ;Jump to the address associated with CODE

Note that the programmer does not need to worry about bit patterns, hex values,
and the addresses of STATUS or CODE. The assembler, when fed the above
program, will generate the correct code. The code output from the assembler will
be,

Memory location   Byte value
0100              23
0101              B6
0102              01
0103              00
0104              7E
0105              01
0106              01

Location 0100 holds the value associated with the label STATUS
Locations 0101 to 0103 perform the LDAA STATUS instruction
Locations 0104 to 0106 perform the JMP CODE instruction

The statement ORG 0100H in the above program is not a machine code
instruction. It is an instruction to the assembler, which instructs the assembler to
generate the code to run at the designated origin address. Instructions to
assemblers are called pseudo-ops. These are used for

o reserving memory for data variables, arrays and structures
o determining the start address of the program
o determining the entry address of the program
o initializing variable values

The assembler does not generate any machine code instructions for pseudo-ops or
comments. Assemblers scan the source program, generating machine instructions.
Sometimes, the assembler reaches a reference to a variable which has not yet been
defined. This is referred to as a forward reference problem. The assembler can
tackle this problem in a number of ways. It is resolved in a two pass assembler as
follows,

On the first pass, the assembler simply reads the source file, counting up the
number of locations that each instruction will take, and builds a symbol table in
memory which lists all the defined variables cross-referenced to their associated
memory address. On the second pass, the assembler substitutes opcodes for the
mnemonics, and variable names are replaced by the memory locations obtained
from the symbol table.

OPERATION OF A TWO-PASS ASSEMBLER


Consider the following source code program for a hypothetical computer. The
program computes the so-called Fibonacci numbers, printing all such numbers
up to that specified by LIMIT.

Line  Label    Operation  Operand 1  Operand 2
 1             COPY       ZERO       OLDER
 2             COPY       ONE        OLD
 3             READ       LIMIT
 4             WRITE      OLD
 5    FRONT:   LOAD       OLDER
 6             ADD        OLD
 7             STORE      NEW
 8             SUB        LIMIT
 9             BRPOS      FINAL
10             WRITE      NEW
11             COPY       OLD        OLDER
12             COPY       NEW        OLD
13             BR         FRONT
14    FINAL:   WRITE      LIMIT
15             STOP
16    ZERO:    CONST      0
17    ONE:     CONST      1
18    OLDER:   SPACE
19    OLD:     SPACE
20    NEW:     SPACE
21    LIMIT:   SPACE

The instruction set of the computer is as follows,

Operation Code                Number of
Symbolic   Machine   Length   Operands    Action
ADD        02        2        1           ACC <- ACC + OPD1
BR         00        2        1           Branch to OPD1
BRPOS      01        2        1           Branch to OPD1 if ACC > 0
COPY       13        3        2           OPD2 <- OPD1
LOAD       03        2        1           ACC <- OPD1
READ       12        2        1           OPD1 <- input stream
STOP       11        1        0           Halt execution
STORE      07        2        1           OPD1 <- ACC
SUB        06        2        1           ACC <- (ACC - OPD1)
WRITE      08        2        1           output stream <- OPD1

The functions that the assembler will perform in translating the program are,

o replace symbolic addresses by numeric addresses
o replace symbolic operation codes by machine operation codes
o reserve storage for instructions and data
o translate constants into machine representation

IMPLEMENTATION

The assembler uses two counters to keep track of the machine language program.
One counter, called the location counter, keeps track of the physical address
location being used, and will initially be set to zero for this program (or the value
designated by the ORG directive).

The other counter is the line counter, which keeps track of the line number being
processed. After each source line has been examined on the first pass, the location
counter is incremented by the correct number of bytes.

When the assembler processes line 1 of the source, it cannot replace the symbols
ZERO and OLDER by their addresses because those symbols have not yet been
defined. This is called a forward reference problem.

The assembler will place the symbols into the symbol table, advance the location
counter by the length of the instruction (to 3), then
proceed to process the next source line. After processing line 3 of the source, the
current state will be,

Line  Address  Label   Operation  OPD1    OPD2
1     0                COPY       ZERO    OLDER
2     3                COPY       ONE     OLD
3     6                READ       LIMIT

and the contents of the symbol table will be

Symbol   Address
ZERO     ---
OLDER    ---
ONE      ---
OLD      ---
LIMIT    ---

Location Counter: 8
Line Counter: 4

The symbol table currently holds five symbols, none of which yet has an address.
During processing of line 4, the assembler picks up the symbol OLD. It
establishes that it is already in the symbol table, so does not enter it again.

During line 5, the assembler encounters FRONT, and it is entered into the symbol
table. The assembler also knows its address (10), so it is also placed into the table.
After processing line 9 of the program, the current state is,

Line  Address  Label   Operation  OPD1    OPD2
1     0                COPY       ZERO    OLDER
2     3                COPY       ONE     OLD
3     6                READ       LIMIT
4     8                WRITE      OLD
5     10       FRONT   LOAD       OLDER
6     12               ADD        OLD
7     14               STORE      NEW
8     16               SUB        LIMIT
9     18               BRPOS      FINAL

and the contents of the symbol table will be

Symbol   Address
ZERO     ---
OLDER    ---
ONE      ---
OLD      ---
LIMIT    ---
FRONT    10
NEW      ---
FINAL    ---

Location Counter: 20
Line Counter: 10

The first pass continues, building up the symbol table. When the assembler
determines the address of the various symbols in lines 16 to 21, these are entered
into the table. At the end of pass 1, the symbol table should list all declared
symbols as well as their addresses.

The state at the end of the first pass is,

Line  Address  Label   Operation  OPD1    OPD2
1     0                COPY       ZERO    OLDER
2     3                COPY       ONE     OLD
3     6                READ       LIMIT
4     8                WRITE      OLD
5     10       FRONT   LOAD       OLDER
6     12               ADD        OLD
7     14               STORE      NEW
8     16               SUB        LIMIT
9     18               BRPOS      FINAL
10    20               WRITE      NEW
11    22               COPY       OLD     OLDER
12    25               COPY       NEW     OLD
13    28               BR         FRONT
14    30       FINAL   WRITE      LIMIT
15    32               STOP
16    33       ZERO    CONST      0
17    34       ONE     CONST      1
18    35       OLDER   SPACE
19    36       OLD     SPACE
20    37       NEW     SPACE
21    38       LIMIT   SPACE

and the contents of the symbol table will be

Symbol   Address
ZERO     33
OLDER    35
ONE      34
OLD      36
LIMIT    38
FRONT    10
NEW      37
FINAL    30

Location Counter: 39
Line Counter: 22

Code generation is performed on the second pass. Before starting, the line and
location counters will be reset to 1 and 0 respectively. The assembler now
generates one line of object code for each source line. Line one is translated to

Address Length Opcode OPD1 OPD2


00 3 13 33 35

Successive lines are translated in the same manner. On encountering the label
FRONT in line 5, the assembler ignores it. Lines 16 to 21, where space is reserved
for variables, the assembler may leave these undefined, or initialize them to zero.
The object code generated by the second pass will be,

Address  Length  Opcode  OPD1  OPD2
00       3       13      33    35
03       3       13      34    36
06       2       12      38
08       2       08      36
10       2       03      35
12       2       02      36
14       2       07      37
16       2       06      38
18       2       01      30
20       2       08      37
22       3       13      36    35
25       3       13      37    36
28       2       00      10
30       2       08      38
32       1       11
33       1       00
34       1       01
35       1       xx
36       1       xx
37       1       xx
38       1       xx
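
The two passes described above can be sketched in Python for the hypothetical machine. This is a minimal illustration, not a real assembler: the names `parse`, `pass_one` and `pass_two` are made up for the example, labels are recognised by a trailing colon (so every label in the embedded source carries one), and pass one records a symbol only when its definition is reached instead of first entering it with an unknown (---) address.

```python
# Opcode table for the hypothetical machine: mnemonic -> (machine code, length)
OPCODES = {
    "ADD": (2, 2), "BR": (0, 2), "BRPOS": (1, 2), "COPY": (13, 3),
    "LOAD": (3, 2), "READ": (12, 2), "STOP": (11, 1), "STORE": (7, 2),
    "SUB": (6, 2), "WRITE": (8, 2),
}

SOURCE = """
        COPY  ZERO OLDER
        COPY  ONE OLD
        READ  LIMIT
        WRITE OLD
FRONT:  LOAD  OLDER
        ADD   OLD
        STORE NEW
        SUB   LIMIT
        BRPOS FINAL
        WRITE NEW
        COPY  OLD OLDER
        COPY  NEW OLD
        BR    FRONT
FINAL:  WRITE LIMIT
        STOP
ZERO:   CONST 0
ONE:    CONST 1
OLDER:  SPACE
OLD:    SPACE
NEW:    SPACE
LIMIT:  SPACE
"""


def parse(line):
    """Split a source line into (label, operation, operands)."""
    fields = line.split()
    label = None
    if fields[0].endswith(":"):
        label = fields.pop(0)[:-1]
    return label, fields[0], fields[1:]


def size_of(operation):
    """CONST and SPACE each occupy one location; instructions use the table."""
    return 1 if operation in ("CONST", "SPACE") else OPCODES[operation][1]


def pass_one(lines):
    """Build the symbol table by advancing the location counter."""
    symbols, location = {}, 0
    for line in lines:
        label, operation, _ = parse(line)
        if label:
            symbols[label] = location
        location += size_of(operation)
    return symbols


def pass_two(lines, symbols):
    """Generate one (address, values) entry of object code per source line."""
    code, location = [], 0
    for line in lines:
        _, operation, operands = parse(line)
        if operation == "CONST":
            values = [int(operands[0])]
        elif operation == "SPACE":
            values = [None]                    # reserved but undefined ("xx")
        else:
            values = [OPCODES[operation][0]] + [symbols[name] for name in operands]
        code.append((location, values))
        location += len(values)
    return code


lines = [ln for ln in SOURCE.splitlines() if ln.strip()]
symbols = pass_one(lines)
object_code = pass_two(lines, symbols)
for address, values in object_code:
    print(address, values)
```

Because every forward reference is resolved from the completed symbol table, pass two never needs to look ahead in the source.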


ASSEMBLY LANGUAGE PROGRAMMING, Part 2

ASSEMBLER DIRECTIVES
As mentioned previously, assembler directives are instructions to the assembler, and are
not translated into machine instructions. The use of directives gives the programmer
some control over the operation of the assembler, increasing flexibility in the way
programs are written. The following is a list of the common pseudo-ops.

• EQUATE
The EQU directive is used to make programs easier to write. It creates absolute
symbols and aliases by assigning an expression or value to the declared variable
name. Its format is,

name: EQU expression

Consider the following statement.

NUMBER1: EQU 36H

The assembler will replace every occurrence of the label NUMBER1 with the
value it has been equated to, ie, 36 hexadecimal. The statement

LDAA #NUMBER1

will be interpreted by the assembler as

LDAA #36H

An absolute symbol represents a 16-bit value; an alias is a name that represents
another symbol. The declared name must be unique, one that has not been
previously declared. The redefining of a previous symbol is normally not allowed.

NUM1: EQU 20H
... ...
NUM1: EQU 30H ; error

• ORIGIN
The ORG directive specifies the address to be used for the generation of code. Subsequent
instruction and data addresses begin at the new value. Normally, it is used to set
the start address of the program, but it can also set the location counter to the value
specified.

ORG 120H
LDAA #FFH

The statement LDAA #FFH begins at byte 120h.

ORG $ + 2
start: LDAA #34H

The instruction associated with the label start is declared to start at the address
2 bytes beyond the current value of the location counter (specified by $).

• CPU TYPES
This directive is used by multi-purpose assemblers to specify which target
processor is being used. The format for CRS8 is

CPU cpuname

where cpuname consists of a valid processor name, eg

CPU 6802

This directive appears before any machine instructions.


• OUTPUT FORMATS
This directive is used to select the output format for the generated machine
instructions. Several output formats are available for downloading into EPROM
or a target system.

HOF recordtype

where recordtype is one of the following

MOT ; Motorola formats
INT ; Intel formats
TEK ; Tektronix formats

This directive appears before any machine instructions.

• BYTE STORAGE
The directive used to allocate and initialize bytes (8 bits) of storage is

DFB definebyte

Its format is,

name: DFB initialvalue,,,

The name portion is optional. Consider the following examples for CRS8.

value1: DFB 16
form: DFB 6*2
text: DFB "Enter your name: "

In the first example, the label value1 is assigned a single byte of storage, which is
initialized to 16 decimal. The second example allocates a single byte of storage for
the label form, and initializes it to 12. The last example allocates 17 bytes
of storage for the label text. The first byte will be initialized to E, whilst the last
byte is initialized to an ASCII space.

• WORD STORAGE
The directives used to allocate and initialize words (two bytes) of memory storage
are,

DWM define word, most significant byte first
DWL define word, least significant byte first

Its format is,

name: DWM initialvalue,,,


The name portion is optional.

DWM 1687H
mess: DWM 'ab'

The first example allocates one word of storage, having the values 16H followed
by 87H. The second example defines mess as a word initialized with the character
values a followed by b. The b will be placed in the low-order byte, and the a will
be placed in the high order byte. If only one character is specified, the high-order
byte will contain 0.

Strings used with the DWM and DWL directives must not contain more than two characters.
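
The difference between the two word directives is only the order in which the two bytes are laid down in memory. As a rough Python illustration (the variable names are made up for the example, and the snippet models the byte layout described here rather than the output of any particular assembler):

```python
# DWM 1687H: most significant byte first
dwm = (0x1687).to_bytes(2, "big")

# DWL 1687H: least significant byte first
dwl = (0x1687).to_bytes(2, "little")

# DWM 'ab': 'a' goes in the high-order byte, 'b' in the low-order byte
mess = "ab".encode("ascii")

# A single character leaves the high-order byte at 0
single = "a".encode("ascii").rjust(2, b"\x00")

print(dwm.hex(), dwl.hex(), mess.hex(), single.hex())
```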

• DATA STORAGE RESERVATION
The DFS directive is used to reserve storage for later use.

array: DFS 100

This example allocates 100 storage bytes, associating the first byte with the label
array. The value of these bytes is indeterminate at this point. The 100 bytes will
be allocated relative to the current location counter.

• END and Optional Start Address
The END directive specifies the end of the assembly language source listing. It
may be followed by an optional entry address. The optional entry address is used
by LOADERS to initialize the Program Counter before running the program. If no
entry address is specified, execution will start at the first location allocated by the
assembler.

END

In this example, the END directive informs the assembler that there are no more
source statements.

ORG 0100H
start: LDAA #3FH
JMP start
END start

In this example, the END directive also specifies that the entry point to the
program is the label start, whose address is 0100H.

SAMPLE PROGRAM FOR MC6802 USING CRS8


The following source file has been named MC6802.ASM

        CPU  6802              ; 6802 processor
        HOF  MOT               ; Motorola Records
        ORG  0100H             ; Start of Data
Source: DFB  'Hello and Welcome'
Length: EQU  $ - Source        ; Length of Source
Destin: DFS  Length            ; Buffer which has same
                               ; length as Source
        ORG  0120H             ; Start of Code
Entry:  LDX  #Source           ; Point Index Reg to
                               ; Source string
        LDAB #Length           ; Number of characters to move
Loop:   LDAA 0,X
        STAA Length,X
        INX
        DECB
        BNE  Loop
Fin:    JMP  Fin
        END  Entry

This program is assembled by typing the following command

CRS8 MC6802

It is not necessary to type the extension .ASM, and CRS8 will produce two output files.

MC6802.PRN ; a list file showing the code generated


MC6802.HEX ; the record file for downloading to the
; target system or Eprom programmer

The listing file MC6802.PRN looks like

C:6802.TBL              CPU  6802       ; 6802 processor
C:6802.HEX              HOF  MOT        ; Motorola Records
0100                    ORG  0100H      ; Start of Data
0100 48656C6C6F Source: DFB  'Hello and Welcome'
0011 =          Length: EQU  $ - Source ; Length of Source
0111            Destin: DFS  Length     ; Buffer which has same
                                        ; length as Source
0120                    ORG  0120H      ; Start of Code
0120 CE0100     Entry:  LDX  #Source    ; Point Index Reg to
                                        ; Source string
0123 C611               LDAB #Length    ; Number of characters
                                        ; to move
0125 A600       Loop:   LDAA 0,X
0127 A711               STAA Length,X
0129 08                 INX
012A 5A                 DECB
012B 26F8               BNE  Loop
012D 7E012D     Fin:    JMP  Fin
0130                    END  Entry

The first column is the address, the second the instructions or data, and then the
mnemonics and comments. This listing is used by the programmer to verify that the
assembler has produced the correct instructions and data at the correct addresses. We can
clearly see that it has correctly interpreted the address of Source in the statement LDX
#Source as the bytes CE 0100.
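
The same kind of check can be made on the branch in the listing. BNE Loop at address 012B assembles to 26 F8, where F8 is a signed one-byte offset measured from the instruction that follows the branch. A small Python sketch of that arithmetic (the function name `branch_offset` is made up for the example):

```python
def branch_offset(branch_address, target_address):
    """Return the offset byte for a two-byte relative branch instruction."""
    next_instruction = branch_address + 2      # offset counts from the following instruction
    offset = target_address - next_instruction
    if not -128 <= offset <= 127:
        raise ValueError("target is out of range for a relative branch")
    return offset & 0xFF                       # two's complement, one byte


# BNE Loop sits at 012B and Loop is at 0125
offset = branch_offset(0x012B, 0x0125)
print(f"{offset:02X}")
```

Here the target lies 8 bytes before address 012D, so the offset is -8, which is F8 as a one-byte two's complement value and matches the listing.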

The record format file MC6802.HEX looks like

S00D0000433A363830322E48455892
S113010048656C6C6F20616E642057656C636F6D1D
S10401106585
S1130120CE0100C611A600A711085A26F87E012D9F
S9030120DB

The format of a Motorola record is

Digit      Contents
0,1        Record Type = S0, S1 or S9
2,3        Number of bytes in Record, which includes the load address
           and checksum bytes
4,5,6,7    Load Address
8 to n-2   Data or coded instructions
n-1 to n   Checksum value

The S0 record identifies the program name
The S1 record identifies the data and coded instructions
The S9 record identifies the program entry point

eg
S1 04 0110 65 85
^  ^  ^    ^  ^ checksum
^  ^  ^    ^ data
^  ^  ^ load address
^  ^ number of bytes in record
^ record type

The file is then downloaded to the target system.
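
The text does not say how the checksum value is calculated. In the usual Motorola S-record definition it is the ones' complement of the low byte of the sum of the byte count, load address and data bytes. A short Python sketch under that assumption (the function names `srecord_checksum` and `is_valid` are made up for the example):

```python
def srecord_checksum(record):
    """Compute the checksum byte for an S-record supplied without its checksum."""
    body = bytes.fromhex(record[2:])           # skip the two-character record type
    return 0xFF - (sum(body) & 0xFF)           # ones' complement of the low byte of the sum


def is_valid(record):
    """Check a complete S-record, including its trailing checksum byte."""
    return srecord_checksum(record[:-2]) == int(record[-2:], 16)


# The example record from the text: S1 04 0110 65, checksum 85
print(f"{srecord_checksum('S104011065'):02X}")
print(is_valid("S9030120DB"))
```

For the example record the sum is 04 + 01 + 10 + 65 = 7A hex, and FF - 7A = 85, which agrees with the checksum shown above.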

ELEMENTARY DATA TYPES


Most programming languages support data types like characters and integers. At the
processor level, some instructions support integer type operations such as multiply or
divide (except on the 6802). The programmer is responsible for keeping track of data types. The
processor treats all data the same, and if the program goes astray, it can interpret data as
instructions and vice versa.

Let's look at how elementary data is represented by the programmer for use in assembly
language programs.

• Characters
are single eight bit values represented using the ASCII code. Values range from 0
to 127. The statement

Letter: DFB 'A'

associates one byte of storage with the variable Letter, initializing it to the ASCII
character 'A' (41H).

• Character Strings
are multiple bytes, each byte holding an ASCII character. The statement

String: DFB 'Hello there.'

allocates 12 bytes of storage space. The variable String has the address of the first
byte, which has been allocated the character 'H' (48H).

• Integers
are stored as 16-bit values (two bytes or one word), and are signed or unsigned.
The statement

Number1: DWM -17D

allocates a word of storage for the variable Number1, initializing the word to -17
decimal. Some processors have different instructions for operations on signed and
unsigned integers. If the processor cannot handle a 16-bit value (ie, has only eight
bit registers), software will need to be written to do any comparisons on these
types.
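
To see what bytes a signed word such as -17 occupies, the following Python sketch encodes it as a 16-bit two's complement value, most significant byte first as DWM would store it. Two's complement is the representation the 6800 family uses for signed values; the function name `to_word` is made up for the example.

```python
def to_word(value):
    """Encode a signed integer as two bytes, most significant byte first."""
    if not -32768 <= value <= 32767:
        raise ValueError("value does not fit in a signed 16-bit word")
    return (value & 0xFFFF).to_bytes(2, "big")


word = to_word(-17)
unsigned_view = int.from_bytes(word, "big")               # same bits read as unsigned
signed_view = int.from_bytes(word, "big", signed=True)    # same bits read as signed
print(word.hex().upper(), unsigned_view, signed_view)
```

The same two bytes read as 65519 when treated as unsigned and as -17 when treated as signed, which is why the programmer has to keep track of which interpretation a variable uses.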
• Character Arrays
are essentially text strings. Each element of the array has storage space for an
ASCII character. Strings in some HLLs are terminated with an End-Of-String
symbol (in C it is ASCII 00h, in PCDOS it is ASCII '$'). The following statement

Digits: DFS 10

allocates 10 locations for a character based array called Digits. The following
code routine initializes the Digits array (each successive element) to the digits 0
to 9.

       ORG  0120H
start: LDAA #30H      ; ASCII '0'
       LDAB #10       ; ten digits
       LDX  #Digits
loop:  STAA 0,X
       INX            ; next element
       INCA           ; next digit
       DECB
       BNE  loop
exit:  ....

Typical Array Operations


The following routines are typical of functions which are performed on character based
arrays.

• Copy
This copies a source string to a destination area. The declaration of the routine is
copystr( src, dest ), where src and dest represent the addresses of the source and
destination strings. This type of routine is ideally suited to a processor
which has more than one Index or Base register. The MC6802, having only one
Index register, presents a small problem. The following code shows this program
implemented for the MC6802.

          CPU  6802
          HOF  MOT
          ORG  100H
str1:     DFS  10
st1len:   EQU  $ - str1
str2:     DFS  10

          ORG  120H
SRCINX:   DFS  02H        ; pointer for src string
DSTINX:   DFS  02H        ; pointer for dest string
start:    LDX  #str1      ; store address of str1
          STX  SRCINX
          LDX  #str2      ; store address of str2
          STX  DSTINX
          JSR  initstr1   ; initialise str1
          JSR  copystr    ; copy str1 to str2
exit:     BRA  exit

initstr1: LDAA #41H       ; character 'A'
          LDAB #9         ; nine characters, leaving room
                          ; for the terminator in str1
          LDX  #str1      ; point to str1
lp1:      STAA 0,X        ; store character
          INX             ; next element
          INCA            ; next value
          DECB            ; loop around
          BNE  lp1
          LDAA #00        ; null terminator
          STAA 0,X
          RTS

copystr:  LDX  SRCINX     ; pick up source pointer
cplp2:    LDAA 0,X        ; get source character
          CMPA #00H       ; eostr?
          BEQ  cpstrq     ; yes, then exit
          INX             ; else inc source pointer
          STX  SRCINX     ; store source ptr
          LDX  DSTINX     ; get destination pointer
          STAA 0,X        ; store character
          INX             ; inc dest ptr
          STX  DSTINX     ; store dest ptr
          LDX  SRCINX     ; reload source pointer
          BRA  cplp2      ; repeat
cpstrq:   LDX  DSTINX     ; Null terminate dest str
          CLR  00,X
          RTS
          END  start

• String Length
Returns the length of a terminated string. The following code shows this routine
implemented for the MC6802. Strlen is entered with the Index register pointing
to the string, and returns the length of the string in the B accumulator.

         CPU  6802
         HOF  MOT
EOFSTR:  EQU  00H
         ORG  100H
str1:    DFB  'Hello and Welcome.', 00H
         ORG  120H
start:   LDX  #str1     ; point to string
         JSR  strlen    ; find length of str1
exit:    BRA  exit

strlen:  LDAB #00       ; character count
strlp1:  LDAA 0,X       ; read character
         CMPA #EOFSTR   ; is it end of string
         BEQ  strexit   ; yes, then exit
         INX            ; no, inc str ptr
         INCB           ; inc character count
         BRA  strlp1    ; and repeat
strexit: RTS            ; acc B has length
         END  start

• Search for first occurrence of a character
This routine returns the address of the specified character found in the string. If
the address returned is zero, it indicates the character was not found. The
following code shows this routine implemented for the MC6802. Strpos is
entered with the Index register pointing to the search string, and the A
accumulator with the character search value. The B accumulator is used as a
scratch register.

         CPU  6802
         HOF  MOT
EOFSTR:  EQU  00H
         ORG  100H
str1:    DFB  'Hello and Welcome.', 00H
         ORG  120H
start:   LDAA #6FH      ; ASCII 'o'
         LDX  #str1     ; point to src string
         JSR  strpos    ; find first 'o' in str1
exit:    BRA  exit

strpos:  CMPA 0,X       ; is char = search value
         BEQ  strex2    ; yes then exit
         LDAB 0,X       ; read the string character
         CMPB #EOFSTR   ; is it end of string
         BEQ  strex1    ; yes, then exit
         INX            ; no, inc str ptr
         BRA  strpos    ; and repeat
strex1:  LDX  #0000H    ; not found
strex2:  RTS            ; Index reg has address
         END  start

• Search for Substring
This routine is used to find the starting address of a substring within a larger
string. It accepts a source string pointer, and a pointer to a substring. It returns the
address of the substring, or address zero if the substring is not found.

• Substring Insertion/Replacement
This routine inserts a substring into an existing string. Most versions overwrite
existing characters. It accepts a pointer to the source string, a pointer to the
substring to insert, and a numeric value representing the start position where
insertion should begin. No characters should overwrite the end-of-string
terminator, or be written to memory locations after the terminator. The routine
should return a numeric value representing the number of characters inserted.

• String Reverse
This routine reverses the characters in a string. It accepts the address of the string.
Hello becomes olleH

• String to Uppercase
This routine converts all characters of a string to uppercase. It accepts the address
of the string. Hello becomes HELLO

• String to Lowercase
This routine converts all characters of a string to lowercase. It accepts the address
of the string. Hello becomes hello

ARRAY INDEX CALCULATIONS


This refers to calculating the address of a specified element within an array. In single
dimensioned arrays, this is equivalent to

BASE_ADDRESS + (ELEMENT_NUMBER * NUMBER_OF_BYTES_PER_ELEMENT)

In multi-dimensioned arrays, this is equivalent to

BASE_ADDRESS + ((Col_Num + (Row_Num * Num_Col_per_row)) * Num_Bytes_per_Element)
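
A short Python sketch of both calculations follows. It assumes element, row and column numbers start at zero and that rows are stored one after another, which is what the formulas above imply; the function names are made up for the example.

```python
def element_address(base, element_number, bytes_per_element):
    """Address of an element in a single dimensioned array."""
    return base + element_number * bytes_per_element


def element_address_2d(base, row, col, cols_per_row, bytes_per_element):
    """Address of an element in a two dimensioned array stored row by row."""
    return base + (col + row * cols_per_row) * bytes_per_element


# Element 3 of an array of 16-bit words based at 0100 hex
print(hex(element_address(0x0100, 3, 2)))

# Row 2, column 1 of a table with 4 columns of 16-bit words based at 0100 hex
print(hex(element_address_2d(0x0100, 2, 1, 4, 2)))
```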
ASSEMBLY LANGUAGE PROGRAMMING, Part 3

IMPLEMENTATION OF HIGH LEVEL LANGUAGE CONSTRUCTS


In High Level Languages such as PASCAL and BASIC, several constructs are available
which help to implement programs. You should know how these constructs are
implemented in assembly language. The constructs that we will now deal with involve
SELECTION and ITERATION. Both types of constructs are implemented using the
conditional BRANCH instructions of the processor.

These types of instructions test the state of the various flags of the status register. All
variables are memory based. Any manipulation of variables normally involves three
steps,

1. Load the variable into a register
2. Perform the operation
3. Store the result back into the variable's location

• SIMPLE STATEMENT ASSIGNMENTS

Assigning a constant value to a variable

1. Load the constant into a register
2. Store the register to the variable's memory location

eg,

X1 := 20;

LDX #20
STX X1

Use eight bit registers for bytes/characters, and 16-bit registers for integers. eg,

Letter := 'Y';

LDAA #'Y'
STAA Letter

Assigning a variable's value to another variable

1. Load the second variable into a register
2. Store the register into the first variable's memory location

eg,

X1 := Y;

LDX Y
STX X1

 Addition

 X1 := Y + 7 ; Calculate the right side first.
 ; Load Y into a register, use an immediate add with 7,
 ; then store into variable X1 (following example uses
 ; BYTE integers)

 LDAA Y
 ADDA #7
 STAA X1

 eg,
 X1 := Y + Z ; Calculate the right side first. Load Y and Z into
 ; registers, add the two registers together, store the
 ; result into variable X1.
 LDAA Y
 LDAB Z
 ABA
 STAA X1

 eg,
 X1 := Y + Z + 3 + T ; Calculate the right hand side first. If the
 ; number of variables/constants exceed the number of
 ; registers available, parenthesise and calculate portions
 ; at a time. Finally, store the result back into the left
 ; side variable X1.
 LDAA T
 ADDA #3 ;3+T
 LDAB Z
 ABA ;+Z
 LDAB Y
 ABA ;+Y
 STAA X1

 Subtraction

 X1 := Y - 7 ; Calculate the right side first. Load Y into a register,
 ; use an immediate subtract with 7, then store into
 ; variable X1.
 LDAA Y
 SUBA #7
 STAA X1

 eg,
 X1 := Y - Z ; Calculate the right side first. Load Y and Z into
 ; registers, subtract the two registers together, store
 ; the result into variable X1.
 LDAA Y
 LDAB Z
 SBA ; subtract B from A, Z from Y
 STAA X1

 eg,
 X1 := Y - Z - 3 - T ; Calculate the right hand side first. If the
 ; number of variables/constants exceed the number of
 ; registers available, parenthesise and calculate portions
 ; at a time. Finally, store the result back into the left
 ; side variable X1. Take special note of the order of
 ; evaluation, in this case Z is subtracted from Y, 3
 ; subtracted from that and so on.
 LDAA Y
 LDAB Z
 SBA ;Y-Z
 SUBA #3 ;-3
 LDAB T
 SBA ;-T
 STAA X1

 Compound Statements

 X1 := Y + 4 - Z * 7 ; Calculate the right hand side first. If the
 ; number of variables/constants exceed the number of registers available,
 ; parenthesise and calculate portions at a time. Finally, store the result
 ; back into the left side variable X1. Take special note of the order of
 ; evaluation, in this case multiplication occurs before addition or
 ; subtraction.
 ; The statement can be interpreted as, X1 := (Y + 4) - ( Z * 7 )
 ; or X1 := Y + (4 - Z) * 7
 ; Assuming that the real intention is the first grouping, first calculate
 ; the term (Z * 7), then the term (Y + 4), and subtract the first term
 ; from the second, storing the result into X1.

 LDAA Z
 LDAB #7 ;
 mult A,B ; (Z * 7), pseudo-instruction, result assumed in A
 TAB ; keep (Z * 7) in B
 LDAA Y
 ADDA #4 ; (Y + 4)
 SBA ; (Y + 4) - (Z * 7)
 STAA X1

 WHERE THE EXPRESSION IS COMPLEX AND INVOLVES A LARGE NUMBER OF TERMS, THIS
 WILL REQUIRE THE USE OF TEMPORARY STORAGE LOCATIONS FOR STORING
 INTERMEDIATE RESULTS.

 IF STATEMENTS
This involves the use of an appropriate branch false instruction after a comparison
test to the end of the if body.
1. An IF label with a comparison test
2. Branch false to an endif label
3. The if body statements precede the endif label

 if: ; comparison test
 ; jump false to endif
 ; if body statements
 endif:

 Comparing a variable and a constant
1. Load the variable into a register
2. Compare the register against the constant
3. Branch false to a label after the body of the if statement

 IF X1 < 10 then Y := Z * 2;

 if: LDAA X1
 CMPA #10
 BCC endif ; branch false, unsigned X1 >= 10
 LDAA Z ; if body, Y := Z * 2
 LDAB #2
 mult A,B ; pseudo-instruction, result assumed in A
 STAA Y
 endif:

 Comparing a Variable against another Variable
1. Load the second variable into a register (t2)
2. Load the first variable into a register (t1)
3. Compare the two registers (t1 - t2)
4. Branch false to a label after the body of the if statement

 IF X1 >= Z then Y := X1;

 if: LDAB Z
 LDAA X1
 CBA
 BLT endif
 LDAA X1 ; if body, Y := X1;
 STAA Y
 endif:

 Comparing a Variable for Logic 1 or TRUE
1. Load the variable into a register (t1)
2. Compare the register against zero
3. Branch equal to a label after the body of the if statement

 IF X1 then Y := X1 / 2;

 if: LDAA X1
 CMPA #0
 BEQ endif
 LDAA X1 ; if body, Y:=X1 / 2;
 LDAB #2 ; div A, B
 STAA Y
 endif:

 Comparing a Variable for Logic 0 or FALSE
1. Load the variable into a register (t1)
2. Compare the register against zero
3. Branch above or greater to a label after the body of the if statement

 IF NOT X1 then Y := X1 + 2;

 if: LDAA X1
 CMPA #0
 BHI endif
 LDAA X1 ; if body, Y:=X1 + 2;
 ADDA #2
 STAA Y
 endif:

 WHERE THE CONDITION OF THE IF STATEMENT IS EXPRESSED NEGATIVELY, USING A
 NOT INSTRUCTION, THEN A BRANCH TRUE INSTRUCTION SHOULD BE USED INSTEAD OF
 A BRANCH FALSE INSTRUCTION. eg,

 IF X1 = 2 then Y := 4;

 if: LDAA X1
 CMPA #2
 BNE endif
 LDAA #4
 STAA Y
 endif:


 IF NOT X1 = 2 then Y := 4;

 if: LDAA X1
 CMPA #2
 BEQ endif
 LDAA #4
 STAA Y
 endif:

 IF THEN ELSE STATEMENTS
This involves an extension to the previous IF body. The conditional false branch
now jumps to an else clause, and the if body jumps unconditionally to the end of
the if else statement.

 if: ; comparison
 ; branch false to else clause
 ; if body statements
 jmp endif
 else:
 ; else statements
 ;
 endif:

The same principles apply to the various forms that expressions can take. eg,

IF X = 2 THEN Y := Y + 4 ELSE Z := 0;

if: LDAA X
CMPA #2
BNE else
LDAA Y
ADDA #4
STAA Y
JMP endif
else: LDAA #0
STAA Z
endif:

 WHILE LOOPS
This involves the use of a conditional test at the entry of the while body, which
branches or jumps false to an endwhile label. The last statement in the while body
is a jump unconditional to the start of the while body.
1. Use a while label, comparison test
2. Branch false to an endwhile label
3. The last statement of the while body is a jump to the while label

 while: ; comparison test
 ; branch false to endwhile
 ; while body statements
 jmp while
 endwhile:


 WHILE X < 10 DO
 BEGIN
 Y := Y + X;
 X := X + 1
 END;


 while: LDAA X
 CMPA #10
 BCC endwhile ; branch false, unsigned X >= 10
 LDAB X ; Y := Y + X
 LDAA Y
 ABA
 STAA Y
 LDAA X ; X := X + 1
 ADDA #1
 STAA X
 JMP while
 endwhile:
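
The same layout can be modelled in C with labels and goto, which makes the test-at-the-top and jump-back structure visible. This is a sketch only; the function name and the use of unsigned char for the byte variables are assumptions made for illustration.

```c
/* WHILE X < 10 DO BEGIN Y := Y + X; X := X + 1 END, written the way the
   assembly version is laid out. Returns the final value of Y. */
unsigned char sum_while_lowered(unsigned char x, unsigned char y)
{
while_top:
    if (!(x < 10)) goto endwhile;   /* comparison test, branch false to endwhile */
    y = (unsigned char)(y + x);     /* Y := Y + X */
    x = (unsigned char)(x + 1);     /* X := X + 1 */
    goto while_top;                 /* jump unconditionally to the while label */
endwhile:
    return y;
}
```

Note that when X is already 10 on entry the body never runs, which is why the exit branch must cover X >= 10 and not only X > 10.
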

 PREVIOUS RULES CONCERNING NEGATION ALSO APPLY. NOTE THAT ALL PREVIOUS
 FUNDAMENTALS OF STATEMENT ASSIGNMENT AND TESTING OF VARIABLES AGAINST EACH
 OTHER OR CONSTANTS ARE STILL BEING RIGIDLY APPLIED.

 FOR NEXT LOOPS
1. Initialise the loop variable
2. Use a for label, Perform the comparison test with the final value
3. Branch false to an endfor label
4. Inside the for loop body, the last statement, should adjust the loop
variable, and use an unconditional branch back to the for label

 initfor: ; initialise loop variable
 for: ; comparison test
 ; jump false endfor
 ; for body statements
 ; adjust loop variable for next step
 jmp for
 endfor:


 FOR X := 1 to 10 do
 BEGIN
 Y := Y + X
 END;

 initfor: LDAA #1
 STAA X
 for: LDAA X
 CMPA #10
 BHI endfor
 LDAB X ; Y := Y + X
 LDAA Y
 ABA
 STAA Y
 LDAA X ; NEXT X
 ADDA #1
 STAA X
 JMP for
 endfor:

 PREVIOUS RULES CONCERNING NEGATION ALSO APPLY. NOTE THAT ALL PREVIOUS
 FUNDAMENTALS OF STATEMENT ASSIGNMENT AND TESTING OF VARIABLES AGAINST EACH
 OTHER OR CONSTANTS ARE STILL BEING RIGIDLY APPLIED.

6802 Processor Examples

1. The IF statement
In comparing the value of operands, consider the following example.

IF X = 2 THEN Y = X

The compare statement must be coded in such a way as to compare the value of X
against the constant 2. As the variable X is stored in memory, the programmer
should first load a register with the variable X before making the comparison
(because most processors do not support a compare between memory contents
and immediate data). This example gets coded as,

X: DFB 10
Y: DFB 00
....
LDAA X ; load A acc with value of X
CMPA #02 ; compare A acc with immediate data
BNE IF1 ; exit if false
LDAA X ; get value of X
STAA Y ; store value of X at variable Y
IF1: ..... ; next statement after if construct

Let's consider another example.

IF X = Y THEN Y = 0

In this case, the code to be generated by the assembler for the compare statement
depends upon the addressing modes available. The options available are,

memory to memory compare
 CMP [X], [Y]
register to memory compare
 CMPA [Y]
register to register compare
 CMPAB

Both X and Y variables are memory based, so if the processor supports a
comparison of two memory operands, it could be coded as,

CMP [X],[Y] ; sample only

However, most processors do not support this. The most common option is the
comparison of a register variable against memory contents. This is coded as
follows,

LDAA X ; get variable X
CMPA Y ; compare with variable Y
BNE IF1 ; exit if not equal
LDAA #00H ; set variable Y to zero
STAA Y
IF1: ....

This code can clearly be optimized (ie, some instructions can be removed without
affecting the original intent of the code). So far we have considered comparisons
for equality. The conditional branch instruction will vary depending upon what
type of comparison test is used. The following tables illustrate common
comparison tests and their associated conditional branch instructions.

+-----------------+-------------+--------------+
| Signed Operands | Branch True | Branch False |
+-----------------+-------------+--------------+
|r>m | BGT | BLE |
+-----------------+-------------+--------------+
| r >=m | BGE | BLT |
+-----------------+-------------+--------------+
|r=m | BEQ | BNE |
+-----------------+-------------+--------------+
| r <=m | BLE | BGT |
+-----------------+-------------+--------------+
|r<m | BLT | BGE |
+-----------------+-------------+--------------+

If ....... Then ---- Use Branch False
If NOT ... Then ---- Use Branch True

+-----------------+-------------+--------------+
|UnSigned Operands| Branch True | Branch False |
+-----------------+-------------+--------------+
|r>m | BHI | BLS |
+-----------------+-------------+--------------+
| r >=m | BCC | BCS |
+-----------------+-------------+--------------+
|r=m | BEQ | BNE |
+-----------------+-------------+--------------+
| r <=m | BLS | BHI |
+-----------------+-------------+--------------+
|r<m | BCS | BCC |
+-----------------+-------------+--------------+

The following table represents a cross-reference between branch instructions and
the flags they test.

+------+----------------+----------+
| 6802 | Flags Tested | 8088 |
+------+----------------+----------+
| BCC | C = 0 | JNB, JAE |
+------+----------------+----------+
| BCS | C = 1 | JB, JNAE |
+------+----------------+----------+
| BNE | Z = 0 | JNE, JNZ |
+------+----------------+----------+
| BEQ | Z = 1 | JE, JZ |
+------+----------------+----------+
| BPL | N = 0 | JNS |
+------+----------------+----------+
| BMI | N = 1 | JS |
+------+----------------+----------+
| BHI | C + Z = 0 | JNBE, JA |
+------+----------------+----------+
| BLS | C + Z = 1 | JBE, JNA |
+------+----------------+----------+
| BGE | N EOR V = 0 | JNL, JGE |
+------+----------------+----------+
| BLT | N EOR V = 1 | JL, JNGE |
+------+----------------+----------+
| BGT |Z + (N EOR V)= 0| JG, JNLE |
+------+----------------+----------+
| BLE |Z + (N EOR V)= 1| JLE,JNG |
+------+----------------+----------+

These tables are useful in determining the correct conditional instruction to use
for a particular comparison on specific data types. For example, the following
statement, applied to two unsigned 8bit data values, is coded as shown.

IF X <= Y THEN Y = 4

X: DFB 10H
Y: DFB 12H

IF: LDAA X
CMPA Y
BHI IF1
LDAA #04H
STAA Y
IF1: ....
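
The reason two sets of branch instructions exist can be illustrated with a small C model: the same 8bit patterns compare differently depending on whether they are read as unsigned or signed values. The function names are hypothetical, and the cast to int8_t assumes the usual two's complement conversion.

```c
#include <stdint.h>

/* Unsigned reading of r <= m: the case served by BLS (true) and BHI (false). */
int less_or_equal_unsigned(uint8_t r, uint8_t m)
{
    return r <= m;
}

/* Signed reading of r <= m: the case served by BLE (true) and BGT (false).
   Assumes two's complement conversion, so F0H is read as -16. */
int less_or_equal_signed(uint8_t r, uint8_t m)
{
    return (int8_t)r <= (int8_t)m;
}
```

With r = F0H and m = 12H the unsigned test fails (240 is not <= 18) while the signed test succeeds (-16 <= 18), so choosing the wrong branch family gives the wrong result.
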

2. The IF THEN ELSE statement
In comparing the value of operands, consider the following example.

IF X = 2 THEN Y = X ELSE X = Y

This becomes coded as,

X: DFB 00
Y: DFB 00
IF: LDAA X
CMPA #02D
BNE ELSE1
LDAA X
STAA Y
JMP IF1
ELSE1: LDAA Y
STAA X
IF1: ....

3. The WHILE WEND statement
Consider the following example for unsigned values.

WHILE X < 10 DO
 Y = Y + 1
 X = X + 1
WEND

This becomes coded as,

X: DFB 00H
Y: DFB 00H
DO1: LDAB X
CMPB #10D
BCC EXIT1 ; for signed use BGE
LDAA Y ; increment value of Y
ADDA #01
STAA Y
LDAA X ; increment value of X
ADDA #01
STAA X
JMP DO1
EXIT1: ...

Consider the coding of the following HLL program into 6802 assembler.

Program HLLTest();
var loop, val1, val2 : Byte;
Begin
val1 := 0;
val2 := 0;
loop := 0;
while loop <= 10 do
begin
val2 := val2 + loop;
loop := loop + 1
end;
if val1 < val2 then val1 := val2
else val2 := val1
end.

The 6802 assembler version is

; HLLtest.asm
CPU 6802
HOF MOT
ORG 100h
loop: dfb 0
val1: dfb 0
val2: dfb 0
ORG 120h

Begin: LDAA #0 ; val1 := 0
STAA val1
LDAA #0 ; val2 := 0
STAA val2
LDAA #0 ; loop := 0
STAA loop
While: LDAA loop ; while loop <= 10 do
CMPA #10
BGT if1
LDAA val2 ; val2 := val2 + loop
LDAB loop
ABA
STAA val2
LDAA loop ; loop := loop + 1
ADDA #01
STAA loop
JMP While ; endwhile
if1: LDAA val1 ; if val1 < val2 then
CMPA val2
BGE Else ; branch false when val1 >= val2
LDAA val2 ; val1 := val2
STAA val1
JMP Endif
Else: LDAA val1 ; else val2 := val1
STAA val2
Endif: NOP
SWI
End Begin

ASSEMBLY LANGUAGE PROGRAMMING, Part 4

DATA CONVERSION ROUTINES


Computer systems use character based keyboards and displays for inputting and
outputting data. Conversion routines are necessary to convert data types to character
strings and back again.

Consider the entry from a keyboard of an integer value 276. This represents a three
character sequence of '2', '7' and '6'. This character sequence will need to be converted
into an appropriate 16bit value representing an integer. Also consider displaying the
value of a byte as two hex digits. Each nibble must be converted to an ASCII character
before displaying on the terminal screen.

 HEX BYTE TO ASCII CHARACTERS


This routine converts a byte to TWO ASCII characters. eg,

 AFH becomes 41H 46H

The algorithm for this is

GET DIGIT
MASK OFF HIGH NIBBLE
CONVERT TO ASCII
MASK OFF LOW NIBBLE
CONVERT TO ASCII

The following code shows an MC6802 implementation.

CPU 6802
HOF MOT
ORG 0100H
Val1: DFB 3FH
Result: DFS 02H
ORG 120H
Start:
LDAA Val1 ; get val1
PSHA ; save val1
ANDA #0F0H ; mask high nibble
LSRA ; shift to low nibble
LSRA
LSRA
LSRA
JSR Conv ; convert high nibble
STAA Result ; store it
PULA
ANDA #0FH ; mask low nibble
JSR Conv ; convert low nibble
STAA Result+1 ; store it
Exit: BRA Exit

Conv: CMPA #9H ; check for digit
BLS ASCZ
ADDA #07 ; adjust for letter
ASCZ: ADDA #30H ; adjust to ASCII
RTS

END Start
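
For comparison, the same conversion can be modelled in C. This is a sketch under the same assumptions as the listing (uppercase letters A to F); the function names are hypothetical.

```c
#include <stdint.h>

/* Convert a value 0..15 to its ASCII hex digit, as the Conv subroutine does:
   add 07H for the letters A..F, then add 30H. */
static char nibble_to_ascii(uint8_t nibble)
{
    if (nibble > 9)
        nibble = (uint8_t)(nibble + 0x07);
    return (char)(nibble + 0x30);
}

/* Write the two ASCII characters for `value`: out[0] holds the high nibble,
   out[1] the low nibble. */
void byte_to_hex_ascii(uint8_t value, char out[2])
{
    out[0] = nibble_to_ascii((uint8_t)(value >> 4));
    out[1] = nibble_to_ascii((uint8_t)(value & 0x0F));
}
```

Calling byte_to_hex_ascii(0xAF, out) leaves 41H and 46H in out, matching the example above.
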

 ASCII STRING TO HEX BYTE


This routine converts a two character sequence into a hexadecimal byte, eg.

 41H 46H becomes AFH

The algorithm for implementing this routine is,

Get Digit
Subtract 30H from Digit
If Digit greater than Nine
Subtract 07H from Digit
EndIf
Shift into High Nibble and Store
Get Next Digit
Subtract 30H from Next Digit
If Next Digit greater than Nine
Subtract 07H from Next Digit
EndIf
OR Next Digit with stored High Nibble and Store

The following code shows an MC6802 implementation.

CPU 6802
HOF MOT
ORG 0100H
ASC1: DFB 33H
ASC2: DFB 46H
HexB: DFB 00H
ORG 120H

Start:
LDAA ASC1 ; get first digit
SUBA #30H
CMPA #09H
BLS If1
SUBA #07h
If1: ASLA
ASLA
ASLA
ASLA
STAA HexB
LDAA ASC2 ; get next digit
SUBA #30H
CMPA #09H
BLS If2
SUBA #07h
If2: ORAA HexB
STAA HexB
END Start
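
The reverse conversion can be modelled the same way. As with the listing, this sketch assumes valid input (digits 0 to 9 and uppercase A to F) and does no error checking; the function names are hypothetical.

```c
#include <stdint.h>

/* Convert one ASCII hex digit to its value: subtract 30H, and subtract a
   further 07H if the result is greater than nine. */
static uint8_t ascii_to_nibble(char c)
{
    uint8_t digit = (uint8_t)(c - 0x30);
    if (digit > 9)
        digit = (uint8_t)(digit - 0x07);
    return digit;
}

/* Combine two ASCII hex digits into one byte: first digit in the high nibble. */
uint8_t hex_ascii_to_byte(char high, char low)
{
    return (uint8_t)((ascii_to_nibble(high) << 4) | ascii_to_nibble(low));
}
```
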

 8BIT MULTIPLY
This routine multiplies two 8bit values together generating a 16bit result. The
following algorithm (for two unsigned 8bit values) is based on processors which
do not have MULtiply instructions.

 Set Product equal to Zero
 Set Counter equal to Eight
 While counter not equal to zero
 Left Shift Product (Multiply by 2)
 Shift Multiplier so bit goes into Carry
 If Carry bit is Set
 Product equals Product plus Multiplicand
 Endif
 Subtract one from Counter
 EndWhile

The following program implements this for an MC6802.

CPU 6802
HOF MOT
ORG 0100H
Val1: DFB 10H
Val2: DFB 20H
Result: DFS 02H
ORG 0120H

Start: CLRA ; product MSB = zero
CLRB ; product LSB = zero
LDX #0008H ; bit counter = 8
Shift: CPX #0000H
BEQ Exit
ASLB ; shift product left 1 bit
ROLA
ASL Val1 ; shift multiplier left to
BCC Decr ; examine next bit
ADDB Val2 ; Add multiplicand to
ADCA #00H ; product if carry is set
Decr: DEX
BRA Shift ; loop till all 8bits are done
Exit: STAA Result
STAB Result+1
Finish: BRA Finish
END Start
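
The shift-and-add algorithm can also be expressed in C, which may make the role of the carry bit easier to follow. This is an illustrative sketch; the function name is hypothetical and the carry is modelled by testing the top bit of the multiplier before shifting it.

```c
#include <stdint.h>

/* Multiply two unsigned 8bit values using shifts and adds only. */
uint16_t multiply_8x8(uint8_t multiplier, uint8_t multiplicand)
{
    uint16_t product = 0;

    for (int counter = 8; counter != 0; counter--) {
        product = (uint16_t)(product << 1);        /* left shift product */
        int carry = (multiplier & 0x80) != 0;      /* bit that shifts into carry */
        multiplier = (uint8_t)(multiplier << 1);   /* shift multiplier */
        if (carry)
            product = (uint16_t)(product + multiplicand);
    }
    return product;
}
```

With the values used in the listing, 10H times 20H gives 0200H.
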

 8BIT DIVIDE
This routine divides two 8bit values generating an 8bit quotient and 8bit
remainder. The following algorithm (for two unsigned 8bit values) is based on
processors which do not have DIVide instructions.

 Set Quotient equal to Zero
 Set Counter equal to Eight
 While Counter not equal to zero
 Left Shift Dividend (Multiply by 2)
 Left Shift Quotient
 If 8 MSBs of Dividend >= Divisor then
 8 MSBs of Dividend = 8 MSBs of Dividend - Divisor
 Add one to Quotient
 EndIf
 Subtract one from Counter
 EndWhile
 Remainder = MSB of Dividend

The following program implements this for an MC6802.

CPU 6802
HOF MOT
ORG 0100H
Val1: DFB 10H ; Dividend
Val2: DFB 20H ; Divisor
Quot: DFB 00H
Rem: DFB 00H
ORG 0120H

Start: LDX #0008H ; Number of bits in Divisor
CLRA
LDAB Val1 ; Get Dividend
Div: CPX #0000H
BEQ Exit
ASLB ; Shift Dividend and Quotient
ROLA
CMPA Val2 ; is subtraction successful
BCS ChkCnt
SUBA Val2 ; Yes, subtract and set bit in quotient
INCB
ChkCnt: DEX
BRA Div
Exit: STAB Quot
STAA Rem
END Start
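
A C model of the same shift-and-subtract division is sketched below. The 16bit variable work plays the role of the A:B register pair (high byte is the running remainder, low byte holds the dividend and collects the quotient). The names are hypothetical, and as in the listing there is no check for a zero divisor, which the caller is assumed to avoid.

```c
#include <stdint.h>

typedef struct {
    uint8_t quotient;
    uint8_t remainder;
} div8_result;

/* Divide two unsigned 8bit values using shifts and subtracts only. */
div8_result divide_8by8(uint8_t dividend, uint8_t divisor)
{
    uint16_t work = dividend;
    div8_result result;

    for (int counter = 8; counter != 0; counter--) {
        work = (uint16_t)(work << 1);              /* shift dividend and quotient */
        uint8_t high = (uint8_t)(work >> 8);
        if (high >= divisor) {                     /* subtraction succeeds */
            high = (uint8_t)(high - divisor);
            work = (uint16_t)(((uint16_t)high << 8) | (work & 0xFF));
            work |= 1;                             /* set bit in quotient */
        }
    }
    result.quotient = (uint8_t)(work & 0xFF);
    result.remainder = (uint8_t)(work >> 8);
    return result;
}
```

With the values in the listing (dividend 10H, divisor 20H) the quotient is 00H and the remainder is 10H.
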

 ASCII TO INTEGER
This routine converts an ASCII character string into a 16bit signed integer value.
To implement this routine, the following variables are used.

 OFFSET DFB ;offset into ASCII string
 BUFFER DFS ;space for ASCII string
 BINV DFW ;integer result
 BASE DFW ;base 10 value

The algorithm for implementing the routine is,

Position Offset to last character in Buffer
Set Base equal to 1
Set BinV equal to zero
Set BinV equal to zero
While Offset is not zero do
Get Character stored at Buffer[Offset]
If character not '-' sign then
Mask out high nibble
Multiply by Base value
Add result to Binv
Base equals Base * 10
Subtract one from Offset
Else
Set HighBit of BinV to 1
Set Offset equal to zero
Endif
EndWhile
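
The algorithm above can be sketched in C as follows. Several details are assumptions made for illustration: Offset is treated as the count of characters still to process (so the character read is buffer[offset - 1]), the input is assumed to contain only digits with an optional leading '-', and the function name is hypothetical. Note that, as in the algorithm, a negative number is marked by setting the high bit (sign and magnitude), not by two's complement negation.

```c
#include <stddef.h>
#include <stdint.h>

/* Convert `length` ASCII characters to a 16bit sign-and-magnitude value. */
uint16_t ascii_to_int_sign_magnitude(const char *buffer, size_t length)
{
    size_t offset = length;   /* position at last character */
    uint16_t base = 1;
    uint16_t binv = 0;

    while (offset != 0) {
        char c = buffer[offset - 1];
        if (c != '-') {
            /* mask out high nibble, multiply by base, add to result */
            binv = (uint16_t)(binv + (uint16_t)(c & 0x0F) * base);
            base = (uint16_t)(base * 10);
            offset--;
        } else {
            binv |= 0x8000;   /* set high bit of result */
            offset = 0;
        }
    }
    return binv;
}
```
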
 INTEGER TO ASCII
This routine converts a 16bit signed integer into an ASCII character string. To
implement this routine, the following variables are used.

 OFFSET DFB ;offset into ASCII string
 BUFFER DFS ;space for ASCII string
 BINV DFW ;integer value
 BASE DFW ;base 10 value

The algorithm for implementing the routine is,

Offset equals last position in Buffer
Get BinV
Save BinV value for later use
While BinV not less than 10
Divide BinV by 10 (BinV keeps the quotient)
Digit equals remainder added with 30H
Store Digit at Buffer[Offset]
Offset equals Offset - 1
EndWhile
Add 30H to BinV
Store result at Buffer[Offset]
Restore original BinV value
If highbit set on BinV
Subtract one from Offset
Store '-' sign at Buffer[Offset]
Endif
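
A C sketch of this routine follows. It assumes the same sign-and-magnitude convention (high bit marks a negative value) and clears that bit before dividing; the function name, the return value, and the requirement that the buffer hold at least six characters are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Fill `buffer` from the right with the ASCII digits of `binv`.
   Returns the index of the first character written.
   `buffer_len` is assumed to be at least 6 (five digits plus a sign). */
size_t int_to_ascii_sign_magnitude(uint16_t binv, char *buffer, size_t buffer_len)
{
    size_t offset = buffer_len - 1;   /* last position in buffer */
    uint16_t saved = binv;            /* save value for the sign test */

    binv &= 0x7FFF;                   /* work on the magnitude only */
    while (binv >= 10) {
        buffer[offset] = (char)(binv % 10 + 0x30);   /* remainder plus 30H */
        binv = (uint16_t)(binv / 10);
        offset--;
    }
    buffer[offset] = (char)(binv + 0x30);
    if (saved & 0x8000) {             /* high bit set: store '-' sign */
        offset--;
        buffer[offset] = '-';
    }
    return offset;
}
```
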

 PACKED BCD TO DECIMAL


This routine converts a two digit packed BCD number into an 8bit binary
value. Packed BCD 93H (decimal 93) becomes 5DH. The algorithm for performing this is,

 Get Packed BCD Value into Byte
 Move High Nibble to Low Nibble of Byte
 Zero High Nibble of Byte
 Multiply Byte by 10
 Add Low Nibble of BCD Value to Byte

The following program shows how this is implemented on the MC6802.

CPU 6802
HOF MOT
ORG 0100H
Val1: DFB 00H
Val2: DFB 0AH ; multiply by 10 decimal
Result: DFS 02H ; result of Val1 * Val2
PackBCD:DFB 93H
DecVal: DFB 00H
ORG 0110H

Start: LDAA PackBCD
LSRA ; shift high nibble to low nibble
LSRA
LSRA
LSRA
STAA Val1 ; multiply high nibble by 10
JSR Multiply
LDAA PackBCD
ANDA #0FH ; mask low nibble
ADDA Result+1 ; add to high nibble * 10
STAA DecVal ; store decimal value
Finish: BRA Finish

Multiply:CLRA ; product MSB = zero
CLRB ; product LSB = zero
LDX #0008H ; bit counter = 8
Shift: CPX #0000H
BEQ Exit
ASLB ; shift product left 1 bit
ROLA
ASL Val1 ; shift multiplier left to
BCC Decr ; examine next bit
ADDB Val2 ; Add multiplicand to
ADCA #00H ; product if carry is set
Decr: DEX
BRA Shift ; loop till all 8bits are done
Exit: STAA Result
STAB Result+1
RTS
END Start
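
In C the whole conversion reduces to a few operations, which may help when checking the assembler version by hand. This is a sketch with a hypothetical function name; it assumes both nibbles hold valid BCD digits (0 to 9).

```c
#include <stdint.h>

/* Convert a two digit packed BCD value to its binary equivalent. */
uint8_t packed_bcd_to_binary(uint8_t bcd)
{
    uint8_t tens = (uint8_t)(bcd >> 4);     /* move high nibble to low nibble */
    uint8_t units = (uint8_t)(bcd & 0x0F);  /* low nibble of BCD value */
    return (uint8_t)(tens * 10 + units);
}
```
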

ASSEMBLY LANGUAGE PROGRAMMING, Part 5

MODERN 16 BIT MICROPROCESSORS


In the code examples so far, we have separated out the coded instructions from the
data. Modern processors like the 8088 have separate registers which deal with each
section of a program.

CS and IP = instructions
DS, BX, SI = data
ES, BX, DI = extra data
SS, SP, BP = stack

In writing programs for modern processors like the 8088, the program is structured with a
minimum of three sections, called SEGMENTS. The three segments represent the
CODE, DATA and STACK areas of the program. Information within each segment is
accessed differently depending upon the segment type. To access data in the stack
segment requires the use of the SS, SP and/or BP registers. The following diagrams
illustrate how information in the stack and data segments is accessed.
Special assembler directives are used to specify the different segments.

SEGMENT DIRECTIVES
The following directives illustrate how to define the three basic segments for an 8088
assembly language program.

.STACK 100H
.DATA
.CODE

The value following the stack directive specifies the size of the stack segment.

The programmer is responsible for initializing the segment registers DS and ES to the
correct segments of the program. Failure to do so will result in a program which will not
access the data and extra data segments properly. The operating system will only
initialize the CS, SS, SP and IP registers.

The following code portion illustrates how to setup the data segment register. This is
performed at the beginning of the code segment.

.STACK 100H
.DATA
.CODE
MOV AX, @DATA ; initialize DS
MOV DS, AX

DIFFERENT SIZED MEMORY MODELS


The 8088 processor supports several different memory models. We shall look at the most
common types.

 SMALL memory model


The small memory model is limited to one code segment of up to 64k bytes and one
combined data and stack area of up to 64k bytes. The
assembler directive used to specify a small memory model is,

 .MODEL SMALL

 LARGE memory model
The large memory model supports multiple segments, each segment limited to
64k bytes. The code and stack segments are limited to 64k bytes each, but we can
have two data segments of 64k bytes each. The assembler directive used to
specify a large memory model is,

 .MODEL LARGE

Use this memory model for all your programs.

SUPPORT FOR DIFFERENT CPU TYPES


The following directives are used to specify the processor type.

.186
.286
.386
.8087
.8086

RETURNING TO PCDOS
When an assembly language program running under PCDOS terminates, it must return to
the operating system so that the user shell program can be re-loaded. The correct format
is to use the following code sequence

mov ax, 4c00h
int 21h

ASSEMBLER DIRECTIVES FOR IBM-PC PROGRAMS


The following is a discussion of the assembler directives applicable to packages like
Microsoft Masm and Turbo Assembler. These packages are used to write machine code
programs which run under PCDOS.

 EQUATES
The EQU directive creates absolute symbols and aliases by assigning an
expression or value to the declared variable name. Its format is,

 name EQU expression
An absolute symbol represents a 16bit value; an alias is a name that represents
another symbol. The declared name must be unique, one that has not been
previously declared.

pi EQU 3.14159
clearax EQU xor ax,ax

The first example directs the assembler to replace every occurrence of the name
pi with the value 3.14159, whilst the second example instructs the assembler to
replace every occurrence of clearax with the instruction xor ax,ax

 BYTE STORAGE
The DB directive allocates and initializes a byte (8bits) of storage for each
argument. Its format is,

 name DB initialvalue,,,

The name portion is optional.

value1 DB 16
form DB 6*2
text DB "Enter your name: "

In the first example, value1 is assigned a byte, and is initialized to 16, the second
example sets form equal to 12 and assigns it a byte, and in the last example, text
is defined as a sequence of bytes which each contain a character from the
specified string. The first byte will be initialized to 'E', whilst the last byte will be
initialized to a space character.

 WORD STORAGE
The DW directive allocates a word (2bytes) of storage for each initialized value.
Its format is,

 name DW initialvalue,,,

The name portion is optional.

DW ?
mess DW 'ab'

The first example allocates one word of storage, but does not define its initial
value (?). The second example defines mess as a word initialized with the
character string 'ab'.
Strings when using the DW directive must not contain more than two characters.
The 'b' will be placed in the low-order byte, and the 'a' will be placed in the high
order byte. If only one character is specified, the high-order byte will contain
00H. The low-order byte appears FIRST for Intel Processors.
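
The byte ordering can be illustrated with a small C model. It builds the word value for a two character string and then stores it low-order byte first, as an Intel processor would. Both function names are hypothetical.

```c
#include <stdint.h>

/* Value given to a two character string such as 'ab': the first character
   goes in the high-order byte, the second in the low-order byte. */
uint16_t two_char_word(char first, char second)
{
    return (uint16_t)(((uint16_t)(uint8_t)first << 8) | (uint8_t)second);
}

/* Store a 16bit word in memory in Intel order: low-order byte first. */
void store_word_le(uint8_t dest[2], uint16_t word)
{
    dest[0] = (uint8_t)(word & 0xFF);
    dest[1] = (uint8_t)(word >> 8);
}
```

For 'ab' the word value is 6162H, and memory holds 62H ('b') followed by 61H ('a').
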

 TITLE
The title directive specifies the program listing title.

 TITLE Graphics

This appears at the top of each page in the assembler list file, after the source file
name.

 NAME
The name directive is used to set the name of the current module. The module
name is used by the linker when displaying error messages. If no module name is
used, the linker will use the name specified using the title directive.

 NAME Calculate_Gross

 PAGE CONTROL
The PAGE directive can be used to designate the page length and line width for the
program listing; it is normally used to generate a page break in the assembler listing
file.

When assembly is taking place, and the page directive is encountered, the
assembler generates a form-feed character to set a new page, and continues the
assembly on the new page. In this way, the programmer can organize a printout of
modules on a per page basis, so that the printout of more than one module per
page does not occur.

PAGE 66,132 ; 66 lines per page
 ; 132 characters wide
PAGE ; go to new page in list file

 PROCEDURES
These directives are used to implement small procedures (modules).

 name PROC codetype
 ....
 ret
 name ENDP

The last instruction in a procedure is a RETurn instruction. The codetype is FAR
for large memory models, NEAR for small memory models. A procedure must be
entered using the appropriate CALL instruction.

 DEFINE DOUBLE WORD, DEFINE QUAD WORD and DEFINE TEN
The DD directive defines a double word [4bytes] of storage. This is used to
reserve storage for 32 bit integers, floating point numbers, or far pointers to code
or data [segment:offset pair].

The DQ directive defines a quad word [8bytes] of storage for double precision
floating point numbers.

The DT directive defines 10bytes of storage. This is normally used for Packed
BCD numbers and a 10 byte temporary real floating point value, as this storage
format is also used by the 80x87 arithmetic co-processor.

 OFFSET
The offset directive returns the position of a variable, in bytes, relative to
the start of the segment it is in. This is necessary when calling PCDOS routines.

 .DATA
 temp db 10
 mess db 'Hi there','$'

 .CODE
 start: mov ax, @data
 mov ds, ax
 mov ah, 9h
 mov dx, OFFSET mess ;offset 1 in .DATA segment
 int 21h ;print message
 mov ax, 4c00h ;return to PCDOS
 int 21h
 END start

SAMPLE PROGRAM FOR IBM-PC

TITLE Doscall ;Doscall.asm source file
.MODEL SMALL
CR equ 0dh
LF equ 0ah
EOSTR equ '$'

.stack 200h
.data
message db 'Hello and welcome.'
db CR, LF, EOSTR

.code
print proc near
mov ah,9h ;PCDOS print function
int 21h
ret
print endp

start: mov ax, @data
mov ds, ax
mov dx, offset message
call print
mov ax, 4c00h
int 21h
end start

The program is assembled by typing

$ TASM DOSCALL
Turbo Assembler V1.0 Copyright(c)1988 by Borland International
Assembling file: DOSCALL.ASM
Error messages: None
Warning messages: None
Remaining memory: 257k
$

This produces an object file named DOSCALL.OBJ which must be linked to create an
executable file which can run under PCDOS.

$ TLINK DOSCALL
Turbo LinkV2.0 Copyright (c) 1987, 1988 Borland International
$

The program when run, produces the following output.

$ DOSCALL
Hello and welcome.
$

MACROS
The macro directive allows the programmer to write a named block of source statements,
then use that name in the source file to represent the group of statements. During the
assembly phase, the assembler automatically replaces each occurrence of the macro name
with the statements in the macro definition.

Macros are expanded on every occurrence of the macro name, so they can increase the
length of the executable file if used repeatedly. Procedures or subroutines take up less
space, but the increased overhead of saving and restoring addresses and parameters can
make them slower. In summary, the advantages and disadvantages of macros are,

Advantages

 Repeated small groups of instructions replaced by one macro
 Errors in macros are fixed only once, in the definition
 Duplication of effort is reduced
 In effect, new higher level instructions can be created
 Programming is made easier, less error prone
 Generally quicker in execution than subroutines

Disadvantages
In large programs, produce greater code size than procedures

When to use Macros

 To replace small groups of instructions not worthy of subroutines
 To create a higher instruction set for specific applications
 To create compatibility with other computers
 To replace code portions which are repeated often throughout the program

MACRO DEFINITION
Defining Macros is done as follows,

name MACRO [optional arguments]
 statements
 statements
 ENDM

Consider the following macro to return to PCDOS from an assembly language program.

exittodos MACRO
 mov ax,4C00h
 int 21h
 ENDM

Macros are expanded when the program is assembled. This means that every occurrence
of the macro name (apart from the definition) is replaced by the statements in the macro
definition. An example will demonstrate this.

TITLE dosmacro
.MODEL small
exittodos MACRO
 mov ax,4C00h
 int 21h
 ENDM

.STACK 100h
.DATA
message DB 'Hello and Welcome', '$'

.CODE
start: mov ax, @data
mov ds, ax
mov ah, 9h
mov dx, OFFSET message
int 21h
exittodos
END start

When assembled, the macro is replaced and the internal representation of the file looks
like,

TITLE dosmacro
.MODEL small
exittodos MACRO
 mov ax,4C00h
 int 21h
 ENDM

.STACK 100h
.DATA
message DB 'Hello and Welcome', '$'

.CODE
start: mov ax, @data
mov ds, ax
mov ah, 9h
mov dx, OFFSET message
int 21h
mov ax,4C00h
int 21h
END start

Macros can also accept values (parameters).

addup MACRO ad1, ad2, ad3
 mov ax, ad1
 mov dx, ad2
 mov cx, ad3
 ENDM

In this example a macro named addup is created. It accepts three parameters, ad1, ad2
and ad3. The code which follows, consisting of the mov statements, will be used to
replace every occurrence of the macro name addup in the source file. The macro is
terminated with the ENDM statement. Calling a macro with arguments is done as
follows,

addup bx, 2, count

This has the effect of loading the ax register with the contents of the bx register, the dx
register with the value 2, and the cx register with the value of count.

Macro definitions may include other macro names, and macros may also be recursive:
they can call themselves, eg,

pushall MACRO reg1, reg2, reg3, reg4, reg5, reg6
 IFNB <reg1> ;; If parameter not blank
 push reg1 ;; push one register and
 pushall reg2, reg3, reg4, reg5, reg6 ;; repeat
 ENDIF
 ENDM

pushall ax, bx, si, ds
pushall cs, es

This shows a recursive macro called pushall that continues to call itself until it
encounters a blank argument. In effect, it pushes the registers specified in the macro call
onto the stack.

The ;; indicates that the comment field of the macro should not be expanded with the
macro statements.
IMPLEMENTING FP NUMBERS, ARRAYS, RECORDS AND JUMP TABLES

Floating Point Numbers


The following example shows the declaration of a single precision floating point decimal
number (stored in IEEE 754 standard).

FPnum1 DD 1.32740

BCD strings
The following example declares a packed BCD constant.

BCDval DT 123456

Ten bytes are allocated: nine bytes hold 18 packed decimal digits and the tenth holds
the sign, giving a magnitude range of 0 to 999,999,999,999,999,999.

HANDLING ARRAYS
Arrays and array elements are dealt with using pointers. This involves either based or
indexed addressing.

• Manipulating an Array Element

 1: Load a base/index register with the address of the first element
 2: Calculate the offset of the required element (1 byte per element for
    characters, 2 bytes for integers, etc)
 3: Perform the operation by either
    a) incrementing the base/index register by the required amount, or
    b) using based indexed addressing, eg,

 X := IntArray[4];

 mov bx, offset IntArray ; base address
 mov ax, 4 ; element index
 shl ax, 1 ; offset = index * 2 bytes per integer
 mov si, ax
 mov ax, [bx + si] ; fetch the element
 mov X, ax ; the 8086 has no memory-to-memory mov

• Cycling through an Array using a Loop count variable
The principles are the same, but the offset is the loop count variable adjusted by
the number of bytes per element, eg,

 FOR Loop := 1 to 10 do
 BEGIN
 sum := sum + IntArray[Loop]
 END;

 initfor:mov ax, 1 ; Loop := 1
 mov Loop, ax
 mov bx, offset IntArray ; setup base register
 for: mov ax, Loop
 cmp ax, 10
 ja forexit
 shl ax, 1 ; calculate offset (Loop * 2 bytes)
 mov si, ax
 mov ax, [bx + si]
 mov cx, sum ; add sum and IntArray[Loop]
 add ax, cx
 mov sum, ax ; update sum
 inc Loop ; Loop := Loop + 1
 jmp for
 forexit:
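
The address arithmetic can be checked with a short model. The Python sketch below is illustrative only: the array contents are made up, the bytes object stands in for memory, and indices are counted from zero so that element 0 sits at the base address.

```python
import struct

ELEMENT_SIZE = 2  # bytes per 16-bit integer

# Hypothetical array contents, stored as consecutive little-endian words.
values = [10, 34, 76, 25, 14, 9, 3, 22]
memory = struct.pack("<%dh" % len(values), *values)


def load_word(base, index):
    """Fetch element `index`: effective address = base + index * element size."""
    offset = index * ELEMENT_SIZE            # the 'shl ax, 1' step
    return struct.unpack_from("<h", memory, base + offset)[0]


total = 0
for loop in range(len(values)):              # the loop count doubles as the index
    total += load_word(0, loop)

print(load_word(0, 4), total)
```
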

Integer Arrays
Integer arrays occupy two bytes per element. A typical operation is to sum the contents of
an integer array. The following code for an 8086 shows this.

TITLE IntArray
.MODEL Large
.STACK 200h
.DATA
mess db 'The total is ','$'
result dw ?
IntArry dw 10, 34, 76, 25, 14, 9, 3, 22
IntAlen dw ($ - IntArry) / 2
buff db 6 dup( 20h )
db '$'

.CODE

binasc proc far ; convert result to ascii string


mov ax, 0
mov ax, [result] ; get number to convert
push ax ; save it
mov si, offset buff[5] ; point to string area
mov cx, 10 ; divide base factor
shl ax, 1 ; clear sign bit
shr ax, 1
do1: cmp ax, 10 ; compare with base fact
jb exit1
mov dx, 0 ; clear upper numerator
div cx ; divide by base factor
add dl, 30h ; convert to ASCII
mov [si], dl ; and store it
dec si ; next character
jmp do1
exit1: add al, 30h ; convert last character
mov [si], al ; and store it
pop ax ; recover
or ax, ax ; and test for sign bit
jns exit2
dec si ; store '-' sign
mov bl, 2dh
mov [si], bl
exit2: ret
binasc endp

start: mov ax, @data


mov ds, ax
mov [result], 0000h ; clear result
mov cx, IntAlen ; count of elements
mov bx, offset IntArry ; point to IntArry
mov si, 0000h ; first element
xor ax, ax ; clear total
lp1: add ax, [bx + si] ; add value to total
inc si ; next element
inc si
dec cx
jne lp1
mov [result], ax ; store total
mov dx, offset mess ; print message
mov ah, 9h
int 21h
call binasc ; convert result to ASCII
mov dx, offset buff
mov ah, 9h
int 21h
mov ax, 4c00h ; exit to DOS
int 21h
END start
Other typical operations involve the determination of the minimum and maximum values.
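
The conversion performed by binasc is repeated division by ten, storing the remainders from right to left. The Python sketch below models that loop for non-negative values only (the sign handling is left out); the function name and the width parameter are chosen for illustration.

```python
def binasc(value, width=6):
    """Convert a non-negative 16-bit value to right-justified ASCII digits."""
    buff = [" "] * width               # buffer pre-filled with spaces (20h)
    pos = width - 1                    # start at the last character position
    while value >= 10:
        value, remainder = divmod(value, 10)   # div cx: quotient and remainder
        buff[pos] = chr(0x30 + remainder)      # add 30h to form the ASCII digit
        pos -= 1
    buff[pos] = chr(0x30 + value)      # the final (most significant) digit
    return "".join(buff)


print(repr(binasc(193)))
```

The digits come out least significant first, which is why the buffer is filled backwards from its last position.
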

Records (Structures)
Records in Pascal support the use of different sized field items. Consider the storage of
the following record.

Type example_record = RECORD
int_number : integer;
fp_number : real;
letter : char;
END;

The same record is implemented in assembly language by first defining its composition.

ex_rec STRUC
int_num dw ?
fp_num dd ?
lett db ?
ex_rec ENDS

The next step creates a record which has the composition of the previous record
definition.

my_rec ex_rec <22, 3.2, 'H'>

Each field of the record is accessed in a similar way to Pascal, eg,

my_rec.lett

accesses the lett field of the record my_rec. The following program shows an
implementation for the 8088 processor.

TITLE Records
.MODEL Large

ex_rec STRUC
int_num dw ?
fp_num dd ?
mess db "             " ; 13-byte default, room for the initialiser below
ex_rec ENDS

.STACK 200h
.DATA
myrec ex_rec <22,1.30, "Hello there.$">

.CODE
start: mov ax, @data
mov ds, ax
mov dx, offset myrec.mess
mov ah, 9h
int 21h
mov ax,4c00h
int 21h
END start
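
A field name is simply an offset from the start of the record. The Python sketch below lays out the same three fields with the struct module to show the offsets involved; it assumes a 2-byte integer, a 4-byte real and a 13-byte string field with no padding, and the OFFSETS table is written out by hand for illustration.

```python
import struct

# '<' = little-endian with no padding; h = 16-bit integer, f = 32-bit real,
# 13s = 13-byte character field.
LAYOUT = struct.Struct("<hf13s")

record = LAYOUT.pack(22, 1.30, b"Hello there.$")

# Field offsets, as an assembler would compute them for myrec.<field>.
OFFSETS = {"int_num": 0, "fp_num": 2, "mess": 6}

print(LAYOUT.size)                                  # total bytes in the record
print(record[OFFSETS["mess"]:].decode("ascii"))     # the string field
```
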

Jump Tables
Jump tables are an efficient method of implementing switch/case type statements. A jump
table consists of an array of addresses. Using an offset into the array selects the address
of the routine which handles that particular value.

Jump tables are efficient because it always takes the same time to select any routine from
the table. The order may be re-arranged, or new routines added, simply by increasing the
size of the table.

The following program implements a jump table.

TITLE Jump.asm
.MODEL Large
.STACK 200h
.DATA
help db 'This program exits when a function key is pressed.'
db 10, 13, 'Ctrl A generates underline.', 10, 13
db 'Ctrl B generates bold.', 10, 13
db 'Ctrl C generates blinking.', 10, 13
db 'All other control codes return to normal text.', 10, 13
db 10, 13, 'Start typing characters.', 10, 13, '$'
attrib db 07h ; screen attribute byte
; a table of addresses used to decode received control codes
; each entry is the address of the appropriate routine
ctl_tbl label word
dw ctrl_null ;0
dw ctrla ; 1
dw ctrlb ; 2
dw ctrlc ; 3
dw ctrld ; 4
dw ctrle ; 5
dw ctrlf ; 6
dw ctrlg ; 7
dw ctrlh ; 8 10
dw ctrli ; 9 11
dw ctrlj ; a 12
dw ctrlk ; b 13
dw ctrll ; c 14
dw ctrlm ; d 15
dw ctrln ; e 16
dw ctrlo ; f 17
dw ctrlp ; 10 20
dw ctrlq ; 11 21
dw ctrlr ; 12 22
dw ctrls ; 13 23
dw ctrlt ; 14 24
dw ctrlu ; 15 25
dw ctrlv ; 16 26
dw ctrlw ; 17 27
dw ctrlx ; 18 30
dw ctrly ; 19 31
dw ctrlz ; 1a 32
dw ctrl_lbkt ; 1b 33
dw ctrl_bslash ; 1c 34
dw ctrl_rbkt ; 1d 35
dw ctrl_carat ; 1e 36
dw ctrl_ul ; 1f 37

.CODE
bumpcur proc far ; move cursor right one character
mov ah, 3
xor bh, bh
int 10h ; read cursor position into dh, dl
inc dl ; next column
cmp dl, 80 ; past end of line (columns 0-79)?
jl short bpcur1
xor dl, dl ; go to start of next line
inc dh
cmp dh, 24 ; end of screen?
jl short bpcur1
mov ax, 0601h ; then scroll up one line
xor cx, cx
push dx
mov dh, 24
mov dl, 79
mov bh, [attrib]
int 10h
pop dx
mov dh, 24 ; position bottom line
bpcur1:
xor bh, bh ; set cursor position
mov ah, 2
int 10h
ret
bumpcur endp

ctrl_code proc far ; process Control CODES


push bx
cbw ; convert AL to AX
mov bx,ax ; use bx as an index into
shl bx,1 ; the ctrl_tbl
jmp ctl_tbl[bx] ; jump to key routine
ctrla: and byte ptr [attrib], 0f9h ; underline
jmp ctrl_exit
ctrlb: or byte ptr [attrib], 08h ; bold
jmp ctrl_exit
ctrlc: or byte ptr [attrib], 80h ; blink on
jmp ctrl_exit
ctrld: ; all others normal
ctrl_null:
ctrle:
ctrlf:
ctrlg:
ctrlh:
ctrli:
ctrlj:
ctrlk:
ctrll:
ctrlm:
ctrln:
ctrlo:
ctrlp:
ctrlq:
ctrlr:
ctrls:
ctrlt:
ctrlu:
ctrlv:
ctrlw:
ctrlx:
ctrly:
ctrlz:
ctrl_lbkt:
ctrl_bslash:
ctrl_rbkt:
ctrl_carat:
ctrl_ul:
mov byte ptr [attrib], 07h ; normal attribute
ctrl_exit: pop bx
ret
ctrl_code endp

start: mov ax, @data


mov ds, ax
mov ah, 9h ; print help message
mov dx, offset help
int 21h
lp1: mov ah, 06h ; read character from keyboard
mov dl, 0ffh
int 21h
jz lp1 ; repeat if character not ready
cmp al, 00h ; if function key then exit
je exit
cmp al, 32 ; else if control code
jae disp1
call ctrl_code ; then process control code
jmp lp1
disp1: push bx
xor bx, bx ; page zero of video memory
mov bl, [attrib] ; get character attribute
mov cx, 1 ; one character to write
mov ah, 9 ; write char + attribute
int 10h ; use BIOS call
pop bx ; balance the push at disp1
call bumpcur ; next cursor position
jmp lp1 ; repeat
exit: mov ax, 4c00h
int 21h
END start
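
The dispatch step of the program above (index the table, then jump) can be modelled in a few lines. The Python sketch below is an illustration, not a translation: the handler names are invented, and a list of functions stands in for the table of routine addresses.

```python
NORMAL = 0x07


def ctrl_a(attrib):
    return attrib & 0xF9          # mirrors 'and attrib, 0f9h'


def ctrl_b(attrib):
    return attrib | 0x08          # bold


def ctrl_c(attrib):
    return attrib | 0x80          # blink


def ctrl_other(attrib):
    return NORMAL                 # all other codes restore normal text


# One entry per control code 0..31; the code itself is the index.
ctl_tbl = [ctrl_other] * 32
ctl_tbl[1], ctl_tbl[2], ctl_tbl[3] = ctrl_a, ctrl_b, ctrl_c


def ctrl_code(code, attrib):
    """Dispatch in constant time: index the table, then call the entry."""
    return ctl_tbl[code](attrib)


print(hex(ctrl_code(2, NORMAL)))  # Ctrl B applied to the normal attribute
```

Selecting a handler costs one index operation whichever code arrives, which is the property that makes the table preferable to a chain of comparisons.
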
ASSEMBLY LANGUAGE PROGRAMMING, Part 6

PARAMETER PASSING
Parameter passing refers to the exchange of data between modules. There are many ways
this information can be exchanged.

1. GLOBAL DATA USING COMMON BUFFER OR MEMORY


The data is stored in memory accessible to all modules. The disadvantage of this
technique is that the data may be modified by any module, which makes
debugging harder.

Consider the following simple program which adds two numbers together, storing
the result. All data has been declared as common.

TITLE CommonData
.MODEL Large
.STACK 200h
.DATA
num1 dw 22
num2 dw 32
result dw 0

.CODE
addnum proc far
mov ax, [num1]
mov bx, [num2]
add ax, bx
mov [result], ax
ret
addnum endp

start: mov ax, @data


mov ds, ax
call addnum ; add num1 and num2
mov ax, 4c00h
int 21h
END start

2. REGISTER VARIABLES
This technique involves passing and returning values using processor registers.
Routines must ensure that they do not corrupt any registers other than those which
have been specified. The programmer first determines which registers will be
used and which can be altered (contents destroyed).

Consider the following implementation of the previous addition program, which uses
register variables.

TITLE CommonData
.MODEL Large
.STACK 200h
.DATA
num1 dw 22
num2 dw 32
result dw 0

.CODE
addnum proc far
; accepts num1 in ax, num2 in bx, returns result in dx
push ax
add ax, bx
mov dx, ax
pop ax
ret
addnum endp

start: mov ax, @data


mov ds, ax
mov ax, [num1]
mov bx, [num2]
call addnum ; add num1 and num2
mov [result], dx
mov ax, 4c00h
int 21h
END start

The advantage is that only the calling module alters the data, whilst the module
addnum only works on copies of the data. In this way, it is easier to track which
modules affect the data variables.

3. STACK VARIABLES
Parameters may also be passed using the stack. This involves pushing the values
onto the stack before the module is called. It may also involve reserving space
on the stack for a return result.

The module then accesses the parameters on the stack using the appropriate
addressing mode.

Upon return to the calling module, the stack space is deallocated using
appropriate pop or stack pointer adjustment instructions.

There are two ways in which data may be referenced using the stack.

1. Call by Value
This refers to placing copies of the data values on the stack. Only the
copy is worked with; the original remains unmodified.
2. Call by Reference
This refers to passing the address of the variable using the stack.
This address is used to access the data, thus the original data is used.

Call by value is normally used for simple data types, whilst call by reference is
used for data types like arrays and records, because of the amount of memory
space they occupy (and stack space is normally limited).

Consider the following program for an MC6802 processor which uses Call by
Value to add two variables together.

CPU 6802
HOF MOT
ORG 100H
Num1: DFB 10
Num2: DFB 20
Result: DFB 0

Start: PSHA ; Make room for result on stack


LDAA Num1
LDAB Num2
PSHA ; Place copy Num1 on stack
PSHB ; Place copy of Num2 on stack
JSR Addup
PULB ; remove copy of Num2
PULA ; remove copy of Num1
PULA ; get result from Addup
STAA Result
Exit: BRA Exit

Addup: TSX ; transfer SP into IX register


PSHA ; save registers
PSHB
LDAA 02,X ; Get Num2
LDAB 03,X ; Get Num1
ABA ; Add Num1 and Num2
STAA 04,X ; Store on stack for return
PULB ; Recover original register values
PULA
RTS
END Start

PARAMETER PASSING FOR THE 8088 PROCESSOR

ACCESSING THE STACK FRAME INSIDE A MODULE


Let's look at how a module handles the stack frame. Because each module will use the BP
register to access any parameters, its first chore is to save the contents of BP.

push bp

It then copies SP into BP; BP now points to the top of the stack.

mov bp,sp

Thus the first two instructions in a module will be the combination,

push bp
mov bp,sp

ALLOCATION OF LOCAL STORAGE INSIDE A MODULE


Local variables are allocated on the stack using a

sub sp, n

instruction. This decrements the stack pointer by the number of bytes specified by n. For
example, a module that needs temporary storage for a two-byte integer i would use the
instruction

sub sp, 2

Pictorially, the stack frame looks like,


+---------+
| ilow    |<-- SP
+---------+
| ihigh   |
+---------+
| BPlow   |<-- BP
+---------+
| BPhigh  |
+---------+

The local variable i can be accessed using SS:BP - 2, so the statement,

i = 24;

is equivalent to

mov word ptr [bp - 2], 18h

Note that twenty-four decimal is eighteen hexadecimal.

DEALLOCATION OF LOCAL VARIABLES WHEN THE MODULE TERMINATES
When the module terminates, it must deallocate the space it allocated for the variable i on
the stack. Referring to the above diagram, it can be seen that BP still holds the top of the
stack as it was when the module was first entered. BP has been used for two purposes,

• to access parameters relative to it
• to remember where SP was upon entry to the module

The deallocation of any local variables (in our case the variable i) will occur with the
following code sequence,

mov sp, bp ;this recovers SP, deallocating i


pop bp ;SP now is the same as on entry to module

THE PASSING OF PARAMETERS TO A MODULE


Consider the following module call in a high level language.

add_two( 10, 20 );

The language pushes parameters (the values 10 and 20) right to left, thus the sequence of
statements which implement this are,

push ax ; assume ax contains 2nd parameter, ie, integer value 20
push cx ; assume cx contains 1st parameter, ie, integer value 10
call add_two

The stack frame now looks like,

+---------+
| Return  |<-- SP
+---------+
| address |
+---------+
| 0A      | ;1st parameter, integer value 10
+---------+
| 00      |
+---------+
| 14      | ;2nd parameter, integer value 20
+---------+
| 00      |
+---------+

Remembering that the first two statements of module add_two() are,

add_two: push bp
mov bp, sp

The stack frame now looks like (after those first two instructions inside add_two)

+---------+
| BPlow   |<-- BP <-- SP
+---------+
| BPhigh  |
+---------+
| Return  |
+---------+
| address |
+---------+
| 0A      | ;1st parameter, integer value 10
+---------+
| 00      |
+---------+
| 14      | ;2nd parameter, integer value 20
+---------+
| 00      |
+---------+

ACCESSING OF PASSED PARAMETERS WITHIN THE CALLED MODULE


It should be clear that the parameters passed to module add_two() are accessed relative to
BP. For a near call, where the return address occupies two bytes, the 1st parameter resides
at [BP+4] and the 2nd parameter at [BP+6].

DEALLOCATION OF PASSED PARAMETERS


The two parameters passed in the call to module add_two() were pushed onto the stack
frame before the module was called. Upon return from the module, they are still on the
stack frame, so now they must be deallocated. The instruction which does this is,

add sp, 4

where SP is adjusted upwards four bytes (ie, past the two integers).
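
The whole sequence (push the parameters, call, set up BP, read the parameters, unwind) can be followed in a small simulation. The Python sketch below is a model under stated assumptions: a near call with a two-byte return address, a 32-byte stack, and made-up values for the caller's BP and the return address.

```python
import struct

stack = bytearray(32)     # a small stack segment; it grows downwards
sp = len(stack)           # empty stack: SP just past the top
bp = 0x1234               # whatever the caller had in BP (arbitrary value)


def push(word):
    global sp
    sp -= 2
    struct.pack_into("<H", stack, sp, word)   # low byte at the lower address


def pop():
    global sp
    word = struct.unpack_from("<H", stack, sp)[0]
    sp += 2
    return word


def word_at(addr):
    return struct.unpack_from("<H", stack, addr)[0]


entry_sp = sp
push(20)                  # 2nd parameter is pushed first (right to left)
push(10)                  # 1st parameter
push(0x0100)              # near call: return address (made-up value)

push(bp)                  # add_two: push bp
bp = sp                   #          mov bp, sp
first, second = word_at(bp + 4), word_at(bp + 6)
result = first + second

sp = bp                   # mov sp, bp
bp = pop()                # pop bp
pop()                     # ret removes the return address
sp += 4                   # caller: add sp, 4

print(first, second, result, sp == entry_sp)
```

After the final adjustment SP is back where it started and BP holds the caller's value again, which is the purpose of the entry and exit sequences.
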

INTERFACING TO HLL ROUTINES


There are times when high level languages need to call assembly language modules. This
is usually because of constraints such as speed and memory space.

We shall look at interfacing a Pascal program to an assembly language module.

The Pascal program will declare an integer based array, and pass the address of this array,
and the number of elements in the array, to an assembly language module.

Using the address, the assembly language module will sum the elements of the array,
returning the result to the Pascal program.

The assembly language module is shown below.

TITLE Addup88
.MODEL TPASCAL
.CODE
PUBLIC Addup
Addup Proc Far Array : DWORD, Elements : WORD RETURNS Reslt : WORD
push ds ; save ds register
push cx ; save cx register
push si ; save si register
lds si, Array ; point DS:SI to array element1
mov cx, Elements ; count of elements
xor ax, ax ; clear total
lp1: add ax, [si] ; add value to total
inc si ; next element
inc si
dec cx
jne lp1
pop si
pop cx
pop ds
RET ; exit to Pascal Module with
; result in AX
Addup ENDP
END

This is assembled to OBJECT code by the command

$TASM ADDUP88

The Pascal module is shown below.

Program ADDDEMO (input, output);


Uses DOS, CRT;
Type IntArray = Array[1..20] of Integer;
Var
Numbers : IntArray;
Result : Integer;
Loop : Integer;
{$F+}

Function Addup( var Numbers : IntArray; Elements : Integer ) : Integer; EXTERNAL;
{$L ADDUP88.OBJ}
{$F-}

begin
for loop:= 1 to 20 do
Numbers[loop] := loop;
Result := Addup( Numbers, 20 );
Writeln('The sum of the array is ', Result)
End.

When compiled under Turbo Pascal, the two object modules are linked together, creating
an executable file.

ASSEMBLER OPTIONS
Various options are supported by most assemblers. These options are provided to

• increase productivity
• check the operation of the assembler (macros, equates)
• simplify control
• provide flexibility

COMMAND FILES
Command files are text files which contain commands to the assembler.

$TASM @MYCMDFIL

will invoke the assembler using the options specified in the file MYCMDFIL. If this file
contained the following,

/a /e myprog, myobj, mylst;

this is equivalent to typing

$TASM /a /e myprog, myobj, mylst;

This simplifies the process of having to repeat all the command line options whilst the
program is being debugged.

CONDITIONAL ASSEMBLY OF SOURCE CODE STATEMENTS


The following directives are used to specify to the assembler, whether or not to assemble
the bracketed group of statements which follow.
IF
ELSE
ENDIF
IFDEF
IFNDEF

The IF directives and the ENDIF and ELSE directives can be used to enclose the
statements to be considered for conditional assembly.

The conditional block of statements is used as follows,

IF debug
xor ax,ax
ELSE
xor bx,bx
ENDIF

If the symbol debug equates to true (non-zero), the ax register will be cleared, otherwise
the bx register will be cleared.

The IFDEF and IFNDEF directives test whether or not the given variable name/symbol
has been defined.

IFDEF buffer
buf1 DB 10 DUP(?)
ENDIF

In this example, buf1 is allocated only if buffer has previously been defined. It consists
of ten bytes whose initial value is undefined.

THE INCLUSION OF SOURCE MACROS AND DEFINITIONS


A macro or definition file is a collection of definitions or program code which can be
included into the source code program. A macro file is simply a file containing macro
definitions.

The programmer adds these definitions to the source file using the include directive, and
may remove unwanted definitions using the purge directive.

The include directive inserts the definitions or code statements from the specified file
into the current source file during assembly, and allows any variables or declarations in
the include file to be referenced or accessed in the source program being written.
INCLUDE entry
INCLUDE b:\include\c_stuff

LIST FILES
List files have already been covered under section 6 dealing with CRS8.

The format for invoking the 8088 assembler is,

TASM sourceasmfile, objfilename, listfilename

or the /l option can be specified on the command line.

The following 8088 assembler directives can disable and enable the output listing.

%NOLIST
%LIST

Consider the following 8088 assembly language program.

TITLE Doscall ;Doscall.asm source file


.MODEL SMALL
CR equ 0ah
LF equ 0dh
EOSTR equ '$'
.stack 200h
.data
message db 'Hello and welcome.'
db CR, LF, EOSTR
.code
print proc near
mov ah,9h ;PCDOS print function
int 21h
ret
print endp

start: mov ax, @data


mov ds, ax
mov dx, offset message
call print
mov ax, 4c00h
int 21h
end start

When assembled with the following command line options,

$TASM /l /n Doscall;

It generates a listing file. The list file for the program looks like,

Turbo Assembler Version 1.0 21-05-89 13:27:31


Page 1
DOSCALL.ASM
Doscall
1 0000 .MODEL SMALL
2
3 = 000A CR equ 0ah
4 = 000D LF equ 0dh
5 = 0024 EOSTR equ '$'
6
7 0000 .stack 200h
8
9 0000 .data
10 0000 48 65 6C 6C 6F 20 61 + message db 'Hello and welcome.'
11 6E 64 20 77 65 6C 63 +
12 6F 6D 65 2E
13 0012 0A 0D 24 db CR, LF, EOSTR
14
15 0015 .code
16 0000 print proc near
17 0000 B4 09 mov ah,9h ;PCDOS print function
18 0002 CD 21 int 21h
19 0004 C3 ret
20 0005 print endp
21
22 0005 B8 0000s start: mov ax, @data
23 0008 8E D8 mov ds, ax
24 000A BA 0000r mov dx, offset message
25 000D E8 FFF0 call print
26 0010 B8 4C00 mov ax, 4c00h
27 0013 CD 21 int 21h
28
29 end start

The s on line 22 indicates a segment value which is filled in by the DOS loader
when the program is loaded into memory. The r on line 24 indicates a relocatable offset
which is resolved by the linker.
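
Line 25 of the listing (E8 FFF0) is also worth a second look: a near call stores a 16-bit displacement measured from the instruction that follows the call, not the absolute address of print. The Python sketch below reproduces the value from the addresses shown in the listing; the function name is invented for illustration.

```python
def near_call_displacement(call_address, target_address):
    """16-bit displacement stored in a near CALL (opcode E8 plus two bytes)."""
    next_instruction = call_address + 3        # opcode byte + 2 displacement bytes
    return (target_address - next_instruction) & 0xFFFF


# The call sits at 000D and print is at 0000, as in the listing.
print("%04X" % near_call_displacement(0x000D, 0x0000))
```

Because the displacement is relative, the instruction needs no fix-up when the code segment is loaded at a different address.
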

SYMBOLIC INFORMATION
Symbolic information is useful in determining the size and location of variables,
segments etc. This information is used when debugging the program or locating the
program in EPROM.

The 8088 assembler options available are,

/c cross-reference in list file
/l listfile generated
/n suppress symbol table in list file
/zd line numbers in object code
/zi debug info in object code for debugger

When the previous program Doscall.asm is assembled with a list file and symbol plus
cross-referencing, the additional information appended to the list file is,

Turbo Assembler Version 1.0 21-05-89 13:35:03


Page 2
Symbol Table
Symbol Name Type Value Cref defined at #
??DATE Text "21-05-89"
??FILENAME Text "DOSCALL "
??TIME Text "13:35:03"
??VERSION Number 0100
@CODE Text _TEXT #1 #15
@CODESIZE Text 0 #1
@CPU Text 0101H
@CURSEG Text _TEXT #9 #15
@DATA Text DGROUP #1 22
@DATASIZE Text 0 #1
@FILENAME Text DOSCALL
@WORDSIZE Text 2 #9 #15
CR Number 000A #3 13
EOSTR Number 0024 #5 13
LF Number 000D #4 13
MESSAGE Byte DGROUP:0000 #10 24
PRINT Near _TEXT:0000 #16 25
START Near _TEXT:0005 #22 29
Groups & Segments Bit Size Align Combine Class Cref defined at #
DGROUP Group #1 1 22
STACK 16 0200 Para Stack STACK #7
_DATA 16 0015 Word Public DATA #1 #9
_TEXT 16 0015 Word Public CODE #1 1 #15 15

This also shows the lines on which variables and labels were defined and referenced.

PROGRAM MANAGEMENT TOOLS


These tools are designed to make the process of maintaining programs easier.

MAKE
This utility is designed to ease updating of programs, especially multiple module
programs.

It works by using a list of dependencies. These dependencies illustrate the relationship
between the source, include, object and executable versions of the program.

The dependencies are stored in a file called makefile.

Consider a program which has the following dependencies.

MYDBASE.EXE comprises the modules


start.obj
search.obj
fileio.obj
keybdio.obj
videoio.obj

Each object file is generated from an assembler source file of the same name.

The command sequence to create the executable program is,

tasm start
tasm search
tasm fileio
tasm keybdio
tasm videoio
tlink start search fileio keybdio videoio, mydbase;

The dependencies and command sequences required are entered into the makefile as
follows.

mydbase.exe: start.obj search.obj fileio.obj keybdio.obj videoio.obj
    tlink start search fileio keybdio videoio, mydbase;
start.obj: start.asm
    tasm start
search.obj: search.asm
    tasm search
fileio.obj: fileio.asm
    tasm fileio
keybdio.obj: keybdio.asm
    tasm keybdio
videoio.obj: videoio.asm
    tasm videoio

The program is assembled and linked by typing

make

It works by comparing date and time stamps of the files in each dependency list. Consider
the lines

keybdio.obj: keybdio.asm
tasm keybdio

It compares the date/time stamp of keybdio.asm against keybdio.obj. If the object file is
newer than the assembly file, it will not re-assemble.

If the assembler file has a newer date/time stamp, it will execute the command tasm
keybdio to generate a new object file.

The use of make files simplifies the re-assembly by only assembling those files which
have been modified.
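
The decision make takes for each dependency line can be sketched in a few lines of Python. This is a simplified model, not the real utility: the time stamps are made-up integers held in a dictionary, and the function name needs_rebuild is hypothetical.

```python
def needs_rebuild(target, sources, mtimes):
    """True if target is missing or older than any file it depends on."""
    if target not in mtimes:
        return True
    return any(mtimes[src] > mtimes[target] for src in sources)


# Hypothetical time stamps (larger = more recent).
mtimes = {
    "keybdio.asm": 200, "keybdio.obj": 150,   # source edited after last assembly
    "fileio.asm": 100, "fileio.obj": 120,     # object file is up to date
}

for obj, asm in [("keybdio.obj", "keybdio.asm"), ("fileio.obj", "fileio.asm")]:
    if needs_rebuild(obj, [asm], mtimes):
        print("tasm", asm[:-4])               # only keybdio is re-assembled
```
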

SOURCE CODE REVISION SYSTEMS


Source code revision systems are used to keep track of different versions of a program. They
keep a record of all the changes made to the program.

Previous revisions can be extracted from the database, and a printout detailing the
changes (time, who, line#) can be obtained.

LIBRARY MAINTENANCE
This applies to the maintenance of OBJECT code libraries.

An Object code library contains routines which can be reused in any program. The code
for the routine is extracted from the library and combined with the user's object code at
linking time.

Users can create their own library routines. The source files are assembled into object
code then added to a library.

The following code represents a routine for placement into a Video routines library.

TITLE SetCur
.MODEL Large
.CODE
PUBLIC setcur

setcur proc far ; set cursor to position in DX register
mov ah, 2 ; dh = y co-ordinate, dl = x co-ordinate
xor bh, bh
int 10h
ret
setcur endp
END

After assembling into Object code, the object code is placed into a video library using the
TLIB utility.

TLIB video +setcur.obj

The following source file shows how to use the code in a library.

TITLE Libdemo
.MODEL Large
.STACK 200h
.CODE
EXTRN setcur:far
start: mov dx, 0 ; cursor 0,0
call setcur
mov ax, 4c00h
int 21h
END start

After assembling the file Libdemo.asm, the command to link the object and library code
together is,

TLINK Libdemo,libdemo.exe,libdemo.map, video

LINKERS
The assembler for 8088 PCDOS programs generates object code files. These cannot be
executed directly on the computer system, but require further processing in order to
generate a runfile. This further process is called the linking phase.

Functions performed by a linker include:

• combines object modules together
• combines segments of the same type together
• resolves addresses unknown at assembly time
• allocates storage
• generates symbolic information
• generates a load module

8088 LINKER OPTIONS


The following options are used to obtain information which is helpful in debugging
programs, or to generate code for 386 processors.

/m add public symbols
/x no map file
/s map file with segments, public symbols and start address
/t generate .COM file
/v add debug info
/3 386 code

The Map File Facility


If the linker is requested to generate a map file, it will list the names, load addresses, and
lengths of all segments in a program. It also lists the names and load addresses of any
groups in the program, the start address, and messages about any errors the linker may
have encountered.

The map file generated by the linker for the program DOSCALL.ASM is,

Start Stop Length Name Class


00000H 00014H 00015H _TEXT CODE
00016H 0002AH 00015H _DATA DATA
00030H 0022FH 00200H STACK STACK
Address Publics by Name
Address Publics by Value
Program entry point at 0000:0005

DEFINITION OF LINKING TERMS

• Relocatable/Relative
The code generated by the linker is all relative to the location counter. This means
that all references to memory are relative to a base/index register, segment register
or program counter. This allows the operating system to load the program
anywhere in physical memory.

Relocatable code is a must for multi-user and multi-tasking operating systems.

The program is preceded by a header, which the operating system's loader uses
to perform relocation.

• Absolute/Fixed
If the linker generates code which is absolute, all memory references are to
absolute addresses, thus the program must reside in a designated memory space. If
this space is unavailable, the program cannot be run and must wait.

Absolute code is normally used on small single processor systems (eg, CP/M), and
is not suitable for multi-user environments.

Absolute code does not contain a header used for relocation. If a header
exists, it will specify the absolute load address of the code which follows it.

• Common
Variables, labels or symbols may be designated as common. In this way, they are
made accessible to those modules which wish to reference them by way of calls or
data usage. The common data is shared by the various modules.

The linker combines multiple definitions into a single overlaid segment.

• External/Public
Public symbols are defined in one module but referenced from another.

The Public directive makes the variable, label or symbol in the current segment
available to all other modules. It thus transforms locally defined symbols into
global symbols.

The Extern directive makes a global symbol's name and type known in a source
file so that it may be used/referenced in that file. An extern item is a variable,
label or symbol that has been declared using the public directive in another
module of the program.

Example of program using public/extern directives:

Main Module
NAME main
.MODEL small
PUBLIC exit ;defines exit as being known to other modules
EXTERN print:near ;defines print as existing in another module
.STACK 100h
.DATA
.CODE
start: mov ax, @data ; Load segment location
mov ds, ax ; into DS register
jmp print ; goto PRINT in other module
exit: mov ax, 4C00h ; call terminate function
int 21h
END start

Task Module
NAME task
.MODEL small
PUBLIC print ;defines print as public so it can
;be used by the calling module
EXTERN exit:near ;defines exit as existing in another module
; outside this one
.DATA
string DB "Hello",13,10,"$"
.CODE
print: mov dx, OFFSET string ;Load location of string
mov ah, 09h ;call string display function
int 21h
jmp exit ;go back to main module
END

In this example, the symbol exit is declared public in the main module so that it
can be accessed from another source module (task).

The main module also contains an external declaration of the symbol print. This
declaration defines print to be a near label so that it can be accessed from the
module main, even though it is assumed to be located and declared public in
another source module.

A jmp instruction later in main has the label print as its destination.

The symbol print is declared public in the task module so that it may be accessed
from another module (main).

The symbol exit is defined as a near label so that it can be accessed from this
module, even though it is assumed to be located and declared public in the other
module.

Before this program can be executed, the two source files (one containing main,
the other task) must be assembled individually, then linked together using a
linker.

The symbol listing for each source file shows the segment allocations.

MAIN.ASM Symbol Table


Symbol Name Type Value
??DATE Text "21-05-89"
??FILENAME Text "MAIN "
??TIME Text "14:20:27"
??VERSION Number 0100
@CODE Text _TEXT
@CODESIZE Text 0
@CPU Text 0101H
@CURSEG Text _TEXT
@DATA Text DGROUP
@DATASIZE Text 0
@FILENAME Text MAIN
@WORDSIZE Text 2
EXIT Near _TEXT:0008
PRINT Near ----:---- Extern
START Near _TEXT:0000
Groups & Segments Bit Size Align Combine Class
DGROUP Group
STACK 16 0100 Para Stack STACK
_DATA 16 0000 Word Public DATA
_TEXT 16 000D Word Public CODE

TASK.ASM Symbol Table


Symbol Name Type Value
??DATE Text "21-05-89"
??FILENAME Text "TASK "
??TIME Text "14:20:14"
??VERSION Number 0100
@CODE Text _TEXT
@CODESIZE Text 0
@CPU Text 0101H
@CURSEG Text _TEXT
@DATA Text DGROUP
@DATASIZE Text 0
@FILENAME Text TASK
@WORDSIZE Text 2
EXIT Near ----:---- Extern
PRINT Near _TEXT:0000
STRING Byte DGROUP:0000
Groups & Segments Bit Size Align Combine Class
DGROUP Group
_DATA 16 0008 Word Public DATA
_TEXT 16 000A Word Public CODE

The map listing from the linker clearly shows how these segments have been
combined.

MAIN.MAP (Output from Linker)


Start Stop Length Name Class
00000H 00017H 00018H _TEXT CODE
00018H 0001FH 00008H _DATA DATA
00020H 0011FH 00100H STACK STACK

Detailed map of segments


0000:0000 000D C=CODE S=_TEXT G=(none) M=MAIN.ASM ACBP=48
0000:000E 000A C=CODE S=_TEXT G=(none) M=TASK.ASM ACBP=48
0001:0008 0000 C=DATA S=_DATA G=DGROUP M=MAIN.ASM ACBP=48
0001:0008 0008 C=DATA S=_DATA G=DGROUP M=TASK.ASM ACBP=48
0002:0000 0100 C=STACK S=STACK G=DGROUP M=MAIN.ASM ACBP=74

Address Publics by Name


0000:0008 EXIT
0000:000E PRINT

Address Publics by Value


0000:0008 EXIT
0000:000E PRINT

Program entry point at 0000:0000


SEGMENT DIRECTIVES
So far, 8088 programs have been implemented using the simplified segment directives

.CODE .STACK .DATA

This simplifies writing programs, but has several drawbacks.

• little control over segment placing and combining
• limited to three segments

The use of the segment directives provides the necessary controls for implementing large
multiple segment programs. The programmer can specify which segments should be
overlaid, combined, or stand alone.

Segment override prefixes may be applied to certain instructions.

mov ax, cs:[20h]

obtains data from the code segment rather than the data segment.

The format for declaring a segment is,

name SEGMENT align combine_type class


name ENDS

Align specifies whether the segment starts at a byte, word or paragraph (16 byte)
boundary. The default is paragraph.

Combine_type specifies whether the segment is PUBLIC, COMMON, MEMORY,
PRIVATE or STACK.

PUBLIC  The linker concatenates all segments with the same
        name to form a single contiguous segment. The length
        is the sum of all the segments.

COMMON  The linker locates all segments with the same name
        at the same address (overlaid on top of each
        other). The length becomes that of the longest segment.

MEMORY  Same as PUBLIC.

PRIVATE The linker does not combine this segment with any
        other segment.

STACK   The linker concatenates all segments with the same
        name to form a single contiguous segment. The
        length is the sum of all the segments. SS is
        initialised to the beginning of the segment, SP to
        the length of the segment.

Class controls the ordering of the segments at linking time. Segments with the same class
name are loaded together. A segment of class CODE would be loaded before a segment
of class STACK. The class name is enclosed using single or double quotes.

An example program follows.

TITLE Segdemo
stck segment para private 'STACK'
db 200h dup (?)
stck ends

data segment byte public 'DATA'


message db 'Hello there','$'
data ends

data2 segment byte public 'DATA2'


message2 db 'Segment Data2','$'
data2 ends

code segment para private 'CODE'


assume ds:data, ss:stck
start: mov ax, seg data
mov ds, ax
mov ah, 9
mov dx, offset message
int 21h

assume ds:data2
mov ax, seg data2
mov ds, ax
mov ah, 9
mov dx, offset message2
int 21h
mov ax, 4c00h
int 21h
code ends
end start

The map file for segdemo.exe is,

Start Stop Length Name Class


00000H 001FFH 00200H STCK STACK
00200H 0020BH 0000CH DATA DATA
0020CH 00219H 0000EH DATA2 DATA2
00220H 0023CH 0001DH CODE CODE

Detailed map of segments


0000:0000 0200 C=STACK S=STCK G=(none) M=SEGDEMO.ASM ACBP=60
0020:0000 000C C=DATA S=DATA G=(none) M=SEGDEMO.ASM ACBP=28
0020:000C 000E C=DATA2 S=DATA2 G=(none) M=SEGDEMO.ASM ACBP=28
0022:0000 001D C=CODE S=CODE G=(none) M=SEGDEMO.ASM ACBP=60

This clearly shows the ordering (class) and concatenation of segments which are the same
type.
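
The start addresses in the map follow from the align type of each segment. The Python sketch below places the four Segdemo segments in the order shown and rounds each start address up to its alignment boundary; it is a simplified model of one linker step, with the segment lengths taken from the map above and the function name chosen for illustration.

```python
ALIGN = {"byte": 1, "word": 2, "para": 16}


def place_segments(segments):
    """Lay segments out in order, rounding each start up to its alignment."""
    placed, address = [], 0
    for name, align, length in segments:
        step = ALIGN[align]
        start = (address + step - 1) // step * step   # round up to the boundary
        placed.append((name, start, start + length - 1, length))
        address = start + length
    return placed


segdemo = [("STCK", "para", 0x200), ("DATA", "byte", 0x0C),
           ("DATA2", "byte", 0x0E), ("CODE", "para", 0x1D)]

for name, start, stop, length in place_segments(segdemo):
    print("%05XH %05XH %05XH %s" % (start, stop, length, name))
```

DATA2 ends at 00219H, so the paragraph-aligned CODE segment is moved up to 00220H, leaving a gap of six unused bytes.
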

WHAT ARE TRANSLATORS

Translation is an activity comprising the interpretation of the meaning of a text in one language—
the source text—and the production of a new, equivalent text in another language—called the
target text, or the translation.
