
Wollega University Assembly Language Programming

Department of CS&IT Chapter – IV

CHAPTER IV
MACHINE LANGUAGE:
Machine language is the elemental language of computers, consisting of strings of 0s and
1s. Because machine language is the lowest-level computer language and the only language
that computers directly understand, a program written in a more sophisticated language
(e.g., C, Pascal) must be converted to machine language prior to execution. This is done
by a compiler or an assembler. The resulting binary file (also called an executable file)
can then be executed by the CPU.

Every computer has its own machine language, which is the only language understood by
the computer. Originally, programs were written in machine language. Today programs
are written in special programming languages, but these programs must be translated into
the machine language of the computer before they can be executed. Machine
language instructions are represented by binary numbers, i.e., sequences of
zeros and ones.
For example, 001010001110
could represent a 12-bit machine language instruction. This instruction is divided into
two parts, an operation code (or op code) and an operand, e.g.:
Op code 001, Operand 010001110

The op code specifies the operation (add, multiply, move, ...) and the operand is the address
of the data item that is to be operated on. Besides remembering the dozens of code
numbers for the operations, the programmer also has to keep track of the addresses of all
the data items. Thus programming in machine language is highly complicated and subject
to error. Also, the program is machine dependent: it runs only on the type of computer
for which it was written.
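
The op code/operand split described above can be illustrated with a short Python sketch. It assumes the 3-bit op code and 9-bit operand layout of the 12-bit example; the function name split_instruction is made up for this illustration and does not correspond to any real machine.

```python
def split_instruction(bits, opcode_width=3):
    """Split a binary instruction string into (op code, operand) fields.

    The 3-bit op code width matches the 12-bit example in the text;
    real machines use many different layouts.
    """
    if set(bits) - {"0", "1"}:
        raise ValueError("instruction must contain only 0s and 1s")
    return bits[:opcode_width], bits[opcode_width:]


opcode, operand = split_instruction("001010001110")
print(opcode, operand)                    # 001 010001110
print(int(opcode, 2), int(operand, 2))    # 1 142
```

Read as numbers, the instruction says "perform operation 1 on the data item at address 142", which is exactly the bookkeeping the machine language programmer has to do by hand.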

Machine code is extremely difficult for humans to read because it consists merely of
patterns of bits (i.e., zeros and ones). Thus, programmers who want to work at the
machine code level instead usually use assembly language, which is a human-readable
notation for the machine language in which the instructions represented by patterns of
zeros and ones are replaced with alphanumeric symbols (called mnemonics) in order to
make it easier to remember and work with them (including reducing the chances of
making errors). In contrast to high-level languages (e.g., C, C++, Java, Perl and Python),
there is a nearly one to one correspondence between a simple assembly language and its
corresponding machine language.

ASSEMBLY LANGUAGE:

Assembly language is essentially the native language of your computer. Technically
the processor of your machine understands machine code (consisting of ones and
zeroes). But in order to write such a machine code program, you first write it in
assembly language and then use an assembler to convert it to machine code.

Nooli.V.K.Rao
However, nothing is lost when the assembler does its conversion, since assembly
language simply consists of mnemonic codes which are easy to remember (they are
similar to words in the English language), and which stand for each of the different
machine code instructions that the machine is capable of executing.

Because it is extremely low level, assembly language can be optimized extremely
well. Therefore assembly language is used where the utmost performance is
required for applications.

Assembly language is also useful for communicating with the machine at a hardware
level. For this reason, it is often used for writing device drivers.

A third benefit of assembly language is the size of the resulting programs. Because
no conversion from a higher level by a compiler is required, the resulting programs
can be exceedingly small. For this reason, assembly language has been a language
of choice for the demo scene.

Typically a modern assembler creates object code by translating assembly instruction
mnemonics into opcodes, and by resolving symbolic names for memory locations and
other entities.[1] The use of symbolic references is a key feature of assemblers, saving
tedious calculations and manual address updates after program modifications. Most
assemblers also include macro facilities for performing textual substitution, e.g., to
generate common short sequences of instructions inline instead of as called subroutines.

Assemblers are generally simpler to write than compilers for high-level languages, and
have been available since the 1950s. Modern assemblers, especially for RISC
architectures such as SPARC or POWER, as well as for x86 and x86-64, optimize
instruction scheduling to exploit the CPU pipeline efficiently.

Number of passes

There are two types of assemblers based on how many passes through the source are
needed to produce the executable program.

• One-pass assemblers go through the source code once and assume that all
symbols will be defined before any instruction that references them.
• Two-pass assemblers create a table with all symbols and their values in the first
pass, then use the table in a second pass to generate code. The assembler must at
least be able to determine the length of each instruction on the first pass so that
the addresses of symbols can be calculated.

The advantage of a one-pass assembler is speed, which is not as important as it once was
with advances in computer speed and abilities. The advantage of the two-pass assembler
is that symbols can be defined anywhere in program source code. This lets programs be
defined in more logical and meaningful ways, making two-pass assembler programs
easier to read and maintain.[2]
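
A minimal Python sketch of the two-pass scheme follows. It assumes a hypothetical toy machine in which every instruction is one word long and the three opcodes are invented for illustration; it is not modelled on any real assembler. The point is that the forward reference to the label end is resolved only because pass 1 has already recorded every label's address.

```python
# Hypothetical toy machine: every instruction occupies one word, and the
# opcode numbers below are invented purely for illustration.
OPCODES = {"NOP": 0, "JMP": 1, "HALT": 2}


def assemble(lines):
    # Pass 1: walk the source once, recording the address of every label.
    symbols = {}
    address = 0
    for line in lines:
        line = line.strip()
        if line.endswith(":"):
            symbols[line[:-1]] = address
        elif line:
            address += 1  # instruction length must be known in pass 1

    # Pass 2: generate code, looking label operands up in the symbol table.
    code = []
    for line in lines:
        line = line.strip()
        if not line or line.endswith(":"):
            continue
        parts = line.split()
        operand = symbols[parts[1]] if len(parts) > 1 else 0
        code.append((OPCODES[parts[0]], operand))
    return symbols, code


source = ["start:", "JMP end", "NOP", "end:", "HALT"]
symbols, code = assemble(source)
print(symbols)  # {'start': 0, 'end': 2}
print(code)     # [(1, 2), (0, 0), (2, 0)]
```

A one-pass assembler processing the same source would reach JMP end before it had seen the definition of end, which is why it must assume symbols are defined before use.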


High-level assemblers

More sophisticated high-level assemblers provide language abstractions such as:

• Advanced control structures
• High-level procedure/function declarations and invocations
• High-level abstract data types, including structures/records, unions, classes, and
sets
• Sophisticated macro processing (although available on ordinary assemblers since
the late 1950s for the IBM 700 series and since the 1960s for the IBM/360, amongst other
machines)
• Object-oriented programming features such as classes, objects, abstraction,
polymorphism, and inheritance

A program written in assembly language consists of a series of mnemonic statements and
meta-statements (known variously as directives, pseudo-instructions and pseudo-ops),
comments and data. These are translated by an assembler to a stream of executable
instructions that can be loaded into memory and executed. Assemblers can also be used
to produce blocks of data, from formatted and commented source code, to be used by
other code.

Take, for example, the instruction that tells an x86/IA-32 processor to move an immediate
8-bit value into a register. The binary code for this instruction is 10110 followed by a
3-bit identifier for which register to use. The identifier for the AL register is 000, so the
following machine code loads the AL register with the data 01100001.[4]

10110000 01100001

This binary computer code can be made more human-readable by expressing it in
hexadecimal as follows:

B0 61

Here, B0 means 'Move a copy of the following value into AL', and 61 is a hexadecimal
representation of the value 01100001, which is 97 in decimal. Intel assembly language
provides the mnemonic MOV (an abbreviation of move) for instructions such as this, so
the machine code above can be written as follows in assembly language, complete with
an explanatory comment if required, after the semicolon. This is much easier to read and
to remember.

MOV AL, 61h ; Load AL with 97 decimal (61 hex)

At one time, many assembly language mnemonics were three letter abbreviations, such as
JMP for jump, INC for increment, etc. Modern processors have a much larger instruction
set and many mnemonics are now longer, for example FPATAN for "floating point
partial arctangent" and BOUND for "check array index against bounds". Many assembly
language statements consist of an opcode mnemonic followed by a comma-separated list
of data, arguments or parameters.[5]

The same mnemonic MOV refers to a family of related opcodes to do with loading,
copying and moving data, whether these are immediate values, values in registers, or
memory locations pointed to by values in registers. The opcode 10110000 (B0) copies an
8-bit value into the AL register, while 10110001 (B1) moves it into CL and 10110010 (B2)
does so into DL. Assembly language examples for these follow.[4]

MOV AL, 1h ; Load AL with immediate value 1
MOV CL, 2h ; Load CL with immediate value 2
MOV DL, 3h ; Load DL with immediate value 3
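
The encoding rule described above (the bit pattern 10110 followed by a 3-bit register identifier, then the immediate byte) can be sketched in Python. The identifiers for AL, CL and DL come from the text; the remaining five entries are the standard x86 8-bit register numbers, and the function name encode_mov_r8_imm8 is made up for this illustration.

```python
# 3-bit identifiers of the 8-bit registers in the x86 encoding.
REG8 = {"AL": 0, "CL": 1, "DL": 2, "BL": 3, "AH": 4, "CH": 5, "DH": 6, "BH": 7}


def encode_mov_r8_imm8(reg, value):
    """Return the two machine-code bytes for MOV reg8, imm8."""
    if not 0 <= value <= 0xFF:
        raise ValueError("immediate must fit in 8 bits")
    opcode = 0b10110000 | REG8[reg.upper()]  # 10110 followed by the register id
    return bytes([opcode, value])


for reg, value in [("AL", 0x61), ("CL", 2), ("DL", 3)]:
    print(reg, encode_mov_r8_imm8(reg, value).hex(" ").upper())
# AL B0 61
# CL B1 02
# DL B2 03
```

This is the translation an assembler performs when it sees MOV AL, 61h: one mnemonic selects among a family of opcodes according to its operands.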

Language design

Basic elements

There is a large degree of diversity in the way the authors of assemblers categorize
statements and in the nomenclature that they use. In particular, some describe anything
other than a machine mnemonic or extended mnemonic as a pseudo-operation (pseudo-
op). A typical assembly language consists of 3 types of instruction statements that are
used to define program operations:

• Opcode mnemonics
• Data sections
• Assembly directives

Opcode mnemonics and extended mnemonics

Instructions (statements) in assembly language are generally very simple, unlike those in
high-level language. Generally, a mnemonic is a symbolic name for a single executable
machine language instruction (an opcode), and there is at least one opcode mnemonic
defined for each machine language instruction. Each instruction typically consists of an
operation or opcode plus zero or more operands. Most instructions refer to a single value,
or a pair of values. Operands can be immediate (typically one byte values, coded in the
instruction itself), registers specified in the instruction, implied or the addresses of data
located elsewhere in storage. Which of these forms are available is determined by the
underlying processor architecture.

Data sections

There are instructions used to define data elements to hold data and variables. They
define the type of data, the length and the alignment of data. These instructions can also
define whether the data is available to outside programs (programs assembled separately)
or only to the program in which the data section is defined. Some assemblers classify
these as pseudo-ops.


Assembly directives

Assembly directives, also called pseudo opcodes, pseudo-operations or pseudo-ops, are
instructions that are executed by an assembler at assembly time, not by a CPU at run
time. They can make the assembly of the program dependent on parameters input by a
programmer, so that one program can be assembled different ways, perhaps for different
applications. They also can be used to manipulate presentation of a program to make it
easier to read and maintain.

Macros

Many assemblers support predefined macros, and others support programmer-defined
(and repeatedly re-definable) macros involving sequences of text lines in which variables
and constants are embedded. This sequence of text lines may include opcodes or
directives. Once a macro has been defined its name may be used in place of a mnemonic.
When the assembler processes such a statement, it replaces the statement with the text
lines associated with that macro, then processes them as if they existed in the source code
file (including, in some assemblers, expansion of any macros existing in the replacement
text).

Since macros can have 'short' names but expand to several or indeed many lines of code,
they can be used to make assembly language programs appear to be far shorter, requiring
fewer lines of source code, as with higher level languages. They can also be used to add
higher levels of structure to assembly programs, optionally introduce embedded
debugging code via parameters and other similar features.
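
Macro expansion as textual substitution can be sketched in Python as follows. The %a/%b parameter markers, the SWAP macro and the expand function are all invented for this illustration; real assemblers each have their own macro syntax.

```python
def expand(lines, macros):
    """Replace each macro invocation with its body, substituting arguments.

    macros maps a name to (parameter markers, body lines). The marker
    syntax (%a, %b) is hypothetical, chosen only for this sketch.
    """
    out = []
    for line in lines:
        name, _, rest = line.strip().partition(" ")
        if name in macros:
            params, body = macros[name]
            args = [a.strip() for a in rest.split(",")] if rest else []
            for body_line in body:
                for param, arg in zip(params, args):
                    body_line = body_line.replace(param, arg)
                # Recurse so macros inside the replacement text expand too.
                out.extend(expand([body_line], macros))
        else:
            out.append(line.strip())
    return out


macros = {"SWAP": (["%a", "%b"], ["push %a", "push %b", "pop %a", "pop %b"])}
print(expand(["mov ax, 1", "SWAP ax, bx"], macros))
# ['mov ax, 1', 'push ax', 'push bx', 'pop ax', 'pop bx']
```

One source line (SWAP ax, bx) becomes four instructions, which is how macros make a program appear shorter without changing the code that is finally assembled.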

Instruction operands

1. Indirect Memory Operands

• Like direct memory operands, indirect memory operands specify the contents of
a given address.
• However, the processor calculates the address at run time by referring to the
contents of registers.
• Since values in the registers can change at run time, indirect memory operands
provide dynamic access to memory.
• Indirect memory operands make possible run-time operations such as pointer
indirection and dynamic indexing of array elements, including indexing of
multidimensional arrays.
• For example, the following instruction moves into AX the word value found at the
address in DS:BX.

    mov ax, WORD PTR [bx]

where
o WORD specifies the data size
o PTR recasts the memory location pointed to by [BX] as a WORD-sized
value.
• When you specify more than one register, the processor adds the contents of the
two registers together to determine the effective address (the address of the data
to operate on):

    mov ax, [bx+si]

2. Address Displacements

• An address displacement is a constant value that is added to the register contents
when the effective address is computed.
• A direct memory specifier is the most common type of displacement:

    table WORD 100 DUP (0)
    .
    .
    .
    mov ax, table[ esi ]

• In the relocatable expression table[ esi ], the displacement is table, providing the
base address of an array.
• ESI holds an index to an array element (a byte offset from the start of the array,
since no scaling factor is used here). The ESI value is calculated at run time,
often in a loop.

3. Multiple Address Displacements

• Each displacement can be an address or numeric constant.
• If there is more than one displacement, the assembler totals them at assembly time
and encodes the total displacement.
• For example, in the statement

    table WORD 100 DUP (0)
    .
    .
    .
    mov ax, table[bx][di]+6

both table and 6 are displacements.
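
The division of labour between assembler and processor can be sketched in Python. This is an illustration only: the offset 0200h for table and the register values are assumed, and the function name effective_address is made up. The displacements are totalled once (as the assembler would do), the register contents are added at run time, and the result wraps to the 16-bit offset width used by BX and DI.

```python
def effective_address(displacements, registers, width=16):
    """Effective address = total displacement + register contents."""
    total_disp = sum(displacements)     # totalled once, at assembly time
    ea = total_disp + sum(registers)    # added by the processor at run time
    return ea & ((1 << width) - 1)      # wrap to the address width


TABLE = 0x0200            # hypothetical offset of table in the data segment
bx, di = 0x0010, 0x0004   # hypothetical run-time register contents

# mov ax, table[bx][di]+6
print(hex(effective_address([TABLE, 6], [bx, di])))  # 0x21a
```

Only the single total 0206h is encoded in the instruction; BX and DI can change on every execution without the instruction itself changing.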

4. Specifying Operand Size

• You must give the size of an indirect memory operand in one of three ways:
o By the variable's declared size
o With the PTR operator (which acts like a C-style cast)
o Implied by the size of the other operand.
• The following lines illustrate all three methods, assuming the size of the table
array is WORD, as declared earlier.

    table WORD 100 DUP (0)
    .
    .
    .
    mov table[bx], 0        ; 2 bytes - from size of table
    mov BYTE PTR table, 0   ; 1 byte - specified by BYTE
    mov ax, [bx]            ; 2 bytes - implied by AX

5. Indirect Syntax Options

• The assembler allows a variety of syntaxes for indirect memory operands.
• However, all registers must be inside brackets.
• Each register can be enclosed in its own pair of brackets, or in the same pair of
brackets separated by a plus operator (+).
• The following variations are legal and assemble the same way:

    mov ax, table[bx][di]
    mov ax, table[di][bx]
    mov ax, table[bx+di]
    mov ax, [table+bx+di]
    mov ax, [bx][di]+table

• All of these statements move the value in table indexed by BX+DI into AX.

6. Scaling Factors

• You can use scaling to index into arrays with different sizes of elements.
• For example, the scaling factor is 1 for byte arrays (no scaling needed), 2 for word
arrays, 4 for doubleword arrays, and 8 for quadword arrays.
• There is no performance penalty for using a scaling factor.
• Scaling is illustrated in the following examples:

    mov eax, darray[edx*4]        ; Load double of double array
    mov eax, [esi*8][edi]         ; Load double of quad array
    mov ax, wtbl[ecx+2][edx*2]    ; Load word of word array
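
The arithmetic behind a scaled operand can be sketched in Python as base + index * scale + displacement. The array addresses and register values below are assumed for illustration, and scaled_address is a made-up helper name.

```python
def scaled_address(base=0, index=0, scale=1, disp=0):
    """32-bit effective address: base + index * scale + displacement."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4, or 8")
    return (base + index * scale + disp) & 0xFFFFFFFF


DARRAY = 0x1000   # hypothetical address of darray (doubleword elements)
WTBL = 0x2000     # hypothetical address of wtbl (word elements)

# mov eax, darray[edx*4] with EDX = 3: element 3 of a doubleword array
print(hex(scaled_address(index=3, scale=4, disp=DARRAY)))              # 0x100c

# mov ax, wtbl[ecx+2][edx*2] with ECX = 10h and EDX = 5
print(hex(scaled_address(base=0x10, index=5, scale=2, disp=WTBL + 2)))  # 0x201c
```

With scaling, the index register can hold an element number rather than a byte offset, so the same loop counter works for arrays of any element size.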

7. Relation of Base Registers to Memory Segments

• In indirect memory addressing the base register identifies which segment register
will be used to calculate the actual memory location.
• Therefore, we need to understand the rules that define which register is the base
register in indirect memory addressing mode.
• The default segment register is SS if the base register is EBP or ESP.
• However, if EBP is scaled, the processor treats it as an index register with a value
relative to DS, not SS.
• All other base registers are relative to DS.
• If two registers are used, only one can have a scaling factor.
• The register with the scaling factor is defined as the index register.
• The other register is defined as the base register.
• If scaling is not used, the first register is the base.
• If only one register is used, it is considered the base for deciding the default
segment, unless it is scaled.
• The following examples illustrate how to determine the base register:

    mov eax, [edx][ebp*4]   ; EDX base (not scaled - seg DS)
    mov eax, [edx*1][ebp]   ; EBP base (not scaled - seg SS)
    mov eax, [edx][ebp]     ; EDX base (first - seg DS)
    mov eax, [ebp][edx]     ; EBP base (first - seg SS)
    mov eax, [ebp]          ; EBP base (only - seg SS)
    mov eax, [ebp*2]        ; EBP*2 index (seg DS)
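
The rules above can be collected into a small Python function that reproduces the six examples. This is only a sketch of the decision procedure as stated in the list; the representation of an operand as (register, scale) pairs and the name default_segment are assumptions made for the illustration.

```python
def default_segment(terms):
    """Pick base register, index register and default segment.

    terms lists the registers in source order as (register, scale) pairs,
    with scale set to None when no scaling factor is written.
    """
    scaled = [reg for reg, scale in terms if scale is not None]
    unscaled = [reg for reg, scale in terms if scale is None]
    if len(scaled) > 1:
        raise ValueError("only one register can have a scaling factor")
    index = scaled[0] if scaled else None      # a scaled register is the index
    base = unscaled[0] if unscaled else None   # first unscaled register is the base
    if index is None and len(unscaled) == 2:
        index = unscaled[1]
    segment = "SS" if base in ("EBP", "ESP") else "DS"
    return base, index, segment


print(default_segment([("EDX", None), ("EBP", 4)]))     # ('EDX', 'EBP', 'DS')
print(default_segment([("EDX", 1), ("EBP", None)]))     # ('EBP', 'EDX', 'SS')
print(default_segment([("EBP", None), ("EDX", None)]))  # ('EBP', 'EDX', 'SS')
print(default_segment([("EBP", 2)]))                    # (None, 'EBP', 'DS')
```

Note how writing edx*1 changes nothing about the address computed but forces EDX to be the index, which makes EBP the base and SS the default segment.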

8. Addressing Instruction Operands Summary

• Immediate Mode (memory is not accessed) - operand is part of the instruction.
For example, a constant encoded in the instruction:

    mov eax,567
    mov ah, 09h
    mov dx, offset Prompt

• Register Addressing (memory is not accessed) - operand contained in register:

    add ax, bx

• Direct Mode (memory accessed once) - operand field of instruction contains
address of the operand:

    value dword 0
    ..
    add eax, value      ; Either notation does the
    add eax, [value]    ; same thing

o The assembly code

    tbl DW 20 DUP (0)
    ..
    mov [tbl], 56

is equivalent to the C statement

    tbl[ 0 ] = 56; // C code

• Register indirect addressing (also called indirect addressing mode) is often used for
addressing data arrays inside programming loops:
o The effective address of the operand is contained in a register.
o For 32-bit addressing, any 32-bit general-purpose register can be used.
o For 16-bit addressing, the offset value can be in one of the three registers
BX, SI, or DI (BP can also be used, but it defaults to the stack segment SS):

    mov bx, offset Table    ; Load address
    add ax, [bx]            ; Register indirect addressing

o Square brackets [ BX ] indicate that BX is holding a memory offset.
o Operand [ BX ] serves as a pointer to data in memory.
o Register indirect can be used to implement arrays. For example, to sum an
array of word-length integers:

    mov cx, size            ; CX <- number of words in Table
    mov bx, offset Table    ; BX <- address of Table
    xor ax, ax              ; zero out Sum
Loop1:
    add ax, [bx]
    inc bx                  ; each word is 2 bytes long, so
    inc bx                  ; need to increment BX twice!
    loop Loop1
• Indexing: constant base + register.
o Fixed Base (address) + Variable Register Offset (operand field contains a
constant base)
o Effective address is obtained by adding value of operand field to contents
of register.
o This is known as array type addressing, also called displacement
addressing.

    mov eax, [ ebx + 5 ]
    mov eax, [ ebx + esi + 5 ]

o In 16-bit addressing there are restrictions on the combinations of registers
allowed within the brackets: you can have SI or DI, but not both, and you can
have BX or BP, but not both. In 32-bit addressing any two general-purpose
registers can be combined, except that ESP cannot be the index register.
o The following instructions are equivalent:

    add ax, Table[ bx ]
    add ax, [ Table + bx ]
    add ax, Table + [ bx ]
    add ax, [ bx ] + Table
• Indexing With Scaling: Base + Register Offset * Scaling Factor
o Operand field contains base address.
o Useful for array calculations where size of component is multiple bytes
long.
• Stack Addressing: PUSH and POP, a variant of register indirect with auto-
increment/decrement using the ESP register implicitly.
• Jump relative addressing, EIP + offset
o Operand field contains a displacement.
o Used by near and short jump instructions on the Intel 80x86
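
Jump relative addressing can be sketched in Python as follows. The sketch assumes an 8-bit displacement (a short jump) and relies on the fact that, when the jump executes, EIP already points at the instruction following the jump; the addresses used and the name jump_target are made up for the illustration.

```python
def jump_target(next_ip, displacement, disp_bits=8, addr_bits=32):
    """Target = address of the next instruction + sign-extended displacement."""
    sign = 1 << (disp_bits - 1)
    # Interpret the raw displacement field as a two's-complement number.
    disp = (displacement & (sign - 1)) - (displacement & sign)
    return (next_ip + disp) & ((1 << addr_bits) - 1)


# A 2-byte short jump at hypothetical address 1000h: the next instruction
# is at 1002h, so that is the value EIP holds when the offset is added.
print(hex(jump_target(0x1002, 0x05)))  # 0x1007 (forward jump)
print(hex(jump_target(0x1002, 0xFE)))  # 0x1000 (displacement -2: jumps to itself)
```

Because the operand is a distance rather than an absolute address, the same machine code works wherever the program is loaded in memory.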
