
Reader's Note

The entire scope of this book is restricted to those computer systems
which are commonly referred to as "PC-Compatible" or "DOS-compatible".
Transferable skills may be acquired by the reader, but practical application
of the program examples in this book is intended solely for machines which
run the MS-DOS or PC-DOS operating systems and which are processor-compatible
with the 80x86 family of microprocessors.



INTRODUCTION



The Computer Program - The idea defined
"PC-compatible" Systems - The underlying system resources
The Idea of Programming - Data and Instruction Sequences
Structural Algorithmics - Computational Time and Space

THE HIGHBROW'S GUIDE TO WHAT'S THERE
Program types: Batch, Source, and Executable
DOS Command Processor
Text and Binary Files
BIOS (Basic Input/Output System)
DOS (Disk Operating System)
"DOS commands" vs. DOS/BIOS Resources
Requesting System Services
Logical vs. Physical Devices
Hardware, Peripheral Components, Device Registers, and I/O Ports
Interrupts -- Hardware and Software -- and the BIOS Subroutines
The Meaning of Compatibility
Atomistic Binary Elements
Simple PASCAL Data Types
Complex Data Structures
Memory as a Linear Array of Storage Compartments
Labels and Locations
Type Identifiers
Structure vs. Instruction
Procedures, Algorithms, and Structure Diagrams
Subdivisions of Programs and Segmentation of Memory

A "computer program" can be a "batch program", an "executable program", or a
"source program". Of these three types of programs, only the executable program
exists as machine language instructions which the computer's microprocessor can
execute. Both the batch program and source program exist as sequences of
statements in languages which are comprehensible to human beings but which
require translation into machine language in order to be executed.
The batch program is stored on disk in a text file. When it is invoked by
entering the name of that file on the DOS command line, the file is read by
the DOS command processor program, "COMMAND.COM", which processes one line of
text at a time, beginning with the first line. Such text files are often
called "batch files". Each line of a
batch file may invoke an executable program or a batch program, or it may
instruct COMMAND.COM to perform one of the "internal" DOS commands which are
"built into" COMMAND.COM; but in any event, a batch program requires a command
processor to execute its instructions.
The source program is nearly always stored in a disk text file, but it is
never executed directly, although in certain languages, such as BASIC, the
source statements are interpreted at run time, each being translated into
machine language just before the microprocessor executes it. Most languages,
such as "C" or "PASCAL", are not interpreted each time the program is invoked
but are instead translated into machine language by a program which "compiles"
all of the source program statements in the text file, resulting in an
executable program which can be stored on disk in a binary file. When the
executable program is then invoked by entering the name of the binary file on
the DOS command line, the DOS command processor, COMMAND.COM, then reads the
binary file from the disk into the "transient program area" of the computer's
electronic memory and then relinquishes control of the system to this binary
image of the program in memory. It is thus necessary for the executable
program, when it has finished its task, to return control of the system to the
command processor which loaded it into memory, in order that further DOS
commands or other programs may be invoked. Both tasks- turning over control
to a program and assuming control once the program has finished running -are
performed by COMMAND.COM.
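
For illustration, here is a minimal source program written in Turbo PASCAL.
The program and file names are only examples; once the compiler has translated
the source text into an executable binary file, that binary may be invoked by
typing its name on the DOS command line.

     program Greeting;     { source text stored in a file such as GREETING.PAS }
     begin
       WriteLn('Greetings from an executable program.')
     end.                  { compilation produces an executable, GREETING.EXE  }
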
The characters in batch or source programs are upper- or lower-case
alphabetic characters, digits, or sometimes special symbols, depending on the
nature of the source language. For executable binary files, however, the
contents of the file consist largely of unprintable characters, which are not,
in any event, intended to be readable by the human eye. These are the actual
binary instructions which are fed to the microprocessor as the program is
executing.
Executable programs which are loaded by the DOS command processor may make use
of the resources provided by the operating system through two primary avenues.
The first, and most 'elemental' of these, is the Basic Input/Output System,
known as the BIOS. The second, which is built upon the foundation provided by
the BIOS, is the Disk Operating System, known as DOS. Although it is common
practice to refer to the resources provided by COMMAND.COM as if those were the
resources provided by DOS, to do so is not technically correct. In fact, the
set of program names and internal COMMAND.COM commands which are collectively
referred to as "DOS commands" by all of us are quite different from the system
resources known as DOS and the BIOS.
The BIOS and DOS system resources have no "command line" as such; they are
made available to executable programs through a special list of memory addresses
which is known as the interrupt vector table. Executable programs make use of
these resources by invoking the BIOS and DOS sub-programs located at those
addresses; such invocations take the form of requests for system services which
are satisfied by the BIOS and DOS subprograms installed in memory during the
system "boot" process, when IBMBIO.COM (or IO.SYS) and IBMDOS.COM (or MSDOS.SYS)
are read from the boot disk.
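
As a first small sketch of such a request (using the Intr() procedure of Turbo
PASCAL's standard Dos unit, which is a convenience rather than a requirement),
the following program asks the BIOS video service, interrupt 10h, for the
current video mode; function 0Fh of that service returns the mode in AL.

     program VideoMode;
     uses Dos;
     var
       r: Registers;       { image of the CPU registers, declared in the Dos unit }
     begin
       r.AH := $0F;        { BIOS video function 0Fh: return current video state  }
       Intr($10, r);       { software interrupt 10h: the BIOS video services      }
       WriteLn('Current video mode: ', r.AL)
     end.
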
The major criterion which distinguishes DOS from the BIOS is that DOS is used
to access logical devices, while the BIOS is used to interface such logical
devices with actual physical devices. A logical device may be a disk volume
which is assigned a drive name such as "A:", "B:", "C:", etc. But the physical
device to which information is written or from which it is read may actually not
be a disk drive, but could be a video screen or even an area in electronic
memory. A physical device is accessed by loading commands into special areas
set aside by the device as "registers".
These registers are assigned addresses called "I/O port numbers", and are
accessed through special IN and OUT instructions, because these locations
are external to the computer's main memory space. This distinction between
system memory and I/O port address space is a consequence of the physical
construction of the machine, which must electronically switch its address signal
circuits between memory and port access, and cannot read or write to external
devices in the same way it can read or write system memory locations.
Physically, such devices as video controller cards, modem cards, sound or
multimedia interface cards, disk drive controller cards, etc., which plug into
slots on the system backplane "under the hood" of the computer, are external
devices which must be assigned port addresses. When we speak of information
storage "in" the computer, we usually distinguish between volatile storage in
memory and the relatively permanent (if need be) storage in the magnetic
material of a disk or diskette, and when we do make this distinction it is
usually important that we do not confuse memory and disk storage. From a
programmer's standpoint, it may even be necessary at times to investigate the
details of the register layout of a particular external device in order to learn
how to direct the various parts of the system to communicate with it through its
port addresses.
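
Turbo PASCAL exposes this separate address space through its predeclared Port
array, which performs an IN or OUT instruction for each read or write of an
element. As a cautious illustration (the port number assumes the standard PC
interrupt controller hardware), the following sketch reads the interrupt mask
register of the 8259 interrupt controller at port 21h:

     program ReadMask;
     var
       mask: Byte;
     begin
       mask := Port[$21];  { IN from port 21h: the 8259 interrupt mask register }
       WriteLn('Interrupt mask register: ', mask)
     end.
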
Our present task is much less involved with the intricacies of computer
hardware, and although it is not as complicated by detail it is certainly more
ramified by scope. We generally leave the BIOS subprograms to the task of
handling device control and instead simply identify a device by a drive letter
(to DOS) or a drive number (to the BIOS), and we only care that data is written
or read and not in particular just how this is done.
The DOS and BIOS subprograms are collectively referred to as the interrupt
subsystem because they interrupt a program (sometimes at the request of that
program), taking control of the system temporarily whenever they are required
to perform a service. The INT instruction is generally the trigger which sets
them into action, although certain operations performed at regular intervals,
such as updating of the system clock, are initiated through "hardware
interrupts": electronic signals sent from an interrupt controller circuit to
the microprocessor along special interrupt request signal lines. Signals from
the keyboard are also tied to hardware interrupts, but once
the BIOS processes these signals, interpreting them as special numeric key codes
and storing these codes in the BIOS Data Area, programs then generate "software
interrupts" by means of the INT instruction in order to cause the BIOS to
retrieve these codes and pass them on to the program by storing them in the
microprocessor's accumulator register where they may be immediately used.
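
A brief sketch of this keyboard transaction, again using the Dos unit's Intr()
procedure: function 00h of the BIOS keyboard service (interrupt 16h) waits for
a keystroke and returns its ASCII code in AL and its scan code in AH.

     program ReadKey16;
     uses Dos;
     var
       r: Registers;
     begin
       r.AH := $00;        { keyboard function 00h: wait for and read a keystroke }
       Intr($16, r);       { software interrupt 16h: the BIOS keyboard services   }
       WriteLn('ASCII code: ', r.AL, '   scan code: ', r.AH)
     end.
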
Our access to the data from many such underlying physical or logical
operations is thus simplified by BIOS or DOS placing the information in
locations where we can readily make use of it, either in the microprocessor or
in special areas of memory that we set aside for the purpose. And, when we want
to send information to a disk, for instance, we inform the BIOS (or DOS,
depending on the level of service we require) how much information we have ready to be
sent and where it is located, by placing the size and location of the data area
in the microprocessor's registers, identifying the particular system service we
are requesting by placing its function number in the accumulator register, and
then, without further ado, issuing the INT instruction to the microprocessor to
cause it to turn over control of the system to the subprogram from which we
requested the service. Most of the time, we do not need to deal with the
interrupt subsystem in even this level of detail because the PASCAL statements
Read() and Write() can handle all of the details of reading or writing from or
to files or devices, but our understanding of the computer is vastly enhanced by
examining the many services available through the interrupt subsystem.
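
To make the procedure just described concrete, here is a sketch of a direct DOS
request (the message text is arbitrary): function 40h of interrupt 21h writes
CX bytes from the buffer at DS:DX to the file or device whose handle is in BX,
and handle 1 is the standard output device. The MsDos() procedure of the Dos
unit issues the INT 21h instruction on our behalf.

     program DosWrite;
     uses Dos;
     var
       r  : Registers;
       msg: string;
     begin
       msg := 'A message sent through DOS function 40h.' + #13#10;
       r.AH := $40;           { function number placed in the accumulator (AH) }
       r.BX := 1;             { handle 1: the standard output device           }
       r.CX := Length(msg);   { how much information is ready to be sent       }
       r.DS := Seg(msg[1]);   { segment of the data area                       }
       r.DX := Ofs(msg[1]);   { offset of the data area                        }
       MsDos(r)               { issue INT 21h to request the service           }
     end.
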
There are many programming reference books which list the various BIOS and
DOS subprograms which are accessed through the interrupt subsystem. These
give a description of the functions which are performed by each subprogram
as well as the format of command blocks which must be prepared by the calling
program when certain subprograms are called, and they describe values which
must be loaded into registers beforehand as well as values which may be
returned in the microprocessor registers (such as error codes) when the
subprogram terminates.
This book contains a number of examples of DOS or BIOS subroutine calls,
but the focus here is not specifically on the system resources, so no list
is given, although it is recommended that the reader attempt to become
familiar with these resources through the use of a DOS/BIOS programmers'
reference containing such a listing.
PASCAL language statements provide a fairly complete set of instructions
for accomplishing most tasks, and recourse to direct DOS or BIOS calls is
rarely necessary, although at times it may lead to a more compact executable
program. This is usually why the author originally included such subroutine
calls in the programs in this book.
It should be noted that the term "PC-compatible" refers to the constraints
which a system must conform to if its interrupt subsystem is to provide the
same resources that a true "PC" computer does. If a system is "BIOS
compatible" then it is almost certainly "DOS compatible" as well. What this
concept of compatibility means is that the format of procedure calls as well
as the details of the function performed must conform to the standards which
were established first by IBM and then, cooperatively, by the computer
industry which has developed around the IBM-PC platform.
We may characterize an executable program as a series of instructions which
perform various operations on data. But, exactly what is this data, and how
is it stored and manipulated in the computer?
The most elemental, 'atomistic', units of information, and the manipulations
that are performed on them by native CPU instructions at the assembly language
level are as follows:

Atomistic Binary Elements

    1) Byte
    2) Word (2 bytes)
    3) Double-word (4 bytes)

Elemental Operations

    1. Selection
    2. Comparison
    3. Change
       A) Organizational
          1) Movement
          2) Concatenation
          3) Permutation
       B) Ordinal
          1) Logical
          2) Arithmetical
          3) Positional



These elemental operations are also performed through high-level languages
(BASIC, C, FORTRAN, PASCAL, etc.) on a set of elemental data types that are
atomistic in the context of the particular language. In Turbo PASCAL, these
simple data types are:

              Identifier    Size (bytes)    Range of values

   Ordinal    BYTE               1          0..255
              SHORTINT           1          -128..127
              WORD               2          0..65535
              INTEGER            2          -32768..32767
              LONGINT            4          -2147483648..2147483647
              COMP               8          (-(2^63)+1)..((2^63)-1)
              BOOLEAN            1          FALSE,TRUE
              CHAR               1          #0..#255

   Real       REAL               6          2.9E-39 .. 1.7E38
              SINGLE             4          1.5E-45 .. 3.4E38
              DOUBLE             8          5.0E-324 .. 1.7E308
              EXTENDED          10          1.9E-4951 .. 1.1E4932
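
As a small illustration, a few variables of these simple types might be
declared and used as follows (the identifiers are arbitrary):

     program SimpleTypes;
     var
       counter  : Word;        { 2 bytes, 0..65535       }
       offset   : Integer;     { 2 bytes, signed         }
       flag     : Boolean;     { 1 byte, FALSE or TRUE   }
       initial  : Char;        { 1 byte, #0..#255        }
       distance : Real;        { 6 bytes, floating point }
     begin
       counter  := 65535;
       offset   := -1;
       flag     := counter > 0;
       initial  := 'A';
       distance := 2.5;
       WriteLn(counter, ' ', offset, ' ', flag, ' ', initial, ' ', distance)
     end.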


There are a few more standard PASCAL data types which have an extended
structure. These are the string, enumerated, set, array, and record types,
which will be introduced later on.



Large-scale ordering of the elemental data types is achieved by the creation
of complex data structures. The study of such structures is one of the
primary interests of computer science, and a great deal of intensive labor has
been dedicated to the study of efficient and reliable methods of creating and
transforming them. The most well-known of these complex structures and the
common operations performed on them are:

Some Complex Data Structures

    1. Lists
       A) Linear
       B) Circular
    2. Trees
       A) Binary
       B) Multi-way
    3. Queues
       A) First-In-First-Out
       B) Last-In-First-Out
    4. Deques
    5. Stacks

Common Operations on them

    1. Typecasting
    2. Creation
    3. Destruction
    4. Rearrangement (sorting)
    5. Traversal (searching)
    6. Assimilation (grafting)
    7. Deletion (pruning)

If you envision the memory of the computer as a long line of tiny trays,
each of which is large enough to hold one word of data, you can imagine that
it is a rather complicated affair to make up these trays into anything
resembling a tree structure or some other complex structure. These trays
being indistinguishable except by their position in the line of trays, how
is it possible to make use of them? By virtue of two simple facts:
1) The contents of each tray may be altered or moved elsewhere
2) Each tray has a unique address corresponding to its position
in the line of trays.
In programming languages, trays or groups of trays can be referred to by
means of 'labels', which are used as names that are often synonymous with the
contents of these memory locations. In fact it is accepted practice to refer
to the labels which serve to locate items as POINTERS, and to refer to the
labels which are used in performing operations on the contents of these
locations as VARIABLES (although certain types of variables may be used as
pointers to memory locations). This concept is discussed in detail later on.
When we speak of adding variable X to variable Y, what we really refer to in
an elemental sense is adding the contents of the memory area described and
located by the label 'X' to the contents of the memory area described and
located by the label 'Y'. In high-level languages like PASCAL we need not
(although we may choose to) be concerned with the details of just where in
the line a tray is located or of exactly how the contents of different trays
get added (or multiplied, subtracted, divided, etc.) or assigned (moved) to
other trays. We are able to be aloof regarding many of the details primarily
because of the existence in PASCAL of 'type identifiers' which provide easy
access to groups of trays, individual trays, or the two halves of a tray,
by assigning to each data type a corresponding size (number of trays) and the
allowable operations which may be performed on a memory area referred to by a
variable of that type.
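
By way of a small illustration of this: a WORD variable occupies one tray of
two bytes, and Turbo PASCAL's Hi() and Lo() functions give access to the two
halves of that tray.

     program TwoHalves;
     var
       w: Word;
     begin
       w := $1234;                      { one word-sized tray holding 1234h }
       WriteLn('High half: ', Hi(w));   { prints 18, that is, 12h           }
       WriteLn('Low half : ', Lo(w))    { prints 52, that is, 34h           }
     end.
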
From the foregoing, you should be able to see that it is by the use of
pointers that various memory areas can be linked together, as long as we keep
a record somewhere of our pointers, to form arbitrarily complex trees or lists
of various kinds. It is usually most practical to save the pointers to any
previous or next item in a list where they are most quickly found- and this
usually happens to be right next to the item's data, in adjacent trays. This
is accomplished with encouraging simplicity through the use of the RECORD
structured type, as we will see later.
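
As a foretaste (the type and variable names here are invented for the sake of
illustration), a list item whose forward pointer is kept right next to the
item's data might be declared and linked like this:

     program TinyList;
     type
       PItem = ^TItem;               { a pointer to an item in the list }
       TItem = record
         Data : Integer;             { the item's data ...                       }
         Next : PItem                { ... and, in the adjacent trays, a pointer }
       end;                          {     to the next item in the list          }
     var
       first, second: PItem;
     begin
       New(first);  New(second);     { obtain two groups of trays from memory }
       first^.Data  := 1;
       first^.Next  := second;       { link the first item to the second      }
       second^.Data := 2;
       second^.Next := nil;          { nil marks the end of the list          }
       WriteLn(first^.Next^.Data)    { follow the pointer: prints 2           }
     end.
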
Just as large, complex data structures can be made up out of trays in this
long line, the elemental operations on atomistic binary elements (trays) can
be strung together in a sequence to make arbitrarily complex programs. Here,
the use of labels is once again the key- but here those labels are procedure
names. These labels are synonyms for tray locations where sequences of
instructions begin, and we usually try to keep things simple by trusting the
PASCAL compiler program to make sure it keeps track of our procedural pointers
along with making certain that each separate sequence of instructions occupies
a separate group of trays and does not overlap with any other sequence.
Just as we call a particular data structure a 'list' because of its form, we
call a particular sequence of instructions a 'procedure' because of its form.
A PASCAL procedure has one entry point (the first instruction of the sequence),
and one exit point (the last instruction of the sequence), but it may have a
large number of possible paths through the sequence because of the likelihood
that conditional branching instructions are present. If A and B are two
sequences of instructions, then our purpose might require a conditional
statement like "IF m=n THEN A ELSE B;", or perhaps a conditional loop such as
"WHILE m=n DO B;".
The need for a particular sequence of instructions is determined by our
ultimate purpose for writing the program, and our design for the program will
ultimately involve two distinct but inseparable concepts: A choice of data
structure and a choice of algorithm. For each of the well-known data
structures already listed, there are various well-known methods (algorithms)
for creating, rearranging, traversing them, etc. Much study has been made of
ways of compactly storing trees or lists in memory and of methods for searching
through them that are both simple and quick. These algorithms for the optimal
use of data structures have received as much attention- perhaps more, in
recent years -as the algorithms for computational procedures, and a good deal
of theory has been expounded in an effort to provide a solid basis for the
mathematical analysis of their various dimensions.
An algorithm is like a navigational tool which we can use to avoid getting
lost in the deep waters of computing while also avoiding known hazards in
well-charted areas. With reference to computation, extensive work has been
done in methods of analysis of error and the precision of machine arithmetic
related to systems of equations or approximation of transcendental functions,
and a wealth of methods exist for numerical integration, differentiation,
interpolation or extrapolation, curve fitting and data modeling, and
optimization (maximization or minimization) of the parameters of a system.
An algorithm has a kind of structure that usually can be graphically
illustrated in a flowchart or structure diagram. This is its 'computational
structure'. A simple algorithm for computing the product (factorial) of the
integers from 1 to a chosen number N is shown below.

     Start
        Input N
        Initialize:  t <- 1;  i <- 1
        WHILE i <= N DO
           t <- t * i
           i <- i + 1
        Output t
     Exit

In this diagram there is an entry point, an initialization block, a conditional
loop which causes two computation blocks to be repeatedly executed until the
condition is no longer met, a finalization block in which the result is output,
and an exit point.
The essential point here is that a method or algorithm needs to be transformed
into a series of programming language statements in order to be in a form
that is useful as far as the computer is concerned, and a structure diagram is
an important intermediate form of expression whereby our idea about what is to
be done is metamorphosed into what can be done by computer methods.
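
As an example of that transformation, a direct rendering of the factorial
structure diagram into Turbo PASCAL statements might read as follows (LONGINT
is used for the running product, since factorials quickly outgrow an INTEGER):

     program Factorial;
     var
       N, i : Integer;
       t    : LongInt;
     begin
       Write('N? ');
       ReadLn(N);              { Input N                  }
       t := 1;                 { Initialize               }
       i := 1;
       while i <= N do         { conditional loop         }
       begin
         t := t * i;           { first computation block  }
         i := i + 1            { second computation block }
       end;
       WriteLn(N, '! = ', t)   { Output t                 }
     end.
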
The PASCAL language allows us to use labels- strings of letters and/or
digits -that are quite long, and they may be upper- or lower-case. You have
more than 150 quintodecillion possible labels of 32 or fewer characters which
are distinguished as unique by the Turbo PASCAL compiler, of which only a few
dozen are reserved for use as PASCAL identifiers. This implies that the
greater your mastery of language and creativity, the wider your horizons as
a programmer.
A great part of the battle in setting forth your ideas on paper is finding
the most clear and concise expression of them in words. The structure diagram
is thus a bridge and a mnemonic aid which guides us to encapsulate our ideas
of order as ideas of process and transformation. It is thus a meeting point
between a mind that can juxtapose and synthesize and a machine that can only
transmute. Once a structure diagram is drawn, the problem is essentially
solved, provided the diagram is correct. (There exist a multitude of software
engineering tools for performing the more-or-less mechanical task of
transforming such diagrams into working computer programs.)
What we intend to do here is to become familiar with the myriad details of
PC-compatible machines and of how assembly language and PASCAL can be used
together to produce smaller, faster, and more flexible programs than would
ordinarily be produced by a compiler or an automatic code generator alone.
And, although an idea or algorithm is always the guiding principle in program
design, we first must examine the most fundamental and elemental features of
the microprocessing unit so that we understand fully the complexity we are
ultimately faced with and the means for mitigating to some extent the effect
which that complexity has on our thinking.
Programs are not only subdivided into procedures or functions occupying
separate locations in memory, but are also divided according to a memory
image map that keeps the data used by the program in its own segment apart
from the instruction sequences kept in one or more code segments. Each program
also has a stack segment which keeps track of the subroutine exit destinations
as the program is running and which also provides space for workspace variables
during subroutine calls. These concepts of memory segmentation and of the
usage of the data and stack segments are critically important in assembly-
language programming, and will be discussed in lurid detail further on.
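
As a small preview (using Turbo PASCAL's built-in CSeg, DSeg, and SSeg
functions, which return the current values of the corresponding segment
registers), a program can display where its own code, data, and stack segments
have been placed in memory:

     program Segments;
     begin
       WriteLn('Code segment : ', CSeg);   { value of the CS register }
       WriteLn('Data segment : ', DSeg);   { value of the DS register }
       WriteLn('Stack segment: ', SSeg)    { value of the SS register }
     end.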
