
Teaching X86 Assembly Language Programming with MS Visual Studio


Dr. S. S. Limaye
Principal, Jhulelal Institute of Technology, Nagpur, India
e-mail: shyam_limaye@hotmail.com ; limayess@jit.org.in
Abstract: In many colleges, 8086 assembly language practicals are still conducted using the
outdated MASM on the DOS platform. This paper describes how to use Microsoft Visual Studio for
teaching assembly language programming of the Intel 80X86 processors and their derivatives such
as the Pentium. It also illustrates how to integrate an assembly program into a C program.
Keywords: Pentium, 80X86, learning assembly language, Visual Studio, Microsoft

1 Introduction
1.1 Why learn assembly language?
The UG syllabus for any electronics or computer science course usually includes X86 assembly
language programming. This is important because it gives the student a thorough understanding
of the hardware. Moreover, carefully written assembly programs can execute faster and occupy
less memory than compiled code. Assembly also gives full control over the machine, which is
needed for writing I/O drivers or boot loaders. Microsoft Corp. itself writes critical sections
of code in assembly. In general, however, assembly level programming is difficult. Therefore,
things like the GUI (Graphical User Interface) are better left to high level languages, with
assembly reserved for special cases. Writing a console program in assembly is fairly simple, as
we shall see shortly.

1.2 Why Visual Studio?


Earlier, the standard packages for X86 programming were MASM and DEBUG under DOS.
Unfortunately, Microsoft stopped supporting MASM after 1991, and proper documentation covering
the later processors is not available. However, Visual Studio can be used for assembly language
development without any modification. It is high time that we stop using the obsolete packages
and switch over to modern IDEs.
Today MS Visual Studio is a standard development tool for various programming languages. Skills
acquired with Visual Studio can be reused for other languages, and integrating assembly with
other languages becomes easier. We are able to call an assembly program from C and vice versa.
Also, the disassembly mode in Visual Studio lets you see how your C statements are converted to
assembly. MASM has now been renamed ML and is shipped with Visual Studio. In addition, there are
user groups like MASM32 who have kept MASM alive and have created useful libraries.

2 Changes from DOS based assembler and debugger


MASM was designed for the real mode 8086, with segments of 64K size. We used the SEGMENT and
ASSUME directives for defining segments. This approach does not work in the protected mode in
which Windows runs. Here, we need to use the .MODEL, .CODE and .DATA directives. The segment
registers are initialized by the OS when the program starts and they should not be disturbed. We
use the FLAT memory model, in which all segments overlap and are of size 4 GB. Under DOS, a
program could terminate itself by calling INT 21H function 0. This does not work in Windows
because the system calls are dynamically linked. A simple way to terminate a program is to use
the debug break, i.e. INT 3. It is not a clean way, but it works when the program is run under
the debugger. A better way is to invoke the ExitProcess call from the kernel32 library, though
it requires some work. We will study it in later examples. We need to remember that in the FLAT
model all offsets are 32 bits, and hence all base and index registers must be 32 bits, i.e. we
should always use EBX, EBP, ESP, ESI, EDI rather than BX, BP, SP, SI, DI. The string oriented
instructions MOVSB, MOVSW, MOVSD use ESI and EDI (and ECX as the repeat count) rather than SI,
DI and CX.

3 Assembly Program skeleton


A typical Pentium program has the following skeleton.
.686P                           ; Pentium Pro or later
.MODEL flat, stdcall            ; Use Windows API calling convention
.STACK 4096                     ; Define a 4K stack
option casemap :none            ; No upper case/lower case mapping
.DATA
        ; <Data declarations>
.CODE
start:
        ; <Program>
end start

The first line selects the Pentium Pro instruction set. The other choices include .386, .486
and .586.
The second line uses the flat model, i.e. it uses all near pointers and overlapping segments of
4 GB each. There exist some other models but they are complicated to use, and we will not
discuss them here. The second parameter, stdcall, indicates that we will use the Windows API
calling convention, i.e. the arguments are pushed on the stack from right to left and the stack
is cleaned up by the called procedure. STDCALL is useful even if you are not calling the Windows
API because otherwise the linker mangles the procedure names (start becomes _start).
The third line reserves a 4K stack.
The fourth line specifies no case mapping, i.e. upper case and lower case symbols are treated
differently. This is necessary to avoid confusion when we are using the Windows API.
The fifth line starts the data segment. Define your data elements with DB or DW directives here.
The next line marks the beginning of the code segment. Enter the program here. It starts with
the start label. The last line is end, and it specifies start as the entry point.

4 Examples
4.1 Example 1: Stand alone assembly program - Addition of two numbers
Start Visual Studio. From the main menu, select New > Project. In the project dialog box, select
Visual C++ > Win32 in the left pane and Win32 Console Application in the right pane. Enter
tut1 as the project name and press OK. Press the NEXT button on the application wizard window
and, in the next screen, tick the check box labelled Create empty project. Then click FINISH.
In the explorer pane on the left side, right click on Source Files. From the pop-up menu, select
Add > New Item. In the dialog box, click on C++ source file. Enter the file name as main.asm.
Click the Add button. Visual Studio does not know how to handle asm files by default, so, in the
solution explorer pane, right click on the tut1 entry and in the pop-up menu select Custom Build
Rules. In the dialog box, tick the checkbox for Microsoft Macro Assembler and press OK. Enter
the following program in the editor pane.
.686P                           ; Pentium Pro or later
.MODEL flat, stdcall            ; Use Windows API calling convention
.STACK 4096                     ; Define a 4K stack
option casemap :none            ; No upper case/lower case mapping
.data
n1      dw      5
n2      dw      6
n3      dw      ?
.code
start:
        mov     ax,n1
        add     ax,n2
        mov     n3,ax
        int     3
end start

Click the build tool button. If there are errors, correct them and build again until none
remain. Place the cursor on the line next to the start label. Right click and, in the pop-up
menu, select Breakpoint > Insert Breakpoint. Note that we cannot set the breakpoint on the start
line itself. Press the run button. The program will halt at the breakpoint. Open the Registers,
Memory and Disassembly windows through the menu Debug > Windows. You can step through the
program with the step button and watch how the registers and memory change.
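
For comparison in the Disassembly window, the same addition can be written in C and built in a
separate C project. The following is only a sketch: the function name add_words is an
assumption made for illustration and is not part of the original exercise; the variables mirror
n1, n2 and n3 from the assembly listing.

```c
#include <stdint.h>

/* Same data as the assembly listing: three 16-bit words. */
uint16_t n1 = 5;
uint16_t n2 = 6;
uint16_t n3;

/* Hypothetical helper: n3 = n1 + n2, truncated to 16 bits like the AX register. */
void add_words(void)
{
    n3 = (uint16_t)(n1 + n2);
}
```

Stepping through add_words in disassembly mode shows the instructions the compiler chose, which
can be compared with the three hand-written instructions above.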

4.2 Example 2: Search a substring in a string


The problem is to find the offset of the key string Computer in the given message and place it
in the EBX register. If the key is absent from the message, then EBX should return 0FFFFFFFFH.
The program uses the CMPS instruction (in its byte form CMPSB, with the REPE prefix) to compare
the key string with successive substrings of the message, each starting at the offset held in
the EBX register. The message is 20 characters long and the key is 8, so 12 is the last offset
at which the key can fit. If the key is not found at any offset from 0 to 12, the program
terminates with 0FFFFFFFFH in the EBX register.
Enter the following program.
.686P                           ; Pentium Pro or later
.MODEL flat, stdcall            ; Use Windows API calling convention
.STACK 4096                     ; Define a 4K stack
option casemap :none            ; No upper case/lower case mapping
; Program to find where "Computer" is located in message
.data
message db      "Trash","Computer","Garbage"
key     db      "Computer"
.code
start:
        mov     ebx,0                   ; Initially, offset = 0
cmpr:
        mov     esi,offset message      ; esi at start of message
        add     esi,ebx                 ; ebx holds offset from start
        mov     edi,offset key          ; edi points to search key
        mov     ecx,8                   ; count of chars in key
        repe    cmpsb                   ; compare 8 chars; Z is set if match found
        jz      found                   ; match found
        inc     ebx
        cmp     ebx,12
        jbe     cmpr                    ; offsets 0..12 are still valid (20 - 8 = 12)
        mov     ebx,0FFFFFFFFH          ; key not found in message
found:
        int     3
end start
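
The search loop can be cross-checked against a C reference version. The following is a minimal
sketch, not part of the original exercise: the names find_key and NOT_FOUND are assumptions
introduced here, and memcmp stands in for REPE CMPSB.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NOT_FOUND 0xFFFFFFFFu   /* same sentinel the assembly leaves in EBX */

/* Hypothetical reference routine: returns the offset of the first
   occurrence of key in msg, or NOT_FOUND. Lengths are byte counts;
   no terminating 0 is assumed, as in the assembly data declarations. */
uint32_t find_key(const char *msg, size_t msg_len,
                  const char *key, size_t key_len)
{
    if (key_len == 0 || key_len > msg_len)
        return NOT_FOUND;

    /* Last offset at which the key can still fit inside the message. */
    size_t last = msg_len - key_len;

    for (size_t off = 0; off <= last; off++) {
        /* memcmp plays the role of REPE CMPSB over key_len bytes. */
        if (memcmp(msg + off, key, key_len) == 0)
            return (uint32_t)off;
    }
    return NOT_FOUND;
}
```

Calling find_key with the 20-character message and the 8-character key from the listing should
give the same value that the debugger shows in EBX when the assembly program halts.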

Let us now learn how to create listing and map files. In the solution explorer, right click on
the tut2 entry and select Properties. The property pages dialog pops up. In its left pane, click
on the + sign to the left of the Microsoft Macro Assembler entry to expand the branch. Select
Listing File. In the right pane, click on Assembled Code Listing File and enter the file name as
$(InputName).lst. In the left pane, expand the Linker entry by clicking the + sign and select
Debugging. In the right pane, click on the Generate Map File option and change it to Yes. Click
on Map File Name and enter it as $(InputName).map.
Build and debug the project as before.

4.3 Example 3: Separate out even and odd numbers


20 numbers are stored in an array called buffer. Copy all even numbers to the array evn (EVEN
itself is a reserved MASM directive) and all odd numbers to the array odd. The program is
self-explanatory and is given below. Note that we have used the segment directive .DATA? to
define the uninitialized data arrays evn and odd. We could have included them in the .DATA
segment, but then 40 bytes of filler would have been stored in the executable file, making it
needlessly larger.

.686P                           ; Pentium Pro or later
.MODEL flat, stdcall            ; Use Windows API calling convention
.STACK 4096                     ; Define a 4K stack
option casemap :none            ; No upper case/lower case mapping
; Program to separate odd and even numbers
ExitProcess PROTO :DWORD
.data
buffer  db      1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20
.data?
evn     db      20 dup(?)       ; even array will be stored here
odd     db      20 dup(?)       ; odd array will be stored here
.code
start:
        mov     esi,offset evn          ; esi points to even array
        mov     edi,offset odd          ; edi points to odd array
        mov     ebx,offset buffer       ; ebx points to input array
        mov     ecx,20                  ; count of numbers in buffer
L1:
        mov     al,[ebx]                ; Get a byte from buffer
        ror     al,1                    ; Get bit 0 in Cy
        jc      got_odd                 ; Cy is set for odd numbers
        rol     al,1                    ; restore al
        mov     [esi],al                ; Store in even array
        inc     ebx                     ; increment source pointer
        inc     esi                     ; increment even destination pointer
        jmp     loop_end
got_odd:
        rol     al,1                    ; restore al
        mov     [edi],al                ; Store in odd array
        inc     ebx                     ; increment source pointer
        inc     edi                     ; increment odd destination pointer
loop_end:
        loop    L1                      ; Decrement ecx and go back
over:
        invoke  ExitProcess, 0
end start
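
A C reference version is handy for checking the contents of the two destination arrays in the
Memory window. This is a sketch under stated assumptions: the function split_even_odd and its
parameters are hypothetical names, and the test of bit 0 replaces the ROR/JC pair used in the
assembly.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference routine: copies the even values of src into
   evens and the odd values into odds. Both destination arrays must have
   room for n bytes. The counts are returned through n_even and n_odd. */
void split_even_odd(const uint8_t *src, size_t n,
                    uint8_t *evens, size_t *n_even,
                    uint8_t *odds, size_t *n_odd)
{
    *n_even = 0;
    *n_odd = 0;
    for (size_t i = 0; i < n; i++) {
        if (src[i] & 1u)                    /* bit 0 set: value is odd */
            odds[(*n_odd)++] = src[i];
        else
            evens[(*n_even)++] = src[i];
    }
}
```

For the buffer 1 to 20 used above, each destination array receives ten values, in the same order
in which they appear in the source.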

4.4 Example 4: Using system calls


In the earlier programs, replace
int 3
with
invoke ExitProcess, 0
as has already been done in the listing of Example 3. ExitProcess is an external system call. We
need to declare it using the PROTO directive, similar to the declaration of a function in C. Add
the following line before the data segment declaration.
ExitProcess PROTO :DWORD

The linker binds it to the appropriate function in kernel32.dll. The program now exits
gracefully with the following message.
The program '[0x808] tut3.exe: Native' has exited with code 0 (0x0).

To read from and write to the console, we will use the functions StdIn and StdOut. But before
using them, we first need to do some work. Create a directory C:\masm32 and download the MASM32
package into it from www.masm32.com. This is a huge package. If you are not bothered about the
disk space, then leave the entire package there, but we need only two files from it:
C:\masm32\lib\masm32.lib and C:\masm32\include\masm32.inc. Give the following commands before
the DATA declaration.
includelib C:\masm32\lib\masm32.lib
include C:\masm32\include\masm32.inc

The masm32.inc file contains the necessary PROTO directives for StdIn and StdOut. The
includelib command causes the linker to search masm32.lib for the object modules of StdIn and
StdOut.
Console message output can be achieved with the following command.
invoke StdOut,ADDR message      ; message is a 0-terminated buffer

Console message input can be achieved with the following command.
invoke StdIn,ADDR buffer,100    ; Read up to 100 characters and store at buffer

The following code illustrates the usage.

.data
prompt  db      "Enter something and press ENTER",13,10,0
Found   db      "Key found",0
.data?
buffer  db      100 dup(?)
.code
start:
        invoke  StdOut,ADDR prompt      ; Send message to console
        invoke  StdIn,ADDR buffer,100   ; Read up to 100 chars from keyboard and
                                        ; store in buffer

How about invoking a Windows style message box? It is quite simple using the MessageBoxA
function in user32.lib. Since this library is already in the search path of the Visual Studio
linker, we don't need to write an includelib statement. Declare a prototype like this.
MessageBoxA PROTO :DWORD,:DWORD,:DWORD,:DWORD

You can invoke it with the command
invoke MessageBoxA, 0, addr message, addr Found, 0

The message string is printed in the body of the box and the Found string is the title.

4.5 Example 5: Calling an assembly program from C


We will write a small procedure asm_add for adding two integers. The procedure will be called
from the C program listed below.
extern int asm_add(int x, int y);

int main(void)
{
    int a = 5, b = 3, c;
    c = asm_add(a, b);
    return 0;
}

The assembly program listing is given below.

.586                    ; Use instructions for Pentium class machines
.MODEL FLAT, C          ; Use the flat memory model. Use C calling conventions
.STACK                  ; Define a stack segment of 1KB (not required for this example)
.DATA                   ; Create a near data segment (not required for this example)
.CODE                   ; Indicates the start of a code segment
asm_add PROC            ; Define procedure
        push    ebp             ; Save base pointer
        mov     ebp,esp         ; Copy ESP to EBP
        mov     eax,[ebp+8]     ; Get a
        add     eax,[ebp+12]    ; Add b; the result is returned in EAX
        pop     ebp             ; Restore base pointer
        ret                     ; return
asm_add ENDP
END
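
The offsets 8 and 12 follow from the layout of the 32-bit stack frame: after push ebp and
mov ebp,esp, the saved EBP lies at [ebp], the return address at [ebp+4], and the arguments
above them. The following C model is only an illustration of that layout; the names
model_asm_add and frame_load and the enum constants are assumptions made here, and the byte
array merely stands in for the real stack.

```c
#include <stdint.h>
#include <string.h>

/* Byte offsets from EBP inside asm_add's frame (32-bit C convention). */
enum {
    OFF_SAVED_EBP = 0,   /* pushed by "push ebp"               */
    OFF_RET_ADDR  = 4,   /* pushed by the CALL instruction     */
    OFF_ARG_A     = 8,   /* first argument, pushed last        */
    OFF_ARG_B     = 12,  /* second argument, pushed first      */
    FRAME_BYTES   = 16
};

/* Reads a 32-bit value at a byte offset, like mov eax,[ebp+offset]. */
static int32_t frame_load(const uint8_t *frame, int offset)
{
    int32_t value;
    memcpy(&value, frame + offset, sizeof value);
    return value;
}

/* Hypothetical model of the call asm_add(a, b): builds the frame in an
   array and performs the same two loads and one add as the assembly. */
int32_t model_asm_add(int32_t a, int32_t b)
{
    uint8_t frame[FRAME_BYTES] = {0};
    memcpy(frame + OFF_ARG_B, &b, sizeof b);   /* caller pushes b first */
    memcpy(frame + OFF_ARG_A, &a, sizeof a);   /* then a                */
    return frame_load(frame, OFF_ARG_A) + frame_load(frame, OFF_ARG_B);
}
```

Because the C convention pushes arguments from right to left, the first argument always ends up
nearest to the return address, which is why a is found at the lower offset.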

5 Conclusion
A methodology for teaching 80X86 assembly language using Microsoft Visual Studio has been
presented. Integration of an assembly module into a C program has also been explained. This
will bring students up to date with current tools.
